Method for recognizing gunshot in audio data, method for driving motor and related device

Document No.: 1193556; publication date: 2020-09-01

Reading note: this technology, "Method for recognizing gunshot in audio data, method for driving motor and related device", was designed by 缪丽林 and 李伟雷 on 2020-05-21. Abstract: The application discloses a method for identifying a gunshot in audio data, a method for driving a motor, and a related device. The identification method first acquires audio data to be processed, which comprises a plurality of groups of audio combinations, each comprising two frames of audio units, so that the subsequent gunshot identification proceeds frame by frame and provides a basis for accurately identifying a single gunshot. It then calculates the average energy value of each frame of audio unit and performs gunshot judgment according to the average energy values of the audio units in each audio combination, thereby accurately identifying the sound of a single shot and laying a foundation for precise motor vibration during multiple consecutive shots. Finally, when an audio combination contains a gunshot, the vibration sense judgment data corresponding to that audio combination is determined; when the vibration sense judgment data differ between audio combinations, the corresponding vibration intensity of the motor may also differ, which lays a foundation for the motor to vibrate with different intensities according to the loudness of the gunshot.

1. A method for identifying a gunshot in audio data, comprising:

acquiring audio data to be processed, wherein the audio data to be processed comprises a plurality of groups of audio combinations, each group of audio combinations comprises two adjacent audio units in time sequence, each audio unit comprises M audio sampling points, and M is a positive integer greater than 1;

filtering each frame of audio unit to filter the energy of the audio sampling points with the frequency higher than the preset frequency;

calculating the average energy value of each frame of audio unit after filtering;

and performing gunshot judgment on the audio combinations according to the average energy value of each frame of audio unit in each group of audio combinations, and determining vibration sense judgment data corresponding to the audio combinations when the audio combinations are judged to contain the gunshots.

2. The method of claim 1, wherein the audio combination comprises a first audio unit and a second audio unit that is later in time sequence.

3. The method as claimed in claim 2, wherein the performing the gunshot determination on the audio combination according to the average energy value of each frame of the audio unit in each set of the audio combination comprises:

determining the data type of a first audio unit in an audio combination to be judged according to the average energy value of the first audio unit in the audio combination to be judged, the average energy value of a second audio unit in the audio combination to be judged and the average energy value of a second audio unit in a last audio combination adjacent to the audio combination to be judged, wherein the data type comprises peak data and valley data;

judging whether the data type of a first audio unit in the audio combination to be judged is peak data or not, and if so, judging that the first audio unit in the audio combination to be judged contains a gunshot;

if not, judging whether the average energy value of the second audio unit in the audio combination to be judged is larger than a preset threshold value, if so, judging that the second audio unit in the audio combination to be judged contains the gunshot, and if not, judging that the audio combination to be judged does not contain the gunshot.

4. The method of claim 3, wherein the determining the vibration sense determination data corresponding to the audio combination comprises:

when the second audio unit in the audio combination to be judged comprises the gunshot, determining the value of the vibration sensation judgment data as the average energy value of the second audio unit in the audio combination to be judged; and/or

When the first audio unit in the audio combination to be judged comprises the gunshot, determining the value of the vibration sensation judgment data as the average energy value of the first audio unit in the audio combination to be judged.

5. The method as claimed in claim 3, wherein the determining the data type of the first audio unit in the audio combination to be determined according to the average energy value of the first audio unit in the audio combination to be determined, the average energy value of the second audio unit in the audio combination to be determined, and the average energy value of the second audio unit in the previous audio combination adjacent to the audio combination to be determined comprises:

when the average energy value of a first audio unit in the audio combination to be judged, the average energy value of a second audio unit in the audio combination to be judged and the average energy value of a second audio unit in a last audio combination adjacent to the audio combination to be judged meet a first inequality, judging the first audio unit in the audio combination to be judged as peak data;

the first inequality includes:

Figure FDA0002501973830000021

and/or

When the average energy value of a first audio unit in the audio combination to be determined, the average energy value of a second audio unit in the audio combination to be determined and the average energy value of a second audio unit in a last audio combination adjacent to the audio combination to be determined satisfy a second inequality, determining the first audio unit in the audio combination to be determined as valley data;

the second inequality includes:

wherein n is greater than or equal to 2, AVE (n-1) represents the average energy value of the first audio unit in the audio combination to be judged, AVE (n) represents the average energy value of the second audio unit in the audio combination to be judged, and AVE (n-2) represents the average energy value of the second audio unit in the previous audio combination adjacent to the audio combination to be judged.

6. The method for identifying the gunshot in the audio data according to claim 1, wherein the acquiring the audio data to be processed comprises:

acquiring an original audio signal;

sampling the original audio signal to obtain an audio data stream;

framing the audio data stream with a step length of M audio sampling points to obtain a plurality of audio units, and zero-padding any audio unit that has fewer than M audio sampling points;

and dividing two adjacent frame audio units in time sequence into a group of audio combinations, wherein the audio combinations form the audio data to be processed.

7. A method of driving a motor, comprising:

after determining that the audio data to be processed contains a gunshot, determining vibration sense judgment data corresponding to each group of audio combinations in the audio data to be processed, wherein the vibration sense judgment data represents the amplitude of the vibration waveform with which the motor is driven to vibrate;

determining the vibration level corresponding to each audio combination according to the value of the vibration sense judgment data corresponding to that audio combination;

while the audio combination is played, driving the motor to execute the vibration waveform corresponding to the vibration level of the audio combination.

8. The method according to claim 7, wherein the determining the vibration level corresponding to the audio combination according to the magnitude of the vibration sensation determination data corresponding to each audio combination comprises:

when the vibration sensation judgment data is larger than a first preset value and smaller than a second preset value, determining the vibration level corresponding to the audio combination as a first level;

when the vibration sensation judgment data is larger than or equal to a second preset value and smaller than a third preset value, determining the vibration level corresponding to the audio frequency combination as a second level;

when the vibration sensation judgment data is larger than or equal to a third preset value and smaller than a fourth preset value, determining the vibration level corresponding to the audio combination as a third level;

when the vibration sensation judgment data is larger than or equal to a fourth preset value, determining the vibration level corresponding to the audio frequency combination as a fourth level;

the vibration intensity of the vibration waveform corresponding to the first level, the vibration intensity of the vibration waveform corresponding to the second level, the vibration intensity of the vibration waveform corresponding to the third level and the vibration intensity of the vibration waveform corresponding to the fourth level are sequentially increased.

9. A system for identifying gunshot in audio data, comprising:

the data acquisition module is used for acquiring audio data to be processed, wherein the audio data to be processed comprises a plurality of groups of audio combinations, each group of audio combinations comprises two adjacent frames of audio units in time sequence, each frame of audio unit comprises M audio sampling points, and M is a positive integer greater than 1;

the filtering processing module is used for filtering each frame of audio unit so as to filter the energy of the audio sampling points with the frequency higher than the preset frequency;

the energy calculation module is used for calculating the average energy value of each frame of audio unit after filtering processing;

and the gunshot judging module is used for performing gunshot judgment on the audio combinations according to the average energy value of each frame of audio unit in each group of audio combinations, and determining vibration sense judgment data corresponding to an audio combination when the audio combination is judged to contain a gunshot.

10. The system of claim 9, wherein the audio combination comprises a first audio unit and a second audio unit that is chronologically subsequent.

11. The system of claim 10, wherein the gunshot determination module comprises:

the data type determining unit is used for determining the data type of a first audio unit in the audio combination to be judged according to the average energy value of the first audio unit in the audio combination to be judged, the average energy value of a second audio unit in the audio combination to be judged and the average energy value of a second audio unit in a last audio combination adjacent to the audio combination to be judged, wherein the data type comprises peak data and valley data;

the gunshot judging unit is used for judging whether the data type of a first audio unit in the audio combination to be judged is peak data or not, and if so, judging that the first audio unit in the audio combination to be judged contains the gunshot;

if not, judging whether the average energy value of the second audio unit in the audio combination to be judged is larger than a preset threshold value, if so, judging that the second audio unit in the audio combination to be judged contains the gunshot, and if not, judging that the audio combination to be judged does not contain the gunshot.

12. The system for recognizing gunshot in audio data as claimed in claim 11, wherein the determining the data type of the first audio unit in the audio combination to be determined according to the average energy value of the first audio unit in the audio combination to be determined, the average energy value of the second audio unit in the audio combination to be determined, and the average energy value of the second audio unit in the previous audio combination adjacent to the audio combination to be determined specifically comprises:

when the average energy value of a first audio unit in the audio combination to be judged, the average energy value of a second audio unit in the audio combination to be judged and the average energy value of a second audio unit in a last audio combination adjacent to the audio combination to be judged meet a first inequality, judging the first audio unit in the audio combination to be judged as peak data;

the first inequality includes:

and/or

When the average energy value of a first audio unit in the audio combination to be determined, the average energy value of a second audio unit in the audio combination to be determined and the average energy value of a second audio unit in a last audio combination adjacent to the audio combination to be determined satisfy a second inequality, determining the first audio unit in the audio combination to be determined as valley data;

the second inequality includes:

Figure FDA0002501973830000052

13. An electronic device, comprising: a processor and a storage medium, wherein

the storage medium having program code stored thereon;

the processor is configured to call program code stored on the storage medium to perform a method of identifying a gunshot in audio data according to any of claims 1-6.

14. The electronic device of claim 13, further comprising: a motor;

the processor is further configured to perform the steps of:

after determining that the audio data to be processed contains a gunshot, determining vibration sense judgment data corresponding to each group of audio combinations in the audio data to be processed, wherein the vibration sense judgment data represents the amplitude of the vibration waveform with which the motor is driven to vibrate;

determining the vibration level corresponding to each audio combination according to the value of the vibration sense judgment data corresponding to that audio combination;

while the audio combination is played, driving the motor to execute the vibration waveform corresponding to the vibration level of the audio combination.

Technical Field

The present application relates to the field of data processing technologies, and in particular, to a method for identifying a gunshot in audio data, a method for driving a motor, and a related device.

Background

As the data processing capability of mobile intelligent devices has improved, most such devices can run a variety of game programs, and shooting games, with their simple and clear rhythm, have become one of the mainstream game genres.

In a shooting game, the player controls a character's movement and shooting to complete specific tasks. To make shooting feel more realistic, the game can provide feedback in an additional dimension, beyond the game sound played through the loudspeaker, by vibrating a motor built into the mobile intelligent device.

However, the prior art identifies gunshots poorly during continuous firing: usually only the start time and the end time of the gunfire can be identified. The built-in motor of the mobile intelligent device can then only vibrate continuously from the start time to the end time of the gunfire, so the vibration of the motor does not match the timing of the individual shots and cannot meet the player's demand for precise vibration during multiple consecutive shots.

Disclosure of Invention

In order to solve the technical problem, the application provides a method for identifying a gunshot in audio data, a method for driving a motor and a related device, so as to solve the problem that the gunshot identification effect is poor in the prior art.

In order to achieve the technical purpose, the embodiment of the application provides the following technical scheme:

a method for identifying a gunshot in audio data, comprising:

acquiring audio data to be processed, wherein the audio data to be processed comprises a plurality of groups of audio combinations, each group of audio combinations comprises two adjacent audio units in time sequence, each audio unit comprises M audio sampling points, and M is a positive integer greater than 1;

filtering each frame of audio unit to filter the energy of the audio sampling points with the frequency higher than the preset frequency;

calculating the average energy value of each frame of audio unit after filtering;

and performing gunshot judgment on the audio combinations according to the average energy value of each frame of audio unit in each group of audio combinations, and determining vibration sense judgment data corresponding to the audio combinations when the audio combinations are judged to contain the gunshots.
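As a rough illustration of the energy step above, the following sketch computes one average energy value per audio unit. It assumes that the energy of a sampling point is the square of its amplitude and that the average is taken over all points of the unit; the text does not fix this definition, and the function name `average_energy` is hypothetical.

```python
from typing import Sequence


def average_energy(unit: Sequence[float]) -> float:
    """Mean of squared sample amplitudes for one audio unit.

    ASSUMPTION: "energy" is taken to be the squared amplitude of a sample;
    the source text does not define it explicitly.
    """
    if not unit:
        raise ValueError("audio unit must contain at least one sample")
    return sum(s * s for s in unit) / len(unit)


# Example: a quiet unit versus a loud unit of M = 4 samples.
quiet = [0.1, -0.1, 0.1, -0.1]
loud = [0.8, -0.9, 0.7, -0.8]
print(average_energy(quiet), average_energy(loud))
```

Under this assumption a louder unit always yields a larger value, which is the property the later threshold comparison relies on.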

Optionally, the audio combination includes a first audio unit and a chronologically subsequent second audio unit.

Optionally, the performing the gunshot determination on the audio combination according to the average energy value of each frame of the audio unit in each group of the audio combinations includes:

determining the data type of a first audio unit in an audio combination to be judged according to the average energy value of the first audio unit in the audio combination to be judged, the average energy value of a second audio unit in the audio combination to be judged and the average energy value of a second audio unit in a last audio combination adjacent to the audio combination to be judged, wherein the data type comprises peak data and valley data;

judging whether the data type of a first audio unit in the audio combination to be judged is peak data or not, and if so, judging that the first audio unit in the audio combination to be judged contains a gunshot;

if not, judging whether the average energy value of the second audio unit in the audio combination to be judged is larger than a preset threshold value, if so, judging that the second audio unit in the audio combination to be judged contains the gunshot, and if not, judging that the audio combination to be judged does not contain the gunshot.
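The two-stage judgment described above can be sketched as follows. This is only an illustration: the names `Verdict` and `judge_combination` are hypothetical, and whether the first audio unit is peak data is passed in as a boolean so that the sketch does not depend on any particular form of the inequalities.

```python
from enum import Enum


class Verdict(Enum):
    FIRST_UNIT = "first audio unit contains a gunshot"
    SECOND_UNIT = "second audio unit contains a gunshot"
    NONE = "no gunshot in this audio combination"


def judge_combination(first_is_peak: bool, ave_second: float, threshold: float) -> Verdict:
    """Decide where (if anywhere) an audio combination contains a gunshot."""
    # Stage 1: peak data in the first unit is treated as a gunshot directly.
    if first_is_peak:
        return Verdict.FIRST_UNIT
    # Stage 2: otherwise compare the second unit's average energy
    # against the preset threshold (strictly greater, as in the text).
    if ave_second > threshold:
        return Verdict.SECOND_UNIT
    return Verdict.NONE


print(judge_combination(False, 0.7, 0.5))
```

The threshold value used in the example call is arbitrary; the text only says that it is preset.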

Optionally, the determining of the vibration sensation determination data corresponding to the audio combination includes:

when the second audio unit in the audio combination to be judged comprises the gunshot, determining the value of the vibration sensation judgment data as the average energy value of the second audio unit in the audio combination to be judged; and/or

When the first audio unit in the audio combination to be judged comprises the gunshot, determining the value of the vibration sensation judgment data as the average energy value of the first audio unit in the audio combination to be judged.
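A minimal sketch of this selection rule is given below. The function name `vibration_sense_value` and the string labels `"first"` and `"second"` are hypothetical conventions chosen for the illustration.

```python
from typing import Optional


def vibration_sense_value(gunshot_in: Optional[str],
                          ave_first: float,
                          ave_second: float) -> Optional[float]:
    """Return the vibration sense judgment data for one audio combination.

    gunshot_in is "first", "second", or None (no gunshot detected).
    """
    if gunshot_in == "first":
        return ave_first    # gunshot found in the first audio unit
    if gunshot_in == "second":
        return ave_second   # gunshot found in the second audio unit
    return None             # no gunshot, so no vibration data


print(vibration_sense_value("second", 0.2, 0.9))
```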

Optionally, the determining the data type of the first audio unit in the audio combination to be determined according to the average energy value of the first audio unit in the audio combination to be determined, the average energy value of the second audio unit in the audio combination to be determined, and the average energy value of the second audio unit in the previous audio combination adjacent to the audio combination to be determined includes:

when the average energy value of a first audio unit in the audio combination to be judged, the average energy value of a second audio unit in the audio combination to be judged and the average energy value of a second audio unit in a last audio combination adjacent to the audio combination to be judged meet a first inequality, judging the first audio unit in the audio combination to be judged as peak data;

the first inequality includes:

and/or

When the average energy value of a first audio unit in the audio combination to be determined, the average energy value of a second audio unit in the audio combination to be determined and the average energy value of a second audio unit in a last audio combination adjacent to the audio combination to be determined satisfy a second inequality, determining the first audio unit in the audio combination to be determined as valley data;

the second inequality includes:

Figure BDA0002501973840000032

wherein n is larger than or equal to 2, AVE (n-1) represents the average energy value of the first audio unit in the audio combination to be judged, AVE (n) represents the average energy value of the second audio unit in the audio combination to be judged, and AVE (n-2) represents the average energy value of the second audio unit in the previous audio combination adjacent to the audio combination to be judged.
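The classification of the first audio unit can be illustrated with the sketch below. The inequalities themselves appear only as formula images in the source and are not reproduced in the text, so the comparisons used here are a placeholder assumption (a plain local maximum and local minimum test over AVE(n-2), AVE(n-1) and AVE(n)), not the inequalities of the application. The function name `classify_first_unit` is hypothetical.

```python
def classify_first_unit(ave_prev: float, ave_first: float, ave_second: float) -> str:
    """Classify the first audio unit of the combination to be judged.

    ave_prev   = AVE(n-2): second unit of the previous adjacent combination
    ave_first  = AVE(n-1): first unit of the combination to be judged
    ave_second = AVE(n):   second unit of the combination to be judged
    """
    # ASSUMPTION: stand-in comparisons; the source's inequalities are not shown.
    if ave_first > ave_prev and ave_first > ave_second:
        return "peak"
    if ave_first < ave_prev and ave_first < ave_second:
        return "valley"
    return "neither"


print(classify_first_unit(0.1, 0.9, 0.3))
```

Any other pair of inequalities over the same three values could be substituted without changing the surrounding judgment flow.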

Optionally, the acquiring the audio data to be processed includes:

acquiring an original audio signal;

sampling the original audio signal to obtain an audio data stream;

framing the audio data stream with a step length of M audio sampling points to obtain a plurality of audio units, and zero-padding any audio unit that has fewer than M audio sampling points;

and dividing two adjacent frame audio units in time sequence into a group of audio combinations, wherein the audio combinations form the audio data to be processed.
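The framing and grouping steps above can be sketched as follows. The function names `frame_units` and `pair_units` are hypothetical. The sketch assumes that audio combinations are non-overlapping pairs of consecutive units and that a trailing unit without a partner is dropped; the text does not say how an odd number of units is handled.

```python
from typing import List, Sequence, Tuple


def frame_units(stream: Sequence[float], m: int) -> List[List[float]]:
    """Split a sample stream into audio units of M samples, zero-padding the last."""
    if m <= 1:
        raise ValueError("M must be an integer greater than 1")
    units = []
    for start in range(0, len(stream), m):
        unit = list(stream[start:start + m])
        unit.extend([0.0] * (m - len(unit)))  # pad a short final unit with zeros
        units.append(unit)
    return units


def pair_units(units: List[List[float]]) -> List[Tuple[List[float], List[float]]]:
    """Group consecutive units into (first, second) audio combinations."""
    # ASSUMPTION: non-overlapping pairs; a trailing unpaired unit is dropped.
    return [(units[i], units[i + 1]) for i in range(0, len(units) - 1, 2)]


stream = [float(x) for x in range(1, 11)]  # 10 samples
units = frame_units(stream, 4)             # 3 units, the last one zero-padded
combos = pair_units(units)                 # 1 combination; third unit is unpaired
print(units[-1], len(combos))
```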

A method of driving a motor, comprising:

after determining that the audio data to be processed contains a gunshot, determining vibration sense judgment data corresponding to each group of audio combinations in the audio data to be processed, wherein the vibration sense judgment data represents the amplitude of the vibration waveform with which the motor is driven to vibrate;

determining the vibration level corresponding to each audio combination according to the value of the vibration sense judgment data corresponding to that audio combination;

while the audio combination is played, driving the motor to execute the vibration waveform corresponding to the vibration level of the audio combination.

Optionally, the determining, according to the magnitude of the vibration sensation determination data corresponding to each frame of audio combination, a vibration level corresponding to the audio combination includes:

when the vibration sensation judgment data is larger than a first preset value and smaller than a second preset value, determining the vibration level corresponding to the audio combination as a first level;

when the vibration sensation judgment data is larger than or equal to a second preset value and smaller than a third preset value, determining the vibration level corresponding to the audio frequency combination as a second level;

when the vibration sensation judgment data is larger than or equal to a third preset value and smaller than a fourth preset value, determining the vibration level corresponding to the audio combination as a third level;

when the vibration sensation judgment data is larger than or equal to a fourth preset value, determining the vibration level corresponding to the audio frequency combination as a fourth level;

the vibration intensity of the vibration waveform corresponding to the first level, the vibration intensity of the vibration waveform corresponding to the second level, the vibration intensity of the vibration waveform corresponding to the third level and the vibration intensity of the vibration waveform corresponding to the fourth level are sequentially increased.
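The level mapping above can be sketched as follows. The function name `vibration_level` and the preset values in the example are hypothetical; the text only requires four preset values, which this sketch assumes to be strictly increasing. A value that does not exceed the first preset value is given no level, since the text defines none for that case.

```python
from typing import Optional, Sequence


def vibration_level(value: float, presets: Sequence[float]) -> Optional[int]:
    """Map vibration sense judgment data to a vibration level 1..4, or None."""
    p1, p2, p3, p4 = presets
    if not (p1 < p2 < p3 < p4):
        raise ValueError("preset values must be strictly increasing")
    if value >= p4:
        return 4
    if value >= p3:
        return 3
    if value >= p2:
        return 2
    if value > p1:      # strictly greater, as stated for the first level
        return 1
    return None         # at or below the first preset value: no level defined


PRESETS = (0.1, 0.3, 0.6, 0.9)  # illustrative values only
print([vibration_level(v, PRESETS) for v in (0.05, 0.2, 0.3, 0.7, 1.2)])
```

Each returned level would then select a stored vibration waveform whose intensity increases from the first level to the fourth.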

A system for identifying gunshot in audio data, comprising:

the data acquisition module is used for acquiring audio data to be processed, wherein the audio data to be processed comprises a plurality of groups of audio combinations, each group of audio combinations comprises two adjacent frames of audio units in time sequence, each frame of audio unit comprises M audio sampling points, and M is a positive integer greater than 1;

the filtering processing module is used for filtering each frame of audio unit so as to filter the energy of the audio sampling points with the frequency higher than the preset frequency;

the energy calculation module is used for calculating the average energy value of each frame of audio unit after filtering processing;

and the gunshot judging module is used for performing gunshot judgment on the audio combinations according to the average energy value of each frame of audio unit in each group of audio combinations, and determining vibration sense judgment data corresponding to an audio combination when the audio combination is judged to contain a gunshot.

Optionally, the audio combination includes a first audio unit and a chronologically subsequent second audio unit.

Optionally, the gunshot determination module includes:

the data type determining unit is used for determining the data type of a first audio unit in the audio combination to be judged according to the average energy value of the first audio unit in the audio combination to be judged, the average energy value of a second audio unit in the audio combination to be judged and the average energy value of a second audio unit in a last audio combination adjacent to the audio combination to be judged, wherein the data type comprises peak data and valley data;

the gunshot judging unit is used for judging whether the data type of a first audio unit in the audio combination to be judged is peak data or not, and if so, judging that the first audio unit in the audio combination to be judged contains the gunshot;

if not, judging whether the average energy value of the second audio unit in the audio combination to be judged is larger than a preset threshold value, if so, judging that the second audio unit in the audio combination to be judged contains the gunshot, and if not, judging that the audio combination to be judged does not contain the gunshot.

Optionally, the process of determining, by the data type determining unit, the data type of the first audio unit in the audio combination to be determined according to the average energy value of the first audio unit in the audio combination to be determined, the average energy value of the second audio unit in the audio combination to be determined, and the average energy value of the second audio unit in the previous audio combination adjacent to the audio combination to be determined specifically includes:

when the average energy value of a first audio unit in the audio combination to be judged, the average energy value of a second audio unit in the audio combination to be judged and the average energy value of a second audio unit in a last audio combination adjacent to the audio combination to be judged meet a first inequality, judging the first audio unit in the audio combination to be judged as peak data;

the first inequality includes:

and/or

When the average energy value of a first audio unit in the audio combination to be determined, the average energy value of a second audio unit in the audio combination to be determined and the average energy value of a second audio unit in a last audio combination adjacent to the audio combination to be determined satisfy a second inequality, determining the first audio unit in the audio combination to be determined as valley data;

the second inequality includes:

wherein n is greater than or equal to 2, AVE (n-1) represents the average energy value of the first audio unit in the audio combination to be judged, AVE (n) represents the average energy value of the second audio unit in the audio combination to be judged, and AVE (n-2) represents the average energy value of the second audio unit in the previous audio combination adjacent to the audio combination to be judged.

An electronic device, comprising: a processor and a storage medium, wherein

the storage medium having program code stored thereon;

the processor is configured to call program code stored on the storage medium to execute the method for identifying a gunshot in audio data according to any one of the above.

Optionally, the electronic device further includes: a motor;

the processor is further configured to perform the steps of:

after determining that the audio data to be processed contains a gunshot, determining vibration sense judgment data corresponding to each group of audio combinations in the audio data to be processed, wherein the vibration sense judgment data represents the amplitude of the vibration waveform with which the motor is driven to vibrate;

determining the vibration level corresponding to each audio combination according to the value of the vibration sense judgment data corresponding to that audio combination;

while the audio combination is played, driving the motor to execute the vibration waveform corresponding to the vibration level of the audio combination.

It can be seen from the above technical solutions that the present application provides a method for identifying a gunshot in audio data, a method for driving a motor, and a related device. The identification method first acquires audio data to be processed, which includes a plurality of groups of audio combinations, each including two frames of audio units, so that the subsequent gunshot identification proceeds frame by frame and provides a basis for accurately identifying a single gunshot. The method then calculates the average energy value of each frame of audio unit and performs gunshot judgment according to the average energy values of the audio units in each audio combination, thereby accurately identifying the sound of a single shot and laying a foundation for precise motor vibration during multiple consecutive shots. Finally, when an audio combination contains a gunshot, the vibration sense judgment data corresponding to that audio combination is determined. When the vibration sense judgment data differ between audio combinations, the corresponding vibration intensity of the motor may also differ, which lays a foundation for the motor to vibrate with different intensities according to the loudness of the gunshot.

In addition, before the average energy value of the audio unit is calculated, the method for identifying the gunshot in the audio data also carries out filtering processing on each frame of audio unit, and filters out the energy of the audio sampling point with the frequency higher than the preset frequency, so that the interference of other signals with higher frequencies in the audio unit on the identification of the gunshot is filtered out, and the accuracy of the gunshot identification is improved.
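The filtering step can be illustrated with the sketch below. The text does not specify the type of filter, so a first-order IIR low-pass filter is used here purely as an assumption; the function name `low_pass`, the cutoff of 500 Hz and the 48 kHz sampling rate in the example are likewise hypothetical.

```python
import math
from typing import List, Sequence


def low_pass(unit: Sequence[float], cutoff_hz: float, sample_rate_hz: float) -> List[float]:
    """First-order IIR low-pass: y[i] = y[i-1] + alpha * (x[i] - y[i-1]).

    ASSUMPTION: filter type and zero initial state are illustrative choices.
    """
    if cutoff_hz <= 0 or sample_rate_hz <= 0:
        raise ValueError("cutoff and sample rate must be positive")
    dt = 1.0 / sample_rate_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)
    out = []
    prev = 0.0
    for x in unit:
        prev = prev + alpha * (x - prev)
        out.append(prev)
    return out


def tone(freq_hz: float, n: int, sample_rate_hz: float) -> List[float]:
    """Generate n samples of a unit-amplitude sine wave for the example."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate_hz) for i in range(n)]


RATE = 48000.0
low_out = low_pass(tone(100.0, 4800, RATE), 500.0, RATE)    # mostly preserved
high_out = low_pass(tone(8000.0, 4800, RATE), 500.0, RATE)  # strongly attenuated
print(max(low_out), max(high_out))
```

A sharper filter could be substituted where stronger rejection above the preset frequency is needed; the rest of the flow only requires that the filtered unit has the same length as the input unit.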

Drawings

To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed for describing them are briefly introduced below. The drawings described below are merely embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic flowchart illustrating a method for identifying a gunshot in audio data according to an embodiment of the present application;

Fig. 2 is a schematic flowchart of a method for identifying a gunshot in audio data according to another embodiment of the present application;

Fig. 3 is a schematic flowchart of a driving method of a motor according to an embodiment of the present application;

Fig. 4 is a schematic structural diagram of a system for recognizing a gunshot in audio data according to an embodiment of the present application;

Fig. 5 is a schematic structural diagram of a system for recognizing a gunshot in audio data according to another embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The embodiment of the application provides a method for identifying a gunshot in audio data, as shown in fig. 1, the method includes:

S101: acquiring audio data to be processed, wherein the audio data to be processed comprises a plurality of audio combinations, each audio combination comprises two chronologically adjacent frames of audio units, each frame of audio unit comprises M audio sampling points, and M is a positive integer greater than 1;

Optionally, M may be an integer power of 2, such as 1024 or 512; for example, each frame of audio unit may include 1024 audio sampling points. The energy of the audio sampling points represents the intensity of the sound in the audio.

S102: filtering each frame of audio unit to filter the energy of the audio sampling points with the frequency higher than the preset frequency;

The purpose of filtering each frame of audio unit is to remove the high-frequency components that may interfere with gunshot recognition. In general, the frequency content of a gunshot in an audio unit is mostly concentrated in the range of 60 Hz to 200 Hz, so the preset frequency may be 225 Hz; that is, the energy of components with frequencies higher than 225 Hz is filtered out, which removes most of the voices and background sounds in the audio unit. Of course, in other embodiments of the present application, the preset frequency may also be 220 Hz, 210 Hz, 230 Hz, or the like, which is not limited in the present application.

In addition, in step S102, the energy above the preset frequency does not need to be removed completely when each frame of audio unit is filtered; it is sufficient to attenuate it, that is, to limit the energy of components with frequencies higher than the preset frequency to below a preset value.
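The filtering of step S102 can be illustrated with a minimal sketch. The application does not name a filter type, so the one-pole low-pass filter below is purely an assumption chosen for brevity; the function names `low_pass`, `sine`, and `mean_abs`, and the default sampling rate of 48000 Hz, are likewise hypothetical. Consistent with the paragraph above, this filter attenuates energy above the cutoff rather than removing it completely.

```python
import math


def low_pass(frame, sample_rate=48000, cutoff_hz=225.0):
    """Attenuate components above cutoff_hz with a one-pole IIR low-pass filter.

    Assumption: the filter type is not specified in the text; this is only
    one simple way to limit high-frequency energy.
    """
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)  # time constant for the cutoff
    dt = 1.0 / sample_rate                  # sampling interval
    alpha = dt / (rc + dt)                  # smoothing factor in (0, 1)
    out = []
    state = 0.0                             # filter assumed to start at rest
    for x in frame:
        state = state + alpha * (x - state)
        out.append(state)
    return out


def sine(freq_hz, n=1024, sample_rate=48000):
    """Generate n samples of a unit-amplitude sine wave (test signal)."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def mean_abs(values):
    """Mean of the absolute sample values."""
    return sum(abs(v) for v in values) / len(values)


if __name__ == "__main__":
    # A 100 Hz tone (inside the gunshot band) keeps most of its energy,
    # while a 5 kHz tone is strongly attenuated.
    print(mean_abs(low_pass(sine(100.0))))
    print(mean_abs(low_pass(sine(5000.0))))
```

A steeper filter (for example a higher-order design) would separate the 60 Hz to 200 Hz band from speech more sharply; the one-pole form is used here only to keep the sketch self-contained.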

S103: calculating the average energy value of each frame of audio unit after filtering;

Specifically, step S103 may include:

taking the absolute value of each audio sampling point in each frame of audio unit after filtering;

summing the absolute values of the audio sampling points in each frame of audio unit, and dividing the sum by the number M of audio sampling points in that audio unit to obtain the average energy value of the audio unit.
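The two sub-steps of S103 amount to a mean of absolute sample values. A minimal sketch follows; the function name `average_energy` is hypothetical, and a frame is assumed to be a plain list of numeric samples.

```python
def average_energy(frame):
    """Average energy value of one audio unit: sum of |sample| divided by M."""
    m = len(frame)
    if m == 0:
        raise ValueError("an audio unit must contain at least one sample")
    return sum(abs(sample) for sample in frame) / m


if __name__ == "__main__":
    print(average_energy([1, -1, 2, -2]))  # (1 + 1 + 2 + 2) / 4
```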

S104: performing gunshot judgment on the audio combination according to the average energy value of each frame of audio unit in the audio combination, and determining the vibration sense judgment data corresponding to the audio combination when the audio combination is judged to contain a gunshot.

The method for identifying a gunshot in audio data first obtains audio data to be processed, which comprises a plurality of audio combinations, each consisting of two frames of audio units. Subsequent gunshot identification therefore proceeds frame by frame, which provides a basis for accurately identifying a single gunshot. The method then calculates the average energy value of each frame of audio unit and performs gunshot judgment according to the average energy values of the audio units in each audio combination. This allows the sound of a single shot to be identified accurately and lays a foundation for the motor to vibrate accurately during multiple consecutive shots. Finally, when an audio combination contains a gunshot, the method determines the vibration sense judgment data corresponding to that audio combination. When the vibration sense judgment data of different audio combinations differ, the corresponding vibration intensity of the motor can differ as well, which lays a foundation for the motor to vibrate at different intensities according to the loudness of the gunshot.

The specific logic of the gunshot judgment in the embodiments of the present application is described below. Optionally, in an embodiment of the present application, the audio combination includes a first audio unit and a chronologically subsequent second audio unit;

for example, assuming that the audio combination includes the K-th frame of audio unit and the (K+1)-th frame of audio unit, the chronologically preceding K-th frame is the first audio unit, and the chronologically succeeding (K+1)-th frame is the second audio unit.

As shown in fig. 2, step S104 specifically includes:

S1041: determining the data type of the first audio unit in the audio combination to be judged according to the average energy value of the first audio unit in the audio combination to be judged, the average energy value of the second audio unit in the audio combination to be judged, and the average energy value of the second audio unit in the previous audio combination adjacent to the audio combination to be judged, wherein the data type includes peak data and valley data;

S1042: judging whether the data type of the first audio unit in the audio combination to be judged is peak data, and if so, judging that the first audio unit in the audio combination to be judged contains a gunshot;

if not, judging whether the average energy value of the second audio unit in the audio combination to be judged is greater than a preset threshold; if so, judging that the second audio unit in the audio combination to be judged contains a gunshot, and if not, judging that the audio combination to be judged does not contain a gunshot.

S1043: when the audio combination is judged to contain a gunshot, determining the vibration sense judgment data corresponding to the audio combination.

In this embodiment, when the audio units of the audio combinations in the audio data to be processed are plotted as a curve with time as the abscissa and average energy value as the ordinate, a first audio unit located at a local maximum of the curve is a peak of the curve and is referred to as peak data, and a first audio unit located at a local minimum of the curve is a valley of the curve and is referred to as valley data. The average energy value of a first audio unit corresponding to peak data is large, so the unit may contain a gunshot, whereas the average energy value of a first audio unit corresponding to valley data is small, so the unit generally does not contain a gunshot.

When the average energy value of the second audio unit in the audio combination to be judged is greater than the preset threshold, the second audio unit still carries a large amount of energy even though the high-frequency components of each frame of audio unit have been filtered out, so the second audio unit can be judged to contain a gunshot.
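The branch structure of step S1042 can be sketched as follows. The function name `judge_combination` and the string return values are hypothetical; whether the first audio unit is peak data is taken as an input here, since it is the result of step S1041, and the threshold value is left to the caller because the text does not give one.

```python
def judge_combination(first_is_peak, ave_second, threshold):
    """Decide which audio unit of a combination, if any, contains a gunshot.

    Returns "first", "second", or None (no gunshot), following S1042:
    peak data in the first unit takes priority, otherwise the second
    unit is compared against the preset threshold.
    """
    if first_is_peak:
        return "first"
    if ave_second > threshold:
        return "second"
    return None


if __name__ == "__main__":
    print(judge_combination(True, 10.0, 500.0))    # first unit is peak data
    print(judge_combination(False, 900.0, 500.0))  # second unit exceeds threshold
    print(judge_combination(False, 10.0, 500.0))   # no gunshot
```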

A feasible process for determining the vibration sense judgment data corresponding to the audio combination is described below. Optionally, step S1043 specifically includes:

S10431: when the second audio unit in the audio combination to be judged contains the gunshot, determining the value of the vibration sense judgment data as the average energy value of the second audio unit in the audio combination to be judged;

and/or

S10432: when the first audio unit in the audio combination to be judged comprises the gunshot, determining the value of the vibration sensation judgment data as the average energy value of the first audio unit in the audio combination to be judged.

In this embodiment, the average energy value of the audio unit that contains the gunshot in the audio combination to be judged is used as the value of the vibration sense judgment data. This keeps the value of the vibration sense judgment data close to the loudness of the gunshot and provides a basis for the motor to subsequently vibrate at a corresponding intensity according to the vibration sense judgment data.
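Steps S10431 and S10432 simply select the average energy value of whichever unit was judged to contain the gunshot. A minimal sketch, assuming the judgment result is encoded as the hypothetical strings "first" and "second" (or None when no gunshot was found):

```python
def vibration_sense_value(gunshot_unit, ave_first, ave_second):
    """Return the vibration sense judgment data for one audio combination.

    gunshot_unit is "first", "second", or None; this encoding is an
    assumption made for illustration only.
    """
    if gunshot_unit == "first":
        return ave_first   # S10432: gunshot in the first audio unit
    if gunshot_unit == "second":
        return ave_second  # S10431: gunshot in the second audio unit
    return None            # no gunshot, so no vibration sense data


if __name__ == "__main__":
    print(vibration_sense_value("second", 120.0, 640.0))
```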

A feasible implementation of step S1041 is described below. Step S1041 specifically includes:

when the average energy value of the first audio unit in the audio combination to be judged, the average energy value of the second audio unit in the audio combination to be judged, and the average energy value of the second audio unit in the previous audio combination adjacent to the audio combination to be judged satisfy a first inequality, judging the first audio unit in the audio combination to be judged as peak data;

the first inequality includes:

[first inequality: formula image BDA0002501973840000101, not reproduced in this text]

and/or

when the average energy value of the first audio unit in the audio combination to be judged, the average energy value of the second audio unit in the audio combination to be judged, and the average energy value of the second audio unit in the previous audio combination adjacent to the audio combination to be judged satisfy a second inequality, judging the first audio unit in the audio combination to be judged as valley data;

the second inequality includes:

[second inequality: formula image not reproduced in this text]

where n is greater than or equal to 2, AVE(n-1) represents the average energy value of the first audio unit in the audio combination to be judged, AVE(n) represents the average energy value of the second audio unit in the audio combination to be judged, and AVE(n-2) represents the average energy value of the second audio unit in the previous audio combination adjacent to the audio combination to be judged.

As can be seen from the first inequality, when the first audio unit in the audio combination to be judged is a local maximum in the audio data to be processed, it can be determined as a peak point, and the peak point may be a gunshot (whether the first audio unit or the second audio unit contains the gunshot is finally determined in combination with the comparison between the average energy value of the second audio unit and the preset threshold). This is consistent with the waveform before and after a gunshot occurs during shooting.

As can be seen from the second inequality, when the first audio unit in the audio combination to be judged is a local minimum in the audio data to be processed, it can be determined as a valley point. The valley point corresponds to a position in the audio data to be processed where the sound is quiet, and a gunshot is usually not present there.
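The two inequalities themselves are not reproduced in this text, so their exact form is unknown. The sketch below assumes the simplest reading consistent with the description: peak data when AVE(n-1) is strictly greater than both AVE(n-2) and AVE(n), and valley data when it is strictly less than both. The function name `classify_first_unit` and the handling of ties (returning None) are hypothetical.

```python
def classify_first_unit(ave_prev_second, ave_first, ave_second):
    """Classify the first audio unit as "peak", "valley", or None.

    ave_prev_second ~ AVE(n-2), ave_first ~ AVE(n-1), ave_second ~ AVE(n).
    Assumption: strict local maximum / minimum; the original inequalities
    may differ (for example, they may include scaling factors).
    """
    if ave_first > ave_prev_second and ave_first > ave_second:
        return "peak"
    if ave_first < ave_prev_second and ave_first < ave_second:
        return "valley"
    return None


if __name__ == "__main__":
    print(classify_first_unit(100.0, 900.0, 300.0))  # local maximum
    print(classify_first_unit(400.0, 50.0, 300.0))   # local minimum
    print(classify_first_unit(100.0, 200.0, 300.0))  # rising, neither
```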

On the basis of the foregoing embodiment, an optional embodiment of the present application provides a specific method for obtaining the audio data to be processed. Optionally, step S101 specifically includes:

S1011: acquiring an original audio signal;

S1012: sampling the original audio signal to obtain an audio data stream;

S1013: framing the audio data stream with a step length of M audio sampling points to obtain a plurality of audio units, and zero-padding any audio unit that has fewer than M audio sampling points;

S1014: dividing every two chronologically adjacent frames of audio units into one audio combination, wherein the audio combinations form the audio data to be processed.

In this embodiment, the original audio signal may be a game audio signal obtained from a mobile phone or similar mobile smart device.

In step S1012, the sampling rate may be 48 kHz and the sampling depth may be 16 bits.

In step S1013, the audio unit that needs zero-padding is usually the last one; that is, when the number of audio sampling points in the audio data stream is not evenly divisible by M, the last audio unit has fewer than M audio sampling points and is padded with zeros.
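Steps S1013 and S1014 can be sketched as follows. The function names `split_into_frames` and `group_into_combinations` are hypothetical. The pairing is non-overlapping, which matches the indexing AVE(n-2), AVE(n-1), AVE(n) used for adjacent combinations. The text does not say what happens when the number of frames is odd; pairing the last frame with an all-zero frame is an assumption made here.

```python
def split_into_frames(samples, m=1024):
    """Split a sample stream into frames of m samples, zero-padding the last one."""
    frames = []
    for start in range(0, len(samples), m):
        frame = list(samples[start:start + m])
        frame.extend([0] * (m - len(frame)))  # pads only a short final frame
        frames.append(frame)
    return frames


def group_into_combinations(frames):
    """Pair chronologically adjacent frames: (0, 1), (2, 3), ...

    Assumption: an unpaired trailing frame is paired with a silent frame.
    """
    frames = list(frames)
    if len(frames) % 2 == 1:
        frames.append([0] * len(frames[0]))
    return [(frames[i], frames[i + 1]) for i in range(0, len(frames), 2)]


if __name__ == "__main__":
    stream = list(range(1, 2501))            # 2500 samples, not a multiple of 1024
    units = split_into_frames(stream)
    combos = group_into_combinations(units)
    print(len(units), len(combos))
```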

Correspondingly, an embodiment of the present application further provides a driving method of a motor, as shown in fig. 3, including:

S201: after it is determined that the audio data to be processed contains a gunshot, determining the vibration sense judgment data corresponding to each audio combination in the audio data to be processed, wherein the vibration sense judgment data represents the amplitude of the vibration waveform with which the motor is driven to vibrate;

S202: determining the vibration level corresponding to the audio combination according to the value of the vibration sense judgment data corresponding to each audio combination;

S203: while the audio combination is played, driving the motor to execute the vibration waveform corresponding to the vibration level of the audio combination.

In this embodiment, while the motor is driven, the vibration level corresponding to each audio combination is determined according to the value of its vibration sense judgment data, so that the motor vibrates at different vibration levels according to the loudness of the gunshot, which enriches the haptic feedback for the user.

Optionally, step S202 specifically includes:

S2021: when the vibration sense judgment data is greater than a first preset value and less than a second preset value, determining the vibration level corresponding to the audio combination as a first level;

S2022: when the vibration sense judgment data is greater than or equal to the second preset value and less than a third preset value, determining the vibration level corresponding to the audio combination as a second level;

S2023: when the vibration sense judgment data is greater than or equal to the third preset value and less than a fourth preset value, determining the vibration level corresponding to the audio combination as a third level;

S2024: when the vibration sense judgment data is greater than or equal to the fourth preset value, determining the vibration level corresponding to the audio combination as a fourth level;

wherein the vibration intensities of the vibration waveforms corresponding to the first level, the second level, the third level, and the fourth level increase in that order.

In this embodiment, four preset values are set to distinguish four vibration levels. Of course, in some optional embodiments of the present application, more vibration levels may be set according to actual requirements, which is not limited in the present application.
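The level mapping of steps S2021 to S2024 can be sketched as a chain of threshold comparisons. The function name `vibration_level` and the four default preset values are hypothetical placeholders, since the text gives no concrete numbers. The text also does not say what happens when the value does not exceed the first preset value; returning level 0 (no vibration) is an assumption.

```python
def vibration_level(value, presets=(100.0, 200.0, 400.0, 800.0)):
    """Map vibration sense judgment data to a vibration level from 1 to 4.

    presets holds the first to fourth preset values in increasing order;
    the defaults are illustrative only.
    """
    first, second, third, fourth = presets
    if value >= fourth:
        return 4
    if value >= third:
        return 3
    if value >= second:
        return 2
    if value > first:
        return 1
    return 0  # assumption: at or below the first preset value, no vibration


if __name__ == "__main__":
    for v in (50.0, 150.0, 200.0, 450.0, 5000.0):
        print(v, vibration_level(v))
```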

The system for recognizing the gunshot in the audio data provided in the embodiment of the present application is described below, and the system for recognizing the gunshot in the audio data described below and the method for recognizing the gunshot in the audio data described above may be referred to in correspondence with each other.

Accordingly, an embodiment of the present application provides a system for identifying a gunshot in audio data, as shown in fig. 4, including:

the data acquisition module 100 is configured to acquire audio data to be processed, where the audio data to be processed includes multiple groups of audio combinations, each group of audio combination includes two frames of audio units adjacent in time sequence, each frame of audio unit includes M audio sampling points, and M is a positive integer greater than 1;

the filtering processing module 200 is configured to perform filtering processing on each frame of the audio unit to filter out energy of audio sampling points with frequencies higher than a preset frequency;

an energy calculating module 300, configured to calculate an average energy value of each frame of audio unit after filtering processing;

the gunshot determination module 400 is configured to perform gunshot determination on the audio combination according to the average energy value of each frame of the audio unit in the audio combination, and determine vibration sense determination data corresponding to the audio combination when it is determined that the audio combination includes a gunshot.

Optionally, the audio combination includes a first audio unit and a chronologically subsequent second audio unit.

Optionally, as shown in fig. 5, the gunshot determination module 400 includes:

a data type determining unit 410, configured to determine a data type of a first audio unit in an audio combination to be determined according to an average energy value of the first audio unit in the audio combination to be determined, an average energy value of a second audio unit in the audio combination to be determined, and an average energy value of a second audio unit in a previous audio combination adjacent to the audio combination to be determined, where the data type includes peak data and valley data;

a gunshot judging unit 420, configured to judge whether a data type of a first audio unit in the audio combination to be judged is peak data, and if so, judge that the first audio unit in the audio combination to be judged contains a gunshot;

if not, judging whether the average energy value of the second audio unit in the audio combination to be judged is larger than a preset threshold value, if so, judging that the second audio unit in the audio combination to be judged contains the gunshot, and if not, judging that the audio combination to be judged does not contain the gunshot.

Optionally, the process of determining the data type of the first audio unit in the audio combination to be determined by the data type determining unit 410 according to the average energy value of the first audio unit in the audio combination to be determined, the average energy value of the second audio unit in the audio combination to be determined, and the average energy value of the second audio unit in the previous audio combination adjacent to the audio combination to be determined specifically includes:

when the average energy value of the first audio unit in the audio combination to be judged, the average energy value of the second audio unit in the audio combination to be judged, and the average energy value of the second audio unit in the previous audio combination adjacent to the audio combination to be judged satisfy a first inequality, judging the first audio unit in the audio combination to be judged as peak data;

the first inequality includes:

[first inequality: formula image BDA0002501973840000141, not reproduced in this text]

and/or

when the average energy value of the first audio unit in the audio combination to be judged, the average energy value of the second audio unit in the audio combination to be judged, and the average energy value of the second audio unit in the previous audio combination adjacent to the audio combination to be judged satisfy a second inequality, judging the first audio unit in the audio combination to be judged as valley data;

the second inequality includes:

[second inequality: formula image not reproduced in this text]

where n is greater than or equal to 2, AVE(n-1) represents the average energy value of the first audio unit in the audio combination to be judged, AVE(n) represents the average energy value of the second audio unit in the audio combination to be judged, and AVE(n-2) represents the average energy value of the second audio unit in the previous audio combination adjacent to the audio combination to be judged.

Correspondingly, an embodiment of the present application further provides an electronic device, including a processor and a storage medium, wherein

the storage medium having program code stored thereon;

the processor is configured to call the program code stored on the storage medium to execute the method for identifying a gunshot in audio data according to any of the embodiments.

Optionally, the electronic device further includes: a motor;

the processor is further configured to perform the steps of:

after it is determined that the audio data to be processed contains a gunshot, determining the vibration sense judgment data corresponding to each audio combination in the audio data to be processed, wherein the vibration sense judgment data represents the amplitude of the vibration waveform with which the motor is driven to vibrate;

determining the vibration level corresponding to the audio combination according to the value of the vibration sense judgment data corresponding to each audio combination;

while the audio combination is played, driving the motor to execute the vibration waveform corresponding to the vibration level of the audio combination.

In summary, the embodiments of the present application provide a method for identifying a gunshot in audio data, a method for driving a motor, and a related device. The identification method first obtains audio data to be processed, which comprises a plurality of audio combinations, each consisting of two frames of audio units. Subsequent gunshot identification therefore proceeds frame by frame, which provides a basis for accurately identifying a single gunshot. The method then calculates the average energy value of each frame of audio unit and performs gunshot judgment according to the average energy values of the audio units in each audio combination. This allows the sound of a single shot to be identified accurately and lays a foundation for the motor to vibrate accurately during multiple consecutive shots. Finally, when an audio combination contains a gunshot, the method determines the vibration sense judgment data corresponding to that audio combination. When the vibration sense judgment data of different audio combinations differ, the corresponding vibration intensity of the motor can differ as well, which lays a foundation for the motor to vibrate at different intensities according to the loudness of the gunshot.

In addition, before the average energy value of the audio unit is calculated, the method for identifying the gunshot in the audio data also carries out filtering processing on each frame of audio unit, and filters out the energy of the audio sampling point with the frequency higher than the preset frequency, so that the interference of other signals with higher frequencies in the audio unit on the identification of the gunshot is filtered out, and the accuracy of the gunshot identification is improved.

In the present specification, the embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
