Outdoor sound acquisition and storage equipment and sound processing method

Document No.: 70702 | Published: 2021-10-01

Abstract: This technology, Outdoor sound acquisition and storage equipment and sound processing method (一种野外声音采集与存储设备及声音处理方法), was designed by 李松斌, 刘鹏, 林道友 and 张遥 on 2021-06-24. The invention belongs to the technical field of sound acquisition equipment and sound signal processing, and in particular relates to a field sound acquisition and storage device comprising: a square housing, a microphone, a field sound acquisition and storage circuit board, and a rechargeable battery pack (40). The circuit board and the rechargeable battery pack (40) are arranged inside the square housing, and the microphone is arranged on an outer wall of the square housing. Fixing lugs (34) extend outward from two side walls of the square housing in a direction perpendicular to the side walls, and the fixing lugs (34) have a multi-hole structure. A fixing device engages the fixing lugs (34) to fix a single device at a data acquisition point or in a field environment; alternatively, several devices are connected to one another through their fixing lugs (34) and then fixed at the data acquisition point or in the field environment by a fixing device passing through the fixing lugs (34), so that field sound is acquired in real time.

1. A field sound collection and storage device, comprising: a square housing, a microphone, a field sound collection and storage circuit board, and a rechargeable battery pack (40);

the field sound collection and storage circuit board and the rechargeable battery pack (40) are arranged inside the square housing, and the microphone is arranged on an outer wall of the square housing;

fixing lugs (34) extend outward from two side walls of the square housing in a direction perpendicular to the side walls; the fixing lugs (34) have a multi-hole structure; a fixing device engages the fixing lugs (34) to fix a single device at a data acquisition point or in a field environment, or a plurality of devices are connected to one another through their fixing lugs (34) and then fixed at the data acquisition point or in the field environment by a fixing device passing through the fixing lugs (34), so that field sound is collected in real time.

2. The field sound collection and storage device of claim 1, wherein the square housing is a three-layer structure comprising: an upper cover (10), a middle layer cover (20) and a bottom shell (30); the middle layer cover (20) is positioned between the upper cover (10) and the bottom shell (30), the upper cover (10) is positioned above the middle layer cover (20), and the bottom shell (30) is positioned below the middle layer cover (20);

the upper cover (10), the middle layer cover (20) and the bottom shell (30) are all cuboid waterproof structures with openings, and the interiors of the upper cover, the middle layer cover and the bottom shell are all hollow structures; the middle layer cover (20) is nested in the upper cover (10), the opening directions of the upper cover (10) and the middle layer cover (20) are the same, the opening direction of the bottom shell (30) is opposite to the opening direction of the upper cover (10), and the opening direction of the bottom shell (30) is opposite to the opening direction of the middle layer cover (20);

a rotating shaft fixing seat is provided on the same side of each of the three parts; a rotating shaft (50) passes through and is fixed to the three rotating shaft fixing seats, connecting the three parts so that each can rotate about the rotating shaft (50) to open or close.

3. The field sound collection and storage device according to claim 2, wherein a first locking part (11) is provided in the middle of the outer wall of the upper cover (10), a second locking part (25) is provided in the middle of the outer wall of the middle layer cover (20), and a third locking part (32) is provided in the middle of the outer wall of the bottom shell (30); the first locking part (11) passes through the second locking part (25) and then locks with the third locking part (32);

a first extended rectangular section extends outward around the opening of the upper cover (10), a second extended rectangular section extends outward around the opening of the middle layer cover (20), and a third extended rectangular section extends outward around the opening of the bottom shell (30); the first extended rectangular section covers the second extended rectangular section, the second extended rectangular section covers the third extended rectangular section, and the three extended rectangular sections are stacked.

4. The field sound collection and storage device of claim 2, wherein two side walls of said bottom housing (30) have fixing lugs (34) extending outwardly in a direction perpendicular to said side walls;

the other two side walls of the bottom shell (30) are respectively and symmetrically provided with detachable microphone fixing components (31), and each microphone is arranged in the detachable microphone fixing components (31);

the detachable microphone fixing component (31) comprises: a microphone fixing tube (311) and a microphone cover (312);

a first end (3111) of the microphone fixing tube (311) extends radially outward to form a step-shaped structure, from which a cylindrical section extends axially outward; an external thread on the outer circumference of the cylindrical section forms a bolt that is threaded into an internal thread provided in the side wall of the bottom shell (30);

the microphone is arranged in the microphone fixing tube (311); a boss extends axially outward from the second end (3112) of the microphone fixing tube (311); the microphone cover (312) fits over the boss and is press-fitted to the second end (3112) of the microphone fixing tube (311); and the inner wall of the microphone cover (312) is provided with a sound-transmitting waterproof membrane.

5. The field sound collection and storage device of claim 1, wherein the sound collection and storage circuit board comprises: a human-computer interaction module, an adjustable gain module, a voice coding and decoding module, an MCU (microcontroller unit) central control module, a mobile memory and a power management module;

the human-computer interaction module is used for displaying the equipment state in real time and changing the setting according to the displayed equipment state;

the adjustable gain module is used for selecting a corresponding mode according to the on-off states of the two groups of dial control switches, amplifying the voice analog signal output by the microphone and then transmitting the amplified voice analog signal to the voice coding and decoding module, or directly transmitting the voice analog signal output by the microphone to the voice coding and decoding module;

the voice coding and decoding module is used for coding and decoding the received amplified voice analog signal, converting the received amplified voice analog signal into a voice digital signal and sending the voice digital signal to the MCU central control module, or converting the received voice analog signal into a sound digital signal and sending the sound digital signal to the MCU central control module;

the MCU central control module is used for processing the voice digital signal or the sound digital signal to obtain a corresponding processed signal;

the mobile memory is used for storing the processed signals, which are packaged into lossless WAV format data and written to the mobile memory;

and the power supply management module is used for supplying power to each module in the sound acquisition and storage circuit board.

6. The field sound collection and storage device of claim 5, wherein the adjustable gain module comprises: a first group of dial control switches, a high-gain circuit and a second group of dial control switches; the first group of dial control switches and the second group of dial control switches are arranged at the input end and the output end of the high-gain circuit, respectively;

when the first group of dial control switches and the second group of dial control switches are both closed, namely mode one, the high-gain circuit is switched on; the microphone converts the collected sound signal into a voice analog signal, which is an electrical signal; the voice analog signal is input to the high-gain circuit and amplified, and the amplified signal is sent to the voice coding and decoding module;

when the first group of dial control switches and the second group of dial control switches are both open, namely mode two, the high-gain circuit is not switched on, and the microphone inputs the converted voice analog signal directly to the voice coding and decoding module.

7. The field sound collection and storage device of claim 5, wherein the sound collection and storage circuit board further comprises a sound processing module, which is used for: denoising, based on the auditory masking effect, the signal processed by the MCU central control module and stored in the mobile memory to obtain a denoised signal; performing active frame detection on the denoised signal based on a three-parameter endpoint detection algorithm; extracting the MFCC and LogFBank features of the voice segments; and merging adjacent segments based on cosine similarity to generate audio summaries.

8. The field sound collection and storage device of claim 7, wherein the sound processing module performs the following specific process:

collecting sound signals with the field sound collection and storage device, performing analog-to-digital conversion to obtain voice digital signals, and storing the voice digital signals in a current recording file created with a timestamp as its file name;

reading the sound digital signals of each frame in the current sound recording file, and calculating the critical band power and the noise coefficient of the sound digital signals frame by frame;

calculating the spread spectrum power of each frame of sound digital signal according to the critical band power of each frame of sound digital signal;

calculating a masking coefficient of each frame of sound digital signal according to the noise coefficient and the spread spectrum power of each frame of sound digital signal;

calculating a masking threshold according to the masking coefficient and the absolute hearing threshold of the human ear;

denoising the sound digital signals of each frame by using a masking threshold value to obtain enhanced voice signals of each frame;

calculating the short-time energy, the short-time average zero crossing rate and the noise coefficient of each frame of the enhanced voice signal frame by frame;

jointly judging whether the short-time energy, the short-time average zero-crossing rate and the noise coefficient of each frame of the enhanced voice signal meet preset double-threshold conditions;

determining the starting point and the end point of the audio in the enhanced voice signal according to the judgment result to obtain a plurality of independent voice segments;

extracting the average Mel-frequency cepstral coefficient (MFCC) features and LogFBank features of all voice frames in each voice segment to form a voice segment feature vector;

and calculating the cosine distance between the feature vectors of every two adjacent voice segments, and merging the two voice segments when the cosine distance reaches a preset threshold value, to finally obtain a plurality of audio summaries.

9. A sound processing method implemented by the field sound collection and storage device of any one of claims 1 to 8, the method comprising:

step 1) upon detecting that a recording key of the field sound collection and storage device is pressed, initializing, and then detecting whether the mobile memory on the sound collection and storage circuit board is present;

if the mobile memory is not present, prompting the user to insert the mobile memory;

if the mobile memory is present, allocating a buffer space and entering a recording mode;

step 2) further judging whether the remaining storage space of the mobile memory is sufficient to store one audio file;

if the remaining storage space is not sufficient to store one audio file, turning off the recording key and prompting the user to replace the mobile memory;

if the remaining storage space is sufficient to store one audio file, using the timestamp as the file name and entering a cyclic recording state; after 30 minutes of audio have been recorded, ending the current file writing operation to generate a recording file, and processing the recording file and extracting audio to obtain a plurality of audio summaries;

then creating a new recording file and continuing the recording and audio extraction operations to obtain a plurality of audio summaries, wherein a new recording file is created every 30 minutes;

step 3) if the stop key is detected to be pressed, returning the file pointer to the file header, updating the file header information, releasing the memory space, and ending the recording.

10. The sound processing method of claim 9, wherein processing the recording file and extracting audio to obtain a plurality of audio summaries specifically comprises:

collecting sound signals with the field sound collection and storage device, performing analog-to-digital conversion to obtain voice digital signals, and storing the voice digital signals in a current recording file created with a timestamp as its file name;

reading the sound digital signals of each frame in the current sound recording file, and calculating the critical band power and the noise coefficient of the sound digital signals frame by frame;

calculating the spread spectrum power of each frame of sound digital signal according to the critical band power of each frame of sound digital signal;

calculating a masking coefficient of each frame of sound digital signal according to the noise coefficient and the spread spectrum power of each frame of sound digital signal;

calculating a masking threshold according to the masking coefficient and the absolute hearing threshold of the human ear;

denoising the sound digital signals of each frame by using a masking threshold value to obtain enhanced voice signals of each frame;

calculating the short-time energy, the short-time average zero crossing rate and the noise coefficient of each frame of the enhanced voice signal frame by frame;

jointly judging whether the short-time energy, the short-time average zero-crossing rate and the noise coefficient of each frame of the enhanced voice signal meet preset double-threshold conditions;

determining the starting point and the end point of the audio in the enhanced voice signal according to the judgment result to obtain a plurality of independent voice segments;

extracting the average Mel-frequency cepstral coefficient (MFCC) features and LogFBank features of all voice frames in each voice segment to form a voice segment feature vector;

and calculating the cosine distance between the feature vectors of every two adjacent voice segments, and merging the two voice segments when the cosine distance reaches a preset threshold value, to finally obtain a plurality of audio summaries.

Technical Field

The invention belongs to the technical field of sound acquisition equipment and sound signal processing, and particularly relates to field sound acquisition and storage equipment and a sound processing method.

Background

Sound is the primary medium of communication not only for humans but also for other animals on earth. Because different species vocalize and hear in different frequency ranges, the acoustic environment formed by biological diversity is a subject of intense current research. By recording sounds in different frequency bands, the types and numbers of organisms present can be inferred, which provides an important reference for protecting the natural environment. In the past, collecting audio data of wild species often required researchers to venture deep into mountains and dense forests, and to camouflage themselves or even live in the field so as not to disturb the animals' normal life, which brought great inconvenience and danger to the researchers. Human presence may also indirectly affect the habitat of the species. It is therefore important to develop a miniaturized recording device that can operate in the field for long periods and record animal sounds in real time.

At present, a sound recorder used in the field is generally required to record signals with a bandwidth of 100 Hz to 45 kHz, because this range covers the vocalization frequencies of most animals. However, most research and development on recording products, in China and abroad, focuses on consumer electronic recorders for the human voice, and little work addresses sound recording instruments for field environments. The intended applications of products for recording the human voice limit their recordable bandwidth, and the pickup components of consumer electronic products are mostly used for calls, so their sound quality requirements are modest and their sensitivity is correspondingly low, which makes it difficult for them to meet the requirements of sound pickup in complex field environments.

In addition, invalid data account for a very high proportion of the total duration of sound recorded in the field. Locating valid animal sound segments in massive, long-duration recordings by manual annotation takes a great deal of time and effort, and the annotation results are easily influenced by the individual annotators, so the results vary considerably.

Disclosure of Invention

To overcome the defects of the prior art, the invention provides a field sound collection and storage device that addresses the short operating time and poor environmental adaptability of existing field sound collection devices. The device can conveniently and continuously collect and store field sound information without affecting the habitat of wild animals, and can quickly and accurately identify and locate valid sound segments of wild animal species in massive, long-duration recordings and generate the corresponding audio summaries.

The invention provides a field sound collection and storage device, which comprises: a square housing, a microphone, a field sound collection and storage circuit board, and a rechargeable battery pack;

the field sound collection and storage circuit board and the rechargeable battery pack are arranged inside the square housing, and the microphone is arranged on an outer wall of the square housing;

fixing lugs extend outward from two side walls of the square housing in a direction perpendicular to the side walls; the fixing lugs have a multi-hole structure; a fixing device engages the fixing lugs to fix a single device at a data acquisition point or in a field environment, or a plurality of devices are connected to one another through their fixing lugs and then fixed at the data acquisition point or in the field environment by a fixing device passing through the fixing lugs, so that field sound is collected in real time.

As an improvement of the above technical solution, the square housing has a three-layer structure, and includes: an upper cover, a middle layer cover and a bottom shell; the middle layer cover is positioned between the upper cover and the bottom shell, the upper cover is positioned above the middle layer cover, and the bottom shell is positioned below the middle layer cover;

the upper cover, the middle layer cover and the bottom shell are all cuboid waterproof structures with openings, and the interiors of the upper cover, the middle layer cover and the bottom shell are all hollow structures; the middle layer cover is nested in the upper cover, the opening directions of the upper cover and the middle layer cover are the same, the opening direction of the bottom shell is opposite to the opening direction of the upper cover, and the opening direction of the bottom shell is opposite to the opening direction of the middle layer cover;

a rotating shaft fixing seat is provided on the same side of each of the three parts; a rotating shaft passes through and is fixed to the three rotating shaft fixing seats, connecting the three parts so that each can rotate about the rotating shaft to open or close.

As one improvement of the above technical solution, a first locking part is provided in the middle of the outer wall of the upper cover, a second locking part is provided in the middle of the outer wall of the middle layer cover, and a third locking part is provided in the middle of the outer wall of the bottom shell; the first locking part passes through the second locking part and then locks with the third locking part;

a first extended rectangular section extends outward around the opening of the upper cover, a second extended rectangular section extends outward around the opening of the middle layer cover, and a third extended rectangular section extends outward around the opening of the bottom shell; the first extended rectangular section covers the second extended rectangular section, the second extended rectangular section covers the third extended rectangular section, and the three extended rectangular sections are stacked.

As an improvement of the above technical solution, two side walls of the bottom case respectively extend outward along a direction perpendicular to the side walls to form fixing lugs;

the other two side walls of the bottom shell are respectively and symmetrically provided with detachable microphone fixing components, and each microphone is arranged in the detachable microphone fixing components;

the detachable microphone fixing component comprises: a microphone fixing tube and a microphone cover;

the first end of the microphone fixing pipe extends outwards along the radial direction to form a step-shaped structure, and extends outwards along the axial direction to form a cylindrical section based on the step-shaped structure, and external threads are additionally arranged on the outer circumference of the cylindrical section to form a bolt which is in threaded fit connection with internal threads arranged on the side wall of the bottom shell;

the microphone is arranged in the microphone fixing tube, the second end of the microphone fixing tube extends outwards along the axial direction to form a boss, the microphone cover is covered on the boss and is in press fit connection with the second end of the microphone fixing tube, and the inner wall of the microphone cover is provided with a sound-transmitting waterproof membrane.

As an improvement of the above technical solution, the sound collection and storage circuit board includes: a human-computer interaction module, an adjustable gain module, a voice coding and decoding module, an MCU (microcontroller unit) central control module, a mobile memory and a power management module;

the human-computer interaction module is used for displaying the equipment state in real time and changing the setting according to the displayed equipment state;

the adjustable gain module is used for selecting a corresponding mode according to the on-off states of the two groups of dial control switches, amplifying the voice analog signal output by the microphone and then transmitting the amplified voice analog signal to the voice coding and decoding module, or directly transmitting the voice analog signal output by the microphone to the voice coding and decoding module;

the voice coding and decoding module is used for coding and decoding the received amplified voice analog signal, converting the received amplified voice analog signal into a voice digital signal and sending the voice digital signal to the MCU central control module, or converting the received voice analog signal into a sound digital signal and sending the sound digital signal to the MCU central control module;

the MCU central control module is used for processing the voice digital signal or the sound digital signal to obtain a corresponding processed signal;

the mobile memory is used for storing the processed signals, which are packaged into lossless WAV format data and written to the mobile memory;

and the power supply management module is used for supplying power to each module in the sound acquisition and storage circuit board.
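The WAV packaging step described for the mobile memory can be sketched with Python's standard `wave` module. This is only an illustration, not the device firmware: the function name `pack_wav`, the 16-bit sample width, the single channel and the 96 kHz sample rate are assumptions that the source does not state.

```python
import io
import struct
import wave


def pack_wav(samples, sample_rate=96000, channels=1):
    """Pack signed 16-bit integer samples into WAV-format bytes.

    sample_rate, channels and the 16-bit width are assumed values;
    the source does not specify them.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        # Little-endian signed 16-bit values, as PCM WAV expects.
        wav.writeframes(struct.pack("<%dh" % len(samples), *samples))
    return buf.getvalue()


wav_bytes = pack_wav([0, 1000, -1000, 32767])
```

The resulting bytes can be written to the storage card unchanged; uncompressed PCM is what makes the WAV file lossless.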

As an improvement of the above technical solution, the adjustable gain module includes: a first group of dial control switches, a high-gain circuit and a second group of dial control switches; the first group of dial control switches and the second group of dial control switches are arranged at the input end and the output end of the high-gain circuit, respectively;

when the first group of dial control switches and the second group of dial control switches are both closed, namely mode one, the high-gain circuit is switched on; the microphone converts the collected sound signal into a voice analog signal, which is an electrical signal; the voice analog signal is input to the high-gain circuit and amplified, and the amplified signal is sent to the voice coding and decoding module;

when the first group of dial control switches and the second group of dial control switches are both open, namely mode two, the high-gain circuit is not switched on, and the microphone inputs the converted voice analog signal directly to the voice coding and decoding module.
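The two switch-controlled modes can be modelled in software as follows. This is a sketch for illustration only: the names `GainMode`, `select_gain_mode` and `route_signal` and the gain value are hypothetical, and the source defines no behaviour for the case where the two switch groups disagree, so that case is treated here as an error.

```python
from enum import Enum


class GainMode(Enum):
    AMPLIFIED = 1  # mode one: both switch groups closed
    BYPASS = 2     # mode two: both switch groups open


def select_gain_mode(input_closed, output_closed):
    """Map the states of the two dial switch groups to a mode."""
    if input_closed and output_closed:
        return GainMode.AMPLIFIED
    if not input_closed and not output_closed:
        return GainMode.BYPASS
    # Assumption: a mixed state is not described in the source.
    raise ValueError("switch groups disagree; no mode is defined")


def route_signal(sample, mode, gain=100.0):
    """Amplify in mode one, pass through unchanged in mode two.

    The gain value is a placeholder, not a figure from the source.
    """
    return sample * gain if mode is GainMode.AMPLIFIED else sample
```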

As an improvement of the above technical solution, the sound collection and storage circuit board further includes a sound processing module, which is used for: denoising, based on the auditory masking effect, the signal processed by the MCU central control module and stored in the mobile memory to obtain a denoised signal; performing active frame detection on the denoised signal based on a three-parameter endpoint detection algorithm; extracting the MFCC and LogFBank features of the voice segments; and merging adjacent segments based on cosine similarity to generate audio summaries.
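Masking-based denoising relies on two psychoacoustic ingredients: a mapping from frequency to critical band, and the absolute threshold of hearing. The source gives no formulas, so the sketch below uses two widely cited approximations (Zwicker's Bark-scale formula and Terhardt's threshold-in-quiet formula) purely as an assumption about how these quantities might be computed. The function names are hypothetical.

```python
import math


def hz_to_bark(f_hz):
    """Approximate critical-band rate (Bark) for a frequency in Hz."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def absolute_threshold_db(f_hz):
    """Approximate threshold in quiet (dB SPL); f_hz must be > 0."""
    k = f_hz / 1000.0
    return 3.64 * k ** -0.8 - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2) + 1e-3 * k ** 4


def critical_band_power(power_spectrum, sample_rate, n_fft):
    """Sum the power of FFT bins that fall in the same integer Bark band.

    power_spectrum[k] is the power of bin k, whose centre frequency
    is k * sample_rate / n_fft.
    """
    bands = {}
    for k, p in enumerate(power_spectrum):
        band = int(hz_to_bark(k * sample_rate / n_fft))
        bands[band] = bands.get(band, 0.0) + p
    return bands
```

The spread-spectrum power, masking coefficient and masking threshold would then be derived per band from these values; those steps are not sketched here because the source does not define them.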

As one of the improvements of the above technical solution, the specific process of the sound processing module is as follows:

collecting sound signals with the field sound collection and storage device, performing analog-to-digital conversion to obtain voice digital signals, and storing the voice digital signals in a current recording file created with a timestamp as its file name;

reading the sound digital signals of each frame in the current sound recording file, and calculating the critical band power and the noise coefficient of the sound digital signals frame by frame;

calculating the spread spectrum power of each frame of sound digital signal according to the critical band power of each frame of sound digital signal;

calculating a masking coefficient of each frame of sound digital signal according to the noise coefficient and the spread spectrum power of each frame of sound digital signal;

calculating a masking threshold according to the masking coefficient and the absolute hearing threshold of the human ear;

denoising the sound digital signals of each frame by using a masking threshold value to obtain enhanced voice signals of each frame;

calculating the short-time energy, the short-time average zero crossing rate and the noise coefficient of each frame of the enhanced voice signal frame by frame;

jointly judging whether the short-time energy, the short-time average zero-crossing rate and the noise coefficient of each frame of the enhanced voice signal meet preset double-threshold conditions;

determining the starting point and the end point of the audio in the enhanced voice signal according to the judgment result to obtain a plurality of independent voice segments;

extracting the average Mel-frequency cepstral coefficient (MFCC) features and LogFBank features of all voice frames in each voice segment to form a voice segment feature vector;

and calculating the cosine distance between the feature vectors of every two adjacent voice segments, and merging the two voice segments when the cosine distance reaches a preset threshold value, to finally obtain a plurality of audio summaries.
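The endpoint detection steps in the process above can be sketched as follows. This is one possible reading, not the patented algorithm: the source does not define the noise coefficient or the exact double-threshold rule, so the per-frame noise values are taken as given inputs, and the decision rule (a high energy threshold to start a segment, a low energy threshold or a zero-crossing threshold to continue it) is an assumption. All function and parameter names are hypothetical.

```python
def short_time_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ.

    The frame must contain at least two samples.
    """
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def detect_segments(energy, zcr, noise, e_high, e_low, z_thr, n_max):
    """Return (start, end) frame index pairs, end exclusive.

    A segment starts when energy reaches e_high on a frame that is
    not noise-like, and continues while energy stays at or above
    e_low or the zero-crossing rate stays at or above z_thr.
    """
    segments = []
    start = None
    for i, (e, z, n) in enumerate(zip(energy, zcr, noise)):
        if start is None:
            if e >= e_high and n <= n_max:
                start = i
        elif not ((e >= e_low or z >= z_thr) and n <= n_max):
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(energy)))
    return segments
```

For example, with per-frame energies `[0, 0, 5, 3, 1, 0, 0, 6, 0]`, zero-crossing rates of 0, noise values of 0.1, and thresholds `e_high=4`, `e_low=1`, `z_thr=0.5`, `n_max=0.5`, the function returns `[(2, 5), (7, 8)]`.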

The invention also provides a sound processing method, which comprises the following steps:

step 1) upon detecting that a recording key of the field sound collection and storage device is pressed, initializing, and then detecting whether the mobile memory on the sound collection and storage circuit board is present;

if the mobile memory is not present, prompting the user to insert the mobile memory;

if the mobile memory is present, allocating a buffer space and entering a recording mode;

step 2) further judging whether the remaining storage space of the mobile memory is sufficient to store one audio file;

if the remaining storage space is not sufficient to store one audio file, turning off the recording key and prompting the user to replace the mobile memory;

if the remaining storage space is sufficient to store one audio file, using the timestamp as the file name and entering a cyclic recording state; after 30 minutes of audio have been recorded, ending the current file writing operation to generate a recording file, and processing the recording file and extracting audio to obtain a plurality of audio summaries;

then creating a new recording file and continuing the recording and audio extraction operations to obtain a plurality of audio summaries, wherein a new recording file is created every 30 minutes;

step 3) if the stop key is detected to be pressed, returning the file pointer to the file header, updating the file header information, releasing the memory space, and ending the recording.
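The storage check in step 2 and the header update in step 3 can be sketched as follows. The sketch assumes uncompressed PCM in a canonical 44-byte WAV header; the sample rate, bit depth, channel count and all function names are hypothetical, since the source states only the 30-minute file length and the WAV format.

```python
import io
import struct

HEADER_FMT = "<4sI4s4sIHHIIHH4sI"  # canonical 44-byte PCM WAV header


def bytes_per_file(sample_rate, bits, channels, minutes=30):
    """Size of one recording file: header plus raw PCM data."""
    return 44 + sample_rate * (bits // 8) * channels * minutes * 60


def has_room(free_bytes, sample_rate, bits, channels):
    """Step 2: is there space for one more 30-minute file?"""
    return free_bytes >= bytes_per_file(sample_rate, bits, channels)


def wav_header(data_size, sample_rate, bits, channels):
    block_align = channels * bits // 8
    return struct.pack(
        HEADER_FMT, b"RIFF", 36 + data_size, b"WAVE", b"fmt ", 16,
        1, channels, sample_rate, sample_rate * block_align,
        block_align, bits, b"data", data_size)


def record(f, chunks, sample_rate=96000, bits=16, channels=1):
    """Write chunks of PCM bytes, then patch the header sizes."""
    f.write(wav_header(0, sample_rate, bits, channels))  # placeholder
    data_size = 0
    for chunk in chunks:
        f.write(chunk)
        data_size += len(chunk)
    f.seek(0)  # step 3: return the file pointer to the header
    f.write(wav_header(data_size, sample_rate, bits, channels))
    f.seek(0, io.SEEK_END)
    return data_size
```

Writing a placeholder header first and patching it on stop is needed because the data length is unknown while recording is in progress. Under the assumed 96 kHz, 16-bit, mono settings, one 30-minute file needs 345,600,044 bytes.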

As one improvement of the above technical solution, processing the recording file and extracting audio to obtain a plurality of audio summaries specifically comprises:

collecting sound signals based on field sound collection and storage equipment by using a timestamp as a file name, performing analog-to-digital conversion to obtain voice digital signals, storing the voice digital signals, and creating a current recording file;

reading the sound digital signals of each frame in the current sound recording file, and calculating the critical band power and the noise coefficient of the sound digital signals frame by frame;

calculating the spread spectrum power of each frame of sound digital signal according to the critical band power of each frame of sound digital signal;

calculating a masking coefficient of each frame of sound digital signal according to the noise coefficient and the spread spectrum power of each frame of sound digital signal;

calculating a masking threshold according to the masking coefficient and the absolute hearing threshold of the human ear;

denoising the sound digital signals of each frame by using a masking threshold value to obtain enhanced voice signals of each frame;

calculating the short-time energy, the short-time average zero crossing rate and the noise coefficient of each frame of the enhanced voice signal frame by frame;

jointly judging whether the short-time energy, the short-time average zero-crossing rate and the noise coefficient of each frame of the enhanced voice signal meet preset double-threshold conditions or not;

determining the starting point and the end point of the audio frequency in each frame of the enhanced voice signal according to the judgment result to obtain a plurality of independent voice sections;

extracting the average Mel frequency cepstrum coefficient characteristics and LogFBank characteristics of all voice frames in each voice section to form voice section characteristic vectors;

and calculating the cosine distance between the feature vectors of every two adjacent voice sections, and merging the two voice sections when the cosine distance reaches a preset threshold value to finally obtain a plurality of audio summaries.
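The final merging step can be illustrated with a short sketch. It assumes that "the cosine distance reaches a preset threshold" means the cosine similarity of the two feature vectors is at or above the threshold, and that the merged section keeps a length-weighted average of the two feature vectors; neither detail is specified in the text, and the function names and the threshold value are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def merge_adjacent(sections, threshold=0.9):
    """Merge time-ordered sections (start, end, feature) whose neighbours are similar.

    The threshold of 0.9 is an illustrative value, not one taken from the text.
    """
    if not sections:
        return []
    merged = [sections[0]]
    for start, end, feat in sections[1:]:
        prev_start, prev_end, prev_feat = merged[-1]
        if cosine_similarity(prev_feat, feat) >= threshold:
            n_prev, n_cur = prev_end - prev_start, end - start
            total = max(n_prev + n_cur, 1)
            # Assumed rule: the merged feature is the length-weighted mean.
            new_feat = [(p * n_prev + c * n_cur) / total
                        for p, c in zip(prev_feat, feat)]
            merged[-1] = (prev_start, end, new_feat)
        else:
            merged.append((start, end, feat))
    return merged


sections = [(0, 10, [1.0, 0.0]), (12, 20, [0.9, 0.1]), (30, 40, [0.0, 1.0])]
summaries = merge_adjacent(sections)
```

In this example the first two sections point in nearly the same direction and are merged, while the third is kept as a separate audio summary.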

Compared with the prior art, the invention has the beneficial effects that:

1. Structurally, the sound collection and storage device is easy to install and disassemble. The whole square shell is waterproofed and the microphone is both detachable and waterproofed, which improves the waterproof performance of the whole device and makes it better suited to harsh environments such as forests with high humidity and year-round rain.

2. In hardware, the adjustable gain module of the sound collection and storage circuit board adjusts the gain sensitivity through an external selector switch, which widens the adjustable gain range, improves the environmental adaptability of the system, and allows sound to be collected over longer distances. A low-power design is adopted: the clock frequency is reduced from the conventional 168 MHz to 50 MHz, which greatly reduces the power consumption of the whole circuit board and extends the endurance of the device; combined with solar charging, the device can remain in a working state at all times.

3. By modifying the type of the address variable, the supported storage space of the mobile memory is increased to at least 2 TB, so that large-capacity mobile storage devices can be used to store the sound data.

4. Based on the auditory masking effect, noise in the sound digital signal is reduced without being completely eliminated, which reduces speech distortion and improves listening comfort.

5. The three-parameter double-threshold endpoint detection algorithm improves the accuracy of endpoint detection. Speech section feature vectors are formed from the average MFCC features and the LogFBank features, the cosine distance between the feature vectors of every two adjacent speech sections is calculated, and adjacent speech sections are merged based on this correlation. The cosine distances of the remaining adjacent speech sections are then traversed and the correlation judgment is repeated to generate the audio summaries. Redundant speech sections are thereby further eliminated and audio summaries with purer speech are obtained, which greatly facilitates subsequent research and use.

Drawings

FIG. 1 is a schematic diagram of a field sound collection and storage device according to the present invention;

FIG. 2 is a schematic view of the open top cover and middle cover of the field sound collection and storage device of FIG. 1 in accordance with the present invention;

FIG. 3 is a bottom view of a field sound collection and storage device of the present invention of FIG. 1;

FIG. 4 is a schematic structural view of a microphone mounting base on a bottom case of the field sound collection and storage device of FIG. 1 in accordance with the present invention;

FIG. 5 is a schematic view of a microphone cover on a bottom case of the outdoor sound collection and storage device of FIG. 1 according to the present invention;

FIG. 6 is a schematic diagram of the internal structure of the sound collection and storage circuit board of the field sound collection and storage device of FIG. 1 according to the present invention;

FIG. 7 is a schematic diagram of the gain adjustment circuit of the adjustable gain module of the sound collection and storage circuit board of the field sound collection and storage device of FIG. 1 in accordance with the present invention;

FIG. 8 is a flow chart of a sound processing method of the present invention;

FIG. 9 is a flow chart of the sound processing method of FIG. 8 for processing the sound recording file and extracting audio to obtain a plurality of audio digests.

Reference numerals:

10. Upper cover; 20. Middle layer cover

30. Bottom case; 40. Rechargeable battery pack

50. Rotating shaft

11. First locking member

21. Circuit board slot; 22. LED screen

23. Function key; 24. Waterproof silica gel ring

25. Second locking member; 26. Storage medium through hole

31. Detachable microphone fixing component; 32. Third locking member

33. Battery well; 34. Fixing lug

311. Microphone fixing base; 312. Microphone cover

3111. First end; 3112. Second end

Detailed Description

The invention will now be further described with reference to the accompanying drawings.

As shown in fig. 1, the present invention provides a field sound collecting and storing device, which comprises: a square shell, a microphone, a field sound collecting and storing circuit board and a rechargeable battery pack 40;

a field sound collecting and storing circuit board and a rechargeable battery pack 40 are arranged in the square shell, and a microphone is arranged on the outer wall of the square shell;

as shown in fig. 2, fixing lugs 34 extend outward from two side walls of the square shell in a direction perpendicular to the side walls; the fixing lugs 34 have a porous structure, and a fixing device is attached to the fixing lugs 34 so that a single device can be fixed on its own at a data acquisition point or in a field environment, or so that several devices can be connected to each other through their fixing lugs 34 and then fixed at the data acquisition point or in the field environment by passing the fixing device through the fixing lugs 34, to acquire field sound in real time;

the microphone is used for acquiring field sound signals in real time, converting the field sound signals into corresponding electric signals and sending the electric signals to the sound acquisition and storage circuit board;

the sound acquisition and storage circuit board is used for receiving the electric signal sent by the microphone and processing and storing the signal;

the rechargeable battery pack is used for providing a power supply for normal operation for the sound acquisition and storage circuit board.

As shown in fig. 1, the housing is a three-layer structure, which includes: an upper cover 10, a middle cover 20 and a bottom case 30; the middle layer cover 20 is located between the upper cover 10 and the bottom case 30, the upper cover 10 is located above the middle layer cover 20, and the bottom case 30 is located below the middle layer cover 20;

the upper cover 10, the middle cover 20 and the bottom shell 30 are all cuboid waterproof structures with openings, and the interiors of the upper cover, the middle cover and the bottom shell are all hollow structures; the middle layer cover 20 is nested in the upper cover 10, the opening directions of the upper cover 10 and the middle layer cover 20 are the same, the opening direction of the bottom shell 30 is opposite to the opening direction of the upper cover 10, and the opening direction of the bottom shell 30 is opposite to the opening direction of the middle layer cover 20;

the same sides of the three are respectively provided with corresponding rotating shaft fixing seats, the rotating shaft 50 penetrates through and is fixed on the respective rotating shaft fixing seats of the three, the three are connected together, and the three can be respectively rotated around the rotating shaft 50 to be opened or closed.

As shown in fig. 1, a first locking member 11 is disposed in the middle of the outer wall of the upper cover 10, a second locking member 25 is disposed in the middle of the outer wall of the middle layer cover 20, and a third locking member 32 is disposed in the middle of the outer wall of the bottom case 30; the first locking member 11 passes through the second locking member 25 and then locks onto the third locking member 32, so that the field sound collecting and storing circuit board and the rechargeable battery pack 40 are enclosed in the square shell and protected from damage.

A first extended rectangular section extends outward around the opening of the upper cover 10, a second extended rectangular section extends outward around the opening of the middle layer cover 20, and a third extended rectangular section extends outward around the opening of the bottom case 30; the first extended rectangular section covers the second, the second covers the third, and the three extended rectangular sections are stacked.

The front surface of the second extended rectangular section is provided with a groove in which a waterproof silica gel ring 24 is adhered; its back surface is likewise provided with a groove in which a waterproof silica gel ring 24 is adhered. With the three extended rectangular sections stacked, a waterproof silica gel ring 24 thus sits between every two adjacent sections, which prevents seepage and keeps water out.

As shown in fig. 1, the hollow structure inside the upper cover 10 is a placing groove for placing the middle layer cover 20, and the hollow structure inside the middle layer cover 20 is a circuit board slot 21 for placing a sound collecting and storing circuit board; the inner hollow structure of the bottom case 30 is a battery well 33 for accommodating a rechargeable battery pack 40.

As shown in fig. 1, a man-machine interface is additionally arranged at the top of the middle layer cover 20, and the man-machine interface is electrically connected with the sound collection and storage circuit board and is used for setting and displaying the state of the sound collection and storage circuit board;

the human-computer interaction interface comprises: an LED screen 22 and a plurality of function keys 23;

the LED screen 22 is used for displaying the working state of the sound acquisition and storage circuit board;

the function key 23 is used for setting the state of the sound acquisition and storage circuit board according to actual needs; each function key corresponds to one working state of the circuit board and can be set as required;

as shown in fig. 2, a sidewall of the middle layer cover 20 is provided with a storage medium through hole 26 for manually inserting and pulling out a storage medium, and the storage medium can store field sound data collected by the circuit board.

Fixing lugs 34 extend outward from two side walls of the bottom case 30 in a direction perpendicular to the side walls. A fixing device is attached to the fixing lugs 34 so that a single device can be fixed on its own at a data acquisition point or in a field environment, or several devices can be connected to each other through their fixing lugs 34 and then fixed at the data acquisition point or in the field environment by passing the fixing device through the fixing lugs 34, to acquire field sound in real time;

as shown in fig. 3, the other two sidewalls of the bottom case 30 are respectively and symmetrically provided with detachable microphone fixing components 31, and each microphone is arranged in the detachable microphone fixing component 31;

as shown in fig. 3, 4 and 5, the detachable microphone fixing assembly 31 includes: a microphone fixing tube 311 and a microphone cover 312;

a first end 3111 of the microphone fixing tube 311 extends outward in a radial direction to form a stepped structure, and extends outward in an axial direction to form a cylindrical section based on the stepped structure, and an external thread is additionally arranged on an outer circumference of the cylindrical section to form a bolt, and the bolt is in threaded fit connection with an internal thread formed on a side wall of the bottom case 30;

the microphone is disposed in the microphone fixing tube 311, a boss is extended outward from the second end 3112 of the microphone fixing tube 311 in the axial direction, the microphone cover 312 is covered on the boss and is in press fit connection with the second end 3112 of the microphone fixing tube 311, and a sound-transmitting waterproof film is disposed on the inner wall of the microphone cover 312 to seal the microphone in the microphone fixing tube 311, so as to ensure the environment for collecting sound.

The top of the microphone cover 312 is provided with a plurality of sound-transmitting holes to improve the sound collection effect.

Charging holes are additionally formed in a side wall of the bottom case 30 for connecting the rechargeable battery pack to external power supply equipment, so that the battery pack can be charged at any time; a waterproof silica gel cover is provided on each charging hole, and when the device is not charging the hole is covered, so that it remains usable for later charging.

Arranging the sound-transmitting waterproof membrane between the microphone fixing tube 311 and the microphone cover 312 strengthens the waterproof protection of the microphone while preserving the sound collection effect. Grooves are provided on both the front and the back of the extended rectangular section of the middle layer cover, and a waterproof silica gel ring is adhered in each groove, so that the upper cover, the middle layer cover and the bottom case close together in a watertight manner; this improves the overall waterproof performance of the device shell and makes the device better suited to harsh field environments such as forests with high humidity and year-round rain.

In this embodiment, the microphones shown are MIC analog microphones for picking up stereo signals.

The sound collection and storage device provided by the invention is easy to install and disassemble. The rotatable connection between the three shell layers makes later maintenance of the components inside the square shell convenient. The detachable microphone fixing component makes it easy to remove, replace and otherwise maintain the microphone, and it protects the microphone by keeping it from being directly exposed to the field environment. The upper cover and the middle layer cover protect the sound collection and storage circuit board and the rechargeable battery pack and improve the waterproof performance of the whole device.

As shown in fig. 6, the sound collection and storage circuit board includes: the device comprises a human-computer interaction module, an adjustable gain module, a voice coding and decoding module, an MCU (microprogrammed control unit) central control module, a mobile memory and a power management module;

the human-computer interaction module is used for displaying the equipment state in real time and changing the setting according to the displayed equipment state;

specifically, the human-computer interaction module comprises an LCD display screen and a plurality of function keys;

the LCD display screen is used for displaying the state of the equipment in real time;

the function key is used for setting state parameters in the sound acquisition and storage circuit board;

for example, the LCD display screen main page display may be divided into four lines, the first line displaying usage units, the second line displaying sampling rate, gain and status, the third line displaying temperature and file count, and the fourth line displaying real time.

The function keys include keys that start and stop sound recording for the whole circuit board, and keys that set the corresponding parameters shown in the four lines.

The man-machine interaction module can facilitate the operation of a user and check the working state of the whole circuit board, and the user experience is improved.

The LCD display screen backlight is driven by an NPN triode, a diode is connected between the MCU central control module and the collector electrode of the NPN triode in series, the negative electrode of the diode is connected with the collector electrode of the triode, and the MCU central control module drives the LCD display screen backlight power supply to be powered on and powered off, so that the system power consumption is further reduced. Specifically, the LCD display screen used by the human-computer interaction module can drive the backlight to be turned on and off through the S8550 triode, after the fact that the key is pressed is detected, the MCU central control module can drive the triode to light the LCD display screen backlight, and if no operation is performed within 10 seconds, the backlight is controlled to be turned off automatically.

The adjustable gain module is used for selecting a corresponding mode according to the on-off states of the two groups of dial control switches, amplifying the voice analog signal output by the microphone and then transmitting the amplified voice analog signal to the voice coding and decoding module, or directly transmitting the voice analog signal output by the microphone to the voice coding and decoding module;

specifically, as shown in fig. 7, the adjustable gain module includes: a first group of dial control switches, a high-gain circuit and a second group of dial control switches; the first group of dial control switches and the second group of dial control switches are arranged at the input end and the output end of the high-gain circuit respectively;

when the first group of dial control switches and the second group of dial control switches are both in the on state, namely mode one, the high-gain circuit is conducting; the microphone converts the collected sound signal into a voice analog signal, which is an electric signal, the voice analog signal is input to the high-gain circuit and amplified, and the amplified signal is sent to the voice coding and decoding module;

when the first group of dial control switches and the second group of dial control switches are both in the off state, namely mode two, the high-gain circuit is not conducting, and the microphone inputs the converted voice analog signal directly to the voice coding and decoding module. The voice coding and decoding module encodes the voice analog signal to obtain a voice digital signal and sends it to the MCU central control module; the MCU central control module performs noise reduction on the received voice digital signal and then transfers it to the mobile memory through DMA, thereby realizing data collection and storage.

The adjustable gain module can realize the adjustment of signal gain sensitivity by respectively arranging a group of dial control switches at the input end and the output end of the adjustable gain module, increases the adjustable range of gain, improves the environmental adaptability and can be used for long-distance sound collection.

As shown in fig. 7, the on state of change-over switch 1 means that the two analog microphones MIC1 and MIC2 are connected to the gain circuit and the path connecting them directly to the codec is disconnected;

the off state of change-over switch 1 means that the two analog microphones MIC1 and MIC2 are disconnected from the gain circuit and connected to the path leading directly to the codec;

the on state of change-over switch 2 means that the gain circuit is connected to the codec;

the off state of change-over switch 2 means that the gain circuit is disconnected from the codec.

If the change-over switch 2 is not set, when the adjustable gain module is switched to the mode two, because the high-gain circuit is still in the power-on state, an interference electric signal is generated and superposed in the voice analog signal, noise in the voice analog signal is increased, noise of the output voice analog signal is large, and effective sound segments of the extracted signal are influenced.

Gain adjustability is achieved by providing mode one and mode two for the adjustable gain module. The high-gain circuit uses two MAX9814 chips to apply low-noise amplification and automatic gain control to the voice analog signals from the two microphones MIC1 and MIC2 respectively. The high-gain circuit can amplify the voice analog signal by 40 dB with low noise; combined with the coding and decoding chip (adjustable gain range -12 dB to 37.5 dB), the gain sensitivity can reach a maximum of -13.9 dBu. If the voice analog signal is not amplified by the high-gain circuit and the internal gain of the coding and decoding chip is reduced to -12 dB, the lowest gain sensitivity is calculated to be -53 dBu. The gain sensitivity is calculated as follows:

20lg[(0.2V/Pa)÷(1V/Pa)]=-13.9dBu

wherein 20 is a constant, 0.2V/Pa is a parameter after microphone sensitivity amplification, and 1V/Pa is a microphone parameter;
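The arithmetic behind the sensitivity figure can be checked with a short sketch. The first function reproduces the 20·lg ratio from the formula above; the second adds the nominal stage gains quoted in the text, on the assumption that the high-gain circuit and the coding and decoding chip are simply cascaded. The function names are hypothetical, and because the high-gain stage applies automatic gain control, the real combined gain will vary with the input level.

```python
import math


def amplitude_ratio_to_db(value, reference):
    """Convert an amplitude ratio to decibels: 20 * log10(value / reference)."""
    return 20.0 * math.log10(value / reference)


def cascade_gain_db(*stage_gains_db):
    """Gains of cascaded stages add when they are expressed in dB."""
    return sum(stage_gains_db)


# Sensitivity figure from the formula: 0.2 V/Pa relative to 1 V/Pa.
sensitivity_db = amplitude_ratio_to_db(0.2, 1.0)

# Nominal gain limits under the cascade assumption.
mode_one_max_db = cascade_gain_db(40.0, 37.5)  # high-gain circuit plus codec maximum
mode_two_min_db = cascade_gain_db(-12.0)       # codec minimum only, gain circuit bypassed
```

The computed ratio is about -13.98 dB, which matches the -13.9 figure in the text to within rounding.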

In terms of low-power design, MCU central control modules of this kind conventionally run at a clock frequency of 168 MHz; here the clock frequency is changed to 50 MHz, which reduces the operating power consumption of the MCU central control module by about one third without affecting the recording quality.

The voice coding and decoding module is used for coding and decoding the received amplified voice analog signal, converting the received amplified voice analog signal into a voice digital signal and sending the voice digital signal to the MCU central control module, or converting the received voice analog signal into a sound digital signal and sending the sound digital signal to the MCU central control module;

the MCU central control module is used for processing the voice digital signal or the sound digital signal to obtain a corresponding processed signal;

the mobile memory is used for storing the processed signals, packaging the processed signals into WAV lossless format data and writing the WAV lossless format data into the mobile memory;

the removable memory may be an SDXC (Secure Digital eXtended Capacity) memory card with a storage capacity of up to 1 TB.

And the power supply management module is used for supplying power to each module in the sound acquisition and storage circuit board.

The power management module is responsible for supplying power to each module. According to the power consumption of each module, it generates the corresponding voltages using low-power low-dropout (LDO) linear regulator chips, which saves more power than using power chips with high current and a wide regulation range. The main ICs of the sound collection and storage circuit board use 3.3 V and 5 V; the 3.3 V rail is generated by the ultra-low-power LDO chip xc6206p332mr and the 5 V rail by an L4995J chip, with a corresponding static drive current of 270 mA, which fully meets the working requirements.

Through the low-power design, the overall power consumption of the sound collection and storage circuit board can be kept within 0.5 W. The circuit board can be connected to an external 15 W solar panel to charge the rechargeable battery pack. Assuming 4 hours of sunlight per day and a power generation efficiency of 40%, the panel generates 24 Wh of electric energy per day, while the sound collection and storage circuit board consumes 12 Wh per day. Charging with solar energy therefore extends the endurance and ensures that the circuit board can remain in a working state at all times.
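The daily energy budget above can be reproduced with a few lines of arithmetic. The sketch below reads the daily figures as energy in watt-hours (power multiplied by time); the function name is hypothetical, and the inputs are the values quoted in the text.

```python
def daily_energy_wh(power_w, hours, efficiency=1.0):
    """Energy in watt-hours delivered or consumed over a number of hours."""
    return power_w * hours * efficiency


# Harvest: 15 W panel, 4 hours of sunlight, 40 % generation efficiency.
harvest_wh = daily_energy_wh(15.0, 4.0, 0.40)

# Consumption: circuit board at its 0.5 W upper bound for 24 hours.
usage_wh = daily_energy_wh(0.5, 24.0)

surplus_wh = harvest_wh - usage_wh
```

Under these figures the panel supplies about twice the daily consumption, which leaves a margin of roughly 12 Wh per day for overcast periods and charging losses.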

The technical indicators that the circuit board can reach are as follows:

TABLE 1

Therefore, the sound collecting and storing circuit board provided by the invention has the advantages of long endurance, high sensitivity, adaptability to field environment, large capacity and the like, solves the problems of short field recording time, poor environmental adaptability, large outdoor recorded sound noise and the like of small equipment, and can conveniently and continuously collect sound information under the condition of not influencing the life of wild animals.

The sound collection and storage circuit board further comprises: and the sound processing module is used for denoising the signals processed by the MCU central control module stored in the mobile memory based on the auditory masking effect to obtain denoised signals, then carrying out active frame detection on the denoised signals based on a three-parameter endpoint detection algorithm, extracting the MFCC and LogFBank characteristics of the voice segments, and combining adjacent segments based on cosine similarity to generate a plurality of audio summaries.

Specifically, the specific process of the sound processing module is as follows:

collecting sound signals based on field sound collection and storage equipment by using a timestamp as a file name, performing analog-to-digital conversion to obtain voice digital signals, storing the voice digital signals, and creating a current recording file;

reading the sound digital signals of each frame in the current sound recording file, and calculating the critical band power and the noise coefficient of the sound digital signals frame by frame;

calculating the spread spectrum power of each frame of sound digital signal according to the critical band power of each frame of sound digital signal;

calculating a masking coefficient of each frame of sound digital signal according to the noise coefficient and the spread spectrum power of each frame of sound digital signal;

calculating a masking threshold according to the masking coefficient and the absolute hearing threshold of the human ear;

denoising the sound digital signals of each frame by using a masking threshold value to obtain enhanced voice signals of each frame;

calculating the short-time energy, the short-time average zero crossing rate and the noise coefficient of each frame of the enhanced voice signal frame by frame;

jointly judging whether the short-time energy, the short-time average zero-crossing rate and the noise coefficient of each frame of the enhanced voice signal meet preset double-threshold conditions or not;

determining the starting point and the end point of the audio frequency in each frame of the enhanced voice signal according to the judgment result to obtain a plurality of independent voice sections;

extracting the average Mel-Frequency Cepstral Coefficient (MFCC) characteristics and LogFBank characteristics of all voice frames in each voice section to form a voice section characteristic vector;

and calculating the cosine distance between the feature vectors of every two adjacent voice sections, and merging the two voice sections when the cosine distance reaches a preset threshold value to finally obtain a plurality of audio summaries.
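The endpoint detection step can be illustrated with a simplified sketch. It uses only two of the three parameters, short-time energy and zero-crossing rate, and leaves out the noise coefficient, so it is a plain double-threshold detector rather than the three-parameter algorithm described above. The threshold values and function names are assumptions chosen for the example.

```python
def short_time_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(s * s for s in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / max(len(frame) - 1, 1)


def detect_sections(frames, energy_high, energy_low, zcr_low):
    """Return (start, end) frame index pairs, end exclusive.

    A run of frames passing the low thresholds is kept as a section only
    if at least one frame in the run also passes the high energy threshold.
    """
    sections = []
    start, has_strong = None, False
    for i, frame in enumerate(frames):
        energy = short_time_energy(frame)
        weak = energy >= energy_low or zero_crossing_rate(frame) >= zcr_low
        if weak:
            if start is None:
                start, has_strong = i, False
            has_strong = has_strong or energy >= energy_high
        elif start is not None:
            if has_strong:
                sections.append((start, i))
            start = None
    if start is not None and has_strong:
        sections.append((start, len(frames)))
    return sections


frames = [
    [0.0] * 4,               # silence
    [0.1] * 4,               # weak onset
    [1.0, -1.0, 1.0, -1.0],  # strong frame
    [0.1] * 4,               # weak tail
    [0.0] * 4,               # silence
    [0.1] * 4,               # weak blip with no strong frame
    [0.0] * 4,               # silence
]
found = detect_sections(frames, energy_high=1.0, energy_low=0.01, zcr_low=0.5)
```

The low thresholds extend a section outward to its onset and tail, while the high threshold rejects weak blips that never become loud enough to count as a sound event.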

When the sound data is stored, after each DMA (Direct Memory Access) interrupt the audio data is noise-reduced and then written into the recording file. The noise reduction algorithm has been optimized so that it runs well on the small 32-bit central control chip STM32F4 with an obvious noise reduction effect.

The invention also optimizes the code parameters for the recording file so that STM32F4-series control chips support plug-in SDXC memory cards of up to 1 TB. Specifically, the file system of version 0.13a is ported into the STM32F4 code, and the address variable used for addressing and reading in the code is set to the 'signaled long' format, so that the device can support an SDXC memory card with a capacity of 1 TB.

The SDXC card stores data in sectors: after a series of commands has been configured, data is packed into whole blocks and read and written with a data width of 4 bits per cycle. In theory the existing SDXC standard supports up to 2 TB of storage, but because of technical limits the largest memory cards currently on the market are only 1 TB. By modifying the type of the address variable, the invention enables the designed sound acquisition and storage device to support SDXC cards with a storage capacity of at least 2 TB.
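Why the width and meaning of the address variable limit the usable capacity can be shown with a small calculation. The sketch assumes 512-byte sectors and compares a 32-bit address counted in bytes with one counted in sectors; the text does not state which variable type or addressing unit the firmware actually uses, so both cases and the function name are illustrative.

```python
SECTOR_BYTES = 512  # common SD card sector size, assumed here


def max_capacity_bytes(address_bits, unit_bytes):
    """Largest capacity reachable with an unsigned address of the given width."""
    return (1 << address_bits) * unit_bytes


GIB = 1024 ** 3
TIB = 1024 ** 4

byte_addressed = max_capacity_bytes(32, 1)               # address counts bytes
sector_addressed = max_capacity_bytes(32, SECTOR_BYTES)  # address counts sectors
```

A 32-bit byte address tops out at 4 GiB, whereas a 32-bit sector address reaches 2 TiB, which is the same order as the 2 TB figure mentioned above.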

As shown in fig. 8, the present invention further provides a sound processing method based on a field sound collection and storage device, the method includes:

step 1) detecting that a recording key (function key) arranged in the field sound acquisition and storage equipment is pressed down, initializing, and then detecting whether a mobile memory arranged on a sound acquisition and storage circuit board exists or not;

if the mobile memory does not exist, prompting to insert the mobile memory;

if the mobile memory exists, opening up a cache space and entering a recording mode;

step 2) further judging whether the size of the residual storage space of the mobile memory is enough to store an audio file;

if the size of the residual storage space of the mobile memory is not enough to store an audio file, closing the recording key and prompting to replace the mobile memory;

if the remaining storage space of the mobile memory is enough to store an audio file, a current recording file is created with the timestamp (namely the real time) as its file name, and a cyclic recording state is entered; during this period, sound data is stored into the sound file by DMA (direct memory access) transfer, which does not interfere with checking the storage status. Once 30 minutes of audio have been recorded, the write operation on the current file is finished, and the resulting recording file is processed and subjected to audio extraction to obtain a plurality of audio abstracts;

then creating a new recording file, and continuing to perform the recording and audio extraction operation to obtain a plurality of audio abstracts; wherein, a sound recording file is created every 30 minutes;

and 3) if the stop key is detected to be pressed, returning the file pointer to the file header, updating header file information, releasing the memory space, finishing sound recording, and closing the coding and decoding chip to reduce power consumption.

In this embodiment, the removable memory is an SD card.
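The control flow of steps 1) to 3) can be sketched roughly as follows. This is an illustration only, not the device firmware: the function names, the returned action strings, the timestamp format and the `.wav` extension are all assumptions, while the 30-minute file length comes from the text.

```python
from datetime import datetime, timedelta

SEGMENT_SECONDS = 30 * 60   # one recording file every 30 minutes (from the text)

def next_action(card_present: bool, free_bytes: int, file_bytes: int) -> str:
    """Decide what happens after the recording key is pressed."""
    if not card_present:
        return "prompt_insert_card"    # step 1: no removable memory found
    if free_bytes < file_bytes:
        return "prompt_replace_card"   # step 2: not enough room for one file
    return "record"                    # step 2: create a file and start recording

def segment_names(start: datetime, total_seconds: int) -> list:
    """Timestamp-based names of every file opened during one recording run."""
    names = []
    elapsed = 0
    while elapsed < total_seconds:
        stamp = start + timedelta(seconds=elapsed)
        # Hypothetical naming pattern; the source only says a timestamp is used.
        names.append(stamp.strftime("%Y%m%d_%H%M%S") + ".wav")
        elapsed += SEGMENT_SECONDS
    return names
```

For example, a run that starts at 08:00:00 and is stopped after 3700 seconds opens three files, the last of which holds only the final 100 seconds.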

As shown in fig. 9, the recording file is processed and audio extraction is performed to obtain a plurality of audio summaries. The specific process comprises the following steps:

Step 100) collecting sound signals with the field sound collection and storage device, performing analog-to-digital conversion to obtain sound digital signals, storing the sound digital signals, and creating the current recording file with a timestamp as the file name;

Step 110) reading each frame of the sound digital signal in the current recording file, and calculating the critical-band power and the noise coefficient of the sound digital signal frame by frame;

specifically, each frame of the sound digital signal is divided into critical bands using the 25 bands of the existing critical band division table, and the power $B_i(m)$ of the $i$-th critical band of each frame is calculated:

$$B_i(m) = \sum_{k=b_l}^{b_h} P(m,k) \tag{1}$$

where $P(m,k)$ is the power spectrum of the original sound signal, $m$ is the frame number, $k$ is the discrete frequency point, $b_h$ is the upper frequency boundary of the $i$-th critical band, and $b_l$ is the lower frequency boundary of the $i$-th critical band.

According to equation (1), the critical-band powers $B_i(0), B_i(1), \dots, B_i(24)$ of the 25 critical bands are calculated for frame numbers $m$ from 0 to 24.
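As a minimal sketch of the critical-band power calculation, the power of a band is taken here as the sum of the power-spectrum values between the band's lower and upper bin. The function name and the toy band edges are hypothetical; a real table would list 25 bands.

```python
def critical_band_power(power_spectrum, band_edges):
    """B_i(m) for one frame: sum P(m, k) over bins b_l..b_h (inclusive) of each band."""
    return [sum(power_spectrum[lo:hi + 1]) for lo, hi in band_edges]

# Toy frame with six frequency bins split into three bands of two bins each.
frame_power = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
toy_edges = [(0, 1), (2, 3), (4, 5)]   # (b_l, b_h) bin indices, made up for the example
band_power = critical_band_power(frame_power, toy_edges)
```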

The noise coefficient $z(m)$ of the $m$-th frame is calculated according to equation (2) from $G_m$ and $A_m$, where $G_m$ is the geometric mean of the power of the original sound signal and $A_m$ is the arithmetic mean of that power, calculated as follows:

$$G_m = \left(\prod_{k=1}^{K} P(m,k)\right)^{1/K}, \qquad A_m = \frac{1}{K}\sum_{k=1}^{K} P(m,k)$$

where $K$ is the total number of bands of the power spectrum.
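The two means can be computed directly. How they are combined into the noise coefficient is not reproduced in this text, so the mapping below is an assumption: it uses the spectral flatness in decibels, a -60 dB floor and a linear scaling, so that a flat (noise-like) spectrum gives a value near 1 and a strongly peaked (tone-like) spectrum gives 0. All names are hypothetical.

```python
import math

def geometric_mean(power):
    """G_m: geometric mean of strictly positive power values."""
    return math.exp(sum(math.log(p) for p in power) / len(power))

def arithmetic_mean(power):
    """A_m: arithmetic mean of the power values."""
    return sum(power) / len(power)

def noise_coefficient(power, sfm_db_floor=-60.0):
    """Assumed mapping from spectral flatness to a noise coefficient in [0, 1]."""
    sfm_db = 10.0 * math.log10(geometric_mean(power) / arithmetic_mean(power))
    # Tonality is 1 at or below the floor and 0 for a perfectly flat spectrum.
    tonality = min(max(sfm_db / sfm_db_floor, 0.0), 1.0)
    return 1.0 - tonality
```

The logarithm requires every power value to be strictly positive, so a small floor may need to be added to the spectrum in practice.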

Step 120) calculating the spread power of each frame of the sound digital signal from the critical-band power of that frame;

specifically, a spreading function $SF_{i,j}$ is assumed, defined in terms of the critical band number difference $\Delta = i - j$, where $i, j = 1, 2, \dots, 24, 25$;

the spreading function $SF_{i,j}$ is applied to $B_i(m)$ to obtain the spread power $C_i(m)$:

$$C_i(m) = \sum_{j=1}^{25} SF_{i,j}\, B_j(m)$$
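The spreading step can be sketched as a weighted sum over all bands. The definition of the spreading function is not reproduced in this text, so the sketch substitutes a widely used analytic form that depends only on the band number difference; that choice, and the function names, are assumptions rather than the patent's stated formula.

```python
import math

def spreading_db(delta):
    """Assumed spreading function in dB as a function of delta = i - j."""
    x = delta + 0.474
    return 15.81 + 7.5 * x - 17.5 * math.sqrt(1.0 + x * x)

def spread_power(band_power):
    """C_i = sum over j of SF(i - j) * B_j, with SF converted from dB to linear."""
    n = len(band_power)
    spread = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            total += 10.0 ** (spreading_db(i - j) / 10.0) * band_power[j]
        spread.append(total)
    return spread

# Energy in a single band leaks into its neighbours, more toward higher bands.
spread = spread_power([0.0, 0.0, 1.0, 0.0, 0.0])
```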

Step 130) calculating the masking coefficient of each frame of the sound digital signal from the noise coefficient and the spread power of that frame;

specifically, the masking coefficient $T_i$ of the $i$-th critical band is obtained from equation (4), in which $Q_i$ is calculated by the following formula:

$$Q_i = 14.5 + i - z \cdot (i + 9)$$

where $z$ is the noise coefficient;
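The offset formula can be checked numerically: with a noise coefficient of 1 it gives 5.5 for every band, and with a noise coefficient of 0 it gives 14.5 + i. The sketch below also shows one way the offset could be applied to the spread power, by treating it as a value in decibels and subtracting it on a logarithmic scale. That second function is an assumption, since equation (4) is not reproduced in this text, and both function names are hypothetical.

```python
import math

def masking_offset(i, z):
    """Q_i = 14.5 + i - z * (i + 9), with band number i and noise coefficient z."""
    return 14.5 + i - z * (i + 9)

def masking_coefficient(spread_power_i, i, z):
    """Assumed form: lower the spread power C_i by Q_i, read as decibels."""
    return 10.0 ** (math.log10(spread_power_i) - masking_offset(i, z) / 10.0)
```

Under this reading, a noise-like frame keeps its masking level only 5.5 dB below the spread power, while a tone-like frame is lowered by a larger, band-dependent amount.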

Step 140) calculating the masking threshold from the masking coefficient and the absolute hearing threshold of the human ear;

specifically, the masking threshold $T(m,i)$ is calculated by equation (5) from the masking coefficient $T_i$ and the absolute hearing threshold of the human ear, which is a known value.
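Equation (5) is not reproduced in this text. One common way to combine the two quantities, shown here purely as an assumption, is to take the larger of the masking coefficient and the absolute hearing threshold in each band, so that the threshold never drops below what the ear can detect in quiet. The function name is hypothetical.

```python
def masking_threshold(masking_coefficients, absolute_thresholds):
    """Per-band threshold: the larger of the masking coefficient and the hearing threshold."""
    return [max(t, a) for t, a in zip(masking_coefficients, absolute_thresholds)]
```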

Step 150) denoising each frame of the sound digital signal using the masking threshold to obtain the enhanced voice signal of each frame;

specifically, when noise reduction is performed based on the masking threshold $T(m,i)$ of each frequency band, a gain function $G(m,k)$ is calculated for every frame and every discrete frequency point of the sound digital signal from $D(m,k)$ and $T(m,k)$, where $D(m,k)$ is the noise magnitude spectrum and $T(m,k)$ is the masking threshold for discrete frequency point $k$ of the $m$-th frame; $T(m,k)$ is obtained from the frequency band to which the frequency point belongs.

The enhanced voice signal $X(m,k)$ is calculated according to the following equation (7):

$$X(m,k) = G(m,k)\,Y(m,k) \tag{7}$$

where $Y(m,k)$ is the magnitude spectrum of the noisy voice.
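The gain formula itself is not reproduced in this text, so the sketch below uses an assumed rule in the spirit of the masking idea: a bin is left untouched when the noise power is already at or below the masking threshold, and is otherwise attenuated just enough for the residual noise power to equal the threshold. The final multiplication follows equation (7). The function names are hypothetical, and the threshold is treated as a power quantity.

```python
import math

def gain(noise_mag, threshold_power):
    """Assumed gain rule for one frequency bin."""
    noise_power = noise_mag * noise_mag
    if noise_power <= threshold_power:
        return 1.0                      # noise is already masked, keep the bin
    return math.sqrt(threshold_power / noise_power)

def enhance_frame(noisy_mag, noise_mag, threshold_power):
    """X(m, k) = G(m, k) * Y(m, k) for every bin k of one frame."""
    return [gain(d, t) * y for y, d, t in zip(noisy_mag, noise_mag, threshold_power)]

# First bin: noise power 1 is below the threshold 4, so the bin is unchanged.
# Second bin: noise power 16 exceeds 4, so the bin is scaled by 0.5.
enhanced = enhance_frame([10.0, 10.0], [1.0, 4.0], [4.0, 4.0])
```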

The denoising process is based on the auditory masking effect: it reduces the noise in the sound digital signal without needing to eliminate the noise completely, which reduces voice distortion and improves listening comfort.

Step 160) calculating, frame by frame, the short-time energy, the short-time average zero-crossing rate and the noise coefficient of each frame of the enhanced voice signal;

Step 170) jointly judging whether the short-time energy, the short-time average zero-crossing rate and the noise coefficient $z(m)$ of each frame of the enhanced voice signal meet the preset double-threshold conditions;

specifically, endpoint detection is performed based on the short-time energy $e(m)$, the short-time average zero-crossing rate $c(m)$ and the noise coefficient $z(m)$ of the $m$-th frame:

1. A higher threshold and a lower threshold are set for each of $e(m)$, $c(m)$ and $z(m)$.

2. If the $m$-th frame satisfies the threshold conditions for a starting point, the $m$-th frame is judged to be the starting point of an audio segment.

3. Using the lower thresholds, the search proceeds onward from frame $m$. As long as the short-time energy of frame $m+n$ is higher than its threshold, the short-time average zero-crossing rate is higher than its threshold and the noise coefficient is lower than its threshold, the search continues. When a frame is reached whose short-time energy $e(m)$ is lower than or equal to its threshold, whose short-time average zero-crossing rate $c(m)$ is lower than or equal to its threshold and whose noise coefficient $z(m)$ is higher than or equal to its threshold, that frame is determined to be the end point of the audio segment.
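The three-parameter endpoint search can be sketched as below. The exact threshold symbols and inequalities are not reproduced in this text, so the sketch groups the thresholds by role, a stricter `start` set and a looser `stop` set, and assumes that a start requires high energy, a high zero-crossing rate and a low noise coefficient. The function names and dictionary keys are hypothetical.

```python
def short_time_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(x * x for x in frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def detect_segments(e, c, z, start, stop):
    """Return (start_frame, end_frame) pairs from per-frame e, c and z values."""
    segments = []
    n_frames = len(e)
    m = 0
    while m < n_frames:
        is_start = e[m] > start["e"] and c[m] > start["c"] and z[m] < start["z"]
        if not is_start:
            m += 1
            continue
        end = m + 1
        # Search onward until all three stop conditions hold at the same frame.
        while end < n_frames and not (
            e[end] <= stop["e"] and c[end] <= stop["c"] and z[end] >= stop["z"]
        ):
            end += 1
        end = min(end, n_frames - 1)    # the signal may end inside a segment
        segments.append((m, end))
        m = end + 1
    return segments
```

A segment that is still open when the signal runs out is closed at the last frame, which is a design choice of this sketch rather than something the text specifies.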

Step 180) determining the starting point and the end point of each audio segment in the enhanced voice signal according to the judgment result, to obtain a plurality of independent voice segments;

Step 190) extracting the average Mel-frequency cepstral coefficient (MFCC) features and the average LogFBank features of all voice frames in each voice segment to form a voice segment feature vector;

specifically, a plurality of independent audio segments are obtained from the endpoint detection result. For each independent audio segment, the average MFCC feature $V_{MFCC}$ and the average LogFBank feature $V_{LogF}$ of all its voice frames are extracted, and the two features are concatenated in order to form the voice segment feature vector $V = [V_{MFCC}, V_{LogF}]$.
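Building the segment feature vector amounts to averaging each feature over the frames of a segment and concatenating the two averages. The sketch assumes the per-frame MFCC and LogFBank vectors have already been computed elsewhere; the function names are hypothetical.

```python
def mean_vector(frames):
    """Element-wise average of a list of equal-length feature vectors."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]

def segment_feature(mfcc_frames, logfbank_frames):
    """V = [V_MFCC, V_LogF]: averaged MFCC followed by averaged LogFBank."""
    return mean_vector(mfcc_frames) + mean_vector(logfbank_frames)

# Two frames with 2-dimensional MFCC and 1-dimensional LogFBank toy features.
feature = segment_feature([[1.0, 2.0], [3.0, 4.0]], [[10.0], [20.0]])
```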

Step 200) calculating the cosine distance between the feature vectors of every two adjacent voice segments, and merging the two voice segments when the cosine distance reaches a preset threshold, to finally obtain a plurality of audio summaries.

The invention uses the cosine distance to judge relevance: when the cosine distance between the feature vectors of two adjacent voice segments is greater than 0.5, the two adjacent voice segments are strongly related and are merged;

when the cosine distance between the feature vectors of two adjacent voice segments is less than or equal to 0.5, the two adjacent voice segments are unrelated, and the traversal continues, judging whether the cosine distance between the feature vectors of each remaining pair of adjacent voice segments reaches the preset threshold, until all pairs have been examined;

after all the voice frames have been processed, the merged voice segments are collated to obtain the final audio summaries, i.e., the effective animal vocalizations.
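The merging rule can be sketched as a single pass over adjacent segments. Because larger values are treated as more related, the "cosine distance" is interpreted here as the cosine of the angle between the two feature vectors, which is an assumption about the intended measure. The function names are hypothetical; the 0.5 threshold comes from the text.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def merge_adjacent(segments, features, threshold=0.5):
    """Merge each segment into its predecessor when their features are related."""
    if not segments:
        return []
    merged = [segments[0]]
    for k in range(1, len(segments)):
        if cosine(features[k - 1], features[k]) > threshold:
            # Extend the previous (possibly already merged) segment to the new end.
            merged[-1] = (merged[-1][0], segments[k][1])
        else:
            merged.append(segments[k])
    return merged
```

This sketch compares the original feature vectors of each adjacent pair and does not recompute a feature vector for a merged segment; the text does not say which of the two the method uses.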

The invention provides an endpoint detection scheme that sets double thresholds for each of three parameters, which can improve the accuracy of endpoint detection. It also forms voice segment feature vectors from the average MFCC and LogFBank features, calculates the cosine distance between the feature vectors of adjacent voice segments and judges their relevance, thereby further eliminating redundant voice segments and yielding audio summaries with purer voice content that are much more convenient for subsequent research and use.

In addition, human subjective perception is the final criterion for evaluating a noise reduction effect. Some traditional noise reduction methods reduce noise according to a fixed criterion (such as the minimum mean square error criterion). In practice, however, a minimum mean square error does not necessarily mean that the noise perceived by the human ear is minimal, because a person's subjective perception of sound results from a combination of physiological and psychological factors. The noise reduction method based on the auditory masking effect does not need to eliminate the noise completely; it only needs to ensure that the residual noise is not perceived, which reduces voice distortion and improves listening comfort.

Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and are not intended to be limiting. Although the present invention has been described in detail with reference to the embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the spirit and scope of the invention as defined in the appended claims.
