Voice processing method and device

Document No.: 972947 Publication date: 2020-11-03

Reading note: This technology, "Voice processing method and device" (一种语音处理方法及装置), was created by Li Jian (李健), Shen Chen (沈忱), Wang Yuhao (王玉好) and Liang Zhiting (梁志婷) on 2020-08-04. The invention provides a voice processing method and device. The method comprises: acquiring multi-channel voice data collected by a microphone array, wherein the microphone array comprises a plurality of microphones and the voice data collected by each microphone carries a microphone identifier; determining the sound intensity of the multi-channel voice data; and performing voice separation according to the sound intensity of the multi-channel voice data and the microphone identifiers carried by the multi-channel voice data. This can solve the problem in the related art that voice cannot be correctly separated in scenes with complex environmental sound: by using multiple directional microphone arrays, the speakers' voices are separated in a moderately noisy environment.

1. A method of speech processing, comprising:

acquiring multi-channel voice data collected by a microphone array, wherein the microphone array comprises a plurality of microphones, and the voice data collected by each microphone carries a microphone identifier;

determining the sound intensity of the multi-channel voice data;

and performing voice separation according to the sound intensity of the multi-channel voice data and the microphone identifiers carried by the multi-channel voice data.

2. The method of claim 1, wherein performing voice separation according to the sound intensity of the multi-channel voice data and the microphone identifiers carried by the multi-channel voice data comprises:

determining the difference in sound intensity between every two channels of voice data in the multi-channel voice data;

and merging the audio tracks of the two channels of voice data whose difference in sound intensity is smaller than a preset threshold, to obtain a merged target audio track.

3. The method of claim 1, wherein performing voice separation according to the sound intensity of the multi-channel voice data and the microphone identifiers carried by the multi-channel voice data comprises:

performing speech-to-text conversion on the multi-channel voice data to obtain a plurality of voice texts;

acquiring the two voice texts with the largest number of characters among the plurality of voice texts;

merging the two voice texts to obtain a merged voice text;

and converting the merged voice text into a merged target audio track.

4. The method of claim 2 or 3, wherein after performing voice separation according to the sound intensity of the multi-channel voice data and the microphone identifiers carried by the multi-channel voice data, the method further comprises:

performing speech-to-text conversion on the target audio track to obtain a target voice text.

5. The method of claim 4, wherein after performing voice separation according to the sound intensity of the multi-channel voice data and the microphone identifiers carried by the multi-channel voice data, the method further comprises:

performing voiceprint recognition on the target audio track to obtain audio data of a plurality of target objects;

and associating the audio data of the plurality of target objects with the target voice text to obtain the audio data and the voice text of each target object.

6. The method of claim 2 or 3, wherein after performing voice separation according to the sound intensity of the multi-channel voice data and the microphone identifiers carried by the multi-channel voice data, the method further comprises:

determining the positions corresponding to the two microphone identifiers associated with the target audio track according to a pre-stored correspondence between microphone identifiers and position information;

determining the position of the target object corresponding to the target audio track according to the positions corresponding to the two microphone identifiers;

and driving a video acquisition device to focus on the position of the target object.

7. A speech processing apparatus, comprising:

an acquisition module, configured to acquire multi-channel voice data collected by a microphone array, wherein the microphone array comprises a plurality of microphones, and the voice data collected by each microphone carries a microphone identifier;

a determining module, configured to determine the sound intensity of the multi-channel voice data;

and a separation module, configured to perform voice separation according to the sound intensity of the multi-channel voice data and the microphone identifiers carried by the multi-channel voice data.

8. The apparatus of claim 7, wherein the separation module comprises:

a determining unit, configured to determine the difference in sound intensity between every two channels of voice data in the multi-channel voice data;

and a merging unit, configured to merge the audio tracks of the two channels of voice data whose difference in sound intensity is smaller than a preset threshold, to obtain a merged target audio track.

9. A computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to carry out the method of any one of claims 1 to 6 when executed.

10. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, and wherein the processor is arranged to execute the computer program to perform the method of any of claims 1 to 6.

Technical Field

The invention relates to the field of data processing, in particular to a voice processing method and device.

Background

Currently, recorders on the market that require voice separation are mostly used in quiet environments (e.g., inside a car) or in environments with regular background sound (e.g., while a television is playing). Their microphones are arranged in a one-dimensional or two-dimensional horizontal layout, and 2 to 6 microphones (MICs) are used to determine the direction and type of a sound (voice or noise) from the propagation speed of sound, so as to separate the voices (audio tracks) of people in different directions. In a complex environment such as a service site, where the background sound changes, this approach may fail to separate the human voice correctly, leaving it mixed with noise and environmental sound.

No solution has yet been proposed for the problem in the related art that voice cannot be correctly separated in scenes with complex environmental sound.

Disclosure of Invention

Embodiments of the invention provide a voice processing method and a voice processing device, which at least solve the problem in the related art that voice cannot be correctly separated in scenes with complex environmental sound.

According to an embodiment of the present invention, there is provided a speech processing method including:

acquiring multi-channel voice data collected by a microphone array, wherein the microphone array comprises a plurality of microphones, and the voice data collected by each microphone carries a microphone identifier;

determining the sound intensity of the multi-channel voice data;

and performing voice separation according to the sound intensity of the multi-channel voice data and the microphone identifiers carried by the multi-channel voice data.
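
The first two steps can be sketched as follows. The text does not say how sound intensity is measured, so this sketch assumes an RMS level in dB relative to full scale; the function names, the dictionary keyed by microphone identifier, and the sample values are all hypothetical.

```python
import math

def rms_dbfs(samples):
    """Return the RMS level of one channel in dB relative to full scale (1.0).

    Assumption: samples are floats in [-1.0, 1.0]. The source does not
    define the intensity measure; RMS in dB is one common choice.
    """
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")
    return 10.0 * math.log10(mean_square)

def channel_intensities(channels):
    """Map each microphone identifier to the intensity of its channel."""
    return {mic_id: rms_dbfs(samples) for mic_id, samples in channels.items()}

# Hypothetical input: each channel is tagged with its microphone identifier.
channels = {
    "mic-1": [0.5, -0.5, 0.5, -0.5],
    "mic-2": [0.05, -0.05, 0.05, -0.05],
}
intensities = channel_intensities(channels)
```

Keying every channel by its microphone identifier keeps the identifier attached to the intensity value, which the later separation step relies on.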

Optionally, performing voice separation according to the sound intensity of the multi-channel voice data and the microphone identifiers carried by the multi-channel voice data includes:

determining the difference in sound intensity between every two channels of voice data in the multi-channel voice data;

and merging the audio tracks of the two channels of voice data whose difference in sound intensity is smaller than a preset threshold, to obtain a merged target audio track.
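
A minimal sketch of this pairwise comparison is given below. It assumes that intensities are already available per microphone identifier, in dB, and that "merging" two tracks means averaging them sample by sample; the source specifies neither the unit nor the merge operation, and the threshold value and all names are hypothetical.

```python
from itertools import combinations

def merge_close_pairs(channels, intensities, threshold_db):
    """Merge every pair of channels whose intensity difference is below the threshold.

    Returns a list of ((mic_a, mic_b), merged_track) tuples so that the
    microphone identifiers stay attached to each merged target track.
    """
    merged = []
    for mic_a, mic_b in combinations(sorted(channels), 2):
        difference = abs(intensities[mic_a] - intensities[mic_b])
        if difference < threshold_db:
            # Assumption: merge by sample-wise averaging of the two tracks.
            track = [(a + b) / 2.0
                     for a, b in zip(channels[mic_a], channels[mic_b])]
            merged.append(((mic_a, mic_b), track))
    return merged

# Hypothetical channels and precomputed intensities (dB).
channels = {
    "mic-1": [0.5, -0.5],
    "mic-2": [0.25, -0.25],
    "mic-3": [0.0, 0.0],
}
intensities = {"mic-1": -20.0, "mic-2": -21.5, "mic-3": -40.0}

target_tracks = merge_close_pairs(channels, intensities, threshold_db=3.0)
```

In this example only mic-1 and mic-2 differ by less than the threshold, so a single target track is produced and it remains labelled with both microphone identifiers.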

Optionally, performing voice separation according to the sound intensity of the multi-channel voice data and the microphone identifiers carried by the multi-channel voice data includes:

performing speech-to-text conversion on the multi-channel voice data to obtain a plurality of voice texts;

acquiring the two voice texts with the largest number of characters among the plurality of voice texts;

merging the two voice texts to obtain a merged voice text;

and converting the merged voice text into a merged target audio track.
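
The selection and merging steps of this text-based variant can be sketched as below. Speech-to-text and text-to-audio conversion are outside the sketch: the voice texts are taken as given, and the final conversion back to an audio track is not shown. Joining the two texts with a newline is an assumption, since the source does not say how they are merged; all names and sample texts are hypothetical.

```python
def two_longest_texts(texts):
    """Return the two (mic_id, text) pairs with the most characters.

    `texts` maps a microphone identifier to the voice text recognised
    from that microphone's channel. Ties keep their original order.
    """
    ranked = sorted(texts.items(), key=lambda item: len(item[1]), reverse=True)
    return ranked[:2]

def merge_texts(selected):
    """Merge the selected voice texts. Assumption: join them with a newline."""
    return "\n".join(text for _, text in selected)

# Hypothetical voice texts, one per microphone channel.
texts = {
    "mic-1": "hello",
    "mic-2": "hello, how can I help you",
    "mic-3": "hello, how can",
}
selected = two_longest_texts(texts)
merged_text = merge_texts(selected)
```

The channels closest to the speaker are likely to yield the most complete recognition results, which is presumably why character count serves as the selection criterion here.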

Optionally, after performing voice separation according to the sound intensity of the multi-channel voice data and the microphone identifiers carried by the multi-channel voice data, the method further includes:

performing speech-to-text conversion on the target audio track to obtain a target voice text.

Optionally, after performing voice separation according to the sound intensity of the multi-channel voice data and the microphone identifiers carried by the multi-channel voice data, the method further includes:

performing voiceprint recognition on the target audio track to obtain audio data of a plurality of target objects;

and associating the audio data of the plurality of target objects with the target voice text to obtain the audio data and the voice text of each target object.

Optionally, after performing voice separation according to the sound intensity of the multi-channel voice data and the microphone identifiers carried by the multi-channel voice data, the method further includes:

determining the positions corresponding to the two microphone identifiers associated with the target audio track according to a pre-stored correspondence between microphone identifiers and position information;

determining the position of the target object corresponding to the target audio track according to the positions corresponding to the two microphone identifiers;

and driving a video acquisition device to focus on the position of the target object.
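
The position lookup can be sketched as follows. The source does not state how the target position is derived from the two microphone positions, so this sketch assumes the midpoint between them in a 2D room coordinate system. The position table, the coordinates, and the pan-angle helper are hypothetical; real camera control would go through the device's own interface, which is not modelled here.

```python
import math

# Hypothetical pre-stored correspondence: microphone identifier -> (x, y) in metres.
MIC_POSITIONS = {
    "mic-1": (0.0, 0.0),
    "mic-2": (2.0, 0.0),
    "mic-3": (2.0, 4.0),
}

def target_position(mic_pair, positions):
    """Estimate the target position from the two microphones of a target track.

    Assumption: the speaker is taken to be at the midpoint of the two
    microphone positions.
    """
    (xa, ya), (xb, yb) = positions[mic_pair[0]], positions[mic_pair[1]]
    return ((xa + xb) / 2.0, (ya + yb) / 2.0)

def pan_angle_degrees(camera_xy, target_xy):
    """Angle (degrees, counter-clockwise from the x axis) from camera to target."""
    dx = target_xy[0] - camera_xy[0]
    dy = target_xy[1] - camera_xy[1]
    return math.degrees(math.atan2(dy, dx))

position = target_position(("mic-2", "mic-3"), MIC_POSITIONS)
angle = pan_angle_degrees((0.0, 0.0), position)
```

The resulting angle is what a controller would pass to the video acquisition device in order to point it at the estimated speaker position.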

According to another embodiment of the present invention, there is also provided a speech processing apparatus including:

an acquisition module, configured to acquire multi-channel voice data collected by a microphone array, wherein the microphone array comprises a plurality of microphones, and the voice data collected by each microphone carries a microphone identifier;

a determining module, configured to determine the sound intensity of the multi-channel voice data;

and a separation module, configured to perform voice separation according to the sound intensity of the multi-channel voice data and the microphone identifiers carried by the multi-channel voice data.

Optionally, the separation module comprises:

a determining unit, configured to determine the difference in sound intensity between every two channels of voice data in the multi-channel voice data;

and a first merging unit, configured to merge the audio tracks of the two channels of voice data whose difference in sound intensity is smaller than a preset threshold, to obtain a merged target audio track.

Optionally, the separation module comprises:

a text conversion unit, configured to perform speech-to-text conversion on the multi-channel voice data to obtain a plurality of voice texts;

an acquiring unit, configured to acquire the two voice texts with the largest number of characters among the plurality of voice texts;

a second merging unit, configured to merge the two voice texts to obtain a merged voice text;

and a conversion unit, configured to convert the merged voice text into a merged target audio track.

Optionally, the apparatus further comprises:

a text conversion module, configured to perform speech-to-text conversion on the target audio track to obtain a target voice text.

Optionally, the apparatus further comprises:

a voiceprint recognition module, configured to perform voiceprint recognition on the target audio track to obtain audio data of a plurality of target objects;

and an association module, configured to associate the audio data of the plurality of target objects with the target voice text to obtain the audio data and the voice text of each target object.

Optionally, the apparatus further comprises:

a determining module, configured to determine the positions corresponding to the two microphone identifiers associated with the target audio track according to a pre-stored correspondence between microphone identifiers and position information;

a determining module, configured to determine the position of the target object corresponding to the target audio track according to the positions corresponding to the two microphone identifiers;

and a focusing module, configured to drive a video acquisition device to focus on the position of the target object.

According to a further embodiment of the present invention, a computer-readable storage medium is also provided, in which a computer program is stored, wherein the computer program is configured to perform the steps of any of the above-described method embodiments when executed.

According to yet another embodiment of the present invention, there is also provided an electronic device, including a memory in which a computer program is stored and a processor configured to execute the computer program to perform the steps in any of the above method embodiments.

According to the invention, multi-channel voice data collected by a microphone array is acquired, wherein the microphone array comprises a plurality of microphones and the voice data collected by each microphone carries a microphone identifier; the sound intensity of the multi-channel voice data is determined; and voice separation is performed according to the sound intensity of the multi-channel voice data and the microphone identifiers carried by the multi-channel voice data. This can solve the problem in the related art that voice cannot be correctly separated in scenes with complex environmental sound: by using multiple directional microphone arrays, the speakers' voices are separated in a moderately noisy environment.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:

FIG. 1 is a block diagram of the hardware configuration of a mobile terminal for a voice processing method according to an embodiment of the present invention;

FIG. 2 is a flow diagram of a voice processing method according to an embodiment of the present invention;

FIG. 3 is a block diagram of a voice processing apparatus according to an embodiment of the present invention;

FIG. 4 is a first block diagram of a voice processing apparatus according to a preferred embodiment of the present invention;

FIG. 5 is a second block diagram of a voice processing apparatus according to a preferred embodiment of the present invention.

Detailed Description

The invention will be described in detail hereinafter with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.

It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
