Audio signal processing method and device and storage medium

Document No.: 1339713 Publication date: 2020-07-17

Reading note: This technology, "Audio signal processing method and device and storage medium", was designed by Hou Haining (侯海宁) on 2020-03-06. Its main content is as follows. The present disclosure relates to an audio signal processing method and apparatus, and a storage medium. The method includes: acquiring, by at least two microphones, audio signals emitted by at least two sound sources respectively, to obtain an original noisy signal for each of the at least two microphones; for each frame in the time domain, obtaining a frequency domain estimation signal for each of the at least two sound sources according to the original noisy signals of the at least two microphones; dividing a predetermined frequency band range into a plurality of frequency sub-bands, wherein each frequency sub-band contains a plurality of frequency points and any two adjacent frequency sub-bands have an overlapping frequency band; determining a weighting coefficient for each frequency point contained in each frequency sub-band according to the frequency domain estimation signals of the frequency points within that sub-band; determining a separation matrix for each frequency point according to the weighting coefficients; and obtaining the audio signals emitted by each of the at least two sound sources based on the separation matrices and the original noisy signals. The technical solution of the embodiments of the present disclosure can reduce speech damage and improve speech signal quality.

1. An audio signal processing method, comprising:

acquiring audio signals emitted by at least two sound sources respectively by at least two microphones to obtain original noisy signals of the at least two microphones respectively;

for each frame in the time domain, acquiring respective frequency domain estimation signals of the at least two sound sources according to the respective original noisy signals of the at least two microphones;

dividing a preset frequency band range into a plurality of frequency sub-bands, wherein each frequency sub-band comprises a plurality of frequency points, and any two adjacent frequency sub-bands have overlapping frequency bands;

determining the weighting coefficient of each frequency point contained in each frequency sub-band according to the frequency domain estimation signal of each frequency point in each frequency sub-band;

determining a separation matrix of each frequency point according to the weighting coefficient;

and obtaining audio signals sent by at least two sound sources respectively based on the separation matrix and the original noisy signals.
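
The final step of claim 1 can be illustrated with a minimal sketch. It assumes the common frequency-domain demixing form Y(k) = W(k) X(k), which the claim itself does not spell out; the function name `separate` and the nested-list data layout are hypothetical. The result is the per-source estimate at each frequency point of one frame; converting it back to a time-domain signal is not shown.

```python
def separate(separation_matrices, noisy_spectra):
    """Apply the assumed demixing Y(k) = W(k) X(k) at every frequency point.

    separation_matrices: one matrix per frequency point, given as a list of
        rows (one row per sound source, one column per microphone).
    noisy_spectra: one vector per frequency point holding the microphones'
        frequency-domain observations for a single frame.
    """
    estimates = []
    for w_k, x_k in zip(separation_matrices, noisy_spectra):
        # Each output entry is the dot product of one matrix row with X(k).
        y_k = [sum(w * x for w, x in zip(row, x_k)) for row in w_k]
        estimates.append(y_k)
    return estimates


# Two microphones, two frequency points: identity at the first point,
# a channel swap at the second.
W = [[[1, 0], [0, 1]], [[0, 1], [1, 0]]]
X = [[1 + 2j, 3j], [2, 5j]]
Y = separate(W, X)
```

With these inputs the first frequency point is passed through unchanged and the two channels are exchanged at the second.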

2. The method according to claim 1, wherein said determining weighting coefficients of frequency points included in each of the frequency subbands according to the frequency domain estimation signals of the frequency points in the frequency subband comprises:

determining a distribution function of the frequency domain estimation signals according to the frequency domain estimation signals of each frequency point in each frequency sub-band;

and determining the weighting coefficient of each frequency point according to the distribution function.

3. The method according to claim 2, wherein said determining a distribution function of the frequency domain estimation signals according to the frequency domain estimation signals of each frequency point in each frequency sub-band comprises:

determining the square of the ratio of the frequency domain estimation signal of each frequency point in each frequency sub-band to the standard deviation;

summing the squares of the ratios of the frequency points in each frequency sub-band to determine a first sum;

obtaining the square sum of the first sum corresponding to each frequency sub-band to obtain a second sum;

and determining the distribution function according to the exponential function with the second sum as a variable.
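
One way to write the steps of claim 3 as formulas is sketched below. The symbols are assumptions introduced for illustration: Y(k) is the frequency domain estimation signal at frequency point k, \sigma_c is the standard deviation used for sub-band c, l_c and h_c are the first and last frequency points of sub-band c, and C is the number of sub-bands. The phrase "the square sum of the first sum" is read literally here as the sum of the squared first sums, and the negative sign in the exponent is also an assumption.

```latex
% First sum: squared ratios summed within sub-band c
S_c = \sum_{k=l_c}^{h_c} \left| \frac{Y(k)}{\sigma_c} \right|^{2}

% Second sum: squares of the first sums, summed over all sub-bands
T = \sum_{c=1}^{C} S_c^{2}

% Distribution function: exponential with the second sum as its variable
p(Y) \propto \exp(-T)
```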

4. The method according to claim 2, wherein said determining a distribution function of the frequency domain estimation signals according to the frequency domain estimation signals of each frequency point in each frequency sub-band comprises:

determining the square of the ratio between the frequency domain estimation signal and the standard deviation of each frequency point in each frequency sub-band;

summing the squares of the ratios of the frequency points in each frequency sub-band to determine a third sum;

determining a fourth sum according to a predetermined power of the third sum corresponding to each frequency sub-band;

and determining the distribution function according to the exponential function with the fourth sum as a variable.
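
The steps of claim 4 can be sketched as follows. The function name `distribution_value`, the use of one standard deviation per sub-band, the inclusive index pairs, and the negative sign in the exponent are all assumptions made for illustration; the claim fixes none of them, and the normalising constant is omitted.

```python
import math


def distribution_value(estimates, bands, sigmas, power):
    """Unnormalised distribution value for one source in one frame.

    estimates: complex frequency domain estimates, one per frequency point.
    bands: inclusive (first, last) index pairs, one per frequency sub-band.
    sigmas: one positive standard deviation per sub-band (an assumption).
    power: the predetermined power applied to each sub-band sum.
    """
    fourth_sum = 0.0
    for (first, last), sigma in zip(bands, sigmas):
        # Third sum: squared ratio |Y(k) / sigma|^2 over the sub-band.
        third_sum = sum(
            abs(estimates[k] / sigma) ** 2 for k in range(first, last + 1)
        )
        # Fourth sum: accumulate the chosen power of each sub-band sum.
        fourth_sum += third_sum ** power
    # Assumed sign: the value decays as the sub-band energies grow.
    return math.exp(-fourth_sum)


# Two overlapping sub-bands sharing frequency point 1.
value = distribution_value([1 + 0j, 0j, 1j], [(0, 1), (1, 2)], [1.0, 1.0], 0.5)
```

Setting `power=2` gives the literal reading of claim 3, where each sub-band sum is squared before the sums are added.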

5. The method according to any one of claims 1 to 4, wherein the dividing the predetermined frequency band range into a plurality of frequency sub-bands comprises:

dividing a predetermined frequency band range into C frequency sub-bands, wherein C is an integer greater than 1;

wherein any two adjacent frequency sub-bands having overlapping frequency bands comprises:

the first frequency point of the c-th frequency sub-band is smaller than the last frequency point of the (c-1)-th frequency sub-band, wherein c is greater than or equal to 2 and less than or equal to C.
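
The overlap condition of claim 5 can be illustrated with a small sketch. The function name `overlapping_subbands`, the 0-based inclusive indexing, the uniform band spacing, and the `overlap` parameter (the number of frequency points shared by neighbouring sub-bands) are hypothetical choices; the claim only requires that each sub-band start before the previous one ends.

```python
import math


def overlapping_subbands(num_points, num_bands, overlap=2):
    """Split frequency points 0..num_points-1 into overlapping sub-bands.

    Returns inclusive (first, last) index pairs such that the first point
    of sub-band c is smaller than the last point of sub-band c-1.
    """
    if num_bands < 2:
        raise ValueError("C must be an integer greater than 1")
    if overlap < 2 or overlap >= num_points:
        raise ValueError("overlap must be at least 2 and below num_points")
    # Distance between the first points of neighbouring sub-bands.
    step = math.ceil((num_points - overlap) / num_bands)
    bands = []
    for c in range(num_bands):
        first = c * step
        last = min(first + step + overlap - 1, num_points - 1)
        bands.append((first, last))
    # Verify the claimed condition for every pair of neighbours.
    for c in range(1, num_bands):
        if not bands[c][0] < bands[c - 1][1]:
            raise ValueError("range too small for this many sub-bands")
    return bands


bands = overlapping_subbands(16, 3, overlap=2)
```

For 16 frequency points and C = 3, this yields sub-bands (0, 6), (5, 11) and (10, 15), so each sub-band begins one point before the previous one ends.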

6. An audio signal processing apparatus, comprising:

a first acquisition module, configured to acquire, by at least two microphones, audio signals emitted by at least two sound sources respectively, so as to obtain original noisy signals of the at least two microphones respectively;

a second obtaining module, configured to obtain, for each frame in a time domain, frequency domain estimation signals of the at least two sound sources according to the original noisy signals of the at least two microphones, respectively;

a dividing module, configured to divide a preset frequency band range into a plurality of frequency sub-bands, wherein each frequency sub-band comprises a plurality of frequency points, and any two adjacent frequency sub-bands have overlapping frequency bands;

a first determining module, configured to determine, according to the frequency domain estimation signal of each frequency point in each frequency sub-band, a weighting coefficient of each frequency point included in the frequency sub-band;

the second determining module is used for determining the separation matrix of each frequency point according to the weighting coefficient;

and the third acquisition module is used for acquiring audio signals sent by at least two sound sources respectively based on the separation matrix and the original noisy signals.

7. The apparatus of claim 6, wherein the first determining module comprises:

a first determining submodule, configured to determine a distribution function of the frequency domain estimation signal according to the frequency domain estimation signal of each frequency point in each frequency subband;

and the second determining submodule is used for determining the weighting coefficient of each frequency point according to the distribution function.

8. The apparatus of claim 7, wherein the first determining submodule is specifically configured to:

determining the square of the ratio of the frequency domain estimation signal of each frequency point in each frequency sub-band to the standard deviation;

summing the squares of the ratios of the frequency points in each frequency sub-band to determine a first sum;

obtaining the square sum of the first sum corresponding to each frequency sub-band to obtain a second sum;

and determining the distribution function according to the exponential function with the second sum as a variable.

9. The apparatus of claim 7, wherein the first determining submodule is specifically configured to:

determining the square of the ratio between the frequency domain estimation signal and the standard deviation of each frequency point in each frequency sub-band;

summing the squares of the ratios of the frequency points in each frequency sub-band to determine a third sum;

determining a fourth sum according to a predetermined power of the third sum corresponding to each frequency sub-band;

and determining the distribution function according to the exponential function with the fourth sum as a variable.

10. The apparatus according to any one of claims 6 to 9, wherein the dividing module is specifically configured to:

dividing a predetermined frequency band range into C frequency sub-bands, wherein C is an integer greater than 1;

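wherein any two adjacent frequency sub-bands having overlapping frequency bands comprises:

the first frequency point of the c-th frequency sub-band is smaller than the last frequency point of the (c-1)-th frequency sub-band, wherein c is greater than or equal to 2 and less than or equal to C.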

11. An apparatus for processing an audio signal, the apparatus comprising at least: a processor and a memory for storing executable instructions operable on the processor, wherein:

the processor is configured to execute the executable instructions, which when executed perform the steps of the method for processing an audio signal as provided in any of the preceding claims 1 to 5.

12. A non-transitory computer-readable storage medium, wherein computer-executable instructions are stored in the computer-readable storage medium, and when executed by a processor, implement the steps in the method for processing an audio signal as provided in any one of claims 1 to 5.

Technical Field

The present disclosure relates to the field of signal processing, and in particular, to a method and an apparatus for processing an audio signal, and a storage medium.

Background

Disclosure of Invention

The disclosure provides a method and a device for processing an audio signal and a storage medium.

According to a first aspect of embodiments of the present disclosure, there is provided an audio signal processing method, including:

acquiring audio signals emitted by at least two sound sources respectively by at least two microphones to obtain original noisy signals of the at least two microphones respectively;

for each frame in the time domain, acquiring respective frequency domain estimation signals of the at least two sound sources according to the respective original noisy signals of the at least two microphones;

dividing a preset frequency band range into a plurality of frequency sub-bands, wherein each frequency sub-band comprises a plurality of frequency points, and any two adjacent frequency sub-bands have overlapping frequency bands;

determining the weighting coefficient of each frequency point contained in each frequency sub-band according to the frequency domain estimation signal of each frequency point in each frequency sub-band;

determining a separation matrix of each frequency point according to the weighting coefficient;

and obtaining audio signals sent by at least two sound sources respectively based on the separation matrix and the original noisy signals.

In some embodiments, the determining, according to the frequency domain estimation signal of each frequency point in each frequency sub-band, a weighting coefficient of each frequency point included in the frequency sub-band includes:

determining a distribution function of the frequency domain estimation signals according to the frequency domain estimation signals of each frequency point in each frequency sub-band;

and determining the weighting coefficient of each frequency point according to the distribution function.

In some embodiments, the determining a distribution function of the frequency domain estimation signals according to the frequency domain estimation signals of each frequency point in each frequency sub-band includes: