Audio signal processing method and device and storage medium

Document No.: 1339713 Publication date: 2020-07-17

Reading note: This technology, "Audio signal processing method and device and storage medium", was designed by 侯海宁 on 2020-03-06. Its main content is as follows: The disclosure relates to an audio signal processing method and apparatus, and a storage medium. The method comprises: acquiring, by at least two microphones, audio signals emitted by at least two sound sources respectively, to obtain original noisy signals of the at least two microphones respectively; for each frame in the time domain, acquiring respective frequency domain estimation signals of the at least two sound sources according to the respective original noisy signals of the at least two microphones; dividing a preset frequency band range into a plurality of frequency sub-bands, wherein each frequency sub-band comprises a plurality of frequency points, and any two adjacent frequency sub-bands have overlapping frequency bands; determining the weighting coefficient of each frequency point contained in each frequency sub-band according to the frequency domain estimation signal of each frequency point in that frequency sub-band; determining a separation matrix of each frequency point according to the weighting coefficient; and obtaining the audio signals emitted by the at least two sound sources respectively based on the separation matrix and the original noisy signals. Through the technical scheme of the embodiments of the disclosure, damage to voice can be reduced and the quality of voice signals improved.

1. An audio signal processing method, comprising:

acquiring audio signals emitted by at least two sound sources respectively by at least two microphones to obtain original noisy signals of the at least two microphones respectively;

for each frame in the time domain, acquiring respective frequency domain estimation signals of the at least two sound sources according to the respective original noisy signals of the at least two microphones;

dividing a preset frequency band range into a plurality of frequency sub-bands, wherein each frequency sub-band comprises a plurality of frequency points, and any two adjacent frequency sub-bands have overlapping frequency bands;

determining the weighting coefficient of each frequency point contained in each frequency sub-band according to the frequency domain estimation signal of each frequency point in each frequency sub-band;

determining a separation matrix of each frequency point according to the weighting coefficient;

and obtaining audio signals sent by at least two sound sources respectively based on the separation matrix and the original noisy signals.

2. The method according to claim 1, wherein said determining weighting coefficients of frequency points included in each of the frequency subbands according to the frequency domain estimation signals of the frequency points in the frequency subband comprises:

determining a distribution function of the frequency domain estimation signals according to the frequency domain estimation signals of each frequency point in each frequency sub-band;

and determining the weighting coefficient of each frequency point according to the distribution function.

3. The method according to claim 2, wherein said determining a distribution function of the frequency domain estimation signals according to the frequency domain estimation signals of each frequency point in each frequency sub-band comprises:

determining the square of the ratio of the frequency domain estimation signal of each frequency point in each frequency sub-band to the standard deviation;

summing the squares of the ratios of the frequency points in each frequency sub-band to determine a first sum;

obtaining the square sum of the first sum corresponding to each frequency sub-band to obtain a second sum;

and determining the distribution function according to the exponential function with the second sum as a variable.

4. The method according to claim 2, wherein said determining a distribution function of the frequency domain estimation signals according to the frequency domain estimation signals of each frequency point in each frequency sub-band comprises:

determining the square of the ratio between the frequency domain estimation signal and the standard deviation of each frequency point in each frequency sub-band;

summing the squares of the ratios of the frequency points in each frequency sub-band to determine a third sum;

determining a fourth sum according to a predetermined power of the third sum corresponding to each frequency sub-band;

and determining the distribution function according to the exponential function with the fourth sum as a variable.

5. The method according to any one of claims 1 to 4, wherein the dividing the predetermined frequency band range into a plurality of frequency sub-bands comprises:

dividing a predetermined frequency band range into C frequency sub-bands, wherein C is an integer greater than 1;

the any two adjacent frequency sub-bands having overlapping frequency bands comprises:

the first frequency point of the c-th frequency sub-band is smaller than the last frequency point of the (c-1)-th frequency sub-band, wherein c is greater than or equal to 2 and less than or equal to C.

6. An audio signal processing apparatus, comprising:

a first acquisition module, configured to acquire, by at least two microphones, audio signals emitted by at least two sound sources respectively, so as to obtain original noisy signals of the at least two microphones respectively;

a second obtaining module, configured to obtain, for each frame in a time domain, frequency domain estimation signals of the at least two sound sources according to the original noisy signals of the at least two microphones, respectively;

a dividing module, configured to divide a preset frequency band range into a plurality of frequency sub-bands, wherein each frequency sub-band comprises a plurality of frequency points, and any two adjacent frequency sub-bands have overlapping frequency bands;

a first determining module, configured to determine, according to the frequency domain estimation signal of each frequency point in each frequency sub-band, a weighting coefficient of each frequency point included in the frequency sub-band;

the second determining module is used for determining the separation matrix of each frequency point according to the weighting coefficient;

and the third acquisition module is used for acquiring audio signals sent by at least two sound sources respectively based on the separation matrix and the original noisy signals.

7. The apparatus of claim 6, wherein the first determining module comprises:

a first determining submodule, configured to determine a distribution function of the frequency domain estimation signal according to the frequency domain estimation signal of each frequency point in each frequency subband;

and the second determining submodule is used for determining the weighting coefficient of each frequency point according to the distribution function.

8. The apparatus of claim 7, wherein the first determining submodule is specifically configured to:

determining the square of the ratio of the frequency domain estimation signal of each frequency point in each frequency sub-band to the standard deviation;

summing the squares of the ratios of the frequency points in each frequency sub-band to determine a first sum;

obtaining the square sum of the first sum corresponding to each frequency sub-band to obtain a second sum;

and determining the distribution function according to the exponential function with the second sum as a variable.

9. The apparatus of claim 7, wherein the first determining submodule is specifically configured to:

determining the square of the ratio between the frequency domain estimation signal and the standard deviation of each frequency point in each frequency sub-band;

summing the squares of the ratios of the frequency points in each frequency sub-band to determine a third sum;

determining a fourth sum according to a predetermined power of the third sum corresponding to each frequency sub-band;

and determining the distribution function according to the exponential function with the fourth sum as a variable.

10. The apparatus according to any one of claims 6 to 9, wherein the dividing module is specifically configured to:

dividing a predetermined frequency band range into C frequency sub-bands, wherein C is an integer greater than 1;

the any two adjacent frequency sub-bands having overlapping frequency bands comprises:

the first frequency point of the c-th frequency sub-band is smaller than the last frequency point of the (c-1)-th frequency sub-band, wherein c is greater than or equal to 2 and less than or equal to C.

11. An apparatus for processing an audio signal, the apparatus comprising at least: a processor and a memory for storing executable instructions operable on the processor, wherein:

the processor is configured to execute the executable instructions, which when executed perform the steps of the method for processing an audio signal as provided in any of the preceding claims 1 to 5.

12. A non-transitory computer-readable storage medium, wherein computer-executable instructions are stored in the computer-readable storage medium, and when executed by a processor, implement the steps in the method for processing an audio signal as provided in any one of claims 1 to 5.

Technical Field

The present disclosure relates to the field of signal processing, and in particular, to a method and an apparatus for processing an audio signal, and a storage medium.

Background

Disclosure of Invention

The disclosure provides a method and a device for processing an audio signal and a storage medium.

According to a first aspect of embodiments of the present disclosure, there is provided an audio signal processing method, including:

acquiring audio signals emitted by at least two sound sources respectively by at least two microphones to obtain original noisy signals of the at least two microphones respectively;

for each frame in the time domain, acquiring respective frequency domain estimation signals of the at least two sound sources according to the respective original noisy signals of the at least two microphones;

dividing a preset frequency band range into a plurality of frequency sub-bands, wherein each frequency sub-band comprises a plurality of frequency points, and any two adjacent frequency sub-bands have overlapping frequency bands;

determining the weighting coefficient of each frequency point contained in each frequency sub-band according to the frequency domain estimation signal of each frequency point in each frequency sub-band;

determining a separation matrix of each frequency point according to the weighting coefficient;

and obtaining audio signals sent by at least two sound sources respectively based on the separation matrix and the original noisy signals.

In some embodiments, the determining, according to the frequency domain estimation signal of each frequency point in each frequency sub-band, a weighting coefficient of each frequency point included in the frequency sub-band includes:

determining a distribution function of the frequency domain estimation signals according to the frequency domain estimation signals of each frequency point in each frequency sub-band;

and determining the weighting coefficient of each frequency point according to the distribution function.

In some embodiments, the determining a distribution function of the frequency domain estimation signals according to the frequency domain estimation signals of each frequency point in each frequency sub-band includes:

determining the square of the ratio of the frequency domain estimation signal of each frequency point in each frequency sub-band to the standard deviation;

summing the squares of the ratios of the frequency points in each frequency sub-band to determine a first sum;

obtaining the square sum of the first sum corresponding to each frequency sub-band to obtain a second sum;

and determining the distribution function according to the exponential function with the second sum as a variable.

In some embodiments, the determining a distribution function of the frequency domain estimation signals according to the frequency domain estimation signals of each frequency point in each frequency sub-band includes:

determining the square of the ratio between the frequency domain estimation signal and the standard deviation of each frequency point in each frequency sub-band;

summing the squares of the ratios of the frequency points in each frequency sub-band to determine a third sum;

determining a fourth sum according to a predetermined power of the third sum corresponding to each frequency sub-band;

and determining the distribution function according to the exponential function with the fourth sum as a variable.

In some embodiments, the dividing the predetermined frequency band range into a plurality of frequency sub-bands comprises:

dividing a predetermined frequency band range into C frequency sub-bands, wherein C is an integer greater than 1;

the any two adjacent frequency sub-bands having overlapping frequency bands comprises:

the first frequency point of the c-th frequency sub-band is smaller than the last frequency point of the (c-1)-th frequency sub-band, wherein c is greater than or equal to 2 and less than or equal to C.

According to a second aspect of the embodiments of the present disclosure, there is provided an audio signal processing apparatus including:

a first acquisition module, configured to acquire, by at least two microphones, audio signals emitted by at least two sound sources respectively, so as to obtain original noisy signals of the at least two microphones respectively;

a second obtaining module, configured to obtain, for each frame in a time domain, frequency domain estimation signals of the at least two sound sources according to the original noisy signals of the at least two microphones, respectively;

a dividing module, configured to divide a preset frequency band range into a plurality of frequency sub-bands, wherein each frequency sub-band comprises a plurality of frequency points, and any two adjacent frequency sub-bands have overlapping frequency bands;

a first determining module, configured to determine, according to the frequency domain estimation signal of each frequency point in each frequency sub-band, a weighting coefficient of each frequency point included in the frequency sub-band;

the second determining module is used for determining the separation matrix of each frequency point according to the weighting coefficient;

and the third acquisition module is used for acquiring audio signals sent by at least two sound sources respectively based on the separation matrix and the original noisy signals.

In some embodiments, the first determining module comprises:

a first determining submodule, configured to determine a distribution function of the frequency domain estimation signal according to the frequency domain estimation signal of each frequency point in each frequency subband;

and the second determining submodule is used for determining the weighting coefficient of each frequency point according to the distribution function.

In some embodiments, the first determining submodule is specifically configured to:

determining the square of the ratio of the frequency domain estimation signal of each frequency point in each frequency sub-band to the standard deviation;

summing the squares of the ratios of the frequency points in each frequency sub-band to determine a first sum;

obtaining the square sum of the first sum corresponding to each frequency sub-band to obtain a second sum;

and determining the distribution function according to the exponential function with the second sum as a variable.

In some embodiments, the first determining submodule is specifically configured to:

determining the square of the ratio between the frequency domain estimation signal and the standard deviation of each frequency point in each frequency sub-band;

summing the squares of the ratios of the frequency points in each frequency sub-band to determine a third sum;

determining a fourth sum according to a predetermined power of the third sum corresponding to each frequency sub-band;

and determining the distribution function according to the exponential function with the fourth sum as a variable.

In some embodiments, the dividing module is specifically configured to:

dividing a predetermined frequency band range into C frequency sub-bands, wherein C is an integer greater than 1;

the any two adjacent frequency sub-bands having overlapping frequency bands comprises:

the first frequency point of the c-th frequency sub-band is smaller than the last frequency point of the (c-1)-th frequency sub-band, wherein c is greater than or equal to 2 and less than or equal to C.

According to a third aspect of the embodiments of the present disclosure, there is provided an apparatus for processing an audio signal, the apparatus at least comprising: a processor and a memory for storing executable instructions operable on the processor, wherein:

the processor is configured to execute the executable instructions, and the executable instructions perform the steps of any one of the audio signal processing methods.

According to a fourth aspect of embodiments of the present disclosure, there is provided a non-transitory computer-readable storage medium having stored therein computer-executable instructions that, when executed by a processor, implement the steps in any of the methods of processing an audio signal described above.

The technical solutions provided by the embodiments of the present disclosure may have the following beneficial effects: the embodiments divide a frequency band into a plurality of frequency sub-bands such that adjacent frequency sub-bands share overlapping frequency points. The frequency points within a frequency sub-band differ little in frequency and have strong dependency, while frequency points in different frequency sub-bands have weaker dependency. Meanwhile, because adjacent frequency sub-bands share overlapping frequency points, dependency also exists between them, which gives the whole frequency band a chain structure.

Compared with the prior-art processing mode, which assumes that the same dependency exists among all frequency points, the separation matrix determined through the weighting coefficients has stronger separation performance, and the separated signals are closer to the actual situation, in which the dependency is weaker the farther apart the frequency points are and stronger the closer they are. This improves the accuracy of signal separation and reduces voice damage after separation.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.

FIG. 1 is a first flowchart illustrating an audio signal processing method according to an exemplary embodiment;

FIG. 2 is a second flowchart illustrating an audio signal processing method according to an exemplary embodiment;

FIG. 3 is a block diagram illustrating an application scenario of an audio signal processing method according to an exemplary embodiment;

FIG. 4 is a third flowchart illustrating an audio signal processing method according to an exemplary embodiment;

FIG. 5 is a block diagram illustrating a structure of an audio signal processing apparatus according to an exemplary embodiment;

FIG. 6 is a block diagram illustrating a physical structure of an audio signal processing apparatus according to an exemplary embodiment.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.

FIG. 1 is a flowchart illustrating an audio signal processing method according to an exemplary embodiment. As shown in FIG. 1, the method includes the following steps:

Step S101: acquiring, by at least two microphones, audio signals emitted by at least two sound sources respectively, to obtain original noisy signals of the at least two microphones respectively;

Step S102: for each frame in the time domain, acquiring respective frequency domain estimation signals of the at least two sound sources according to the respective original noisy signals of the at least two microphones;

Step S103: dividing a preset frequency band range into a plurality of frequency sub-bands, wherein each frequency sub-band comprises a plurality of frequency points, and any two adjacent frequency sub-bands have overlapping frequency bands;

Step S104: determining the weighting coefficient of each frequency point contained in each frequency sub-band according to the frequency domain estimation signal of each frequency point in that frequency sub-band;

Step S105: determining a separation matrix of each frequency point according to the weighting coefficient;

Step S106: obtaining the audio signals emitted by the at least two sound sources respectively based on the separation matrix and the original noisy signals.

The method of the embodiments of the present disclosure is applied to a terminal. Here, the terminal is an electronic device into which two or more microphones are integrated. For example, the terminal may be a vehicle-mounted terminal, a computer, a server, or the like. In an embodiment, the terminal may alternatively be an electronic device connected to a predetermined device in which two or more microphones are integrated; the electronic device receives the audio signal collected by the predetermined device over the connection and sends the processed audio signal back to the predetermined device over the same connection. For example, the predetermined device is a speaker or the like.

In practical application, the terminal includes at least two microphones, and the at least two microphones simultaneously detect audio signals emitted by at least two sound sources respectively, so as to obtain original noisy signals of the at least two microphones respectively. Here, it is understood that in the present embodiment, the at least two microphones detect the audio signals emitted by the two sound sources synchronously.

In the embodiment of the present disclosure, the number of the microphones is 2 or more, and the number of the sound sources is 2 or more.

In the embodiment of the present disclosure, the original noisy signal is a mixed signal comprising the sounds emitted by at least two sound sources. For example, the number of the microphones is 2, namely a microphone 1 and a microphone 2; the number of the sound sources is 2, namely a sound source 1 and a sound source 2; the original noisy signal of the microphone 1 is an audio signal comprising both the sound source 1 and the sound source 2; the original noisy signal of the microphone 2 is also an audio signal comprising both the sound source 1 and the sound source 2.

For example, the number of the microphones is 3, namely a microphone 1, a microphone 2 and a microphone 3; the number of the sound sources is 3, namely a sound source 1, a sound source 2 and a sound source 3; the original noisy signal of the microphone 1 is an audio signal comprising a sound source 1, a sound source 2 and a sound source 3; the original noisy signals of said microphone 2 and said microphone 3 are likewise audio signals each comprising a sound source 1, a sound source 2 and a sound source 3.

It will be appreciated that if the sound emitted by one sound source is the audio signal of interest in a corresponding microphone, the signals from the other sound sources in that microphone are noise signals. The disclosed embodiments need to recover the sounds emitted by the at least two sound sources from the signals of the at least two microphones.

It will be appreciated that the number of sound sources is generally the same as the number of microphones. If the number of microphones is smaller than the number of sound sources in some embodiments, the number of sound sources may be reduced to a dimension equal to the number of microphones.

It will be understood that when the microphones collect audio signals from the sound sources, audio signals of at least one audio frame may be collected, and the collected audio signals are the original noisy signals of each microphone. The original noisy signal may be either a time domain signal or a frequency domain signal. If the original noisy signal is a time domain signal, it can be converted into a frequency domain signal by a time-frequency transformation.

Here, the time domain signal may be transformed to the frequency domain based on the Fast Fourier Transform (FFT). Alternatively, the time domain signal may be transformed based on the short-time Fourier transform (STFT). Alternatively, the time domain signal may also be transformed based on other Fourier transforms.

For example, if the time domain signal of the p-th microphone in the n-th frame is $x_p^n(m)$, the time domain signal of the n-th frame is transformed into a frequency domain signal, and the original noisy signal of the n-th frame is determined as $X_p(k,n) = \mathrm{STFT}(x_p^n(m))$, where m is the discrete time point number of the n-th frame of the time domain signal and k is a frequency point. Thus, the present embodiment can obtain the original noisy signal of each frame through the time-domain to frequency-domain transformation. Of course, the original noisy signal of each frame may also be obtained based on other fast Fourier transform formulas, which is not limited herein.
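
By way of a non-limiting illustration, the framing and transformation described above might be sketched as follows. The function name `stft_frames`, the Hann window, the frame length of 512 and the hop of 256 are assumptions chosen for the example and are not specified by the disclosure.

```python
import numpy as np

def stft_frames(x, frame_len=512, hop=256):
    """Return X[k, n]: the spectrum of frame n at frequency point k.

    Window type, frame length and hop size are illustrative assumptions.
    """
    x = np.asarray(x, dtype=float)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[n * hop : n * hop + frame_len] * window
                       for n in range(n_frames)])
    # rfft keeps the frame_len // 2 + 1 non-negative frequency points
    return np.fft.rfft(frames, axis=1).T

fs = 16000
t = np.arange(fs) / fs
x_p = np.sin(2 * np.pi * 1000 * t)  # 1 kHz test tone standing in for one microphone signal
X_p = stft_frames(x_p)              # shape: (frequency points, frames)
```

With these assumed settings each frequency point spans 16000 / 512 = 31.25 Hz, so the test tone falls on frequency point 32.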

According to the original noisy signal of the frequency domain, an initial frequency domain estimation signal can be obtained in a priori estimation mode.

Illustratively, the original noisy signal may be separated according to an initialized separation matrix, such as an identity matrix, or according to the separation matrix obtained from the previous frame, to obtain the frequency domain estimation signal of each sound source in each frame. This provides a basis for subsequently separating the audio signals of the sound sources based on the frequency domain estimation signals and the separation matrix.
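
As a minimal sketch of this prior estimation, the separation matrix of each frequency point may be applied to the noisy spectrum at that frequency point. The array layout (frequency point first) and the name `separate_frame` are assumptions made for illustration only.

```python
import numpy as np

def separate_frame(W, X):
    """Apply one separation matrix per frequency point.

    W: (K, P, P) separation matrices, one per frequency point.
    X: (K, P) noisy spectra of one frame, one row per frequency point.
    Returns Y: (K, P) with Y[k] = W[k] @ X[k].
    """
    return np.einsum("kpq,kq->kp", W, X)

K, P = 4, 2  # hypothetical sizes: 4 frequency points, 2 microphones
W0 = np.tile(np.eye(P, dtype=complex), (K, 1, 1))  # identity initialisation
rng = np.random.default_rng(0)
X = rng.standard_normal((K, P)) + 1j * rng.standard_normal((K, P))
Y = separate_frame(W0, X)  # with identity matrices the estimate equals the input
```

For later frames, `W0` would be replaced by the separation matrices obtained from the previous frame.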

In the embodiment of the present disclosure, in order to ensure the dependency between frequency points close to each other, the whole frequency band may be divided into a plurality of frequency sub-bands. And, adjacent frequency subbands overlap each other, and are linked in order like a chain.

Therefore, the dependency among the frequency points in the frequency sub-bands can be ensured, and the dependency can be transmitted among different sub-bands in a chain manner due to the overlapping of the frequency points among the adjacent frequency sub-bands.
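
One possible way to build such a chain of overlapping frequency sub-bands is sketched below. The uniform band size and fixed overlap are assumptions for the example; the disclosure only requires that adjacent frequency sub-bands overlap.

```python
def overlapping_subbands(num_bins, band_size, overlap):
    """Split frequency points 0..num_bins-1 into ranges of up to band_size
    points, each sharing `overlap` points with the previous range."""
    if not 0 < overlap < band_size:
        raise ValueError("overlap must be in (0, band_size)")
    step = band_size - overlap
    bands = []
    start = 0
    while True:
        end = min(start + band_size, num_bins)
        bands.append(range(start, end))
        if end == num_bins:
            break
        start += step
    return bands

# Hypothetical sizes: 257 frequency points, 64 points per band, 16 shared points
bands = overlapping_subbands(num_bins=257, band_size=64, overlap=16)

# The first point of each band lies below the last point of the previous band
chained = all(bands[c][0] < bands[c - 1][-1] for c in range(1, len(bands)))
```

Here `chained` checks the overlap condition stated for the c-th and (c-1)-th frequency sub-bands.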

Compared with the prior-art processing mode, which assumes that the same dependency exists among all frequency points, the separation matrix determined through the weighting coefficients has stronger separation performance, and the separated signals are closer to the actual situation, in which the dependency is weaker the farther apart the frequency points are and stronger the closer they are. This improves the accuracy of signal separation and reduces voice damage after separation.

In some embodiments, as shown in fig. 2, in the step S104, the determining a weighting coefficient of each frequency point included in each frequency sub-band according to the frequency domain estimation signal of each frequency point in each frequency sub-band includes:

step S201, determining a distribution function of the frequency domain estimation signals according to the frequency domain estimation signals of each frequency point in each frequency sub-band;

step S202, determining the weighting coefficient of each frequency point according to the distribution function.

In the embodiment of the present disclosure, the separation matrix of each frequency point may be continuously updated based on the weighting coefficient of each frequency point in each frequency sub-band, the frequency domain estimation signal of each frame, and the like, so that the updated separation matrix of each frequency point has better separation performance, which further improves the accuracy of the separated audio signals.

Here, a distribution function of the frequency domain estimation signal can be constructed from the frequency domain estimation signal of each frequency bin in each frequency sub-band. In the process of constructing the distribution function, the distribution function may be constructed based on each frequency sub-band of the audio signal within the entire frequency band.

For example, the separation matrix may be determined based on eigenvalues solved from the covariance matrix. The covariance matrix $V_p(k,n)$ satisfies the relationship $V_p(k,n) = \beta V_p(k,n-1) + (1-\beta)\,\varphi_p(n)\,X_p(k,n)\,X_p^H(k,n)$, where $\beta$ is a smoothing coefficient, $V_p(k,n-1)$ is the updated covariance of the previous frame, $X_p(k,n)$ is the original noisy signal of the current frame, $X_p^H(k,n)$ is the conjugate transpose of the original noisy signal of the current frame, and $\varphi_p(n)$ is the weighting coefficient, in which $r_p(n)$ is an auxiliary variable and $G(\bar{Y}_p(n)) = -\log p(\bar{Y}_p(n))$ is referred to as the contrast function. Here, $p(\bar{Y}_p(n))$ represents, for the p-th sound source, a multi-dimensional super-Gaussian prior probability density distribution model based on the whole frequency band, i.e. the above distribution function. $\bar{Y}_p(n)$ is the vector representing the frequency domain estimation signal of the p-th sound source in the n-th frame, $Y_p(n)$ is the frequency domain estimation signal of the p-th sound source in the n-th frame, and $Y_p(k,n)$ represents the frequency domain estimation signal of the p-th sound source at the k-th frequency point of the n-th frame.
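
For illustration only, a recursive covariance update of this kind might be sketched as follows. The sketch assumes the common form in which the previous covariance is scaled by the smoothing coefficient and the weighted outer product of the current noisy signal is scaled by one minus that coefficient; the default value of `beta` and the name `update_covariance` are likewise assumptions.

```python
import numpy as np

def update_covariance(V_prev, X, phi, beta=0.98):
    """Smoothed, weighted covariance update at one frequency point.

    V_prev: (P, P) covariance from the previous frame.
    X:      (P,) noisy spectrum of the current frame across microphones.
    phi:    scalar weighting coefficient for this source and frame.
    beta:   smoothing coefficient (the default is an assumed value).
    """
    outer = np.outer(X, np.conj(X))  # X X^H
    return beta * V_prev + (1.0 - beta) * phi * outer

V = np.zeros((2, 2), dtype=complex)
X = np.array([1.0 + 1.0j, 2.0 - 1.0j])
V = update_covariance(V, X, phi=0.5, beta=0.9)
```

Because the outer product is Hermitian, the updated matrix stays Hermitian as long as the weighting coefficient is real.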

In the embodiment of the present disclosure, the distribution function described above may be constructed based on the frequency domain estimation signals in each frequency sub-band. Compared with the related technology, which considers the prior probability density of all frequency points in the whole frequency band, the weighting coefficient determined in this way only needs to consider the prior probability density of the corresponding frequency points in each frequency sub-band. On the one hand, this simplifies calculation; on the other hand, frequency points far away from each other in the whole frequency band need not be considered. That is to say, this processing mode takes into account that frequency points at different distances have different dependencies, and that the closer the frequency points, the stronger the dependency, which improves the separation performance of the separation matrix and facilitates the subsequent separation of high-quality audio signals based on the separation matrix.

In some embodiments, the determining a distribution function of the frequency domain estimation signals according to the frequency domain estimation signals of each frequency point in each frequency sub-band includes:

determining the square of the ratio of the frequency domain estimation signal of each frequency point in each frequency sub-band to the standard deviation;

summing the squares of the ratios of the frequency points in each frequency sub-band to determine a first sum;

obtaining the square sum of the first sum corresponding to each frequency sub-band to obtain a second sum;

and determining the distribution function according to the exponential function with the second sum as a variable.

The entire frequency band may be divided into L frequency sub-bands, where $C_l$ denotes the set of all frequency points contained in the l-th frequency sub-band. $C_l$ overlaps with both $C_{l+1}$ and $C_{l-1}$, which ensures that the entire frequency band forms a chain structure.

Based on this, the above distribution function can be defined according to the following formula (1):

In formula (1), k is the frequency point, $Y_p(k,n)$ is the frequency-domain estimation signal of the p-th sound source at frequency point k in the n-th frame, $\sigma^2$ is the variance, l is the frequency sub-band index, $\alpha$ is a coefficient, and $p(\bar{Y}_p(n))$ is the distribution function. Based on formula (1), the square of the ratio of the frequency-domain estimation signal to the standard deviation is calculated for every frequency point in each frequency sub-band, i.e. for every $k \in C_l$, and these squares are summed over the sub-band to give the first sum. The first sums of the frequency point sets are then squared and summed over $l = 1, \ldots, L$ to give the second sum, and the distribution function is obtained from an exponential function of the second sum.

In the embodiments of the present disclosure, the formula above is first evaluated over the frequency points in the frequency point set of each frequency sub-band, and then over the frequency sub-bands. In the prior art, by contrast, all frequency points in the whole frequency band are operated on directly (for example, with a single prior defined over the whole frequency band), which assumes that all frequency points share the same dependency. Compared with that approach, the dependency among frequency points within a frequency sub-band is strengthened and the dependency among frequency points in different frequency sub-bands is weakened. At the same time, because the frequency point sets of adjacent frequency sub-bands overlap, the dependency can propagate between sub-bands in a chained manner. The model therefore better matches the characteristics of real audio signals and improves the accuracy of signal separation.

In some embodiments, the determining a distribution function of the frequency domain estimation signals according to the frequency domain estimation signals of each frequency point in each frequency sub-band includes:

determining the square of the ratio between the frequency domain estimation signal and the standard deviation of each frequency point in each frequency sub-band;

summing the squares of the ratios of the frequency points in each frequency sub-band to determine a third sum;

determining a fourth sum according to a predetermined power of the third sum corresponding to each frequency sub-band;

and determining the distribution function according to the exponential function with the fourth sum as a variable.
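The four steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not the disclosure's implementation: the function name `subband_distribution`, the single shared standard deviation `sigma`, the coefficient `alpha`, and the negative sign in the exponent (chosen so that larger signal energy gives a lower density) are all assumptions. The power defaults to 2/3, the example value the disclosure gives for the predetermined power.

```python
import math

def subband_distribution(Y, subbands, sigma=1.0, alpha=1.0, power=2.0 / 3.0):
    """Evaluate exp(-alpha * fourth_sum) for one source in one frame.

    Y        : list of complex frequency-domain estimates, one per frequency point
    subbands : list of (first, last) index pairs, inclusive; neighbours may overlap
    sigma, alpha, power and the minus sign are illustrative assumptions.
    """
    fourth_sum = 0.0
    for first, last in subbands:
        # Third sum: squared ratio |Y(k)| / sigma, summed over the sub-band.
        third_sum = sum((abs(Y[k]) / sigma) ** 2 for k in range(first, last + 1))
        # Raise each sub-band's third sum to the predetermined power and accumulate.
        fourth_sum += third_sum ** power
    return math.exp(-alpha * fourth_sum)

Y = [1 + 0j, 0 + 1j, 0.5 + 0.5j, 0j]
bands = [(0, 2), (1, 3)]          # adjacent sub-bands share points 1 and 2
density = subband_distribution(Y, bands)
```

Because points 1 and 2 belong to both sub-bands, their energy contributes to two terms of the fourth sum, which is how the overlap couples neighbouring sub-bands in this sketch.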

The entire frequency band may be divided into L frequency sub-bands, where $C_l$ denotes the set of all frequency points contained in the l-th frequency sub-band. $C_l$ overlaps with both $C_{l+1}$ and $C_{l-1}$, which ensures that the entire frequency band forms a chain structure.

Based on this, the distribution function can also be defined according to the following formula (2):

In formula (2), k is the frequency point and $Y_p(k,n)$ is the frequency-domain estimation signal of the p-th sound source at frequency point k in the n-th frame. Based on formula (2), for the frequency points in each frequency sub-band, the ratio of the frequency-domain estimation signal to the standard deviation is squared, and the squared values of all frequency points in the sub-band are summed to give the third sum. The third sum of each frequency point set is raised to a predetermined power (formula (2) uses the power 2/3 as an example), and the results are summed to give the fourth sum. The distribution function is then obtained from an exponential function of the fourth sum.

Formula (2) is similar to formula (1): both operate first on the frequency points in the frequency point set of each frequency sub-band and then on the frequency sub-bands. Relative to the prior art, formula (2) has the same technical effect as formula (1) in the embodiment above, which is not repeated here.

In some embodiments, the dividing the predetermined frequency band range into a plurality of frequency sub-bands comprises:

dividing a predetermined frequency band range into C frequency sub-bands, wherein C is an integer greater than 1;

any two adjacent frequency sub-bands having overlapping frequency bands includes:

the first frequency point of the c-th frequency sub-band is smaller than the last frequency point of the (c-1)-th frequency sub-band, where c is an integer greater than or equal to 2 and less than or equal to C.

In the embodiment of the present disclosure, the predetermined frequency band range may be a frequency band range common to audio signals, or may be a frequency band range determined based on a frequency band in which the original noisy signal is located. The predetermined frequency band range is divided into C frequency sub-bands, and overlapping frequency points need to exist between each adjacent frequency sub-band.

Here, $l_c$ and $h_c$ denote the first and last frequency points of the c-th frequency sub-band, respectively, and satisfy the following condition:

$l_c < h_{c-1}, \quad c = 2, \ldots, C.$

This guarantees that adjacent frequency sub-bands share overlapping frequency points. Frequency points in the same frequency sub-band are therefore close together and strongly dependent, while frequency sub-bands that are far apart are only weakly dependent. Meanwhile, because the frequency sub-bands are linked like a chain, there is also a dependency between adjacent frequency sub-bands.
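The overlap condition can be illustrated with a small partition helper. This is a sketch only: `make_overlapping_subbands`, `band_size` and `overlap` are hypothetical names and tuning parameters that do not appear in the disclosure, and equal-width sub-bands are an assumption. Note that the strict inequality between the first point of one sub-band and the last point of the previous one means neighbouring sub-bands must share at least two frequency points.

```python
def make_overlapping_subbands(num_points, band_size, overlap):
    """Split indices 0..num_points-1 into chained sub-bands (first, last), inclusive.

    band_size and overlap are hypothetical tuning parameters; overlap is the
    number of frequency points shared by neighbouring sub-bands.
    """
    if num_points < 1:
        raise ValueError("num_points must be at least 1")
    if not 2 <= overlap < band_size:
        # The strict condition l_c < h_{c-1} needs at least two shared points.
        raise ValueError("overlap must satisfy 2 <= overlap < band_size")
    step = band_size - overlap
    bands = []
    first = 0
    while True:
        last = min(first + band_size - 1, num_points - 1)
        bands.append((first, last))
        if last == num_points - 1:
            break
        first += step
    return bands

def is_chained(bands):
    """Check l_c < h_{c-1} for every pair of neighbouring sub-bands."""
    return all(bands[c][0] < bands[c - 1][1] for c in range(1, len(bands)))

bands = make_overlapping_subbands(num_points=17, band_size=6, overlap=2)
```

With these example values the helper yields four sub-bands, each sharing two frequency points with its neighbour.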

Embodiments of the present disclosure also provide the following examples:

FIG. 4 is a flowchart of an audio signal processing method according to an exemplary embodiment. In this method, as shown in FIG. 3, the sound sources include sound source 1 and sound source 2, and the microphones include microphone 1 and microphone 2. Based on the audio signal processing method, the audio signals of sound source 1 and sound source 2 are recovered from the original noisy signals of microphone 1 and microphone 2. As shown in FIG. 4, the method includes the following steps:

If the frame length of the system is Nfft, the number of frequency points is K = Nfft/2 + 1.

Step S401: initializing $W(k)$ and $V_p(k)$;

The initialization includes the following steps:

1) initializing the separation matrix $W(k)$ of each frequency point;

where k denotes the frequency point, k = 1, ..., K.

2) initializing the weighted covariance matrix $V_p(k)$ of each sound source at each frequency point.

Here, $V_p(k)$ is initialized to a zero matrix, where p denotes the microphone, p = 1, 2.
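The initialization in step S401 can be sketched as follows. The helper names are hypothetical, and starting every separation matrix from the identity is an assumption made for this illustration (a common choice for this family of algorithms); the zero start value for the weighted covariance matrices follows the text.

```python
def identity(n):
    """n x n identity matrix as nested lists of complex numbers."""
    return [[1 + 0j if r == c else 0j for c in range(n)] for r in range(n)]

def zeros(n):
    """n x n zero matrix as nested lists of complex numbers."""
    return [[0j] * n for _ in range(n)]

def init_state(nfft, num_sources=2):
    """Return (K, W, V) for frame length nfft.

    W[k]    : separation matrix at frequency point k (identity is an assumed start)
    V[p][k] : weighted covariance matrix for index p at frequency point k (zeros)
    """
    K = nfft // 2 + 1                     # number of frequency points
    W = [identity(num_sources) for _ in range(K)]
    V = [[zeros(num_sources) for _ in range(K)] for _ in range(num_sources)]
    return K, W, V

K, W, V = init_state(nfft=512)
```

Each frequency point gets its own matrix objects, so later per-point updates do not alias one another.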

Step S402: obtaining the original noisy signal of the p-th microphone in the n-th frame;

$x_p^n(m)$ is windowed and an Nfft-point transform is applied to obtain the corresponding frequency-domain signal: $X_p(k,n) = \mathrm{STFT}(x_p^n(m))$, where m indexes the points selected for the Fourier transform, STFT denotes the short-time Fourier transform, and $x_p^n(m)$ is the time-domain signal of the p-th microphone in the n-th frame. Here, the time-domain signal is the original noisy signal.

The observed signal formed from $X_p(k,n)$ is then $X(k,n) = [X_1(k,n), X_2(k,n)]^T$, where $[\cdot]^T$ denotes the transpose.
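A minimal sketch of step S402 for a single frame is given below. The window type is not named in the text, so the Hann window here is an assumption, as are the function names. The direct DFT sum is used only to keep the example dependency-free; a practical system would use an FFT routine.

```python
import cmath
import math

def stft_frame(frame):
    """Window one real-valued frame and return its Nfft/2 + 1 spectrum bins."""
    nfft = len(frame)
    # Hann window; the disclosure does not name a window, so this is an assumption.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * m / nfft) for m in range(nfft)]
    windowed = [x * w for x, w in zip(frame, window)]
    spectrum = []
    for k in range(nfft // 2 + 1):
        # Direct DFT sum, O(Nfft^2); used here only for clarity.
        acc = sum(windowed[m] * cmath.exp(-2j * math.pi * k * m / nfft)
                  for m in range(nfft))
        spectrum.append(acc)
    return spectrum

def observe(frames_per_mic):
    """Stack per-microphone spectra into X[k] = [X_1(k, n), X_2(k, n)] for one frame."""
    spectra = [stft_frame(f) for f in frames_per_mic]
    return [[s[k] for s in spectra] for k in range(len(spectra[0]))]

nfft = 16
mic1 = [math.cos(2 * math.pi * 2 * m / nfft) for m in range(nfft)]   # tone at bin 2
mic2 = [0.5 * x for x in mic1]                                       # attenuated copy
X = observe([mic1, mic2])
```

Here `X[k]` plays the role of the observed vector at frequency point k for the current frame.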

Step S403: obtaining prior frequency-domain estimates of the two sound source signals using $W(k)$ of the previous frame;

Let the prior frequency-domain estimates of the two sound source signals be $Y(k,n) = [Y_1(k,n), Y_2(k,n)]^T$, where $Y_1(k,n)$ and $Y_2(k,n)$ are the estimates of sound source 1 and sound source 2 at the time-frequency point (k, n), respectively.

The observed signal $X(k,n)$ is separated by the separation matrix to obtain $Y(k,n) = W'(k)X(k,n)$, where $W'(k)$ is the separation matrix of the previous frame (i.e. the frame preceding the current frame).

The prior frequency-domain estimate of the p-th sound source in the n-th frame is then $\bar{Y}_p(n) = [Y_p(1,n), \ldots, Y_p(K,n)]$.
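The separation in step S403 is a matrix-vector product at every frequency point. The sketch below uses nested lists and hypothetical example matrices; `separate` and `source_vector` are illustrative names, not part of the disclosure.

```python
def separate(W, X):
    """Apply Y(k) = W(k) X(k) at every frequency point k."""
    Y = []
    for Wk, Xk in zip(W, X):
        Y.append([sum(Wk[p][q] * Xk[q] for q in range(len(Xk)))
                  for p in range(len(Wk))])
    return Y

def source_vector(Y, p):
    """Collect the estimates of source p over all frequency points."""
    return [Yk[p] for Yk in Y]

# Hypothetical previous-frame separation matrices for two frequency points.
W_prev = [[[1, 0], [0, 1]], [[0, 1], [1, 0]]]
X = [[1 + 1j, 2 + 0j], [3 + 0j, 0 - 1j]]
Y = separate(W_prev, X)
Y_bar_1 = source_vector(Y, 0)
```

The same routine can be reused later with the updated matrices of the current frame to produce the posterior estimates.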

Step S404: updating the weighted covariance matrix $V_p(k,n)$;

The updated weighted covariance matrix is calculated as $V_p(k,n) = \beta V_p(k,n-1) + (1-\beta)\,\varphi_p(n)\,X_p(k,n)X_p^H(k,n)$, where $\beta$ is a smoothing coefficient; in one embodiment, $\beta = 0.98$. $V_p(k,n-1)$ is the weighted covariance matrix of the previous frame; $X_p^H(k,n)$ is the conjugate transpose of $X_p(k,n)$; $\varphi_p(n) = G'(\bar{Y}_p(n))/r_p(n)$ is the weighting coefficient, where $r_p(n) = \sqrt{\sum_{k=1}^{K}|Y_p(k,n)|^2}$ is an auxiliary variable; and $G(\bar{Y}_p(n)) = -\log p(\bar{Y}_p(n))$ is the contrast function.
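The recursive smoothing in step S404 can be sketched for one frequency point as below. It assumes the update takes the form of the previous matrix scaled by β plus (1 − β) times the weighted outer product of the observed vector with its conjugate transpose; treating the observation as the stacked two-microphone vector, and the function name, are assumptions of this illustration.

```python
def update_covariance(V_prev, X, phi, beta=0.98):
    """Return beta * V_prev + (1 - beta) * phi * X X^H for one frequency point.

    V_prev : square matrix (nested lists) from the previous frame
    X      : observed vector, e.g. [X_1(k, n), X_2(k, n)]
    phi    : real weighting coefficient for this source
    beta   : smoothing coefficient (0.98 is the value given in one embodiment)
    """
    n = len(X)
    return [[beta * V_prev[r][c] + (1.0 - beta) * phi * X[r] * X[c].conjugate()
             for c in range(n)] for r in range(n)]

V0 = [[0j, 0j], [0j, 0j]]
V1 = update_covariance(V0, [1 + 1j, 2 + 0j], phi=0.5)
```

The result stays Hermitian as long as the previous matrix is Hermitian and the weight is real, which the generalized eigenvalue step relies on.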

Here, $p(\bar{Y}_p(n))$ denotes the multi-dimensional super-Gaussian prior probability density function of the p-th sound source over the whole frequency band. In one embodiment, $G(\bar{Y}_p(n)) = -\log p(\bar{Y}_p(n)) = r_p(n)$; in this case, if $r_p(n) = \sqrt{\sum_{k=1}^{K}|Y_p(k,n)|^2}$, then $\varphi_p(n) = 1/\sqrt{\sum_{k=1}^{K}|Y_p(k,n)|^2}$.
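For the whole-band model, the weighting coefficient reduces to the reciprocal of the auxiliary variable when the contrast function equals that variable. A minimal sketch under that assumption follows; the `eps` floor is an added safeguard for silent frames and is not part of the disclosure.

```python
import math

def fullband_weight(Y_p, eps=1e-12):
    """Return 1 / r with r = sqrt(sum_k |Y_p(k)|^2); eps guards against silence."""
    r = math.sqrt(sum(abs(y) ** 2 for y in Y_p))
    return 1.0 / max(r, eps)

phi = fullband_weight([3 + 0j, 0 + 4j])
```

Every frequency point of the source receives the same weight here, which is exactly the uniform-dependency assumption that the sub-band model relaxes.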

However, this probability density distribution assumes that the same dependency exists between all frequency points. In practice, the dependency is weak between frequency points that are far apart and strong between frequency points that are close together. The embodiments of the present disclosure therefore divide the whole frequency band into small frequency sub-bands, so that the frequency points within each sub-band are close to each other and strongly dependent. At the same time, adjacent frequency sub-bands share overlapping frequency points, so the sub-bands are linked in sequence like a chain. A prior probability density distribution model is then constructed over the sub-bands. This preserves the dependency within each sub-band, and because of the overlapping frequency points, the dependency can propagate between different sub-bands along the chain. The distribution model constructed in this way, i.e. the distribution function, may be referred to as a chain overlapping sub-band distribution model.

The distribution function $p(\bar{Y}_p(n))$ is constructed as follows:

The entire frequency band is divided into C sub-bands. $l_c$ and $h_c$ denote the first and last frequency points of the c-th sub-band, respectively, and satisfy the following condition:

$l_c < h_{c-1}, \quad c = 2, \ldots, C$

This ensures that adjacent frequency bands partially overlap, so that the whole frequency band forms a chain structure from high to low. The new distribution function is defined as follows:

or, alternatively,

where $\sigma^2$ denotes the variance; in particular α -1,

Based on this, the weighting coefficients are:

or, alternatively,

Step S405: solving the eigenvalue problem to obtain the eigenvector $e_p(k,n)$;

Here, $e_p(k,n)$ is the eigenvector corresponding to the p-th microphone.

The eigenvalue problem $V_2(k,n)e_p(k,n) = \lambda_p(k,n)V_1(k,n)e_p(k,n)$ is solved to obtain the eigenvalue $\lambda_p(k,n)$ and the eigenvector $e_p(k,n)$.
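For two microphones, the generalized eigenvalue problem in step S405 involves only 2 × 2 matrices and can be solved in closed form. The sketch below is one standard way to do so and is not taken from the disclosure: it forms the product of the inverse of the first covariance matrix with the second, uses the trace and determinant to get both eigenvalues, and reads an eigenvector off the matrix entries. The function name and the example matrices are hypothetical.

```python
import cmath

def solve_2x2(V1, V2):
    """Solve V2 e = lam * V1 e for 2x2 matrices via H = V1^{-1} V2.

    Returns [(lam_1, e_1), (lam_2, e_2)]. V1 is assumed to be invertible.
    """
    (a, b), (c, d) = V1
    det1 = a * d - b * c
    inv = [[d / det1, -b / det1], [-c / det1, a / det1]]
    H = [[sum(inv[r][m] * V2[m][q] for m in range(2)) for q in range(2)]
         for r in range(2)]
    tr = H[0][0] + H[1][1]
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    root = cmath.sqrt(tr * tr - 4 * det)
    pairs = []
    for lam in ((tr + root) / 2, (tr - root) / 2):
        # Two algebraically equivalent eigenvector forms; keep the larger one.
        candidates = [[H[0][1], lam - H[0][0]], [lam - H[1][1], H[1][0]]]
        e = max(candidates, key=lambda v: abs(v[0]) + abs(v[1]))
        if abs(e[0]) + abs(e[1]) == 0:
            # H is a multiple of the identity: any basis vector is an eigenvector.
            e = [0j, 1 + 0j] if pairs else [1 + 0j, 0j]
        pairs.append((lam, e))
    return pairs

V1 = [[2 + 0j, 1j], [-1j, 2 + 0j]]      # hypothetical Hermitian covariances
V2 = [[1 + 0j, 0j], [0j, 3 + 0j]]
pairs = solve_2x2(V1, V2)
```

The eigenvectors returned here are not normalized; any scaling needed for the separation matrix update would be applied afterwards.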

Step S406: obtaining the updated separation matrix $W(k)$ of each frequency point;

The updated separation matrix $W(k)$ of the current frame is obtained from the eigenvectors of the eigenvalue problem above.

Step S407: obtaining posterior frequency-domain estimates of the two sound source signals using $W(k)$ of the current frame;

The original noisy signals are separated using $W(k)$ of the current frame to obtain the posterior frequency-domain estimates of the two sound source signals: $Y(k,n) = [Y_1(k,n), Y_2(k,n)]^T = W(k)X(k,n)$.

Step S408: performing time-frequency conversion on the posterior frequency-domain estimates to obtain the separated time-domain signals.

ISTFT and overlap-add are performed on $\bar{Y}_p(n)$ for each sound source to obtain the separated time-domain sound source signal $s_p^n(m)$, namely $s_p^n(m) = \mathrm{ISTFT}(\bar{Y}_p(n))$, where m = 1, ..., Nfft and p = 1, 2.
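Step S408 can be sketched as an inverse transform per frame followed by overlap-add. The function names, the hop size and the example spectrum are assumptions, and no synthesis window is applied in this simplified version. The direct inverse DFT is used only to avoid external dependencies.

```python
import cmath
import math

def istft_frame(half_spectrum, nfft):
    """Invert an Nfft/2 + 1 point half spectrum to a real frame of nfft samples."""
    # Rebuild the full spectrum using the conjugate symmetry of real signals.
    full = list(half_spectrum) + [half_spectrum[nfft - k].conjugate()
                                  for k in range(nfft // 2 + 1, nfft)]
    return [sum(full[k] * cmath.exp(2j * math.pi * k * m / nfft)
                for k in range(nfft)).real / nfft
            for m in range(nfft)]

def overlap_add(frames, hop):
    """Sum time-domain frames placed hop samples apart."""
    nfft = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + nfft)
    for n, frame in enumerate(frames):
        for m, sample in enumerate(frame):
            out[n * hop + m] += sample
    return out

nfft = 8
half = [0j, 4 + 0j, 0j, 0j, 0j]              # energy only at frequency point 1
frame = istft_frame(half, nfft)              # one period of a cosine
signal = overlap_add([frame, frame], hop=4)  # two frames with half-frame overlap
```

In a complete system the analysis and synthesis windows would be chosen so that overlapping frames sum back to the original signal level.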

By the method provided by the embodiment of the disclosure, the separation performance can be improved, the voice damage degree after separation is reduced, and the recognition performance is improved. Meanwhile, the equivalent interference suppression performance can be achieved by using fewer microphones, and the cost of an intelligent product is reduced.

Fig. 5 is a block diagram illustrating an apparatus for processing an audio signal according to an exemplary embodiment. Referring to fig. 5, the apparatus 500 includes a first obtaining module 501, a second obtaining module 502, a dividing module 503, a first determining module 504, a second determining module 505, and a third obtaining module 506.

A first obtaining module 501, configured to obtain, by at least two microphones, audio signals emitted by at least two sound sources, respectively, so as to obtain original noisy signals of the at least two microphones, respectively;

a second obtaining module 502, configured to, for each frame in a time domain, obtain frequency domain estimation signals of the at least two sound sources according to the original noisy signals of the at least two microphones, respectively;

a dividing module 503, configured to divide a predetermined frequency band range into a plurality of frequency sub-bands, where each frequency sub-band includes a plurality of frequency points, and any two adjacent frequency sub-bands have overlapping frequency bands;

a first determining module 504, configured to determine, according to the frequency domain estimation signal of each frequency point in each frequency subband, a weighting coefficient of each frequency point included in the frequency subband;

a second determining module 505, configured to determine a separation matrix of each frequency point according to the weighting coefficient;

a third obtaining module 506, configured to obtain, based on the separation matrix and the original noisy signals, the audio signals emitted by the at least two sound sources respectively.

In some embodiments, the first determining module comprises:

a first determining submodule, configured to determine a distribution function of the frequency domain estimation signal according to the frequency domain estimation signal of each frequency point in each frequency subband;

and the second determining submodule is used for determining the weighting coefficient of each frequency point according to the distribution function.

In some embodiments, the first determining submodule is specifically configured to:

determining the square of the ratio of the frequency domain estimation signal of each frequency point in each frequency sub-band to the standard deviation;

summing the squares of the ratios of the frequency points in each frequency sub-band to determine a first sum;

obtaining the square sum of the first sum corresponding to each frequency sub-band to obtain a second sum;

and determining the distribution function according to the exponential function with the second sum as a variable.

In some embodiments, the first determining submodule is specifically configured to:

determining the square of the ratio between the frequency domain estimation signal and the standard deviation of each frequency point in each frequency sub-band;

summing the squares of the ratios of the frequency points in each frequency sub-band to determine a third sum;

determining a fourth sum according to a predetermined power of the third sum corresponding to each frequency sub-band;

and determining the distribution function according to the exponential function with the fourth sum as a variable.

In some embodiments, the dividing module is specifically configured to:

dividing a predetermined frequency band range into C frequency sub-bands, wherein C is an integer greater than 1;

any two adjacent frequency sub-bands having overlapping frequency bands includes:

the first frequency point of the c-th frequency sub-band is smaller than the last frequency point of the (c-1)-th frequency sub-band, where c is an integer greater than or equal to 2 and less than or equal to C.

With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

Fig. 6 is a block diagram illustrating a physical structure of an audio signal processing apparatus 600 according to an exemplary embodiment. For example, the apparatus 600 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and so forth.

Referring to fig. 6, apparatus 600 may include one or more of the following components: a processing component 601, a memory 602, a power component 603, a multimedia component 604, an audio component 605, an input/output (I/O) interface 606, a sensor component 607, and a communication component 608.

The processing component 601 generally controls the overall operation of the device 600, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 601 may include one or more processors 610 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 601 may also include one or more modules that facilitate interaction between the processing component 601 and other components. For example, the processing component 601 may include a multimedia module to facilitate interaction between the multimedia component 604 and the processing component 601.

The memory 602 is configured to store various types of data to support operations at the apparatus 600. Examples of such data include instructions for any application or method operating on the apparatus 600, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 602 may be implemented by any type or combination of volatile or non-volatile storage devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.

The power supply component 603 provides power to the various components of the device 600. The power supply component 603 may include: a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the device 600.

The multimedia component 604 includes a screen that provides an output interface between the device 600 and a user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user.

The audio component 605 is configured to output and/or input audio signals. For example, the audio component 605 includes a microphone (MIC) configured to receive external audio signals when the apparatus 600 is in an operational mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signals may further be stored in the memory 602 or transmitted via the communication component 608. In some embodiments, the audio component 605 also includes a speaker for outputting audio signals.

The I/O interface 606 provides an interface between the processing component 601 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.

The sensor component 607 includes one or more sensors for providing various aspects of status assessment for the apparatus 600. For example, the sensor component 607 may detect the open/closed state of the apparatus 600 and the relative positioning of components, such as a display and keypad of the apparatus 600. The sensor component 607 may also detect a change in the position of the apparatus 600 or a component of the apparatus 600, the presence or absence of user contact with the apparatus 600, the orientation or acceleration/deceleration of the apparatus 600, and a change in the temperature of the apparatus 600. The sensor component 607 may include a proximity sensor configured to detect the presence of nearby objects in the absence of any physical contact. The sensor component 607 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 607 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 608 is configured to facilitate wired or wireless communication between the apparatus 600 and other devices. The apparatus 600 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 608 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 608 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, or other technologies.

In an exemplary embodiment, the apparatus 600 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for performing the above-described methods.

In an exemplary embodiment, a non-transitory computer readable storage medium comprising instructions, such as the memory 602 comprising instructions, executable by the processor 610 of the apparatus 600 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.

A non-transitory computer readable storage medium having instructions therein, which when executed by a processor of a mobile terminal, enable the mobile terminal to perform any of the methods provided in the above embodiments.

Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.

It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.
