Method and apparatus for adjusting audio signal and audio system

Document No.: 1713762 Publication date: 2019-12-13

Abstract: The present technology, Method and apparatus for adjusting audio signal and audio system, was devised by 马桂林 (Ma Guilin) and 陈哲 (Chen Zhe) on 2017-05-04. A method and an apparatus for adjusting an audio signal, and an audio system, are provided. The method comprises: obtaining a candidate audio signal (S1); obtaining a current noise signal in the environment (S2); calculating a first plurality of difference values of loudness between the candidate audio signal and the current noise signal (S3); modifying the first plurality of difference values with a plurality of target factors, wherein the plurality of target factors are obtained based on a plurality of test noise signals and a corresponding plurality of test audio signals (S4); and modifying the candidate audio signal with the modified first plurality of difference values to obtain a target audio signal (S5). Thus, the loss of perceived loudness of the audio signal caused by the noise signal may be compensated, and the problem of overcompensation may be addressed.

1. A method of obtaining a plurality of target factors for adjusting an audio signal, the method comprising:

Obtaining a test audio signal and a test noise signal, wherein the test noise signal has a frequency band covering a preset number of barks;

Obtaining a plurality of differences in loudness between the test audio signal and the test noise signal; and

Determining a particular target factor for a particular bark in the plurality of target factors for modifying the plurality of difference values, wherein the particular target factor ranges from 0 to 1 and is determined based on a predetermined requirement.

2. The method of claim 1, further comprising:

Repeating the above steps until a first predetermined number of target factors for a first predetermined number of Barks is obtained based on a plurality of test noise signals and a corresponding plurality of test audio signals.

3. The method of claim 2, further comprising:

Obtaining a second predetermined number of target factors for a second predetermined number of bark based on the first predetermined number of target factors and the first predetermined number of bark using linear interpolation.

4. The method of claim 1, wherein obtaining the plurality of difference values of loudness between the test audio signal and the test noise signal comprises:

Calculating a Power Spectral Density (PSD) of the test audio signal in a frequency domain and a PSD of the test noise signal in the frequency domain, respectively;

Processing the PSD of the test audio signal in the frequency domain and the PSD of the test noise signal in the frequency domain with a psychoacoustic masking model, respectively, to obtain an audio signal in the Bark domain on a decibel (dB) scale and a noise signal in the Bark domain on the dB scale; and

Calculating the plurality of differences in the bark domain at the dB scale by performing a subtraction between the test audio signal in the bark domain at the dB scale and the test noise signal in the bark domain at the dB scale.

5. The method of claim 4, wherein determining the particular target factor for the particular bark for modifying the plurality of difference values comprises:

Modifying the plurality of differences in the Bark domain on the dB scale by a particular factor, wherein the particular factor ranges between 0 and 1;

Obtaining a plurality of linear gain values in the frequency domain based on the modified plurality of difference values in the Bark domain on the dB scale;

Modifying the test audio signal with the plurality of linear gain values;

Monitoring playback of the modified test audio signal; and

Adjusting the particular factor for the particular Bark that is at the center of the frequency band of the test noise signal until the modified test audio signal meets the predetermined requirement.

6. The method of claim 1, wherein the test noise signal has a frequency band that overlaps at least a portion of a frequency band of the test audio signal.

7. The method of claim 1, wherein the test noise signal has a frequency band covering 3 to 5 barks.

8. A method for adjusting an audio signal, the method comprising:

Obtaining a candidate audio signal;

Obtaining a current noise signal in an environment;

Calculating a first plurality of differences in loudness between the candidate audio signal and the current noise signal;

Modifying the first plurality of difference values with a plurality of target factors, wherein the plurality of target factors are obtained based on a plurality of test noise signals and a corresponding plurality of test audio signals; and

Modifying the candidate audio signal with the modified first plurality of difference values to obtain a target audio signal.

9. The method of claim 8, wherein obtaining the plurality of target factors based on the plurality of test noise signals and the corresponding plurality of test audio signals comprises:

Obtaining a first predetermined number of target factors for a first predetermined number of Barks in order to obtain a relationship between the first predetermined number of target factors and the first predetermined number of Barks, wherein each of the first predetermined number of Barks is at the center of the frequency band of a test noise signal; and

Obtaining, using linear interpolation and based on the relationship between the first predetermined number of target factors and the first predetermined number of Barks, a second predetermined number of target factors corresponding to a second predetermined number of Barks in the Bark domain that are not at the center of the frequency bands of the plurality of test noise signals.

10. The method of claim 9, wherein obtaining a particular target factor of the predetermined number of target factors for a particular bark of the predetermined number of barks comprises:

Obtaining a test noise signal and a test audio signal, wherein the test noise signal has a frequency band covering a preset number of barks;

Obtaining a second plurality of difference values in the Bark domain on a decibel (dB) scale by performing a subtraction between the test noise signal and the test audio signal;

Modifying the second plurality of differences in the Bark domain on the dB scale with a particular target factor, wherein the particular target factor ranges between 0 and 1;

Obtaining a second plurality of linear gain values in the frequency domain based on the modified second plurality of difference values in the Bark domain on the dB scale;

Modifying the test audio signal with the second plurality of linear gain values;

Monitoring playback of the modified test audio signal; and

Adjusting the particular target factor for the particular Bark at the center of the frequency band of the test noise signal until the modified test audio signal meets a predetermined requirement.

11. The method of claim 10, wherein the test noise signal has a frequency band that overlaps at least a portion of a frequency band of the test audio signal.

12. The method of claim 10, wherein the test noise signal has a frequency band covering 3 to 5 barks.

13. The method of claim 8, wherein the first plurality of differences in loudness between the candidate audio signal and the current noise signal is obtained by:

Obtaining a Power Spectral Density (PSD) of the candidate audio signal in a frequency domain and a PSD of the current noise signal in the frequency domain, respectively;

Processing the PSD of the candidate audio signal in the frequency domain and the PSD of the current noise signal in the frequency domain with a psychoacoustic masking model, respectively, to obtain a candidate audio signal in the Bark domain on the dB scale and a current noise signal in the Bark domain on the dB scale; and

Calculating the first plurality of difference values in the Bark domain on the dB scale by performing a subtraction between the candidate audio signal in the Bark domain on the dB scale and the current noise signal in the Bark domain on the dB scale.

14. The method of claim 13, wherein modifying the candidate audio signal with the modified first plurality of difference values to obtain the target audio signal comprises:

Transforming the obtained first plurality of difference values in the Bark domain on the dB scale into a third plurality of difference values in the frequency domain on the dB scale using linear interpolation;

Obtaining a first plurality of linear gain values in the frequency domain based on the third plurality of difference values in the frequency domain at the dB scale; and

Performing a multiplication between the candidate audio signal and the first plurality of linear gain values in the frequency domain to obtain the target audio signal in the frequency domain.

15. The method of claim 14, further comprising:

Transforming the target audio signal from the frequency domain to a time domain; and

Outputting the target audio signal in the time domain.

16. The method of claim 13, further comprising:

Performing time and frequency smoothing followed by nonlinear smoothing on the PSD of the current noise signal in the frequency domain; and

Performing the time and frequency smoothing on the PSD of the candidate audio signal in the frequency domain, before processing the PSD of the candidate audio signal in the frequency domain and the PSD of the current noise signal in the frequency domain with the psychoacoustic masking model, respectively.

17. The method of claim 13, further comprising:

Performing pitch correction on the PSD of the current noise signal in the frequency domain before processing the PSD of the candidate audio signal in the frequency domain and the PSD of the current noise signal in the frequency domain with the psychoacoustic masking model, respectively;

Wherein the pitch correction is obtained based on flatness measurements for a plurality of subbands.

18. The method of claim 13, wherein the candidate audio signal is a multi-channel audio signal in the time domain, the method further comprising:

Transforming the multi-channel audio signal from the time domain to the frequency domain; and

Averaging the multi-channel audio signal in the frequency domain to obtain a mono audio signal in the frequency domain in order to calculate a PSD of the mono audio signal in the frequency domain as the PSD of the candidate audio signal in the frequency domain.

19. An audio system, characterized in that the audio system comprises:

An audio playback device configured to play an audio signal;

A microphone configured to detect a noise signal in an environment;

A storage device configured to store a plurality of target factors, wherein the plurality of target factors are adapted to modify an audio signal; and

A processor configured to:

Obtain a candidate audio signal to be played by the audio playback device;

Obtain a current noise signal detected by the microphone;

Calculate a first plurality of differences in loudness between the candidate audio signal and the current noise signal;

Modify the first plurality of difference values with the plurality of target factors;

Modify the candidate audio signal with the modified first plurality of difference values to obtain a target audio signal; and

Control the audio playback device to play the target audio signal.

20. The audio system according to claim 19, wherein the plurality of target factors are obtained according to any one of claims 1 to 7.

21. The audio system of claim 19, wherein the audio playback device is a headset.

22. The audio system of claim 21, wherein the microphone is an in-line microphone of the headset.

23. The audio system of claim 19, wherein the storage device and the processor are integrated in the audio playback device.

24. The audio system of claim 19, further comprising a master device, wherein the audio playback device is in communication with the master device, and wherein the storage device and the processor are integrated in the master device.

Technical Field

The present invention relates generally to the field of audio signal processing, and more particularly to a method of adjusting an audio signal, a device for adjusting an audio signal, and an audio system.

Background

In the presence of ambient noise, the loudness of an audio signal from an audio playback device, as perceived by a listener, may be affected by competing noise sounds. Since audio loudness is the psychoacoustic correlate of the physical strength of an audio signal, the perceived audio loudness may decrease as the level of the competing sound increases. Further theoretical background on audio loudness can be found in the article by Moore, Brian C. J., et al., JAES, Vol. 45, No. 4, pp. 224-240 (April 1997).

Measures have been taken to acoustically optimize the audio system, but have a limited effect on achieving good acoustic results in poor acoustic environments.

Therefore, a method for improving the acoustic performance of an audio playback device in a noisy environment is needed.

Disclosure of Invention

According to one embodiment of the present invention, a method of obtaining a plurality of target factors for adjusting an audio signal is provided. The method of obtaining a plurality of target factors for adjusting an audio signal comprises: obtaining a test audio signal and a test noise signal, wherein the test noise signal has a frequency band covering a preset number of barks; obtaining a plurality of differences in loudness between the test audio signal and the test noise signal; and determining a particular target factor for a particular bark in the plurality of target factors for modifying the plurality of difference values, wherein the particular target factor ranges from 0 to 1 and is determined based on a predetermined requirement.

In some embodiments, the method further comprises: repeating the above steps until a first predetermined number of target factors for a first predetermined number of Barks is obtained based on a plurality of test noise signals and a corresponding plurality of test audio signals.

In some embodiments, the method further comprises: obtaining a second predetermined number of target factors for a second predetermined number of bark based on the first predetermined number of target factors and the first predetermined number of bark using linear interpolation.
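The interpolation step can be illustrated with a minimal sketch. It assumes, purely for illustration, that target factors were tuned at four Barks and that factors for the remaining Barks of a 24-Bark scale are filled in linearly; the function name, the measured values and the choice to hold the nearest factor outside the measured range are hypothetical and are not taken from this disclosure.

```python
def interpolate_target_factors(measured_barks, measured_factors, all_barks):
    """Linearly interpolate target factors for Barks that were not measured.

    measured_barks must be sorted ascending. Barks outside the measured
    range reuse the nearest measured factor (an assumption of this sketch).
    """
    result = []
    for z in all_barks:
        if z <= measured_barks[0]:
            result.append(measured_factors[0])
        elif z >= measured_barks[-1]:
            result.append(measured_factors[-1])
        else:
            # Find the pair of measured Barks that brackets z.
            for i in range(len(measured_barks) - 1):
                z0, z1 = measured_barks[i], measured_barks[i + 1]
                if z0 <= z <= z1:
                    f0, f1 = measured_factors[i], measured_factors[i + 1]
                    t = (z - z0) / (z1 - z0)
                    result.append(f0 + t * (f1 - f0))
                    break
    return result


# Hypothetical measurements: factors tuned at Barks 3, 9, 15 and 21.
measured_barks = [3, 9, 15, 21]
measured_factors = [0.4, 0.6, 0.7, 0.5]
factors = interpolate_target_factors(measured_barks, measured_factors, range(1, 25))
```

Because every interpolated value lies between two measured factors, the result stays within the 0 to 1 range whenever the measured factors do.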

In some embodiments, obtaining the plurality of differences in loudness between the test audio signal and the test noise signal comprises: calculating a Power Spectral Density (PSD) of the test audio signal in a frequency domain and a PSD of the test noise signal in the frequency domain, respectively; processing the PSD of the test audio signal in the frequency domain and the PSD of the test noise signal in the frequency domain with a psychoacoustic masking model, respectively, to obtain an audio signal in the Bark domain on a decibel (dB) scale and a noise signal in the Bark domain on the dB scale; and calculating the plurality of differences in the Bark domain on the dB scale by performing a subtraction between the test audio signal in the Bark domain on the dB scale and the test noise signal in the Bark domain on the dB scale.
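A rough sketch of the per-Bark subtraction follows. It deliberately replaces the psychoacoustic masking model with a plain summation of PSD bins into Bark bands, which is a simplification and not the model described here. The Hz-to-Bark mapping uses the Zwicker-Terhardt approximation; the sign convention (noise minus audio), the 24-band layout and all function names are assumptions of the sketch.

```python
import math


def hz_to_bark(freq_hz):
    """Zwicker-Terhardt approximation of the Bark scale."""
    return 13.0 * math.atan(0.00076 * freq_hz) + 3.5 * math.atan((freq_hz / 7500.0) ** 2)


def bark_band_levels_db(psd, bin_freqs_hz, num_barks=24, floor=1e-12):
    """Sum PSD bins into Bark bands and return the band levels in dB."""
    band_power = [0.0] * num_barks
    for power, freq in zip(psd, bin_freqs_hz):
        band = min(int(hz_to_bark(freq)), num_barks - 1)
        band_power[band] += power
    # Empty bands are clamped to a small floor to avoid log10(0).
    return [10.0 * math.log10(max(p, floor)) for p in band_power]


def bark_differences_db(audio_psd, noise_psd, bin_freqs_hz):
    """Per-Bark level difference in dB (noise minus audio, by assumption)."""
    audio_db = bark_band_levels_db(audio_psd, bin_freqs_hz)
    noise_db = bark_band_levels_db(noise_psd, bin_freqs_hz)
    return [n - a for a, n in zip(audio_db, noise_db)]


# Toy data: 257 bins spanning 0 to 8 kHz, noise ten times stronger than audio.
bin_freqs = [k * 8000.0 / 256 for k in range(257)]
audio_psd = [1e-6] * 257
noise_psd = [1e-5] * 257
diffs = bark_differences_db(audio_psd, noise_psd, bin_freqs)
```

With this toy data every populated band shows a difference of about 10 dB, while bands above 8 kHz contain no bins and therefore report 0 dB.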

In some embodiments, determining the particular target factor for the particular bark for modifying the plurality of difference values comprises: modifying the plurality of differences in the bark domain on the dB scale by a particular factor, wherein the particular factor ranges between 0 and 1; obtaining a plurality of linear gain values in the frequency domain based on the modified plurality of difference values in the bark domain at the dB scale; modifying the test audio signal with the plurality of linear gain values; monitoring playback of the modified test audio signal; and adjusting the particular factor for the particular bark at the center of the frequency band of the test noise signal until the modified test audio signal meets the predetermined requirement.

In some implementations, the test noise signal has a frequency band that overlaps at least a portion of a frequency band of the test audio signal.

In some embodiments, the test noise signal has a frequency band covering 3 to 5 Barks.

According to an embodiment of the present invention, a corresponding apparatus for obtaining a plurality of target factors for adjusting an audio signal is also provided. The apparatus comprises: a first obtaining circuit configured to obtain a test audio signal and a test noise signal, wherein the test noise signal has a frequency band covering a preset number of Barks; a second obtaining circuit configured to obtain a plurality of differences in loudness between the test audio signal and the test noise signal; and a determination circuit configured to determine a particular target factor for a particular Bark of the plurality of target factors for modifying the plurality of difference values, wherein the particular target factor ranges from 0 to 1 and is determined based on a predetermined requirement.

According to one embodiment of the present invention, a method of adjusting an audio signal is provided. The method comprises: obtaining a candidate audio signal; obtaining a current noise signal in an environment; calculating a first plurality of differences in loudness between the candidate audio signal and the current noise signal; modifying the first plurality of difference values with a plurality of target factors, wherein the plurality of target factors are obtained based on a plurality of test noise signals and a corresponding plurality of test audio signals; and modifying the candidate audio signal with the modified first plurality of difference values to obtain a target audio signal.

In some embodiments, obtaining the plurality of target factors based on the plurality of test noise signals and the corresponding plurality of test audio signals comprises: obtaining a first predetermined number of target factors for a first predetermined number of Barks in order to obtain a relationship between the first predetermined number of target factors and the first predetermined number of Barks, wherein each of the first predetermined number of Barks is at the center of the frequency band of a test noise signal; and obtaining, using linear interpolation and based on the relationship between the first predetermined number of target factors and the first predetermined number of Barks, a second predetermined number of target factors corresponding to a second predetermined number of Barks in the Bark domain that are not at the center of the frequency bands of the plurality of test noise signals.

In some implementations, obtaining a particular target factor of the predetermined number of target factors for a particular Bark of the predetermined number of Barks includes: obtaining a test noise signal and a test audio signal, wherein the test noise signal has a frequency band covering a preset number of Barks; obtaining a second plurality of difference values in the Bark domain on a dB scale by performing a subtraction between the test noise signal and the test audio signal; modifying the second plurality of differences in the Bark domain on the dB scale with a particular target factor, wherein the particular target factor ranges between 0 and 1; obtaining a second plurality of linear gain values in the frequency domain based on the modified second plurality of difference values in the Bark domain on the dB scale; modifying the test audio signal with the second plurality of linear gain values; monitoring playback of the modified test audio signal; and adjusting the particular target factor for the particular Bark at the center of the frequency band of the test noise signal until the modified test audio signal meets a predetermined requirement.

In some implementations, the test noise signal has a frequency band that overlaps at least a portion of a frequency band of the test audio signal.

In some embodiments, the test noise signal has a frequency band covering 3 to 5 barks.

In some implementations, the first plurality of difference values of loudness between the candidate audio signal and the current noise signal is obtained by: obtaining a PSD of the candidate audio signal in the frequency domain and a PSD of the current noise signal in the frequency domain, respectively; processing the PSD of the candidate audio signal in the frequency domain and the PSD of the current noise signal in the frequency domain with a psycho-acoustic masking model, respectively, to obtain a dB-scaled candidate audio signal in the bark domain and a dB-scaled current noise signal in the bark domain; and calculating the first plurality of difference values in the bark domain at the dB scale by performing a subtraction between the candidate audio signal in the bark domain at the dB scale and the current noise signal in the bark domain at the dB scale.

In some implementations, modifying the candidate audio signal with the modified first plurality of difference values to obtain the target audio signal comprises: transforming the obtained first plurality of difference values in the Bark domain on the dB scale into a third plurality of difference values in the frequency domain on the dB scale using linear interpolation; obtaining a first plurality of linear gain values in the frequency domain based on the third plurality of difference values in the frequency domain on the dB scale; and performing a multiplication between the candidate audio signal and the first plurality of linear gain values in the frequency domain to obtain the target audio signal in the frequency domain.
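The three operations in this implementation (interpolating from Barks to frequency bins, converting dB to linear gain, and multiplying the spectrum) can be sketched as follows. The sketch assumes that each per-Bark value applies at the center of its band and that the gain is an amplitude gain, hence the division by 20. The function names and the toy values are hypothetical.

```python
def bark_db_to_bin_gains(diff_db_per_bark, bin_barks):
    """Interpolate per-Bark dB values onto FFT bins and convert to linear gains.

    diff_db_per_bark[b] is taken to apply at the centre of band b, i.e. at
    Bark value b + 0.5 (an assumption of this sketch).
    """
    last = len(diff_db_per_bark) - 1
    gains = []
    for z in bin_barks:
        # Position relative to the band centres, clamped to the valid range.
        pos = min(max(z - 0.5, 0.0), float(last))
        lo = int(pos)
        hi = min(lo + 1, last)
        t = pos - lo
        db = (1.0 - t) * diff_db_per_bark[lo] + t * diff_db_per_bark[hi]
        gains.append(10.0 ** (db / 20.0))  # dB to amplitude gain
    return gains


def apply_gains(spectrum, gains):
    """Multiply each complex frequency bin by its real-valued linear gain."""
    return [g * x for g, x in zip(gains, spectrum)]


# Toy example: three Bark bands asking for 0 dB, +6 dB and +12 dB.
diff_db = [0.0, 6.0, 12.0]
bin_barks = [0.5, 1.0, 1.5, 2.5]  # Bark position of each FFT bin (hypothetical)
spectrum = [1 + 0j, 1 + 0j, 0 + 1j, 2 + 0j]
gains = bark_db_to_bin_gains(diff_db, bin_barks)
target = apply_gains(spectrum, gains)
```

Since the gains are real and positive, the multiplication changes only the magnitude of each bin and leaves its phase untouched.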

In some embodiments, the method of adjusting an audio signal further comprises: transforming the target audio signal from the frequency domain to a time domain; and outputting the target audio signal in the time domain.

In some embodiments, the method of adjusting an audio signal further comprises: performing time and frequency smoothing followed by nonlinear smoothing on the PSD of the current noise signal in the frequency domain; and performing the time and frequency smoothing on the PSD of the candidate audio signal in the frequency domain, before processing the PSD of the candidate audio signal in the frequency domain and the PSD of the current noise signal in the frequency domain with the psychoacoustic masking model, respectively.

In some embodiments, the method of adjusting an audio signal further comprises: performing pitch correction on the PSD of the current noise signal in the frequency domain before processing the PSD of the candidate audio signal in the frequency domain and the PSD of the current noise signal in the frequency domain with the psychoacoustic masking model, respectively; wherein the pitch correction is obtained based on flatness measurements for a plurality of subbands.
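The text does not say how the flatness measurement is computed or how the correction is derived from it. One common measure is spectral flatness, the ratio of the geometric mean to the arithmetic mean of the PSD within a subband, which is close to 1 for noise-like content and close to 0 for tone-like content. The sketch below shows only that measurement, under the assumption that this is the kind of flatness intended; the subband layout and function names are hypothetical.

```python
import math


def spectral_flatness(psd_values, floor=1e-12):
    """Geometric mean divided by arithmetic mean of the PSD values."""
    values = [max(p, floor) for p in psd_values]
    log_mean = sum(math.log(v) for v in values) / len(values)
    arithmetic_mean = sum(values) / len(values)
    return math.exp(log_mean) / arithmetic_mean


def subband_flatness(psd, band_edges):
    """Flatness of each subband; band_edges lists the bin index boundaries."""
    return [
        spectral_flatness(psd[start:stop])
        for start, stop in zip(band_edges[:-1], band_edges[1:])
    ]


# First subband is flat (noise-like); the second has one dominant peak (tonal).
psd = [1.0, 1.0, 1.0, 1.0, 0.01, 0.01, 100.0, 0.01]
flatness = subband_flatness(psd, band_edges=[0, 4, 8])
```

In this toy case the first subband has a flatness of 1, and the second, dominated by a single peak, has a flatness well below 0.01.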

In some implementations, the candidate audio signal is a multi-channel audio signal in the time domain, the method further comprising: transforming the multi-channel audio signal from the time domain to the frequency domain; and averaging the multi-channel audio signal in the frequency domain to obtain a mono audio signal in the frequency domain in order to calculate a PSD of the mono audio signal in the frequency domain as the PSD of the candidate audio signal in the frequency domain.

According to one embodiment of the present invention, an audio system is provided. The audio system includes: an audio playback device configured to play an audio signal; a microphone configured to detect a noise signal in an environment; a storage device configured to store a plurality of target factors, wherein the plurality of target factors are adapted to modify an audio signal; and a processor configured to: obtain a candidate audio signal to be played by the audio playback device; obtain a current noise signal detected by the microphone; calculate a first plurality of differences in loudness between the candidate audio signal and the current noise signal; modify the first plurality of difference values with the plurality of target factors; modify the candidate audio signal with the modified first plurality of difference values to obtain a target audio signal; and control the audio playback device to play the target audio signal.

In some embodiments, the plurality of target factors is obtained according to the method of obtaining a plurality of target factors for adjusting an audio signal described above.

In some implementations, the audio playback device is a headset.

In some implementations, the microphone is an in-line microphone of the headset.

In some embodiments, the storage device and the processor are integrated in the audio playback device.

In some embodiments, the audio system further comprises a master device, wherein the audio playback device is in communication with the master device, and the storage device and the processor are integrated in the master device.

The present invention has the following advantages as compared with the conventional art.

In case a noise signal is present in the environment, a first plurality of difference values of the loudness between the candidate audio signal and the current noise signal is calculated and modified with a plurality of target factors, and subsequently the candidate audio signal is modified with the modified first plurality of difference values to obtain the target audio signal, such that a loss of perceived loudness of the audio signal due to competing sounds of the noise signal may be compensated. A plurality of target factors is obtained based on a plurality of test noise signals and a corresponding plurality of test audio signals covering a wide frequency band, such that the plurality of target factors is adapted to almost all noise signals in the environment.

In addition, a target factor is obtained for each Bark in the Bark domain, so that the first plurality of difference values can be modified at each Bark and the candidate audio signal can be adjusted accurately, avoiding overcompensation as far as possible.

Drawings

The foregoing and other features of the present invention will become more fully apparent from the following description and appended claims, taken in conjunction with the accompanying drawings. Understanding that these drawings depict only several embodiments in accordance with the invention and are not therefore to be considered to be limiting of its scope, the invention will be described with additional specificity and detail through use of the accompanying drawings.

Fig. 1 schematically shows a flow diagram of a method for adjusting a candidate audio signal according to an embodiment of the invention;

Fig. 2 schematically shows a flow diagram of a method for adjusting a candidate audio signal according to another embodiment of the present invention;

Fig. 3 schematically shows a flow diagram of a method for obtaining a plurality of target factors according to an embodiment of the invention;

Fig. 4 schematically shows an apparatus for adjusting a candidate audio signal according to an embodiment of the present invention;

Fig. 5 schematically shows an audio system according to an embodiment of the invention; and

Fig. 6 schematically shows an audio system according to another embodiment of the invention.

Detailed Description

In the following detailed description, reference is made to the accompanying drawings, which form a part hereof. In the drawings, like reference numerals generally refer to like parts throughout the various views unless the context indicates otherwise. The illustrative embodiments described in the detailed description, drawings, and claims are not intended to be limiting. Other embodiments may be utilized, and other changes may be made, without departing from the spirit or scope of the subject matter presented herein. It will be readily understood that the aspects of the present invention, as generally described herein and illustrated in the figures, can be arranged, substituted, combined, and designed in a wide variety of different configurations, all of which are explicitly contemplated and form part of this invention.

Fig. 1 schematically shows a flow diagram of a method for adjusting a candidate audio signal according to an embodiment of the present invention. The method includes steps S1, S2, S3, S4, and S5.

In S1, the candidate audio signal is obtained. In some embodiments, the obtained candidate audio signal is in the time domain.

In S2, a current noise signal in the environment is obtained. In particular, the current noise signal may refer to any competing sound in the environment that interferes with the associated audio signal.

As mentioned before, the loudness of an audio signal perceived by a listener may be affected by noise signals in the environment. In some embodiments of the present invention, the candidate audio signal may be adjusted with a first plurality of differences in loudness between the candidate audio signal and the current noise signal, in order to compensate for the masking effect of the current noise signal on the candidate audio signal as described by a psychoacoustic model.

In S3, a first plurality of differences in loudness between the candidate audio signal and the current noise signal is calculated.

However, the inventors' research shows that if the absolute difference in loudness between the candidate audio signal and the current noise signal is directly applied to compensate the candidate audio signal, the candidate audio signal will be overcompensated.

To address the overcompensation problem, in S4, the first plurality of difference values is modified with a plurality of target factors, wherein the plurality of target factors are obtained based on a plurality of test noise signals and a corresponding plurality of test audio signals. In some embodiments, each of the plurality of target factors is greater than zero and less than one, so that the first plurality of difference values is reduced.
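A small numeric sketch may help show why a factor between zero and one limits the compensation. It assumes that each difference is expressed in dB as noise level minus audio level, that the modification is a per-Bark multiplication, and that the resulting gain is an amplitude gain; none of these details is stated explicitly at this point in the text, and the values are invented for illustration.

```python
def scale_differences(diff_db, target_factors):
    """Scale each per-Bark difference (dB) by its target factor."""
    if len(diff_db) != len(target_factors):
        raise ValueError("one target factor is required per Bark")
    return [factor * diff for factor, diff in zip(target_factors, diff_db)]


def db_to_linear_gain(db):
    """Convert a level change in dB into an amplitude gain."""
    return 10.0 ** (db / 20.0)


# Hypothetical values for three Barks: noise exceeds the audio by 12, 6 and 0 dB.
diff_db = [12.0, 6.0, 0.0]
target_factors = [0.5, 0.5, 0.8]

# Gains if the raw differences were applied directly.
full = [db_to_linear_gain(d) for d in diff_db]
# Gains after the differences are reduced by the target factors.
reduced = [db_to_linear_gain(d) for d in scale_differences(diff_db, target_factors)]
```

In the first Bark the uncorrected gain would be roughly 4x, whereas the factor of 0.5 halves the boost in dB and yields a gain of roughly 2x.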

In S5, the candidate audio signal is modified with the modified first plurality of difference values to obtain a target audio signal.

In some embodiments, modifying the candidate audio signal with the modified first plurality of difference values comprises: obtaining a first plurality of linear gain values based on the modified first plurality of difference values; and performing a multiplication between the candidate audio signal and the first plurality of linear gain values in the frequency domain to obtain a target audio signal in the frequency domain.

In some embodiments, the method of adjusting a candidate audio signal shown in Fig. 1 may further include: transforming the target audio signal from the frequency domain to the time domain, and outputting the target audio signal in the time domain.

In other embodiments, modifying the candidate audio signal with the modified first plurality of difference values may comprise: obtaining a first plurality of linear gain values based on the modified first plurality of difference values; and performing a convolution between the candidate audio signal and the first plurality of linear gain values in the time domain to obtain a target audio signal in the time domain. Then, the method of adjusting the candidate audio signal may further include outputting the target audio signal in the time domain.

Referring to fig. 2, fig. 2 schematically shows a flow chart of a method for adjusting a candidate audio signal according to another embodiment of the present invention. The method comprises the following steps.

In S31a and S31b, a current noise signal in the time domain in the environment and a candidate audio signal in the time domain are obtained, respectively, wherein the candidate audio signal is a multi-channel audio signal.

In S32a and S32b, the current noise signal and the multi-channel audio signal are transformed from the time domain to the frequency domain, respectively. Specifically, the multi-channel audio signal and the current noise signal may be transformed from the time domain to the frequency domain using, for example, a Fast Fourier Transform (FFT) method.

In S33, the multi-channel audio signal in the frequency domain is averaged to obtain a mono audio signal in the frequency domain. Specifically, a summation operation is performed on a multi-channel audio signal in the frequency domain, and then an arithmetic average of the sum in the frequency domain is obtained. All subsequent processing of the candidate audio signals is performed on the mono audio signal. In other embodiments, a more accurate loudness summation of a multi-channel audio signal may be performed according to a psychoacoustic model. Note that when the candidate audio signal is a monaural audio signal, step S33 may be omitted.
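The averaging in S33 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `dft` and `mono_spectrum` are hypothetical, and a naive discrete Fourier transform stands in for the FFT only to keep the example free of dependencies.

```python
import cmath

def dft(frame):
    """Naive O(N^2) discrete Fourier transform (an FFT would be used in practice)."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def mono_spectrum(channels):
    """Sum the per-channel spectra bin by bin and divide by the channel count."""
    spectra = [dft(ch) for ch in channels]
    n_ch = len(spectra)
    return [sum(bins) / n_ch for bins in zip(*spectra)]

# Hypothetical two-channel frame of four samples each.
left = [1.0, 0.0, -1.0, 0.0]
right = [0.0, 1.0, 0.0, -1.0]
mono = mono_spectrum([left, right])
```

Because the transform is linear, averaging the spectra gives the same result as transforming the average of the time-domain channels.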

In S34a and S34b, the PSD of the current noise signal in the frequency domain and the PSD of the mono audio signal in the frequency domain are calculated, respectively.
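The text does not state how the PSD is estimated. One common choice, assumed here purely for illustration, is the periodogram, i.e. the squared magnitude of each frequency bin divided by the frame length. The function name `periodogram` is hypothetical.

```python
import cmath

def periodogram(frame):
    """Squared DFT magnitude per bin, scaled by the frame length (assumed estimator)."""
    n = len(frame)
    psd = []
    for k in range(n // 2 + 1):  # non-negative frequencies only
        x_k = sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        psd.append(abs(x_k) ** 2 / n)
    return psd

# A cosine with a period of four samples concentrates its power in bin 2.
frame = [1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0, 0.0]
psd = periodogram(frame)
```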

In S35a and S35b, time and frequency smoothing is performed on the PSD of the current noise signal in the frequency domain and the PSD of the mono audio signal in the frequency domain, respectively, in order to simulate human perception of various noise sounds. The inventors have found that noise signals with different frequencies require different smoothing factors. Specifically, severe smoothing should be applied to the high frequency noise signal, and mild smoothing should be applied to the low frequency noise signal, such that the smoothing strength is inversely proportional to the frequency resolution in each bark.
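A minimal sketch of such smoothing is given below. The specific smoothers are assumptions, since the text does not name them: a first-order recursive average over frames, and a moving average over frequency whose window widens with the bin index so that high frequencies are smoothed more heavily. The names `smooth_time`, `smooth_frequency`, `alpha` and `max_half_width` are hypothetical.

```python
def smooth_time(prev, current, alpha):
    """First-order recursive smoothing across frames, bin by bin."""
    return [alpha * p + (1.0 - alpha) * c for p, c in zip(prev, current)]

def smooth_frequency(psd, max_half_width):
    """Moving average whose half-width grows linearly with the bin index."""
    n = len(psd)
    out = []
    for k in range(n):
        half = round(max_half_width * k / (n - 1)) if n > 1 else 0
        lo, hi = max(0, k - half), min(n, k + half + 1)
        out.append(sum(psd[lo:hi]) / (hi - lo))
    return out

previous = [0.0] * 8
current = [0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0]
in_time = smooth_time(previous, current, alpha=0.5)
in_freq = smooth_frequency(in_time, max_half_width=2)
```

In this example the low-frequency peak at bin 1 keeps its value, while the equally large peak at bin 6 is spread over its neighbours.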

In S36, nonlinear smoothing is performed on the PSD of the current noise signal in the frequency domain, so that compensation is not triggered too easily when a noise signal is detected and is terminated immediately when the noise signal ends.
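The form of the nonlinear smoothing is not specified. One smoother with the described behaviour, shown here only as an assumed example, rises slowly toward a higher noise level and drops at once to a lower one. The name `asymmetric_smooth` and the `attack` coefficient are hypothetical.

```python
def asymmetric_smooth(levels, attack=0.9):
    """Rise slowly toward higher levels, drop immediately to lower ones."""
    out = []
    state = levels[0]
    for x in levels:
        if x > state:
            # Slow onset: a brief noise burst does not trigger full compensation.
            state = attack * state + (1.0 - attack) * x
        else:
            # Immediate release: compensation stops as soon as the noise ends.
            state = x
        out.append(state)
    return out

# Noise level in one band over six frames (arbitrary units).
noise_psd = [0.0, 10.0, 10.0, 10.0, 0.0, 0.0]
tracked = asymmetric_smooth(noise_psd, attack=0.5)
```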

In S37, a pitch correction is performed on the PSD of the current noise signal in the frequency domain, wherein the pitch correction is obtained based on flatness measurements for a plurality of subbands.

Specifically, referring to the psychoacoustic model, pitch correction (i.e., attenuation) is quantified by:

$$\text{pitch shift} = \text{sharpness} \times (14.5 + \text{bark}) + (1 - \text{sharpness}) \times 5.5 \tag{1}$$

Where "14.5 + bark" and "5.5" are the correction values in decibels (dB) for an ideal tone (sharpness = 1) and ideal white noise (sharpness = 0), respectively; the sharpness is obtained by flatness measurement, and the sharpness values corresponding to different barks are generally normalized to between 0 and 1.
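Reading equation (1) as a sharpness-weighted blend of the two correction values, it can be evaluated as in the following sketch. The function name `pitch_shift_db` is hypothetical, and the sample bark index is chosen only for illustration.

```python
def pitch_shift_db(sharpness, bark):
    """Equation (1): blend of the ideal-tone and ideal-white-noise corrections, in dB."""
    if not 0.0 <= sharpness <= 1.0:
        raise ValueError("sharpness must lie in [0, 1]")
    return sharpness * (14.5 + bark) + (1.0 - sharpness) * 5.5

pure_tone = pitch_shift_db(1.0, bark=10)    # ideal tone
white_noise = pitch_shift_db(0.0, bark=10)  # ideal white noise
halfway = pitch_shift_db(0.5, bark=10)      # equal mix of the two
```

For bark 10 the correction is 24.5 dB for an ideal tone, 5.5 dB for ideal white noise, and 15 dB halfway between them.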

Equation (1) for determining pitch correction should be modified in view of the fact that noise to be corrected by equation (1) generally has a fairly wide bandwidth, whereas the noise signal in the current implementation is relatively narrow.

In some implementations, pitch correction is quantified by:

Pitch shift (14.5+ bark +1+30) + (1-sharpness) 5.5 (2)

Where sharpness is obtained by flatness measurement, and the flatness measurement is performed on a plurality of sub-bands obtained by dividing a complete band. In one implementation, a full band of noise covering a frequency band from 20 Hz to 20000 Hz is first split into eight to ten sub-bands, and then the flatness of each of the eight to ten sub-bands is measured. However, the number of sub-bands divided from the complete band is not limited thereto. The bandwidth of the sub-bands depends on the particular application.

Specifically, for each subband, the flatness and the frequency corresponding to the maximum amplitude of the noise are measured, and the measured frequency is transformed into a bark value in the bark domain. Using the measured flatness and bark value together with equation (2), the pitch offset can be derived.
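The per-subband measurement can be sketched as below. Two assumptions are made that the text does not confirm: flatness is taken as the ratio of the geometric mean to the arithmetic mean of the PSD, and the frequency-to-bark conversion uses the Zwicker-Terhardt approximation. The mapping from flatness to sharpness is not given in the text and is therefore left out. All function names are hypothetical.

```python
import math

def spectral_flatness(psd):
    """Geometric mean divided by arithmetic mean; close to 1.0 for a flat spectrum."""
    eps = 1e-12  # assumed floor that guards against log(0)
    log_mean = sum(math.log(p + eps) for p in psd) / len(psd)
    return math.exp(log_mean) / (sum(psd) / len(psd) + eps)

def hz_to_bark(freq_hz):
    """Zwicker-Terhardt approximation of the bark scale (assumed conversion)."""
    return 13.0 * math.atan(0.00076 * freq_hz) + 3.5 * math.atan((freq_hz / 7500.0) ** 2)

def analyse_subband(psd, freqs_hz):
    """Return the flatness of the subband and the bark value of its strongest bin."""
    peak = max(range(len(psd)), key=lambda k: psd[k])
    return spectral_flatness(psd), hz_to_bark(freqs_hz[peak])

# A noise-like subband and a tone-like subband (made-up values).
flat, bark_flat = analyse_subband([1.0, 1.0, 1.0, 1.0], [100.0, 200.0, 300.0, 400.0])
peaky, bark_peak = analyse_subband([0.001, 0.001, 10.0, 0.001], [700.0, 800.0, 1000.0, 1100.0])
```

The noise-like subband yields a flatness near 1, while the tone-like subband yields a flatness near 0 with its peak at roughly 8.5 bark.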

In other embodiments, to achieve a smoother modification from low to high frequencies, other correction options may be tested, one of which is:

pitch shift (sharpness) (14.5+ bark 2+1) + (1-sharpness) × 5.5 (3)

In S38a and S38b, the PSD of the current noise signal in the frequency domain and the PSD of the mono audio signal in the frequency domain are separately processed using a psychoacoustic masking model, so that a psychoacoustic masking relationship between the current noise signal and the mono audio signal can be obtained, and the current noise signal in the bark domain at a dB scale and the mono audio signal in the bark domain at a dB scale can be obtained.

Those skilled in the art will appreciate that psychoacoustic models are commonly used to study sound perception and masking effects between signals and masking sounds. The specific processing of the PSD of the noise signal and the PSD of the mono audio signal in the bark domain by the psycho-acoustic masking model will not be discussed in detail below.

In S39, a first plurality of difference values in the bark domain at the dB scale are calculated by performing a subtraction between the mono audio signal in the bark domain at the dB scale obtained in S38a and S38b and the current noise signal in the bark domain at the dB scale.

However, as previously mentioned, if the first plurality of difference values are directly applied to compensate the candidate audio signal, the candidate audio signal will be overcompensated. To address the overcompensation, a plurality of target factors are applied to modify the first plurality of difference values so as to reduce them.

In S40, the first plurality of difference values in the bark domain on a dB scale are correspondingly modified with a plurality of target factors in the bark domain, wherein the plurality of target factors in the bark domain are obtained based on the plurality of test noise signals and the corresponding plurality of test audio signals.

In some embodiments, each of the plurality of target factors in the bark domain is greater than zero and less than one.

In some embodiments, the plurality of target factors are adjusted manually, the plurality of target factors not being adaptive in real-time.
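Steps S39 and S40 can be sketched together as follows. The sketch assumes that "modifying" a difference value means multiplying it by the target factor of the same bark, which is consistent with factors between zero and one reducing the difference, and that the difference is taken as audio minus noise. The function name and the sample levels are hypothetical.

```python
def scaled_differences(audio_db, noise_db, target_factors):
    """Per-bark difference (audio minus noise, in dB) scaled by the bark's target factor."""
    if not (len(audio_db) == len(noise_db) == len(target_factors)):
        raise ValueError("all inputs must have one value per bark")
    return [tf * (a - n) for a, n, tf in zip(audio_db, noise_db, target_factors)]

# Three barks only, with made-up levels and factors.
audio_db = [60.0, 55.0, 50.0]
noise_db = [50.0, 65.0, 70.0]
target_factors = [0.8, 0.8, 0.5]
modified = scaled_differences(audio_db, noise_db, target_factors)
```

Here the raw differences of 10, -10 and -20 dB become 8, -8 and -10 dB, so the deficit to be compensated in the second and third barks is smaller than the raw loudness gap.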

In S41, a first plurality of linear gain values corresponding to different bark in the frequency domain is obtained based on the modified first plurality of difference values in the bark domain on a dB scale.

Specifically, the modified first plurality of difference values at the dB scale in the bark domain are transformed into a third plurality of difference values at the dB scale in the frequency domain using linear interpolation, and the third plurality of difference values at the dB scale in the frequency domain are subsequently transformed into a first plurality of linear gain values in the frequency domain.

In some embodiments, when the loudness of the candidate audio signal at a frequency point is greater than or equal to the loudness of the current noise signal at the frequency point, it may no longer be necessary to compensate for the candidate audio signal at the frequency point. Thus, when the difference value at a frequency point is greater than or equal to zero, that is, the candidate audio signal is greater than or equal to the current noise signal at the frequency point, the gain value at the frequency point is set to 1; and when the difference value at a frequency point is less than zero, that is, the candidate audio signal is less than the current noise signal at the frequency point, the gain value at the frequency point is set to a number greater than 1.
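The conversion from modified difference values to linear gain values can be sketched as below. The linear interpolation and the rule "gain of 1 for non-negative differences, gain above 1 otherwise" follow the text; the exact size of the boost, taken here as the amplitude gain that makes up the full modified deficit, is an assumption. The function names and sample values are hypothetical.

```python
def interpolate(x, xs, ys):
    """Piecewise-linear interpolation with flat extrapolation at both ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            w = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + w * (ys[i] - ys[i - 1])

def gains_per_bin(bin_barks, band_barks, band_diffs_db):
    """Map per-bark dB differences to one linear gain per frequency bin."""
    gains = []
    for b in bin_barks:
        diff_db = interpolate(b, band_barks, band_diffs_db)
        # No compensation where the audio is already at least as loud as the noise.
        gains.append(1.0 if diff_db >= 0.0 else 10.0 ** (-diff_db / 20.0))
    return gains

band_barks = [1.0, 2.0, 3.0]          # bark positions of the difference values
band_diffs_db = [6.0, -6.0, -20.0]    # modified differences in dB
bin_barks = [1.0, 1.5, 2.0, 3.0]      # bark positions of four frequency bins
gains = gains_per_bin(bin_barks, band_barks, band_diffs_db)
```

The first two bins are left untouched, and the last two are boosted by factors of about 2 and exactly 10 in amplitude.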

Thereafter, in S42, each of the multi-channel audio signals is modified with a first plurality of linear gain values in the frequency domain in order to obtain a target multi-channel audio signal.

In some implementations, the multi-channel audio signal is first transformed from the time domain to the frequency domain, and then each of the multi-channel audio signals in the frequency domain is multiplied by the first plurality of linear gain values in the frequency domain in order to obtain a target multi-channel audio signal in the frequency domain.

In some embodiments, the target multi-channel audio signal is transformed from the frequency domain to the time domain and then the target multi-channel audio signal in the time domain is output.

In other embodiments, the first plurality of linear gain values is transformed from the frequency domain to the time domain, and then each of the multi-channel audio signals in the time domain is convolved with the first plurality of linear gain values in the time domain to obtain the target multi-channel audio signal in the time domain, wherein the convolution may be implemented by an Infinite Impulse Response (IIR) filter or a Finite Impulse Response (FIR) filter. Subsequently, the target multi-channel audio signal in the time domain may be output.
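The two alternatives, multiplication in the frequency domain and convolution in the time domain, give the same result for a single frame when the convolution is circular. The sketch below checks this on one channel; a practical system would use an FFT and a properly designed FIR or IIR filter, so this is only an illustration of the equivalence. All names and values are hypothetical.

```python
import cmath

def dft(x, inverse=False):
    """Naive forward or inverse discrete Fourier transform."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out

def circular_convolve(signal, kernel):
    """Circular convolution of two equal-length sequences."""
    n = len(signal)
    return [sum(kernel[m] * signal[(t - m) % n] for m in range(n)) for t in range(n)]

gains = [2.0, 1.0, 0.0, 1.0]   # one real gain per bin, symmetric about the Nyquist bin
channel = [1.0, 2.0, 3.0, 4.0]  # one frame of one channel

# Time-domain route: transform the gains, then convolve.
kernel = [v.real for v in dft(gains, inverse=True)]
via_time = circular_convolve(channel, kernel)

# Frequency-domain route: multiply bin by bin, then transform back.
spectrum = dft(channel)
via_freq = [v.real for v in dft([g * s for g, s in zip(gains, spectrum)], inverse=True)]
```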

The inventors' studies indicate that candidate audio signals at different frequencies need to be compensated differently, and thus, in some embodiments, a target factor is determined separately for each bark in the bark domain.

Those skilled in the art will appreciate that there are 25 barks in the bark domain. Thus, 25 target factors, one for each of the 25 barks, would be calculated.

In some embodiments, obtaining 25 target factors comprises: obtaining a first predetermined number of target factors for a first predetermined number of barks in order to obtain a relationship between the first predetermined number of target factors and the first predetermined number of barks, wherein each of the first predetermined number of barks is at a center of a frequency band of a test noise signal; and obtaining a second predetermined number of the 25 target factors corresponding to a second predetermined number of barks in the bark domain based on the relationship between the first predetermined number of target factors and the first predetermined number of barks, wherein the second predetermined number of target factors is the remaining number of the 25 target factors other than the first predetermined number of target factors, such that a target factor corresponding to each of the barks in the bark domain may be obtained.

Note that the first predetermined number of target factors depends on the specific requirements for test accuracy and test complexity. The first predetermined number of target factors may be in the range of 2 to 25. The present invention does not impose a limit on the first predetermined number of target factors.

Referring to fig. 3, fig. 3 schematically illustrates a flow chart of a method for obtaining a plurality of target factors, according to an embodiment of the present invention. The method comprises the following steps.

In S381, a variable i is set to 1.

Subsequently, in step S382, it is determined whether the variable i is less than 25. If so, the method is directed to step S383; otherwise, the method is directed to step S390.

In step S383, a test noise signal and a test audio signal are obtained, wherein the test noise signal covers a frequency band from bark (i) to bark (i + a), wherein a is a preset number, so that the influence of the spreading function in the psychoacoustic masking model can be reduced and the overall tone impression and singing part of the audio signal can be adjusted.

As known to those skilled in the art, noise signals in natural environments are often diverse and have a wide frequency band. In the present embodiment, a plurality of test noise signals covering different frequency bands may be simulated and applied to determine a plurality of target factors, which will therefore also adapt to the noise signal in the natural environment, which may be considered as various combinations of the plurality of test noise signals.

In some embodiments, a band pass filter is used to simulate a test noise signal with white noise. Optionally, a low pass filter and a high pass filter may be applied at both cut-off frequencies of the test noise signal to increase frequency selectivity.

The test noise signal should not cover too narrow a frequency band: as its bandwidth increases, the influence of the spreading function in the psychoacoustic model is reduced, and the overall tone impression of the instruments and the singing part of the music can be adjusted. On the other hand, if the bandwidth of the test noise signal is too large, the determined plurality of target factors will not be sufficiently accurate, so the band should not be too wide either. Thus, the number of barks covered by the test noise signal should be neither too small nor too large, i.e. a moderate preset number a is required.

In some embodiments, the preset number a may be set to 2. That is, the test noise signal covers three barks: bark (i), bark (i +1) and bark (i + 2).

In other embodiments, the preset number a may be set to 4. That is, the test noise signal covers five barks.

Subsequently, a second plurality of differences in loudness between the test audio signal and the test noise signal needs to be obtained.

In S384, a second plurality of difference values in the bark domain at a dB scale is obtained by performing a subtraction between the test noise signal and the test audio signal.

In some embodiments, a method for obtaining a second plurality of differences in the bark domain on a dB scale comprises: respectively calculating the PSD of a test audio signal in a frequency domain and the PSD of a test noise signal in the frequency domain; processing the PSD of the test audio signal in the frequency domain and the PSD of the test noise signal in the frequency domain with a psychoacoustic masking model, respectively, to obtain an audio signal in the bark domain at a dB scale and to obtain a noise signal in the bark domain at a dB scale; and calculating a second plurality of difference values in the bark domain at the dB scale by performing a subtraction between the test audio signal in the bark domain at the dB scale and the test noise signal in the bark domain at the dB scale.

The method for obtaining the second plurality of difference values in the bark domain on a dB scale may refer to steps S32a through S38a, S32b through S38b, and S39 shown in fig. 2, which will not be discussed in detail below.

In step S385, the second plurality of difference values is modified by a target factor TF (i + a/2) at a center bark (i + a/2), where TF (i + a/2) ranges from 0 to 1. In some embodiments, an initial target factor TF (i + a/2) of 1 is used. Note that step S385 differs from step S40 shown in fig. 2 in that the first plurality of difference values are modified with a plurality of target factors in step S40.

In step S386, a second plurality of linear gain values in the frequency domain is obtained based on the modified second plurality of difference values in the bark domain on a dB scale.

The method for obtaining the second plurality of linear gain values may refer to step S41 in the method shown in fig. 2, which will not be discussed in detail below.

In S387, the test audio signal is modified with a second plurality of linear gain values. A specific method for modifying the test audio signal may refer to step S42 in the method shown in fig. 2, which will not be discussed in detail herein.

In step S388, the playback of the modified test audio signal is monitored and at the same time the target factor TF (i + a/2) at the center bark (i + a/2) is adjusted until the modified test audio signal meets the predetermined requirement.

In some embodiments, the predetermined requirement is that the modified test audio signal sounds natural or sounds as if no noise signal were present.

In other embodiments, the predetermined requirement is that the modified test audio signal can be slightly compensated in almost the entire band to improve the perceived signal-to-noise ratio.

In some embodiments, when a is an odd number, the center of the frequency band of the test audio signal is bark (i + (a-1)/2) or bark (i + (a + 1)/2); and when a is an even number, the center of the frequency band of the test audio signal is bark (i + a/2).

In S389, i is incremented by a +1, and the method is directed to S382.

When i is determined to be greater than or equal to 25, the method is directed to S390.
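The loop formed by S381, S382 and S389 determines which center barks receive a manually tuned target factor. The sketch below only enumerates those center barks for an even preset number a; the listening-based tuning of S383 to S388 is not modelled. The function name `centre_barks` is hypothetical.

```python
def centre_barks(a, n_barks=25):
    """Centre barks visited by the loop, for an even preset number a."""
    centres = []
    i = 1                           # S381: start at bark 1
    while i < n_barks:              # S382: continue while i is less than 25
        centres.append(i + a // 2)  # centre of the band bark(i) .. bark(i + a)
        i += a + 1                  # S389: move on to the next band
    return centres

centres_a2 = centre_barks(2)  # test noise covers three barks per cycle
centres_a4 = centre_barks(4)  # test noise covers five barks per cycle
```

With a = 2 the loop visits eight center barks (2, 5, 8, ..., 23), and with a = 4 it visits five (3, 8, 13, 18, 23); the remaining barks are left to S390.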

Note that, before step S390, a first predetermined number of target factors for a first predetermined number of barks, each of which is at the center of the frequency band of the test noise signal, is obtained, so that a relationship between the calculated first predetermined number of target factors and the first predetermined number of barks can be obtained.

In S390, a second predetermined number of target factors, corresponding to a second predetermined number of barks in the bark domain that are not at the center of the band of any of the plurality of test noise signals, are calculated by linear interpolation based on the relationship between the first predetermined number of target factors and the first predetermined number of barks. Thus, for a bark that is not at the center of the band of a test noise signal, the corresponding target factor can be linearly interpolated from the adjacent barks that are at the centers of the bands of test noise signals.
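The interpolation in S390 can be sketched as follows. The tuned values are made up, and holding the first and last tuned values constant for barks outside the tuned range is an assumption, since the text does not say how those barks are handled. The function name `fill_target_factors` is hypothetical.

```python
def fill_target_factors(measured, n_barks=25):
    """Fill in one target factor per bark from factors tuned at the centre barks.

    measured maps a bark index to its tuned target factor.
    """
    centres = sorted(measured)
    factors = {}
    for b in range(1, n_barks + 1):
        if b <= centres[0]:
            factors[b] = measured[centres[0]]    # assumed: hold the first value
        elif b >= centres[-1]:
            factors[b] = measured[centres[-1]]   # assumed: hold the last value
        else:
            hi = next(c for c in centres if c >= b)
            lo = centres[centres.index(hi) - 1]
            w = (b - lo) / (hi - lo)
            factors[b] = measured[lo] + w * (measured[hi] - measured[lo])
    return factors

# Hypothetical factors tuned at the five centre barks obtained with a = 4.
measured = {3: 0.7, 8: 0.8, 13: 0.9, 18: 0.8, 23: 0.7}
factors = fill_target_factors(measured)
```

Bark 5, for example, lies two fifths of the way from bark 3 to bark 8 and therefore receives a factor of about 0.74.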

It is noted that the above-described method for obtaining a plurality of target factors simplifies the calculation by first obtaining a first predetermined number of target factors and then calculating a second predetermined number of target factors based on the first predetermined number of target factors. In practice, steps S381 to S389 may instead be applied to determine each of the plurality of target factors.

In some embodiments, the preset number a can be two, three, four, or five, and so on.

In some embodiments, the preset number a may be different in each cycle of the method shown in fig. 3.

In some implementations, a plurality of test noise signals and a plurality of test audio signals are applied to iteratively adjust the plurality of target factors such that the determined plurality of target factors are adapted to a majority of the noise signals and audio signals. For example, three test audio signals and three test noise signals are provided for each cycle of the method shown in fig. 3, and if time permits, steps S383 to S388 in each cycle may be performed 3 plus 3 times, so that a particular target factor is determined by using various combinations of the three test audio signals and the three noise signals.

In particular, taking into account the spreading function between barks, a target factor from 0.7 to 0.9 may be determined for each bark, which may reduce the accumulated gain from adjacent barks. Note that the range of target factors from 0.7 to 0.9 has not been extensively tested across various audio signals, noise signals, and reference listening levels. Thus, in other embodiments, the plurality of target factors may have different ranges for different audio signals, different noise signals, and different reference listening levels. The present invention is not limited thereto.

By the method shown in fig. 3, the target factors TF (i) of bark (i) are respectively calculated, wherein i ranges from 1 to 25. The 25 target factors in the bark domain may then be applied to modify the first plurality of difference values as described in fig. 1 and 2.

As shown in fig. 3, in each cycle of the flowchart, a test noise signal and a test audio signal are used. Thus, the plurality of target factors is obtained based on a plurality of test noise signals and a corresponding plurality of test audio signals.

In some embodiments, the frequency bands of the plurality of test noise signals range from low frequencies to high frequencies.

In other embodiments, the frequency bands of the plurality of test noise signals range from high frequencies to low frequencies.

In some implementations, the test noise signal has a frequency band that overlaps at least a portion of a frequency band of the test audio signal.

In some embodiments, the test audio signal is band-passed during the adjustment of the target factor of the test noise signal for each frequency band, but it may be more difficult to judge the tone impression of the original audio signal.

Note that for low-frequency and high-frequency test noise signals, a test audio signal with wide spectrum coverage is required to prevent the test audio signal from being unable to cover the test noise signal in the frequency band, and thus unable to adjust the target factor of the test noise signal in the low-frequency region or the high-frequency region.

In some embodiments, some classical music signals are selected as the plurality of test audio signals. In some embodiments, sound from a drum or cello or male voice may be used as the test audio signal, and low-frequency noise may be used as the test noise signal. In other embodiments, a sound from a piano or violin or a female sound may be used as the test audio signal, and high-frequency noise may be used as the test noise signal.

In addition, the invention also provides a device for adjusting the audio signal. Referring to fig. 4, fig. 4 schematically shows an apparatus 40 for adjusting an audio signal according to an embodiment of the present invention.

The apparatus 40 comprises a first obtaining circuit 401, a second obtaining circuit 402, a calculating circuit 403, a first modifying circuit 404 and a second modifying circuit 405. The first obtaining circuit 401 is configured to obtain a candidate audio signal; the second obtaining circuit 402 is configured to obtain a current noise signal in the environment; the calculating circuit 403 is configured to calculate a first plurality of difference values of loudness between the candidate audio signal and the current noise signal; the first modifying circuit 404 is configured to modify the first plurality of difference values with a plurality of target factors, wherein the plurality of target factors are obtained based on a plurality of test noise signals and a corresponding plurality of test audio signals; and the second modifying circuit 405 is configured to modify the candidate audio signal with the modified first plurality of difference values to obtain the target audio signal.

In some embodiments, the plurality of target factors are pre-stored in a storage device, and the first modifying circuit 404 is further configured to load the plurality of target factors from the storage device to modify the first plurality of difference values.

The functions of each of the first obtaining circuit, the second obtaining circuit, the calculating circuit, the first modifying circuit and the second modifying circuit may be referred to correspondingly with respect to the description of the method for adjusting an audio signal as discussed above in fig. 1-2, which will not be described in detail herein.

In some embodiments, the apparatus for adjusting the candidate audio signal may be a processor, such as a Central Processing Unit (CPU), a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), or the like.

In addition, the invention also provides an audio system for controlling audio playback.

Referring to fig. 5, fig. 5 schematically illustrates an audio system 50 according to an embodiment of the present invention.

In some embodiments, the audio system 50 may comprise a master device 51 and an audio playback device 52, wherein the master device 51 comprises a multimedia signal source 510, a processor 511 and a storage device 512, the audio playback device 52 is equipped with a microphone 521, and the audio playback device 52 is in communication with the master device 51. The communication means between the audio playback device 52 and the main device 51 may be in a wired or wireless form.

In some embodiments, the audio playback device 52 is a headset and the main device 51 is a mobile phone or a computer.

Specifically, multimedia signal source 510 is configured to provide multimedia data. In some embodiments, the multimedia signal source 510 may be a multimedia player in a mobile phone or computer.

The audio playback device 52 is configured to play audio signals of the multimedia data. In some embodiments, the audio playback device 52 may be a headset. In other embodiments, an external loudspeaker or earpiece may be used as the audio playback device 52.

The microphone 521 is configured to detect noise signals in the environment. In some embodiments, the microphone 521 may be a separate microphone from the audio playback device 52. In other embodiments, the audio playback device 52 is a headset and the microphone 521 may be an in-line microphone integrated into the headset through an audio line.

Note that the audio signal played by the audio playback device 52 is acoustically isolated from the microphone 521. The isolation can be achieved by providing a sufficiently long distance between the audio playback device 52 and the microphone 521 and by good control of leakage from the audio playback device. If the audio playback device 52 is a headset with an embedded microphone on the audio line, the headset and the embedded microphone need to be acoustically calibrated by means of a reference microphone.

The storage 512 is configured to store a plurality of target factors in the bark domain, wherein the plurality of target factors are obtained based on a plurality of test noise signals and a corresponding plurality of test audio signals. The storage device may be a Secure Digital (SD) card, an optical or magnetic disk, or the like. The method for obtaining the plurality of target factors may refer to fig. 3, which will not be discussed in detail herein.

The processor 511 is configured to perform: obtaining a candidate audio signal to be played by the audio playback device 52; obtaining a current noise signal detected by the microphone 521; calculating a first plurality of difference values of loudness between the candidate audio signal and the current noise signal; modifying the first plurality of difference values with a plurality of target factors stored in the storage device 512; modifying the candidate audio signal with the modified first plurality of difference values to obtain a target audio signal; and controlling the audio playback device 52 to play the target audio signal. The processor 511 may be a CPU, DSP, FPGA, or the like.

In some embodiments, before the processor 511 modifies the first plurality of difference values with the plurality of target factors, the processor 511 is further configured to load the plurality of target factors from the storage device 512.

The functions of the processor 511 may be referred to the description regarding the method shown in fig. 1 to 2 and the apparatus for adjusting an audio signal shown in fig. 4, respectively, which will not be discussed in detail herein.

Referring to fig. 6, fig. 6 schematically illustrates an audio system 60 according to another embodiment of the present invention.

An audio system 60 for controlling audio playback comprises a main device 61 and an audio playback device 62, wherein the audio playback device 62 is equipped with a processor 620, a storage device 621 and a microphone 622, the main device 61 being in communication with the audio playback device 62 and being configured to provide a multimedia signal source 610.

In some embodiments, audio playback device 62 may be a headset, processor 620 may be a chip embedded in the headset, storage device 621 may be a memory card in the headset, and microphone 622 may be an embedded microphone integrated in the headset through an audio line.

In some embodiments, the main device 61 may be a computer. In other embodiments, the main device 61 may be a mobile phone. The main device 61 communicates with the audio playback device 62 by wire or wirelessly.

The functions of the processor 620, the storage 621, the microphone 622, the audio playback device 62 and the multimedia signal source 610 may refer to corresponding elements shown in fig. 5, which will not be discussed in detail herein.

A person skilled in the art will appreciate that all or part of the steps of the various methods of the embodiments described above may be performed by means of software or hardware. The software may include Visual Studio 2010 or later, audio, debugging using Visual Studio, stand-alone sample recording, and calibration/recording setup using sound cards. The hardware responsive to some computer programs may include a Fireface UC sound card, a microphone input port, a headset output port, a reference headset with an embedded microphone, a reference microphone for acoustic calibration, an external loudspeaker for real-world noise environment simulation, and so forth. The computer program may be stored in a computer readable storage medium. The storage medium may be an optical disc, a magnetic disc, a Read Only Memory (ROM), a Random Access Memory (RAM), or the like.

Note that the reference microphone is not part of the end-user product; rather, it is a tool for calibrating the audio playback device of the headset and the microphone used for detecting noise (such as an in-line microphone), and for calibrating the acoustic transfer function between the audio playback device and the microphone used for detecting noise.

Note that in order to accurately estimate the audio signal and the noise signal, the acoustic calibration should be applied to the hardware involved in the method for adjusting the candidate audio signal before starting the method. Specifically, the audio playback device, the microphone, and the noise path between the microphone and the audio playback device will be acoustically calibrated. In addition, the degree of crosstalk between the audio playback device and the microphone signal should be measured to assess the effect of the degree of crosstalk on the noise estimation, since the time constants for smoothing music and noise are different, and microphone sensitivity calibration may amplify the degree of crosstalk.

In summary, the present invention has the following advantages.

In case a noise signal is present in the environment, a first plurality of difference values of the loudness between the candidate audio signal and the current noise signal is calculated and modified with a plurality of target factors, and subsequently the candidate audio signal is modified with the modified first plurality of difference values to obtain the target audio signal, such that a loss of perceived loudness of the audio signal due to competing sounds of the noise signal may be compensated. A plurality of target factors is obtained based on a plurality of test noise signals and a corresponding plurality of test audio signals covering a wide frequency band, such that the plurality of target factors is adapted to almost all noise signals in the environment.

In addition, a plurality of target factors are obtained for each bark in the bark domain, so that the first plurality of difference values can be modified at each bark and the candidate audio signals can be accurately adjusted to avoid overcompensation as much as possible.

While various aspects and embodiments have been disclosed herein, other aspects and embodiments will be apparent to those skilled in the art. The various aspects and embodiments disclosed herein are for purposes of illustration and are not intended to be limiting, with the true scope and spirit being indicated by the following claims.
