Method and device for post-processing of two-channel audio

Document No.: 73357. Publication date: 2021-10-01.

Note: this technology, Method and device for post-processing of two-channel audio (双声道音频后处理的方法和装置), was created by 范欣悦, 初钧晗, 吴珈旻, 张晨, and 郑羲光 on 2021-07-06.

Abstract: A method and an apparatus for two-channel audio post-processing are disclosed. The method comprises: dividing a left channel into a left channel low-frequency part and a left channel mid-high frequency part, and dividing a right channel into a right channel low-frequency part and a right channel mid-high frequency part; performing sound field widening on the left channel mid-high frequency part and the right channel mid-high frequency part respectively, to obtain a widened left channel mid-high frequency part and a widened right channel mid-high frequency part; and superimposing the widened left channel mid-high frequency part on the left channel low-frequency part, and superimposing the widened right channel mid-high frequency part on the right channel low-frequency part, to obtain widened two-channel audio.

1. A method of binaural audio post-processing, the method comprising:

dividing a left sound channel into a left sound channel low-frequency part and a left sound channel medium-high frequency part, and dividing a right sound channel into a right sound channel low-frequency part and a right sound channel medium-high frequency part;

respectively carrying out sound field broadening on the middle-high frequency part of the left sound channel and the middle-high frequency part of the right sound channel so as to obtain a broadened middle-high frequency part of the left sound channel and a broadened middle-high frequency part of the right sound channel; and

superposing the widened middle-high frequency part of the left sound channel and the low-frequency part of the left sound channel, and superposing the widened middle-high frequency part of the right sound channel and the low-frequency part of the right sound channel, to obtain the widened two-channel audio.

2. The method of claim 1, wherein the step of separately field-widening the left channel mid-high frequency portion and the right channel mid-high frequency portion to obtain a widened left channel mid-high frequency portion and a widened right channel mid-high frequency portion comprises:

applying different filtering frequency response curves to the left channel mid-high frequency portion and the right channel mid-high frequency portion for equalization, so as to increase the difference between the equalized left channel mid-high frequency portion and the equalized right channel mid-high frequency portion; and

performing gain compensation on the equalized left channel mid-high frequency portion and the equalized right channel mid-high frequency portion to obtain the widened left channel mid-high frequency portion and the widened right channel mid-high frequency portion, such that the volume of the widened left channel mid-high frequency portion is the same as that of the left channel mid-high frequency portion, and the volume of the widened right channel mid-high frequency portion is the same as that of the right channel mid-high frequency portion.

3. The method of claim 1, further comprising: performing secondary widening on the widened two-channel audio.

4. The method of claim 3, wherein the step of performing secondary widening on the widened two-channel audio comprises:

respectively carrying out short-time Fourier transform on the left channel and the right channel of the widened two-channel audio;

calculating an amplitude and a phase corresponding to each frequency of each frame based on a frequency domain obtained by short-time Fourier transform;

calculating an azimuth angle based on the amplitude;

adjusting the azimuth angle so that the adjusted azimuth angle is widened;

calculating an adjusted amplitude based on the adjusted azimuth angle; and

performing an inverse short-time Fourier transform based on the adjusted amplitude and the corresponding phase to obtain the secondarily widened two-channel audio.

5. The method of claim 1, wherein the step of dividing the left channel into a left channel low frequency portion and a left channel mid-high frequency portion and the step of dividing the right channel into a right channel low frequency portion and a right channel mid-high frequency portion comprise: passing the left channel through a crossover filter to obtain the left channel low frequency portion and the left channel mid-high frequency portion, and passing the right channel through the crossover filter to obtain the right channel low frequency portion and the right channel mid-high frequency portion.

6. The method of claim 1, wherein the crossover point between the left channel low frequency portion and the left channel mid-high frequency portion and the crossover point between the right channel low frequency portion and the right channel mid-high frequency portion are both 200 Hz.

7. An apparatus for binaural audio post-processing, the apparatus comprising:

a frequency dividing unit configured to divide a left channel into a left channel low frequency part and a left channel middle high frequency part, and to divide a right channel into a right channel low frequency part and a right channel middle high frequency part;

a middle-high frequency sound field widening unit configured to perform sound field widening on the left channel middle-high frequency portion and the right channel middle-high frequency portion respectively to obtain a widened left channel middle-high frequency portion and a widened right channel middle-high frequency portion; and

a superimposing unit configured to superimpose the widened left channel middle-high frequency portion and the left channel low frequency portion, and superimpose the widened right channel middle-high frequency portion and the right channel low frequency portion to obtain a widened two-channel audio.

8. An electronic device, characterized in that the electronic device comprises:

at least one processor;

at least one memory storing computer-executable instructions,

wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to perform a method of binaural audio post-processing as claimed in any of claims 1-6.

9. A computer-readable storage medium, wherein instructions stored in the computer-readable storage medium, when executed by at least one processor, cause the at least one processor to perform a method of binaural audio post-processing as claimed in any of claims 1-6.

10. A computer program product comprising computer instructions, characterized in that the computer instructions, when executed by at least one processor, implement the method of binaural audio post-processing according to any of claims 1-6.

Technical Field

The present disclosure relates to the field of audio technologies, and in particular, to a method and an apparatus for dual-channel audio post-processing.

Background

For most streaming media platforms, only two channels can be directly acquired, so how to create a stereo surround sound feeling, expand a sound field, improve a listening feeling, and attract the attention of a user becomes a problem concerned in the current audio technology field.

In the prior art, stereo sound is usually converted to the Mid/Side (MS) format, i.e., into a mid signal M and a side signal S. The stereo width can then be controlled by adjusting the ratio between the mid signal M and the side signal S.

M = (L + R)/2 (1)

S = (L - R)/2 (2)

However, this approach cannot achieve significant sound field widening for audio whose left and right channels are similar.
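The limitation can be illustrated with a minimal sketch that applies equations (1) and (2) and rescales S before decoding back to L and R. The function name `ms_widen` and the `width` parameter are hypothetical and are not part of the disclosure. When L and R are identical, S is zero, so no value of `width` changes the output.

```python
def ms_widen(left, right, width):
    """Scale the side signal by `width` and decode back to left/right."""
    out_l, out_r = [], []
    for l, r in zip(left, right):
        m = (l + r) / 2.0          # equation (1)
        s = (l - r) / 2.0          # equation (2)
        s *= width                 # width > 1 widens, width < 1 narrows
        out_l.append(m + s)        # L = M + S
        out_r.append(m - s)        # R = M - S
    return out_l, out_r


# Identical channels: S is zero everywhere, so widening has no effect.
mono_l, mono_r = ms_widen([0.5, -0.25], [0.5, -0.25], width=2.0)

# Different channels: the left/right difference is doubled.
wide_l, wide_r = ms_widen([1.0, 0.0], [0.0, 1.0], width=2.0)
```

In the first call the output equals the input, which is the case the present disclosure addresses by filtering the two channels differently.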

Disclosure of Invention

The present disclosure is directed to a method and an apparatus for binaural audio post-processing, so as to achieve an effect of obviously expanding a sound field by performing different processing on left and right channels.

According to a first aspect of embodiments of the present disclosure, there is provided a method of two-channel audio post-processing, the method comprising: dividing a left channel into a left channel low-frequency part and a left channel mid-high frequency part, and dividing a right channel into a right channel low-frequency part and a right channel mid-high frequency part; performing sound field widening on the left channel mid-high frequency part and the right channel mid-high frequency part respectively, to obtain a widened left channel mid-high frequency part and a widened right channel mid-high frequency part; and superimposing the widened left channel mid-high frequency part on the left channel low-frequency part, and superimposing the widened right channel mid-high frequency part on the right channel low-frequency part, to obtain widened two-channel audio.

Optionally, the step of performing sound field widening on the left channel mid-high frequency part and the right channel mid-high frequency part respectively, to obtain a widened left channel mid-high frequency part and a widened right channel mid-high frequency part, may include: applying different filtering frequency response curves to the left channel mid-high frequency part and the right channel mid-high frequency part for equalization, so as to increase the difference between the equalized left channel mid-high frequency part and the equalized right channel mid-high frequency part; and performing gain compensation on the equalized left channel mid-high frequency part and the equalized right channel mid-high frequency part to obtain the widened left channel mid-high frequency part and the widened right channel mid-high frequency part, wherein the widened left channel mid-high frequency part has the same volume as the left channel mid-high frequency part, and the widened right channel mid-high frequency part has the same volume as the right channel mid-high frequency part.

Optionally, the method may further include: performing secondary widening on the widened two-channel audio.

Optionally, the step of performing secondary widening on the widened two-channel audio may include: performing a short-time Fourier transform on the left channel and the right channel of the widened two-channel audio respectively; calculating an amplitude and a phase corresponding to each frequency of each frame based on the frequency domain obtained by the short-time Fourier transform; calculating an azimuth angle based on the amplitude; adjusting the azimuth angle so that the adjusted azimuth angle is widened; calculating an adjusted amplitude based on the adjusted azimuth angle; and performing an inverse short-time Fourier transform based on the adjusted amplitude and the corresponding phase to obtain secondarily widened two-channel audio.

Optionally, the step of dividing the left channel into a left channel low frequency part and a left channel mid-high frequency part, and the step of dividing the right channel into a right channel low frequency part and a right channel mid-high frequency part, may include: passing the left channel through a crossover filter to obtain the left channel low frequency part and the left channel mid-high frequency part, and passing the right channel through a crossover filter to obtain the right channel low frequency part and the right channel mid-high frequency part.

Optionally, the crossover point between the left channel low frequency part and the left channel mid-high frequency part and the crossover point between the right channel low frequency part and the right channel mid-high frequency part may both be 200 Hz.

Optionally, the filtering frequency response curves may be generated using a comb filter.

Optionally, the step of adjusting the azimuth angle to widen the adjusted azimuth angle may include: setting the adjusted azimuth angle as p times of the original azimuth angle, and setting the adjusted azimuth angle as 90 degrees when the original azimuth angle is larger than 90/p degrees, wherein p is a real number larger than 1.
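The adjustment rule above can be written as a short function. This is a minimal sketch that assumes the azimuth angle is expressed in degrees and lies in the range 0 to 90; the function name `widen_azimuth` is hypothetical.

```python
def widen_azimuth(theta_deg, p):
    """Return p * theta, clipped to 90 degrees (theta assumed in [0, 90])."""
    if p <= 1:
        raise ValueError("p must be a real number greater than 1")
    if theta_deg > 90.0 / p:
        return 90.0                 # beyond 90/p the result saturates at 90
    return p * theta_deg


# With p = 1.5 the saturation threshold is 90 / 1.5 = 60 degrees.
examples = [widen_azimuth(a, p=1.5) for a in (0.0, 30.0, 60.0, 75.0)]
```

With p = 1.5, angles of 0, 30, 60, and 75 degrees map to 0, 45, 90, and 90 degrees respectively.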

Optionally, the step of performing secondary widening on the widened two-channel audio may further include: enhancing the low-frequency part of the secondarily widened two-channel audio.

According to a second aspect of embodiments of the present disclosure, there is provided an apparatus for two-channel audio post-processing, the apparatus comprising: a frequency dividing unit configured to divide a left channel into a left channel low frequency part and a left channel mid-high frequency part, and to divide a right channel into a right channel low frequency part and a right channel mid-high frequency part; a mid-high frequency sound field widening unit configured to perform sound field widening on the left channel mid-high frequency part and the right channel mid-high frequency part respectively, to obtain a widened left channel mid-high frequency part and a widened right channel mid-high frequency part; and a superimposing unit configured to superimpose the widened left channel mid-high frequency part on the left channel low frequency part, and to superimpose the widened right channel mid-high frequency part on the right channel low frequency part, to obtain widened two-channel audio.

Optionally, the mid-high frequency sound field widening unit may be configured to perform the steps of: applying different filtering frequency response curves to the left channel mid-high frequency part and the right channel mid-high frequency part for equalization, so as to increase the difference between the equalized left channel mid-high frequency part and the equalized right channel mid-high frequency part; and performing gain compensation on the equalized left channel mid-high frequency part and the equalized right channel mid-high frequency part to obtain the widened left channel mid-high frequency part and the widened right channel mid-high frequency part, wherein the volume of the widened left channel mid-high frequency part is the same as that of the left channel mid-high frequency part, and the volume of the widened right channel mid-high frequency part is the same as that of the right channel mid-high frequency part.

Optionally, the apparatus may further include: a secondary widening unit configured to secondarily widen the widened binaural audio.

Optionally, the secondary widening unit may be configured to perform the following steps: performing a short-time Fourier transform on the left channel and the right channel of the widened two-channel audio respectively; calculating an amplitude and a phase corresponding to each frequency of each frame based on the frequency domain obtained by the short-time Fourier transform; calculating an azimuth angle based on the amplitude; adjusting the azimuth angle so that the adjusted azimuth angle is widened; calculating an adjusted amplitude based on the adjusted azimuth angle; and performing an inverse short-time Fourier transform based on the adjusted amplitude and the corresponding phase to obtain secondarily widened two-channel audio.

Optionally, the frequency dividing unit may be configured to pass the left channel through a crossover filter to obtain the left channel low frequency part and the left channel mid-high frequency part, and to pass the right channel through a crossover filter to obtain the right channel low frequency part and the right channel mid-high frequency part.

Optionally, the mid-high frequency sound field widening unit may be configured to generate the filtering frequency response curves using a comb filter.

Optionally, the secondary widening unit may be configured to set the adjusted azimuth angle to p times the original azimuth angle, and to set the adjusted azimuth angle to 90 degrees when the original azimuth angle is greater than 90/p degrees, where p is a real number greater than 1.

Optionally, the secondary widening unit may be further configured to enhance the low-frequency part of the secondarily widened two-channel audio.

According to a third aspect of embodiments of the present disclosure, there is provided an electronic apparatus, characterized by comprising: at least one processor; at least one memory storing computer-executable instructions, wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to perform a method of binaural audio post-processing according to the present disclosure.

According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium characterized in that instructions stored in the computer-readable storage medium, when executed by at least one processor, cause the at least one processor to perform a method of binaural audio post-processing according to the present disclosure.

According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product comprising computer instructions, characterized in that the computer instructions, when executed by at least one processor, implement a method of binaural audio post-processing according to the present disclosure.

The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:

A crossover filter is used to separate each of the left and right channels of the two-channel audio into a low-frequency part and a mid-high frequency part, and the sound field is widened by applying different filtering only to the mid-high frequency parts of the left and right channels. Because low-frequency content is generally located in the middle of the sound field, the low-frequency sound field is left unchanged as far as possible. In addition, the sound field is processed a second time in the frequency domain obtained by the short-time Fourier transform: the azimuth angle is calculated and the sound source localization is changed, thereby creating a surround sound effect. As a result, the sound field of the two-channel audio is widened, a simulated surround sound effect is achieved, the sense of presence and the listener's sense of immersion are improved, the problem of a narrow sound field is alleviated, and audio beautification is realized.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as unduly limiting the disclosure.

Fig. 1 is an application scenario diagram illustrating a method and apparatus for binaural audio post-processing according to the present disclosure;

FIG. 2 is a flow diagram illustrating a method of binaural audio post-processing according to an exemplary embodiment;

FIG. 3 is an example illustrating a crossover filter according to an example embodiment;

FIG. 4 is a flow diagram illustrating sound field widening of the mid-high frequency portion according to an exemplary embodiment;

FIG. 5 is an example illustrating different filtering frequency response curves applied to a left channel mid-high frequency portion and a right channel mid-high frequency portion in accordance with an illustrative embodiment;

FIG. 6 is a circuit diagram illustrating a comb filter according to an exemplary embodiment;

FIG. 7 is a schematic diagram illustrating the frequency response of a comb filter in accordance with an exemplary embodiment;

FIG. 8 is a flowchart illustrating a process for secondary widening of widened two-channel audio according to an exemplary embodiment;

FIG. 9 is an example illustrating an overall flow of a widening process for binaural audio according to an exemplary embodiment;

FIG. 10 is a block diagram illustrating an apparatus for two-channel audio post-processing according to an example embodiment; and

FIG. 11 is a block diagram illustrating an electronic device according to an embodiment of the present disclosure.

Detailed Description

In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.

It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the following exemplary examples do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.

In view of the above-mentioned problem in the background art, namely that significant sound field widening cannot be achieved for audio whose left and right channels are similar, the present disclosure provides a method of two-channel audio post-processing. More specifically, a crossover filter is used to separate each of the left and right channels into a low-frequency part and a mid-high frequency part, and different filtering is applied only to the mid-high frequency parts of the left and right channels to achieve the sound field widening effect. In addition, the sound field is processed a second time in the frequency domain obtained by the short-time Fourier transform: the azimuth angle is calculated and the sound source localization is changed, thereby creating a surround sound effect.

Fig. 1 is an application scenario diagram illustrating a method and apparatus for binaural audio post-processing according to the present disclosure.

The audio source 101 may upload its audio resources to the server 102 over the network. Here, the audio source 101 may be a publishing end or a transmitting end of an audio resource, for example, a personal blogger, a news website, a literature website, a user terminal of an uploader, and the like. The server 102 may be a recipient that obtains the audio resource, such as an audio podcast platform. A user terminal (e.g., the mobile phone 103_1, the notebook computer 103_2, the tablet computer 103_3, etc.) can obtain the audio resource from the server 102 through an application client via the network. Due to objective limitations, the audio resources uploaded from the audio source 101 usually have problems such as a narrow sound field and damaged audio. Consequently, when the server 102 transmits an audio resource requested by a user to the user terminal 103_1, 103_2, or 103_3, the user encounters the same problems of a narrow sound field, damaged audio, and the like when listening to the audio resource through the application client of the user terminal. The method and apparatus for two-channel audio post-processing according to the present disclosure may be applied to at least one of the audio source 101, the server 102, and the user terminal in this scenario, so that even if the audio resource uploaded by the audio source 101 to the server 102 has problems such as a narrow sound field or damaged audio, the user can obtain an audio resource with a sound field widening effect and/or an audio beautifying effect.

Fig. 2 is a flowchart illustrating a method of binaural audio post-processing according to an exemplary embodiment.

Referring to fig. 2, in step S100, a left channel is divided into a left channel low frequency part and a left channel mid-high frequency part, and a right channel is divided into a right channel low frequency part and a right channel mid-high frequency part. According to the exemplary embodiments of the present disclosure, the low frequency part is not subjected to the sound field widening process, considering that the directivity of the low frequency part of two-channel audio is not strong; for example, the bass drum, the bass, and the human voice are generally in the middle of the sound field. In this case, the original two-channel audio can be divided into a low-frequency part where the directivity is weak and a mid-high frequency part where the directivity is relatively strong.

According to an exemplary embodiment of the present disclosure, the left channel may be passed through a crossover filter to obtain the left channel low frequency portion and the left channel mid-high frequency portion, and the right channel may be passed through a crossover filter to obtain the right channel low frequency portion and the right channel mid-high frequency portion. In an example embodiment, the crossover point between the left channel low frequency part and the left channel mid-high frequency part and the crossover point between the right channel low frequency part and the right channel mid-high frequency part may both be set to 200 Hz. For convenience of description, the specification uses a crossover filter for frequency division and sets the crossover point to 200 Hz. However, the frequency division method and the crossover point can be set according to the requirements of the audio resource. An example of a crossover filter will be further described with reference to fig. 3.

In step S200, the left channel mid-high frequency portion and the right channel mid-high frequency portion are sound field-broadened to obtain a broadened left channel mid-high frequency portion and a broadened right channel mid-high frequency portion, respectively. Step S200 will be described in detail below with reference to fig. 4 and 5.

In step S300, the widened mid-high frequency portions and the corresponding low frequency portions are superimposed to obtain widened two-channel audio. For example, the widened left channel mid-high frequency portion and the left channel low frequency portion may be superimposed to obtain the widened left channel audio; the widened right channel mid-high frequency portion and the right channel low frequency portion may be superimposed to obtain the widened right channel audio.

Fig. 3 is an example illustrating a crossover filter according to an example embodiment.

According to an exemplary embodiment of the present disclosure, an original signal (e.g., either the left channel original signal or the right channel original signal) may be passed through a crossover filter, thereby dividing the audio signal into a low frequency part and a mid-high frequency part. The crossover filter is a multi-band filter whose overall magnitude response is relatively flat. Referring to fig. 3, a low frequency portion is shown by a dotted line, and a middle and high frequency portion is shown by a dotted line. For convenience of description, 200 Hz is selected as the crossover point (the intersection of the low frequency curve and the mid-high frequency curve) in fig. 3, by which a signal is divided into two frequency bands: the low frequency part and the mid-high frequency part. The low frequency part is not subjected to the sound field widening process, and the mid-high frequency part is subjected to the sound field widening in step S200 shown in fig. 2.
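The band split can be sketched as follows. The disclosure does not specify the filter design here, so this sketch substitutes a one-pole low-pass filter and takes the mid-high band as the remainder, which guarantees that the two bands sum back to the input (a flat overall response). The function name `split_bands` and the 48 kHz sample rate are assumptions; a practical crossover would typically use steeper filters, for example a Linkwitz-Riley design.

```python
import math


def split_bands(signal, crossover_hz=200.0, sample_rate=48000.0):
    """Split a signal into (low, mid_high) bands that sum back to the input."""
    # One-pole low-pass coefficient for the chosen crossover frequency.
    a = math.exp(-2.0 * math.pi * crossover_hz / sample_rate)
    low, mid_high = [], []
    state = 0.0
    for x in signal:
        state = (1.0 - a) * x + a * state   # low-pass output
        low.append(state)
        mid_high.append(x - state)          # remainder is the mid-high band
    return low, mid_high


# A 50 Hz tone plus a 5 kHz tone, one tenth of a second at 48 kHz.
fs = 48000.0
x = [math.sin(2 * math.pi * 50 * n / fs) + math.sin(2 * math.pi * 5000 * n / fs)
     for n in range(4800)]
low, mid_high = split_bands(x, 200.0, fs)
recombined = [lo + hi for lo, hi in zip(low, mid_high)]
```

Because the mid-high band is defined as the input minus the low band, superimposing the unprocessed bands reproduces the original signal, so any audible change comes only from the widening applied to the mid-high band.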

Fig. 4 is a flowchart illustrating sound field widening for mid-high frequency portions according to an exemplary embodiment. Fig. 5 is an example illustrating different filtering frequency response curves applied to a left channel mid-high frequency portion and a right channel mid-high frequency portion according to an exemplary embodiment. Fig. 6 is a circuit diagram illustrating a comb filter according to an exemplary embodiment. Fig. 7 is a frequency response diagram illustrating a comb filter according to an example embodiment.

Referring to fig. 2 and 4, step S200 may include step S210 and step S220.

In step S210, different filtering frequency response curves are applied to the left channel mid-high frequency part and the right channel mid-high frequency part for equalization, so that the difference between the equalized left channel mid-high frequency part and the equalized right channel mid-high frequency part becomes larger.

According to an exemplary embodiment of the present disclosure, referring to fig. 5, different filtering frequency response curves are applied to the left channel mid-high frequency part and the right channel mid-high frequency part for equalization so that the difference between the signals reaching the left and right ears becomes larger. For example, the left channel attenuates frequencies near 2000 Hz while the right channel boosts the same frequencies, thereby reducing the similarity of the sound waves received by the two ears. The lower the similarity, the wider the sound field perceived by the human ear.

According to an exemplary embodiment of the present disclosure, the filtering response curve may be generated using a comb filter.

According to an exemplary embodiment of the present disclosure, referring to fig. 6, a comb filter is implemented by adding a delayed version of a signal to the signal itself, so that the filtering results from constructive and destructive interference. The frequency response of the comb filter consists of a series of regularly spaced notches, giving it the appearance of a comb.

The comb filter formula is as follows:

y[n] = x[n] + αx[n-K] (3),

where x[n] is the input audio signal, y[n] is the output audio signal, K is the delay in samples, and α is the scaling factor applied to the delayed signal. The depth of the filtering can be adjusted by controlling the magnitude of α.
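Equation (3) can be sketched directly, together with its magnitude response |1 + α·exp(-jωK)|, which follows from taking the transform of equation (3). The function names are hypothetical, and the values K = 19 and α = 1 are used for illustration.

```python
import cmath
import math


def comb_filter(x, K, alpha):
    """Feedforward comb filter: y[n] = x[n] + alpha * x[n - K]."""
    return [x[n] + (alpha * x[n - K] if n >= K else 0.0) for n in range(len(x))]


def comb_magnitude(omega, K, alpha):
    """Magnitude response |1 + alpha * exp(-j * omega * K)| at omega rad/sample."""
    return abs(1.0 + alpha * cmath.exp(-1j * omega * K))


# Impulse response: 1 at n = 0 and alpha at n = K, zero elsewhere.
impulse = [1.0] + [0.0] * 24
h = comb_filter(impulse, K=19, alpha=1.0)

# With K = 19 and alpha = 1, the response peaks at omega = 0 and
# has its first notch at omega = pi / 19.
peak = comb_magnitude(0.0, K=19, alpha=1.0)
notch = comb_magnitude(math.pi / 19, K=19, alpha=1.0)
```

Choosing α below 1 makes the notches shallower, since the minimum of the magnitude response is 1 - α rather than 0.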

Referring to fig. 7, the left and right channels may be processed using simple comb filters. For example, comb filters staggered by a predetermined frequency are designed for the left and right channels. For convenience of illustration, fig. 7 shows an example in which K is 19 and α is 1. For example, when the comb filter shown in fig. 7 is applied, comb filters staggered by a normalized frequency of 0.05 radians/sample may be designed for the left and right channels.

Referring back to fig. 4, in step S220, gain compensation is performed on the equalized left channel middle-high frequency portion and the equalized right channel middle-high frequency portion to obtain the widened left channel middle-high frequency portion and the widened right channel middle-high frequency portion, so that the volume of the widened left channel middle-high frequency portion is the same as the volume of the left channel middle-high frequency portion, and the volume of the widened right channel middle-high frequency portion is the same as the volume of the right channel middle-high frequency portion. That is, the volume of the widened left channel and the widened right channel remains unchanged.
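One way to realize the gain compensation of step S220 is sketched below. The disclosure only requires that the volume be the same before and after widening; measuring volume as the root-mean-square (RMS) level is an assumption of this sketch, and the function names are hypothetical.

```python
import math


def rms(signal):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(v * v for v in signal) / len(signal))


def gain_compensate(equalized, reference):
    """Scale `equalized` so that its RMS level matches that of `reference`."""
    level = rms(equalized)
    if level == 0.0:
        return list(equalized)      # silent input: nothing to compensate
    gain = rms(reference) / level
    return [gain * v for v in equalized]


reference = [0.5, -0.5, 0.5, -0.5]   # mid-high band before equalization
equalized = [1.0, -0.2, 0.6, -1.0]   # stand-in for the equalized band
compensated = gain_compensate(equalized, reference)
```

The same function would be applied independently to the left and right mid-high bands, each with its own pre-equalization band as the reference.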

Fig. 8 is a flowchart illustrating a second widening process for widened binaural audio according to an exemplary embodiment.

Referring to fig. 8, the method further includes step S400, in which the widened binaural audio is widened a second time through the following steps.

In step S410, short-time fourier transform is performed on the left channel and the right channel of the widened binaural audio respectively to convert the binaural audio to the frequency domain.

In step S420, the amplitude and phase corresponding to each frequency of each frame are calculated based on the frequency domain obtained by the short-time fourier transform.

In step S430, an azimuth is calculated based on the magnitude.

In step S440, the azimuth angle is adjusted (widened) to obtain an adjusted azimuth angle.

In step S450, an adjusted amplitude is calculated based on the adjusted azimuth angle.

In step S460, a short-time inverse fourier transform is performed based on the adjusted amplitude and the corresponding phase to obtain a two-channel audio that is widened twice to achieve the surround sound effect.

Although not shown, the step of secondarily widening the widened binaural audio may also include a step of enhancing the low-frequency portion of the secondarily widened binaural audio, so as to enhance the drums, the bass, and part of the vocal fundamental frequencies. For example, an equalizer (EQ) may be used to boost the overall low-frequency portion (e.g., around 100 Hz) by some amount (e.g., 3 dB).

Fig. 9 is an example illustrating an overall flow of widening processing for binaural audio according to an exemplary embodiment.

Referring to fig. 2 and 9, in step S100, the input signals are the time-domain signal L(t) of the left channel and the time-domain signal R(t) of the right channel. Frequency division is performed by a crossover filter to obtain a left channel low-frequency part and a left channel medium-high frequency part, and a right channel low-frequency part and a right channel medium-high frequency part. For example,

L_low(t)，L_high(t)＝crossfilter(L(t)) (4)，

R_low(t)，R_high(t)＝crossfilter(R(t)) (5)，

wherein L_low(t) and L_high(t) represent the low-frequency part and the medium-high frequency part of the left channel, respectively, and R_low(t) and R_high(t) represent the low-frequency part and the medium-high frequency part of the right channel, respectively.
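A minimal sketch of the frequency division in equations (4) and (5) follows. The crossover design is not specified, so this sketch assumes a one-pole low-pass filter with the high band taken as the remainder, which makes the two bands sum back to the input. The 200 Hz cutoff and 44100 Hz sample rate are hypothetical values chosen for illustration.

```python
import math


def one_pole_lowpass(x, cutoff_hz, sample_rate):
    """One-pole low-pass filter with unity gain at DC."""
    a = math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    y, state = [], 0.0
    for s in x:
        state = (1.0 - a) * s + a * state
        y.append(state)
    return y


def crossfilter(x, cutoff_hz=200.0, sample_rate=44100.0):
    """Split x into (low, high) so that low[n] + high[n] reproduces x[n]."""
    low = one_pole_lowpass(x, cutoff_hz, sample_rate)
    high = [s - l for s, l in zip(x, low)]
    return low, high


# A 50 Hz tone plus a 5 kHz tone as a stand-in for one channel.
fs = 44100.0
signal = [math.sin(2 * math.pi * 50 * n / fs) + 0.5 * math.sin(2 * math.pi * 5000 * n / fs)
          for n in range(512)]
low, high = crossfilter(signal, cutoff_hz=200.0, sample_rate=fs)
recombined = [l + h for l, h in zip(low, high)]
```

Because the high band is defined as the input minus the low band, superimposing the two unprocessed bands returns the original channel, so only the processing applied to the medium-high band changes the result.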

In step S200, different equalization processes are performed for the left and right channel high frequencies:

L_{high_eq}(t)＝EQ₁(L_high(t)) (6)，

R_{high_eq}(t)＝EQ₂(R_high(t)) (7)，

wherein L_{high_eq}(t) and R_{high_eq}(t) denote the equalized left channel high frequency portion and the equalized right channel high frequency portion, respectively. Although not shown, step S200 may further include a step of gain-compensating the equalized left channel high frequency part and the equalized right channel high frequency part so that their volumes are kept constant.

In step S300, the equalized left channel high frequency portion L_{high_eq}(t) is superimposed with the unprocessed left channel low frequency portion L_low(t), and the equalized right channel high frequency portion R_{high_eq}(t) is superimposed with the unprocessed right channel low frequency portion R_low(t), to obtain the widened binaural audio (L₁(t) and R₁(t)):

L₁(t)＝L_{high_eq}(t)+L_low(t) (8)，

R₁(t)＝R_{high_eq}(t)+R_low(t) (9)。

In step S400, a second widening process is performed on the widened two-channel audio (L₁(t) and R₁(t)).

Referring to fig. 8 and 9, in step S410, a short-time Fourier transform (STFT) is performed on each of the left and right channels of the widened binaural audio:

L₁(n，k)＝STFT(L₁(t)) (10)，

R₁(n，k)＝STFT(R₁(t)) (11)，

wherein n is the frame index (the n-th frame of the signal), 0 < n ≤ N, and N is the total number of frames; k is the center-frequency (frequency bin) index, taking values from 0 to K, and K is the total number of frequency points. The short-time Fourier transform yields the frequency domain signal L₁(n，k) of the left channel and the frequency domain signal R₁(n，k) of the right channel.

In step S420, the amplitude and phase corresponding to each frequency of each frame are calculated based on the frequency domain obtained by the short-time fourier transform:

L_Mag(n，k)＝abs(L₁(n，k)) (12)，

R_Mag(n，k)＝abs(R₁(n，k)) (13)，

L_Phase(n，k)＝angle(L₁(n，k)) (14)，

R_Phase(n，k)＝angle(R₁(n，k)) (15)，

wherein L_Mag(n，k) and R_Mag(n，k) denote the amplitude of the left channel and the amplitude of the right channel, respectively, and L_Phase(n，k) and R_Phase(n，k) denote the phase of the left channel and the phase of the right channel, respectively. The function abs calculates the amplitude and the function angle calculates the phase. The function abs and the function angle may have a form commonly used by those skilled in the art, and are not particularly limited herein. Examples of the present disclosure are not limited to a specific functional form as long as the amplitudes and phases of the left and right channels can be obtained. As an example, the function angle may have the form of arctan. For example,

L_Phase(n，k)＝arctan(Imag(L₁(n，k))/Real(L₁(n，k))) (16)，

R_Phase(n，k)＝arctan(Imag(R₁(n，k))/Real(R₁(n，k))) (17)，

wherein Imag(L₁(n，k)) and Imag(R₁(n，k)) represent the imaginary parts of the left and right channels, respectively, and Real(L₁(n，k)) and Real(R₁(n，k)) represent the real parts of the left and right channels, respectively.
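The amplitude and phase calculations of equations (12) to (17) can be sketched for a single complex frequency bin as follows. The function names are hypothetical. As a general property of the arctangent rather than a statement from this disclosure, the ratio form of equations (16) and (17) only resolves angles between -90 and +90 degrees, so a four-quadrant arctangent, as provided by `cmath.phase`, is a common choice for the function angle.

```python
import cmath
import math


def magnitude_and_phase(bin_value):
    """Return (abs, angle) of one complex STFT bin; angle is four-quadrant."""
    return abs(bin_value), cmath.phase(bin_value)


def phase_arctan(bin_value):
    """Literal form of equations (16)-(17); assumes a non-zero real part."""
    return math.atan(bin_value.imag / bin_value.real)


mag, phase = magnitude_and_phase(3 + 4j)

# A bin in the second quadrant shows where the two forms differ.
second_quadrant = -1 + 1j
four_quadrant_phase = cmath.phase(second_quadrant)   # 3*pi/4
ratio_phase = phase_arctan(second_quadrant)          # -pi/4
```

If the phase is only stored and later recombined with the adjusted amplitude, the four-quadrant form is needed for the bin to be restored exactly.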

In step S430, an azimuth is calculated based on the magnitude. For example,

θ(n，k)＝Azimuth(L_Mag(n，k)，R_Mag(n，k)) (18)。

The azimuth function Azimuth may have a form commonly used by those skilled in the art, and is not particularly limited herein. Examples of the present disclosure are not limited to a specific functional form as long as an azimuth can be obtained. As an example, the azimuth function Azimuth may have the form of arctan. For example,

[Equation (19) is not reproduced in this text.] wherein the angle parameter of equation (19), whose symbol is likewise not reproduced in this text, may be 30 degrees.

In step S440, the azimuth angle is adjusted (widened) to obtain an adjusted azimuth angle. For example,

θ_out(n，k)＝f(θ(n，k)) (20)。

According to an exemplary embodiment of the present disclosure, the azimuth adjustment function f may have various forms. In an example, the azimuth adjustment function f may be configured to set the adjusted azimuth to p times the original azimuth, and to set the adjusted azimuth to 90 degrees when the original azimuth is greater than 90/p degrees, where p is a real number greater than 1. For example, p may be 1.5, and when the original azimuth is 30 degrees, the adjusted azimuth is 45 degrees. For another example, if the original azimuth is 61 degrees (i.e., greater than 90/1.5 = 60 degrees), the product of p and the original azimuth (91.5 degrees) exceeds 90 degrees, and the adjusted azimuth is therefore 90 degrees.
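The example form of the azimuth adjustment function f described above can be sketched as follows. The function name `adjust_azimuth` is hypothetical, and angles are assumed to be given in degrees.

```python
def adjust_azimuth(theta_deg, p=1.5):
    """Scale the azimuth by p and clamp the result at 90 degrees."""
    if p <= 1:
        raise ValueError("p must be a real number greater than 1")
    return min(p * theta_deg, 90.0)


widened = adjust_azimuth(30.0)   # 1.5 * 30 = 45 degrees
clamped = adjust_azimuth(61.0)   # 1.5 * 61 = 91.5, clamped to 90 degrees
```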

In step S450, an adjusted amplitude is calculated based on the adjusted azimuth angle. For example,

L_{Mag_out}(n，k)，R_{Mag_out}(n，k)＝IAzimuth(θ_out(n，k)，L_Mag(n，k)，R_Mag(n，k)) (21)，

wherein L_{Mag_out}(n，k) and R_{Mag_out}(n，k) respectively represent the amplitude of the adjusted left channel and the amplitude of the adjusted right channel.

According to an exemplary embodiment of the present disclosure, the azimuth calculation function IAzimuth may have various forms. In an example, the azimuth calculation function IAzimuth may be represented as

[Equations (22) and (23) are not reproduced in this text.]

In step S460, an inverse short-time Fourier transform (ISTFT) is performed based on the adjusted amplitude and the corresponding phase to obtain the twice-widened binaural audio. For example,

L_out(n，k)＝ISTFT(L_{Mag_out}(n，k)，L_Phase(n，k)) (24)，

R_out(n，k)＝ISTFT(R_{Mag_out}(n，k)，R_Phase(n，k)) (25)。

That is, the adjusted amplitudes are combined with the previously stored left channel phase and right channel phase, and the ISTFT is performed to obtain the time domain result.
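The recombination of an amplitude with a stored phase before the inverse transform can be sketched on a single frame as follows. This is a simplification under stated assumptions: a plain discrete Fourier transform of one frame stands in for the STFT and ISTFT (windowing and overlap-add are omitted), the function names are hypothetical, and a uniform scaling of the amplitudes stands in for the azimuth-based adjustment.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform of one frame."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft_from_polar(magnitudes, phases):
    """Rebuild complex bins from (amplitude, phase) and invert to the time domain."""
    n = len(magnitudes)
    bins = [cmath.rect(m, ph) for m, ph in zip(magnitudes, phases)]
    return [sum(bins[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


frame = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
spectrum = dft(frame)
mags = [abs(b) for b in spectrum]            # amplitudes, as in equations (12)-(13)
phases = [cmath.phase(b) for b in spectrum]  # stored phases, as in equations (14)-(15)

restored = idft_from_polar(mags, phases)                    # unchanged amplitudes
halved = idft_from_polar([0.5 * m for m in mags], phases)   # stand-in adjustment
```

With unchanged amplitudes the frame is recovered, which confirms that keeping the original phase and modifying only the amplitude is a well-defined operation per bin.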

Although not shown, the step of performing the second widening process on the widened binaural audio may also apply volume boosting and low-frequency enhancement to the overall simulated surround sound effect, so as to obtain a more impactful result. For example, the overall low-frequency portion (e.g., around 100 Hz) may be boosted by 3 dB with an equalizer (EQ).
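The low-frequency boost can be sketched as follows. The equalizer design is not specified, so this sketch assumes a simple shelf built by adding a scaled one-pole low-pass copy of the signal to itself. The function names and the 44100 Hz sample rate are hypothetical; the 3 dB gain and 100 Hz corner follow the example values above.

```python
import math


def db_to_gain(db):
    """Convert a level change in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def low_shelf_boost(x, boost_db=3.0, corner_hz=100.0, sample_rate=44100.0):
    """Boost low frequencies: y[n] = x[n] + (g - 1) * lowpass(x)[n]."""
    a = math.exp(-2.0 * math.pi * corner_hz / sample_rate)
    extra = db_to_gain(boost_db) - 1.0
    y, low = [], 0.0
    for s in x:
        low = (1.0 - a) * s + a * low   # one-pole low-pass, unity gain at DC
        y.append(s + extra * low)
    return y


# A constant (0 Hz) input settles at the full boost of about 1.4125.
settled = low_shelf_boost([1.0] * 5000)[-1]
```

A 3 dB boost corresponds to a linear amplitude factor of about 1.41; content well above the corner frequency passes through almost unchanged.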

According to the method for the post-processing of the dual-channel audio, a crossover filter separates each of the left and right channels into a low-frequency part and a medium-high frequency part, and different filtering frequency response curves are applied only to the medium-high frequency parts of the left and right channels to widen the sound field. Therefore, a significant sound field widening effect can be achieved even for audio whose left and right channels are similar. In addition, the sound field is processed a second time in the frequency domain obtained by the short-time Fourier transform: the azimuth angle is calculated and the sound source location is changed, thereby creating a surround sound effect. In this case, the sound field can be further widened without affecting the original azimuth angle, and the surround sound effect can be achieved. The azimuth angle and the amplitude are adjusted in the secondary widening processing.

Fig. 10 is a block diagram illustrating an apparatus 10 for binaural audio post-processing according to an exemplary embodiment.

As an example, the methods shown in fig. 2 and fig. 4 and 5 may be performed by the apparatus 10 shown in fig. 10.

As shown in fig. 10, the apparatus 10 may be an apparatus for two-channel audio post-processing.

The apparatus 10 comprises: a frequency dividing unit 110, a medium-high frequency sound field widening unit 120, and a superimposing unit 130.

The frequency dividing unit 110 is configured to divide the left channel and the right channel into a low frequency part and a middle and high frequency part, respectively, wherein the middle and high frequency part includes a left channel middle and high frequency part and a right channel middle and high frequency part. The frequency-dividing unit 110 may be configured to perform the method described with reference to step S100 in fig. 2.

The middle-high frequency sound field widening unit 120 is configured to perform sound field widening on the left-channel middle-high frequency portion and the right-channel middle-high frequency portion, respectively, to obtain widened middle-high frequency portions. The middle and high frequency sound field widening unit 120 may be configured to perform the method described with reference to step S200 in fig. 2 and steps S210 and S220 in fig. 4.

The superimposing unit 130 is configured to superimpose the widened middle and high frequency part with the corresponding low frequency part to obtain the widened binaural audio. The superimposing unit 130 may be configured to perform the method described with reference to step S300 in fig. 2.

According to an exemplary embodiment of the present disclosure, the apparatus 10 may further include a secondary widening unit 140. The secondary widening unit 140 is configured to perform the second widening process on the widened two-channel audio. The secondary widening unit 140 may be configured to perform the method described with reference to step S400 in fig. 8.

With regard to the apparatus in the above-described embodiment, the specific manner in which each module/unit performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated herein.

Fig. 11 is a block diagram illustrating an electronic device according to an embodiment of the present disclosure.

Referring to fig. 11, the electronic device 400 comprises at least one memory 401 and at least one processor 402, the at least one memory 401 storing computer-executable instructions that, when executed by the at least one processor 402, cause the at least one processor 402 to perform a method of binaural audio post-processing according to embodiments of the disclosure.

By way of example, the electronic device 400 may be a personal computer, tablet device, personal digital assistant, smart phone, or other device capable of executing the instructions described above. Here, the electronic device 400 need not be a single electronic device, but can be any collection of devices or circuits that can individually or jointly execute the above-described instructions (or sets of instructions). The electronic device 400 may also be part of an integrated control system or system manager, or may be configured as a portable electronic device that interfaces locally or remotely (e.g., via wireless transmission).

In the electronic device 400, the processor 402 may include a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a programmable logic device, a special purpose processor system, a microcontroller, or a microprocessor. By way of example, and not limitation, processors may also include analog processors, digital processors, microprocessors, multi-core processors, processor arrays, network processors, and the like.

The processor 402 may execute instructions or code stored in the memory 401, wherein the memory 401 may also store data. The instructions and data may also be transmitted or received over a network via a network interface device, which may employ any known transmission protocol.

The memory 401 may be integrated with the processor 402, for example, by having RAM or flash memory disposed within an integrated circuit microprocessor or the like. Further, memory 401 may comprise a stand-alone device, such as an external disk drive, storage array, or any other storage device usable by a database system. The memory 401 and the processor 402 may be operatively coupled or may communicate with each other, such as through I/O ports, network connections, etc., so that the processor 402 can read files stored in the memory.

In addition, the electronic device 400 may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, mouse, touch input device, etc.). All components of electronic device 400 may be connected to each other via a bus and/or a network.

According to an embodiment of the present disclosure, there may also be provided a computer-readable storage medium, wherein instructions stored in the computer-readable storage medium, when executed by at least one processor, cause the at least one processor to perform a method of binaural audio post-processing according to an embodiment of the present disclosure. Examples of the computer-readable storage medium herein include: read-only memory (ROM), programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), random-access memory (RAM), dynamic random-access memory (DRAM), static random-access memory (SRAM), flash memory, non-volatile memory, CD-ROM, CD-R, CD+R, CD-RW, CD+RW, DVD-ROM, DVD-R, DVD+R, DVD-RW, DVD+RW, DVD-RAM, BD-ROM, BD-R, BD-R LTH, BD-RE, Blu-ray or other optical disc storage, hard disk drive (HDD), solid-state drive (SSD), card-type memory (such as a multimedia card, a Secure Digital (SD) card, or an eXtreme Digital (xD) card), magnetic tape, a floppy disk, a magneto-optical data storage device, an optical data storage device, and any other device configured to store and provide a computer program and any associated data, data files, and data structures to a processor or computer in a non-transitory manner such that the processor or computer can execute the computer program.
The computer program in the computer-readable storage medium described above may be run in an environment deployed in a computer apparatus, such as a client, a host, a proxy device, a server, or the like, and further in one example, the computer program and any associated data, data files, and data structures are distributed across networked computer systems such that the computer program and any associated data, data files, and data structures are stored, accessed, and executed in a distributed fashion by one or more processors or computers.

According to an embodiment of the present disclosure, there may also be provided a computer program product comprising computer instructions which, when executed by at least one processor, implement the method of dual-channel audio post-processing according to an embodiment of the present disclosure.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.
