Multi-channel audio coding

Document No. 914640. Published 2021-02-26.

Abstract: Multi-channel audio coding, devised by Jan Büthe, Eleni Fotopoulou, Srikanth Korse, Pallavi Maben, Markus Multrus and Franz Reutelhuber on 2019-06-19. In multi-channel audio coding, improved computational efficiency can be achieved for parametric audio coders by calculating comparison parameters for ITD compensation between any two channels in the frequency domain. This may mitigate the negative impact on the encoder parameter estimation.

1. A comparison device for a multi-channel audio signal, configured to:

derive at least one inter-channel time difference (ITD) parameter ($\mathrm{ITD}_t$) of the audio signals of at least one pair of channels in an analysis window ($w(\tau)$), the parameter representing an ITD between the audio signals of the at least one pair of channels,

compensate the ITD for the at least one pair of channels in the frequency domain by cyclic shifting using the at least one ITD parameter to generate at least one pair of ITD-compensated frequency transforms ($L_{t,k,\mathrm{comp}}$; $R_{t,k,\mathrm{comp}}$),

calculate at least one comparison parameter based on the at least one ITD parameter and the at least one pair of ITD-compensated frequency transforms.

2. The comparison device of claim 1, further configured to: use frequency transforms ($L_{t,k}$; $R_{t,k}$) of the audio signals of the at least one pair of channels in the analysis window ($w(\tau)$) to derive the at least one ITD parameter ($\mathrm{ITD}_t$).

3. The comparison device of claim 1 or 2, further configured to:

calculate the at least one comparison parameter using the at least one ITD parameter and a function equal to or approximating an autocorrelation function ($W_X(n) = \sum_{\tau} w(\tau)\,w(\tau+n)$) of the analysis window.

4. The comparison device of claim 3, wherein:

the function is equal to or approximates a normalized version of the autocorrelation function of the analysis window.

5. The comparison device of claim 4, further configured to:

obtain the function by interpolating a normalized version of the autocorrelation function of the analysis window stored in a look-up table.

6. The comparison device of any one of claims 1 to 5, wherein:

the at least one comparison parameter comprises at least one side gain ($g_{t,b}$) of at least one mid/side transform ($M_{t,k}$; $S_{t,k}$) of the at least one pair of ITD-compensated frequency transforms ($L_{t,k,\mathrm{comp}}$; $R_{t,k,\mathrm{comp}}$), the at least one side gain being a prediction gain of a side transform ($S_{t,k}$) from a mid transform ($M_{t,k}$) of the at least one mid/side transform ($S_{t,k} = g_{t,b} M_{t,k} + \rho_{t,k}$).

7. The comparison device of claim 6, wherein:

the at least one comparison parameter comprises at least one corrected residual gain ($r_{t,b,\mathrm{corr}}$) corresponding to at least one residual gain ($r_{t,b}$) corrected by a residual gain correction parameter ($\hat{r}_t$), the at least one residual gain ($r_{t,b}$) being a function of the energy of the residual ($\rho_{t,k}$) in the prediction of the side transform ($S_{t,k}$) from the mid transform ($M_{t,k}$) relative to the energy of the mid transform ($M_{t,k}$).

8. The comparison device of claim 7, further configured to:

calculate the at least one side gain and the at least one residual gain using the energies and the inner product of the at least one pair of ITD-compensated frequency transforms ($L_{t,k,\mathrm{comp}}$; $R_{t,k,\mathrm{comp}}$).

9. The comparison device of claim 7 or 8, further configured to:

correct the at least one residual gain by an offset corresponding to the residual gain correction parameter, calculated as $\hat{r}_t = \frac{2c}{1+c}\sqrt{\frac{2\,(1-\hat{W}_X(\mathrm{ITD}_t))}{1+c^2+2c\,\hat{W}_X(\mathrm{ITD}_t)}}$, wherein $c$ is a scaling gain between the audio signals of the at least one pair of channels, and $\hat{W}_X$ is a function approximating a normalized version of the autocorrelation function of the analysis window.

10. The comparison device of any one of claims 1 to 9, wherein:

the at least one comparison parameter comprises at least one inter-channel coherence (ICC) correction parameter for correcting, based on the at least one ITD parameter, an estimate of the ICC ($\mathrm{ICC}_{b,t}$) of the at least one pair of audio signals determined in the frequency domain.

11. The comparison device of any of claims 1 to 10, further configured to:

generate at least one downmix signal for the audio signals of the at least one pair of channels, wherein the at least one comparison parameter is calculated for restoring the audio signals of the at least one pair of channels from the at least one downmix signal.

12. The comparison device of any of claims 1 to 11, further configured to:

generate the at least one downmix signal based on the at least one pair of ITD-compensated frequency transforms.

13. A multi-channel encoder comprising a comparison device according to claim 11 or 12, further configured to:

encode the at least one downmix signal, the at least one ITD parameter and the at least one comparison parameter for transmission to a decoder.

14. A decoder for a multi-channel audio signal, configured to:

decode at least one downmix signal, at least one inter-channel time difference (ITD) parameter and at least one comparison parameter received from an encoder,

restore the audio signals of at least one pair of channels from the at least one downmix signal by upmixing the at least one downmix signal using the at least one comparison parameter to generate at least one pair of decoded ITD-compensated frequency transforms,

generate at least one pair of ITD-decompensated decoded frequency transforms by decompensating, in the frequency domain, the ITD of the at least one pair of decoded ITD-compensated frequency transforms for the at least one pair of channels by cyclic shifting using the at least one ITD parameter, so as to reconstruct the ITD of the audio signals of the at least one pair of channels in the time domain,

inverse frequency transform the at least one pair of ITD-decompensated decoded frequency transforms to generate at least one pair of decoded audio signals of the at least one pair of channels.

15. A comparison method for a multi-channel audio signal, comprising:

deriving at least one inter-channel time difference (ITD) parameter ($\mathrm{ITD}_t$) of the audio signals of at least one pair of channels in an analysis window ($w(\tau)$), the parameter representing an ITD between the audio signals of the at least one pair of channels,

compensating the ITD for the at least one pair of channels in the frequency domain by cyclic shifting using the at least one ITD parameter to generate at least one pair of ITD-compensated frequency transforms ($L_{t,k,\mathrm{comp}}$; $R_{t,k,\mathrm{comp}}$),

calculating at least one comparison parameter based on the at least one ITD parameter and the at least one pair of ITD-compensated frequency transforms.

Technical Field

The application relates to parametric multi-channel audio coding.

Background

State-of-the-art methods for lossy parametric coding of stereo signals at low bit rates are based on parametric stereo as standardized in MPEG-4 Part 3 [1]. The general idea is to reduce the number of channels of a multi-channel system by computing a downmix signal from the two input channels after extracting stereo/spatial parameters, which are sent as side information to the decoder. These stereo/spatial parameters typically comprise the inter-channel level difference (ILD), the inter-channel phase difference (IPD) and the inter-channel coherence (ICC), which may be calculated in subbands and capture the spatial image to a certain extent.

However, this approach does not compensate or synthesize inter-channel time differences (ITDs) that are, for example, desired for downmixing or reproducing speech recorded using AB microphone settings or for synthesizing binaural rendered scenes. ITD synthesis has been addressed by Binaural Cue Coding (BCC) [2], which typically uses the parameters ILD and ICC while estimating the ITDs and performing channel alignment in the frequency domain.

Although time-domain ITD estimators exist, it is usually preferable to apply a time-to-frequency transform for ITD estimation, which allows spectral filtering of the cross-correlation function and is also computationally efficient. For complexity reasons it is desirable to use the same transform that is also used for extracting the stereo/spatial parameters and possibly for downmixing the channels (as is also done in the BCC approach).

This, however, comes with a drawback: accurate estimation of the stereo parameters should ideally be performed on aligned channels. But if the channels are aligned in the frequency domain, e.g. by a cyclic shift, this effectively shifts the analysis windows against each other, which may negatively affect the parameter estimation. In the case of BCC, this mainly affects the measurement of the ICC, where an increasing window offset eventually pushes the ICC value towards zero even if the input signals are in fact fully coherent.

It is therefore an object to provide a concept for parameter calculation in multi-channel audio coding, which is able to compensate for inter-channel time differences while avoiding negative effects on spatial parameter estimation.

Disclosure of Invention

This object is achieved by the subject matter of the appended independent claims.

The present application is based on the following finding: in multi-channel audio coding, improved computational efficiency can be achieved by calculating, for use by a parametric audio encoder, at least one comparison parameter for ITD compensation between any two channels in the frequency domain. The parametric encoder may use the at least one comparison parameter to mitigate the above-mentioned negative impact on the spatial parameter estimation.

Embodiments may comprise a parametric audio encoder that aims at representing stereo or, more generally, spatial content by at least one downmix signal and additional stereo or spatial parameters. Among these stereo/spatial parameters may be ITDs, which may be estimated and compensated in the frequency domain before the remaining stereo/spatial parameters are calculated. This procedure may bias the other stereo/spatial parameters, a problem that would otherwise have to be solved in a costly way by recomputing the time-to-frequency transform. In the described embodiments, this problem can be largely mitigated by applying a computationally cheap correction scheme that may use the value of the ITD and certain data of the underlying transform.

Embodiments relate to a lossy parametric audio encoder that may be based on a weighted mid/side transform approach, may use stereo/spatial parameters IPD, ITD and two gain factors, and may operate in the frequency domain. Other embodiments may use different transforms and may use different spatial parameters as appropriate.

In an embodiment, the parametric audio encoder may be able to compensate and synthesize ITDs in the frequency domain. It may feature a computationally efficient gain correction scheme that mitigates the negative effects of the window offset described above. A correction scheme for BCC coders is also proposed.

Drawings

Advantageous implementations of the application are the subject matter of the dependent claims. Preferred embodiments of the present application are described below with reference to the accompanying drawings, in which:

FIG. 1 shows a block diagram of a comparison device for a parametric encoder according to an embodiment of the present application;

FIG. 2 shows a block diagram of a parametric encoder according to an embodiment of the present application;

FIG. 3 shows a block diagram of a parametric decoder according to an embodiment of the present application.

Detailed Description

Fig. 1 shows a comparison device 100 for a multi-channel audio signal. As shown, it may comprise inputs for audio signals of a pair of stereo channels, i.e. a left audio channel signal l (τ) and a right audio channel signal r (τ). Of course, other embodiments may include multiple channels to capture the spatial characteristics of the sound source.

The same overlapping window function 11, 21, $w(\tau)$, may be applied to the left input channel signal $l(\tau)$ and the right input channel signal $r(\tau)$, respectively, before the time-domain audio signals $l(\tau)$, $r(\tau)$ are transformed to the frequency domain. Furthermore, in an embodiment, a certain amount of zero padding may be added, which allows for shifts in the frequency domain. The windowed audio signals may then be provided to corresponding Discrete Fourier Transform (DFT) blocks 12, 22, which perform the time-to-frequency transforms. These may yield the time-frequency bins $L_{t,k}$ and $R_{t,k}$, $k = 0, \dots, K-1$, which are the frequency transforms of the audio signals of the pair of channels.

The frequency transforms $L_{t,k}$ and $R_{t,k}$ may be provided to the ITD detection and compensation block 20. The latter may be configured to use the frequency transforms $L_{t,k}$ and $R_{t,k}$ of the audio signals of the pair of channels in the analysis window $w(\tau)$ to derive an ITD parameter, here $\mathrm{ITD}_t$, representing the ITD between the audio signals of the pair of channels. Other embodiments may use different methods to derive the ITD parameter, which may also be determined in the time domain before the DFT blocks.

Deriving the ITD parameter may involve calculating a (possibly weighted) auto- or cross-correlation function. Conventionally, the cross-correlation can be calculated from the time-frequency bins $L_{t,k}$ and $R_{t,k}$ by applying an Inverse Discrete Fourier Transform (IDFT) to the terms $L_{t,k} R_{t,k}^{*}$.
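The cross-correlation approach can be sketched as follows. This is a minimal illustration, not the encoder's actual estimator: it uses an unweighted cross-spectrum, a naive DFT for self-containment, and integer lags only. The helper names (`dft`, `idft`, `estimate_itd`) and the toy chirp signal are assumptions introduced for this example.

```python
import cmath
import math


def dft(x):
    """Naive O(K^2) DFT: X[k] = sum_n x[n] * exp(-2j*pi*k*n/K)."""
    K = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / K) for n in range(K))
            for k in range(K)]


def idft(X):
    """Inverse of dft() above."""
    K = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / K) for k in range(K)) / K
            for n in range(K)]


def estimate_itd(left, right, max_lag):
    """Lag in samples by which `left` trails `right` (negative if it leads)."""
    cross_spectrum = [l * r.conjugate() for l, r in zip(dft(left), dft(right))]
    xcorr = idft(cross_spectrum)  # circular cross-correlation of the frame
    K = len(xcorr)
    # Pick the lag with the largest correlation inside the search range.
    return max(range(-max_lag, max_lag + 1), key=lambda lag: xcorr[lag % K].real)


# Toy frame: a short chirp with zero padding; the left channel is a scaled
# copy of the right channel delayed by 5 samples.
K = 64
right = [math.sin(0.37 * n * n) if 8 <= n < 40 else 0.0 for n in range(K)]
left = [0.0] * 5 + [0.5 * v for v in right[:K - 5]]
itd = estimate_itd(left, right, max_lag=10)
```

A practical estimator would typically weight the cross-spectrum before the inverse transform and use an FFT instead of the quadratic-time transform shown here.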

The correct way to compensate for the measured ITD would be to perform the channel alignment in the time domain and then apply the same time-to-frequency transform again to the shifted channels in order to obtain ITD-compensated time-frequency bins. However, to save complexity, this process can be approximated by performing a cyclic shift in the frequency domain. Accordingly, ITD compensation may be performed in the frequency domain by the ITD detection and compensation block 20, e.g. by performing cyclic shifts in the cyclic shift blocks 13 and 23, respectively, to produce

$$L_{t,k,\mathrm{comp}} = e^{i\pi k\,\mathrm{ITD}_t/K}\,L_{t,k} \tag{1}$$

and

$$R_{t,k,\mathrm{comp}} = e^{-i\pi k\,\mathrm{ITD}_t/K}\,R_{t,k} \tag{2}$$

where $\mathrm{ITD}_t$ denotes the ITD for frame $t$ in samples.

In an embodiment, this may advance the lagging channel by $\mathrm{ITD}_t/2$ samples and delay the leading channel by $\mathrm{ITD}_t/2$ samples. However, in another embodiment, if delay is critical, it may be beneficial to only advance the lagging channel by $\mathrm{ITD}_t$ samples, which does not increase the delay of the system.

As a result, the ITD detection and compensation block 20 may use the ITD parameter $\mathrm{ITD}_t$ to compensate the ITD for the pair of channels in the frequency domain by cyclic shifting, generating at its output a pair of ITD-compensated frequency transforms $L_{t,k,\mathrm{comp}}$, $R_{t,k,\mathrm{comp}}$. In addition, the ITD detection and compensation block 20 may output the derived ITD parameter $\mathrm{ITD}_t$, for example for transmission by a parametric encoder.
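As a hedged illustration of the cyclic shift, the sketch below advances the lagging channel by the full ITD through a phase rotation of its DFT bins and checks that this equals a circular shift of the frame in the time domain. It assumes the DFT sign convention stated in the docstring, an integer ITD, and enough zero padding so that nothing wraps around; the function names and the toy frame are illustrative and not taken from the source.

```python
import cmath


def dft(x):
    """Naive DFT: X[k] = sum_n x[n] * exp(-2j*pi*k*n/K)."""
    K = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / K) for n in range(K))
            for k in range(K)]


def idft(X):
    """Inverse of dft() above."""
    K = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / K) for k in range(K)) / K
            for n in range(K)]


def advance_in_frequency_domain(X, shift):
    """Cyclically advance the underlying frame by `shift` samples via phase rotation."""
    K = len(X)
    return [X[k] * cmath.exp(2j * cmath.pi * k * shift / K) for k in range(K)]


# Right channel frame with zero padding at both ends; the left channel lags by 4 samples.
K, itd = 32, 4
right = [0.0] * 6 + [float(n % 7) - 3.0 for n in range(20)] + [0.0] * 6
left = [0.0] * itd + right[:K - itd]

# Compensate the ITD on the left channel only (the low-delay variant).
left_comp = idft(advance_in_frequency_domain(dft(left), itd))
max_error = max(abs(a - b) for a, b in zip(left_comp, right))
```

In this unwindowed toy case the alignment is exact. With an analysis window applied before the DFT, the same rotation also shifts the window, which is the effect the correction scheme described later addresses.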

As shown in FIG. 1, the comparison and spatial parameter calculation block 30 may receive the ITD parameter $\mathrm{ITD}_t$ and the ITD-compensated pair of frequency transforms $L_{t,k,\mathrm{comp}}$, $R_{t,k,\mathrm{comp}}$ as its input signals. The comparison and spatial parameter calculation block 30 may use some or all of its input signals to extract stereo/spatial parameters of the multi-channel audio signal, such as the inter-channel phase difference (IPD).

Furthermore, the comparison and spatial parameter calculation block 30 may generate, based on the ITD parameter $\mathrm{ITD}_t$ and the ITD-compensated pair of frequency transforms $L_{t,k,\mathrm{comp}}$, $R_{t,k,\mathrm{comp}}$, at least one comparison parameter for a parametric encoder, here two gain factors $g_{t,b}$ and $r_{t,b,\mathrm{corr}}$. Other embodiments may additionally or alternatively use the frequency transforms $L_{t,k}$, $R_{t,k}$ and/or the spatial/stereo parameters extracted in the comparison and spatial parameter calculation block 30 to generate at least one comparison parameter.

The at least one comparison parameter may be used as part of a computationally efficient correction scheme to mitigate the negative impact of the above-mentioned offset in the analysis window $w(\tau)$ on the spatial/stereo parameter estimation of the parametric encoder, the offset being caused by the channel alignment in the DFT domain by cyclic shifting within the ITD detection and compensation block 20. In an embodiment, at least one comparison parameter may be calculated for restoring the audio signals of the pair of channels at the decoder, e.g. from the downmix signal.

FIG. 2 shows an embodiment of such a parametric encoder 200 for a stereo audio signal, wherein the comparison device 100 of FIG. 1 may be used to provide the ITD parameter $\mathrm{ITD}_t$, the ITD-compensated pair of frequency transforms $L_{t,k,\mathrm{comp}}$, $R_{t,k,\mathrm{comp}}$ and the comparison parameters $r_{t,b,\mathrm{corr}}$ and $g_{t,b}$.

The parametric encoder 200 may use the ITD-compensated frequency transforms $L_{t,k,\mathrm{comp}}$, $R_{t,k,\mathrm{comp}}$ as input to generate, in a downmix block 40, a downmix signal $\mathrm{DMX}_{t,k}$ for the left input channel signal $l(\tau)$ and the right input channel signal $r(\tau)$. Other embodiments may additionally or alternatively use the frequency transforms $L_{t,k}$, $R_{t,k}$ to generate the downmix signal $\mathrm{DMX}_{t,k}$.

The parametric encoder 200 may calculate stereo parameters, such as IPD, on a frame basis in the comparison and spatial parameter calculation block 30. Other embodiments may determine different or additional stereo/spatial parameters. The encoding process of the parametric encoder 200 embodiment of fig. 2 may generally follow the following steps, which will be described in detail below.

1. Time-to-frequency conversion of input signals using windowed DFT

In the window and DFT blocks 11, 12, 21, 22

2. ITD estimation and compensation in the frequency domain

In the ITD detection and compensation module 20

3. Stereo parameter extraction and comparison parameter calculation

In the comparison and spatial parameter calculation block 30

4. Downmix

In the downmix block 40

5. Frequency to time conversion before windowing and overlap-add

In IDFT block 50

The embodiment of the parametric audio encoder 200 in FIG. 2 may be based on a weighted mid/side transform of the input channels in the frequency domain, using the ITD-compensated frequency transforms $L_{t,k,\mathrm{comp}}$, $R_{t,k,\mathrm{comp}}$ and the ITD as input. It may also compute stereo/spatial parameters (e.g. the IPD) as well as two gain factors that capture the stereo image, and it may mitigate the negative effects of the window offset described above.

For the spatial parameter extraction in the comparison and spatial parameter calculation block 30, the ITD-compensated time-frequency bins $L_{t,k,\mathrm{comp}}$ and $R_{t,k,\mathrm{comp}}$ may be grouped into subbands, and for each subband the inter-channel phase difference IPD and the two gain factors may be calculated. Let $I_b$ denote the indices of the frequency bins in subband $b$. The IPD may then be calculated as

$$\mathrm{IPD}_{t,b} = \arg\Big(\sum_{k \in I_b} L_{t,k,\mathrm{comp}}\,R_{t,k,\mathrm{comp}}^{*}\Big) \tag{3}$$

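The two gain factors may be related to the band-wise phase-compensated mid/side transform of the ITD-compensated pair of frequency transforms $L_{t,k,\mathrm{comp}}$ and $R_{t,k,\mathrm{comp}}$, given by equations (4) and (5) for $k \in I_b$:

$$M_{t,k} = \frac{L_{t,k,\mathrm{comp}} + e^{i\,\mathrm{IPD}_{t,b}}\,R_{t,k,\mathrm{comp}}}{\sqrt{2}} \tag{4}$$

and

$$S_{t,k} = \frac{L_{t,k,\mathrm{comp}} - e^{i\,\mathrm{IPD}_{t,b}}\,R_{t,k,\mathrm{comp}}}{\sqrt{2}} \tag{5}$$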

The first of the gain factors, $g_{t,b}$, may be regarded as the optimal prediction gain for a band-wise prediction of the side signal transform $S_t$ from the mid signal transform $M_t$ in equation (6):

$$S_{t,k} = g_{t,b}\,M_{t,k} + \rho_{t,k} \tag{6}$$

such that the energy of the prediction residual $\rho_{t,k}$ in equation (6), given by equation (7) as

$$\sum_{k \in I_b} |\rho_{t,k}|^2 \tag{7}$$

is minimal.

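This first gain factor $g_{t,b}$ may be referred to as the side gain.

The second gain factor $r_{t,b}$ describes the ratio of the energy of the prediction residual $\rho_{t,k}$ relative to the energy of the mid signal transform $M_{t,k}$, given by equation (8) as

$$r_{t,b} = \left(\frac{\sum_{k \in I_b} |\rho_{t,k}|^2}{\sum_{k \in I_b} |M_{t,k}|^2}\right)^{1/2} \tag{8}$$

and may be referred to as the residual gain. The residual gain $r_{t,b}$ may be used at a decoder, such as the decoder embodiment in FIG. 3, to shape a suitable replacement for the prediction residual $\rho_{t,k}$ of the mid/side transform.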

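In the encoder embodiment shown in FIG. 2, both gain factors $g_{t,b}$ and $r_{t,b}$ may be calculated as comparison parameters in the comparison and spatial parameter calculation block 30 using the energies $E_{L,t,b}$ and $E_{R,t,b}$ of the ITD-compensated frequency transforms $L_{t,k,\mathrm{comp}}$ and $R_{t,k,\mathrm{comp}}$, given in equation (9):

$$E_{L,t,b} = \sum_{k \in I_b} |L_{t,k,\mathrm{comp}}|^2, \qquad E_{R,t,b} = \sum_{k \in I_b} |R_{t,k,\mathrm{comp}}|^2 \tag{9}$$

and the absolute value of their inner product, given in equation (10):

$$X_{L/R,t,b} = \Big|\sum_{k \in I_b} L_{t,k,\mathrm{comp}}\,R_{t,k,\mathrm{comp}}^{*}\Big| \tag{10}$$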

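Based on the energies $E_{L,t,b}$ and $E_{R,t,b}$ and the inner product $X_{L/R,t,b}$, the side gain factor $g_{t,b}$ may be calculated using equation (11) as

$$g_{t,b} = \frac{E_{L,t,b} - E_{R,t,b}}{E_{L,t,b} + E_{R,t,b} + 2X_{L/R,t,b}} \tag{11}$$

Furthermore, the residual gain factor $r_{t,b}$ may be calculated from the energies $E_{L,t,b}$ and $E_{R,t,b}$, the inner product $X_{L/R,t,b}$ and the side gain factor $g_{t,b}$ using equation (12) as

$$r_{t,b} = \left(\frac{(1-g_{t,b})\,E_{L,t,b} + (1+g_{t,b})\,E_{R,t,b} - 2X_{L/R,t,b}}{E_{L,t,b} + E_{R,t,b} + 2X_{L/R,t,b}}\right)^{1/2} \tag{12}$$

In other embodiments, other methods and/or formulas may be used, as appropriate, to calculate the side gain factor $g_{t,b}$, the residual gain factor $r_{t,b}$ and/or different comparison parameters.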
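The relation between the closed-form gains and their definitions can be checked numerically. The sketch below is an illustration under stated assumptions: the mid/side transform uses a $1/\sqrt{2}$ normalization and a band-wise rotation of the right channel by the IPD, the side gain is the real least-squares prediction gain, and the closed forms are $g = (E_L - E_R)/(E_L + E_R + 2X)$ and $r^2 = ((1-g)E_L + (1+g)E_R - 2X)/(E_L + E_R + 2X)$. Function names and test bins are hypothetical.

```python
import cmath
import math


def band_gains(L, R):
    """Side gain g and residual gain r of one subband from energies and inner product."""
    e_l = sum(abs(x) ** 2 for x in L)
    e_r = sum(abs(x) ** 2 for x in R)
    x_lr = abs(sum(l * r.conjugate() for l, r in zip(L, R)))
    denom = e_l + e_r + 2.0 * x_lr  # zero only for a silent band
    g = (e_l - e_r) / denom
    r2 = ((1.0 - g) * e_l + (1.0 + g) * e_r - 2.0 * x_lr) / denom
    return g, math.sqrt(max(r2, 0.0))  # clamp tiny negative rounding errors


def band_gains_direct(L, R):
    """Same quantities, computed from the mid/side definitions as a cross-check."""
    ipd = cmath.phase(sum(l * r.conjugate() for l, r in zip(L, R)))
    rot = cmath.exp(1j * ipd)
    M = [(l + rot * r) / math.sqrt(2.0) for l, r in zip(L, R)]
    S = [(l - rot * r) / math.sqrt(2.0) for l, r in zip(L, R)]
    e_m = sum(abs(m) ** 2 for m in M)
    g = sum((s * m.conjugate()).real for s, m in zip(S, M)) / e_m
    e_res = sum(abs(s - g * m) ** 2 for s, m in zip(S, M))
    return g, math.sqrt(e_res / e_m)


# Arbitrary bins of one subband.
L = [1 + 2j, -0.5 + 0.3j, 0.7 - 1.1j, 2.0 + 0j]
R = [0.3 - 1j, 1.2 + 0.4j, -0.6 + 0.9j, 0.1 - 0.2j]
g_closed, r_closed = band_gains(L, R)
g_direct, r_direct = band_gains_direct(L, R)

# Perfectly aligned, scaled channels: g = (c - 1) / (c + 1) and r = 0.
c = 2.0
g_aligned, r_aligned = band_gains([c * x for x in R], R)
```

The closed form needs only three accumulations per band, which is why it is attractive compared with forming the mid, side and residual signals explicitly.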

As mentioned before, ITD compensation in the frequency domain generally saves complexity but, without further measures, has a drawback. Ideally, for clean anechoic speech recorded with an A-B microphone setup, the left channel signal $l(\tau)$ is essentially a delayed (by a delay $d$) and scaled (by a gain $c$) version of the right channel signal $r(\tau)$. This situation can be expressed by equation (13):

$$l(\tau) = c\,r(\tau - d) \tag{13}$$

After proper ITD compensation of the unwindowed input channel audio signals $l(\tau)$ and $r(\tau)$, the side gain factor $g_{t,b}$ would be given by equation (14) as

$$g_{t,b} = \frac{c-1}{c+1} \tag{14}$$

with a vanishing residual gain factor $r_{t,b}$, given as

$$r_{t,b} = 0 \tag{15}$$

However, if the channel alignment is performed in the frequency domain, as in the embodiment in FIG. 2 where the ITD detection and compensation block 20 uses the cyclic shift blocks 13 and 23, the corresponding DFT analysis window $w(\tau)$ is rotated as well. Thus, after compensating the ITD in the frequency domain, the ITD-compensated frequency transform $R_{t,k,\mathrm{comp}}$ of the right channel is given, in the form of time-frequency bins, by the DFT of

$$w(\tau)\,r(\tau) \tag{16}$$

while the ITD-compensated frequency transform $L_{t,k,\mathrm{comp}}$ of the left channel is given, in the form of time-frequency bins, by the DFT of

$$w(\tau + \mathrm{ITD}_t)\,r(\tau) \tag{17}$$

where $w$ is the DFT analysis window function.

It has been observed that such channel alignment in the frequency domain mainly affects the residual prediction gain factor $r_{t,b}$, which grows with increasing $\mathrm{ITD}_t$. Without any further measures, the channel alignment in the frequency domain would thus add additional ambience to the output audio signal at a decoder such as the one shown in FIG. 3. This additional ambience is undesirable, especially when the audio signal to be encoded contains clean speech, since artificial ambience impairs speech intelligibility.

Therefore, the above effect may be mitigated by correcting the (prediction) residual gain factor $r_{t,b}$ in the presence of a non-zero ITD using a further comparison parameter.

In an embodiment, this may be done by calculating a gain offset for the residual gain $r_{t,b}$, which aims at matching the expected residual signal $e(\tau)$ when the signal is coherent and temporally flat. In this case, one expects a global prediction gain $\hat{g}$ given by equation (18)

$$\hat{g} = \frac{c-1}{c+1} \tag{18}$$

and a vanishing global IPD, $\mathrm{IPD} = 0$. Therefore, the expected residual signal $e(\tau)$ may be determined using equation (19) as

$$e(\tau) = \frac{2c}{1+c}\,\big(w_t(\tau + \mathrm{ITD}_t) - w_t(\tau)\big)\,r(\tau) \tag{19}$$

In an embodiment, a further comparison parameter besides the side gain factor $g_{t,b}$ and the residual gain factor $r_{t,b}$ may be calculated in the comparison and spatial parameter calculation block 30 on the basis of the expected residual signal $e(\tau)$, using the ITD parameter $\mathrm{ITD}_t$ and a function equal to or approximating the autocorrelation function $W_X(n)$ of the analysis window function $w$, given in equation (20):

$$W_X(n) = \sum_{\tau} w(\tau)\,w(\tau+n) \tag{20}$$

If $M_r$ denotes the short-term average of $r^2(\tau)$, the energy of the expected residual signal $e(\tau)$ can be approximated by equation (21) as

$$\frac{8c^2}{(1+c)^2}\,\big(W_X(0) - W_X(\mathrm{ITD}_t)\big)\,M_r \tag{21}$$

With the windowed mid signal given by equation (22) as

$$m_t(\tau) = \big(w_t(\tau) + c\,w_t(\tau + \mathrm{ITD}_t)\big)\,r(\tau) \tag{22}$$

the energy of the windowed mid signal $m_t(\tau)$ can be approximated by equation (23) as

$$\big[(1+c^2)\,W_X(0) + 2c\,W_X(\mathrm{ITD}_t)\big]\,M_r \tag{23}$$

In an embodiment, the above function used in the calculation of the comparison parameter in the comparison and spatial parameter calculation block 30 is equal to or approximates a normalized version $\hat{W}_X(n)$ of the autocorrelation function $W_X(n)$ of the analysis window, as given in equation (23a):

$$\hat{W}_X(n) = \frac{W_X(n)}{W_X(0)} \tag{23a}$$

Based on this normalized autocorrelation function $\hat{W}_X(n)$, the further comparison parameter $\hat{r}_t$ may be calculated using equation (24) as

$$\hat{r}_t = \frac{2c}{1+c}\sqrt{\frac{2\,\big(1-\hat{W}_X(\mathrm{ITD}_t)\big)}{1 + c^2 + 2c\,\hat{W}_X(\mathrm{ITD}_t)}} \tag{24}$$

to provide an estimated correction parameter for the residual gain $r_{t,b}$. In an embodiment, the comparison parameter $\hat{r}_t$ may be used as an estimate of the local residual gain $r_{t,b}$ in subband $b$. In another embodiment, the comparison parameter $\hat{r}_t$ may be used as an offset to correct the residual gain $r_{t,b}$. That is, the residual gain $r_{t,b}$ may be replaced by the corrected residual gain $r_{t,b,\mathrm{corr}}$ given in equation (25):

$$r_{t,b,\mathrm{corr}} = \max\{0,\; r_{t,b} - \hat{r}_t\} \tag{25}$$

Thus, in an embodiment, the further comparison parameter calculated in the comparison and spatial parameter calculation block 30 may comprise the corrected residual gain $r_{t,b,\mathrm{corr}}$, which corresponds to the residual gain $r_{t,b}$ corrected by the residual gain correction parameter $\hat{r}_t$ of equation (24) in the form of the offset defined in equation (25).
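The correction can be sketched in a few lines. The sketch assumes that the residual gain estimate is the square root of the ratio between the expected residual energy and the mid energy of equation (23), with the residual energy approximated as $\frac{8c^2}{(1+c)^2}(W_X(0) - W_X(\mathrm{ITD}_t))M_r$, and that the correction is a subtraction clipped at zero. The function names are illustrative. As a plausibility check, the normalized autocorrelation value is inferred from the tabulated estimate 0.0885 for $c = 1$ and then reused for $c = 2$ and $c = 4$.

```python
import math


def residual_gain_estimate(w_hat, c):
    """Expected residual gain caused by an analysis window rotated by the ITD.

    w_hat: normalized window autocorrelation at lag ITD_t, in [0, 1]
    c:     scaling gain between the two channels
    """
    num = 2.0 * (1.0 - w_hat)
    den = 1.0 + c * c + 2.0 * c * w_hat
    return 2.0 * c / (1.0 + c) * math.sqrt(num / den)


def corrected_residual_gain(r, w_hat, c):
    """Subtract the estimate as an offset and clip at zero."""
    return max(0.0, r - residual_gain_estimate(w_hat, c))


# For c = 1 the estimate reduces to sqrt((1 - w_hat) / (1 + w_hat)), so w_hat
# can be recovered from a tabulated estimate and reused for other gains.
r_hat_c1 = 0.0885
w_hat = (1.0 - r_hat_c1 ** 2) / (1.0 + r_hat_c1 ** 2)
r_hat_c2 = residual_gain_estimate(w_hat, 2.0)
r_hat_c4 = residual_gain_estimate(w_hat, 4.0)
```

With this inferred value the estimates for $c = 2$ and $c = 4$ come out at roughly 0.079 and 0.057, in line with the parenthesized entries 0.0785 and 0.0565 in the first row of the table.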

Thus, another embodiment relates to parametric audio coding using a windowed DFT and (a subset of) the parameters IPD according to equation (3), side gain $g_{t,b}$ according to equation (11), residual gain $r_{t,b}$ according to equation (12), and ITD, wherein the residual gain $r_{t,b}$ is adjusted according to equation (25).

In an empirical evaluation, the residual gain estimate $\hat{r}_t$ may be tested with different choices for the right channel audio signal $r(\tau)$ in equation (13). As can be seen from Table 1, for a white noise input signal $r(\tau)$, which satisfies the temporal flatness assumption, the residual gain estimate $\hat{r}_t$ is very close to the average of the residual gains $r_{t,b}$ measured in the subbands.

Table 1: Average of the measured residual gains $r_{t,b}$ for panned white noise, for different ITDs and gains $c$, with the residual gain estimate $\hat{r}_t$ (indicated in parentheses).

For speech signals $r(\tau)$, the temporal flatness assumption is frequently violated, which typically increases the average of the residual gains $r_{t,b}$ (see Table 2 in comparison with Table 1). The residual gain adjustment or correction according to equation (25) can therefore be regarded as rather conservative. Nevertheless, it still removes most of the undesired ambience for clean speech recordings.

ITD \ c    1                  2                  4
… ms       0.1055 (0.0885)    0.1022 (0.0785)    0.0874 (0.0565)
… ms       0.1782 (0.1631)    0.1634 (0.1458)    0.1283 (0.1039)
… ms       0.2435 (0.2327)    0.2191 (0.2062)    0.1657 (0.1473)
… ms       0.3050 (0.2992)    0.2720 (0.2627)    0.2014 (0.1885)

Table 2: measured residual gain r for panned mono speecht,bAverage of, and ITD and residual gain estimation(indicated in parentheses).

In case a single analysis window w is used, the normalized autocorrelation function given in equation (23a) can be consideredIndependent of the frame index t. Furthermore, for a typical analysis window function w, the autocorrelation function is normalizedIt can be considered to change very slowly. Therefore, it is possible to accurately align the values from a small table of valuesInterpolation is performed, which makes the correction scheme very efficient in terms of complexity.

Thus, in an embodiment, the normalized version of the autocorrelation function may be passed through an analysis window stored in a look-up tableInterpolation is performed to obtain a residual gain estimate or residual gain correction offset for use in determining the residual gain estimate or residual gain correction offset in block 30As a function of the comparison parameter. In other embodiments, the method for normalizing autocorrelation functions may be used as appropriateOther methods of interpolation of (2).
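A look-up table with linear interpolation might look as follows. This is a sketch under assumptions not stated in the source: a sine analysis window of 640 samples, a table spacing of 8 samples, and linear interpolation between neighbouring entries. All names are illustrative.

```python
import math


def sine_window(n_len):
    """Example analysis window (an assumption; any smooth window would do)."""
    return [math.sin(math.pi * (i + 0.5) / n_len) for i in range(n_len)]


def normalized_autocorrelation(w, lag):
    """Window autocorrelation at `lag`, divided by its value at lag 0."""
    lag = abs(lag)
    w0 = sum(v * v for v in w)
    return sum(w[i] * w[i + lag] for i in range(len(w) - lag)) / w0


def build_table(w, step, max_lag):
    """Precompute the normalized autocorrelation every `step` samples up to `max_lag`."""
    return [normalized_autocorrelation(w, n) for n in range(0, max_lag + step, step)]


def lookup(table, step, lag):
    """Linearly interpolate the table at an arbitrary (possibly fractional) lag."""
    pos = abs(lag) / step
    i = min(int(pos), len(table) - 2)
    frac = pos - i
    return (1.0 - frac) * table[i] + frac * table[i + 1]


window = sine_window(640)
table = build_table(window, step=8, max_lag=160)  # 21 stored values
approx = lookup(table, 8, 37)
exact = normalized_autocorrelation(window, 37)
```

Because the function is smooth near lag zero, the interpolation error for this window stays well below $10^{-3}$, while each lookup costs a handful of operations instead of a sum over the whole window.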

A similar problem may arise for BCC as described in [2] when estimating the inter-channel coherence (ICC) in subbands. In an embodiment, the corresponding $\mathrm{ICC}_{t,b}$ may be estimated from the energies $E_{L,t,b}$ and $E_{R,t,b}$ of equation (9) and the inner product of equation (10) by equation (26) as

$$\mathrm{ICC}_{t,b} = \frac{X_{L/R,t,b}}{\sqrt{E_{L,t,b}\,E_{R,t,b}}} \tag{26}$$

By definition, the ICC is measured after compensating the ITD. However, the non-matching window functions $w$ may bias the ICC measurement. In the clean anechoic speech setting described by equation (13), the ICC would be 1 if calculated on properly aligned input channels.

However, when the ITD is compensated in the frequency domain by a cyclic shift, the offset caused by the rotation of the analysis window function $w(\tau)$ by $\mathrm{ITD}_t$ may bias the measurement of the ICC towards the value $\widehat{\mathrm{ICC}}_t$ given in equation (27):

$$\widehat{\mathrm{ICC}}_t = \hat{W}_X(\mathrm{ITD}_t) \tag{27}$$

In an embodiment, the bias of the ICC can be corrected in a similar way as the residual gain $r_{t,b}$ in equation (25), i.e. by replacing it in the manner given in equation (28):

$$\mathrm{ICC}_{t,b} \leftarrow \min\{1,\; \mathrm{ICC}_{t,b} + 1 - \widehat{\mathrm{ICC}}_t\} \tag{28}$$

Thus, another embodiment relates to parametric audio coding using a windowed DFT and (a subset of) the parameters IPD according to equation (3), ILD, ICC according to equation (26), and ITD, wherein the ICC is adjusted according to equation (28).
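The ICC bias can be reproduced with a small simulation. The sketch below is a full-band, time-domain stand-in for the subband measurement (by Parseval's theorem the full-band energies and inner product are the same in both domains). It assumes a sine window, white Gaussian noise as the coherent source, and an offset-style correction clipped at 1; the correction formula and all names are assumptions made for illustration.

```python
import math
import random


def window_at(w, i):
    """Window value at index i, zero outside the window support."""
    return w[i] if 0 <= i < len(w) else 0.0


def normalized_autocorrelation(w, lag):
    """Window autocorrelation at `lag`, divided by its value at lag 0."""
    w0 = sum(v * v for v in w)
    return sum(w[i] * w[i + lag] for i in range(len(w) - lag)) / w0


def measured_icc(w, lag, rng):
    """Full-band ICC of two coherent channels whose analysis windows are offset by `lag`."""
    total = len(w) + lag
    noise = [rng.gauss(0.0, 1.0) for _ in range(total)]
    right = [window_at(w, j - lag) * noise[j] for j in range(total)]
    left = [window_at(w, j) * noise[j] for j in range(total)]
    e_l = sum(v * v for v in left)
    e_r = sum(v * v for v in right)
    x_lr = abs(sum(a * b for a, b in zip(left, right)))
    return x_lr / math.sqrt(e_l * e_r)


def corrected_icc(icc, icc_bias):
    """Assumed offset-style correction, clipped to the valid ICC range."""
    return min(1.0, icc + 1.0 - icc_bias)


window = [math.sin(math.pi * (i + 0.5) / 1024) for i in range(1024)]
lag = 64
icc = measured_icc(window, lag, random.Random(0))
bias = normalized_autocorrelation(window, lag)
icc_fixed = corrected_icc(icc, bias)
```

Although both channels carry the same source signal, the measured coherence falls below 1 and lands close to the normalized window autocorrelation at the given lag; adding back the expected deficit brings it close to 1 again.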

In the embodiment of the parametric encoder 200 shown in FIG. 2, the downmix block 40 may reduce the number of channels of the multi-channel (here stereo) system by calculating, in the frequency domain, the downmix signal $\mathrm{DMX}_{t,k}$ given by equation (29). In an embodiment, the downmix signal $\mathrm{DMX}_{t,k}$ may be calculated from the ITD-compensated frequency transforms $L_{t,k,\mathrm{comp}}$ and $R_{t,k,\mathrm{comp}}$ according to

$$\mathrm{DMX}_{t,k} = e^{-i\beta}\,\frac{L_{t,k,\mathrm{comp}} + e^{i\,\mathrm{IPD}_{t,b}}\,R_{t,k,\mathrm{comp}}}{\sqrt{2}} \tag{29}$$

In equation (29), $\beta$ may be a real absolute phase adjustment parameter calculated from the stereo/spatial parameters. In other embodiments, the coding scheme shown in FIG. 2 may also work with any other downmix method. Other embodiments may use the frequency transforms $L_{t,k}$ and $R_{t,k}$, and optionally other parameters, to determine the downmix signal $\mathrm{DMX}_{t,k}$.

In the encoder embodiment of FIG. 2, an Inverse Discrete Fourier Transform (IDFT) block 50 may receive the frequency-domain downmix signal $\mathrm{DMX}_{t,k}$ from the downmix block 40. The IDFT block 50 may transform the downmix time-frequency bins $\mathrm{DMX}_{t,k}$, $k = 0, \dots, K-1$, from the frequency domain to the time domain to produce a time-domain downmix signal $dmx(\tau)$. In an embodiment, a synthesis window $w_S(\tau)$ may be applied, followed by overlap-add, to obtain the time-domain downmix signal $dmx(\tau)$.

Furthermore, as in the embodiment of FIG. 2, a core encoder 60 may receive the time-domain downmix signal $dmx(\tau)$ and encode this single-channel audio signal in accordance with MPEG-4 Part 3 [1] or any other suitable audio coding algorithm. In the embodiment of FIG. 2, the core-encoded time-domain downmix signal $dmx(\tau)$ may be combined with the ITD parameter $\mathrm{ITD}_t$, the side gain $g_{t,b}$ and the corrected residual gain $r_{t,b,\mathrm{corr}}$, suitably processed and/or further encoded for transmission to a decoder.

Fig. 3 shows an embodiment of a multi-channel decoder. The decoder may receive a combined signal comprising the mono/downmix input signal dmx (τ) in the time domain and comprising the comparison and/or spatial parameters as frame-based side information. The decoder as shown in fig. 3 may perform the following steps, which will be described in detail below.

1. Time-to-frequency conversion of input using windowed DFT

In DFT block 80

2. Prediction of missing residual in frequency domain

In the upmix and space recovery block 90

3. Upmixing in the frequency domain

In the upmix and space recovery block 90

4. Frequency domain ITD synthesis

In the ITD synthesis block 100

5. Frequency domain to time domain conversion, windowing and overlap-add

In IDFT blocks 112, 122 and window blocks 111, 121

The time-to-frequency transform of the mono/downmix input signal $dmx(\tau)$ may be done in a similar way as for the input audio signals of the encoder in FIG. 2. In some embodiments, a suitable amount of zero padding may be added for ITD restoration in the frequency domain. This process may yield the frequency transform of the downmix signal in the form of time-frequency bins $\mathrm{DMX}_{t,k}$, $k = 0, \dots, K-1$.

To restore the spatial properties of the downmix signal $\mathrm{DMX}_{t,k}$, a second signal that is independent of the transmitted downmix signal $\mathrm{DMX}_{t,k}$ may be needed. Such a signal $\hat{\rho}_{t,k}$ may be (re)constructed, for example in the upmix and spatial restoration block 90, using the corrected residual gain $r_{t,b,\mathrm{corr}}$ as comparison parameter (transmitted by an encoder such as the encoder in FIG. 2) and time-delayed time-frequency bins of the downmix signal $\mathrm{DMX}_{t,k}$, as given by equation (30) for $k \in I_b$.

In other embodiments, different methods and formulas may be used to restore the spatial properties of the downmix signal $\mathrm{DMX}_{t,k}$ based on the transmitted at least one comparison parameter.

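In addition, the upmix and spatial restoration block 90 may perform the upmix by applying the inverse of the mid/side transform at the encoder, using the downmix signal $\mathrm{DMX}_{t,k}$ and the side gain $g_{t,b}$ transmitted by the encoder together with the reconstructed residual signal $\hat{\rho}_{t,k}$. This may yield the decoded ITD-compensated frequency transforms $\hat{L}_{t,k,\mathrm{comp}}$ and $\hat{R}_{t,k,\mathrm{comp}}$, given by equations (31) and (32) as

$$\hat{L}_{t,k,\mathrm{comp}} = e^{i\beta}\,\frac{\mathrm{DMX}_{t,k}\,(1+g_{t,b}) + \hat{\rho}_{t,k}}{\sqrt{2}} \tag{31}$$

for $k \in I_b$, and

$$\hat{R}_{t,k,\mathrm{comp}} = e^{i(\beta - \mathrm{IPD}_{t,b})}\,\frac{\mathrm{DMX}_{t,k}\,(1-g_{t,b}) - \hat{\rho}_{t,k}}{\sqrt{2}} \tag{32}$$

where $\beta$ is the same absolute phase rotation parameter as in the downmix of equation (29).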
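That the upmix inverts the encoder's mid/side transform can be verified with a short round trip. The sketch makes several simplifying assumptions: the absolute phase rotation $\beta$ is set to zero, the mid/side transform uses a $1/\sqrt{2}$ normalization, and the true prediction residual is passed to the upmix. A decoder does not have the true residual and would substitute a synthesized signal scaled by the corrected residual gain, so this only demonstrates invertibility. All names are illustrative.

```python
import cmath
import math

SQRT2 = math.sqrt(2.0)


def encode_band(L, R):
    """Mid signal, side gain, residual and IPD of one subband (beta assumed to be 0)."""
    ipd = cmath.phase(sum(l * r.conjugate() for l, r in zip(L, R)))
    rot = cmath.exp(1j * ipd)
    M = [(l + rot * r) / SQRT2 for l, r in zip(L, R)]
    S = [(l - rot * r) / SQRT2 for l, r in zip(L, R)]
    # Real least-squares gain for predicting the side signal from the mid signal.
    g = sum((s * m.conjugate()).real for s, m in zip(S, M)) / sum(abs(m) ** 2 for m in M)
    residual = [s - g * m for s, m in zip(S, M)]
    return M, g, residual, ipd


def upmix_band(M, g, residual, ipd):
    """Invert the mid/side transform of encode_band()."""
    rot = cmath.exp(-1j * ipd)
    L = [((1.0 + g) * m + p) / SQRT2 for m, p in zip(M, residual)]
    R = [rot * ((1.0 - g) * m - p) / SQRT2 for m, p in zip(M, residual)]
    return L, R


L = [1 + 2j, -0.5 + 0.3j, 0.7 - 1.1j, 2.0 + 0j]
R = [0.3 - 1j, 1.2 + 0.4j, -0.6 + 0.9j, 0.1 - 0.2j]
M, g, residual, ipd = encode_band(L, R)
L_hat, R_hat = upmix_band(M, g, residual, ipd)
roundtrip_error = max(abs(a - b) for a, b in zip(L + R, L_hat + R_hat))
```

The round trip is exact up to rounding for any value of the side gain; the gain only determines how much of the side signal is carried by the prediction and how much is left in the residual.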

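In addition, as shown in FIG. 3, the ITD synthesis/decompensation block 100 may receive the decoded ITD-compensated frequency transforms $\hat{L}_{t,k,\mathrm{comp}}$ and $\hat{R}_{t,k,\mathrm{comp}}$. It may apply the ITD parameter $\mathrm{ITD}_t$ in the frequency domain by rotating $\hat{L}_{t,k,\mathrm{comp}}$ and $\hat{R}_{t,k,\mathrm{comp}}$ in the manner given in equations (33) and (34), producing the ITD-decompensated decoded frequency transforms $\hat{L}_{t,k}$ and $\hat{R}_{t,k}$:

$$\hat{L}_{t,k} = e^{-i\pi k\,\mathrm{ITD}_t/K}\,\hat{L}_{t,k,\mathrm{comp}} \tag{33}$$

and

$$\hat{R}_{t,k} = e^{i\pi k\,\mathrm{ITD}_t/K}\,\hat{R}_{t,k,\mathrm{comp}} \tag{34}$$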

In FIG. 3, the frequency-to-time transform of the ITD-decompensated decoded frequency transforms in the form of time-frequency bins $\hat{L}_{t,k}$ and $\hat{R}_{t,k}$, $k = 0, \dots, K-1$, may be performed by the IDFT blocks 112 and 122, respectively. The resulting time-domain signals may then be windowed by the window blocks 111 and 121, respectively, and overlap-added to form the reconstructed time-domain output audio signals $\hat{l}(\tau)$ and $\hat{r}(\tau)$ of the left and right audio channels.

The above-described embodiments are merely illustrative of the principles of the present invention. It is understood that modifications and variations of the arrangements and details described herein will be apparent to others skilled in the art. It is therefore intended that the invention be limited only by the scope of the appended patent claims and not by the specific details presented by way of description and explanation of the embodiments herein.

References

[1] MPEG-4 High Efficiency Advanced Audio Coding (HE-AAC) v2.

[2] Jürgen Herre, "From Joint Stereo to Spatial Audio Coding - Recent Progress and Standardization", Proc. of the 7th Int. Conference on Digital Audio Effects (DAFx-04), Naples, Italy, October 5-8, 2004.

[3] Christophe Tournery and Christof Faller, "Improved Time Delay Analysis/Synthesis for Parametric Stereo Audio Coding", AES Convention Paper 6753, 2006.

[4] Christof Faller and Frank Baumgarte, "Binaural Cue Coding Part II: Schemes and Applications", IEEE Transactions on Speech and Audio Processing, Vol. 11, No. 6, November 2003.
