Loudspeaker control

Document No.: 1941947. Published: 2021-12-07.

Reading note: this technology, Loudspeaker control, was created by F. M. Fazi, E. Hamdan, A. Franck and M. Simon on 2021-06-07. Main content: A method of controlling a loudspeaker array is provided. The method comprises: receiving a plurality of input audio signals to be reproduced by the array at a respective plurality of control points in an acoustic environment; and generating a respective output audio signal for each loudspeaker in the array by applying a set of filters to the plurality of input audio signals. The set of filters is based on: a first plurality of filter elements based on a first approximation of a set of transfer functions, each transfer function in the set being between an audio signal applied to a respective one of the loudspeakers and an audio signal received from that loudspeaker at a respective one of the control points; and a second plurality of filter elements based on a second approximation of the set of transfer functions.

1. A method of controlling a speaker array, the method comprising:

receiving a plurality of input audio signals to be reproduced by the array at a respective plurality of control points in an acoustic environment; and

determining a respective output audio signal for each speaker in the array by applying a set of filters to the plurality of input audio signals,

wherein the set of filters is based on:

a first plurality of filter elements based on a first approximation of a set of transfer functions, each transfer function in the set of transfer functions between an audio signal applied to a respective one of the speakers and an audio signal received from the respective one of the speakers at a respective one of the control points; and

a second plurality of filter elements based on a second approximation of the set of transfer functions.

2. The method of claim 1, wherein the first approximation is based on a free-field acoustic propagation model.

3. The method of any preceding claim, wherein the first approximation is based on a point-source acoustic propagation model.

4. The method of any preceding claim, wherein the second approximation accounts for one or more of reflection, refraction, diffraction, or scattering of sound in the acoustic environment.

5. The method of any preceding claim, wherein the second approximation accounts for scatter from the heads of one or more listeners.

6. The method of any preceding claim, wherein the second approximation accounts for one or more of a frequency response of each of the speakers or a directivity pattern of each of the speakers.

7. The method of any preceding claim, wherein the set of filters comprises:

a first subset of filters based on the first and second pluralities of filter elements; and

a second subset of filters based on one of the first or second plurality of filter elements.

8. The method of any preceding claim, wherein generating a respective output audio signal for each speaker in the array comprises:

generating a respective intermediate audio signal for each control point by applying the first subset of filters to the input audio signal; and

generating a respective output audio signal for each speaker by applying a second subset of filters to the intermediate audio signal.

9. The method of claim 7 or 8, wherein the array comprises L speakers and the plurality of control points comprises M control points, and wherein the first subset of filters comprises M² filters and the second subset of filters comprises L × M filters.

10. The method of any preceding claim, wherein the set of filters or the first subset of filters is determined based on an inverse of a matrix containing the first and second plurality of filter elements.

11. The method of claim 10, wherein a matrix comprising the first and second pluralities of filter elements is determined based on:

in the frequency domain, a product of a matrix comprising the second plurality of filter elements and a matrix comprising the first plurality of filter elements; or

equivalent operations in the time domain.

12. The method of any of claims 10 to 11, wherein the set of filters is determined based on:

in the frequency domain, a product of a matrix comprising the first plurality of filter elements and an inverse matrix of a matrix comprising the first and second plurality of filter elements; or

equivalent operations in the time domain.

13. The method of any preceding claim, wherein each of the first plurality of filter elements is a frequency independent delay-gain element.

14. The method of any preceding claim, wherein each of the first plurality of filter elements comprises a delay term and/or a gain term based on a relative position of one of the control points and one of the loudspeakers.

15. The method of any preceding claim, wherein each of the first plurality of filter elements comprises a delay term and/or a gain term, the delay term and/or gain term being determined, for each given row of a first matrix comprising the first plurality of filter elements, so as to:

increase co-linearity between the given row of the first matrix and a corresponding row of a second matrix comprising the second plurality of filter elements; and

optionally, reduce co-linearity between the given row of the first matrix and a non-corresponding row of the second matrix.

16. The method of any preceding claim, wherein each of the first plurality of filter elements comprises a delay term based on a linear approximation of a phase of a corresponding one of the second plurality of filter elements.

17. The method of any preceding claim, wherein the plurality of control points comprise positions of a corresponding plurality of listeners.

18. The method of any preceding claim, wherein the plurality of control points comprise the positions of the ears of one or more listeners.

19. A method as claimed in any preceding claim, wherein the second approximation is based on one or more head related transfer functions, HRTFs.

20. The method of any preceding claim, further comprising determining the plurality of control points using a position sensor.

21. The method of any preceding claim, wherein generating a respective output audio signal comprises applying at least a portion of the set of filters in a plurality of sub-bands using a filter bank.

22. The method of any preceding claim, wherein the set of filters is time-varying.

23. The method of any preceding claim, further comprising outputting the output audio signal to the loudspeaker array.

24. An apparatus configured to perform the method of any preceding claim.

25. A computer program comprising instructions which, when executed by a processing system, cause the processing system to perform the method of any one of claims 1 to 23; or

a computer-readable medium comprising instructions which, when executed by a processing system, cause the processing system to perform the method of any one of claims 1 to 23; or

a data carrier signal comprising instructions which, when executed by a processing system, cause the processing system to perform the method of any one of claims 1 to 23.

Technical Field

The present disclosure relates to a method of controlling a loudspeaker array and to a corresponding apparatus and computer program.

Background

The speaker array may be used to reproduce a plurality of different audio signals at a plurality of control points. The audio signals applied to the speaker array are generated using filters that can be designed to avoid crosstalk. However, the determination of the weights of these filters can be computationally expensive, especially if the control points are moving and thus the filter weights need to be calculated in real time. This may be the case, for example, if the control point corresponds to the listener's position in the acoustic environment.

A previous approach for determining filter weights for a loudspeaker array is described in WO 2017/158338 A1.

Disclosure of Invention

Aspects of the disclosure are defined in the appended claims.

Drawings

Examples of the present disclosure will now be explained with reference to the accompanying drawings, in which:

Fig. 1 shows a method of controlling a loudspeaker array;

Fig. 2 shows an apparatus for controlling a loudspeaker array, which may be used to implement the method of Fig. 1;

Fig. 3a shows a sound field control application intended to reproduce 3D binaural audio by performing crosstalk cancellation and creating narrow beams directed at the listener's ears;

Fig. 3b shows a sound field control application intended to reproduce different content signals for different listeners;

Fig. 3c shows a sound field control application intended to reproduce 3D binaural audio by performing crosstalk cancellation and creating narrow beams directed at the ears of multiple listeners, while also bouncing sound off the walls of the environment to create further 3D image sources;

Fig. 3d illustrates the use of a head tracking system that estimates the real-time 3D position of a listener relative to a loudspeaker array;

Fig. 4 shows a signal processing block diagram of the underlying acoustic control problem of reproducing multiple acoustic signals at multiple control points with a loudspeaker array;

Fig. 5 shows a simplified signal processing diagram of a multiple-input multiple-output (MIMO) control process used in array signal processing to reproduce M input signals with L loudspeakers;

Fig. 6 shows a simplified signal processing diagram of a filtering approach called 'technique 1' for reproducing M input signals with L loudspeakers;

Fig. 7 shows an expanded signal processing diagram of the technique 1 approach, showing the M × M independent filters and the M × L dependent filters;

Fig. 8 shows a signal processing block diagram for the approach described herein (referred to as 'technique 2');

Fig. 9a shows a first signal processing scheme in which the technique 2 process is divided into a plurality of frequency bands so that signal processing parameters can take different values in different frequency bands;

Fig. 9b shows a second signal processing scheme that divides the technique 2 process into multiple frequency bands;

Fig. 9c shows a third signal processing scheme that divides the technique 2 process into multiple frequency bands;

Fig. 10a shows simulation results of the processing power requirements of a listener-adaptive array filter based on the technique 1 approach compared with traditional listener-adaptive and static MIMO approaches; and

Fig. 10b shows a comparison of crosstalk cancellation performance between filters obtained using the technique 1 approach and the technique 2 approach described herein.

Like reference numerals refer to like parts throughout the specification and drawings.

Detailed Description

In general, the present disclosure relates to a method of controlling a loudspeaker array to reproduce a plurality of input audio signals at a respective plurality of control points in a manner that avoids crosstalk, that is, in a manner that reduces the extent to which an audio signal to be reproduced at a first control point is also reproduced at other control points. A set of filters is applied to the input audio signals to obtain a plurality of output audio signals that are output to the loudspeaker array. The present disclosure relates primarily to the manner in which these filters are determined.

Fig. 1 shows a method of controlling a loudspeaker array.

In step S100, a plurality of input audio signals to be reproduced by a loudspeaker array at a respective plurality of control points in an acoustic environment is received.

In step S110, the plurality of control points may be determined using a position sensor. In particular, the location of each of the plurality of control points may be received or determined.

In step S120, a set of filters may be determined. If step S110 is performed, the filter set may be determined based on the determined plurality of control points. Alternatively, the set of filters may be determined based on a predetermined plurality of control points. The manner in which this set of filters is determined is described in detail below.

In step S130, a respective output audio signal for each speaker in the array is determined by applying the set of filters to the plurality of input audio signals.

The filter set may be applied in the frequency domain. In this case, a transform such as a Fast Fourier Transform (FFT) is applied to the input audio signal, a filter is applied, and then an inverse transform is applied to obtain an output audio signal.
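As a minimal sketch of frequency-domain filtering for a single frame, the following NumPy example transforms M input signals, multiplies each frequency bin by an L × M filter matrix, and transforms back. The function name `apply_filter_set` and the array layout of `H` are assumptions made for illustration. Per-bin multiplication of a single frame corresponds to circular convolution, so a practical implementation would normally combine it with a block scheme such as overlap-add.

```python
import numpy as np

def apply_filter_set(d_frame, H):
    """Filter one frame of M input signals into L loudspeaker signals.

    d_frame: real array of shape (M, N), one frame per input signal.
    H:       complex array of shape (N // 2 + 1, L, M), holding one
             L x M filter matrix per FFT bin (hypothetical layout).
    """
    n = d_frame.shape[1]
    D = np.fft.rfft(d_frame, axis=1)         # forward transform, shape (M, bins)
    Q = np.einsum("flm,mf->lf", H, D)        # q = H d in every bin, shape (L, bins)
    return np.fft.irfft(Q, n=n, axis=1)      # inverse transform, shape (L, N)

rng = np.random.default_rng(0)
M, L, N = 2, 3, 8
d = rng.standard_normal((M, N))
H = np.zeros((N // 2 + 1, L, M), dtype=complex)
H[:, 0, 0] = 1.0   # loudspeaker 0 plays input 0 unchanged
H[:, 2, 1] = 0.5   # loudspeaker 2 plays input 1 at half gain
q = apply_filter_set(d, H)
```

In this toy filter set the middle loudspeaker receives no signal, which makes the routing performed by the matrix product easy to check by eye.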

In step S140, the output audio signal may be output to a speaker array.

Steps S100 to S140 may be repeated with another plurality of input audio signals. The set of filters may remain the same while repeating steps S100 to S140, in which case step S120 need not be performed, or may be changed.

As will be understood by those skilled in the art, the steps of fig. 1 may be performed for a plurality of frames of the input audio signal received consecutively. Accordingly, it is not necessary to complete all of steps S100 to S140 before starting to repeat steps S100 to S140. For example, in some implementations, step S100 is performed a second time before step S140 is performed a first time.

A block diagram of an exemplary apparatus 200 for implementing any of the methods described herein, such as the method of fig. 1, is shown in fig. 2. The apparatus 200 includes a processor 210 (e.g., a digital signal processor) arranged to execute computer readable instructions that may be provided to the apparatus 200 via one or more of the memory 220, the network interface 230, or the input interface 250.

Memory 220, such as Random Access Memory (RAM), is arranged to be able to retrieve, store and provide to processor 210 instructions and data that have been stored in memory 220. The network interface 230 is arranged to enable the processor 210 to communicate with a communication network, such as the internet. The input interface 250 is arranged to receive user input provided via an input device (not shown) such as a mouse, keyboard or touch screen. The processor 210 may also be coupled to a display adapter 240, the display adapter 240 in turn being coupled to a display device (not shown). The processor 210 may also be coupled to an audio interface 260, which audio interface 260 may be used to output audio signals to one or more audio devices, such as the speaker array 300. The audio interface 260 may include a digital-to-analog converter (DAC) (not shown), for example, for an audio device having an analog input.

Various approaches for determining a set of filters are now described.

Context

Listener adaptive crosstalk cancellation (CTC) based 3D audio systems rely on multiple control filters to generate sound that drives one or more speakers. The parameters of these filters are adapted in real-time according to the instantaneous location of one or more listeners, which is estimated using a listener tracking device (e.g., a camera, a global positioning system device, or a wearable device). This filter parameter adaptation requires expensive computational resources, making it difficult for small embedded devices to use this audio reproduction approach. Part of the computational resource consumption comes from the need for multiple inverse filters due to the use of complex, accurate transfer function models between the system speaker and the ear of a given listener. A simpler acoustic transfer function can be used to reduce the computational load, but at the cost of reducing the quality of the reproduced audio, particularly in terms of its perceptual spatial properties. Therefore, it is difficult to create an adaptive system with a low computational load and with high quality performance.

The listener adaptive CTC system may be based on a stereo speaker arrangement. The listener adaptive system may also use a four speaker arrangement to allow the listener to turn around and hear sounds over a 360 degree range. These examples of listener adaptive CTC systems use a time-varying signal processing control approach to adapt to time-varying listener positions and head orientations. The control filter may be read from a database or calculated on-the-fly at significant computational cost. While this signal processing approach can be implemented using a large Central Processing Unit (CPU), such as that available in a Personal Computer (PC), their underlying signal processing becomes a limiting factor for embedded systems when more than two speakers are used.

The CTC based 3D audio system has an improved response when more than two speakers are used. These can be used with a non-listener adaptive, fixed approach. However, such approaches may not be suitable for consumer applications because they assume that the listener stays at a single listening position.

From a signal processing perspective, the main problem with many approaches is that they are based on a 'classical' multiple-input multiple-output (MIMO) signal flow requiring M × L control filters, where M is the number of sound pressure control points (typically one for each ear of the listener) and L is the number of loudspeakers in the array. For a two-loudspeaker system, only four filters are required; however, if the system is to be listener adaptive, twice that number is required, and if more loudspeakers are used, the processing cost increases very quickly.

The technique described in WO 2017/158338 A1 (hereinafter referred to as 'technique 1') allows processing-efficient, listener-adaptive audio reproduction using a loudspeaker array of more than two loudspeakers. The main reduction in CPU load introduced by technique 1 comes from decomposing the filtering signal flow into a combination of loudspeaker-dependent filters (DFs) and loudspeaker-independent filters (IFs). In technique 1, the IFs are implemented as a set of time-varying finite impulse response (FIR) filters, and the DFs are implemented as a set of time-varying gain-delay elements. Owing to this decomposition, only M × M control filters and M delay lines (with L read points per delay line) are needed. This processing scheme results in a significant reduction in processing complexity compared with the M × L matrix of filters required by other methods, since L is much larger than M in most implementations.
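To make the filter counts concrete, the short sketch below compares the number of FIR filters in a classical MIMO layout with the numbers quoted for technique 1. The function name `filter_counts` and the example values M = 2, L = 16 are illustrative assumptions, not figures from the source.

```python
def filter_counts(M, L):
    """Return (classical MIMO FIR filters,
               technique 1 FIR filters (IFs),
               technique 1 delay lines, each with L read points)."""
    return M * L, M * M, M

# Hypothetical example: one listener (two ears) and a 16-loudspeaker array.
mimo_filters, if_filters, delay_lines = filter_counts(M=2, L=16)
```

With these example values the classical layout needs 32 FIR filters, whereas the decomposition needs 4 FIR filters plus 2 delay lines, and the gap widens as L grows.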

The processing savings introduced by technique 1, however, require that the acoustic transfer function between each loudspeaker and each sound pressure control point can be represented by a linear phase and a frequency-independent gain, for example by assuming a free-field point-monopole propagation model. It may nonetheless be useful to use more complex transfer functions, which can significantly improve the perceptual quality of the virtual sound image but cannot be represented with a simple gain and delay.

Overview of technique 1

A loudspeaker array based sound field control system aims at reproducing one or more acoustic signals at one or more points in space (control points) while canceling acoustic crosstalk (or sound leakage) to other control points. Such acoustic control results in the production of narrow acoustic beams that can be directionally controlled or steered in space in a precise manner to facilitate various acoustic applications.

For example, one application may precisely control the sound pressure at the ears of one or more listeners 341, 342, 343 to create 'virtual headphones' and reproduce 3D sound, which is called crosstalk cancellation (CTC), as shown in Fig. 3a. Another application may be to reproduce different and independent sound beams 320 for two or more listeners, so that each of them can listen to a different sound program, or to the same program at a user-specific sound level, as shown in Fig. 3b. Since the sound beams 320 control the sound field around the ears, these control techniques are known for their ability to personalize the sound around the listener. Furthermore, the beams generated by the loudspeaker array 300 may be controlled so as also to direct sound to the walls 330 of the room in which the sound is reproduced. This sound bounces off the walls and reaches the listener, creating an immersive experience, as shown in Fig. 3c.

The L-channel loudspeaker array comprises loudspeakers located at positions $\{\mathbf{y}_l\}_{l=1,\dots,L}$. For a given reproduction frequency $\omega = 2\pi f$ (in radians per second), the goal is to reproduce, at a set of control points $\{\mathbf{x}_m\}_{m=1,\dots,M}$, a set of M audio signals $\mathbf{d}(\omega) = [d_1(\omega), \dots, d_M(\omega)]^T$ rendered by M beams created by the loudspeaker array. The listener can move freely in the listening space, so the positions of the control points $\mathbf{x}_m$ can vary in space. To allow for this, the instantaneous spatial positions of the control points $\{\mathbf{x}_m\}$ can be gathered by a listener tracking system 310 (for example camera-based, wearable, laser-based or sound-based) that provides real-time coordinates of the listener's ears relative to each loudspeaker of the array, as shown in Fig. 3d.

A block diagram of the sound pressure control problem for reproduction by a loudspeaker array is depicted in Fig. 4. The underlying sound pressure control problem can be expressed in the frequency domain as

$$\mathbf{p}(\omega) = \mathbf{S}(\omega)\,\mathbf{H}(\omega)\,\mathbf{d}(\omega), \tag{1}$$

where $\mathbf{p}(\omega) = [p_1(\omega), \dots, p_M(\omega)]^T$ contains the sound pressure signals reproduced at the different control points $\mathbf{x}_m$, $(\cdot)^T$ denotes the vector or matrix transpose, $\mathbf{S}(\omega) \in \mathbb{C}^{M \times L}$ is the so-called plant matrix, whose elements are the acoustic transfer functions between the L sources and the M control points, and $\mathbf{H}(\omega) \in \mathbb{C}^{L \times M}$ is the matrix of control filters designed to reproduce the audio input signals $\mathbf{d}(\omega)$ at the control points given $\mathbf{S}(\omega)$. Each column $\mathbf{h}_m$ of $\mathbf{H}$ is designed to reproduce its corresponding audio signal $d_m$ at control point $\mathbf{x}_m$ while minimizing the radiated pressure at the other control points. Unless necessary, the dependency on ω is omitted hereinafter.

The ultimate goal of the sound control system is to achieve

$$\mathbf{p} = e^{-j\omega T}\,\mathbf{d}, \tag{2}$$

where $j = \sqrt{-1}$ and $e^{-j\omega T}$ is a modeling delay used to ensure the causality of the solution. This condition is satisfied if $\mathbf{S}\mathbf{H} = e^{-j\omega T}\mathbf{I}$, where $\mathbf{I}$ is the M × M identity matrix. One way to satisfy this condition approximately is to compute $\mathbf{H}$ as a regularized pseudo-inverse of $\mathbf{S}$, i.e.

$$\mathbf{H} = e^{-j\omega T}\,\mathbf{S}^H\left[\mathbf{S}\mathbf{S}^H + \mathbf{A}\right]^{-1}, \tag{3}$$

where $\mathbf{A}$ is a regularization matrix and $(\cdot)^H$ denotes the Hermitian transpose. The above equation may be referred to as the pseudo-inverse solution of an underdetermined system, and the set of control filters it returns may therefore be referred to as 'inverse' filters. Such a system has M inputs for the M audio signals and L outputs for the L loudspeakers of the array, as shown in the block diagram of Fig. 5. In a MIMO system such as those used in classical array signal processing, M × L control filters are required.
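As an illustrative sketch of equation (3) at a single frequency, the NumPy function below computes the regularized pseudo-inverse and checks that S H is close to the delayed identity. The function name `inverse_filters`, the choice A = βI, and the random plant matrix are assumptions made only for this example.

```python
import numpy as np

def inverse_filters(S, omega, T, beta=1e-9):
    """H = exp(-j*omega*T) * S^H (S S^H + A)^-1, with A = beta * I (assumed)."""
    M = S.shape[0]
    A = beta * np.eye(M)
    S_herm = S.conj().T
    # Solve (S S^H + A) X = I instead of forming an explicit inverse.
    X = np.linalg.solve(S @ S_herm + A, np.eye(M))
    return np.exp(-1j * omega * T) * (S_herm @ X)

rng = np.random.default_rng(1)
M, L = 2, 6                                    # two control points, six loudspeakers
S = rng.standard_normal((M, L)) + 1j * rng.standard_normal((M, L))
omega, T = 2 * np.pi * 1000.0, 0.005
H = inverse_filters(S, omega, T)
target = np.exp(-1j * omega * T) * np.eye(M)   # desired S H
residual = np.linalg.norm(S @ H - target)
```

Increasing `beta` trades reproduction accuracy (a larger `residual`) for lower filter energy, which is the usual role of the regularization term.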

In array signal processing, the array control filters H are calculated for a given acoustic plant matrix S. The plant matrix is a model of the electro-acoustic transfer functions between the array loudspeakers and the control points at which the sound pressure is to be controlled. Ideally, the plant matrix characterizes the physical transfer functions found in the actual acoustic system as accurately as possible. However, this is not always possible in practical applications. While it is possible to perform acoustic measurements with relatively high accuracy and so estimate the plant matrix for a given system, this is a complex process that can only be performed accurately under laboratory conditions. Furthermore, the plant matrix varies significantly even with small movements of the listener, which requires a dense measurement grid to allow wide adaptability to listener movements. This approach also produces a set of L × M complex inverse filters, which makes the reproduction computationally complex. Therefore, it is helpful to use a very simple but accurate acoustic propagation model to represent the plant matrix S.

One particular case is when the plant matrix S is approximated by a simple matrix C formed by assuming a free-field point-source acoustic propagation model between each loudspeaker and each sound pressure control point. The matrix C is thus an M × L matrix in which each element is formed by a delay element and a gain element. In these elements, $k = \omega / c_0$ is the wavenumber, $c_0$ is the speed of sound in air, and $r_{ml}$ is a frequency-independent real number that depends on the distance between the m-th acoustic control point and the acoustic center of the l-th loudspeaker. With such a propagation model, the elements of the matrix C are easy to calculate once the positions of the control points relative to the loudspeaker array are known, after which appropriate processing is required to calculate a new set H of control filters.
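A minimal sketch of how such a matrix could be built from positions is given here. It assumes a spherical-spreading gain of 1/distance and a phase of −k·distance, which is one common free-field point-source convention and not necessarily the exact element form used in the source. The function name `free_field_matrix` and all coordinates are hypothetical.

```python
import numpy as np

C0 = 343.0  # assumed speed of sound in air, in m/s

def free_field_matrix(x_ctrl, y_spk, omega, c0=C0):
    """Build an M x L matrix of gain-delay elements.

    x_ctrl: (M, 3) control point positions in metres.
    y_spk:  (L, 3) loudspeaker positions in metres.
    """
    # Pairwise distances between control points and loudspeakers, shape (M, L).
    dist = np.linalg.norm(x_ctrl[:, None, :] - y_spk[None, :, :], axis=2)
    k = omega / c0                  # wavenumber
    gain = 1.0 / dist               # assumed frequency-independent gain
    return gain * np.exp(-1j * k * dist)

# Hypothetical geometry: two ears 1 m in front of a three-loudspeaker line array.
x_ctrl = np.array([[-0.08, 1.0, 0.0], [0.08, 1.0, 0.0]])
y_spk = np.array([[-0.2, 0.0, 0.0], [0.0, 0.0, 0.0], [0.2, 0.0, 0.0]])
omega = 2 * np.pi * 1000.0
C = free_field_matrix(x_ctrl, y_spk, omega)
```

Because every element depends only on a distance, updating C when the listener moves costs a handful of arithmetic operations per loudspeaker and control point.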

Using a simple electroacoustic model helps to reduce the amount of computation required to obtain a new set of filters, and also helps to reduce the number of low-level operations required to filter a given amount of digital audio content. Further simplification can be achieved by analyzing the structure of equation (3), which is the formula for the pseudo-inverse solution of an underdetermined least-squares problem. Careful analysis shows that certain terms (filter elements) are common to all outputs (loudspeakers). These are called independent filters (IFs). The other terms are specific to individual loudspeakers and are called dependent filters (DFs). With S replaced by C, the terms of equation (3), and the resulting signal processing architecture, can thus be grouped as follows:

$$\mathbf{H} = \underbrace{e^{-j\omega T_1}\,\mathbf{C}^H}_{\text{DFs}}\;\underbrace{e^{-j\omega T_2}\left[\mathbf{C}\mathbf{C}^H + \mathbf{A}\right]^{-1}}_{\text{IFs}}, \tag{6}$$

where $T_1$ and $T_2$ are delays satisfying the relation $T_1 + T_2 = T$. This makes it possible to decompose the signal processing in equation (6) into a set of M × M IFs and a set of L × M DFs. This results in the signal processing scheme shown in Fig. 6, which is shown in its expanded form in Fig. 7.

One of the features of this array signal processing is that the M × M IFs can be implemented with conventional (time-varying) FIR filtering, and the M × L DFs with M (time-varying) delay lines, each with L access points. In this arrangement, the DFs act like a delay-and-sum beamformer. Compared with traditional MIMO filtering approaches based on M × L variable filters, this implementation substantially reduces the computational cost of filtering a given amount of digital audio, which lowers the number of floating-point operations per second (FLOPS) and allows the processing to be embedded in a smaller device. The only requirement for achieving this reduction in computational complexity is that the elements of the matrix C comprise only frequency-independent gains and delays.
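The equivalence between the two-stage IF/DF structure and a single L × M filter matrix can be checked numerically at one frequency. The sketch below assumes that the delay T₁ is attached to the DFs and T₂ to the IFs (only the sum T₁ + T₂ = T matters for the result), uses A = βI, and fills C with arbitrary gains and delays. The function name `technique1_filters` is hypothetical.

```python
import numpy as np

def technique1_filters(C, omega, T1, T2, beta=1e-6):
    """Return (DF, IF) such that H = DF @ IF, following the grouping of equation (6)."""
    M = C.shape[0]
    DF = np.exp(-1j * omega * T1) * C.conj().T                 # L x M gains and delays
    IF = np.exp(-1j * omega * T2) * np.linalg.inv(
        C @ C.conj().T + beta * np.eye(M))                     # M x M filters
    return DF, IF

rng = np.random.default_rng(2)
M, L = 2, 8
omega = 2 * np.pi * 500.0
delays = rng.uniform(0.002, 0.004, size=(M, L))    # seconds, hypothetical
gains = rng.uniform(0.5, 1.0, size=(M, L))
C = gains * np.exp(-1j * omega * delays)

DF, IF = technique1_filters(C, omega, T1=0.004, T2=0.006)
d = np.array([1.0 + 0.0j, 0.5j])                   # one frequency bin of the M inputs
q_two_stage = DF @ (IF @ d)                        # M x M stage, then delay-and-sum stage

# Direct L x M filter matrix with the total modeling delay T = T1 + T2.
H = np.exp(-1j * omega * 0.010) * C.conj().T @ np.linalg.inv(
    C @ C.conj().T + 1e-6 * np.eye(M))
q_direct = H @ d
```

Both paths give the same loudspeaker signals, but in the two-stage form only the M × M stage requires general FIR filtering.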

Technique 2 approach

It may be useful to use a more accurate, frequency-dependent transfer function model than that provided by the matrix C described above. For example, it may be desirable to use a rigid-sphere model or measured head-related transfer functions (HRTFs) when cancelling crosstalk, so as to account for diffraction by the listener's head and thereby improve spatial audio quality. It may also be useful to compensate for the frequency response and directivity of the loudspeakers, or for diffraction by other elements in the environment.

One way to achieve this is to replace the simple matrix C with a more complex matrix G that provides a better approximation of the physical transfer function matrix S. For example, the matrix G may be created by measuring a physical transfer function S (in which case the elements of G may be, for example, head-related transfer functions), or by using an analytical or numerical model of S (such as a rigid sphere or a boundary element model of a human head). However, in this case, the elements of G will not be as simple delays and gains as is the case with C, but will be based on more complex frequency dependent data or functions. If such a matrix G is used in equation (6) for the digital filter calculation, on the one hand this will result in a better audio quality performance of the system, but on the other hand it will require much more complex DFs, resulting in a significant increase of the overall calculation load.

The inventors have arrived at the following insight: by using both a relatively complex, more accurate matrix G and a relatively simple, less accurate matrix C, the audio quality of technique 1 can be significantly improved without significantly increasing the computational load. First, recall that since the goal of the filter design step is $\mathbf{p} = e^{-j\omega T}\mathbf{d}$, where $\mathbf{p} = \mathbf{S}\mathbf{H}\mathbf{d}$, the filter H should be such that

$$\mathbf{S}\mathbf{H} \approx e^{-j\omega T}\,\mathbf{I}, \tag{7}$$

where $\mathbf{I}$ is the M × M identity matrix.

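Equation (6) for calculating H is replaced by the following equation (ignoring the regularization matrix A for the moment):

$$\mathbf{H} = e^{-j\omega T}\,\mathbf{C}^H\left[\mathbf{G}\mathbf{C}^H\right]^{-1}. \tag{8}$$

Compared with $\mathbf{S}\mathbf{C}^H[\mathbf{C}\mathbf{C}^H]^{-1}$, the product $\mathbf{S}\mathbf{C}^H[\mathbf{G}\mathbf{C}^H]^{-1}$ provides a much better approximation to the identity matrix, because G is a much better approximation to S than C is. This allows for significantly improved audio quality.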
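As a brief worked step supporting this comparison, both products can be written as the identity plus an error term. The identities below follow by writing S = C + (S − C) and S = G + (S − G) respectively, and assume that both bracketed matrices are invertible.

```latex
\begin{aligned}
\mathbf{S}\mathbf{C}^H\left[\mathbf{C}\mathbf{C}^H\right]^{-1}
  &= \mathbf{I} + (\mathbf{S}-\mathbf{C})\,\mathbf{C}^H\left[\mathbf{C}\mathbf{C}^H\right]^{-1},\\
\mathbf{S}\mathbf{C}^H\left[\mathbf{G}\mathbf{C}^H\right]^{-1}
  &= \mathbf{I} + (\mathbf{S}-\mathbf{G})\,\mathbf{C}^H\left[\mathbf{G}\mathbf{C}^H\right]^{-1}.
\end{aligned}
```

The deviation from the identity is therefore driven by the model error S − C in the first case and by S − G in the second, and it vanishes in the idealized case G = S.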

However, the use of the more accurate but computationally more complex matrix G is confined to the IFs, while the DFs are the simple gains and delays contained in the matrix $\mathbf{C}^H$. This allows a much lower computational cost than would be required if the matrix $\mathbf{G}^H$ were also used for the DFs.

In this case, the sound pressure problem described above is now given by

$$\mathbf{p} = e^{-j\omega T}\,\mathbf{S}\mathbf{C}^H\left[\mathbf{G}\mathbf{C}^H\right]^{-1}\mathbf{d}. \tag{9}$$

It is also possible to apply a regularization scheme (e.g., Tikhonov regularization) to the design of the IFs. In this case, equation (8) is rewritten as

$$\mathbf{H} = e^{-j\omega T}\,\mathbf{C}^H\left[\mathbf{G}\mathbf{C}^H + \mathbf{A}\right]^{-1}, \tag{10}$$

where $\mathbf{A}$ is a regularization matrix used to control the energy of the array filters. A block diagram corresponding to this digital signal processing (DSP) architecture is depicted in Fig. 8. It can be seen how the filter H is divided into M × M independent filters (IFs) and M × L dependent filters (DFs).
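The effect of equation (10) can be illustrated numerically at a single frequency. In the sketch below, G is a hypothetical perturbed version of C, the physical plant S is assumed (for illustration only) to equal G exactly, and A = βI. The function name `technique2_filters` is not from the source.

```python
import numpy as np

def technique2_filters(C, G, omega, T, beta=0.0):
    """H = exp(-j*omega*T) * C^H (G C^H + A)^-1, with A = beta * I (assumed)."""
    M = C.shape[0]
    C_herm = C.conj().T
    IF = np.linalg.inv(G @ C_herm + beta * np.eye(M))   # M x M independent filters
    return np.exp(-1j * omega * T) * (C_herm @ IF)      # C^H supplies the gain-delay DFs

rng = np.random.default_rng(3)
M, L = 2, 8
omega, T = 2 * np.pi * 800.0, 0.01
C = rng.uniform(0.5, 1.0, (M, L)) * np.exp(
    -1j * omega * rng.uniform(0.002, 0.004, (M, L)))
# Hypothetical "accurate" model: each path of C perturbed by a complex factor.
G = C * (1.0 + 0.3 * (rng.standard_normal((M, L)) + 1j * rng.standard_normal((M, L))))
S = G   # illustration only: assume G matches the physical plant exactly

H1 = technique2_filters(C, C, omega, T)   # technique 1: C used in both places
H2 = technique2_filters(C, G, omega, T)   # technique 2: G used inside the inverse
target = np.exp(-1j * omega * T) * np.eye(M)
err1 = np.linalg.norm(S @ H1 - target)
err2 = np.linalg.norm(S @ H2 - target)
```

Under these assumptions `err2` is at the level of rounding error while `err1` is not, even though both filter sets use the same gain-delay DFs.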

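An alternative way to calculate the independent filters (IFs) is to solve a (convex) optimization problem

$$\min_{\mathrm{IFs}}\;\left\|\mathbf{G}\mathbf{C}^H\,\mathrm{IFs} - e^{-j\omega T}\,\mathbf{I}\right\|_{p_1} \tag{11}$$

subject to

$$\left\|\mathbf{C}^H\,\mathrm{IFs}\right\|_{p_2} \le H_{\max}. \tag{12}$$

Here, $\|\cdot\|_{p_1}$ and $\|\cdot\|_{p_2}$ denote suitable matrix norms, e.g. the Frobenius norm, while $H_{\max}$ is the allowable upper limit of the matrix norm of the array filter H.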

It is worth noting at this point that the combination of the matrices G and C provides further possibilities for creating array control filters, which may benefit from this hybrid control approach and a more realistic transfer function model. For example, it may be useful to employ a 'weighted' control approach that adjusts the contribution of any selected loudspeaker to the control of the sound pressure at any control point, by calculating H as follows:

$$\mathbf{H} = e^{-j\omega T}\,\mathbf{W}_L\mathbf{C}^H\left[\mathbf{G}\mathbf{W}_L\mathbf{C}^H + \mathbf{A}\right]^{-1}. \tag{13}$$

In this case, $\mathbf{W}_L$ is an L × L diagonal weighting matrix comprising a positive weight for each loudspeaker.
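As a sketch of the weighted approach of equation (13), the example below gives one loudspeaker a very small weight and confirms that its filter row shrinks accordingly while the reproduction condition (evaluated with G, and with A = 0) still holds. The function name `weighted_filters`, the weights, and the synthetic C and G are assumptions for illustration.

```python
import numpy as np

def weighted_filters(C, G, w, omega, T, beta=0.0):
    """H = exp(-j*omega*T) * W_L C^H (G W_L C^H + A)^-1, with W_L = diag(w), A = beta * I."""
    M = C.shape[0]
    WCH = w[:, None] * C.conj().T                       # W_L C^H without forming diag(w)
    core = np.linalg.inv(G @ WCH + beta * np.eye(M))    # M x M part
    return np.exp(-1j * omega * T) * (WCH @ core)

rng = np.random.default_rng(4)
M, L = 2, 8
omega, T = 2 * np.pi * 800.0, 0.01
C = rng.uniform(0.5, 1.0, (M, L)) * np.exp(
    -1j * omega * rng.uniform(0.002, 0.004, (M, L)))
G = C * (1.0 + 0.3 * (rng.standard_normal((M, L)) + 1j * rng.standard_normal((M, L))))

w = np.ones(L)
w[0] = 1e-6            # hypothetical: almost remove loudspeaker 0 from the solution
H = weighted_filters(C, G, w, omega, T)
target = np.exp(-1j * omega * T) * np.eye(M)
```

The remaining loudspeakers take over the work of the down-weighted one, so the weights act as a per-loudspeaker effort control.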

A similar approach may be useful for use cases in which it is desirable to control the sound pressure at each control point in a different way. In this case, a matrix $\mathbf{W}_M$ of size M containing positive weights may be used, with the control filters given by

$$\mathbf{H} = \left[\mathbf{G}\mathbf{W}_M\mathbf{C}^H + \mathbf{A}\right]^{-1}\mathbf{C}^H\mathbf{W}_M\,e^{-j\omega T}. \tag{14}$$

The following set of items is now defined:

The elements of the newly introduced matrix G (i.e. G_{m,l}) have the form G_{m,l} = e^{-jωτ(x_m, y_l)} G₀(x_m, y_l, ω), where τ(x_m, y_l) is a position-dependent delay that depends on the positions of each loudspeaker and control point, and G₀(x_m, y_l, ω) is a complex, frequency-dependent function.

The elements of C (i.e. C_{m,l}) are formed by a gain and a delay, C_{m,l} = g_{m,l} e^{-jωτ(x_m, y_l)}.

The real-valued gain g_{m,l} depends on the relative positions of the loudspeaker and the control point.

The delay term τ(x_m, y_l) included in the definition of G_{m,l} may be the same delay that defines the corresponding element C_{m,l} of matrix C.

One possible choice for the delay term τ(x_m, y_l) is such that the phase of the terms on the diagonal of the matrix GC^H is as close to zero as possible.

Thus, a possible choice of delay is the value τ(x_m, y_l) for which the linear phase of the delay (−ωτ(x_m, y_l) under the e^{-jωτ} convention used here) is the best linear approximation (across frequency) of the phase of G_{m,l}.
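
The last item can be sketched as a least-squares fit of a straight line through the origin to the unwrapped phase of one transfer function. The sketch assumes the sign convention G_{m,l} ≈ g e^{-jωτ}, a frequency grid fine enough for phase unwrapping to work, and a first phase sample lying within (−π, π]; the function name `fit_delay` and the synthetic transfer function are hypothetical.

```python
import numpy as np

def fit_delay(omega, G_ml):
    """Delay tau whose linear phase -omega * tau best fits the phase of G_ml.

    omega: angular frequencies (rad/s), increasing.
    G_ml:  complex transfer function samples at those frequencies.
    """
    phase = np.unwrap(np.angle(G_ml))
    # Least-squares fit of phase ~ -omega * tau with no constant offset.
    return float(-np.dot(omega, phase) / np.dot(omega, omega))

# Synthetic check: a pure gain-delay element with a known delay of 1.5 ms.
omega = 2 * np.pi * np.linspace(50.0, 4000.0, 400)
tau_true = 1.5e-3
G_ml = 0.3 * np.exp(-1j * omega * tau_true)
tau_est = fit_delay(omega, G_ml)
```

For a measured transfer function the fit will not be exact; the residual phase is what remains in G₀ and is handled by the independent filters rather than by the delay.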

Other possibilities for the design of C are based on the collinearity factor

γ_{m,m'} = |g_m c_{m'}^H| / (‖g_m‖ ‖c_{m'}‖), (15)

where ‖·‖ is the ℓ² norm operator, and c_{m'} and g_m are the m'-th row of matrix C and the m-th row of matrix G, respectively.
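
A short sketch of how a collinearity factor between rows of G and C could be computed at a single frequency follows. It assumes the usual normalized inner product, i.e. the magnitude of the inner product of the two rows divided by the product of their ℓ² norms; the function names are hypothetical.

```python
import numpy as np

def collinearity(g_row, c_row):
    """Collinearity factor between one row of G and one row of C (length L)."""
    # np.vdot conjugates its first argument, so this equals g_row @ conj(c_row).
    inner = np.vdot(c_row, g_row)
    return float(abs(inner) / (np.linalg.norm(g_row) * np.linalg.norm(c_row)))

def collinearity_matrix(G, C):
    """M x M array whose (m, m') entry compares row m of G with row m' of C."""
    M = G.shape[0]
    return np.array([[collinearity(G[m], C[mp]) for mp in range(M)]
                     for m in range(M)])

# Rows that differ only by a complex scale are perfectly collinear (factor 1);
# rows with disjoint support are orthogonal (factor 0).
v = np.array([1.0 + 1.0j, 2.0, -0.5j])
same_direction = collinearity(v, 3j * v)
orthogonal = collinearity(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
```

The factor lies between 0 and 1 and is insensitive to the overall scale and phase of either row, which is why it isolates how well the gain-delay rows of C point in the same direction as the rows of G.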

One option is to select the delay terms τ(x_m, y_l) and gain terms g_{m,l} so that the collinearity factor γ_{m,m'} is maximized (or increased) for each combination of rows with subscripts m = m' over the frequency range of interest.

Another possibility is to select the delay terms τ(x_m, y_l) and gain terms g_{m,l} to achieve the best compromise between maximizing (or increasing) the collinearity factor for each combination of rows with subscripts m = m' and minimizing (or decreasing) the collinearity factor for rows with subscripts m ≠ m', again within the frequency range of interest.

By way of example, one possible mathematical formula for this optimization problem is

where the design parameters α_k and ζ_k are non-negative real numbers, and the two sets appearing in the formulation are, respectively, the set of all delays τ(x_m, y_l) and the set of all gains g_{m,l}. {ω_k}_{k=1,...,K} is a set of frequencies spanning the frequency range of interest (note that γ_{m,m'} is a frequency-dependent quantity).
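
One way such a compromise could be scored is sketched below. The objective shown, a frequency-weighted sum that rewards the matched collinearity factors (m = m') with weights α_k and penalizes the mismatched ones (m ≠ m') with weights ζ_k, is an assumed form chosen for illustration and is not necessarily the formulation intended above. The function names, the element form C_{m,l} = g_{m,l} e^{-jωτ} and the example values are likewise assumptions.

```python
import numpy as np

def collinearity_matrix(G, C):
    """Entry (m, m') is |g_m c_m'^H| / (||g_m|| ||c_m'||)."""
    P = np.abs(G @ C.conj().T)
    return P / np.outer(np.linalg.norm(G, axis=1), np.linalg.norm(C, axis=1))

def design_objective(tau, g, omegas, G_list, alpha, zeta):
    """Score a set of delays tau (M x L) and gains g (M x L); larger is better."""
    total = 0.0
    for omega_k, G_k, a_k, z_k in zip(omegas, G_list, alpha, zeta):
        C_k = g * np.exp(-1j * omega_k * tau)   # gain-delay model at omega_k
        gamma = collinearity_matrix(G_k, C_k)
        matched = np.trace(gamma)               # sum over m = m'
        mismatched = gamma.sum() - matched      # sum over m != m'
        total += a_k * matched - z_k * mismatched
    return float(total)

# Hypothetical example: 2 control points, 3 loudspeakers, 2 frequencies.
tau = np.array([[1.0e-3, 2.0e-3, 3.0e-3], [3.0e-3, 2.0e-3, 1.0e-3]])
g = np.array([[1.0, 0.7, 0.5], [0.5, 0.7, 1.0]])
omegas = [2 * np.pi * 500.0, 2 * np.pi * 1000.0]
G_list = [g * np.exp(-1j * w * tau) for w in omegas]  # G equal to the model
score = design_objective(tau, g, omegas, G_list, alpha=[1.0, 1.0], zeta=[0.5, 0.5])
```

A general-purpose optimizer could then adjust `tau` and `g` to increase the score; setting every ζ_k to zero recovers the simpler option of maximizing only the matched factors.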

One of the advantages of this optimization approach is increased system stability. For the case of M = 2, this is evidenced by the fact that the absolute value of det(GC^H) (the determinant of the matrix to be inverted for the filter calculation) is

|det(GC^H)| = ‖g_1‖ ‖g_2‖ ‖c_1‖ ‖c_2‖ |γ_{1,1} γ_{2,2} − γ_{1,2} γ_{2,1} e^{jφ}|, (17)

where φ is a phase term. It can be seen that, if no assumption is made about φ, maximizing (or increasing) γ_{1,1} and γ_{2,2} and minimizing (or reducing) γ_{1,2} and γ_{2,1} maximizes (or increases) the absolute value of the determinant and thus improves the stability of the system.
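
The M = 2 statement can be checked numerically. The sketch below assumes the collinearity factor is the normalized inner product γ_{m,m'} = |g_m c_{m'}^H| / (‖g_m‖ ‖c_{m'}‖) and takes φ to be the combination of the phases of the four entries of GC^H shown in the code; the random matrices are hypothetical test data.

```python
import numpy as np

rng = np.random.default_rng(0)
L = 3
G = rng.standard_normal((2, L)) + 1j * rng.standard_normal((2, L))
C = rng.standard_normal((2, L)) + 1j * rng.standard_normal((2, L))

P = G @ C.conj().T                       # P[m, m'] = g_m c_m'^H
norms_g = np.linalg.norm(G, axis=1)
norms_c = np.linalg.norm(C, axis=1)
gamma = np.abs(P) / np.outer(norms_g, norms_c)
theta = np.angle(P)

# Phase of the off-diagonal product relative to the diagonal product.
phi = theta[0, 1] + theta[1, 0] - theta[0, 0] - theta[1, 1]

scale = norms_g.prod() * norms_c.prod()
lhs = abs(np.linalg.det(P))
rhs = scale * abs(gamma[0, 0] * gamma[1, 1]
                  - gamma[0, 1] * gamma[1, 0] * np.exp(1j * phi))

# Worst case over phi: the two products subtract with aligned phases.
worst_case = scale * (gamma[0, 0] * gamma[1, 1] - gamma[0, 1] * gamma[1, 0])
```

Because the worst case over φ is the difference between the matched and mismatched products, raising γ_{1,1} γ_{2,2} and lowering γ_{1,2} γ_{2,1} keeps the determinant away from zero for any φ.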

The above approach uses two sets of transfer functions to compute the array filter and is referred to as 'technique 2'.

Filter bank implementation

For some applications it may be useful to implement parallel versions of the same signal processing algorithm, each applied to a different frequency band. This may be needed, for example, if different types of acoustic driver are used for different frequency ranges (e.g. tweeters and woofers). In this case, a different number of speakers L_n may be used for each frequency band. This requires the matrices C and G to be calculated separately for each frequency band, so that the elements of these matrices take different values for the different frequency bands n ∈ [1, …, N]. Three different ways of achieving this are described below.

A first multi-band architecture is shown in Fig. 9a. A set of N band-pass filters B_n is used at the input, and the core technique 2 processing is repeated N times. In this case, the IF and DF are different for each band. The band-pass filters may alternatively be low-pass or high-pass filters. The IF and DF for the nth band can be defined as

IF_n = [G_n C_n^H + A_n]^{-1} (18)

DF_n = C_n^H (19)

where the matrices G_n, C_n and A_n are as defined above, but with parameter values specific to the nth frequency band. Using these definitions of IF and DF, the L_n loudspeaker signals q_n corresponding to the nth frequency band are given in the frequency domain by

q_n = C_n^H [G_n C_n^H + A_n]^{-1} B_n d (20)
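
The per-band computation of equation (20) can be sketched at a single frequency as below, with the IF applied first and the DF second. The sketch assumes B_n is the scalar response of the nth band filter at that frequency, uses a different number of speakers per band, and uses random matrices; the function name `band_speaker_signals` and all numerical values are hypothetical.

```python
import numpy as np

def band_speaker_signals(G_n, C_n, A_n, B_n, d):
    """q_n = C_n^H [G_n C_n^H + A_n]^{-1} B_n d for one band at one frequency."""
    IF_n = np.linalg.inv(G_n @ C_n.conj().T + A_n)  # M x M independent filters
    DF_n = C_n.conj().T                             # L_n x M dependent filters
    intermediate = IF_n @ (B_n * d)                 # one signal per control point
    return DF_n @ intermediate                      # one signal per loudspeaker

rng = np.random.default_rng(3)
M = 2
d = np.array([1.0 + 0.0j, 0.5j])                    # input signals at this frequency
speakers_per_band = [3, 5]                          # L_n differs between bands
band_gains = [0.8, 0.2]                             # B_n at this frequency

results = []
for L_n, B_n in zip(speakers_per_band, band_gains):
    G_n = rng.standard_normal((M, L_n)) + 1j * rng.standard_normal((M, L_n))
    C_n = G_n + 0.1 * (rng.standard_normal((M, L_n))
                       + 1j * rng.standard_normal((M, L_n)))
    q_n = band_speaker_signals(G_n, C_n, np.zeros((M, M)), B_n, d)
    results.append((G_n, B_n, q_n))
```

With A_n set to zero, the pressure G_n q_n at the control points equals the band-limited input B_n d, which is a convenient sanity check on the matrix ordering.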

A second possible multi-band DSP architecture is shown in Fig. 9b. In this case, the IF takes into account the matrices C_n, which differ from band to band, and the output of the IF is subsequently split into N bands that are fed to N groups of DFs, the gain and delay values being different for each band. This scheme requires only M × M IFs, as opposed to a different set of IFs for each band. These IFs are defined by equation (21), in which W_n is a frequency weighting function that depends mainly on the band-pass filter B_n and may be complex valued. The DFs can be calculated according to equation (19).

A third possible multi-band DSP architecture is shown in Fig. 9c. In this case, multi-band processing is included in both the IF and the DF, so that only a single set of M × M IFs and M × L DFs is required (as opposed to a different set for each band). The IF may be as defined in equation (21), while the DF takes a multi-band form.

With this approach, the DF is no longer a gain-delay element. In this third approach, for each given loudspeaker, the signals associated with the respective frequency bands are added together. This method is therefore not suitable for the case where different acoustic drivers are used for different frequency bands (e.g. tweeters and woofers). In other applications, however, this approach may be useful, for example when the group delay of the elements of G is better approximated by different delays in different frequency bands. With the above definitions of IF and DF, the L loudspeaker signals q are obtained in the frequency domain by applying the IFs to the input signals d and then applying the DFs to the result.

Effect of the measures of technique 1 and technique 2

Fig. 10a shows simulation results of the processing power requirements of the listener adaptive array filter based on the technique 1 approach compared to the traditional listener adaptive and static MIMO approaches. Specifically, for the static MIMO approach 1001, the listener adaptive MIMO approach 1002, and the technique 1 approach 1003, the number of required MFLOPS is shown as a function of the number of speakers L.

To illustrate the advantages provided by the technique 2 approach, simulation results for a speaker array with three speakers are shown in Fig. 10b. In this simulation, the CTC spectrum is shown, which represents the channel separation of the acoustic signals delivered at the listener's ears. Ideally, this performance metric should be as large as possible for an array that delivers 3D sound through CTC, in order to provide good 3D immersion. As can be seen in Fig. 10b, the performance of technique 2 (1004) is much better across the audio frequency range than that of technique 1 (1005), especially above 2 kHz, where the effect of head diffraction is large.

The technique 2 approach retains the simplicity and low computational cost of technique 1, since the DFs remain simple and are represented by the matrix C^H, but it also allows a more accurate target matrix G to be introduced in the calculation of the IFs without significantly increasing the overall computational cost of the algorithm. This allows complex acoustic phenomena, such as diffraction by the head or reflections from the acoustic environment, to be taken into account and compensated for, thereby improving the quality of the reproduced audio.

An effect of the present disclosure is to provide a filter computation scheme that allows the use of complex transfer function models while using a limited amount of processing resources.

An effect of the present disclosure is to provide a filtering approach with improved stability.

Alternate implementation

It will be appreciated that the above approaches (particularly technique 1 and technique 2) can be implemented in a variety of ways. The following is a general description of features that may be common to many implementations of the above approaches. It will of course be appreciated that any feature of the above approach may be combined with any common feature listed below, unless otherwise specified.

A method of controlling (or 'driving') a loudspeaker array (e.g. a line array of L loudspeakers) is provided.

The method may comprise receiving a plurality of input audio signals (e.g., d) to be reproduced by the array at a respective plurality of control points (or 'listening positions') (e.g., x_m) in an acoustic environment (or 'acoustic space').

Each of the plurality of input audio signals may be different.

At least one of the plurality of input audio signals may be different from at least one other of the plurality of input audio signals.

The method may further comprise generating (or 'determining') a respective output audio signal (e.g. Hd or q) for each speaker in the array by applying a set of filters (e.g. H) to the plurality of input audio signals (e.g. d).

The filters of the filter set may be digital filters. The filter set may be applied in the frequency domain.

The set of filters may be based on a first plurality of filter elements (e.g., C) and a second plurality of filter elements (e.g., G).

The first plurality of filter elements (e.g., C) may be based on a first approximation of a set of transfer functions (e.g., S).

The second plurality of filter elements (e.g., G) may be based on a second approximation of the set of transfer functions (e.g., S).

Each transfer function of the set of transfer functions may be between an audio signal applied to a respective one of the loudspeakers and an audio signal received from the respective one of the loudspeakers at the respective one of the control points.

The first and second plurality of filter elements may be based on different approximations of the set of transfer functions. In particular, the different approximations may be based on different models of the set of transfer functions.

The filter elements may be weights of the filter. The plurality of filter elements may be any set of filter weights. The filter element may be any component of the weight of the filter. The plurality of filter elements may be a plurality of components of respective weights of the filter.

The filter set may be obtained by combining two different matrices C and G, which in turn are calculated using two different approximations of the physical electro-acoustic transfer function that constitutes the system objective matrix S. Matrix G (e.g., as used in equation 10) may be formed using an accurate, frequency-dependent approximation of target matrix S. Matrix C (e.g., as used in equation 10) may be formed using frequency independent gain and delay, or more generally, may be formed using the following elements: which is different from the elements of G and allows for a DF that can be calculated with a reduced computational load compared to a DF calculated based on G.

The first approximation (e.g., for determining C) may be based on a free-field acoustic propagation model and/or a point-source acoustic propagation model.

The second approximation (e.g., for determining G) may account for one or more of reflection, refraction, diffraction, or scattering of sound in the acoustic environment. Alternatively or additionally, the second approximation may account for scatter from the heads of one or more listeners. Alternatively or additionally, the second approximation may account for one or more of a frequency response of each speaker or a directivity pattern of each speaker.

The filter set (e.g., H) may include:

a first subset of filters (e.g., [GC^H]^{-1}) based on the first (e.g., C) and second (e.g., G) pluralities of filter elements; and

a second subset of filters (e.g., C^H) based on one of the first or second plurality of filter elements.

Generating a respective output audio signal for each speaker in the array may include:

generating a respective intermediate audio signal for each control point (m) by applying the first subset of filters (e.g., [GC^H]^{-1}) to the input audio signals (e.g., d); and

generating a respective output audio signal for each speaker by applying the second subset of filters (e.g., C^H) to the intermediate audio signals.

The array may include L loudspeakers and the plurality of control points may include M control points, in which case the first subset of filters may include M² filters and the second subset of filters may include L × M filters.

The filter set or the first subset of filters may be determined based on the inverse of a matrix (e.g., [GC^H]) comprising the first (e.g., C) and second (e.g., G) pluralities of filter elements.

The matrix (e.g., [GC^H]) comprising the first and second pluralities of filter elements may be regularized (e.g., by a regularization matrix A) before being inverted.

The matrix (e.g., [GC^H]) comprising the first and second pluralities of filter elements may be determined based on:

in the frequency domain, the product of a matrix comprising the second plurality of filter elements (e.g., G) and a matrix comprising the first plurality of filter elements (e.g., C^H); or

equivalent operations in the time domain.

The set of filters may be determined based on:

in the frequency domain, the product of a matrix containing the first plurality of filter elements (e.g., C^H) and a matrix comprising the first and second pluralities of filter elements (e.g., [GC^H]); or

equivalent operations in the time domain.

An optimization technique may be used to determine the filter set.

A first subset of filters may be determined to reduce a difference between a scalar matrix (e.g., identity matrix I) and a matrix comprising a product of: a matrix comprising a second plurality of filter elements (e.g., G), a matrix comprising a first plurality of filter elements (e.g., C), and a matrix representing a first subset of filters (e.g., IF).

Each of the first plurality of filter elements (e.g., C) may be a frequency-independent delay-gain element.

Each of the first plurality of filter elements may include a delay term (e.g., τ(x_m, y_l)) and/or a gain term (e.g., g_{m,l}) based on the relative position of one of the control points (e.g., x_m) and one of the loudspeakers (e.g., y_l).

For each given one of the plurality of control points (m):

a first vector (e.g., c_m) may contain the filter elements from the first plurality of filter elements (e.g., C) corresponding to the given control point (m), and

a second vector (e.g., g_m) may contain the filter elements from the second plurality of filter elements (e.g., G) corresponding to the given control point (m);

and each of the first plurality of filter elements may include a delay term and/or a gain term determined based on a collinearity (e.g., γ) between the first vector and the second vector.

The delay term (e.g., τ(x_m, y_l)) and/or gain term (e.g., g_{m,l}) may be determined so as to, for each given one of the plurality of control points (m), increase (or maximize) the collinearity (e.g., γ_{m,m'}) between the first vector (e.g., c_m) corresponding to the given control point and the second vector (e.g., g_m) corresponding to the given control point.

The delay term (e.g., τ(x_m, y_l)) and/or gain term (e.g., g_{m,l}) may be determined so as to:

for each different pair of first (m₁) and second (m₂) given control points of the plurality of control points, reduce (or minimize) the collinearity between a first vector corresponding to the first given control point and a second vector corresponding to the second given control point; and

for each third given control point (m₃) of the plurality of control points, increase (or maximize) the collinearity between a first vector corresponding to the third given control point and a second vector corresponding to the third given control point.

Each of the first plurality of filter elements may include a delay term (e.g., τ(x_m, y_l)) and/or a gain term (e.g., g_{m,l}), the delay term and/or gain term being determined, for each given row of a first matrix (e.g., C) comprising the first plurality of filter elements, so as to:

increase (or maximize) the collinearity (e.g., γ) between the given row of the first matrix (e.g., C) and the corresponding row of a second matrix (e.g., G) comprising the second plurality of filter elements; and

optionally, reduce (or minimize) the collinearity (e.g., γ) between the given row of the first matrix (e.g., C) and a non-corresponding row of the second matrix (e.g., G).

Each of the first plurality of filter elements may include a delay term (e.g., τ(x_m, y_l)) based on a linear approximation of the phase of a corresponding one of the second plurality of filter elements (e.g., G).

The plurality of control points may include the positions of a corresponding plurality of listeners, for example when operating in a 'personal audio' mode.

The plurality of control points may include the positions of the ears of one or more listeners, for example when operating in a 'binaural' mode.

The second approximation may be based on one or more head-related transfer functions (HRTFs). The one or more HRTFs may be measured HRTFs. The one or more HRTFs may be simulated HRTFs. The one or more HRTFs may be determined using a boundary element model of the head.

The second plurality of filter elements may be determined by measuring a set of transfer functions.

The method may also include determining a plurality of control points using the position sensor.

Generating the respective output audio signals (e.g., Hd) may include applying at least a portion of the set of filters in a plurality of sub-bands using a filter bank.

The first subset of filters (e.g., [GC^H]^{-1}) and the second subset of filters (e.g., C^H) may be applied in each sub-band (e.g., as shown in Fig. 9a).

The first subset of filters (e.g., [GC^H]^{-1}) and the second subset of filters (e.g., C^H) may be applied within a filter bank (e.g., as shown in Fig. 9a).

The first subset of filters (e.g., [GC^H]^{-1}) may be applied in the full frequency band, while the second subset of filters (e.g., C^H) may be applied in each sub-band (e.g., as shown in Fig. 9b). In other words, the first subset of filters (e.g., [GC^H]^{-1}) may be applied outside the filter bank, while the second subset of filters (e.g., C^H) may be applied within the filter bank.

Generating a respective output audio signal for each speaker in the array may include:

for each of a first subset of speakers, generating a respective output audio signal in a first sub-band of a plurality of sub-bands; and

for each of a second subset of speakers, generating a respective output audio signal in a second sub-band of the plurality of sub-bands,

wherein the first and second subsets of speakers are different and the first and second sub-bands of the plurality of sub-bands are different.

The first plurality of filter elements may include a first subset of first filter elements for a first sub-band of the plurality of sub-bands and a second subset of first filter elements for a second sub-band of the plurality of sub-bands; and/or the second plurality of filter elements may comprise a first subset of second filter elements for a first sub-band of the plurality of sub-bands and a second subset of second filter elements for a second sub-band of the plurality of sub-bands.

The first subset of first filter elements and the second subset of first filter elements may be different and/or the first subset of second filter elements and the second subset of second filter elements may be different.

The filter set (e.g., H) may be time-varying. Alternatively, the filter set (e.g., H) may be fixed or time-invariant, such as when the listener position and head orientation are considered to be relatively static.

The method may also include outputting an output audio signal (e.g., Hd or q) to the speaker array.

The method may also include receiving a set of filters (e.g., H), for example, from another processing device or from a filter determination module. The method may also include determining a set of filters (e.g., H).

The first and second approximations may be different.

At least one of the first plurality of filter elements (e.g., C) may be different from a corresponding one of the second plurality of filter elements (e.g., G).

The method may further include determining any variable listed herein using any equation listed herein.

The filter set may be determined using any of the equations listed herein (e.g., equations 6, 8, 10, 13, 14).

An apparatus configured to perform any of the methods described herein is provided.

The apparatus may include a digital signal processor configured to perform any of the methods described herein.

The apparatus may comprise a loudspeaker array.

The apparatus may be coupled to a speaker array, or may be configured to be coupled to a speaker array.

There is provided a computer program comprising instructions which, when executed by a processing system, cause the processing system to perform any of the methods described herein.

A (non-transitory) computer-readable medium or a data carrier signal comprising the computer program is provided.

In some implementations, the various methods described above are implemented by a computer program. In some implementations, the computer program includes computer code arranged to instruct a computer to perform the functions of one or more of the various methods described above. In some implementations, computer programs and/or code for performing the methods are provided to devices, such as computers, on one or more computer-readable media or more generally on a computer program product. The computer readable medium is transitory or non-transitory. The computer-readable medium or media can be, for example, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, or a propagation medium for data transmission, such as for downloading code over the internet. Alternatively, the one or more computer-readable media may take the form of one or more physical computer-readable media, such as semiconductor or solid state memory, magnetic tape, a removable computer diskette, a Random Access Memory (RAM), a read-only memory (ROM), a rigid magnetic disk, or an optical disk (such as a CD-ROM, CD-R/W, or DVD).

In an implementation, the modules, components, and other features described herein are implemented as discrete components or integrated in the functionality of a hardware component such as an ASIC, FPGA, DSP, or similar device.

A 'hardware component' is a tangible (e.g., non-transitory) physical component (e.g., a set of one or more processors) capable of performing a particular operation and configured or arranged in a particular physical manner. In some implementations, the hardware components include dedicated circuitry or logic that is permanently configured to perform certain operations. In some implementations, the hardware component is or includes a special purpose processor, such as a Field Programmable Gate Array (FPGA) or ASIC. In some implementations, the hardware components also include programmable logic or circuitry that is temporarily configured by software to perform certain operations.

Thus, the term 'hardware component' should be understood to include a tangible entity that is physically constructed, permanently configured (e.g., hardwired), or temporarily configured (e.g., programmed) to operate in a particular manner or to perform a particular operation described herein.

Further, in some implementations, the modules and components are implemented as firmware or functional circuitry within a hardware device. Further, in some implementations, the modules and components are implemented in any combination of hardware devices and software components, or only in software (e.g., code stored or otherwise implemented in a machine-readable medium or transmission medium).

Those skilled in the art will recognize that a wide variety of modifications, alterations, and combinations can be made with respect to the above described examples without departing from the scope of the disclosed concepts, and that such modifications, alterations, and combinations are to be viewed as being within the ambit of the disclosure.

It will be appreciated that although the various approaches described above may be described implicitly or explicitly as 'optimal', engineering involves a compromise and thus an approach that is optimal from one perspective may not be optimal from another perspective. Furthermore, a somewhat sub-optimal approach may still be useful. Thus, both optimal and suboptimal solutions should be considered within the scope of the present disclosure.

Those skilled in the art will also recognize that the scope of the present invention is not limited by the examples described herein, but is instead defined by the following claims.
