Low Complexity Multi-Channel Intelligent Loudspeaker with Voice Control

Document No.: 1697183 — Publication date: 2019-12-10

Note: This design, "Low complexity multi-channel intelligent loudspeaker with voice control", was created by U. Horbach and M. Kronlachner on 2019-05-30. Its main content is as follows. The present disclosure provides a low complexity multi-channel smart loudspeaker with voice control. Specifically, a digital signal processor is provided that is programmed to: extract a center channel from a stereo input; apply the center channel to an array of speaker elements using a first set of finite impulse response filters and a first rotation matrix to produce a first beam of audio content at a target angle about an axis; apply a left channel of the stereo input to the array of speaker elements using a second set of finite impulse response filters and a second rotation matrix to produce a second beam of audio content about the axis at a first offset angle from the target angle; and apply a right channel of the stereo input to the array of speaker elements using a third set of finite impulse response filters and a third rotation matrix to produce a third beam of audio content about the axis at a second offset angle from the target angle.

1. A smart loudspeaker, comprising:

An array of N speaker elements disposed in a circular configuration about an axis and configured for multi-channel audio playback; and

A digital signal processor programmed to:

extracting a center channel from a stereo input,

applying the center channel to the array of speaker elements using a first set of finite impulse response filters and a first rotation matrix to produce a first beam of audio content at a target angle about the axis,

applying a left channel of the stereo input to the array of speaker elements using a second set of finite impulse response filters and a second rotation matrix to produce a second beam of audio content about the axis at a first offset angle from the target angle, and

applying a right channel of the stereo input to the array of speaker elements using a third set of finite impulse response filters and a third rotation matrix to produce a third beam of audio content about the axis at a second offset angle from the target angle.

2. The smart loudspeaker of claim 1, wherein extracting the center channel using the digital signal processor comprises: a high frequency path that performs center extraction on high frequencies at a first sampling rate; a low frequency path that performs center extraction on low frequencies at a second sampling rate lower than the first sampling rate; and an adder that combines an output of the high frequency path with an output of the low frequency path to create the center channel.

3. The smart loudspeaker of claim 1, further comprising: an array of M microphone elements disposed in a circular configuration about the axis and configured to receive an audio signal and provide an electrical signal, wherein the digital signal processor is further programmed to utilize a microphone beamformer to perform steerable microphone array beamforming of the electrical signal at the target angle to receive a speech input.

4. The smart loudspeaker of claim 3, wherein the digital signal processor is further programmed to calibrate the array of M microphone elements by convolving the electrical signal from each of the microphone elements with a minimum phase correction filter and a target microphone that is one of the microphone elements of the array.

5. The smart loudspeaker of claim 4, wherein the array of microphone elements further comprises a microphone element at a center of the circular configuration, wherein the target microphone is the center microphone.

6. The smart loudspeaker of claim 3, wherein the digital signal processor is further programmed to calibrate the microphone array using in-situ calibration comprising:

Estimating a frequency response of a reference microphone of the microphone array using the audio playback of the speaker element array as a reference signal; and

Equalizing the microphones of the array according to the frequency response.

7. The smart loudspeaker of claim 3, wherein the digital signal processor is further programmed to utilize a single adaptive Acoustic Echo Canceller (AEC) filter pair keyed to the stereo input of the array of microphone elements.

8. The smart loudspeaker of claim 7, wherein the microphone array is 10 millimeters in diameter.

9. The smart loudspeaker of claim 4, wherein M is 6-8.

10. A method for a smart loudspeaker, comprising:

extracting a center channel from a stereo input;

applying the center channel to an array of speaker elements using a first set of finite impulse response filters and a first rotation matrix to produce a first beam of audio content at a target angle about an axis, the array of speaker elements disposed in a circular configuration about the axis and configured for multi-channel audio playback;

applying a left channel of the stereo input to the array of speaker elements using a second set of finite impulse response filters and a second rotation matrix to produce a second beam of audio content about the axis at a first offset angle from the target angle; and

applying a right channel of the stereo input to the array of speaker elements using a third set of finite impulse response filters and a third rotation matrix to produce a third beam of audio content about the axis at a second offset angle from the target angle.

11. The method of claim 10, further comprising: performing center extraction on high frequencies at a first sampling rate using a high frequency path; performing center extraction on low frequencies using a low frequency path at a second sampling rate lower than the first sampling rate; and combining an output of the high frequency path with an output of the low frequency path using an adder to create the center channel.

12. The method of claim 10, further comprising: performing steerable microphone array beamforming with a microphone beamformer at the target angle to receive a speech input from an array of M microphone elements arranged in a circular configuration about the axis and configured to receive an audio signal and provide an electrical signal.

13. The method of claim 12, further comprising: calibrating the array of microphone elements by convolving the electrical signal from each of the microphone elements using a minimum phase correction filter and a target microphone that is one of the microphone elements of the array.

14. The method of claim 13, wherein the array of M microphone elements further comprises a microphone element at a center of the circular configuration, wherein the target microphone is the center microphone.

15. The method of claim 12, further comprising calibrating the microphone array using in-situ calibration comprising:

estimating a frequency response of a reference microphone of the microphone array using the audio playback of the speaker element array as a reference signal; and

equalizing the microphones of the array according to the measured frequency response.

16. The method of claim 12, further comprising: utilizing a single adaptive Acoustic Echo Canceller (AEC) filter pair keyed to the stereo input of the array of microphone elements.

17. The method of claim 16, wherein the microphone array is 10 millimeters in diameter.

Technical Field

aspects of the present disclosure generally relate to a low complexity multi-channel smart loudspeaker with voice control.

Background

Smart loudspeakers with voice control and internet connectivity are becoming increasingly popular. End users expect such products to perform various functions, including: understanding a user's voice from any point in a room, even while music is playing; responding quickly to user requests and interacting with the user; focusing on one voice command while suppressing others; playing stereo music with high quality; filling a room with music approaching the effect of a small home theater system; and automatically adjusting to the location in the room where the user is listening.

Disclosure of Invention

In one or more illustrative examples, a smart loudspeaker includes: an array of N speaker elements disposed in a circular configuration about an axis and configured for multi-channel audio playback; and a digital signal processor. The digital signal processor is programmed to: extract a center channel from a stereo input; apply the center channel to the array of speaker elements using a first set of finite impulse response filters and a first rotation matrix to produce a first beam of audio content at a target angle about the axis; apply a left channel of the stereo input to the array of speaker elements using a second set of finite impulse response filters and a second rotation matrix to produce a second beam of audio content about the axis at a first offset angle from the target angle; and apply a right channel of the stereo input to the array of speaker elements using a third set of finite impulse response filters and a third rotation matrix to produce a third beam of audio content about the axis at a second offset angle from the target angle.

In one or more illustrative examples, a method for a smart loudspeaker includes: extracting a center channel from a stereo input; applying the center channel to an array of speaker elements using a first set of finite impulse response filters and a first rotation matrix to produce a first beam of audio content at a target angle about an axis, the array of speaker elements disposed in a circular configuration about the axis and configured for multi-channel audio playback; applying a left channel of the stereo input to the array of speaker elements using a second set of finite impulse response filters and a second rotation matrix to produce a second beam of audio content about the axis at a first offset angle from the target angle; and applying a right channel of the stereo input to the array of speaker elements using a third set of finite impulse response filters and a third rotation matrix to produce a third beam of audio content about the axis at a second offset angle from the target angle.

Drawings

FIG. 1 shows a simplified block diagram of a smart loudspeaker;

FIG. 2 illustrates an example three-beam application using a smart loudspeaker;

FIG. 3A shows a view of an example smart loudspeaker;

FIG. 3B illustrates a cross-sectional view of an example smart loudspeaker;

FIG. 4 shows a view of an example seven-channel microphone array for a smart loudspeaker;

FIG. 5 shows a graph comparing the performance of a single AEC filter on an array microphone with the performance on a reference microphone;

FIG. 6 shows an example block diagram of the center extraction function of the upmixer of the smart loudspeaker shown in FIG. 1;

FIG. 7 shows an example of a six-speaker array with a low frequency driver;

FIG. 8 illustrates an example system block diagram of the beamforming filters and rotation matrix of the mid/high frequency drivers and the signal path of the low frequency driver;

FIG. 9 shows an example rotation of a sound field using a smart loudspeaker;

FIG. 10 illustrates an example crossover filter frequency response of a smart loudspeaker;

FIG. 11 illustrates an example approximation of a low frequency driver target response;

FIG. 12 shows example high frequency responses at different angles around a smart loudspeaker;

FIG. 13 shows a combined transducer filter for a smart loudspeaker: impulse response, amplitude response, and phase;

FIG. 14 shows an example contour plot of a forward beam using a smart loudspeaker in a narrow beam configuration;

FIG. 15 shows an example contour plot of a forward beam using a smart loudspeaker in a mid-beam configuration;

FIG. 16 shows an example contour plot of a forward beam using a smart loudspeaker in an omnidirectional beam configuration;

FIG. 17 shows an example contour plot of a forward beam using a smart loudspeaker in an omnidirectional beam configuration with three mid-beam configurations;

FIG. 18 illustrates an example of the frequency responses of the microphones of the microphone array before calibration;

FIG. 19 shows an example of the frequency responses of the microphones of the microphone array after calibration;

FIG. 20 shows an example of the initial filters and angular attenuation of a microphone array;

FIG. 21 shows the phase responses of the initial beamforming filters of the microphone array;

FIG. 22 illustrates an example contour plot of a microphone array beamformer;

FIG. 23 illustrates example directivity indices of a microphone array beamformer;

FIG. 24 illustrates an example microphone array layout with six microphones and three beamforming filters;

FIG. 25 illustrates example frequency responses of optimized microphone array beamforming and EQ filters;

FIG. 26 illustrates example phase responses of the optimized beamforming filters of a microphone array;

FIG. 27 shows an example of white noise gain;

FIG. 28 shows an example of an optimized off-axis response;

FIG. 29 shows an example contour plot of the beamforming results after optimization;

FIG. 30 shows example directivity indices of the post-optimization beamforming results at two different filter lengths;

FIG. 31 shows an example process of loudspeaker operation; and

FIG. 32 is a conceptual block diagram of a computing system configured to implement one or more aspects of various embodiments.

Detailed Description

As required, detailed embodiments of the present invention are disclosed herein; however, it is to be understood that the disclosed embodiments are merely exemplary of the invention that may be embodied in various and alternative forms. The figures are not necessarily to scale; some features may be exaggerated or minimized to show details of particular components. Therefore, specific structural and functional details disclosed herein are not to be interpreted as limiting, but merely as a representative basis for teaching one skilled in the art to variously employ the present invention.

To achieve the features of a smart loudspeaker, it is desirable to combine a powerful host processor with Wi-Fi connectivity, a real-time signal processor including steerable beamforming for receiving and transmitting sound, and a multi-channel echo cancellation filter bank. These components require a large amount of processing power. On the other hand, wireless portability with a battery power option is often desirable. The present disclosure provides a solution that satisfies the need for audio quality and smart loudspeaker features while minimizing processing cost.

FIG. 1 shows a simplified block diagram of a smart loudspeaker 100. As shown, the circuit receives an audio input 102 having a left (L) channel and a right (R) channel. This audio input 102 is provided to an upmixer 104. The upmixer 104 is configured to generate a center channel (C) from the two-channel stereo source, i.e., the (L) and (R) channels of the audio input 102, resulting in an upmixed signal 106 having left-minus-center (L−C), center (C), and right-minus-center (R−C) components, as shown. Further operational details of the upmixer 104 are discussed below in connection with center channel extraction in the context of FIG. 6.

The loudspeaker 100 may also include a loudspeaker beamformer 108. The loudspeaker beamformer 108 may have three inputs configured to receive the upmixed signals 106 (L−C), (R−C), and (C) from the upmixer 104. The loudspeaker beamformer 108 may also be connected to an array of L loudspeakers 110 (typically L = 6…8). Each of the input channels (L−C), (R−C), and (C) corresponds to a beam of sound waves defining a beam width.

FIG. 2 shows an example three-beam application 200 using the smart loudspeaker 100. The three steering angles αL, αR, and αC define the pointing directions of the beams. Typically, the center (C), containing the dialog and primary performers, is directed toward the listener, while the stereo channels are directed toward the room walls so that reflected sound reaches the listener, creating a sense of immersion and achieving the desired stereo image width and depth. The stereo angles αL, αR can be adjusted individually to maximize the stereo effect, while the entire sound stage can be rotated toward the listener by an angle αall (all angles rotating simultaneously).

Referring back to FIG. 1, the loudspeaker 100 may additionally include an array of M microphones 112 arranged in a circle (typically M = 4…8 microphones). An in-situ microphone auto-calibration stage 116 may receive the microphone signals 114 from the microphones 112. The calibration signal 118 from the auto-calibration stage 116 may be provided to a microphone beamformer 120, which is configured to produce a speech output signal 122 suitable for a speech recognition engine (not shown) based on a microphone angle αM 124.

The loudspeaker 100 also includes a two-input/one-output adaptive Acoustic Echo Canceller (AEC) filter 126. The AEC output signal 128 approximates the music signal received by the microphones 112, which originates from the input channels 102 (L) and (R) and travels from the loudspeakers 110 to the microphones 112 through direct and indirect (room reflection) paths. By subtracting this signal 128 from the microphone signal 114, the music is suppressed and only the desired speech signal remains.

FIG. 3A shows an example view 300A of the smart loudspeaker 100. FIG. 3B shows a cross-sectional view 300B of the example smart loudspeaker 100. In each of FIGS. 3A and 3B, the example smart array loudspeaker 100 includes six tweeters regularly spaced at 60° angular increments built into a cylindrical housing, and a downward-firing woofer. It should be noted that tweeter arrays having different numbers of drivers may be used in other examples.

FIG. 4 shows a view of an example 400 of a seven-channel microphone array 112 for the smart loudspeaker 100. As shown, the microphone array 112 may be built into the center of the top cover of the loudspeaker 100. The array 112 shown includes six closely spaced microphones arranged in a circle, plus an optional center microphone. Examples without a center microphone, or with more or fewer microphones in the microphone array 112, may also be used.

The microphone array may be small in diameter, for example, typically 10 millimeters (mm). This greatly simplifies the AEC 126 of the system. In other systems, the microphones may be placed in a circular arrangement typically 4-10 centimeters (cm) in diameter. That approach requires a separate AEC filter pair for each microphone of the array 112, because the acoustic response varies significantly over the larger distances. By reducing the diameter of the microphone array 112, the processing power required for AEC can be reduced by a factor of M (i.e., the number of microphones), by applying only one AEC filter pair instead of M pairs. The AEC reference may be the center microphone signal or the signal averaged over the circle of the M array microphones 112.
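The single-filter approach can be illustrated with a minimal normalized-LMS (NLMS) echo-canceller sketch. This is a generic AEC adaptation rule, not the specific filter of this disclosure; the function name, tap count, step size, and toy room path are illustrative assumptions:

```python
import numpy as np

def nlms_aec(ref, mic, taps=256, mu=0.5, eps=1e-8):
    """Single NLMS echo canceller: adapts one FIR filter so that the filtered
    reference (loudspeaker) signal approximates the echo picked up by the
    microphone; returns the residual (mic minus echo estimate)."""
    w = np.zeros(taps)
    out = np.zeros(len(mic))
    for n in range(taps - 1, len(mic)):
        x = ref[n - taps + 1:n + 1][::-1]   # newest reference sample first
        e = mic[n] - w @ x                  # error = mic minus echo estimate
        w += mu * e * x / (x @ x + eps)     # normalized LMS weight update
        out[n] = e
    return out

# Hypothetical signals: the "echo" is the reference through a short room path.
rng = np.random.default_rng(0)
ref = rng.standard_normal(8000)
room = np.zeros(64)
room[0], room[40] = 1.0, 0.3                # direct path plus one reflection
mic = np.convolve(ref, room)[:8000]         # echo-only microphone signal
residual = nlms_aec(ref, mic)               # echo largely cancelled once adapted
```

With one such filter (a pair, for a stereo reference) applied to the center or circle-averaged microphone signal, the per-microphone adaptation cost of an M-element array is avoided.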

FIG. 5 shows an example comparison graph 500 of the performance of a single AEC filter on the various array microphones 112 versus the performance on a reference microphone. The graph 500 shows the attenuation in dB on the Y-axis for each microphone of the microphone array 112 over the frequency range shown on the X-axis. A broadband attenuation of the AEC performance of less than 10 dB is observed at microphone positions 1…6 compared to the reference position 7. Thus, the example graph 500 illustrates the effectiveness of this approach.

FIG. 6 shows an example block diagram 600 of the center extraction function of the upmixer 104 of the smart loudspeaker 100 shown in FIG. 1. Thus, FIG. 6 shows further operational details of the upmixer 104 performing the center channel extraction. In general, the upmixer 104 receives the left (L) and right (R) channels of the audio input 102 and processes the input to generate the center channel (C) 106. As shown in FIG. 2, this center channel (C) 106 may be directed toward the listener, while the stereo channels (L) and (R) 102 may be directed toward the room walls.

Referring more specifically to FIG. 6, the audio input 102 having a left (L) channel and a right (R) channel is split into two paths: a high frequency path and a low frequency path. The high frequency path starts with a low-order recursive Infinite Impulse Response (IIR) high pass filter 602 for each of the (L) and (R) channels. In one example, the IIR high pass filter 602 may be implemented as a second-order Butterworth filter with a (−3 dB) corner frequency of 700…1000 Hz. The low frequency path may begin with a pair of Finite Impulse Response (FIR) decimation filters 604. In one non-limiting example, the decimation filters 604 may decimate by a factor of 16.
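The two-path split above can be sketched with SciPy; the 800 Hz corner and the 255-tap FIR length are assumed values within the ranges given in the text:

```python
import numpy as np
from scipy import signal

fs = 48000   # input sampling rate
r = 16       # decimation factor of the low frequency path

# High frequency path: low-order recursive (IIR) high pass; here a 2nd-order
# Butterworth with an assumed -3 dB corner of 800 Hz (text: 700...1000 Hz).
sos_hp = signal.butter(2, 800, btype="highpass", fs=fs, output="sos")

# Low frequency path: FIR anti-aliasing low pass, then decimation by 16.
fir_lp = signal.firwin(255, 0.8 * (fs / (2 * r)), fs=fs)

x = np.random.default_rng(1).standard_normal(fs)   # 1 s of one input channel
hi = signal.sosfilt(sos_hp, x)                     # stays at 48 kHz
lo = signal.lfilter(fir_lp, 1.0, x)[::r]           # now at fs/16 = 3 kHz
```

The same pair of filters would be applied to each of the (L) and (R) channels.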

The outputs of the high pass filters 602 and the low pass decimation filters 604 are provided to Short-Term Fourier Transform (STFT) blocks 606 of a two-way time/frequency analysis scheme. The upmixer 104 performs a two-way time/frequency analysis that uses a very short Fourier transform length, typically 128, with a hop length of 48, thus achieving a much higher time resolution than methods using longer transforms. A method applying a single Fast Fourier Transform (FFT) of length 1024 may have a time resolution of 10…20 milliseconds (msec), depending on the overlap length. By using the shorter transform length, the time resolution is improved by roughly a factor of ten, which corresponds more closely to human perception (e.g., 1…2 msec). Due to the sub-sampling of the low frequency band, the low frequency resolution is improved rather than compromised. At the same time, aliasing distortion that may occur in a polyphase filter bank under nonlinear processing is avoided. Thus, the two-way time/frequency analysis scheme yields exceptional fidelity and sound quality, with artifacts suppressed to inaudibility. U.S. Patent Publication No. 2013/0208895, entitled "Audio Surround Processing System," which is incorporated herein by reference in its entirety, describes other aspects of the operation of this scheme.

The (L) and (R) outputs of the STFT block 606 of the high frequency path are provided to a center extraction block 608. Similarly, the (L) and (R) outputs of the STFT block 606 of the low frequency path are provided to another center extraction block 608.

Notably, the STFT blocks 606 and center extraction block 608 in the low frequency path typically operate at a reduced sampling rate fS/rS, where fS = 48 kHz and rS = 16. This increases the low frequency resolution by a factor of rS, so the same short STFT length of 128 may be used.
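A quick arithmetic check of this resolution argument, using the values above:

```python
fs = 48000    # full sampling rate
r_s = 16      # low-path decimation factor
n_fft = 128   # short STFT length used in both paths

high_path_bin_hz = fs / n_fft        # bin spacing at the full rate
low_path_bin_hz = fs / r_s / n_fft   # bin spacing at the reduced rate
hop_ms = 48 / fs * 1000              # hop duration in milliseconds
```

The low path resolves bins of about 23.4 Hz versus 375 Hz in the high path, while the 48-sample hop keeps the time resolution near 1 ms.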

Recombination after the respective center extraction processes in the high frequency and low frequency paths involves an inverse STFT, interpolation from the reduced sampling rate fS/16 back to the original sampling rate fS, and delay compensation in the high frequency path to match the higher delay caused by the FIR filtering in the low frequency path. More specifically, each center extraction block 608 feeds into a separate inverse STFT block 610. The output of the inverse STFT block 610 in the low frequency path is fed to a FIR interpolation filter 612, which interpolates to undo the decimation performed at block 604. The output of the inverse STFT block 610 in the high frequency path is fed to a delay compensation block 614. The outputs of the FIR interpolation filter 612 and the delay compensation block 614 are then combined using an adder 616, whose output is the center (C) channel 106.

Referring more specifically to the algorithm implemented by the center extraction block 608 itself, the following values may be calculated:

P = [|VL|² + |VR|²] / 2 (1)

where P is the average signal energy, VL is the short-term complex signal spectrum vector of the (L) input channel 102 signal, and VR is the short-term complex signal spectrum vector of the (R) input channel 102 signal;

VX = |VL · VR*| (2)

where VX represents the absolute value of the cross-spectral density, with · denoting element-wise multiplication and * complex conjugation; and

pc = VX / P (3)

where pc is calculated as the quotient of the absolute value of the cross-spectral density VX and the average signal energy P. This quotient may be referred to as a "time/frequency mask".

Using these values, the time average p̄c of pc is recursively estimated with an update coefficient α (typically α = 0.2/rS). The time index i represents the actual block number (i.e., i is incremented by 1 for each hop of 48 samples). The operation can be expressed as follows:

p̄c(i) = (1 − α) · p̄c(i − 1) + α · pc(i) (4)

Then, the center signal is extracted using a nonlinear mapping function F. The desired center spectrum is obtained by multiplying the sum of the inputs (treated as a single signal) by a nonlinear function of the mask. This function can be optimized to obtain the best compromise between channel separation and low distortion. The operation can be expressed as follows:

C = F(p̄c) · (VL + VR) / 2 (5)
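The mask computation and center extraction above can be sketched per STFT block in NumPy. The default smoothing constant and the placeholder mapping F(p) = p² are assumptions; the disclosure only states that F is a tunable nonlinear function:

```python
import numpy as np

def center_mask(VL, VR, pc_prev, alpha=0.2 / 16):
    """One block of the time/frequency mask.
    VL, VR: complex spectra of (L) and (R); pc_prev: previous averaged mask."""
    P = (np.abs(VL) ** 2 + np.abs(VR) ** 2) / 2    # average signal energy
    VX = np.abs(VL * np.conj(VR))                  # |cross-spectral density|
    pc = VX / np.maximum(P, 1e-12)                 # the "time/frequency mask"
    return (1 - alpha) * pc_prev + alpha * pc      # recursive time average

def extract_center(VL, VR, pc_avg, F=lambda p: p ** 2):
    """Mono sum of the inputs weighted by a nonlinear function of the mask."""
    return F(pc_avg) * (VL + VR) / 2

# Identical L and R content: mask -> 1, so the center passes the common signal.
V = np.fft.rfft(np.random.default_rng(2).standard_normal(128))
mask = center_mask(V, V, np.zeros(V.shape), alpha=1.0)  # alpha=1: no history
C = extract_center(V, V, mask)
```

For fully correlated inputs the mask is 1 at every bin and the common signal passes through unchanged; uncorrelated content drives the mask, and hence the extracted center, toward zero.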

FIG. 7 shows an example 700 of a beamforming design for the loudspeaker 100. As shown, six tweeters T1…T6 are evenly arranged around a circle, supplemented by one woofer W providing low frequency extension, without beamforming below the crossover frequency fC (typically 200…400 Hz; in this example, fC = 340 Hz).

FIG. 8 shows a system block diagram 800 of the beamformer 108 of the example loudspeaker 100 shown in FIG. 7. The diagram 800 includes the beamforming filters (h1, h26, h35, h4) and rotation matrix for the mid/high frequency drivers, as well as the signal path of the low frequency driver. As shown, tweeter T1 is connected to the beamforming FIR (finite impulse response) filter h1, tweeters T2 and T6 are connected to filter h26, tweeters T3 and T5 are connected to filter h35, and tweeter T4 is connected to filter h4. Notably, these tweeter pairs may share the same filter due to the symmetry of the beam with respect to the principal axis.

The beam may be rotated to any desired angle φ by reassigning the tweeters. For example, a rotation of 60° may be achieved by connecting filter h1 to tweeter T2 and filter h26 to the tweeter pair T1 and T3, and so on. Any angle in between may be achieved by linear interpolation of the corresponding tweeter signals. The rotation is implemented as a 4 × 6 gain matrix, since there are 4 beamforming filters and 6 tweeters in this example; a different number of filters and tweeters would change the dimensions of the rotation matrix. Interpolation rules other than linear, such as cosine or cosine-squared, may additionally or alternatively be used.
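The reassignment-plus-interpolation rotation can be sketched as a small gain-matrix builder. The filter-to-tweeter base map follows the h1/h26/h35/h4 pairing described above; the function itself is an illustrative construction, not the disclosed implementation:

```python
import numpy as np

# Filter index feeding each of the 6 tweeters in the unrotated position:
# T1 <- h1 (0), T2/T6 <- h26 (1), T3/T5 <- h35 (2), T4 <- h4 (3).
BASE = [0, 1, 2, 3, 2, 1]

def rotation_matrix(phi_deg):
    """4 x 6 gain matrix mapping 4 beamforming filter outputs to 6 tweeters,
    rotated by phi_deg with linear interpolation between 60-degree steps."""
    G = np.zeros((4, 6))
    step = phi_deg / 60.0
    k = int(np.floor(step))          # whole 60-degree rotations
    frac = step - k                  # fractional part -> interpolation weight
    for t in range(6):
        G[BASE[(t - k) % 6], t] += 1.0 - frac
        G[BASE[(t - k - 1) % 6], t] += frac
    return G

G0 = rotation_matrix(0)    # identity assignment: h1 -> T1, h26 -> T2/T6, ...
G60 = rotation_matrix(60)  # whole-step rotation: h1 -> T2, h26 -> T1/T3, ...
G30 = rotation_matrix(30)  # halfway: each tweeter blends two filters at 0.5
```

Cosine or cosine-squared weighting would replace the linear `frac`/`1 - frac` pair while leaving the matrix structure unchanged.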

FIG. 9 shows an example rotation 900 of a sound field using the smart loudspeaker 100. In a multi-channel application, for example using the channels (L−C), (C), and (R−C) as shown in FIG. 9, each channel is connected to its own set of beamforming filters and rotation matrix. Compared with FIG. 2, the entire sound field in FIG. 9 is rotated by the angle φall, with the (L) channel rotated to φL + φall and the (R) channel to φR + φall. To perform the rotation, the (L−C) channel may use a first set of beamforming filters and rotation matrix, the (C) channel a second set, and the (R−C) channel a third set.

Referring back to FIG. 8, the woofer processing path includes a crossover filter hW, an optional recursive (IIR) high pass filter (to cut frequencies below the woofer operating range), and an optional limiter. The crossover filter can be designed as a FIR filter to implement an acoustically linear-phase system. Other aspects of the crossover filter are described in U.S. Patent No. 7,991,170, entitled "Loudspeaker Crossover Filter," which is incorporated herein by reference in its entirety.

FIG. 10 shows an example 1000 of a crossover filter frequency response of the smart loudspeaker 100. In the example diagram 1000, the Y-axis represents decibels, while the X-axis represents frequency. As shown, the low frequency driver crosses over to the high frequency drivers at about 340 Hz. Typically, the crossover filters are designed to equalize the measured loudspeaker responses toward a crossover target.

FIG. 11 shows an example approximation 1100 of a low frequency driver target response. In the example diagram 1100, the Y-axis represents decibels, while the X-axis represents frequency. Notably, the tweeter crossover high pass filter may be folded into the beamforming filters.

The design of the beamforming filters may be based on acoustic data. In one example, impulse responses may be captured in an anechoic chamber, with each array driver rotated on a turntable and measured at discrete angles around the speaker. Other aspects of beamforming filter design are discussed in more detail in International Application No. PCT/US17/49543, entitled "Variable Acoustics Loudspeaker," which is incorporated herein by reference in its entirety.

The acoustic data may be pre-processed by computing complex spectra using Fourier transforms. Complex smoothing is then performed by calculating magnitude and phase, smoothing the magnitude and phase responses separately, and converting the data back to complex spectral values. In addition, the angular responses can be normalized to the spectrum of the front transducer at 0° by multiplying each spectrum by its inverse. This inverse response may later be used for global equalization.
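These preprocessing steps can be sketched as follows; the moving-average smoother and its width are assumptions, since the disclosure does not specify a particular smoothing method:

```python
import numpy as np

def complex_smooth(H, width=5):
    """Complex smoothing of one frequency response: compute magnitude and
    unwrapped phase, smooth each separately (moving average), recombine."""
    kern = np.ones(width) / width
    mag = np.convolve(np.abs(H), kern, mode="same")
    ph = np.convolve(np.unwrap(np.angle(H)), kern, mode="same")
    return mag * np.exp(1j * ph)

def normalize_to_front(H_sm):
    """Normalize every angular response to the front (0 degree) spectrum by
    multiplying with its inverse, i.e., dividing by column 0."""
    return H_sm / H_sm[:, :1]

# Toy data: 64 frequency bins x 13 angles of unit-magnitude complex responses.
rng = np.random.default_rng(4)
H = np.exp(1j * rng.uniform(-0.5, 0.5, size=(64, 13)))
H_sm = np.stack([complex_smooth(H[:, j]) for j in range(H.shape[1])], axis=1)
H_norm = normalize_to_front(H_sm)   # column 0 becomes exactly 1 at all bins
```

The retained inverse of the front response (`1 / H_sm[:, 0]` here) is what would later serve as the global equalization curve.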

FIG. 12 shows example high frequency responses 1200 at different angles around the smart loudspeaker 100. More specifically, example 1200 shows the amplitude responses of the front transducer as seen at angles from 15° to 180° in steps of 15°. In the example diagram 1200, the Y-axis represents decibels, while the X-axis represents frequency.

The measured, smoothed complex frequency responses may be represented in matrix form as follows:

Hsm(i, j), i = 1…N, j = 1…M (6)

where i is the frequency index, N is the FFT length (N = 2048 in the example shown), and M is the number of angle measurements within the span [0…180]° (M = 13 for 15° steps in the example shown).

An array of R drivers (here, R = 6) contains a front driver at 0°, a rear driver at 180°, and P = (R − 2)/2 driver pairs at symmetric angles.

P beamforming filters Cr are designed such that each is connected to a driver pair, with an additional filter CP+1 provided for the rear driver. First, as described above, the measured frequency responses at angles greater than zero are normalized with respect to the front response to eliminate the driver frequency response. This normalization can be accounted for later, when designing the final filters, in the form of driver equalization, as follows:

H0(i) = Hsm(i, 1); (7)

Hnorm(i, j) = Hsm(i, j) / H0(i), i = 1…N, j = 1…M

The filter design iteration works on each frequency point separately. For convenience, the frequency index may be dropped, as follows:

H(αk) := Hnorm(i, k), (8)

since the measured and normalized frequency responses are given at the discrete angles αk.

Assuming radial symmetry, a cylindrical housing, and identical drivers, the frequency response of the array, U(k), at an angle αk may be calculated by summing the responses of all drivers, each taken at its own relative angle and weighted by its beamforming filter:

U(k) = Σr=1…R Cm(r) · H(αk − φr) (9)

where φr is the azimuth of driver r and m(r) maps each driver to its (shared) beamforming filter.

The spectral filter values Cr can be obtained iteratively by minimizing a quadratic error function:

E = Σk=1…Q w(k) · |U(k) − a · T(k)|² (10)

where T(k) is a spatial target function specific to the selected beamwidth, as defined below.

The parameter a defines the array gain:

aGain = 20 log(a)

The array gain specifies how much louder the array plays than a single transducer. The linear gain a should be greater than 1, but not greater than the total number of transducers R. To allow some of the acoustic cancellation required for super-directional beamforming, the array gain will be less than R, but it should still be well above 1. Typically, the array gain is frequency dependent and must be chosen carefully to obtain good approximation results.

In addition, Q is the number of angular target points (e.g., Q = 9). Further, w(k) is a weighting function that can be used if higher precision is required at one approximation point than at another (typically 0.1 < w < 1).

The optimization variables are the P + 1 complex filter values per frequency index i, Cr(i), r = 1...(P + 1). The optimization may start from the first frequency point i1 in the band of interest (e.g., f1 = 100 Hz, fg = 24 kHz, N = 2048 => i1 ≈ 8), with Cr(i1) set to a starting solution, and then compute the filter values by incrementing the index one step at a time until the last point is reached.

The non-linear optimization procedure may use the magnitude |Cr(i)| and the unwrapped phase arg(Cr(i)) = arctan(Im{Cr(i)} / Re{Cr(i)}) as variables, instead of the real and imaginary parts.

This bounded non-linear optimization problem can be solved with standard software, such as the function "fmincon" in the MATLAB Optimization Toolbox. The following bounds may apply:

Gmax = 20 log(max(|Cr|)) (11)

Gmax is the maximum allowed filter gain. Upper and lower limits on the magnitude from one computed frequency point to the next are specified by the input parameter δ, as follows:

|Cr(i)|·(1 - δ) < |Cr(i+1)| < |Cr(i)|·(1 + δ) (12)

which controls the smoothness of the resulting frequency response.
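To make the procedure concrete, the following sketch minimizes the quadratic error function at a single frequency point, using magnitudes and unwrapped phases as optimization variables as described above. The response matrix `H`, the target `t`, and the weights are synthetic placeholders (the real design uses the measured responses of equation (8)), and SciPy's general-purpose minimizer stands in for MATLAB's "fmincon":

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
P = 2                       # driver pairs, so P + 1 filter values per bin
Q = 8                       # number of angular target points
# Spatial target t(k) converted to linear terms (placeholder dB values).
t = 10.0 ** (np.array([-1.5, -3.5, -8, -12, -15, -18, -20, -20]) / 20)
w = np.ones(Q)              # weighting function w(k)
a = 10.0 ** (12 / 20)       # array gain of 12 dB

# Placeholder complex responses of each filter channel toward angle alpha_k.
H = rng.standard_normal((Q, P + 1)) + 1j * rng.standard_normal((Q, P + 1))

def error(x):
    # Optimization variables: magnitudes and unwrapped phases, not Re/Im.
    C = x[:P + 1] * np.exp(1j * x[P + 1:])
    u = H @ C / a                        # array response, scaled by array gain
    return np.sum(w * (np.abs(u) - t) ** 2)

x0 = np.concatenate([np.ones(P + 1), np.zeros(P + 1)])   # starting solution
res = minimize(error, x0, method="Nelder-Mead",
               options={"maxiter": 5000, "fatol": 1e-12})
C_opt = res.x[:P + 1] * np.exp(1j * res.x[P + 1:])
```

In the full design, this solve would be repeated per frequency bin, warm-started from the previous bin's solution and constrained by the magnitude bound of equation (12).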

An example design using an array diameter of 150 mm, with 6 mid/tweeter drivers crossed over at 340 Hz, is discussed below.

For the narrow-beam example, figs. 13-14 show the results using the loudspeaker 100 of fig. 1. The parameters for the narrow-beam example are as follows:

Objective function tk = [-1.5 -3.5 -8 -12 -15 -18 -20 -20]

Positions αk = [15 30 45 60 90 120 150 180]°

Number of drivers R = 6

Number of driver pairs P = 2

Computed beamforming filters C1, C2, C3

Array gain: 12 dB for f < 1 kHz; 4 dB for f > 3.0 kHz; -3 dB for f > 7.5 kHz.

The two middle bands are transition bands in which the array gain decreases linearly from the previous value to the new value.

Maximum filter gain Gmax = 5 dB

Smoothing limit δ = 1.0

Fig. 13 shows the optimization results 1300 for the narrow-beam example. These results include the combined transducer filters, impulse response, amplitude response, and phase of the smart loudspeaker 100. The filters include beamforming, crossover, and driver EQ. As shown, the filters are smooth, do not exhibit excessive time-spreading (pre-ringing), and require very limited low-frequency gain, which is important for achieving sufficient dynamic range.

Fig. 14 shows a contour plot 1400 of the forward beam in the narrow-beam configuration. A high degree of constant directivity is achieved over the entire 100 Hz...20 kHz frequency band, apart from some small artifacts around 4-5 kHz that are hardly audible.

Fig. 15 shows a contour plot 1500 of the loudspeaker 100 of fig. 1 in a medium-wide beam configuration. The parameters for the medium-wide beam example are as follows:

Objective function tk = [0 -1.5 -3 -5 -10 -15 -20 -25]

Positions αk = [15 30 45 60 90 120 150 180]°

Number of drivers R = 6

Number of driver pairs P = 2

Computed beamforming filters C1, C2, C3

Array gain: 12 dB for f < 1 kHz; 0 dB for f > 3.0 kHz; -2 dB for f > 7.5 kHz.

The two middle bands are transition bands in which the array gain decreases linearly from the previous value to the new value.

Maximum filter gain Gmax = 5 dB

Smoothing limit δ = 0.5

Fig. 15 shows the resulting contour plot of the medium-wide beam.

The loudspeaker 100 may further be used in an omni-directional mode. For a single sound source, such as speech, a diffuse, omni-directional radiation pattern that is as uniform and angle-independent as possible is often desired. First, the same method is used as for the wide-beam design:

Objective function tk = [0 0 0 -2 -4 -5 -6 -6]

Positions αk = [15 30 45 60 90 120 150 180]°

Number of drivers R = 6

Number of driver pairs P = 2

Computed beamforming filters C1, C2, C3

Array gain: 8 dB for f < 1 kHz; 3 dB for f > 3.0 kHz; 2 dB for f > 10 kHz.

The two middle bands are transition bands in which the array gain decreases linearly from the previous value to the new value.

Maximum filter gain Gmax = 0 dB

Smoothing limit δ = 0.2

Fig. 16 shows an example contour plot 1600 of the forward beam using the smart loudspeaker 100 in an omni-directional beam configuration. As shown, fig. 16 indicates that the omni-directional target is only partially achieved, since above 4 kHz there is still a pronounced main beam direction, with artifacts due to spatial aliasing.

Fig. 17 shows an example contour plot 1700 using the smart loudspeaker 100 in an omni-directional configuration built from three medium-wide beams. Better results can be obtained by using three of the "medium-wide" beams shown previously, pointing at 0° and +/-120°, respectively, as shown in fig. 17.

Referring to the steerable microphone array 112, the microphone beamformer 120 may be designed in three stages: initial and in-situ calibration, closed-form initial solution, and target optimization.

Regarding microphone calibration, low-cost electret condenser microphones (ECM) and micro-electromechanical systems (MEMS) microphones typically exhibit deviations of around +/-3 dB from the average response. This is confirmed in the example of fig. 18, which shows the measured far-field responses of 6 ECM microphones (e.g., in the arrangement shown in fig. 4) placed on a circle with a diameter of 10 mm. Because low-frequency beamforming relies on microphone difference signals, which are small when the wavelength is large compared to the diameter, very high accuracy is required.

Fig. 18 shows an example 1800 of the frequency responses of the microphones in the microphone array before calibration. Initial calibration is accomplished by convolving the signal of each microphone with a minimum-phase correction filter whose target is one of the microphones. The choice of reference is arbitrary; it may be the (optional) center microphone, or the front microphone. The filter design is performed in the logarithmic frequency domain, and the minimum-phase impulse response is obtained via the Hilbert transform, a method well known to DSP designers. A FIR filter length of 32 is sufficient, because the deviations between the microphones below 1 kHz are mainly caused by frequency-independent gain errors.
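A minimal sketch of such a minimum-phase correction filter design, using the real-cepstrum (Hilbert transform) method on a synthetic magnitude target, might look like the following. The function and variable names are illustrative, not the patent's implementation:

```python
import numpy as np

def minimum_phase_fir(target_mag, mic_mag, n_taps=32):
    """Minimum-phase correction filter whose magnitude is target/mic.

    target_mag, mic_mag: positive magnitude responses sampled on a full
    FFT grid of even length N (bins 0..N-1, conjugate-symmetric layout).
    """
    mag = target_mag / mic_mag                 # desired correction magnitude
    # Real cepstrum of the log magnitude.
    cep = np.fft.ifft(np.log(mag)).real
    N = len(cep)
    # Fold the cepstrum (Hilbert-transform method): keep bins 0 and N/2,
    # double the positive quefrencies, zero the negative ones.
    fold = np.zeros(N)
    fold[0] = cep[0]
    fold[1:N // 2] = 2 * cep[1:N // 2]
    fold[N // 2] = cep[N // 2]
    H_min = np.exp(np.fft.fft(fold))           # minimum-phase spectrum
    h = np.fft.ifft(H_min).real                # minimum-phase impulse response
    return h[:n_taps]

# Example: flat reference, microphone with a frequency-independent gain
# error of +2 dB, so the correction filter applies roughly -2 dB.
N = 512
target = np.ones(N)
mic = np.full(N, 10 ** (2 / 20))
h = minimum_phase_fir(target, mic)
```

For real measured responses, the magnitudes would come from the smoothed far-field measurements of fig. 18, and a 32-tap truncation is adequate for the gain-dominated low-frequency deviations described above.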

Fig. 19 shows an example 1900 of the frequency response of the microphones of the microphone array after calibration.

To accommodate microphone aging and environmental conditions such as temperature and humidity, in-situ calibration is required from time to time. This can be achieved by estimating the responses relative to the reference microphone over time, while playing music or a dedicated test signal, and then equalizing the other microphones toward this target.

For the initial beamforming solution, the circular microphone array 112 has a closed-form solution in free air. A well-known design can be used to obtain an initial solution for the subsequent non-linear optimization. The textbook "Design of Circular Differential Microphone Arrays" by Jacob Benesty (Springer, 2015), incorporated by reference in its entirety, describes the calculation of the microphone beamforming filter vector H = [H1...HM] as follows:

where Γ represents the "pseudo-coherence matrix" of diffuse noise;

i is an identity matrix;

ω is the frequency;

c is the speed of sound;

the distance between microphones i and j is:

where d is the array diameter;

D = [D1...DM] denotes the steering vector, where

ε is a regularization factor; in this example, ε = 1e-5.
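Given the quantities just listed, a common closed-form solution consistent with them is the regularized diffuse-noise (superdirective) beamformer H = (Γ + εI)^-1 D / (D^H (Γ + εI)^-1 D). Since the patent's own equation is reproduced only as an image, the following sketch uses that textbook form as an assumption:

```python
import numpy as np

def superdirective_weights(Gamma, D, eps=1e-5):
    """Regularized diffuse-noise beamformer weights.

    Gamma: M x M pseudo-coherence matrix of diffuse noise.
    D:     length-M steering vector toward the look direction.
    eps:   regularization factor (1e-5 in the example above).
    """
    M = len(D)
    A = Gamma + eps * np.eye(M)
    num = np.linalg.solve(A, D)            # (Gamma + eps*I)^-1 D
    return num / (D.conj() @ num)          # distortionless in look direction

# Toy 6-microphone example: an identity coherence matrix reduces the
# solution to simple delay-and-sum weights D / M.
M = 6
D = np.exp(1j * 2 * np.pi * np.arange(M) / M)   # hypothetical steering vector
H = superdirective_weights(np.eye(M), D)
```

The regularization ε trades off directivity against white noise gain, which is exactly the tension addressed by the non-linear post-optimization described below.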

At angle θ, the delay vector V = [V1...VM] of an ideal circular array of point sensors can be defined as:

By combining the above delays Vm, the beam filters Hm, and the complex-conjugated steering-vector elements Dm, we obtain the complex response Bm of microphone m at angle θ:

Finally, the beam response U (theta) is obtained by performing complex summation on the individual responses:

Fig. 20 shows an example 2000 of the initial filters and angular attenuation of the microphone array. As shown, example 2000 includes the filter frequency responses |Hm| for the front microphone 1, the rear microphone 4, and the side pairs 2/6 and 3/5, respectively, normalized with respect to the front filter. The front filter is shown as an EQ filter whose frequency response is applied to all microphones.

Fig. 21 shows an example 2100 of the phase responses of the initial beamforming filters of the microphone array. Although the magnitude of each individual filter is substantially flat, the EQ filter requires approximately 20 dB of gain over a wide frequency interval to compensate for the losses caused by phase opposition between the microphone filters. This gain is undesirable, because the self-noise of the microphones is amplified by the same amount. For the non-linear optimization, the main design goal is therefore to reduce this noise gain.

Fig. 22 shows an example contour plot 2200 of the microphone array beamformer. Fig. 23 shows an example directivity index 2300 of the microphone array beamformer. The contour plot in fig. 22 and the directivity index in fig. 23 document the quality of the beamformer.

With respect to the non-linear post-optimization, fig. 24 shows the six-microphone layout, with beamforming filters C1, C2 and C3 to be determined. The method is similar to the loudspeaker beamforming design described earlier.

First, the data are pre-processed by complex smoothing in the frequency domain and normalization with respect to the front transducer. In the optimization process, the frequency response of the first transducer, mic1, is therefore set to a constant value of 1. No beamforming filter needs to be applied to mic1, and a global EQ filter applied to all microphones may be used.

The objective function of the design is the attenuation uk at angles θk = [0:15:180]°, which can be taken from the initial solution as uk(f) = |U(f, θk)|, as indicated above. Since this response is frequency dependent, several constant objective functions are used for different frequency intervals. For example, with a transition frequency ftr = 1000 Hz, a first objective function uk(f = 2000 Hz) may be used for the approximation in the interval 100 Hz...1000 Hz, followed by a second objective function uk(f = 4000 Hz) for the remaining interval 1000 Hz...20 kHz. This approach produces a correspondingly narrower beam at higher frequencies.

The starting solution for C1...C3 can be set to the previously obtained beamforming filters Hm, as shown in figs. 20 and 21.

As before, the amplitude difference δ from one frequency index i to the next, i + 1, is bounded:

|Cr(i)|·(1-δ)<|Cr(i+1)|<|Cr(i)|·(1+δ), (17)

In addition, a phase bound δP is applied:

arg(Cr(i))·(1 - δP) < arg(Cr(i+1)) < arg(Cr(i))·(1 + δP). (18)

In summary, the following bounds apply:

Amplitude bound δ = 0.75

Phase bound δP = π/60

Maximum beam filter gain: 12 dB

Maximum EQ filter gain: 20 dB

Fig. 25 illustrates an example frequency response 2500 of the optimized microphone array 112. Fig. 26 illustrates an example phase response 2600 of the optimized beamforming filters for the microphone array 112. Thus, figs. 25 and 26 show the resulting magnitude and phase responses of the beamforming filters after the non-linear post-optimization.

The overall white noise gain can be calculated as:

Fig. 27 shows an example 2700 of the white noise gain. As shown in fig. 27, the goal of reducing the white noise gain (WNG) from the initial 20 dB (as shown in fig. 20) to less than 10 dB has been achieved, while performance is improved.
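The document's own WNG formula is reproduced only as an image, so the following sketch assumes the textbook definition WNG = |H^H D|^2 / (H^H H) for array weights H and steering vector D, expressed in dB:

```python
import numpy as np

def white_noise_gain_db(H, D):
    """Textbook white-noise-gain definition (assumed, not the patent's
    exact expression): WNG = |H^H D|^2 / (H^H H), in dB."""
    num = np.abs(np.vdot(H, D)) ** 2
    den = np.real(np.vdot(H, H))
    return 10 * np.log10(num / den)

# Delay-and-sum with M matched sensors achieves the maximum WNG of
# 10 * log10(M) dB; superdirective designs fall below this value.
M = 6
D = np.ones(M, dtype=complex)
H = D / M
wng = white_noise_gain_db(H, D)
```

Evaluating this quantity per frequency bin for the initial and optimized filters yields curves like those compared in fig. 27.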

Fig. 28 shows an example 2800 of the optimized off-axis responses. Fig. 29 shows an example contour plot 2900 of the beamforming results after optimization. Fig. 30 shows an example directivity index 3000 of the optimized beamforming results for two different filter lengths. As can be seen by comparing figs. 28-30 with figs. 22-23, the performance is improved.

Fig. 31 shows an example process 3100 for operation of the loudspeaker 100. In one example, the process may be performed by the loudspeaker 100 using the concepts discussed in detail above. At operation 3102, the loudspeaker 100 receives the input signal 102. In one example, the input may be a stereo signal that is provided to the loudspeaker 100 to be processed by the digital signal processor.

At operation 3104, the loudspeaker 100 extracts a center channel from the input signal. In one example, the upmixer 104 is configured to generate a center channel (C) from the two-channel stereo source (i.e., the (L) and (R) channels of the audio input 102), resulting in an upmixed signal 106 comprising left-minus-center (L-C), center (C), and right-minus-center (R-C) components. Fig. 6 details other aspects of the operation of the upmixer 104.

At operation 3106, the loudspeaker 100 generates a center-channel beam for output by the loudspeaker 100. In one example, the digital signal processor may use a set of finite impulse response filters to generate a plurality of output channels for beamforming of the extracted center channel, at least as discussed with respect to fig. 8. The loudspeaker 100 may further generate a first beam of audio content at a target angle using a first rotation matrix. In one example, the filter outputs may be routed to the speaker channels at the target angle, at least as discussed with respect to figs. 2 and 9. The loudspeaker 100 may apply the beam of audio content to an array of speaker elements, as shown in fig. 9. In one example, the array of speaker elements is the six drivers of the tweeter array, as shown in fig. 7.

At operation 3108, the loudspeaker 100 generates stereo-channel beams for output by the loudspeaker 100. In one example, at least as discussed with respect to fig. 8, the digital signal processor may use a set of finite impulse response filters to generate a plurality of output channels for beamforming of the (L) channel; the digital signal processor may generate a second plurality of output channels for beamforming of the (R) channel using a second set of finite impulse response filters. The loudspeaker 100 may also generate a left beam of audio content at an angle offset from the target angle using one rotation matrix, and a right beam of audio content at another angle offset from the target angle using another rotation matrix. In one example, the filter outputs may be routed to the speaker channels relative to the target angle, at least as discussed with respect to figs. 2 and 9. The loudspeaker 100 may also apply these beams of audio content to the array of speaker elements, as shown in fig. 9. In one example, the array of speaker elements is the six drivers of the tweeter array, as shown in fig. 7.

At operation 3110, the loudspeaker 100 calibrates the microphone array 112. In one example, the loudspeaker 100 calibrates the microphone array 112 by convolving the electrical signals from each of the microphone elements of the array 112 with a minimum-phase correction filter, using one of the microphone elements as the target. In another example, the loudspeaker 100 performs an in-situ calibration comprising: estimating the frequency response of a reference microphone of the microphone array 112 using the audio playback of the speaker array 110 as a reference signal; and equalizing the microphones of the array 112 according to the measured frequency response.

At operation 3112, the loudspeaker 100 receives microphone signals 114 from the microphone array 112. In one example, the processor of the loudspeaker 100 may be configured to receive the raw microphone signals 114 from the microphone array 112.

At operation 3114, the loudspeaker 100 performs echo cancellation on the received microphone signals 114. In one example, the loudspeaker 100 utilizes a single adaptive echo canceller (AEC) 126 filter pair, applied to the stereo input, for the array of microphone elements. Due to the short distances between the microphone elements of the array 112, and the calibration of the array 112, it is possible to use a single AEC instead of M AECs. Other aspects of the AEC operation are described above with reference to fig. 1. By subtracting the AEC signal 128 from the microphone signal 114, the audio content (such as the L, R, and C beams) played by the loudspeaker 100 is suppressed, and only the desired speech signal remains.
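The echo-cancellation step can be sketched with a single normalized-LMS adaptive filter. This is a simplified mono illustration (one filter, one reference channel) rather than the stereo filter pair of the AEC 126, and all names and parameters are illustrative:

```python
import numpy as np

def nlms_echo_canceller(x, d, n_taps=64, mu=0.5, eps=1e-8):
    """Single NLMS adaptive filter. x is the loudspeaker reference signal,
    d is the microphone signal (echo plus any near-end speech). Returns
    the echo-cancelled error signal e = d - y."""
    w = np.zeros(n_taps)
    e = np.zeros(len(d))
    for n in range(len(d)):
        # Most recent n_taps reference samples, newest first.
        lo = max(0, n - n_taps + 1)
        u = np.zeros(n_taps)
        seg = x[lo:n + 1][::-1]
        u[:len(seg)] = seg
        y = w @ u                             # echo estimate
        e[n] = d[n] - y                       # AEC output (speech remains)
        w += mu * e[n] * u / (u @ u + eps)    # normalized LMS update
    return e

# Toy check: the microphone hears a delayed, attenuated copy of playback.
rng = np.random.default_rng(2)
x = rng.standard_normal(4000)
d = 0.5 * np.concatenate([np.zeros(3), x[:-3]])  # pure echo, 3-sample delay
e = nlms_echo_canceller(x, d)
```

After convergence the residual echo is far below the microphone signal level; with near-end speech added to d, the speech component passes through in e while the playback echo is suppressed.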

At operation 3116, the loudspeaker 100 performs speech recognition on the echo-cancelled microphone signal 114. The loudspeaker 100 is thereby able to respond to voice commands. After operation 3116, the process 3100 ends.

Fig. 32 is a conceptual block diagram of an audio system 3200 configured to implement one or more aspects of various embodiments. As one example, these embodiments may include process 3100. As shown, the audio system 3200 includes a computing device 3201, one or more speakers 3220, and one or more microphones 3230. Computing device 3201 includes a processor 3202, an input/output (I/O) device 3204, and a memory 3210. Memory 3210 includes an audio processing application 3212 that is configured to interact with a database 3214.

the processor 3202 may be any technically feasible form of processing device configured to process data and/or execute program code. The processor 3202 may include, for example, but is not limited to, a system on chip (SoC), a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), and the like. Processor 3202 includes one or more processing cores. In operation, the processor 3202 is the primary processor of the computing device 3201, controlling and coordinating the operation of other system components.

I/O devices 3204 may include input devices, output devices, and devices capable of receiving input and providing output. For example, and without limitation, I/O device 3204 may include a wired and/or wireless communication device that transmits and/or receives data to speaker 3220, microphone 3230, a remote database, other audio devices, other computing devices, and so forth.

The memory 3210 may include one memory module or a group of memory modules. The audio processing application 3212 in the memory 3210 is executed by the processor 3202 to perform the overall functions of the computing device 3201 and thus coordinate the operation of the audio system 3200 as a whole. For example, and without limitation, the audio processing application 3212 may process data obtained via the one or more microphones 3230 to generate sound parameters and/or audio signals that are transmitted to the one or more speakers 3220. The processing performed by the audio processing application 3212 may include, but is not limited to, filtering, statistical analysis, heuristic processing, acoustic processing, and/or other types of data processing and analysis.

The speaker 3220 is configured to generate sound based on one or more audio signals received from the audio system 3200 and/or an audio device (e.g., a power amplifier) associated with the audio system 3200. The microphone 3230 is configured to acquire acoustic data from the surrounding environment and transmit signals associated with the acoustic data to the computing device 3201. The computing device 3201 may then process the acoustic data obtained by the microphone 3230 to determine and/or filter the audio signals reproduced by the speaker 3220. In various embodiments, the microphone 3230 may include any type of transducer capable of acquiring acoustic data, including but not limited to a differential microphone, a piezoelectric microphone, an optical microphone, and the like.

In general, the computing device 3201 is configured to coordinate the overall operation of the audio system 3200. In other embodiments, the computing device 3201 may be coupled to, but separate from, other components of the audio system 3200. In such embodiments, the audio system 3200 may include a separate processor that receives data obtained from the surrounding environment and transmits the data to the computing device 3201, and the computing device 3201 may be included in a separate device such as a personal computer, audio-video receiver, power amplifier, smartphone, portable media player, wearable device, or the like. However, the embodiments disclosed herein contemplate any technically feasible system configured to implement the functionality of the audio system 3200.

The description of the various embodiments has been presented for purposes of illustration, but is not intended to be exhaustive or limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments.

Aspects of the present embodiments may be embodied as a system, method or computer program product. Accordingly, aspects of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a "module" or "system. Furthermore, aspects of the present disclosure may take the form of a computer program product embodied in one or more computer-readable media having computer-readable program code embodied thereon.

Any combination of one or more computer-readable media may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

Aspects of the present disclosure are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, enable the functions/acts specified in the flowchart block or blocks and/or block diagram block or blocks to be implemented. Such a processor may be, but is not limited to, a general purpose processor, a special purpose processor, an application specific processor, or a field programmable processor.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

While exemplary embodiments are described above, these embodiments are not intended to describe all possible forms of the invention. Rather, the words used in the specification are words of description rather than limitation, and it is understood that various changes may be made without departing from the spirit and scope of the invention. In addition, features of various implementing embodiments may be combined to form further embodiments of the invention.
