Sound collection method, device, and medium

Document No.: 1757177 Publication date: 2019-11-29

Reading note: This technology, "Sound collection method, device, and medium," was designed by Long Taochen (龙韬臣) and Hou Haining (侯海宁) on 2019-08-15. Main content: The disclosure relates to a sound collection method in which M time-domain signals collected by M sound collection devices are converted into M original frequency-domain signals; beamforming is performed on the M original frequency-domain signals at each of N preset grid points to obtain N beamformed frequency-domain signals in one-to-one correspondence with the N preset grid points; based on the N beamformed frequency-domain signals, the average amplitude of the N frequency components corresponding to each of K frequency bins is determined, and a synthesized frequency-domain signal is synthesized that includes the K frequency bins and, at each frequency bin, takes the average amplitude as its amplitude and the phase of the original frequency-domain signal of a reference sound collection device as its phase; and the synthesized frequency-domain signal is converted into a synthesized time-domain signal. By applying the sound collection method of the embodiments of the disclosure, noise from interference directions in the original time-domain signals collected by the sound collection array is well suppressed, yielding an enhanced time-domain signal.

1. A sound collection method, characterized by comprising:

Converting M time-domain signals collected by M sound collection devices into M original frequency-domain signals;

Performing beamforming on the M original frequency-domain signals at each of N preset grid points, to obtain N beamformed frequency-domain signals in one-to-one correspondence with the N preset grid points;

Determining, based on the N beamformed frequency-domain signals, the average amplitude of the N frequency components corresponding to each of K frequency bins, and synthesizing a synthesized frequency-domain signal that includes the K frequency bins and takes the average amplitude as its amplitude at each frequency bin, wherein the phase of the synthesized frequency-domain signal at each frequency bin is the corresponding phase in the original frequency-domain signal of a reference sound collection device designated among the M sound collection devices;

Converting the synthesized frequency-domain signal into a synthesized time-domain signal,

Wherein M, N, and K are integers greater than or equal to 2.

2. The sound collection method according to claim 1, characterized in that performing beamforming on the M original frequency-domain signals at each of the N preset grid points, to obtain the N beamformed frequency-domain signals in one-to-one correspondence with the N preset grid points, comprises:

Selecting N preset grid points in different directions within the desired collection range of the M sound collection devices;

At each preset grid point, determining a steering vector associated with each frequency bin based on the positional relationship between the M sound collection devices and that preset grid point;

At each preset grid point, performing beamforming on the M original frequency-domain signals based on the steering vector at each frequency bin, to obtain the beamformed frequency-domain signal corresponding to that preset grid point.

3. The sound collection method according to claim 2, characterized in that determining, at each preset grid point, the steering vector associated with each frequency bin based on the positional relationship between the M sound collection devices and that preset grid point comprises:

Obtaining a distance vector from the preset grid point to the M sound collection devices;

Determining a reference delay vector from the preset grid point to the M sound collection devices, based on the distance vector from the preset grid point to the M sound collection devices and the distance from the preset grid point to the reference sound collection device;

Determining the steering vector of the preset grid point at each frequency bin based on the reference delay vector.

4. The sound collection method according to claim 2, characterized in that performing beamforming on the M original frequency-domain signals at each preset grid point based on the steering vector at each frequency bin, to obtain the beamformed frequency-domain signal corresponding to that preset grid point, comprises:

Determining a beamforming weight coefficient corresponding to each frequency bin, based on the steering vector at each frequency bin and the noise covariance matrix of each frequency bin;

Determining the beamformed frequency-domain signal corresponding to each preset grid point, based on the beamforming weight coefficients and the M original frequency-domain signals.

5. The sound collection method according to claim 1, characterized in that the N preset grid points are evenly arranged on a circle in the horizontal plane of the array coordinate system formed by the M sound collection devices.

6. A sound collection device, characterized by comprising:

A signal conversion module, configured to convert M time-domain signals collected by M sound collection devices into M original frequency-domain signals;

A signal processing module, configured to perform beamforming on the M original frequency-domain signals at each of N preset grid points, to obtain N beamformed frequency-domain signals in one-to-one correspondence with the N preset grid points;

A signal synthesis module, configured to determine, based on the N beamformed frequency-domain signals, the average amplitude of the N frequency components corresponding to each of K frequency bins, and to synthesize a synthesized frequency-domain signal that includes the K frequency bins and takes the average amplitude as its amplitude at each frequency bin, wherein the phase of the synthesized frequency-domain signal at each frequency bin is the corresponding phase in the original frequency-domain signal of a reference sound collection device designated among the M sound collection devices;

A signal output module, configured to convert the synthesized frequency-domain signal into a synthesized time-domain signal;

Wherein M, N, and K are integers greater than or equal to 2.

7. The sound collection device according to claim 6, characterized in that the signal processing module performing beamforming on the M original frequency-domain signals at each of the N preset grid points, to obtain the N beamformed frequency-domain signals in one-to-one correspondence with the N preset grid points, comprises:

Selecting N preset grid points in different directions within the desired collection range of the M sound collection devices;

At each preset grid point, determining a steering vector associated with each frequency bin based on the positional relationship between the M sound collection devices and that preset grid point;

At each preset grid point, performing beamforming on the M original frequency-domain signals based on the steering vector at each frequency bin, to obtain the beamformed frequency-domain signal corresponding to that preset grid point.

8. The sound collection device according to claim 7, characterized in that the signal processing module determining, at each preset grid point, the steering vector associated with each frequency bin based on the positional relationship between the M sound collection devices and that preset grid point comprises:

Obtaining a distance vector from the preset grid point to the M sound collection devices;

Determining a reference delay vector from the preset grid point to the M sound collection devices, based on the distance vector from the preset grid point to the M sound collection devices and the distance from the preset grid point to the reference sound collection device;

Determining the steering vector of the preset grid point at each frequency bin based on the reference delay vector.

9. The sound collection device according to claim 7, characterized in that performing beamforming on the M original frequency-domain signals at each preset grid point based on the steering vector at each frequency bin, to obtain the beamformed frequency-domain signal corresponding to that preset grid point, comprises:

Determining a beamforming weight coefficient corresponding to each frequency bin, based on the steering vector at each frequency bin and the noise covariance matrix of each frequency bin;

Determining the beamformed frequency-domain signal corresponding to each preset grid point, based on the beamforming weight coefficients and the M original frequency-domain signals.

10. The sound collection device according to claim 6, characterized in that the N preset grid points are evenly arranged on a circle in the horizontal plane of the array coordinate system formed by the M sound collection devices.

11. A sound collection device, characterized by comprising:

A processor;

A memory for storing instructions executable by the processor;

Wherein the processor is configured to:

Convert M time-domain signals collected by M sound collection devices into M original frequency-domain signals;

Perform beamforming on the M original frequency-domain signals at each of N preset grid points, to obtain N beamformed frequency-domain signals in one-to-one correspondence with the N preset grid points;

Determine, based on the N beamformed frequency-domain signals, the average amplitude of the N frequency components corresponding to each of K frequency bins, and synthesize a synthesized frequency-domain signal that includes the K frequency bins and takes the average amplitude as its amplitude at each frequency bin, wherein the phase of the synthesized frequency-domain signal at each frequency bin is the corresponding phase in the original frequency-domain signal of a reference sound collection device designated among the M sound collection devices;

Convert the synthesized frequency-domain signal into a synthesized time-domain signal, wherein M, N, and K are integers greater than or equal to 2.

12. A non-transitory computer-readable storage medium, wherein when instructions in the storage medium are executed by a processor of a mobile terminal, the mobile terminal is enabled to perform a sound collection method, the method comprising:

Converting M time-domain signals collected by M sound collection devices into M original frequency-domain signals;

Performing beamforming on the M original frequency-domain signals at each of N preset grid points, to obtain N beamformed frequency-domain signals in one-to-one correspondence with the N preset grid points;

Determining, based on the N beamformed frequency-domain signals, the average amplitude of the N frequency components corresponding to each of K frequency bins, and synthesizing a synthesized frequency-domain signal that includes the K frequency bins and takes the average amplitude as its amplitude at each frequency bin, wherein the phase of the synthesized frequency-domain signal at each frequency bin is the corresponding phase in the original frequency-domain signal of a reference sound collection device designated among the M sound collection devices; and converting the synthesized frequency-domain signal into a synthesized time-domain signal, wherein M, N, and K are integers greater than or equal to 2.

Technical field

The disclosure relates to the field of sound collection, and in particular to a sound collection method, device, and medium.

Background

In the era of the Internet of Things and AI, intelligent voice, as one of the core technologies of artificial intelligence, can effectively improve human-computer interaction and greatly improve the convenience of smart products. In the related art, smart devices mostly use microphone arrays for sound pickup and apply microphone-array beamforming to improve speech signal processing quality, thereby raising the speech recognition rate in real environments. Microphone-array beamforming currently faces two difficulties: (1) noise is hard to estimate; (2) the direction of the speech is unknown under strong interference. As for speech direction finding, current direction-finding algorithms are fairly accurate in quiet scenes but fail under strong interference, which is determined by the constraints of the direction-finding algorithms themselves. Therefore, the problem of speech direction finding under strong interference is not yet well solved in this field.

Summary

To overcome the problems in the related art, the disclosure provides a sound collection method, device, and medium.

According to a first aspect of the embodiments of the disclosure, a sound collection method is provided, comprising:

Converting M time-domain signals collected by M sound collection devices into M original frequency-domain signals;

Performing beamforming on the M original frequency-domain signals at each of N preset grid points, to obtain N beamformed frequency-domain signals in one-to-one correspondence with the N preset grid points;

Determining, based on the N beamformed frequency-domain signals, the average amplitude of the N frequency components corresponding to each of K frequency bins, and synthesizing a synthesized frequency-domain signal that includes the K frequency bins and takes the average amplitude as its amplitude at each frequency bin, wherein the phase of the synthesized frequency-domain signal at each frequency bin is the corresponding phase in the original frequency-domain signal of a reference sound collection device designated among the M sound collection devices; and converting the synthesized frequency-domain signal into a synthesized time-domain signal, wherein M, N, and K are integers greater than or equal to 2.

Performing beamforming on the M original frequency-domain signals at each of the N preset grid points, to obtain the N beamformed frequency-domain signals in one-to-one correspondence with the N preset grid points, comprises:

Selecting N preset grid points in different directions within the desired collection range of the M sound collection devices;

At each preset grid point, determining a steering vector associated with each frequency bin based on the positional relationship between the M sound collection devices and that preset grid point;

At each preset grid point, performing beamforming on the M original frequency-domain signals based on the steering vector at each frequency bin, to obtain the beamformed frequency-domain signal corresponding to that preset grid point.

Determining, at each preset grid point, the steering vector associated with each frequency bin based on the positional relationship between the M sound collection devices and that preset grid point comprises:

Obtaining a distance vector from the preset grid point to the M sound collection devices;

Determining a reference delay vector from the preset grid point to the M sound collection devices, based on the distance vector from the preset grid point to the M sound collection devices and the distance from the preset grid point to the reference sound collection device;

Determining the steering vector of the preset grid point at each frequency bin based on the reference delay vector.

Performing beamforming on the M original frequency-domain signals at each preset grid point based on the steering vector at each frequency bin, to obtain the beamformed frequency-domain signal corresponding to that preset grid point, comprises:

Determining a beamforming weight coefficient corresponding to each frequency bin, based on the steering vector at each frequency bin and the noise covariance matrix of each frequency bin;

Determining the beamformed frequency-domain signal corresponding to each preset grid point, based on the beamforming weight coefficients and the M original frequency-domain signals.

The N preset grid points are evenly arranged on a circle in the horizontal plane of the array coordinate system formed by the M sound collection devices.

According to a second aspect of the embodiments of the disclosure, a sound collection device is provided, comprising a signal conversion module, configured to convert M time-domain signals collected by M sound collection devices into M original frequency-domain signals;

A signal processing module, configured to perform beamforming on the M original frequency-domain signals at each of N preset grid points, to obtain N beamformed frequency-domain signals in one-to-one correspondence with the N preset grid points;

A signal synthesis module, configured to determine, based on the N beamformed frequency-domain signals, the average amplitude of the N frequency components corresponding to each of K frequency bins, and to synthesize a synthesized frequency-domain signal that includes the K frequency bins and takes the average amplitude as its amplitude at each frequency bin, wherein the phase of the synthesized frequency-domain signal at each frequency bin is the corresponding phase in the original frequency-domain signal of a reference sound collection device designated among the M sound collection devices; and a signal output module, configured to convert the synthesized frequency-domain signal into a synthesized time-domain signal;

Wherein M, N, and K are integers greater than or equal to 2.

The signal processing module performing beamforming on the M original frequency-domain signals at each of the N preset grid points, to obtain the N beamformed frequency-domain signals in one-to-one correspondence with the N preset grid points, comprises:

Selecting N preset grid points in different directions within the desired collection range of the M sound collection devices;

At each preset grid point, determining a steering vector associated with each frequency bin based on the positional relationship between the M sound collection devices and that preset grid point;

At each preset grid point, performing beamforming on the M original frequency-domain signals based on the steering vector at each frequency bin, to obtain the beamformed frequency-domain signal corresponding to that preset grid point.

The signal processing module determining, at each preset grid point, the steering vector associated with each frequency bin based on the positional relationship between the M sound collection devices and that preset grid point comprises:

Obtaining a distance vector from the preset grid point to the M sound collection devices;

Determining a reference delay vector from the preset grid point to the M sound collection devices, based on the distance vector from the preset grid point to the M sound collection devices and the distance from the preset grid point to the reference sound collection device;

Determining the steering vector of the preset grid point at each frequency bin based on the reference delay vector.

Performing beamforming on the M original frequency-domain signals at each preset grid point based on the steering vector at each frequency bin, to obtain the beamformed frequency-domain signal corresponding to that preset grid point, comprises:

Determining a beamforming weight coefficient corresponding to each frequency bin, based on the steering vector at each frequency bin and the noise covariance matrix of each frequency bin;

Determining the beamformed frequency-domain signal corresponding to each preset grid point, based on the beamforming weight coefficients and the M original frequency-domain signals.

The N preset grid points are evenly arranged on a circle in the horizontal plane of the array coordinate system formed by the M sound collection devices.

According to a third aspect of the embodiments of the disclosure, a sound collection device is provided, comprising:

A processor;

A memory for storing instructions executable by the processor;

Wherein the processor is configured to:

Convert M time-domain signals collected by M sound collection devices into M original frequency-domain signals;

Perform beamforming on the M original frequency-domain signals at each of N preset grid points, to obtain N beamformed frequency-domain signals in one-to-one correspondence with the N preset grid points;

Determine, based on the N beamformed frequency-domain signals, the average amplitude of the N frequency components corresponding to each of K frequency bins, and synthesize a synthesized frequency-domain signal that includes the K frequency bins and takes the average amplitude as its amplitude at each frequency bin, wherein the phase of the synthesized frequency-domain signal at each frequency bin is the corresponding phase in the original frequency-domain signal of a reference sound collection device designated among the M sound collection devices;

Convert the synthesized frequency-domain signal into a synthesized time-domain signal, wherein M, N, and K are integers greater than or equal to 2.

According to a fourth aspect of the embodiments of the disclosure, a non-transitory computer-readable storage medium is provided, wherein when instructions in the storage medium are executed by a processor of a terminal, the terminal is enabled to perform a sound collection method, the method comprising:

Converting M time-domain signals collected by M sound collection devices into M original frequency-domain signals;

Performing beamforming on the M original frequency-domain signals at each of N preset grid points, to obtain N beamformed frequency-domain signals in one-to-one correspondence with the N preset grid points;

Determining, based on the N beamformed frequency-domain signals, the average amplitude of the N frequency components corresponding to each of K frequency bins, and synthesizing a synthesized frequency-domain signal that includes the K frequency bins and takes the average amplitude as its amplitude at each frequency bin, wherein the phase of the synthesized frequency-domain signal at each frequency bin is the corresponding phase in the original frequency-domain signal of a reference sound collection device designated among the M sound collection devices;

Converting the synthesized frequency-domain signal into a synthesized time-domain signal, wherein M, N, and K are integers greater than or equal to 2.

The technical solutions provided by the embodiments of the disclosure can include the following beneficial effects: by using a multi-direction beamforming strategy and summing the beams over multiple directions, the beam pattern forms a null in the interference direction while the other directions are output normally. This neatly sidesteps the problem that, under strong interference, inaccurate direction finding degrades sound collection or makes it inaccurate.

It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the disclosure.

Brief description of the drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and, together with the specification, serve to explain the principles of the invention.

Fig. 1 is a flowchart of a sound collection method according to an exemplary embodiment.

Fig. 2 is a schematic diagram of establishing preset grid points in a sound collection method according to an exemplary embodiment.

Fig. 3 shows a simulated beam pattern of a microphone array using the sound collection method of the embodiments of the disclosure.

Fig. 4 is a block diagram of a sound collection device according to an exemplary embodiment.

Fig. 5 is a block diagram of a device according to an exemplary embodiment.

Detailed description

Exemplary embodiments are described in detail here, with examples illustrated in the accompanying drawings. When the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the invention. Rather, they are merely examples of devices and methods consistent with some aspects of the invention as detailed in the appended claims.

The sound collection method according to the embodiments of the disclosure is used with a sound collection device array. A sound collection device array is a group of sound collection devices located at different positions in space and arranged in a regular geometric pattern; it spatially samples the sound signal propagating through space, so the collected signals carry spatial position information. Depending on the topology of the sound collection devices, the array may be a one-dimensional array, a two-dimensional planar array, or a three-dimensional array such as a spherical array.

Fig. 1 is a flowchart of a sound collection method according to an exemplary embodiment. As shown in Fig. 1, the sound collection method of the embodiments of the disclosure includes steps S11 to S14.

In step S11, M time-domain signals collected by M sound collection devices are converted into M original frequency-domain signals, where M is an integer greater than or equal to 2. Implementing the method requires two or more sound collection devices that collect sound signals from different directions; the more sound collection devices there are, the better the interference suppression. The M sound collection devices may be arranged as a linear array, a planar array, or in any other arrangement that a person skilled in the art can conceive.

In one example, let $x_m(t)$ denote one windowed frame of the signal from the m-th sound collection device in the array (m = 1, 2, …, M). Applying the Fourier transform to the time-domain signal $x_m(t)$ yields the corresponding original frequency-domain signal $X_m(k)$. For example, the frame length may be set in the range of 10 ms to 30 ms, such as 20 ms. Windowing is applied so that the framed signal is continuous; for example, a Hamming window may be used in audio signal processing.
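The framing, windowing, and transform of step S11 can be sketched as follows. This is a minimal illustration, not the implementation of the disclosure: the function names, the naive O(L²) DFT, and the symmetric Hamming coefficients 0.54 and 0.46 are assumptions chosen so that the example depends only on the Python standard library.

```python
import cmath
import math


def hamming(length):
    """Symmetric Hamming window (assumed form: 0.54 - 0.46*cos(2*pi*i/(L-1)))."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (length - 1))
            for i in range(length)]


def dft(frame):
    """Naive discrete Fourier transform of one frame (illustrative, not fast)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def frame_to_spectrum(samples, sample_rate, frame_ms=20.0, start=0):
    """Cut one frame of frame_ms milliseconds, window it, and return X_m(k)."""
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    frame = samples[start:start + frame_len]
    if len(frame) < frame_len:
        raise ValueError("not enough samples for one full frame")
    window = hamming(frame_len)
    return dft([s * w for s, w in zip(frame, window)])
```

In a real system the same routine would be applied to each of the M channels, typically with an FFT and overlapping frames; those choices are outside what the text specifies.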

In step S12, beamforming is performed on the M original frequency-domain signals at each of N preset grid points, to obtain N beamformed frequency-domain signals in one-to-one correspondence with the N preset grid points, where N is an integer greater than or equal to 2.

Preset grid points are obtained by dividing the estimated sound source positions or directions in the desired collection space into multiple grid points, that is, by gridding the desired collection space centered on the sound collection device array (which includes multiple sound collection devices). Specifically, the gridding may proceed as follows: with the geometric center of the sound collection device array as the grid center, perform circular gridding in two-dimensional space or spherical gridding in three-dimensional space with a certain length from the grid center as the radius. As another example, with the geometric center of the sound collection device array as the grid center, perform square gridding in two-dimensional space with the grid center as the center of the square and a certain length as the side length, or perform cubic gridding in three-dimensional space with the grid center as the center of the cube and a certain length as the side length.

Note that the preset grid points in this embodiment are virtual points used for beamforming, not real sound source points or sound collection points. The larger the number N of preset grid points, the more directions are selected, the more directions in which beamforming can be performed, and the better the final result. The N preset grid points should also be distributed over as many different directions as possible, so that sampling covers multiple directions.

In one example, the N preset grid points are placed in the same plane and distributed over all directions in that plane. Further, for ease of description, the N preset grid points are uniformly distributed over 360 degrees, which simplifies calculation and can also achieve a better result. Note that the arrangement of the N preset grid points in the disclosure is not limited to this.
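As an illustration of the uniform 360-degree arrangement, the sketch below places N virtual grid points on a circle in the horizontal plane around the array origin. The function name, the default radius, and the choice of starting at angle 0 are assumptions for the example only.

```python
import math


def circular_grid(n_points, radius=1.0):
    """Return n_points (x, y, z) tuples evenly spaced on a horizontal circle.

    The circle is centered on the array origin; radius is in the same unit
    as the sound collection device coordinates (a hypothetical default here).
    """
    if n_points < 2:
        raise ValueError("N must be an integer greater than or equal to 2")
    points = []
    for n in range(n_points):
        angle = 2.0 * math.pi * n / n_points  # uniform spacing over 360 degrees
        points.append((radius * math.cos(angle), radius * math.sin(angle), 0.0))
    return points
```

For instance, `circular_grid(6)` yields one grid point every 60 degrees; a square or spherical gridding would replace only the coordinate generation.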

In step S13, based on the N beamformed frequency-domain signals, the average amplitude of the N frequency components corresponding to each of K frequency bins is determined, and a synthesized frequency-domain signal is synthesized that includes the K frequency bins and takes the average amplitude as its amplitude at each frequency bin. The phase of the synthesized frequency-domain signal at each frequency bin is the corresponding phase in the original frequency-domain signal of a reference sound collection device designated among the M sound collection devices. Here, the reference sound collection device is related to the beamforming process in step S12: it is the sound collection device used to determine the reference delay during beamforming. The beamforming process is described in detail below. In addition, the K frequency bins are related to the original frequency-domain signals of step S11; for example, after a sound signal is transformed from the time domain to the frequency domain by the Fourier transform, the frequency bins it contains can be determined from the frequency-domain signal.
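The synthesis in step S13 can be sketched as follows: at each frequency bin, the magnitudes of the N beamformed components are averaged, and the result is combined with the phase of the reference device's original spectrum. The function name and the list-of-lists data layout are assumptions made for this illustration.

```python
import cmath


def synthesize_spectrum(beamformed, reference):
    """Combine N beamformed spectra into one synthesized spectrum.

    beamformed: list of N spectra, each a list of K complex values.
    reference:  original spectrum of the reference device, K complex values.
    Returns K complex values whose magnitude is the average magnitude over
    the N beams and whose phase is the reference phase at that bin.
    """
    n_beams = len(beamformed)
    synthesized = []
    for k, ref_value in enumerate(reference):
        avg_amplitude = sum(abs(spectrum[k]) for spectrum in beamformed) / n_beams
        ref_phase = cmath.phase(ref_value)
        synthesized.append(cmath.rect(avg_amplitude, ref_phase))
    return synthesized
```

An inverse Fourier transform of the returned spectrum would then give the synthesized time-domain signal of step S14.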

In step S14, the synthesized frequency-domain signal is converted into a synthesized time-domain signal. The synthesized time-domain signal serves as the interference-suppressed, enhanced sound signal for subsequent processing by the sound collection device, thereby achieving noise suppression.

Step S12 of the sound collection method is described in detail below. In one embodiment, step S12 may include steps S121 to S123.

In step S121, N preset grid points in different directions are selected within the desired collection range of the M sound collection devices.

The N preset grid points should be distributed over as many different directions as possible, so that sampling covers multiple directions. For ease of implementation, the N preset grid points may be chosen in the same plane and distributed over all directions in that plane. To implement the disclosed method even more simply, the N preset grid points may be uniformly distributed over 360 degrees.

In step S122, at each preset grid point, a steering vector associated with each frequency bin is determined based on the positional relationship between the M sound collection devices and that preset grid point.

For example, step S122 may be implemented as follows: taking the origin of the array coordinate system of the M sound collection devices as the center, determine the coordinates of the M sound collection devices and the coordinates of the N preset grid points; then, based on the coordinates of the M sound collection devices, establish a steering vector at each frequency bin for each preset grid point, which yields the steering vectors of the N preset grid points at every frequency bin.

In one embodiment, step S122 may include:

Step S1221: obtain the distance vector from each preset grid point to the M sound collection devices.

Step S1222: based on the distance vector from the preset grid point to the M sound collection devices and the distance from the preset grid point to the reference sound collection device, determine the reference time delay vector from the preset grid point to the M sound collection devices.

Step S1223: based on the reference time delay vector, determine the steering vector of the preset grid point at each frequency bin.

In one example, consider one preset grid point, say the n-th (n = 1, 2, ..., N), and denote its coordinate vector by $S_n$. Since there are M sound collection devices, there are M device coordinate vectors, $P_1, P_2, \dots, P_M$. The coordinates of all sound collection devices are collected in the coordinate matrix P, whose i-th row is $P_i$: $P = [P_1; P_2; \dots; P_M]$.

First, compute the distance from the preset grid point to the reference sound collection device. As an example, the first of the M sound collection devices is taken as the reference sound collection device. In fact, any of the M sound collection devices may be designated as the reference, provided that the reference stays the same throughout the execution of the sound collection method. In this example, the distance from the preset grid point to the reference sound collection device is the Euclidean distance $d_1 = \lVert S_n - P_1 \rVert_2$. The distance vector from the preset grid point to the M sound collection devices can then be computed as dist = P - S_n, where P is the coordinate matrix of all sound collection devices defined above. Note that $d_1$ is one of the per-device distances obtained from dist, so $d_1$ and dist may be computed in either order.

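Based on the distance vector dist from preset grid point $S_n$ to the M sound collection devices, compute the time delay vector tau from $S_n$ to the M sound collection devices: tau = sqrt(sum(dist.^2, 2)), that is, square the entries of dist, sum along each row, and take the square root. Each entry of tau is therefore the Euclidean path length from $S_n$ to one sound collection device; it is converted to time by the division by the speed of sound in the reference time delay formula.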

Subtracting the entry for the reference sound collection device from the vector tau of the preset grid point, and dividing by the speed of sound, gives the reference time delay vector taut: taut = (tau - tau_1)/c, where tau is the vector from the preset grid point to the M sound collection devices computed above, tau_1 is its entry for the designated reference sound collection device, and c is the speed of sound. Because tau as computed above holds Euclidean distances, tau_1 equals $d_1$, and the division by c converts the path-length difference into a time delay.
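Steps S1221 and S1222 can be sketched in a few lines of standard-library Python. This is an illustrative sketch only: the function name `reference_delays`, the default speed of sound of 343 m/s, and the two-microphone geometry are assumptions, not values from the source.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature

def reference_delays(grid_point, mic_positions, ref_index=0, c=SPEED_OF_SOUND):
    """Delay from grid_point to each microphone relative to the reference one, in seconds."""
    # dist = P - S_n: one row of coordinate differences per microphone
    dist = [[p - s for p, s in zip(mic, grid_point)] for mic in mic_positions]
    # tau = sqrt(sum(dist.^2, 2)): Euclidean path length for each row
    tau = [math.sqrt(sum(d * d for d in row)) for row in dist]
    tau_ref = tau[ref_index]
    # taut = (tau - tau_1) / c: path-length difference converted to time
    return [(t - tau_ref) / c for t in tau]

# Hypothetical geometry: two microphones 10 cm apart, grid point 1 m away on the x axis
mics = [(0.05, 0.0, 0.0), (-0.05, 0.0, 0.0)]
taut = reference_delays((1.0, 0.0, 0.0), mics)
```

The entry for the reference microphone is always zero, and a microphone farther from the grid point than the reference gets a positive delay.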

Substituting the reference time delay vector taut into the steering vector formula $a_s(k) = e^{-j \cdot 2\pi k \cdot \Delta f \cdot \mathrm{taut}}$ yields the steering vector of the preset grid point at each of the K frequency bins, where e is the base of the natural logarithm, j is the imaginary unit, k is the index of a frequency bin obtained by the Fourier transform (ranging from 0 to Nfft-1), $\Delta f = f_s / \mathrm{Nfft}$, $f_s$ is the sampling rate, and Nfft is the number of points of the Fourier transform. The steering vectors of the other preset grid points at each frequency bin are obtained in the same way and are not enumerated here.
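As a small worked sketch of the steering vector formula, the following evaluates $a_s(k)$ for one frequency bin. The function name `steering_vector` and the numeric values (16 kHz sampling rate, 512-point transform, a 0.25 ms relative delay) are assumptions chosen so that the result is easy to check by hand.

```python
import cmath
import math

def steering_vector(taut, k, fs, nfft):
    """a_s(k) = exp(-j * 2*pi * k * delta_f * taut), one entry per microphone."""
    delta_f = fs / nfft  # frequency spacing between adjacent bins, in Hz
    return [cmath.exp(-2j * math.pi * k * delta_f * t) for t in taut]

# Bin 16 of a 512-point transform at 16 kHz sits at 500 Hz; a delay of
# 0.25 ms at 500 Hz corresponds to a phase shift of -pi/4.
a = steering_vector([0.0, 2.5e-4], k=16, fs=16000, nfft=512)
```

Every entry has unit magnitude; only the phase changes with the delay and the bin index, and the entry for the reference microphone is exactly 1.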

Next, in step S123, at each preset grid point, the M original frequency-domain signals are beamformed using the steering vectors at each frequency bin, which yields the beamformed frequency-domain signal corresponding to that preset grid point.

In one example, step S123 may include steps S1231 and S1232.

In step S1231, the beamforming weight coefficient for each frequency bin is determined from the steering vector and the noise covariance matrix at that bin:

$W_{mvdr}(k) = \dfrac{R_n^{-1}(k)\, a_s(k)}{a_s^H(k)\, R_n^{-1}(k)\, a_s(k)}$

where $a_s(k)$ is the steering vector of the preset grid point at frequency bin k, $R_n(k)$ is the noise covariance matrix at that bin (it may be a noise covariance matrix estimated by any algorithm), $R_n^{-1}(k)$ is the inverse of $R_n(k)$, and $a_s^H(k)$ is the conjugate transpose of the steering vector.
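The weight computation of step S1231 can be sketched without external libraries by solving $R_n x = a_s$ instead of forming the inverse explicitly. The helper names `solve` and `mvdr_weights` and the 2x2 example covariance are hypothetical; a practical implementation would more likely use a linear algebra library.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs (complex entries) by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        # Move the row with the largest pivot magnitude into place
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[row][j] -= factor * aug[col][j]
    # Back substitution
    x = [0j] * n
    for row in range(n - 1, -1, -1):
        acc = aug[row][n] - sum(aug[row][j] * x[j] for j in range(row + 1, n))
        x[row] = acc / aug[row][row]
    return x

def mvdr_weights(a, noise_cov):
    """W = R^{-1} a / (a^H R^{-1} a) for a single frequency bin."""
    r_inv_a = solve(noise_cov, a)
    denom = sum(ai.conjugate() * ri for ai, ri in zip(a, r_inv_a))
    return [ri / denom for ri in r_inv_a]

# Hypothetical two-microphone bin: steering vector and a Hermitian, positive definite covariance
a = [1 + 0j, 1j]
R = [[2 + 0j, 0.5j], [-0.5j, 1 + 0j]]
w = mvdr_weights(a, R)
gain = sum(wi.conjugate() * ai for wi, ai in zip(w, a))  # response toward the grid point
w_identity = mvdr_weights(a, [[1, 0], [0, 1]])
```

The response `gain` toward the grid point equals 1, which is the distortionless property of these weights. With an identity noise covariance the weights reduce to the steering vector divided by the number of microphones.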

In step S1232, the beamformed frequency-domain signal at each frequency bin is determined for each preset grid point from the beamforming weight coefficient at that bin and the M original frequency-domain signals. Specifically, for one preset grid point, the beamformed frequency component at a frequency bin is determined from the beamforming weight coefficient at that bin and the M frequency components of the M original frequency-domain signals at that bin; the K beamformed frequency components together form the beamformed frequency-domain signal of the preset grid point:

$Y_n(k) = W_{mvdr}^H(k)\, X(k)$

where $X(k) = [X_1(k), X_2(k), \dots, X_M(k)]^T$ collects the M original frequency components at bin k, and $W_{mvdr}^H(k)$ is the conjugate transpose of $W_{mvdr}(k)$.

Each preset grid point yields one beamformed frequency-domain signal; with N preset grid points, N beamformed frequency-domain signals are obtained, denoted $Y_1, Y_2, \dots, Y_N$.
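Applying the weights, as in step S1232, amounts to one inner product per frequency bin. The sketch below uses hypothetical names (`beamform_bin`, `beamform_signal`) and, for easy checking, the weights that result from an identity noise covariance, namely the steering vector divided by the number of microphones.

```python
import cmath
import math

def beamform_bin(weights, x_bin):
    """Y(k) = W^H(k) X(k): conjugate the weights, multiply, and sum over microphones."""
    return sum(w.conjugate() * x for w, x in zip(weights, x_bin))

def beamform_signal(weights_per_bin, x_per_bin):
    """Apply beamform_bin at every frequency bin; both inputs are lists indexed by bin."""
    return [beamform_bin(w, x) for w, x in zip(weights_per_bin, x_per_bin)]

# A source component s arriving from the grid point appears at the microphones as s * a
a = [1 + 0j, cmath.exp(-1j * math.pi / 4)]
w = [ai / len(a) for ai in a]  # weights for an identity noise covariance (assumption)
s = 2 + 1j
y = beamform_bin(w, [s * ai for ai in a])
```

Running `beamform_signal` once per preset grid point, each with its own weights, would produce the N beamformed signals $Y_1, \dots, Y_N$.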

In one embodiment, in step S13, based on the N beamformed frequency-domain signals, the average amplitude of the N frequency components corresponding to each of the K frequency bins is determined, and a synthesized frequency-domain signal is formed that contains the K frequency bins and takes the average amplitude as its amplitude at each bin. The phase of the synthesized frequency-domain signal at each bin is the corresponding phase of the original frequency-domain signal of the designated reference sound collection device among the M sound collection devices.

In one example, for the N beamformed frequency-domain signals $Y_1, Y_2, \dots, Y_N$ obtained above, denote the amplitudes of their frequency components at the k-th frequency bin by $R_1(k), R_2(k), \dots, R_N(k)$. The average amplitude of all N beamformed frequency-domain signals at the k-th bin is $R(k) = (R_1(k) + R_2(k) + \dots + R_N(k)) / N$. Next, obtain the phase of the frequency-domain signal collected by the reference sound collection device: that signal is denoted $X_1(k)$, and its phase is $\mathrm{phase}(X_1(k))$. Finally, synthesize the frequency-domain signal that contains the K frequency bins and, at each bin, takes the average amplitude at that bin as its amplitude and the phase of the reference device's original frequency-domain signal at that bin as its phase: $Y_{sum}(k) = R(k)\, e^{j \cdot \mathrm{phase}(X_1(k))}$.
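The averaging and phase substitution of step S13 can be illustrated for a single frequency bin as follows. The function name `synthesize_bin` and the sample values are assumptions for the example.

```python
import cmath

def synthesize_bin(beam_outputs, x_ref):
    """Average the magnitudes of the N beam outputs and attach the reference phase."""
    avg_mag = sum(abs(y) for y in beam_outputs) / len(beam_outputs)
    # R(k) * exp(j * phase(X_1(k)))
    return cmath.rect(avg_mag, cmath.phase(x_ref))

# Three beam outputs with magnitudes 5, 1 and 6; reference bin with phase pi/2
y_sum = synthesize_bin([3 + 4j, -1 + 0j, 0 - 6j], x_ref=0 + 2j)
```

Here the average magnitude is 4 and the reference phase is pi/2, so `y_sum` is approximately `4j`. The phases of the individual beam outputs play no role in the result.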

Returning to step S14 of the sound collection method, an inverse Fourier transform is applied to the synthesized frequency-domain signal in this step to obtain the synthesized time-domain signal: $y(N) = \mathrm{ISTFT}(Y_{sum}(k))$. This synthesized time-domain signal is the enhanced sound signal with the interference removed. By applying the sound collection method of the embodiment of the present disclosure, the noise from the interference direction in the original time-domain signals collected by the microphone array is well suppressed, and an enhanced time-domain signal is obtained.
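The source writes the inverse transform as ISTFT without further detail. The sketch below covers only the inverse DFT of a single frame in naive form; windowing and overlap-add across frames, which a full ISTFT needs, are deliberately omitted, and the function names `dft` and `idft_real` are hypothetical.

```python
import cmath
import math

def dft(frame):
    """Forward DFT of one frame (naive O(n^2) form), used here only for the round trip."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft_real(spectrum):
    """Inverse DFT of one frame; keeps the real part, assuming a conjugate-symmetric spectrum."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]

frame = [0.0, 1.0, 0.0, -1.0, 0.5, 0.25, -0.5, 0.0]
restored = idft_real(dft(frame))
```

Taking the real part is only valid when the spectrum is conjugate-symmetric, so a synthesized spectrum built for the non-negative frequency bins would need its mirrored half filled in before this step.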

In one embodiment, in step S121, the N preset grid points are arranged uniformly on a circle in the horizontal plane of the array coordinate system formed by the M sound collection devices. Illustratively, the radius of the circle may be roughly 1 m to 5 m. This arrangement simplifies the computation and also tends to give better results.

For a better understanding of the technical solution of the present invention, an example is given below:

As shown in Fig. 2, take a smart speaker with 6 microphones as an example. Taking the origin of the array coordinate system of the 6 microphones as the center, a circle of radius r is chosen in the horizontal plane of the array formed by the 6 microphones; r may be 1 m to 1.5 m, which is the typical distance at which a person interacts with a smart speaker. Six points are chosen on the circle at equal 60° intervals over the range 0° to 360°, for example the points at 1°, 61°, 121°, 181°, 241° and 301°, as the preset grid points. The sound collection device located in the 90° direction is designated as the reference sound collection device and remains the reference throughout the subsequent computation; another sound collection device could of course be designated as the reference instead.

Again taking the origin of the array coordinate system as the center, obtain the coordinates of the 6 microphones, $P_1, P_2, \dots, P_6$, and collect them row by row in the coordinate matrix P of all sound collection devices.

Likewise obtain the coordinates of the 6 preset grid points, $S_1, S_2, \dots, S_6$.

Take the preset grid point at 61° as an example. It is the second preset grid point, and its coordinate vector is $S_2$.

First, compute the distance from this preset grid point to the reference sound collection device (illustratively, the first sound collection device is used here): $d_1 = \lVert S_2 - P_1 \rVert_2$. The distance vector from preset grid point $S_2$ to the M sound collection devices can then be computed as dist = P - S_2.

Based on the distance vector from preset grid point $S_2$ to the M sound collection devices, compute the time delay vector tau from $S_2$ to the M sound collection devices: tau = sqrt(sum(dist.^2, 2)), that is, square the entries of dist, sum along each row, and take the square root.

Subtracting the entry for the reference sound collection device from the vector tau of preset grid point $S_2$, and dividing by the speed of sound, gives the reference time delay vector taut: taut = (tau - tau_1)/c, where tau is the vector from $S_2$ to the M sound collection devices (the array formed by the microphones), tau_1 is its entry for the designated reference sound collection device, and c is the speed of sound.

Substituting the reference time delay vector taut into the steering vector formula $a_s(k) = e^{-j \cdot 2\pi k \cdot \Delta f \cdot \mathrm{taut}}$ yields the steering vector of preset grid point $S_2$ at each of the K frequency bins, where e is the base of the natural logarithm, j is the imaginary unit, k is the index of a frequency bin obtained by the Fourier transform (ranging from 0 to Nfft-1), $\Delta f = f_s / \mathrm{Nfft}$, $f_s$ is the sampling rate, and Nfft is the number of points of the Fourier transform.

The steering vectors of the other preset grid points at each frequency bin are obtained in the same way.

The 6 time-domain signals collected by the 6 sound collection devices are converted into 6 original frequency-domain signals: $X_1(k), X_2(k), \dots, X_6(k)$.

At each of the 6 preset grid points, beamforming is performed on the 6 original frequency-domain signals.

Continuing with the second preset grid point $S_2$, compute its beamforming weight coefficient: $W_{mvdr}(k) = \dfrac{R_n^{-1}(k)\, a_s(k)}{a_s^H(k)\, R_n^{-1}(k)\, a_s(k)}$, where $a_s(k)$ is the steering vector of the second preset grid point at each frequency bin, $R_n(k)$ is the noise covariance matrix (it may be a noise covariance matrix estimated by any algorithm), $R_n^{-1}(k)$ is the inverse of $R_n(k)$, and $a_s^H(k)$ is the conjugate transpose of the steering vector.

At the second preset grid point $S_2$, beamforming is performed on the original frequency-domain signals of the 6 sound collection devices, which gives the beamformed frequency-domain signal corresponding to the second preset grid point: $Y_2(k) = W_{mvdr}^H(k)\, X(k)$, where $X(k) = [X_1(k), X_2(k), \dots, X_6(k)]^T$.

Applying the same method to the other preset grid points gives 6 beamformed frequency-domain signals in total: $Y_1, Y_2, \dots, Y_6$.

At any given frequency bin, the 6 beamformed frequency-domain signals contribute 6 frequency components at the frequency of that bin. Taking the k-th bin as an example, the amplitudes of the 6 frequency components at that bin are $R_1(k), R_2(k), \dots, R_6(k)$. The average amplitude of the 6 beamformed frequency-domain signals at the k-th bin is then $R(k) = (R_1(k) + R_2(k) + \dots + R_6(k)) / 6$.

Obtain the phase of the frequency-domain signal collected by the reference sound collection device. That signal is denoted $X_1(k)$, and its phase is $\mathrm{phase}(X_1(k))$.

Synthesize the frequency-domain signal that, at each frequency bin, takes the average amplitude as its amplitude and the phase of the original frequency-domain signal of the reference sound collection device as its phase: $Y_{sum}(k) = R(k)\, e^{j \cdot \mathrm{phase}(X_1(k))}$.

An inverse Fourier transform is applied to the synthesized frequency-domain signal to obtain the synthesized time-domain signal: $y(6) = \mathrm{ISTFT}(Y_{sum}(k))$. The synthesized time-domain signal is used as the output signal.

Fig. 3 shows the simulated beam pattern of a microphone array that uses the sound collection method of the embodiment of the present disclosure.

The abscissa of the beam pattern is the azimuth of the preset grid points described above. In the simulation, an interference source can be placed in any direction. The simulation procedure and the drawing of the beam pattern are known to those skilled in the art and are not detailed here.

By applying the sound collection method of the embodiment of the present disclosure, it can be confirmed that the signal gain in the interference direction is the lowest, that is, the interference signal is suppressed, while sound signals from other directions are essentially unaffected. As shown in Fig. 3, a deep null forms in the interference direction, so the interference is suppressed while the sound signals from other directions are preserved. This embodiment shows that the disclosed method can suppress interference from any direction and thereby achieve the goal of suppressing noise interference.

Fig. 4 is a block diagram of a sound collection device according to an exemplary embodiment. Referring to Fig. 4, the device includes a signal conversion module 401, a signal processing module 402, a signal synthesis module 403, and a signal output module 404.

The signal conversion module 401 is configured to convert the M time-domain signals collected by M sound collection devices into M original frequency-domain signals.

The signal processing module 402 is configured to perform beamforming on the M original frequency-domain signals at each of N preset grid points, so as to obtain N beamformed frequency-domain signals in one-to-one correspondence with the N preset grid points.

The signal synthesis module 403 is configured to determine, based on the N beamformed frequency-domain signals, the average amplitude of the N frequency components corresponding to each of K frequency bins, and to synthesize a frequency-domain signal that contains the K frequency bins and takes the average amplitude as its amplitude at each bin, the phase of the synthesized frequency-domain signal at each bin being the corresponding phase of the original frequency-domain signal of the designated reference sound collection device among the M sound collection devices.

The signal output module 404 is configured to convert the synthesized frequency-domain signal into a synthesized time-domain signal.

Here M, N and K are integers greater than or equal to 2.

The signal processing module performing beamforming on the M original frequency-domain signals at each of the N preset grid points, so as to obtain N beamformed frequency-domain signals in one-to-one correspondence with the N preset grid points, includes:

selecting N preset grid points in different directions within the desired collection range of the M sound collection devices;

at each preset grid point, determining a steering vector associated with each frequency bin based on the positional relationship between the M sound collection devices and the preset grid point;

at each preset grid point, performing beamforming on the M original frequency-domain signals based on the steering vector at each frequency bin, to obtain the beamformed frequency-domain signal corresponding to the preset grid point.

The signal processing module determining, at each preset grid point, a steering vector associated with each frequency bin based on the positional relationship between the M sound collection devices and the preset grid point includes:

obtaining the distance vector from the preset grid point to the M sound collection devices;

determining the reference time delay vector from the preset grid point to the M sound collection devices, based on the distance vector from the preset grid point to the M sound collection devices and the distance from the preset grid point to the reference sound collection device;

determining the steering vector of the preset grid point at each frequency bin based on the reference time delay vector.

Performing beamforming on the M original frequency-domain signals at each preset grid point based on the steering vector at each frequency bin, to obtain the beamformed frequency-domain signal corresponding to the preset grid point, includes:

determining the beamforming weight coefficient corresponding to each frequency bin based on the steering vector and the noise covariance matrix at that bin;

determining the beamformed frequency-domain signal corresponding to each preset grid point based on the beamforming weight coefficients and the M original frequency-domain signals.

The N preset grid points are arranged uniformly on a circle in the horizontal plane of the array coordinate system formed by the M sound collection devices.

For the device in the above embodiment, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method and is not elaborated here.
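The division of the device into four modules can be pictured as a simple pipeline. The following is only a structural sketch: the class name `SoundCollectionPipeline`, its field names, and the trivial stand-in functions are hypothetical, and the reference device is assumed to be the first one.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class SoundCollectionPipeline:
    convert: Callable     # stands in for signal conversion module 401
    process: Callable     # stands in for signal processing module 402
    synthesize: Callable  # stands in for signal synthesis module 403
    output: Callable      # stands in for signal output module 404

    def run(self, time_signals):
        spectra = self.convert(time_signals)         # M original frequency-domain signals
        beams = self.process(spectra)                # N beamformed frequency-domain signals
        merged = self.synthesize(beams, spectra[0])  # reference device assumed to be index 0
        return self.output(merged)                   # synthesized time-domain signal

# Toy stand-ins that only demonstrate the data flow, not real signal processing
pipeline = SoundCollectionPipeline(
    convert=lambda signals: signals,
    process=lambda spectra: spectra,
    synthesize=lambda beams, ref: [sum(abs(v) for v in col) / len(col) for col in zip(*beams)],
    output=lambda merged: merged,
)
result = pipeline.run([[1, -3], [3, 5]])
```

Keeping each stage behind its own callable mirrors the module boundaries of Fig. 4 and lets any one stage be replaced without touching the others.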

Fig. 5 is a block diagram of a sound collection device 500 according to an exemplary embodiment. For example, the device 500 may be a mobile phone, a computer, a digital broadcasting terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like.

Referring to Fig. 5, the device 500 may include one or more of the following components: a processing component 502, a memory 504, a power component 506, a multimedia component 508, an audio component 510, an input/output (I/O) interface 512, a sensor component 514, and a communication component 516.

The processing component 502 generally controls the overall operation of the device 500, such as operations associated with display, telephone calls, data communication, camera operation, and recording. The processing component 502 may include one or more processors 520 to execute instructions so as to perform all or some of the steps of the method described above. In addition, the processing component 502 may include one or more modules that facilitate interaction between the processing component 502 and other components. For example, the processing component 502 may include a multimedia module to facilitate interaction between the multimedia component 508 and the processing component 502.

The memory 504 is configured to store various types of data to support operation of the device 500. Examples of such data include instructions for any application or method operating on the device 500, contact data, phone book data, messages, pictures, videos, and so on. The memory 504 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.

The power component 506 supplies power to the various components of the device 500. The power component 506 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 500.

The multimedia component 508 includes a screen that provides an output interface between the device 500 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, slides, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or slide action but also the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 508 includes a front camera and/or a rear camera. When the device 500 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or may have focusing and optical zoom capability.

The audio component 510 is configured to output and/or input audio signals. For example, the audio component 510 includes a sound collection device (MIC), which is configured to receive external audio signals when the device 500 is in an operating mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signal may be further stored in the memory 504 or sent via the communication component 516. In some embodiments, the audio component 510 also includes a loudspeaker for outputting audio signals.

The I/O interface 512 provides an interface between the processing component 502 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. These buttons may include, but are not limited to, a home button, volume buttons, a start button, and a lock button.

The sensor component 514 includes one or more sensors that provide status assessments of various aspects of the device 500. For example, the sensor component 514 can detect the open/closed state of the device 500 and the relative positioning of components, such as the display and keypad of the device 500. The sensor component 514 can also detect a change in position of the device 500 or of a component of the device 500, the presence or absence of user contact with the device 500, the orientation or acceleration/deceleration of the device 500, and a change in temperature of the device 500. The sensor component 514 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 514 may also include an optical sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 514 may further include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 516 is configured to facilitate wired or wireless communication between the device 500 and other devices. The device 500 can access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component 516 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 516 further includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the device 500 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the method described above.

In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, for example the memory 504 including instructions, where the instructions can be executed by the processor 520 of the device 500 to carry out the method described above. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.

A non-transitory computer-readable storage medium, wherein, when the instructions in the storage medium are executed by a processor of a mobile terminal, the mobile terminal is enabled to perform a sound collection method, the method comprising:

converting the M time-domain signals collected by M sound collection devices into M original frequency-domain signals;

performing beamforming on the M original frequency-domain signals at each of N preset grid points, so as to obtain N beamformed frequency-domain signals in one-to-one correspondence with the N preset grid points;

determining, based on the N beamformed frequency-domain signals, the average amplitude of the N frequency components corresponding to each of K frequency bins, and synthesizing a frequency-domain signal that contains the K frequency bins and takes the average amplitude as its amplitude at each bin, the phase of the synthesized frequency-domain signal at each bin being the corresponding phase of the original frequency-domain signal of the designated reference sound collection device among the M sound collection devices; and converting the synthesized frequency-domain signal into a synthesized time-domain signal, where M, N and K are integers greater than or equal to 2.

Other embodiments of the invention will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed here. This application is intended to cover any variations, uses, or adaptations of the invention that follow its general principles and include common knowledge or customary technical means in the art that are not disclosed in the present disclosure. The description and examples are to be regarded as illustrative only, and the true scope and spirit of the invention are indicated by the following claims.

It should be understood that the present invention is not limited to the precise structures described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present invention is limited only by the appended claims.
