Direct sound detection method, system and computer readable storage medium

Document No.: 1874729; Publication date: 2021-11-23

Reading note: This technology, "Direct sound detection method, system and computer readable storage medium", was designed and created by 白炳潮 (Bai Bingchao), 黄景标 (Huang Jingbiao), 林聚财 (Lin Jucai) and 殷俊 (Yin Jun) on 2021-07-07. Its main content is as follows. The application discloses a direct sound detection method, a system and a computer readable storage medium. The direct sound detection method includes: receiving an array signal acquired by a microphone array, and obtaining an array frequency domain signal from the array signal; screening out, from the array frequency domain signal, first target frequency points whose signal-to-noise ratio is higher than a first threshold value, and obtaining from the array frequency domain signal a first target signal at the positions of the first target frequency points; obtaining a corresponding covariance matrix from the first target signal, and performing eigenvalue decomposition on the covariance matrix to obtain all eigenvalues and the maximum eigenvalue of the covariance matrix; and screening out, from the first target frequency points and according to all the eigenvalues and the maximum eigenvalue, second target frequency points at which the direct signal energy is greater than the reverberation signal energy, so that sound source localization can be performed using the second target frequency points. With this design, frequency points with a higher signal-to-noise ratio and less influence from reverberation can be obtained, which improves the accuracy of subsequent sound source localization.

1. A direct sound detection method, comprising:

receiving an array signal acquired by a microphone array, and acquiring an array frequency domain signal according to the array signal;

screening out, from the array frequency domain signal, a first target frequency point whose signal-to-noise ratio is higher than a first threshold value, and obtaining, from the array frequency domain signal, a first target signal at the position of the first target frequency point;

obtaining a corresponding covariance matrix according to the first target signal, and performing eigenvalue decomposition on the covariance matrix to obtain all eigenvalues and the maximum eigenvalue of the covariance matrix;

and screening out, from the first target frequency points and according to all the eigenvalues and the maximum eigenvalue, a second target frequency point at which the direct signal energy is greater than the reverberation signal energy, so as to perform sound source localization using the second target frequency point.

2. The direct sound detection method according to claim 1, wherein the step of screening out, from the first target frequency points and according to all the eigenvalues and the maximum eigenvalue, a second target frequency point at which the energy of the direct signal is greater than the energy of the reverberant signal, so as to perform sound source localization using the second target frequency point, comprises:

obtaining the ratio of the maximum eigenvalue to the sum of all the eigenvalues, and taking each first target frequency point whose ratio is greater than a second threshold value as a second target frequency point, wherein the ratio lies in the range 0 to 1.

3. The direct sound detection method according to claim 2, wherein, before the step of obtaining the ratio of the maximum eigenvalue to the sum of all the eigenvalues and taking each first target frequency point whose ratio is greater than the second threshold value as a second target frequency point, the method comprises:

obtaining, according to the reverberation time, the exponent of a language function and the value of the language function, wherein the exponent is inversely proportional to the reverberation time;

in response to the value of the language function being less than a first threshold serving as a lower bound, taking the first threshold as the second threshold value;

in response to the value of the language function being greater than a second threshold serving as an upper bound, taking the second threshold as the second threshold value;

in response to the value of the language function being greater than or equal to the first threshold and less than or equal to the second threshold, taking the value of the language function as the second threshold value.

4. The direct sound detection method of claim 1, wherein the step of obtaining a corresponding covariance matrix from the first target signal comprises:

acquiring a corresponding conjugate transpose matrix according to the first target signal;

and obtaining an expected value of the product of the first target signal and the conjugate transpose matrix, and taking the expected value as the corresponding covariance matrix.

5. The direct sound detection method of claim 4, wherein the step of performing eigenvalue decomposition on the covariance matrix to obtain all eigenvalues and the maximum eigenvalue of the covariance matrix comprises:

decomposing the covariance matrix into the eigenvector matrix of the covariance matrix, a diagonal matrix formed by arranging the eigenvalues of the covariance matrix in descending order, and the conjugate transpose of the eigenvector matrix;

and obtaining all eigenvalues and the maximum eigenvalue of the covariance matrix from the diagonal matrix, wherein the eigenvalues are the main diagonal elements of the diagonal matrix.

6. The direct sound detection method of claim 1, wherein the step of receiving an array signal acquired by a microphone array and obtaining an array frequency domain signal according to the array signal comprises:

receiving an array signal acquired by a microphone array;

sequentially performing framing processing and windowing operation on the array signals to obtain a plurality of time domain signals;

converting the plurality of time domain signals into a plurality of frequency domain signals using a fast Fourier transform;

and obtaining, according to the frequency domain signals, the sum of a noise signal and the product of a steering vector matrix and a sound source signal, and taking the sum as the array frequency domain signal.

7. The direct sound detection method according to claim 1, wherein the step of screening out, from the array frequency domain signal, a first target frequency point whose signal-to-noise ratio is higher than a first threshold value, and obtaining, from the array frequency domain signal, a first target signal at the position of the first target frequency point comprises:

acquiring a frequency domain signal of one microphone in the array frequency domain signals, and acquiring the power of each frequency point in the frequency domain signal of the microphone;

screening out a third target frequency point with the power larger than the first threshold value from the frequency domain signals of the microphone;

and screening out a first target frequency point with the same position as the third target frequency point from the array frequency domain signals, and obtaining a first target signal of the position of the first target frequency point from the array frequency domain signals.

8. The direct sound detection method according to claim 7, wherein the step of screening out, from the frequency domain signal of the microphone, the third target frequency point whose power is greater than the first threshold value comprises:

obtaining, according to the power, the minimum power value of the frequency point within a preset time threshold, and taking the minimum power value as the noise power;

and obtaining the product of a first multiple and the noise power, and taking the product as the first threshold value.

9. A direct sound detection system comprising a memory and a processor coupled to each other, the memory having stored therein program instructions for execution by the processor to implement the direct sound detection method of any one of claims 1 to 8.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program for implementing the direct sound detection method of any one of claims 1 to 8.

Technical Field

The present application relates to the field of speech signal processing technologies, and in particular, to a direct sound detection method, system, and computer-readable storage medium.

Background

Acoustic devices are now common in daily life. In complex real-world scenes a single microphone is often insufficient, so microphone arrays are used to process voice signals and achieve higher-quality voice communication. Sound source localization is a very important topic in microphone array processing, but its performance is often degraded by reverberation and noise. In reverberant environments in particular, localization errors can be very large, and conventional sound source localization methods often fail to achieve satisfactory results. When a sound source is localized in the presence of reverberation and noise, obtaining a direct sound signal that is less contaminated by noise and reverberation can significantly improve localization performance.

The method currently adopted directly selects the points of maximum power in the power spectrum as candidate points for estimating the azimuth. However, this method does not use the noise spectrum when selecting frequency points and does not consider the interference of noise and reverberation in real scenes. Therefore, a new direct sound detection method is needed to solve the above problems.

Disclosure of Invention

The main technical problem solved by the application is to provide a direct sound detection method, a direct sound detection system and a computer readable storage medium capable of obtaining frequency points with a high signal-to-noise ratio and little influence from reverberation.

To solve the above technical problem, one technical solution adopted by the application is to provide a direct sound detection method, including: receiving an array signal acquired by a microphone array, and obtaining an array frequency domain signal according to the array signal; screening out, from the array frequency domain signal, a first target frequency point whose signal-to-noise ratio is higher than a first threshold value, and obtaining, from the array frequency domain signal, a first target signal at the position of the first target frequency point; obtaining a corresponding covariance matrix according to the first target signal, and performing eigenvalue decomposition on the covariance matrix to obtain all eigenvalues and the maximum eigenvalue of the covariance matrix; and screening out, from the first target frequency points and according to all the eigenvalues and the maximum eigenvalue, a second target frequency point at which the direct signal energy is greater than the reverberation signal energy, so as to perform sound source localization using the second target frequency point.

The step of screening out, from the first target frequency points and according to all the eigenvalues and the maximum eigenvalue, a second target frequency point at which the direct signal energy is greater than the reverberation signal energy, so as to perform sound source localization using the second target frequency point, includes: obtaining the ratio of the maximum eigenvalue to the sum of all the eigenvalues, and taking each first target frequency point whose ratio is greater than a second threshold value as a second target frequency point, wherein the ratio lies in the range 0 to 1.

Before the step of obtaining the ratio of the maximum eigenvalue to the sum of all the eigenvalues and taking each first target frequency point whose ratio is greater than the second threshold value as a second target frequency point, the method includes: obtaining, according to the reverberation time, the exponent of a language function and the value of the language function, wherein the exponent is inversely proportional to the reverberation time; in response to the value of the language function being less than a first threshold serving as a lower bound, taking the first threshold as the second threshold value; in response to the value of the language function being greater than a second threshold serving as an upper bound, taking the second threshold as the second threshold value; and in response to the value of the language function being greater than or equal to the first threshold and less than or equal to the second threshold, taking the value of the language function as the second threshold value.

Wherein the step of obtaining a corresponding covariance matrix according to the first target signal comprises: acquiring a corresponding conjugate transpose matrix according to the first target signal; and obtaining an expected value of the product of the first target signal and the conjugate transpose matrix, and taking the expected value as the corresponding covariance matrix.

Wherein the step of performing eigenvalue decomposition on the covariance matrix to obtain all eigenvalues and the maximum eigenvalue of the covariance matrix comprises: decomposing the covariance matrix into the eigenvector matrix of the covariance matrix, a diagonal matrix formed by arranging the eigenvalues of the covariance matrix in descending order, and the conjugate transpose of the eigenvector matrix; and obtaining all eigenvalues and the maximum eigenvalue of the covariance matrix from the diagonal matrix, wherein the eigenvalues are the main diagonal elements of the diagonal matrix.

The step of receiving an array signal acquired by a microphone array and obtaining an array frequency domain signal according to the array signal includes: receiving the array signal acquired by the microphone array; sequentially performing framing and windowing on the array signal to obtain a plurality of time domain signals; converting the plurality of time domain signals into a plurality of frequency domain signals using a fast Fourier transform; and obtaining, according to the frequency domain signals, the sum of a noise signal and the product of a steering vector matrix and a sound source signal, and taking the sum as the array frequency domain signal.

The step of screening out a first target frequency point with a signal-to-noise ratio higher than a first threshold value from the array frequency domain signal and obtaining a first target signal of the position of the first target frequency point from the array frequency domain signal includes: acquiring a frequency domain signal of one microphone in the array frequency domain signals, and acquiring the power of each frequency point in the frequency domain signal of the microphone; screening out a third target frequency point with the power larger than the first threshold value from the frequency domain signals of the microphone; and screening out a first target frequency point with the same position as the third target frequency point from the array frequency domain signals, and obtaining a first target signal of the position of the first target frequency point from the array frequency domain signals.

Before the step of screening out, from the frequency domain signal of the microphone, the third target frequency point whose power is greater than the first threshold value, the method includes: obtaining, according to the power, the minimum power value of the frequency point within a preset time threshold, and taking the minimum power value as the noise power; and obtaining the product of a first multiple and the noise power, and taking the product as the first threshold value.

In order to solve the above technical problem, another technical solution adopted by the present application is: there is provided a direct sound detection system comprising a memory and a processor coupled to each other, the memory having stored therein program instructions, the processor being configured to execute the program instructions to implement the direct sound detection method mentioned in any of the embodiments above.

In order to solve the above technical problem, the present application adopts another technical solution: there is provided a computer-readable storage medium storing a computer program for implementing the direct sound detection method mentioned in any one of the above embodiments.

Different from the prior art, the beneficial effects of the application are as follows. The array signal acquired by the microphone array is received, and the array frequency domain signal is obtained from it. First target frequency points with a high signal-to-noise ratio are screened out from the array frequency domain signal. The covariance matrix of the corresponding first target signal is then subjected to eigenvalue decomposition to obtain all its eigenvalues and the maximum eigenvalue, and, according to these eigenvalues, second target frequency points that are less affected by reverberation and dominated by direct sound are screened out from the first target frequency points. The frequency points finally obtained therefore have a high signal-to-noise ratio and are little affected by reverberation, which improves the accuracy of subsequent sound source localization.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings based on these drawings without creative effort.

Wherein:

FIG. 1 is a schematic flow diagram of one embodiment of a direct sound detection method of the present application;

FIG. 2 is a schematic flow chart illustrating an embodiment of step S1 in FIG. 1;

FIG. 3 is a schematic flow chart illustrating an embodiment of step S2 in FIG. 1;

FIG. 4 is a schematic flow chart illustrating an embodiment of the steps performed before step S21 in FIG. 3;

FIG. 5 is a schematic flow chart illustrating an embodiment of the step in step S3 of FIG. 1 that obtains the covariance matrix;

FIG. 6 is a schematic flow chart illustrating an embodiment of the step in step S3 of FIG. 1 that performs eigenvalue decomposition on the covariance matrix;

FIG. 7 is a schematic flow chart illustrating an embodiment of the steps performed before the ratio-screening step corresponding to step S4 in FIG. 1;

FIG. 8 is a block diagram of a framework of an embodiment of the direct sound detection system of the present application;

FIG. 9 is a schematic block diagram of an embodiment of the direct sound detection system of the present application;

FIG. 10 is a block diagram of an embodiment of a computer-readable storage medium of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Referring to fig. 1, fig. 1 is a schematic flow chart of an embodiment of a direct sound detection method according to the present application. Specifically, the direct sound detection method includes:

S1: Receive the array signal acquired by the microphone array, and obtain the array frequency domain signal according to the array signal.

Specifically, in the present embodiment, referring to FIG. 2, which is a schematic flow chart of an embodiment of step S1 in FIG. 1, step S1 includes:

S10: Receive the array signal acquired by the microphone array.

S11: Sequentially perform framing and windowing on the array signal to obtain a plurality of time domain signals.

S12: Convert the plurality of time domain signals into a plurality of frequency domain signals using a fast Fourier transform.

S13: Obtain, according to the frequency domain signals, the sum of the noise signal and the product of the steering vector matrix and the sound source signal, and take the sum as the array frequency domain signal.

Specifically, in the present embodiment, the steering vector matrix, the sound source signal and the noise signal are obtained from the frequency domain signals, and a frequency domain signal model is built from them. The frequency domain signal model is as follows:

Array frequency domain signal:

$$X(\tau, f_i) = A(f_i, \theta)\,S(\tau, f_i) + N(\tau, f_i)$$

Steering vector matrix:

$$A(f_i, \theta) = \left[a_{s_1}(f_i, \theta_1), a_{s_2}(f_i, \theta_2), \ldots, a_{s_{k+1}}(f_i, \theta_{k+1})\right] \in \mathbb{C}^{M \times (k+1)}$$

Sound source signal:

$$S(\tau, f_i) = \left[s_1(\tau, f_i), s_2(\tau, f_i), \ldots, s_{k+1}(\tau, f_i)\right]^T \in \mathbb{C}^{(k+1) \times 1}$$

Noise signal:

$$N(\tau, f_i) = \left[n_1(\tau, f_i), n_2(\tau, f_i), \ldots, n_M(\tau, f_i)\right]^T \in \mathbb{C}^{M \times 1}$$

Array manifold (array flow pattern):

where τ denotes the delay required for the sound source to reach the first microphone, θ_i indicates that the i-th target is located in direction θ_i, f_i denotes the i-th frequency point, d denotes the array element spacing, ^T denotes transposition, and there are k+1 targets and M array elements in total.
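Steps S11 and S12 (framing, windowing and FFT) can be sketched in Python with NumPy as below. This is a minimal illustration under stated assumptions, not the application's implementation: the function name `array_stft`, the Hann window, the 512-sample frame and the 256-sample hop are all assumptions, and the frame index stands in for τ. The output `X[m, tau, i]` plays the role of the m-th entry of X(τ, f_i).

```python
import numpy as np


def array_stft(x, frame_len=512, hop=256):
    """Frame, window and FFT every microphone channel (steps S11 and S12).

    x: real array of shape (M, n_samples), one row per microphone.
    Returns a complex array of shape (M, n_frames, frame_len // 2 + 1) where
    X[m, tau, i] is frequency point i of frame tau at microphone m.
    """
    x = np.asarray(x, dtype=float)
    n_samples = x.shape[1]
    if n_samples < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (n_samples - frame_len) // hop
    # Framing: row tau of idx holds the sample indices of frame tau.
    idx = hop * np.arange(n_frames)[:, None] + np.arange(frame_len)[None, :]
    frames = x[:, idx]                      # (M, n_frames, frame_len)
    # Windowing (assumed Hann window), then a real FFT along each frame.
    frames = frames * np.hanning(frame_len)
    return np.fft.rfft(frames, axis=-1)


# Usage: two identical 1 kHz channels, 1 s at 16 kHz.
fs = 16000
t = np.arange(fs) / fs
x = np.vstack([np.sin(2 * np.pi * 1000 * t)] * 2)
X = array_stft(x)   # shape (2, 61, 257); 1 kHz falls on frequency point 32
```

With a 16 kHz sample rate and 512-sample frames the frequency points are spaced 31.25 Hz apart, which is why the 1 kHz tone lands exactly on frequency point 32.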

S2: Screen out, from the array frequency domain signal, first target frequency points whose signal-to-noise ratio is higher than a first threshold value, and obtain, from the array frequency domain signal, the first target signal at the positions of the first target frequency points.

Specifically, in the present embodiment, referring to FIG. 3, which is a schematic flow chart of an embodiment of step S2 in FIG. 1, step S2 includes:

S20: Obtain the frequency domain signal of one of the microphones from the array frequency domain signal, and obtain the power of each frequency point in that microphone's frequency domain signal.

Specifically, the frequency domain signal of one microphone in the microphone array is obtained, and the power P(t, f_i) of each frequency point in that signal is calculated, where P(t, f_i) denotes the power of frequency point f_i at time t. The calculation of P(t, f_i) is prior art and is not described here again.

S21: Screen out, from the frequency domain signal of the microphone, third target frequency points whose power is greater than the first threshold value.

Specifically, in the present embodiment, referring to FIG. 4, which is a schematic flow chart of an embodiment of the steps performed before step S21 in FIG. 3, step S21 is preceded by:

S210: Obtain, according to the power, the minimum power value of each frequency point within a preset time threshold, and take the minimum power value as the noise power.

Specifically, considering that speech is non-stationary while noise is stationary, the minimum power value of each frequency point over the last 3 seconds is taken as the estimated noise power:

$$P_{\text{noise}}(t, f_i) = \min_{t-3 \le t_0 \le t} P(t_0, f_i)$$

Of course, the preset time threshold may also be set to other values according to user requirements, and the application is not limited herein.
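A sketch of the noise estimate of step S210 follows. It assumes the power of the chosen microphone is available as an array P[t, i] over STFT frames t, and that the 3-second window has been converted to a number of frames, for example `int(3 * fs / hop) + 1` for sample rate `fs` and hop size `hop`. The function name `noise_floor` and this frame conversion are illustrative assumptions.

```python
import numpy as np


def noise_floor(P, frames_per_window):
    """Minimum-statistics noise estimate for step S210.

    P: array of shape (n_frames, n_bins) with P[t, i] = power of frequency
       point i at frame t for the chosen microphone.
    frames_per_window: number of frames covering the preset time window
       (3 s in the embodiment).
    Returns P_noise of the same shape: the minimum of P[t0, i] over the
    frames t - frames_per_window + 1 <= t0 <= t.
    """
    if frames_per_window < 1:
        raise ValueError("frames_per_window must be at least 1")
    P = np.asarray(P, dtype=float)
    P_noise = np.empty_like(P)
    for t in range(P.shape[0]):
        start = max(0, t - frames_per_window + 1)
        P_noise[t] = P[start:t + 1].min(axis=0)  # per-bin minimum in window
    return P_noise
```

For a reference-microphone STFT `X_ref` of shape (n_frames, n_bins), a call would look like `noise_floor(np.abs(X_ref) ** 2, int(3 * fs / hop) + 1)`. The plain loop keeps the sliding minimum easy to read; a production version might use a running-minimum structure instead.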

S211: Obtain the product of the first multiple and the noise power, and take the product as the first threshold value.

Specifically, in the present embodiment, the first multiple is set to 3, and the first threshold value is calculated from the noise power P_noise(t, f_i) as

$$P_{\text{th}}(f_i) = 3\,P_{\text{noise}}(t, f_i)$$

Of course, in other embodiments, the first multiple may be set to other values, and the application is not limited herein.

Returning to step S21, the frequency points whose power P(t, f_i) is greater than the first threshold value P_th(f_i) are screened out from the frequency domain signal of the microphone as the third target frequency points, denoted f_j.
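Steps S211 and S21 reduce to a per-frequency-point comparison at each time t. The hypothetical helper below takes the power and noise power of the chosen microphone at one time t (one value per frequency point) and returns the indices of the third target frequency points; the first multiple defaults to 3 as in the embodiment.

```python
import numpy as np


def select_high_snr_bins(P_t, P_noise_t, first_multiple=3.0):
    """Steps S211 and S21 at one time t.

    P_t, P_noise_t: power and noise power of the chosen microphone at time t,
    one value per frequency point. Returns the indices j of the third target
    frequency points, i.e. those with P(t, f_j) > first_multiple * P_noise(t, f_j).
    """
    P_t = np.asarray(P_t, dtype=float)
    P_th = first_multiple * np.asarray(P_noise_t, dtype=float)  # first threshold
    return np.flatnonzero(P_t > P_th)


bins = select_high_snr_bins([10.0, 2.0, 3.1, 9.0], [1.0, 1.0, 1.0, 3.0])
```

In this usage example the last frequency point is rejected because the comparison is strict: its power equals three times its noise power but does not exceed it.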

S22: Screen out, from the array frequency domain signal, the first target frequency points located at the same positions as the third target frequency points, and obtain, from the array frequency domain signal, the first target signal at the positions of the first target frequency points.

Specifically, the first target frequency points at the same positions as the third target frequency points f_j are screened out from the array frequency domain signal X(τ, f_i), and the first target signal at these positions is taken from X(τ, f_i) and denoted X(τ, f_j).
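Step S22 is an indexing operation: the frequency points found on one microphone select the same frequency points across the whole array. The sketch below assumes the array STFT is stored as an array of shape (M, n_frames, n_bins) and that a block of frames is used as snapshots for the later expectation; the helper name `first_target_signals` and the choice of frames are assumptions, not taken from the application.

```python
import numpy as np


def first_target_signals(X, bins, frames):
    """Gather array snapshots X(tau, f_j) at the first target frequency points.

    X: array STFT of shape (M, n_frames, n_bins).
    bins: indices j of the first target frequency points.
    frames: frame indices tau used as snapshots (assumed: a block of recent
            frames around the current time).
    Returns {j: snapshots}, each of shape (M, len(frames)), one column per tau.
    """
    X = np.asarray(X)
    frames = np.asarray(frames)
    return {int(j): X[:, frames, j] for j in bins}


X_demo = np.arange(2 * 4 * 3).reshape(2, 4, 3)  # M=2 mics, 4 frames, 3 bins
snaps = first_target_signals(X_demo, [0, 2], [1, 2])
```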

S3: Obtain the corresponding covariance matrix according to the first target signal, and perform eigenvalue decomposition on the covariance matrix to obtain all eigenvalues and the maximum eigenvalue of the covariance matrix.

Specifically, in the present embodiment, referring to FIG. 5, which is a schematic flow chart of an embodiment of a step in step S3 of FIG. 1, the step of obtaining the corresponding covariance matrix according to the first target signal in step S3 includes:

S30: Obtain the conjugate transpose matrix corresponding to the first target signal.

Specifically, the conjugate transpose X^H(τ, f_j) of the first target signal X(τ, f_j) is obtained, where ^H denotes conjugate transposition.

S31: Obtain the expected value of the product of the first target signal and its conjugate transpose matrix, and take the expected value as the corresponding covariance matrix.

Specifically, the covariance matrix R(f_j) is calculated from the first target signal X(τ, f_j) and its conjugate transpose X^H(τ, f_j):

$$R(f_j) = E\left\{X(\tau, f_j)\,X^H(\tau, f_j)\right\}$$

where E{·} denotes the expected value.
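In practice the expectation E{·} is commonly approximated by an average over a finite number of snapshots. The sketch below assumes that approximation, which the text does not specify, with the T snapshots X(τ, f_j) stored as the columns of an M × T array; the helper name `sample_covariance` is hypothetical.

```python
import numpy as np


def sample_covariance(snapshots):
    """Estimate R(f_j) = E{X X^H} by averaging over snapshots (steps S30, S31).

    snapshots: complex array of shape (M, T); column tau is X(tau, f_j).
    Returns the M x M Hermitian sample covariance matrix.
    """
    Xj = np.asarray(snapshots, dtype=complex)
    # Xj.conj().T is the conjugate transpose X^H; dividing by T averages
    # the outer products X(tau, f_j) X^H(tau, f_j) over the T snapshots.
    return Xj @ Xj.conj().T / Xj.shape[1]


R_demo = sample_covariance(np.array([[1.0, 1j], [2.0, 0.0]]))
```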

Specifically, in the present embodiment, referring to FIG. 6, which is a schematic flow chart of an embodiment of another step in step S3 of FIG. 1, the step of performing eigenvalue decomposition on the covariance matrix in step S3 to obtain all eigenvalues and the maximum eigenvalue of the covariance matrix includes:

S40: Decompose the covariance matrix into the eigenvector matrix of the covariance matrix, a diagonal matrix formed by arranging the eigenvalues of the covariance matrix in descending order, and the conjugate transpose of the eigenvector matrix.

Specifically, eigenvalue decomposition is performed on the covariance matrix R(f_j):

$$R(f_j) = U(f_j)\,\Lambda(f_j)\,U^H(f_j)$$

where U(f_j) denotes the eigenvector matrix of R(f_j), Λ(f_j) denotes the diagonal matrix formed by arranging the eigenvalues of the covariance matrix in descending order, and U^H(f_j) denotes the conjugate transpose of U(f_j).

S41: Obtain all eigenvalues and the maximum eigenvalue of the covariance matrix from the diagonal matrix.

Specifically, all the eigenvalues of R(f_j), denoted here λ_m(f_j) with m = 1, …, M, are the main diagonal elements of Λ(f_j), and the maximum eigenvalue λ_max(f_j) is selected from them. Because Λ(f_j) is sorted in descending order, λ_max(f_j) is its first diagonal element.
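Steps S40 and S41 can be sketched with `numpy.linalg.eigh`, NumPy's eigensolver for Hermitian matrices. Because `eigh` returns eigenvalues in ascending order, the sketch reverses both the eigenvalues and the eigenvector columns to match the descending order of Λ(f_j). The helper name `eig_descending` is an assumption for illustration.

```python
import numpy as np


def eig_descending(R):
    """Return (lam, U) with R = U @ diag(lam) @ U^H and lam in descending order.

    R must be Hermitian, as a covariance matrix is.
    """
    lam, U = np.linalg.eigh(R)       # ascending eigenvalues, orthonormal columns
    return lam[::-1], U[:, ::-1]     # reorder to descending, as in Lambda(f_j)


R = np.array([[2.0, 1.0], [1.0, 2.0]])
lam, U = eig_descending(R)
lambda_max = lam[0]                  # first diagonal element of Lambda(f_j)
```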

S4: Screen out, from the first target frequency points and according to all the eigenvalues and the maximum eigenvalue, second target frequency points at which the direct signal energy is greater than the reverberation signal energy, so as to perform sound source localization using the second target frequency points.

Specifically, in the present embodiment, step S4 includes: obtaining the ratio r_j of the maximum eigenvalue λ_max(f_j) to the sum of all the eigenvalues λ_m(f_j), m = 1, …, M,

$$r_j = \frac{\lambda_{\max}(f_j)}{\sum_{m=1}^{M} \lambda_m(f_j)}$$

and taking each first target frequency point whose ratio is greater than the second threshold value r_th as a second target frequency point. Because the covariance matrix is Hermitian and positive semidefinite, its eigenvalues are non-negative, so the ratio lies in the range 0 to 1.
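The ratio test of step S4 can be sketched as follows, assuming the eigenvalues of each R(f_j) are already available (for example from `numpy.linalg.eigvalsh`). The helper name `direct_dominant_bins` and its input layout are hypothetical; bins with zero total energy are skipped to avoid division by zero.

```python
import numpy as np


def direct_dominant_bins(eigvals_by_bin, r_th):
    """Select bins j whose ratio r_j = lambda_max / sum(lambda) exceeds r_th.

    eigvals_by_bin: mapping {bin index j: eigenvalues of R(f_j)} in any order.
    Returns (sorted list of second target frequency points, {j: r_j}).
    """
    ratios = {}
    for j, lam in eigvals_by_bin.items():
        lam = np.asarray(lam, dtype=float)
        total = lam.sum()
        if total <= 0.0:              # no energy in this bin: skip it
            continue
        ratios[j] = float(lam.max() / total)
    selected = sorted(j for j, r in ratios.items() if r > r_th)
    return selected, ratios


selected, ratios = direct_dominant_bins(
    {3: [0.9, 0.05, 0.05], 7: [0.4, 0.35, 0.25]}, r_th=0.5)
```

Since the sum of the eigenvalues of R(f_j) equals its trace, the denominator could also be computed as the trace; the sketch uses the eigenvalues directly, as the text does.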

This is based on the following consideration. When only a single sound source exists, the reverberation incident on the array from non-target angles carries less energy than the direct signal at some frequency points; at those frequency points the dominant component, associated with the maximum eigenvalue, is the direct signal, and the remaining components are generated by reverberation. The ratio r_j therefore approaches 1 when the covariance matrix is dominated by a single component and decreases as reverberation spreads energy over the other eigenvalues. According to all the eigenvalues and the maximum eigenvalue, second target frequency points that are less affected by reverberation and dominated by direct sound are screened out from the first target frequency points. Performing sound source localization with these frequency points gives better estimation performance, so frequency points with a higher signal-to-noise ratio and less influence from reverberation are finally obtained, and the accuracy of subsequent sound source localization is improved.
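The toy model below, which is not from the application, illustrates why the ratio separates direct-dominated frequency points from reverberant ones. It assumes an M-element array, a hypothetical direct-path steering vector a with unit-modulus entries, and reverberation idealized as spatially white power P_r, so that R = P_d a a^H + P_r I. The eigenvalues are then M·P_d + P_r (once) and P_r (M - 1 times), giving r = (M·P_d + P_r) / (M·P_d + M·P_r), which falls from 1 with no reverberation towards 1/M with reverberation only.

```python
import numpy as np


def eigen_ratio(R):
    """lambda_max / sum of eigenvalues of a Hermitian matrix R."""
    lam = np.linalg.eigvalsh(R)      # ascending order, so lam[-1] is the max
    return float(lam[-1] / lam.sum())


def simulated_ratio(direct_power, diffuse_power, M=4, seed=0):
    """Ratio for the toy model R = P_d a a^H + P_r I (an idealization)."""
    rng = np.random.default_rng(seed)
    # Hypothetical direct-path steering vector: unit-modulus, random phases.
    a = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, M))
    # Simplification: reverberation modelled as spatially white power.
    R = direct_power * np.outer(a, a.conj()) + diffuse_power * np.eye(M)
    return eigen_ratio(R)


for p_d, p_r in [(1.0, 0.0), (1.0, 0.1), (1.0, 1.0), (0.0, 1.0)]:
    print(p_d, p_r, round(simulated_ratio(p_d, p_r), 3))
```

Real reverberation is not spatially white, so this only shows the trend, not the values to expect in a room.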

Specifically, in the present embodiment, please refer to fig. 7, and fig. 7 is a flowchart illustrating an embodiment before the step corresponding to step S4 in fig. 1. Specifically, before the step of obtaining the ratio of the maximum eigenvalue to the sum of all eigenvalues and taking the first target frequency point corresponding to the ratio greater than the second threshold value as the second target frequency point, the method includes:

s50: and obtaining the index of the language function and the value of the language function according to the reverberation time.

In particular, the index of the linguistic function and the reverberation time T₆₀Is inversely proportional toIn the present embodiment, T₆₀Is the known reverberation time. Of course, in other embodiments, the reverberation time T₆₀Or may be calculated by other methods. In addition, in the present embodiment, the value of the language function is

S51: and judging the magnitude relation between the value of the language function and the first threshold value and the second threshold value.

Specifically, in the present embodiment, the first threshold value and the second threshold value are set to 0.1 and 0.8, respectively. Of course, the present application does not limit the values of the first threshold and the second threshold. Determining a value of a linguistic functionThe magnitude relation with the first threshold and the second threshold is shown in the following formula:

s52: and when the value of the language function is smaller than the first threshold value, taking the first threshold value as a second threshold value.

Specifically, in the present embodiment, when the value of the language function is usedLess than 0.1, the second threshold value r_th＝0.1。

S53: and when the value of the language function is larger than the second threshold value, taking the second threshold value as the second threshold value.

Specifically, in the present embodiment, when the value of the language function is usedWhen the value is more than 0.8, the second threshold value r_th＝0.8。

S54: and when the value of the language function is greater than or equal to the first threshold value and less than or equal to the second threshold value, taking the value of the language function as the second threshold value.

Specifically, in the present embodiment, when the value of the language function is greater than or equal to 0.1 and less than or equal to 0.8, the second threshold value r_th is equal to the value of the language function.

The second threshold value r_th is thus calculated from the reverberation time T₆₀: the heavier the reverberation, the higher r_th; the lighter the reverberation, the lower r_th. In this way r_th adapts to the actual acoustic environment, so that the frequency points less affected by reverberation are screened out and the accuracy of subsequent sound source localization is improved.
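Steps S52 to S54 amount to clamping the value of the language function to the interval [0.1, 0.8], and step S4 then compares the eigenvalue ratio of each first target frequency point against the resulting r_th. The Python sketch below illustrates both under stated assumptions: the function names are hypothetical, the language-function value is simply passed in (its formula is not reproduced in this text), and the eigenvalue lists in the example are made-up inputs. The rationale for the ratio test is that a covariance matrix dominated by a single direct-path source is close to rank one, so its largest eigenvalue carries most of the total energy and the ratio approaches 1, whereas reverberant energy arriving from many directions is spread over several eigenvalues and lowers the ratio.

```python
def second_threshold(value, lower=0.1, upper=0.8):
    """Steps S52-S54: clamp the language-function value into [lower, upper].

    Assumption: `value` has already been computed from T60 elsewhere.
    """
    if value < lower:   # S52: below the first threshold (lower limit)
        return lower
    if value > upper:   # S53: above the second threshold (upper limit)
        return upper
    return value        # S54: within the limits, use the value itself


def eigen_ratio(eigenvalues):
    """Largest eigenvalue divided by the sum of all eigenvalues."""
    total = sum(eigenvalues)
    return max(eigenvalues) / total if total > 0 else 0.0


def is_second_target(eigenvalues, r_th):
    """Keep a first target frequency point if its eigenvalue ratio exceeds r_th."""
    return eigen_ratio(eigenvalues) > r_th


# Hypothetical eigenvalues: one dominant direction vs. energy spread evenly.
print(second_threshold(0.05), second_threshold(0.95), second_threshold(0.5))
print(is_second_target([3.8, 0.1, 0.05, 0.05], second_threshold(0.5)))  # ratio about 0.95
print(is_second_target([1.0, 1.0, 1.0, 1.0], second_threshold(0.5)))    # ratio 0.25
```

For example, a language-function value of 0.05 gives r_th = 0.1, a value of 0.95 gives r_th = 0.8, and a value of 0.5 is used directly as r_th.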

Please refer to Fig. 8, which is a block diagram of an embodiment of the direct sound detection system of the present application. The direct sound detection system includes:

The acquisition module 10 is configured to receive an array signal acquired by a microphone array and to obtain an array frequency-domain signal from the array signal.

The first screening module 12 is coupled to the acquisition module 10 and is configured to screen out, from the array frequency-domain signal, the first target frequency points whose signal-to-noise ratio is higher than the first threshold value.

The processing module 14 is coupled to the first screening module 12 and is configured to obtain, from the array frequency-domain signal, the first target signal at the positions of the first target frequency points. The processing module 14 is further configured to obtain the corresponding covariance matrix from the first target signal and to perform eigenvalue decomposition on it, so as to obtain all eigenvalues and the maximum eigenvalue of the covariance matrix.

The second screening module 16 is coupled to the processing module 14 and is configured to screen out, from the first target frequency points and according to all the eigenvalues and the maximum eigenvalue, the second target frequency points at which the direct signal energy is greater than the reverberation signal energy, so that sound source localization can be performed using the second target frequency points.
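The four modules above can be chained roughly as in the following Python/NumPy sketch. It is an illustration under assumptions, not the implementation of the present application: the STFT settings (512-point Hann window, hop of 256 samples), the a-posteriori SNR definition, the availability of a per-bin noise power estimate, the estimation of each covariance matrix by averaging over all frames of a block, and all function names are choices made here for illustration.

```python
import numpy as np


def stft_multichannel(x, frame_len=512, hop=256):
    """Acquisition: (M, N) array samples -> (M, F, T) complex frequency-domain signal."""
    window = np.hanning(frame_len)
    n_frames = 1 + (x.shape[1] - frame_len) // hop
    frames = np.stack(
        [x[:, t * hop : t * hop + frame_len] * window for t in range(n_frames)],
        axis=-1,
    )  # (M, frame_len, T)
    return np.fft.rfft(frames, axis=1)  # (M, frame_len // 2 + 1, T)


def select_snr_bins(X, noise_psd, snr_threshold_db=10.0):
    """First screening: bins whose a-posteriori SNR exceeds the first threshold value.

    Assumption: `noise_psd` is a per-bin noise power estimate obtained elsewhere.
    """
    power = np.mean(np.abs(X) ** 2, axis=(0, 2))  # average over mics and frames
    snr_db = 10.0 * np.log10(power / np.maximum(noise_psd, 1e-12) + 1e-12)
    return np.flatnonzero(snr_db > snr_threshold_db)


def select_direct_bins(X, first_bins, r_th):
    """Processing + second screening: keep bins with lambda_max / sum(lambda) > r_th."""
    second = []
    for f in first_bins:
        snapshots = X[:, f, :]  # (M, T) first target signal at this bin
        R = snapshots @ snapshots.conj().T / snapshots.shape[1]  # (M, M) covariance
        eigvals = np.linalg.eigvalsh(R)  # real eigenvalues in ascending order
        ratio = eigvals[-1] / max(float(np.sum(eigvals)), 1e-12)
        if ratio > r_th:
            second.append(int(f))
    return np.array(second, dtype=int)


# Illustrative synthetic input (not from the source): a 2 kHz tone reaching
# 4 microphones with integer-sample delays, plus weak white noise.
fs, n_mics, n_samples, sigma = 16000, 4, 16000, 0.01
rng = np.random.default_rng(0)
n = np.arange(n_samples)
delays = np.array([0, 1, 2, 3])
x = np.sin(2 * np.pi * 2000 * (n[None, :] - delays[:, None]) / fs)
x += sigma * rng.standard_normal((n_mics, n_samples))

X = stft_multichannel(x)
# White noise after Hann windowing has power sigma^2 * sum(w^2) in every bin.
noise_psd = sigma**2 * np.sum(np.hanning(512) ** 2) * np.ones(X.shape[1])
first_bins = select_snr_bins(X, noise_psd, snr_threshold_db=10.0)
second_bins = select_direct_bins(X, first_bins, r_th=0.8)
print(first_bins, second_bins)
```

With this synthetic input the tone falls in bin 64 (2000 × 512 / 16000), so bin 64 passes both screens, while bins containing only noise stay near 0 dB SNR and are rejected by the first screen. In practice r_th would come from the reverberation-dependent rule of steps S50 to S54 rather than a fixed 0.8.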

Please refer to Fig. 9, which is a schematic structural diagram of an embodiment of the direct sound detection system of the present application. The direct sound detection system comprises a memory 20 and a processor 22 coupled to each other. Specifically, in the present embodiment, the memory 20 stores program instructions, and the processor 22 is configured to execute the program instructions to implement the direct sound detection method of any of the above embodiments.

Specifically, the processor 22 may also be referred to as a CPU (Central Processing Unit). The processor 22 may be an integrated circuit chip with signal processing capability. The processor 22 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. A general-purpose processor may be a microprocessor or any conventional processor. In addition, the processor 22 may be jointly implemented by a plurality of integrated circuit chips.

Please refer to Fig. 10, which is a block diagram of an embodiment of the computer-readable storage medium of the present application. The computer-readable storage medium 30 stores a computer-readable computer program 300, which can be executed by a processor to implement the direct sound detection method of any of the above embodiments. The computer program 300 may be stored in the computer-readable storage medium 30 in the form of a software product and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The computer-readable storage medium 30 may be any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc, or may be a terminal device such as a computer, a server, a mobile phone, or a tablet.

In summary, unlike the prior art, the present application receives the array signal acquired by a microphone array, obtains the array frequency-domain signal from it, and screens out first target frequency points with a high signal-to-noise ratio. It then obtains a covariance matrix from the first target signal at these frequency points and performs eigenvalue decomposition to obtain all eigenvalues and the maximum eigenvalue. According to these eigenvalues, second target frequency points that are less affected by reverberation and dominated by direct sound are screened out from the first target frequency points. The resulting frequency points have a high signal-to-noise ratio and are less affected by reverberation, which improves the accuracy of subsequent sound source localization.

The above are merely embodiments of the present application and are not intended to limit its scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of patent protection of the present application.
