Underdetermined blind source voice signal separation method based on single source point detection

Document No.: 1398266 Publication date: 2020-03-03

Abstract: This technology, "Underdetermined blind source voice signal separation method based on single source point detection", was created by Li Yibing (李一兵), Wang Yifan (王一凡), Tian Yuan (田园), Guo Xiaochen (郭小晨), Wu Jing (吴静), Ye Fang (叶方), Sun Qian (孙骞) and Zhao Tong (赵彤) on 2018-08-24. The invention provides a method for separating underdetermined blind source voice signals based on single source point detection, comprising the following steps: first, a linear microphone array is placed in the far field of the source signals to obtain several groups of received signal data; the received data are then analyzed in the time-frequency domain and a time-frequency-domain scatter diagram is constructed; the ratio of the horizontal and vertical coordinates of each point gives a set of data that is clustered to obtain the amplitude attenuation parameters; a potential function clustering method gives a three-dimensional scatter diagram of potential function, attenuation parameter and delay parameter; finally, with the mixing matrix estimated, the source speech signals are recovered by subspace mapping. The core of the invention is a method for underdetermined blind source voice signal separation based on single source point detection, built on the sparse component analysis technique of blind source separation. In a moderately noisy environment, the method can effectively separate speech mixtures that follow the anechoic (echo-free) delay mixing model. The method has a low computational load, low complexity and high estimation accuracy, and achieves the intended goal.

1. A method for underdetermined blind source voice signal separation based on single source point detection, characterized in that the method comprises the following specific steps:

Step one: a linear microphone array is placed in the far field of the source signals and used as the sensor to receive the sound signals, giving several groups of received signal data;

Step two: time-frequency-domain analysis is performed on the received signal data, a time-frequency-domain signal scatter diagram is constructed, and a single-source-point detection criterion is used to screen out the non-low-energy single-source time-frequency points that satisfy the condition;

Step three: because the value of the attenuation coefficient corresponds to the slope of each single source point on the time-frequency-domain scatter diagram, the ratio of the horizontal and vertical coordinates of each point is taken to obtain a set of data, which is clustered to obtain the amplitude attenuation parameters;

Step four: a three-dimensional scatter diagram of potential function, attenuation parameter and delay parameter is obtained by a potential function clustering method; the attenuation parameters estimated in step three are used to map it to a two-dimensional scatter diagram of potential function and delay parameter, which is clustered to obtain the estimate of the delay parameters;

Step five: with the estimated mixing matrix obtained in step four, the source speech signals are recovered by a subspace mapping method.

Technical Field

The invention belongs to the technical field of voice signal processing, and particularly relates to a method for separating underdetermined blind source voice signals based on single source point detection.

Background

In the long history of human social development, the emergence of language was a milestone that ushered in a golden period for human science, technology and culture. Today, with the rapid development of speech signal processing and related technologies, more and more speech products are entering daily life. In a given environment, separating the mixed speech signals collected by each sensor is an important part of speech signal processing. Using blind source separation to separate mixed speech signals is therefore work of considerable research value.

Researchers in China and abroad have already carried out a series of studies on speech signal separation using blind source separation. The most common situation in practice is that the number of mixed signals is smaller than the number of source signals; blind source separation under this condition is called underdetermined blind source separation. Early methods, of which the DUET algorithm is the classical example, require the source signals to be sufficiently sparse in the time domain or the time-frequency domain. As the number of source signals increases, however, this sparsity assumption becomes harder to satisfy. Mixing-parameter estimation algorithms based on detecting single-source active intervals appeared later, such as the TIFROM algorithm and the TIFCORR (Time-Frequency Correlation) algorithm; the key to these algorithms is the detection of single-source intervals. The concept of single source points was then proposed, which relaxes the sparsity assumption further: as long as each source signal has some discrete single-source time-frequency points, the mixing matrix can be estimated. Most existing single-source-point detection algorithms, however, are only suitable for instantaneous linear mixing models. The invention provides a method for separating underdetermined blind source voice signals based on single source point detection for the anechoic linear delay mixing model.

Disclosure of Invention

The invention aims to provide a method for separating underdetermined blind source voice signals based on single source point detection that accounts for both the amplitude attenuation of the signals and the time delay incurred during transmission. First, the single-source time-frequency points that satisfy the condition are selected according to a single-source-point detection criterion. Cluster analysis then yields the attenuation parameters of the mixing matrix. Using the three-dimensional scatter diagram of the single source points, the points near each estimated attenuation parameter are mapped onto a two-dimensional diagram and clustered again to obtain the delay parameters, so that the attenuation and delay parameters are matched automatically and the estimate of the mixing matrix is complete. Finally, the source signals are recovered by subspace projection. The source speech signals are thus separated accurately and effectively even when little prior information about them is available, which completes the preprocessing of the speech signal.

The technical scheme of the invention is as follows: a method for underdetermined blind source voice signal separation based on single source point detection comprises the following steps:

Step one: a linear microphone array is placed in the far field of the source signals and used as the sensor to receive the sound signals, giving several groups of received signal data;

Step two: time-frequency-domain analysis is performed on the received signal data, a time-frequency-domain signal scatter diagram is constructed, and a single-source-point detection criterion is used to screen out the non-low-energy single-source time-frequency points that satisfy the condition;

Step three: because the value of the attenuation coefficient corresponds to the slope of each single source point on the time-frequency-domain scatter diagram, the ratio of the horizontal and vertical coordinates of each point is taken to obtain a set of data, which is clustered to obtain the amplitude attenuation parameters;

Step four: a three-dimensional scatter diagram of potential function, attenuation parameter and delay parameter is obtained by a potential function clustering method; the attenuation parameters estimated in step three are used to map it to a two-dimensional scatter diagram of potential function and delay parameter, which is clustered to obtain the estimate of the delay parameters;

Step five: with the estimated mixing matrix obtained in step four, the source speech signals are recovered by a subspace mapping method.

The core of the invention is a method for underdetermined blind source voice signal separation based on single source point detection, built on the sparse component analysis technique of blind source separation. In a moderately noisy environment, the method can effectively separate speech mixtures that follow the anechoic delay mixing model. The method has a low computational load, low complexity and high estimation accuracy, and achieves the intended goal.

The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.

Drawings

Fig. 1 is a flow chart of underdetermined blind source speech signal separation.

FIG. 2 is a two-dimensional scattergram of spatial time-frequency distribution after single-source point detection.

FIG. 3 is a three-dimensional scatter diagram of the potential function (potential function, attenuation parameter, delay parameter); the function symbol is shown in Figure BDA0001777247090000021.

FIG. 4 is a two-dimensional projected scatter plot of phase shift parameters.

FIG. 5 is a comparison of the initial source signals and the recovered source signals.

Detailed Description

The embodiment provides a method for separating an underdetermined blind source speech signal based on single source point detection, a flowchart of which is shown in fig. 1, and specifically includes the following steps:

Step one: A linear microphone array is placed in the far field of the source signals and used as the sensor to receive the sound signals, giving several groups of received signal data. Because the goal is to separate the mixed speech signals without any prior information, a series of signal processing operations is performed; the specific flow is shown in FIG. 1. For convenience of explanation, the number of sensors in the microphone array is set to 2 in this description.

Figure BDA0001777247090000031

Figure BDA0001777247090000032

where x_1(t) and x_2(t) represent the mixed signals received by the 1st and 2nd sensors, s_k(t) denotes the kth source signal, α_k and τ_k (k = 1, 2, …, N) respectively represent the relative attenuation and relative delay from the kth source signal to the 2nd receiving array element, and

Figure BDA0001777247090000034

is the phase shift parameter. The matrix A ∈ C^(2×N) is the mixing matrix.
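For orientation, the anechoic two-sensor delay model that is commonly paired with these definitions (as in DUET) can be written as below. This is a sketch consistent with the symbols defined above, not necessarily the exact form of equations (1) to (3). The noise term is omitted, and writing the phase shift parameter as e^{-j2πfτ_k} is an assumption.

```latex
\begin{aligned}
x_1(t) &= \sum_{k=1}^{N} s_k(t), \\
x_2(t) &= \sum_{k=1}^{N} \alpha_k \, s_k(t-\tau_k), \\
\begin{bmatrix} X_1(t,f) \\ X_2(t,f) \end{bmatrix}
&\approx
\underbrace{\begin{bmatrix}
1 & \cdots & 1 \\
\alpha_1 e^{-j 2\pi f \tau_1} & \cdots & \alpha_N e^{-j 2\pi f \tau_N}
\end{bmatrix}}_{A \,\in\, \mathbb{C}^{2\times N}}
\begin{bmatrix} S_1(t,f) \\ \vdots \\ S_N(t,f) \end{bmatrix}
\end{aligned}
```

In this form each column of A is fixed by one attenuation and one delay, which is why estimating the pairs (α_k, τ_k) amounts to estimating the mixing matrix.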

Step two: Because the received mixed signals are heavily aliased and, under the influence of noise, poorly sparse, time-frequency-domain analysis is performed on the received signal data using the Wigner distribution.

Figure BDA0001777247090000035

Figure BDA0001777247090000036

where W_s(t, f) and W_x(t, f) respectively represent the spatial time-frequency distributions of the source signals and the mixed signals.
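As a reminder, the textbook definition of the Wigner (Wigner-Ville) distribution of a single signal x(t) is given below, together with the cross-distribution of two sensor signals that would populate a spatial time-frequency matrix. The integration variable u is chosen here to avoid a clash with the delays τ_k. How these entries are assembled into W_x(t, f) in equations (4) and (5) is not reproduced.

```latex
\begin{aligned}
W_{x}(t,f) &= \int_{-\infty}^{\infty}
  x\!\left(t+\tfrac{u}{2}\right) x^{*}\!\left(t-\tfrac{u}{2}\right)
  e^{-j 2\pi f u}\,\mathrm{d}u, \\
W_{x_1 x_2}(t,f) &= \int_{-\infty}^{\infty}
  x_1\!\left(t+\tfrac{u}{2}\right) x_2^{*}\!\left(t-\tfrac{u}{2}\right)
  e^{-j 2\pi f u}\,\mathrm{d}u
\end{aligned}
```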

Figure BDA0001777247090000037

A two-dimensional scatter diagram is then constructed from the two time-frequency-domain mixed signals

Figure BDA0001777247090000038

Combining equations (4), (5) and (6) yields the single-source-point detection criterion, equation (7); screening with this criterion gives the non-low-energy single-source time-frequency points that satisfy the condition.

Figure BDA0001777247090000041

To reduce the negative effect of low-energy time-frequency points on the clustering, low-energy single-source points are removed using equation (8).

Figure BDA0001777247090000042

The two-dimensional scatter diagram after completing step two is shown in FIG. 2.
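The screening idea in step two can be illustrated with a minimal sketch. It does not implement criterion (7) or equation (8), which are defined on Wigner distributions. Instead it assumes two complex time-frequency arrays `X1` and `X2` (for example from an STFT), swaps in a TIFROM-style check that the ratio X2/X1 stays nearly constant across adjacent time frames, and applies a relative energy threshold. The function name and the parameters `rel_energy` and `ratio_tol` are hypothetical.

```python
import numpy as np

def screen_points(X1, X2, rel_energy=0.05, ratio_tol=0.05):
    """Return a boolean mask of candidate single-source, non-low-energy points.

    X1, X2: complex arrays of shape (n_freq, n_time), one per sensor.
    NOTE: stand-in for the patent's criterion (7) and equation (8).
    """
    X1 = np.asarray(X1, dtype=complex)
    X2 = np.asarray(X2, dtype=complex)

    # Energy screening: drop points far below the strongest point.
    energy = np.abs(X1) ** 2 + np.abs(X2) ** 2
    high_energy = energy >= rel_energy * energy.max()

    # Inter-sensor ratio; guard against division by zero.
    valid = np.abs(X1) > 0
    ratio = X2 / np.where(valid, X1, 1.0)

    # If one source dominates two adjacent frames at the same frequency,
    # the ratio alpha_k * exp(-j 2 pi f tau_k) is the same in both frames.
    stable = np.zeros(X1.shape, dtype=bool)
    diff = np.abs(ratio[:, 1:] - ratio[:, :-1])
    stable[:, :-1] = (diff < ratio_tol) & valid[:, 1:]

    return high_energy & stable & valid
```

The last frame of each frequency row is never selected because it has no successor to compare against; a symmetric neighbourhood would avoid that at the cost of a slightly longer check.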

Step three: The value of the attenuation coefficient corresponds to the slope of each single source point on the scatter diagram:

Figure BDA0001777247090000043

Therefore, the ratio of the horizontal and vertical coordinates of each point is taken to obtain a set of data, and DBSCAN clustering is applied to it; the resulting cluster centers are the amplitude attenuation parameters.
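The ratio-and-cluster step can be sketched as follows. To stay dependency-free, the sketch replaces DBSCAN with a simplified one-dimensional density grouping: sorted ratios are split wherever the gap exceeds `eps`, and groups with fewer than `min_samples` members are discarded as noise. The names `cluster_1d` and `estimate_attenuations` and the default parameter values are hypothetical, and the ratio is taken as |X2|/|X1| under the assumption that sensor 1 is the reference.

```python
import numpy as np

def cluster_1d(values, eps=0.05, min_samples=5):
    """Simplified 1-D stand-in for DBSCAN: split sorted values at gaps > eps
    and return the mean of every group with at least min_samples members."""
    v = np.sort(np.asarray(values, dtype=float))
    if v.size == 0:
        return []
    # Index positions where the gap to the previous value exceeds eps.
    breaks = np.where(np.diff(v) > eps)[0] + 1
    groups = np.split(v, breaks)
    return [float(g.mean()) for g in groups if g.size >= min_samples]

def estimate_attenuations(X1, X2, mask, eps=0.05, min_samples=5):
    """Cluster the amplitude ratios of the selected time-frequency points.

    mask: boolean array selecting single-source points with |X1| > 0.
    """
    ratios = np.abs(np.asarray(X2)[mask]) / np.abs(np.asarray(X1)[mask])
    return cluster_1d(ratios, eps, min_samples)
```

Each returned center is one estimate of an attenuation α_k; the number of centers is an estimate of the number of sources visible in the selected points.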

Step four: fig. 3 shows a three-dimensional scattergram of potential function, attenuation parameter, and delay parameter obtained by equations (10) and (11).

Figure BDA0001777247090000044

Figure BDA0001777247090000045

The attenuation parameters estimated in step three are then used to map this diagram to a two-dimensional scatter diagram of potential function and delay parameter, shown in FIG. 4, which is clustered to obtain the estimate of the delay parameters.
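A minimal sketch of the delay estimate for one attenuation cluster is given below. The potential function of equations (10) and (11) is not reproduced; the sketch assumes a generic Gaussian-kernel potential over a delay grid, and assumes that at a single-source point the inter-sensor ratio is X2/X1 = α_k e^{-j2πfτ_k} with |2πfτ_k| < π so that the phase does not wrap. The function name, `tau_grid` and `sigma` are hypothetical.

```python
import numpy as np

def estimate_delay(ratio, freqs, weights, tau_grid, sigma):
    """Estimate one delay from the points of a single attenuation cluster.

    ratio:    complex X2/X1 at the selected points, shape (n,)
    freqs:    frequency in Hz of each point, shape (n,), all non-zero
    weights:  non-negative weight (e.g. energy) of each point, shape (n,)
    tau_grid: candidate delays in seconds, shape (g,)
    sigma:    kernel width in seconds
    Returns (best_delay, potential_on_grid).
    """
    ratio = np.asarray(ratio, dtype=complex)
    freqs = np.asarray(freqs, dtype=float)
    weights = np.asarray(weights, dtype=float)
    tau_grid = np.asarray(tau_grid, dtype=float)

    # Per-point delay from the phase of the ratio (assumes no phase wrapping).
    tau_points = -np.angle(ratio) / (2.0 * np.pi * freqs)

    # Gaussian-kernel potential: each point votes for nearby grid delays.
    d = tau_grid[:, None] - tau_points[None, :]
    potential = (weights[None, :] * np.exp(-d ** 2 / (2.0 * sigma ** 2))).sum(axis=1)

    return float(tau_grid[np.argmax(potential)]), potential
```

Because the points passed in already belong to one attenuation cluster, the delay found here is paired with that attenuation automatically, which mirrors the matching of attenuation and delay parameters described in the text.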

Step five: Using the subspace mapping method, the shortest distance is calculated between the mixed-signal vector at each time-frequency point and the subspaces spanned by combinations of the column vectors of the mixing matrix. When this shortest distance is below a threshold, the column vectors spanning that subspace are taken as the columns of the mixing matrix that are active at that time-frequency point. The source speech signals are then recovered by computing the pseudo-inverse of the mixing matrix for that time-frequency point. A comparison of the initial source signals and the recovered source signals is shown in FIG. 5.
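The recovery at a single time-frequency point can be sketched as below for the two-sensor case, where the only proper subspaces are the lines spanned by individual columns. The sketch measures the relative distance from the mixture vector to each column's span, keeps the columns below a threshold (falling back to the closest one), and applies the pseudo-inverse of the selected sub-matrix. The function name and `rel_threshold` are hypothetical, and `A` is assumed to be the mixing matrix already evaluated at the point's frequency.

```python
import numpy as np

def recover_point(x, A, rel_threshold=1e-3):
    """Recover source coefficients at one time-frequency point.

    x: complex mixture vector, shape (M,)
    A: complex mixing matrix at this frequency, shape (M, N)
    Returns a complex vector of length N; inactive sources are zero.
    """
    x = np.asarray(x, dtype=complex)
    A = np.asarray(A, dtype=complex)
    s_hat = np.zeros(A.shape[1], dtype=complex)

    norm_x = np.linalg.norm(x)
    if norm_x == 0.0:
        return s_hat

    # Orthogonal projection of x onto each column a_k: a_k (a_k^H x) / ||a_k||^2.
    coeffs = (A.conj().T @ x) / np.sum(np.abs(A) ** 2, axis=0)
    residuals = x[:, None] - A * coeffs[None, :]
    dists = np.linalg.norm(residuals, axis=0) / norm_x

    # Columns close enough to x are treated as active at this point.
    active = dists < rel_threshold
    if not active.any():
        active[np.argmin(dists)] = True

    # Pseudo-inverse of the active sub-matrix gives the source estimate.
    s_hat[active] = np.linalg.pinv(A[:, active]) @ x
    return s_hat
```

Applying this to every time-frequency point and inverting the time-frequency transform would give time-domain estimates. With more than two sensors, the same idea extends to subspaces spanned by several columns.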

FIG. 5 shows that the method separates the speech signals well even without sufficient prior information.

Finally, it should be noted that the above examples are intended only to describe the technical solutions of the present invention and not to limit them. The present invention can be extended to other modifications, variations, applications and embodiments, and all such modifications, variations, applications and embodiments are considered to be within the spirit and teaching scope of the present invention.
