Identity authentication method applied to voiceprint recognition in electric power field

Document No.: 1955187. Publication date: 2021-12-10.

Abstract: This technology, an identity authentication method applied to voiceprint recognition in the electric power field, was created by 王治华, 高峰 and 陈宏福 on 2020-06-09. The invention relates to the field of power dispatching based on artificial intelligence, and in particular to an identity authentication method applied to voiceprint recognition in the electric power field, characterized by comprising the following steps. Step 1: collect voice content to form a voice signal and preprocess it. Step 2: perform a fast Fourier transform on the preprocessed voice signal to obtain its energy distribution over the frequency spectrum. Step 3: take the squared modulus of the voice signal spectrum to obtain the power spectrum of the voice signal. Step 4: filter the voice signal. Step 5: determine the logarithmic energy output by each filter of the filter bank. Step 6: apply a discrete cosine transform to the logarithmic energy to obtain the Mel cepstrum parameters. Step 7: subtract the estimated channel noise mean in the cepstral domain so that the noisy-speech features are as close to zero as possible. Step 8: extract dynamic features. Step 9: complete identity authentication by similarity calculation. The invention effectively improves the accuracy of identity authentication.

1. An identity authentication method applied to voiceprint recognition in the field of electric power, characterized by comprising the following steps:

Step 1: collecting voice content to form a voice signal, and preprocessing the voice signal;

Step 2: performing a fast Fourier transform on the preprocessed voice signal to obtain the energy distribution of the voice signal over the frequency spectrum;

Step 3: taking the squared modulus of the voice signal spectrum obtained in step 2 to obtain the power spectrum of the voice signal;

Step 4: defining a filter bank, and filtering the voice signal by passing the power spectrum through the filter bank;

Step 5: determining the logarithmic energy output by each filter of the filter bank;

Step 6: applying a discrete cosine transform to the logarithmic energy obtained in step 5 to obtain the Mel cepstrum parameters;

Step 7: subtracting the estimated channel noise mean in the cepstral domain so that the noisy-speech features are as close to zero as possible, thereby eliminating the adverse effect of the channel;

Step 8: extracting dynamic features;

Step 9: completing identity authentication by similarity calculation.

2. The identity authentication method applied to voiceprint recognition in the power field according to claim 1, wherein: the preprocessing in the step 1 includes end point detection processing, sampling quantization processing, framing processing, windowing processing and pre-emphasis processing.

3. The identity authentication method applied to voiceprint recognition in the power field according to claim 1, wherein: the logarithmic energy S(m) in step 5 is calculated by a formula.

4. The identity authentication method applied to voiceprint recognition in the power field according to claim 1, wherein: the Mel cepstrum parameter C(n) in step 6 is calculated by a formula in which L has a value of 16.

5. The identity authentication method applied to voiceprint recognition in the power field according to claim 1, wherein: in step 8, the dynamic features and the static features together form feature parameters of the speech signal.

6. The identity authentication method applied to voiceprint recognition in the power field according to claim 1, wherein: in step 9, the similarity between the verification voiceprint code feature vector and the registration voiceprint code feature vector needs to be determined.

Technical Field

The invention relates to the field of power dispatching based on artificial intelligence, in particular to an identity authentication method applied to voiceprint recognition in the field of power.

Background

As an electric power service platform, the electric power company provides a safe, economical, clean and sustainable power supply and service for economic and social development. Power dispatching requires not only well-equipped facilities but also the application of intelligent technologies. In practice, background noise is a real challenge for speech recognition applications: even when a speaker is in a quiet office environment, some noise is hard to avoid during a telephone voice call. A voice recognition system therefore needs efficient noise elimination to meet the requirements of users in a variety of environments.

In view of this, and to overcome the shortcomings of the prior art, there is an urgent need in the art for an identity authentication method applied to voiceprint recognition in the power field.

Disclosure of Invention

The invention aims to overcome the defects of the prior art by providing an identity authentication method applied to voiceprint recognition in the field of electric power, which effectively improves the accuracy of identity authentication and improves the user experience.

To solve the above technical problems, the invention provides an identity authentication method applied to voiceprint recognition in the field of electric power, characterized by comprising the following steps:

Step 1: collecting voice content to form a voice signal, and preprocessing the voice signal;

Step 2: performing a fast Fourier transform on the preprocessed voice signal to obtain the energy distribution of the voice signal over the frequency spectrum;

Step 3: taking the squared modulus of the voice signal spectrum obtained in step 2 to obtain the power spectrum of the voice signal;

Step 4: defining a filter bank, and filtering the voice signal by passing the power spectrum through the filter bank;

Step 5: determining the logarithmic energy output by each filter of the filter bank;

Step 6: applying a discrete cosine transform to the logarithmic energy obtained in step 5 to obtain the Mel cepstrum parameters;

Step 7: subtracting the estimated channel noise mean in the cepstral domain so that the noisy-speech features are as close to zero as possible, thereby eliminating the adverse effect of the channel;

Step 8: extracting dynamic features;

Step 9: completing identity authentication by similarity calculation.

According to the above technical scheme, the preprocessing in step 1 comprises endpoint detection, sampling and quantization, framing, windowing and pre-emphasis.

According to the above technical scheme, the logarithmic energy S(m) in step 5 is calculated by a formula.

According to the above technical scheme, the Mel cepstrum parameter C(n) in step 6 is calculated by a formula in which L has a value of 16.

According to the above technical scheme, in step 8 the dynamic features and the static features together form the feature parameters of the voice signal.

According to the above technical scheme, in step 9 the similarity between the verification voiceprint code feature vector and the registration voiceprint code feature vector is determined.

Compared with the prior art, the identity authentication method applied to voiceprint recognition in the power field can effectively improve the accuracy of identity authentication and improve the use experience.

Drawings

FIG. 1 is a schematic overall flow chart of an embodiment of the present invention;

FIG. 2 is a block diagram of the detailed data preprocessing flow according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Referring to FIG. 1 and FIG. 2, the present invention discloses an identity authentication method applied to voiceprint recognition in the power field, characterized by comprising the following steps:

Step 1: Collect voice content to form a voice signal and preprocess it. Preprocessing includes endpoint detection, sampling and quantization, framing, windowing and pre-emphasis.

Endpoint detection first removes the silent segments. Sampling and quantization then ensure the fidelity of the voice file; the sampling frequency and the number of channels determine the size of the voice file. Because authentication is based on the voice itself, a high sampling frequency (44100 Hz) is used, and single-channel (mono) voice data is collected.

Framing groups N sampling points into one observation unit, called a frame. Let S(n) be the framed signal, where n = 0, 1, ..., N-1.
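The framing step can be sketched as below. This is a minimal NumPy illustration, not the patent's implementation: the source gives neither a frame length nor an overlap, so the function name `frame_signal` and the defaults `frame_len=1024` and `hop=512` are assumptions chosen only for the example.

```python
import numpy as np

def frame_signal(signal, frame_len=1024, hop=512):
    """Split a 1-D signal into overlapping frames of frame_len samples.

    frame_len and hop are illustrative values, not taken from the source.
    Trailing samples that do not fill a whole frame are dropped.
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        return np.empty((0, frame_len))
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Row i holds samples [i*hop, i*hop + frame_len).
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]

# Ten samples, frames of 4 samples, hop of 2 samples.
frames = frame_signal(np.arange(10.0), frame_len=4, hop=2)
```

Each row of `frames` is one observation unit S(n), n = 0, ..., N-1, ready for windowing.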

The windowing step uses a Hamming window W(n) with parameter a = 0.55, defined for 0 ≤ n ≤ N-1. The windowed signal S'(n) is:

S'(n) = S(n) × W(n)
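A minimal sketch of the windowing step follows. The window formula itself is not reproduced in this text, so the code assumes the common generalized Hamming form w(n) = a - (1 - a)·cos(2πn/(N-1)); with a = 0.55 this is close to the standard Hamming window (a = 0.54). The function names `hamming_window` and `apply_window` are hypothetical.

```python
import numpy as np

def hamming_window(N, a=0.55):
    """Generalized Hamming window of length N (N >= 2).

    ASSUMED form: w(n) = a - (1 - a) * cos(2*pi*n / (N - 1)), 0 <= n <= N-1.
    The source states only a = 0.55; the formula itself is an assumption.
    """
    n = np.arange(N)
    return a - (1.0 - a) * np.cos(2.0 * np.pi * n / (N - 1))

def apply_window(frames, a=0.55):
    """Compute S'(n) = S(n) * W(n) for every frame (one frame per row)."""
    frames = np.asarray(frames, dtype=float)
    return frames * hamming_window(frames.shape[-1], a)

w = hamming_window(5)
windowed = apply_window(np.ones((2, 5)))
```

Under this assumed form the window tapers each frame toward its edges, which reduces spectral leakage in the subsequent Fourier transform.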

Pre-emphasis is implemented with a first-order digital pre-emphasis filter. Let the pre-emphasized signal be S₁(n):

S₁(n) = S'(n) - μS'(n-1)

where μ is the pre-emphasis coefficient, μ = 0.93. Pre-emphasis boosts the high-frequency part of the signal and so flattens its spectrum.
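The pre-emphasis filter S₁(n) = S'(n) - μS'(n-1) can be sketched directly in NumPy. The function name `pre_emphasis` is hypothetical, and the treatment of the first sample (taking S'(-1) = 0) is an assumption, since the source does not specify a boundary condition.

```python
import numpy as np

def pre_emphasis(x, mu=0.93):
    """First-order pre-emphasis: y(n) = x(n) - mu * x(n-1).

    mu = 0.93 is the coefficient given in the text. The first sample is
    passed through unchanged, i.e. x(-1) is ASSUMED to be 0.
    Expects a non-empty 1-D signal.
    """
    x = np.asarray(x, dtype=float)
    y = np.empty_like(x)
    y[0] = x[0]
    y[1:] = x[1:] - mu * x[:-1]
    return y

# A constant (zero-frequency) signal is strongly attenuated after the
# first sample, which is the high-pass behaviour described above.
y = pre_emphasis([1.0, 1.0, 1.0, 1.0])
```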

Step 2: Perform a fast Fourier transform on the preprocessed voice signal S₁(n). The characteristics of a signal are hard to analyze from its variation in the time domain, so the signal is converted to the frequency domain to observe its energy distribution; different energy distributions indicate different voice characteristics.

A fast Fourier transform is applied to each frame of the audio signal S₁(n), giving the spectrum X(k), where k = 1, 2, ...

Step 3: Take the squared modulus of the voice signal spectrum obtained in step 2 to obtain the power spectrum |X(k)|² of the voice signal, where k = 1, 2, ...
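Steps 2 and 3 together amount to a per-frame FFT followed by a squared modulus. The sketch below uses `numpy.fft.rfft`, which returns only the non-negative frequency bins k = 0, ..., N/2 of a real signal; that zero-based indexing and the function name `power_spectrum` are assumptions of this example rather than the patent's notation.

```python
import numpy as np

def power_spectrum(frames, n_fft=None):
    """Return |X(k)|^2 for each frame (one frame per row).

    Uses the real-input FFT, so only bins k = 0 .. n_fft//2 are returned.
    """
    frames = np.asarray(frames, dtype=float)
    spectrum = np.fft.rfft(frames, n=n_fft, axis=-1)  # X(k), step 2
    return np.abs(spectrum) ** 2                      # |X(k)|^2, step 3

# A cosine with exactly 4 cycles in a 32-sample frame puts all of its
# energy into frequency bin 4.
n = np.arange(32)
frame = np.cos(2 * np.pi * 4 * n / 32)
P = power_spectrum(frame[None, :])
```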

Step 4: Define a filter bank of M triangular filters with center frequencies f(m), where m = 1, 2, ..., M. The power spectrum obtained in step 3 is passed through the filter bank to filter the voice signal, where the frequency response of the m-th triangular filter is H_m(k).

This yields M sets of parameters H_m, where m = 1, 2, ..., M.
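A triangular Mel filter bank of this kind is commonly built as sketched below. The source does not give M, the FFT size, or the Hz-to-Mel mapping, so `n_filters=26`, `n_fft=512` and the widely used mapping mel = 2595·log10(1 + f/700) are assumptions; the function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filter_bank(n_filters=26, n_fft=512, sample_rate=44100):
    """Return H with shape (n_filters, n_fft//2 + 1); row m-1 holds H_m(k).

    n_filters and n_fft are ASSUMED values; only the 44100 Hz sampling
    rate comes from the text.
    """
    n_bins = n_fft // 2 + 1
    # n_filters + 2 points equally spaced on the Mel scale give the left
    # edge, center f(m) and right edge of every triangle.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    H = np.zeros((n_filters, n_bins))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):       # rising edge
            H[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):      # falling edge, peak of 1 at center
            H[m - 1, k] = (right - k) / (right - center)
    return H

H = mel_filter_bank()
```

Each row of `H` can then be applied to a power spectrum of matching length (here 257 bins).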

Step 5: Calculate the logarithmic energy S(m) output by each filter of the filter bank.
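The formula for S(m) is not reproduced in this text. A common definition, assumed here, is S(m) = ln(Σ_k |X(k)|²·H_m(k)), which the sketch below implements; the function name `log_filterbank_energy` and the small floor `eps` (added to avoid taking the logarithm of zero) are assumptions of the example.

```python
import numpy as np

def log_filterbank_energy(power_spec, H, eps=1e-10):
    """ASSUMED form: S(m) = ln( sum_k |X(k)|^2 * H_m(k) ).

    power_spec: (n_frames, n_bins) power spectra.
    H:          (n_filters, n_bins) filter responses H_m(k).
    Returns an (n_frames, n_filters) array of log energies.
    """
    energies = np.asarray(power_spec, dtype=float) @ np.asarray(H, dtype=float).T
    return np.log(np.maximum(energies, eps))

# Toy example: one frame with 4 bins and two overlapping filters.
P = np.array([[1.0, 2.0, 3.0, 4.0]])
H = np.array([[1.0, 0.5, 0.0, 0.0],
              [0.0, 0.5, 1.0, 0.5]])
S = log_filterbank_energy(P, H)   # energies are 2 and 6 before the log
```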

Step 6: Apply a discrete cosine transform to the logarithmic energy S(m) from step 5 to obtain the L-order Mel cepstrum parameters C(n), where L has a value of 16.
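The DCT formula is likewise missing from this text. The sketch below assumes the form usually used for Mel cepstra, C(n) = Σ_{m=1..M} S(m)·cos(πn(m - 0.5)/M) for n = 1, ..., L, with L = 16 as stated; the function name `mfcc_from_log_energy` and the choice M = 26 in the example are assumptions.

```python
import numpy as np

def mfcc_from_log_energy(S, L=16):
    """ASSUMED form: C(n) = sum_{m=1..M} S(m) * cos(pi*n*(m-0.5)/M), n = 1..L.

    S: (n_frames, M) log filter-bank energies.
    Returns an (n_frames, L) array of Mel cepstrum parameters.
    """
    S = np.asarray(S, dtype=float)
    M = S.shape[-1]
    n = np.arange(1, L + 1)[:, None]            # cepstral index 1..L
    m = np.arange(1, M + 1)[None, :]            # filter index 1..M
    basis = np.cos(np.pi * n * (m - 0.5) / M)   # shape (L, M)
    return S @ basis.T

# A flat log spectrum carries no cepstral detail, so C(1)..C(16) vanish.
C_flat = mfcc_from_log_energy(np.ones((3, 26)))

# A log spectrum equal to the first cosine basis excites only C(1).
M = 26
first_basis = np.cos(np.pi * (np.arange(1, M + 1) - 0.5) / M)
C_one = mfcc_from_log_energy(first_basis[None, :])
```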

Step 7: Apply cepstral mean subtraction. The estimated channel noise mean is subtracted in the cepstral domain so that the noisy-speech features are as close to zero as possible, eliminating the adverse effect of channel noise.
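Cepstral mean subtraction can be sketched as below. The source does not say how the channel mean is estimated; the example assumes the usual per-utterance estimate, the average of each cepstral coefficient over all frames. The function name `cepstral_mean_subtraction` is hypothetical.

```python
import numpy as np

def cepstral_mean_subtraction(C):
    """Subtract the per-coefficient mean over frames from C.

    C: (n_frames, n_coeffs) cepstral features. The mean over frames is
    ASSUMED to be the estimate of the stationary channel component.
    """
    C = np.asarray(C, dtype=float)
    return C - C.mean(axis=0, keepdims=True)

# A fixed channel adds a constant offset to every frame in the cepstral
# domain; after mean subtraction that offset no longer matters.
rng = np.random.default_rng(0)
clean = rng.normal(size=(50, 16))
channel_offset = np.linspace(-1.0, 1.0, 16)
normalized_clean = cepstral_mean_subtraction(clean)
normalized_noisy = cepstral_mean_subtraction(clean + channel_offset)
```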

Step 8: Extract dynamic features. The standard cepstral parameters reflect only the static characteristics of the speech parameters; the dynamic characteristics of speech can be described by the difference spectrum of these static features. Experiments show that combining dynamic and static features effectively improves the recognition performance of the system. The difference parameters are calculated by a formula in which d_t denotes the t-th first-order difference, c_t denotes the t-th cepstral coefficient, Q denotes the order of the cepstral coefficients, and K denotes the time difference of the first derivative, here K = 2. Substituting the result into the same formula again yields the second-order difference parameters.
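Because the difference formula is not reproduced here, the sketch below uses one widely used regression form as a stand-in: d_t = Σ_{k=1..K} k·(c_{t+k} - c_{t-k}) / (2·Σ_{k=1..K} k²) with K = 2, and edge frames repeated at the boundaries. Both the normalization and the boundary handling are assumptions and may differ from the patent's formula; the function name `delta` is hypothetical.

```python
import numpy as np

def delta(features, K=2):
    """First-order difference (delta) features over time.

    ASSUMED form: d_t = sum_{k=1..K} k*(c_{t+k} - c_{t-k}) / (2*sum k^2).
    features: (n_frames, n_coeffs). Edge frames are repeated as padding.
    """
    features = np.asarray(features, dtype=float)
    T = features.shape[0]
    padded = np.pad(features, ((K, K), (0, 0)), mode="edge")
    denom = 2.0 * sum(k * k for k in range(1, K + 1))
    d = np.zeros_like(features)
    for k in range(1, K + 1):
        d += k * (padded[K + k:K + k + T] - padded[K - k:K - k + T])
    return d / denom

# A coefficient that rises by 1 per frame has slope 1 away from the edges.
c = np.arange(10.0)[:, None]
d1 = delta(c)                        # first-order difference
d2 = delta(d1)                       # same formula applied again
combined = np.hstack([c, d1, d2])    # static + dynamic feature parameters
```

Stacking the static coefficients with both difference orders gives the combined feature parameters the text describes.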

Step 9: Complete identity authentication by similarity calculation. In the authentication stage, the similarity between the verification voiceprint code feature vector and the registration voiceprint code feature vector is calculated. Let the verification and registration voiceprint codes be denoted T and R, with lengths N and M respectively. Then

T = t₁, t₂, ..., t_N

R = r₁, r₂, ..., r_M

The similarity between the two voiceprints is compared by calculating the distance D(T, R) between them; the smaller the distance, the higher the similarity. Let d(T(n), R(m)) denote the distance between the feature vectors of two frames, where n and m are arbitrarily chosen frame indices in T and R respectively. The overall distance between the two sequences is the accumulated sum of the frame distances:

D(n, m) = d(T(n), R(m)) + min{D(n-1, m), D(n-1, m-1), D(n, m-1)}

The accumulated distance D measures the difference between the verification and registration voiceprint code feature vectors: the greater the distance, the lower the similarity.
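The recurrence above is the dynamic time warping (DTW) accumulation, and can be sketched as follows. The Euclidean frame distance, the boundary condition D(0, 0) = 0, and the threshold-based decision in `authenticate` are assumptions of this example; the source specifies neither the frame distance d nor an acceptance threshold.

```python
import numpy as np

def dtw_distance(T, R):
    """Accumulated distance D(N, M) between feature sequences T and R.

    T: (N, n_features), R: (M, n_features).
    Implements D(n,m) = d(T(n),R(m)) + min{D(n-1,m), D(n-1,m-1), D(n,m-1)}.
    """
    T = np.asarray(T, dtype=float)
    R = np.asarray(R, dtype=float)
    N, M = len(T), len(R)
    D = np.full((N + 1, M + 1), np.inf)
    D[0, 0] = 0.0                                  # ASSUMED boundary condition
    for n in range(1, N + 1):
        for m in range(1, M + 1):
            d = np.linalg.norm(T[n - 1] - R[m - 1])  # ASSUMED Euclidean distance
            D[n, m] = d + min(D[n - 1, m], D[n - 1, m - 1], D[n, m - 1])
    return D[N, M]

def authenticate(verify, enrolled, threshold):
    """Accept when the accumulated distance is within a hypothetical threshold."""
    return dtw_distance(verify, enrolled) <= threshold

enrolled = np.array([[0.0], [1.0], [2.0]])
stretched = np.array([[0.0], [1.0], [1.0], [2.0]])   # same pattern, slower
different = np.array([[0.0], [3.0]])
```

Because the recurrence allows a frame to be matched more than once, `stretched` aligns with `enrolled` at zero cost even though the two sequences differ in length.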

It should be noted that, in this document, terms such as "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
