Identity recognition method and device and computer readable storage medium
Reading note: this technology, "Identity recognition method and device and computer readable storage medium" (身份识别方法、装置及计算机可读存储介质), was designed by 冯惠华 on 2019-10-10. Its main content is as follows. The invention relates to artificial intelligence technology and discloses an identity recognition method comprising: collecting a voiceprint sample set and establishing a voiceprint library; preprocessing the voiceprint sample set to obtain a text-related voiceprint vector sequence set and a text-independent voiceprint vector sequence set; compressing the text-related voiceprint vector sequence set to obtain a voiceprint codebook set, receiving the voiceprint voice content of a user in a text-related voiceprint recognition scene, and recognizing the identity of the user according to the Euclidean distance between the voiceprint voice content and the voiceprint codebook set; and extracting Mel frequency cepstral coefficients from the text-independent voiceprint vector sequence set, receiving the voiceprint voice content of a user in a text-independent voiceprint recognition scene, and detecting the identity of the user according to the Mel frequency cepstral coefficients. The invention also provides an identity recognition device and a computer readable storage medium. The invention achieves accurate identity recognition.
1. An identity recognition method, the method comprising:
collecting a voiceprint sample set, and establishing a voiceprint library, wherein the voiceprint library comprises a text-related voiceprint set and a text-unrelated voiceprint set;
carrying out preprocessing operation on the voiceprints in the voiceprint library to obtain a voiceprint vector sequence set, wherein the voiceprint vector sequence set comprises a text-related voiceprint vector sequence set and a text-unrelated voiceprint vector sequence set;
compressing the text-related voiceprint vector sequence set to obtain a voiceprint codebook set;
based on receiving voiceprint voice content of a user in a text-related voiceprint recognition scene, calculating Euclidean distance between the voiceprint voice content and the voiceprint codebook set, and recognizing identity information of the user according to the Euclidean distance;
extracting a Mel frequency cepstral coefficient set from the text-independent voiceprint vector sequence set;
receiving voiceprint voice content of a user in a text-independent voiceprint recognition scene, extracting a Mel frequency cepstrum coefficient of the user according to the voiceprint voice content of the user, and recognizing identity information of the user according to the Mel frequency cepstrum coefficient set.
2. The identity recognition method of claim 1, wherein the pre-processing the voiceprints in the voiceprint library to obtain a voiceprint vector sequence set comprises:
pre-emphasis is carried out on the voiceprints in the voiceprint library through a digital filter, and a high-frequency voiceprint set is obtained;
performing framing processing on the high-frequency voiceprint set according to a preset voiceprint frame length to obtain a framed high-frequency voiceprint set;
windowing the framed high-frequency voiceprint set with a Hamming window to obtain a framed high-frequency voiceprint component sequence set, and denoising the voiceprint component sequence set by double-threshold endpoint detection to obtain the voiceprint vector sequence set.
3. The identification method of claim 1 wherein the method of calculating the euclidean distance between the voiceprint speech content and the voiceprint codebook set comprises:
$$d(X, Y) = \sqrt{\sum_{i} (x_i - y_i)^2}$$
wherein $X$ represents the user's voiceprint speech content, $Y$ represents a voiceprint codebook in the voiceprint codebook set, $x_i$ represents the $i$-th voiceprint content of the user, and $y_i$ represents the $i$-th voiceprint codebook in the voiceprint codebook set.
4. The identification method according to any of claims 1 to 3, wherein said extracting Mel frequency cepstral coefficients for the set of text-independent voiceprint vector sequences comprises:
carrying out Fourier transform on the text-independent voiceprint vector sequence set to obtain a frequency spectrum of the text-independent voiceprint vector sequence set, and calculating a power spectrum of the frequency spectrum;
and filtering the power spectrum by using a triangular filter, and performing power conversion on the filtered power spectrum to obtain the Mel frequency cepstrum coefficient.
5. The method of claim 4, wherein the step of performing power conversion on the filtered power spectrum to obtain the mel-frequency cepstral coefficients comprises:
wherein $C_i(k)$ represents the Mel frequency cepstral coefficient, $L$ represents the order of the MFCC, $P_i(k)$ represents the power spectrum, $m$ is the sequence index in the Mel frequency cepstrum, and $M$ is the number of triangular filters.
6. An identification device comprising a memory and a processor, the memory having stored thereon an identification program executable on the processor, the identification program when executed by the processor implementing the steps of:
collecting a voiceprint sample set, and establishing a voiceprint library, wherein the voiceprint library comprises a text-related voiceprint set and a text-unrelated voiceprint set;
carrying out preprocessing operation on the voiceprints in the voiceprint library to obtain a voiceprint vector sequence set, wherein the voiceprint vector sequence set comprises a text-related voiceprint vector sequence set and a text-unrelated voiceprint vector sequence set;
compressing the text-related voiceprint vector sequence set to obtain a voiceprint codebook set;
based on receiving voiceprint voice content of a user in a text-related voiceprint recognition scene, calculating Euclidean distance between the voiceprint voice content and the voiceprint codebook set, and recognizing identity information of the user according to the Euclidean distance;
extracting a Mel frequency cepstral coefficient set from the text-independent voiceprint vector sequence set;
receiving voiceprint voice content of a user in a text-independent voiceprint recognition scene, extracting a Mel frequency cepstrum coefficient of the user according to the voiceprint voice content of the user, and recognizing identity information of the user according to the Mel frequency cepstrum coefficient set.
7. The identification apparatus according to claim 6, wherein the pre-processing the voiceprint in the voiceprint library to obtain a voiceprint vector sequence set comprises:
pre-emphasis is carried out on the voiceprints in the voiceprint library through a digital filter, and a high-frequency voiceprint set is obtained;
performing framing processing on the high-frequency voiceprint set according to a preset voiceprint frame length to obtain a framed high-frequency voiceprint set;
windowing the framed high-frequency voiceprint set with a Hamming window to obtain a framed high-frequency voiceprint component sequence set, and denoising the voiceprint component sequence set by double-threshold endpoint detection to obtain the voiceprint vector sequence set.
8. The identification apparatus of claim 6 wherein the method of calculating the euclidean distance of the voiceprint voice content from the voiceprint codebook set comprises:
$$d(X, Y) = \sqrt{\sum_{i} (x_i - y_i)^2}$$
wherein $X$ represents the user's voiceprint speech content, $Y$ represents a voiceprint codebook in the voiceprint codebook set, $x_i$ represents the $i$-th voiceprint content of the user, and $y_i$ represents the $i$-th voiceprint codebook in the voiceprint codebook set.
9. The identification apparatus according to any of claims 6 to 8 wherein said extracting mel-frequency cepstral coefficients for the set of text-independent voiceprint vector sequences comprises:
carrying out Fourier transform on the text-independent voiceprint vector sequence set to obtain a frequency spectrum of the text-independent voiceprint vector sequence set, and calculating a power spectrum of the frequency spectrum;
and filtering the power spectrum by using a triangular filter, and performing power conversion on the filtered power spectrum to obtain the Mel frequency cepstrum coefficient.
10. A computer-readable storage medium, having stored thereon an identification program executable by one or more processors to perform the steps of the identification method of any one of claims 1 to 5.
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to an identity recognition method and device based on user behavior cooperation and a computer readable storage medium.
Background
Voiceprint recognition, also called speaker recognition, is a form of biometric recognition: a technology that automatically identifies a speaker from voiceprint parameters in the speech waveform that reflect the speaker's physiological and behavioral characteristics. Each person has a unique voiceprint, formed by that person's vocal organs as they develop, and it remains unique no matter how closely another person imitates the speech. As colleges and universities continue to expand enrollment, the growing number of students puts pressure on school management. In class attendance, check-in relies on the teacher calling the roll on site, which is inconvenient for the teacher and for attendance management and also leaves room for fraudulent sign-ins. In book borrowing, the traditional approach is to register in the system with a student card, which requires students to apply for a card and requires a librarian to assist with registration. In access control, students traditionally swipe a student card, and they often forget to bring the card or find it inconvenient to take it out of a bag.
Disclosure of Invention
The invention provides an identity recognition method, an identity recognition device and a computer readable storage medium, and mainly aims to present an accurate identity recognition result to a user when the user performs identity recognition.
In order to achieve the above object, the present invention provides an identity recognition method, comprising:
collecting a voiceprint sample set, and establishing a voiceprint library, wherein the voiceprint library comprises a text-related voiceprint set and a text-unrelated voiceprint set;
carrying out preprocessing operation on the voiceprints in the voiceprint library to obtain a voiceprint vector sequence set, wherein the voiceprint vector sequence set comprises a text-related voiceprint vector sequence set and a text-unrelated voiceprint vector sequence set;
compressing the text-related voiceprint vector sequence set to obtain a voiceprint codebook set;
based on receiving voiceprint voice content of a user in a text-related voiceprint recognition scene, calculating Euclidean distance between the voiceprint voice content and the voiceprint codebook set, and recognizing identity information of the user according to the Euclidean distance;
extracting a Mel frequency cepstral coefficient set from the text-independent voiceprint vector sequence set;
receiving voiceprint voice content of a user in a text-independent voiceprint recognition scene, extracting a Mel frequency cepstrum coefficient of the user according to the voiceprint voice content of the user, and recognizing identity information of the user according to the Mel frequency cepstrum coefficient set.
Optionally, the preprocessing the voiceprint in the voiceprint library to obtain a voiceprint vector sequence set includes:
pre-emphasis is carried out on the voiceprints in the voiceprint library through a digital filter, and a high-frequency voiceprint set is obtained;
performing framing processing on the high-frequency voiceprint set according to a preset voiceprint frame length to obtain a framed high-frequency voiceprint set;
windowing the framed high-frequency voiceprint set with a Hamming window to obtain a framed high-frequency voiceprint component sequence set, and denoising the voiceprint component sequence set by double-threshold endpoint detection to obtain the voiceprint vector sequence set.
Optionally, the method for calculating the euclidean distance between the voiceprint speech content and the voiceprint codebook set includes:
$$d(X, Y) = \sqrt{\sum_{i} (x_i - y_i)^2}$$
wherein $X$ represents the user's voiceprint speech content, $Y$ represents a voiceprint codebook in the voiceprint codebook set, $x_i$ represents the $i$-th voiceprint content of the user, and $y_i$ represents the $i$-th voiceprint codebook in the voiceprint codebook set.
Optionally, the extracting mel-frequency cepstral coefficients for the set of text-independent voiceprint vector sequences comprises:
carrying out Fourier transform on the text-independent voiceprint vector sequence set to obtain a frequency spectrum of the text-independent voiceprint vector sequence set, and calculating a power spectrum of the frequency spectrum;
and filtering the power spectrum by using a triangular filter, and performing power conversion on the filtered power spectrum to obtain the Mel frequency cepstrum coefficient.
Optionally, the method for obtaining the mel-frequency cepstrum coefficient after performing power conversion on the filtered power spectrum includes:
wherein $C_i(k)$ represents the Mel frequency cepstral coefficient, $L$ represents the order of the MFCC, $P_i(k)$ represents the power spectrum, $m$ is the sequence index in the Mel frequency cepstrum, and $M$ is the number of triangular filters.
In addition, in order to achieve the above object, the present invention further provides an identification apparatus, which includes a memory and a processor, wherein the memory stores an identification program operable on the processor, and the identification program, when executed by the processor, implements the following steps:
collecting a voiceprint sample set, and establishing a voiceprint library, wherein the voiceprint library comprises a text-related voiceprint set and a text-unrelated voiceprint set;
carrying out preprocessing operation on the voiceprints in the voiceprint library to obtain a voiceprint vector sequence set, wherein the voiceprint vector sequence set comprises a text-related voiceprint vector sequence set and a text-unrelated voiceprint vector sequence set;
compressing the text-related voiceprint vector sequence set to obtain a voiceprint codebook set;
based on receiving voiceprint voice content of a user in a text-related voiceprint recognition scene, calculating Euclidean distance between the voiceprint voice content and the voiceprint codebook set, and recognizing identity information of the user according to the Euclidean distance;
extracting a Mel frequency cepstral coefficient set from the text-independent voiceprint vector sequence set;
receiving voiceprint voice content of a user in a text-independent voiceprint recognition scene, extracting a Mel frequency cepstrum coefficient of the user according to the voiceprint voice content of the user, and recognizing identity information of the user according to the Mel frequency cepstrum coefficient set.
Optionally, the preprocessing the voiceprint in the voiceprint library to obtain a voiceprint vector sequence set includes:
pre-emphasis is carried out on the voiceprints in the voiceprint library through a digital filter, and a high-frequency voiceprint set is obtained;
performing framing processing on the high-frequency voiceprint set according to a preset voiceprint frame length to obtain a framed high-frequency voiceprint set;
windowing the framed high-frequency voiceprint set with a Hamming window to obtain a framed high-frequency voiceprint component sequence set, and denoising the voiceprint component sequence set by double-threshold endpoint detection to obtain the voiceprint vector sequence set.
Optionally, the method for calculating the euclidean distance between the voiceprint speech content and the voiceprint codebook set includes:
$$d(X, Y) = \sqrt{\sum_{i} (x_i - y_i)^2}$$
wherein $X$ represents the user's voiceprint speech content, $Y$ represents a voiceprint codebook in the voiceprint codebook set, $x_i$ represents the $i$-th voiceprint content of the user, and $y_i$ represents the $i$-th voiceprint codebook in the voiceprint codebook set.
Optionally, the extracting mel-frequency cepstral coefficients for the set of text-independent voiceprint vector sequences comprises:
carrying out Fourier transform on the text-independent voiceprint vector sequence set to obtain a frequency spectrum of the text-independent voiceprint vector sequence set, and calculating a power spectrum of the frequency spectrum;
and filtering the power spectrum by using a triangular filter, and performing power conversion on the filtered power spectrum to obtain the Mel frequency cepstrum coefficient.
In addition, to achieve the above object, the present invention also provides a computer readable storage medium, which stores an identification program, wherein the identification program can be executed by one or more processors to implement the steps of the identification method as described above.
According to the identity recognition method, the identity recognition device and the computer readable storage medium, when a user performs identity recognition, a voiceprint library is established for a collected voiceprint sample set, and the identities of the user in a text-related voiceprint recognition scene and a text-unrelated voiceprint recognition scene are obtained by combining Euclidean distance and Mel frequency cepstrum coefficient, so that an accurate identity recognition result can be presented to the user.
Drawings
Fig. 1 is a schematic flow chart of an identity recognition method according to an embodiment of the present invention;
fig. 2 is a schematic diagram of an internal structure of an identification apparatus according to an embodiment of the present invention;
fig. 3 is a schematic block diagram of an identity recognition program in an identity recognition device according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention provides an identity recognition method. Fig. 1 is a schematic flow chart of an identity recognition method according to an embodiment of the present invention. The method may be performed by an apparatus, which may be implemented by software and/or hardware.
In this embodiment, the identity recognition method includes:
S1, collecting a voiceprint sample set and establishing a voiceprint library, wherein the voiceprint library comprises a text-related voiceprint set and a text-unrelated voiceprint set.
In a preferred embodiment of the present invention, the voiceprint sample set can be a voiceprint set of a student at a school. Preferably, the voiceprint library is obtained by recording voiceprints of all students at school, and the recorded voiceprints of all students at school are stored in wav format.
Further, the preferred embodiment of the present invention divides the voiceprint library into a text-related voiceprint set and a text-independent voiceprint set. The text-related voiceprint set is obtained by having the students read preset text content aloud and recording it, and is used for identity recognition in a text-related voiceprint recognition scene. The text-related voiceprint recognition scene can be a student identity recognition scene for entering and exiting a campus dormitory, in which case the preset text content can be "XXX of a certain dormitory"; when a student enters or exits the campus dormitory, the student speaks the previously recorded text content and, once recognized, is allowed through. The text-independent voiceprint set is obtained by recording arbitrary speech from the students, who do not need to read any preset text content.
S2, preprocessing the voiceprints in the voiceprint library to obtain a voiceprint vector sequence set, wherein the voiceprint vector sequence set comprises a text-related voiceprint vector sequence set and a text-unrelated voiceprint vector sequence set.
In the preferred embodiment of the present invention, the preprocessing operations include pre-emphasis, framing, windowing, and denoising. The preprocessing operation is implemented as follows: performing pre-emphasis on the voiceprints in the voiceprint library through a digital filter to obtain a high-frequency voiceprint set; framing the high-frequency voiceprint set according to a preset voiceprint frame length to obtain a framed high-frequency voiceprint set; windowing the framed high-frequency voiceprint set with a Hamming window to obtain a framed high-frequency voiceprint component sequence set; and denoising the voiceprint component sequence set by double-threshold endpoint detection to obtain the voiceprint vector sequence set.
Preferably, the digital filter of the present invention is $H(z) = 1 - \mu z^{-1}$, where $z$ represents the voiceprint and $\mu$ ranges from 0.9 to 1.0. Preferably, in the preferred embodiment of the present invention, $\mu$ is 0.97. The pre-emphasis is used to enhance the high-frequency part of the voiceprint.
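As an illustration of the pre-emphasis step, the following is a minimal sketch in Python, assuming the filter $H(z) = 1 - \mu z^{-1}$ is applied in the time domain as y[n] = x[n] - μ·x[n-1]. The function name `pre_emphasis` and the list-of-samples representation are assumptions made for this example and are not taken from the source.

```python
def pre_emphasis(signal, mu=0.97):
    """Apply the first-order filter H(z) = 1 - mu * z^-1 to a list of samples."""
    if not signal:
        return []
    # The first sample has no predecessor, so it is passed through unchanged.
    out = [signal[0]]
    for n in range(1, len(signal)):
        # Subtracting a scaled copy of the previous sample attenuates slowly
        # varying (low-frequency) content and so emphasizes high frequencies.
        out.append(signal[n] - mu * signal[n - 1])
    return out


samples = [1.0, 1.0, 1.0, 2.0]
emphasized = pre_emphasis(samples)
```

In this sketch a constant stretch of samples is reduced to a small residual, while the jump at the last sample is largely preserved.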
For the preset voiceprint frame length, the frame shift may be 0 to 0.5 times the frame length.
The windowing comprises multiplying each framed high-frequency voiceprint in the framed high-frequency voiceprint set by the window function of a Hamming window to form the framed high-frequency voiceprint component sequence set. Preferably, in the present invention, the window function of the Hamming window is as follows:
$$\omega(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right), \quad 0 \le n \le N-1$$
where $\omega(n)$ represents the value of the window function at the $n$-th sample of the Hamming window and $N$ represents the window length.
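The framing and windowing steps can be sketched as follows. This is a hedged example, not the patented implementation: it assumes the standard Hamming coefficients 0.54 and 0.46, and the names `hamming`, `frame_and_window`, `frame_len` and `frame_shift` are hypothetical. Trailing samples that do not fill a whole frame are dropped, which is a simplification chosen for this sketch.

```python
import math


def hamming(N):
    """Standard Hamming window: w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1)), n = 0..N-1."""
    if N == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]


def frame_and_window(signal, frame_len, frame_shift):
    """Split a signal into overlapping frames and multiply each by a Hamming window.

    frame_shift must be a positive number of samples (for example, half of
    frame_len); samples after the last complete frame are ignored.
    """
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, frame_shift):
        frame = signal[start:start + frame_len]
        frames.append([s * w for s, w in zip(frame, window)])
    return frames


# Ten samples, frames of 4 samples, shift of 2 samples (0.5 times the frame length).
frames = frame_and_window([1.0] * 10, frame_len=4, frame_shift=2)
```

The window tapers each frame toward its edges, which reduces the spectral leakage that abrupt frame boundaries would otherwise introduce in the later Fourier transform.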
Further, in a preferred embodiment of the present invention, the voiceprint library is divided into a text-dependent voiceprint set and a text-independent voiceprint set, and the voiceprint vector sequence set is correspondingly divided into a text-dependent voiceprint vector sequence set and a text-independent voiceprint vector sequence set.
S3, compressing the text-related voiceprint vector sequence set to obtain a voiceprint codebook set.
The compression process in the preferred embodiment of the present invention comprises: converting the text-related voiceprint vector sequence set into a vector set, and mapping the vector set through vector quantization to obtain a plurality of discrete vectors that form the voiceprint codebook set. In detail, vector quantization maps each vector $x$ of the space to one of $L$ discrete vectors $y_i$ ($1 \le i \le L$), where each $y_i$ is called a code vector and the set of code vectors is called the codebook.
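The source does not state which algorithm builds the codebook. As one possible illustration, the sketch below uses plain k-means (Lloyd's algorithm), a common way to train a vector quantizer; this choice, the deterministic initialization, and the names `train_codebook` and `squared_distance` are all assumptions made for the example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def train_codebook(vectors, L, iterations=20):
    """Quantize `vectors` into L code vectors with plain k-means.

    Assumes len(vectors) >= L. This is an illustrative stand-in for the
    vector quantization step, not the method fixed by the source.
    """
    # Deterministic initialization: take L vectors spread evenly through the input.
    step = len(vectors) // L
    codebook = [list(vectors[i * step]) for i in range(L)]
    for _ in range(iterations):
        # Assignment step: group every vector under its nearest code vector.
        clusters = [[] for _ in range(L)]
        for v in vectors:
            nearest = min(range(L), key=lambda i: squared_distance(v, codebook[i]))
            clusters[nearest].append(v)
        # Update step: move each code vector to the centroid of its cluster.
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous code vector
                dim = len(members[0])
                codebook[i] = [sum(m[d] for m in members) / len(members)
                               for d in range(dim)]
    return codebook


vectors = [[0, 0], [0, 1], [10, 10], [10, 11]]
codebook = train_codebook(vectors, L=2)
```

Here four input vectors are compressed to two code vectors, one per cluster centroid, which is the sense in which the codebook is a compressed representation of the voiceprint vectors.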
S4, based on the voiceprint voice content of the user received in the text-related voiceprint recognition scene, calculating the Euclidean distance between the voiceprint voice content and the voiceprint codebook, and recognizing the identity information of the user according to the Euclidean distance.
As described above, the text-related voiceprint recognition scene may be a student identity recognition scene for entering and exiting a campus dormitory. Preferably, in the present invention, the above preprocessing operation is performed on the received voiceprint voice content of the user to obtain a voiceprint vector sequence, and the Euclidean distance between that voiceprint vector sequence and the voiceprint codebook set is calculated with the Euclidean distance formula. When the Euclidean distance is greater than or equal to a preset threshold, authentication of the user's identity information fails and the user cannot enter or exit the campus dormitory. Preferably, in the present invention, the preset threshold is 0.2, and the Euclidean distance calculation formula is:
$$d(X, Y) = \sqrt{\sum_{i} (x_i - y_i)^2}$$
wherein $X$ represents the user's voiceprint speech content, $Y$ represents a voiceprint codebook in the voiceprint codebook set, $x_i$ represents the $i$-th voiceprint voice content of the user, and $y_i$ represents the $i$-th voiceprint codebook in the voiceprint codebook set.
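The threshold decision in S4 can be sketched as below. The Euclidean distance and the threshold 0.2 come from the text; how per-frame distances are combined into one score per enrolled user is not specified there, so the averaging over nearest code vectors in `codebook_distance`, the dictionary layout of `codebooks`, and all function names are assumptions for illustration only.

```python
import math


def euclidean_distance(x, y):
    """d(X, Y) = sqrt(sum_i (x_i - y_i)^2) for two equal-length vectors."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def codebook_distance(frames, codebook):
    """Assumed aggregation: mean distance from each frame to its nearest code vector."""
    return sum(min(euclidean_distance(f, c) for c in codebook)
               for f in frames) / len(frames)


def identify(frames, codebooks, threshold=0.2):
    """Return the enrolled user with the closest codebook, or None on failure.

    codebooks maps a (hypothetical) user id to that user's list of code vectors.
    """
    best_user, best_dist = None, float("inf")
    for user_id, codebook in codebooks.items():
        dist = codebook_distance(frames, codebook)
        if dist < best_dist:
            best_user, best_dist = user_id, dist
    # Authentication fails when the distance is greater than or equal to the threshold.
    return best_user if best_dist < threshold else None


codebooks = {"alice": [[0.0, 0.0], [1.0, 1.0]], "bob": [[5.0, 5.0]]}
accepted = identify([[0.05, 0.0], [1.0, 1.1]], codebooks)
rejected = identify([[3.0, 3.0]], codebooks)
```

In this sketch the first utterance lies close to one enrolled codebook and is accepted, while the second is far from every codebook and is rejected.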
S5, extracting a Mel frequency cepstral coefficient set from the text-independent voiceprint vector sequence set.
In a preferred embodiment of the present invention, Mel-frequency cepstral coefficients (MFCC) are features that describe the shape of the vocal tract with which a speaker produces speech.
Preferably, in the present invention, the MFCC extraction steps include: performing a Fourier transform on the text-independent voiceprint vector sequence set to obtain its frequency spectrum, and calculating the power spectrum of the frequency spectrum; filtering the power spectrum with a bank of triangular (Mel) filters; and performing power conversion on the filtered power spectrum through a discrete cosine transform to obtain the MFCC. The Mel filter bank is a set of filters distributed non-linearly in frequency; for example, applying a bank of 128 filters to a frame converts an 883-dimensional vector into a 128-dimensional vector.
Wherein the Fourier transform comprises:
$$X_i(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi k n / N}, \quad 0 \le k \le N-1$$
wherein $X_i(k)$ represents the frequency spectrum of the text-independent voiceprint vector sequence set, $x(n)$ represents the input voiceprint vector sequence, $N$ represents the number of points of the Fourier transform, $j$ is the imaginary unit, and $e$ is the base of the natural logarithm (an irrational number).
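The method of calculating the power spectrum of the frequency spectrum comprises:
$$P_i(k) = \frac{1}{N}\left|X_i(k)\right|^2$$
wherein $P_i(k)$ represents the power spectrum, $X_i(k)$ represents the frequency spectrum, and $N$ represents the number of points of the Fourier transform.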
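The Fourier transform and power spectrum steps can be illustrated with a direct discrete Fourier transform. This sketch assumes the common periodogram normalization of dividing the squared magnitude by N; the function names `dft` and `power_spectrum` are hypothetical. A direct O(N²) sum is used for clarity, whereas a practical system would normally use a fast Fourier transform.

```python
import cmath


def dft(frame):
    """Direct discrete Fourier transform: X(k) = sum_n x(n) * exp(-j*2*pi*k*n/N)."""
    N = len(frame)
    return [sum(frame[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]


def power_spectrum(frame):
    """Assumed normalization: P(k) = |X(k)|^2 / N."""
    N = len(frame)
    return [abs(X) ** 2 / N for X in dft(frame)]


# A constant frame has all of its power in the zero-frequency bin.
constant = power_spectrum([1.0, 1.0, 1.0, 1.0])
# A single impulse spreads its power evenly over every bin.
impulse = power_spectrum([1.0, 0.0, 0.0, 0.0])
```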
The discrete cosine transform comprises:
wherein $C_i(k)$ represents the MFCC; $L$ represents the order of the MFCC and ranges from 12 to 16, and preferably the invention takes 14; $P_i(k)$ represents the power spectrum; $m$ represents the sequence index in the Mel frequency cepstrum; and $M$ represents the number of triangular filters.
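For orientation, the filtering and discrete cosine transform steps can be sketched as follows. This follows a commonly used MFCC recipe and is not confirmed by the source: the Mel scale formula 2595·log10(1 + f/700), the logarithm taken before the transform, the DCT-II form, the one-sided power spectrum layout, and every function and parameter name here are assumptions made for the example.

```python
import math


def hz_to_mel(f):
    """A widely used Hz-to-Mel conversion (assumed here)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(num_filters, num_bins, sample_rate):
    """Triangular filters spaced evenly on the Mel scale.

    Bin b of the one-sided power spectrum is taken to lie at frequency
    b * (sample_rate / 2) / (num_bins - 1).
    """
    top = hz_to_mel(sample_rate / 2.0)
    # num_filters + 2 edge points: each filter spans three consecutive points.
    edges = [mel_to_hz(top * i / (num_filters + 1)) for i in range(num_filters + 2)]
    bin_hz = [b * (sample_rate / 2.0) / (num_bins - 1) for b in range(num_bins)]
    bank = []
    for j in range(1, num_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising edge
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc_from_power(power, bank, order=14):
    """Log filter-bank energies followed by a DCT-II; returns `order` coefficients."""
    M = len(bank)
    # A small floor avoids log(0) when a filter collects no energy.
    energies = [math.log(max(sum(w * p for w, p in zip(row, power)), 1e-10))
                for row in bank]
    return [sum(energies[m] * math.cos(math.pi * k * (m + 0.5) / M) for m in range(M))
            for k in range(1, order + 1)]


bank = mel_filterbank(num_filters=20, num_bins=129, sample_rate=8000)
coeffs = mfcc_from_power([1.0] * 129, bank, order=14)
```

The order of 14 matches the preferred value given in the text; the filter count, bin count and sample rate are arbitrary values chosen to keep the example small.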
S6, based on the voiceprint voice content of the user received in the text-independent voiceprint recognition scene, extracting the Mel frequency cepstrum coefficient of the user according to the voiceprint voice content of the user, and recognizing the identity information of the user according to the Mel frequency cepstrum coefficient set.
In a preferred embodiment of the present invention, the text-independent recognition scenes include class check-in scenes, library borrowing scenes and the like. In the class check-in scene, a check-in two-dimensional code is preset at each seat in the classroom. A student logs in through a campus applet or a mobile phone APP, scans the code, and is redirected to voiceprint recording. The student's personal information is thereby tied to a seat position (the corresponding person at the corresponding position), which can be used to compute statistics such as the classroom occupancy rate and the student absence rate. In the library borrowing scene, a borrowing two-dimensional code is placed on every book. Likewise, the student logs in through the applet or mobile phone APP, scans the code, and proceeds to voiceprint recording and recognition. The borrower of the book is determined from the matching result, the book is recorded as borrowed under that student's name, and the exit alarm is disabled for it, so the student can carry the borrowed book straight out of the library.
Preferably, in the present invention, the user is denoted user S. The MFCC of user S is obtained by the MFCC extraction described in S5, the posterior probability of the MFCC of user S is calculated through a Gaussian mixture model, and the enrolled user with the highest posterior probability is taken as the target user. Here the posterior probability is the probability that a given message was sent, as assessed by the receiving end after it has received the message.
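The selection rule in the preceding paragraph can be sketched as follows. The sketch assumes one Gaussian mixture model per enrolled user with diagonal covariances and equal prior probabilities for all users, in which case the highest posterior probability coincides with the highest likelihood. These assumptions, the tuple layout `(weight, mean, variance)`, and the function names are illustrative and are not taken from the source.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def gmm_log_likelihood(frames, gmm):
    """Sum of per-frame log-likelihoods under a GMM.

    gmm is a list of (weight, mean, variance) components; frames is the
    sequence of MFCC vectors extracted from the utterance.
    """
    total = 0.0
    for x in frames:
        logs = [math.log(w) + log_gaussian_diag(x, mean, var) for w, mean, var in gmm]
        peak = max(logs)  # log-sum-exp keeps the computation numerically stable
        total += peak + math.log(sum(math.exp(l - peak) for l in logs))
    return total


def most_likely_speaker(frames, speaker_models):
    """Return the enrolled user whose model scores the utterance highest."""
    return max(speaker_models,
               key=lambda s: gmm_log_likelihood(frames, speaker_models[s]))


models = {
    "alice": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "bob": [(1.0, [5.0, 5.0], [1.0, 1.0])],
}
target = most_likely_speaker([[0.1, 0.2], [0.9, 1.1]], models)
```

Working in the log domain avoids numerical underflow when the per-frame likelihoods of a long MFCC sequence are multiplied together.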
Further, the Gaussian mixture model comprises:
wherein the quantity computed by the model represents the posterior probability of the user, $T$ represents the sequence length of the MFCC, $M$ represents the number of components of the Gaussian mixture model, and $\omega_k$ is the mixture weight of the Gaussian mixture model, with values in the range 0 to 1. Preferably, in the present invention, $M$ is 16, $\omega_k$ is 0.7, and $\mu_k$ is 0.98.
The invention also provides an identity recognition device. Fig. 2 is a schematic diagram of an internal structure of an identification apparatus according to an embodiment of the present invention.
In the present embodiment, the
The
The
The
Optionally, the
While fig. 2 only shows the
In the embodiment of the
Step one: collecting a voiceprint sample set and establishing a voiceprint library, wherein the voiceprint library comprises a text-related voiceprint set and a text-unrelated voiceprint set.
In a preferred embodiment of the present invention, the voiceprint sample set can be a voiceprint set of a student at a school. Preferably, the voiceprint library is obtained by recording voiceprints of all students at school, and the recorded voiceprints of all students at school are stored in wav format.
Further, the preferred embodiment of the present invention divides the voiceprint library into a text-related voiceprint set and a text-independent voiceprint set. The text-related voiceprint set is obtained by having the students read preset text content aloud and recording it, and is used for identity recognition in a text-related voiceprint recognition scene. The text-related voiceprint recognition scene can be a student identity recognition scene for entering and exiting a campus dormitory, in which case the preset text content can be "XXX of a certain dormitory"; when a student enters or exits the campus dormitory, the student speaks the previously recorded text content and, once recognized, is allowed through. The text-independent voiceprint set is obtained by recording arbitrary speech from the students, who do not need to read any preset text content.
Step two: preprocessing the voiceprints in the voiceprint library to obtain a voiceprint vector sequence set, wherein the voiceprint vector sequence set comprises a text-related voiceprint vector sequence set and a text-unrelated voiceprint vector sequence set.
In the preferred embodiment of the present invention, the preprocessing operations include pre-emphasis, framing, windowing, and denoising. The preprocessing operation is implemented as follows: performing pre-emphasis on the voiceprints in the voiceprint library through a digital filter to obtain a high-frequency voiceprint set; framing the high-frequency voiceprint set according to a preset voiceprint frame length to obtain a framed high-frequency voiceprint set; windowing the framed high-frequency voiceprint set with a Hamming window to obtain a framed high-frequency voiceprint component sequence set; and denoising the voiceprint component sequence set by double-threshold endpoint detection to obtain the voiceprint vector sequence set.
Preferably, the digital filter of the present invention is:
$$H(z) = 1 - \mu z^{-1}$$
where z denotes the voiceprint (in the z-domain) and μ takes a value in the range 0.9 to 1.0. Preferably, in the preferred embodiment of the present invention, μ is 0.97. The pre-emphasis is used to enhance the high-frequency part of the voiceprint.
When framing according to the preset voiceprint frame length, the frame shift may be 0 to 0.5 times the frame length.
The windowing comprises multiplying each framed high-frequency voiceprint in the framed high-frequency voiceprint set by the window function of a Hamming window to form the framed high-frequency voiceprint component sequence set. Preferably, in the present invention, the window function of the Hamming window is as follows:
$$\omega(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right), \quad 0 \le n \le N-1$$
where ω(n) represents the n-th value of the window function of the Hamming window and N represents the window length.
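The pre-emphasis, framing, and windowing described above can be sketched as follows. This is a minimal illustration rather than the implementation of the invention: the function names, the list-of-floats signal representation, the choice to drop trailing samples that do not fill a frame, and the `frame_shift` parameter are assumptions made for the example. The Hamming window uses the conventional coefficients 0.54 and 0.46.

```python
import math


def pre_emphasis(signal, mu=0.97):
    """Apply H(z) = 1 - mu * z^-1, i.e. y[n] = x[n] - mu * x[n-1]."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - mu * signal[n - 1]
                          for n in range(1, len(signal))]


def frame_signal(signal, frame_len, frame_shift):
    """Split a signal into frames of frame_len samples, advancing by frame_shift.

    Trailing samples that do not fill a whole frame are dropped (an assumption).
    """
    frames = []
    start = 0
    while start + frame_len <= len(signal):
        frames.append(signal[start:start + frame_len])
        start += frame_shift
    return frames


def hamming(window_len):
    """Conventional Hamming window: 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    if window_len == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (window_len - 1))
            for n in range(window_len)]


def window_frames(frames):
    """Multiply every frame sample-by-sample with a Hamming window."""
    if not frames:
        return []
    window = hamming(len(frames[0]))
    return [[s * w for s, w in zip(frame, window)] for frame in frames]
```

For example, `window_frames(frame_signal(pre_emphasis(samples), 256, 128))` would yield windowed frames of 256 samples with 50% overlap, where 256 and 128 are illustrative values only.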
Further, in a preferred embodiment of the present invention, the voiceprint library is divided into a text-dependent voiceprint set and a text-independent voiceprint set, and the voiceprint vector sequence set is correspondingly divided into a text-dependent voiceprint vector sequence set and a text-independent voiceprint vector sequence set.
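The double-threshold endpoint denoising named in the preprocessing operation is not specified further in the text. A simplified sketch is given below under stated assumptions: it uses short-time energy only (the classic method often also uses the zero-crossing rate, which is omitted here), and the function names and the two thresholds `high` and `low` are hypothetical. A frame whose energy reaches the high threshold seeds a speech segment, which is then extended in both directions while neighbouring frames stay at or above the low threshold.

```python
def short_time_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(s * s for s in frame)


def double_threshold_endpoints(frames, high, low):
    """Return inclusive (start, end) frame index pairs of detected speech segments.

    Simplified energy-only variant; thresholds are assumed to satisfy low <= high.
    """
    energies = [short_time_energy(f) for f in frames]
    count = len(energies)
    segments = []
    i = 0
    while i < count:
        if energies[i] >= high:
            # Extend backwards while frames stay above the low threshold.
            start = i
            while start > 0 and energies[start - 1] >= low:
                start -= 1
            # Extend forwards while frames stay above the low threshold.
            end = i
            while end + 1 < count and energies[end + 1] >= low:
                end += 1
            segments.append((start, end))
            i = end + 1
        else:
            i += 1
    return segments
```

Frames outside the returned segments would be treated as silence or noise and discarded before the voiceprint vector sequences are formed.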
Step three, compressing the text-related voiceprint vector sequence set to obtain a voiceprint codebook set.
The compression process in the preferred embodiment of the present invention comprises: converting the text-related voiceprint vector sequence set into a vector set, and mapping the vector set through vector quantization to obtain a plurality of discrete vectors that form the voiceprint codebook set. In detail, for each vector x of the vector space, vector quantization maps x onto one of L discrete vectors $y_i$ ($1 \le i \le L$), where each $y_i$ is called a code vector and the set of code vectors is called a codebook.
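The text does not state how the code vectors are trained. As a hedged illustration, the sketch below builds a codebook of L code vectors with plain k-means clustering, which is one common way to train a vector quantizer; the use of k-means, the function names, the fixed iteration count, and the seeded random initialisation are all assumptions for the example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook, x):
    """Index of the code vector closest to x (the quantization of x)."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], x))


def train_codebook(vectors, num_codes, iterations=20, seed=0):
    """Train a codebook of num_codes code vectors with k-means (an assumed method)."""
    rng = random.Random(seed)
    # Initialise the code vectors with distinct training vectors.
    codebook = [list(v) for v in rng.sample(vectors, num_codes)]
    for _ in range(iterations):
        clusters = [[] for _ in range(num_codes)]
        for v in vectors:
            clusters[nearest(codebook, v)].append(v)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous code vector
                dim = len(members[0])
                codebook[i] = [sum(m[d] for m in members) / len(members)
                               for d in range(dim)]
    return codebook
```

Each enrolled user would get one codebook trained on that user's text-related voiceprint vectors, so that the codebook set holds one compressed model per user.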
Step four, receiving the voiceprint voice content of the user in a text-related voiceprint recognition scene, calculating the Euclidean distance between the voiceprint voice content and the voiceprint codebook set, and recognizing the identity information of the user according to the Euclidean distance.
As described above, the text-related voiceprint recognition scenario may be a student identity recognition scenario for entering and exiting a campus dormitory. Preferably, in the present invention, the above preprocessing operation is performed on the received voiceprint voice content of the user to obtain a voiceprint vector sequence of that content, and the Euclidean distance between this voiceprint vector sequence and the voiceprint codebook set is calculated using the Euclidean distance formula. When the Euclidean distance is less than a preset threshold, the identity information of the user is authenticated; when the Euclidean distance is greater than or equal to the preset threshold, authentication of the identity information of the user fails and the user cannot enter or exit the campus dormitory. Preferably, in the present invention, the preset threshold is 0.2, and the Euclidean distance formula is:
$$d(X, Y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
where X represents the user's voiceprint voice content, Y represents a voiceprint codebook in the voiceprint codebook set, $x_i$ represents the i-th voiceprint voice content of the user, $y_i$ represents the i-th voiceprint codebook in the voiceprint codebook set, and n is the number of components compared.
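The matching in step four can be sketched as below. The text gives only the Euclidean distance and the threshold of 0.2; how the distance between a whole vector sequence and a codebook is aggregated is not stated, so this example assumes the average distance from each input vector to its nearest code vector. The function names and the dictionary of per-user codebooks are hypothetical.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def average_distortion(vectors, codebook):
    """Mean distance from each input vector to its nearest code vector (assumed aggregation)."""
    return sum(min(euclidean(v, c) for c in codebook) for v in vectors) / len(vectors)


def identify(vectors, codebooks, threshold=0.2):
    """Return (user_id, distance) for the best match, or (None, distance) on failure.

    codebooks maps a user id to that user's codebook (a list of code vectors).
    """
    best_user, best_dist = min(
        ((user, average_distortion(vectors, cb)) for user, cb in codebooks.items()),
        key=lambda item: item[1],
    )
    if best_dist < threshold:
        return best_user, best_dist
    return None, best_dist  # distance >= threshold: authentication fails
```

A result of `None` corresponds to the case in which the Euclidean distance is greater than or equal to the preset threshold and the user is refused entry.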
Step five, extracting a Mel-frequency cepstral coefficient set from the text-independent voiceprint vector sequence set.
In a preferred embodiment of the present invention, the Mel-frequency cepstral coefficients (MFCC) are features that describe the shape of the vocal tract with which a speaker produces speech.
Preferably, in the present invention, the MFCC extraction comprises: performing a Fourier transform on the text-independent voiceprint vector sequence set to obtain its frequency spectrum, and calculating the power spectrum of the frequency spectrum; filtering the power spectrum with a bank of triangular (Mel) filters; and transforming the filtered power spectrum through a discrete cosine transform to obtain the MFCC. The Mel filter bank is a set of non-linearly distributed filters; for example, applying a bank of 128 filters to a frame can convert an 883-dimensional vector into a 128-dimensional vector.
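The Fourier transform comprises:
$$X_i(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi k n / N}, \quad 0 \le k \le N-1$$
where $X_i(k)$ represents the frequency spectrum of the text-independent voiceprint vector sequence set, x(n) represents the input voiceprint vector sequence, N represents the number of points of the Fourier transform, j is the imaginary unit, and e is the base of the natural logarithm, an irrational number (a non-terminating, non-repeating decimal).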
The method of calculating the power spectrum of the frequency spectrum comprises:
$$P_i(k) = \frac{1}{N}\left|X_i(k)\right|^2$$
where $P_i(k)$ represents the power spectrum.
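The discrete cosine transform comprises:
$$C_i(k) = \sum_{m=1}^{M} \log\big(P_i(m)\big)\cos\left(\frac{\pi k\,(m - 0.5)}{M}\right), \quad k = 1, 2, \ldots, L$$
where $C_i(k)$ represents the MFCC, L represents the order of the MFCC, which ranges from 12 to 16 and is preferably 14 in the present invention, $P_i(m)$ represents the power spectrum after the m-th triangular filter, m represents the index within the Mel-frequency cepstrum sequence, and M represents the number of triangular filters.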
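The MFCC extraction for a single frame can be sketched as follows. This is an illustrative pipeline under stated assumptions, not the implementation of the invention: it uses a naive O(N²) discrete Fourier transform for clarity, normalises the power spectrum by 1/N, places the triangular filters with the commonly used Mel mapping 2595·log10(1 + f/700), and takes the logarithm of the filter outputs before the discrete cosine transform, as is conventional for MFCC. The function names, the default of 26 filters, and the small floor value that guards the logarithm are hypothetical choices; the default of 14 coefficients follows the preferred MFCC order in the text.

```python
import cmath
import math


def power_spectrum(frame):
    """P(k) = |X(k)|^2 / N for k = 0 .. N//2, using a naive DFT."""
    n_fft = len(frame)
    spectrum = []
    for k in range(n_fft // 2 + 1):
        x_k = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                  for n in range(n_fft))
        spectrum.append(abs(x_k) ** 2 / n_fft)
    return spectrum


def hz_to_mel(freq):
    return 2595.0 * math.log10(1.0 + freq / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10 ** (mel / 2595.0) - 1.0)


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters whose centres are equally spaced on the Mel scale."""
    max_mel = hz_to_mel(sample_rate / 2)
    mel_points = [max_mel * i / (num_filters + 1) for i in range(num_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate))
            for m in mel_points]
    bank = []
    for m in range(1, num_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):      # rising edge
            filt[k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge, peak of 1 at center
            filt[k] = (right - k) / (right - center)
        bank.append(filt)
    return bank


def mfcc(frame, sample_rate, num_filters=26, num_coeffs=14):
    """MFCC of one (already windowed) frame: DFT, power, Mel filters, log, DCT."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    energies = [sum(w * p for w, p in zip(filt, spec)) for filt in bank]
    log_e = [math.log(max(e, 1e-10)) for e in energies]  # floor avoids log(0)
    # With m counted from 0, (m + 0.5) matches (m - 0.5) for m counted from 1.
    return [sum(log_e[m] * math.cos(math.pi * k * (m + 0.5) / num_filters)
                for m in range(num_filters))
            for k in range(1, num_coeffs + 1)]
```

In practice a fast Fourier transform from a numerical library would replace the naive DFT; the naive version is used here only to keep the sketch self-contained.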
Step six, receiving the voiceprint voice content of the user in a text-independent voiceprint recognition scene, extracting the Mel-frequency cepstral coefficients of the user from the voiceprint voice content of the user, and recognizing the identity information of the user according to the Mel-frequency cepstral coefficient set.
In a preferred embodiment of the present invention, the text-independent recognition scenarios include classroom check-in scenes, library borrowing scenes, and the like. In the classroom check-in scene, a check-in two-dimensional code is placed at each seat in the classroom. A student logs in through a campus applet or mobile phone APP and scans the code, which jumps to voiceprint entry. The student's personal information is then associated with the seat, forming a two-dimensional view in which each position corresponds to a person; this view can be used to compute statistics such as the classroom occupancy rate and the student absence rate. In the library borrowing scene, a borrowing two-dimensional code is placed on every book. Likewise, a student logs in through the applet or mobile phone APP and scans the code to enter voiceprint entry and recognition. The borrower of the book is determined from the matching result, the book is recorded as borrowed under that student's name, and the exit alarm is disabled for the book, so that the student can carry the borrowed book directly out of the library.
Preferably, in the present invention, the user is denoted as user S. The MFCC of user S is obtained by the MFCC extraction described in step five, the posterior probability of the MFCC of user S is calculated through a Gaussian mixture model, and the user corresponding to the highest posterior probability is taken as the target user. Here, the posterior probability refers to the probability, as known at the receiving end after a message has been received, that the message was sent.
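Further, the Gaussian mixture model comprises:
$$p(X \mid \lambda) = \prod_{t=1}^{T} \sum_{k=1}^{M} \omega_k \, \mathcal{N}\big(x_t;\ \mu_k, \Sigma_k\big)$$
where $p(X \mid \lambda)$ represents the posterior probability of the user, T represents the sequence length of the MFCC, $x_t$ represents the t-th MFCC vector, M represents the number of components of the Gaussian mixture model, $\omega_k$ represents the mixing weight of the k-th component of the Gaussian mixture model and takes a value in the range 0 to 1, and $\mu_k$ and $\Sigma_k$ represent the mean and covariance of the k-th component. Preferably, in the present invention, M is 16, $\omega_k$ is 0.7, and $\mu_k$ is 0.98.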
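Scoring a user's MFCC sequence against per-user Gaussian mixture models can be sketched as below. The sketch assumes diagonal covariances, already-trained model parameters, and equal prior probabilities for all enrolled users, in which case the user with the highest likelihood is also the user with the highest posterior probability. The function names and the `models` dictionary layout are hypothetical, and the computation is done in the log domain for numerical stability.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance at vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def gmm_log_likelihood(frames, weights, means, variances):
    """Sum over frames of log( sum_k w_k * N(x_t; mu_k, var_k) )."""
    total = 0.0
    for x in frames:
        comps = [math.log(w) + log_gaussian_diag(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        peak = max(comps)
        # log-sum-exp keeps the sum stable when densities are very small
        total += peak + math.log(sum(math.exp(c - peak) for c in comps))
    return total


def identify_speaker(frames, models):
    """Return (best_user, scores); models maps user -> (weights, means, variances)."""
    scores = {user: gmm_log_likelihood(frames, *params)
              for user, params in models.items()}
    return max(scores, key=scores.get), scores
```

Here `frames` would be the MFCC vectors extracted from the received voiceprint voice content, and the returned user corresponds to the target user with the highest score.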
Alternatively, in other embodiments, the identification program may be divided into one or more modules, and the one or more modules are stored in the
For example, referring to fig. 3, a schematic diagram of program modules of an identification program in an embodiment of the identification apparatus of the present invention is shown, in this embodiment, the identification program may be divided into a
The functions or operation steps of the program modules such as the
Furthermore, an embodiment of the present invention further provides a computer-readable storage medium, where an identity recognition program is stored on the computer-readable storage medium, where the identity recognition program is executable by one or more processors to implement the following operations:
collecting a voiceprint sample set, and establishing a voiceprint library, wherein the voiceprint library comprises a text-related voiceprint set and a text-unrelated voiceprint set;
carrying out preprocessing operation on the voiceprints in the voiceprint library to obtain a voiceprint vector sequence set, wherein the voiceprint vector sequence set comprises a text-related voiceprint vector sequence set and a text-unrelated voiceprint vector sequence set;
compressing the text-related voiceprint vector sequence set to obtain a voiceprint codebook set;
based on receiving voiceprint voice content of a user in a text-related voiceprint recognition scene, calculating Euclidean distance between the voiceprint voice content and the voiceprint codebook set, and recognizing identity information of the user according to the Euclidean distance;
extracting a Mel-frequency cepstral coefficient set from the text-independent voiceprint vector sequence set;
receiving voiceprint voice content of a user in a text-independent voiceprint recognition scene, extracting a Mel frequency cepstrum coefficient of the user according to the voiceprint voice content of the user, and recognizing identity information of the user according to the Mel frequency cepstrum coefficient set.
The embodiment of the computer readable storage medium of the present invention is substantially the same as the embodiments of the identity recognition apparatus and method, and will not be described herein again.
It should be noted that the above numbering of the embodiments of the present invention is merely for description and does not represent the merits of the embodiments. The terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, apparatus, article, or method that includes the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.