Identity recognition method and device and computer readable storage medium

Document number: 1393432    Publication date: 2020-02-28

Reading note: The invention 身份识别方法、装置及计算机可读存储介质 (Identity recognition method and device and computer readable storage medium) was created by 冯惠华 on 2019-10-10. Abstract: The invention relates to artificial intelligence technology and discloses an identity recognition method, comprising: collecting a voiceprint sample set and establishing a voiceprint library; preprocessing the voiceprint sample set to obtain a text-dependent voiceprint vector sequence set and a text-independent voiceprint vector sequence set; compressing the text-dependent voiceprint vector sequence set to obtain a voiceprint codebook set, receiving a user's voiceprint speech content in a text-dependent voiceprint recognition scenario, and recognizing the user's identity according to the Euclidean distance between the voiceprint speech content and the voiceprint codebook set; and extracting Mel-frequency cepstral coefficients from the text-independent voiceprint vector sequence set, receiving a user's voiceprint speech content in a text-independent voiceprint recognition scenario, and recognizing the user's identity according to the Mel-frequency cepstral coefficients. The invention also provides an identity recognition device and a computer-readable storage medium. The invention achieves accurate identity recognition.

1. An identity recognition method, the method comprising:

collecting a voiceprint sample set, and establishing a voiceprint library, wherein the voiceprint library comprises a text-related voiceprint set and a text-unrelated voiceprint set;

carrying out preprocessing operation on the voiceprints in the voiceprint library to obtain a voiceprint vector sequence set, wherein the voiceprint vector sequence set comprises a text-related voiceprint vector sequence set and a text-unrelated voiceprint vector sequence set;

compressing the text-related voiceprint vector sequence set to obtain a voiceprint codebook set;

based on receiving voiceprint voice content of a user in a text-related voiceprint recognition scene, calculating Euclidean distance between the voiceprint voice content and the voiceprint codebook set, and recognizing identity information of the user according to the Euclidean distance;

extracting a Mel-frequency cepstral coefficient set from the text-independent voiceprint vector sequence set;

receiving voiceprint voice content of a user in a text-independent voiceprint recognition scene, extracting a Mel frequency cepstrum coefficient of the user according to the voiceprint voice content of the user, and recognizing identity information of the user according to the Mel frequency cepstrum coefficient set.

2. The identity recognition method of claim 1, wherein the pre-processing the voiceprints in the voiceprint library to obtain a voiceprint vector sequence set comprises:

pre-emphasis is carried out on the voiceprints in the voiceprint library through a digital filter, and a high-frequency voiceprint set is obtained;

performing framing processing on the high-frequency voiceprint set according to a preset voiceprint frame length to obtain a framed high-frequency voiceprint set;

windowing the framed high-frequency voiceprint set by using a Hamming window to obtain a framed high-frequency voiceprint component sequence set, and denoising the voiceprint component sequence set by using double-threshold endpoint detection to obtain the voiceprint vector sequence set.

3. The identity recognition method of claim 1, wherein the method of calculating the Euclidean distance between the voiceprint speech content and the voiceprint codebook set comprises:

$$d(X,Y)=\sqrt{\sum_{i=1}^{n}(x_i-y_i)^2}$$

wherein X represents the user's voiceprint speech content, Y represents a voiceprint codebook in the voiceprint codebook set, $x_i$ represents the i-th voiceprint content of the user, and $y_i$ represents the i-th voiceprint codebook in the voiceprint codebook set.

4. The identity recognition method according to any one of claims 1 to 3, wherein said extracting Mel-frequency cepstral coefficients from the text-independent voiceprint vector sequence set comprises:

carrying out Fourier transform on the text-independent voiceprint vector sequence set to obtain a frequency spectrum of the text-independent voiceprint vector sequence set, and calculating a power spectrum of the frequency spectrum;

and filtering the power spectrum by using a triangular filter, and performing power conversion on the filtered power spectrum to obtain the Mel frequency cepstrum coefficient.

5. The method of claim 4, wherein the step of performing power conversion on the filtered power spectrum to obtain the mel-frequency cepstral coefficients comprises:

$$C_i(k)=\sum_{m=1}^{M}\log\big(P_i(m)\big)\cos\!\left(\frac{k\pi(m-0.5)}{M}\right),\quad k=1,2,\dots,L$$

wherein $C_i(k)$ represents the Mel-frequency cepstral coefficients, L represents the order of the MFCC, $P_i(m)$ represents the filtered power spectrum, m indexes the Mel-frequency cepstrum sequence, and M is the number of triangular filters.

6. An identification device comprising a memory and a processor, the memory having stored thereon an identification program executable on the processor, the identification program when executed by the processor implementing the steps of:

collecting a voiceprint sample set, and establishing a voiceprint library, wherein the voiceprint library comprises a text-related voiceprint set and a text-unrelated voiceprint set;

carrying out preprocessing operation on the voiceprints in the voiceprint library to obtain a voiceprint vector sequence set, wherein the voiceprint vector sequence set comprises a text-related voiceprint vector sequence set and a text-unrelated voiceprint vector sequence set;

compressing the text-related voiceprint vector sequence set to obtain a voiceprint codebook set;

based on receiving voiceprint voice content of a user in a text-related voiceprint recognition scene, calculating Euclidean distance between the voiceprint voice content and the voiceprint codebook set, and recognizing identity information of the user according to the Euclidean distance;

extracting a Mel-frequency cepstral coefficient set from the text-independent voiceprint vector sequence set;

receiving voiceprint voice content of a user in a text-independent voiceprint recognition scene, extracting a Mel frequency cepstrum coefficient of the user according to the voiceprint voice content of the user, and recognizing identity information of the user according to the Mel frequency cepstrum coefficient set.

7. The identification apparatus according to claim 6, wherein the pre-processing the voiceprint in the voiceprint library to obtain a voiceprint vector sequence set comprises:

pre-emphasis is carried out on the voiceprints in the voiceprint library through a digital filter, and a high-frequency voiceprint set is obtained;

performing framing processing on the high-frequency voiceprint set according to a preset voiceprint frame length to obtain a framed high-frequency voiceprint set;

windowing the framed high-frequency voiceprint set by using a Hamming window to obtain a framed high-frequency voiceprint component sequence set, and denoising the voiceprint component sequence set by using double-threshold endpoint detection to obtain the voiceprint vector sequence set.

8. The identification apparatus of claim 6, wherein the method of calculating the Euclidean distance between the voiceprint speech content and the voiceprint codebook set comprises:

$$d(X,Y)=\sqrt{\sum_{i=1}^{n}(x_i-y_i)^2}$$

wherein X represents the user's voiceprint speech content, Y represents a voiceprint codebook in the voiceprint codebook set, $x_i$ represents the i-th voiceprint content of the user, and $y_i$ represents the i-th voiceprint codebook in the voiceprint codebook set.

9. The identification apparatus according to any one of claims 6 to 8, wherein said extracting Mel-frequency cepstral coefficients from the text-independent voiceprint vector sequence set comprises:

carrying out Fourier transform on the text-independent voiceprint vector sequence set to obtain a frequency spectrum of the text-independent voiceprint vector sequence set, and calculating a power spectrum of the frequency spectrum;

and filtering the power spectrum by using a triangular filter, and performing power conversion on the filtered power spectrum to obtain the Mel frequency cepstrum coefficient.

10. A computer-readable storage medium, having stored thereon an identification program executable by one or more processors to perform the steps of the identification method of any one of claims 1 to 5.

Technical Field

The invention relates to the technical field of artificial intelligence, in particular to an identity recognition method and device based on user behavior cooperation and a computer readable storage medium.

Background

Voiceprint recognition, also called speaker recognition, is a form of biometric identification: a technology that automatically recognizes a speaker's identity from voiceprint parameters in the speech waveform that reflect the speaker's physiological and behavioral characteristics. Every person has a unique voiceprint, formed by the development of the vocal organs as the person grows, no matter how closely another person imitates their speech. At present, as colleges and universities continue to expand enrollment, the growing number of students puts pressure on school administration. In the classroom attendance scenario, attendance depends on the teacher calling the roll on site, which is inconvenient both for the teacher and for attendance management, and manual attendance can also be faked. In the book borrowing scenario, the original approach is to register in the system with a student card, which requires the card to be issued and an administrator to assist with registration. In the access control scenario, students originally swipe their student cards, and they often forget to bring the card or cannot find it in their bag.

Disclosure of Invention

The invention provides an identity recognition method, an identity recognition device and a computer readable storage medium, and mainly aims to present an accurate identity recognition result to a user when the user performs identity recognition.

In order to achieve the above object, the present invention provides an identity recognition method, comprising:

collecting a voiceprint sample set, and establishing a voiceprint library, wherein the voiceprint library comprises a text-related voiceprint set and a text-unrelated voiceprint set;

carrying out preprocessing operation on the voiceprints in the voiceprint library to obtain a voiceprint vector sequence set, wherein the voiceprint vector sequence set comprises a text-related voiceprint vector sequence set and a text-unrelated voiceprint vector sequence set;

compressing the text-related voiceprint vector sequence set to obtain a voiceprint codebook set;

based on receiving voiceprint voice content of a user in a text-related voiceprint recognition scene, calculating Euclidean distance between the voiceprint voice content and the voiceprint codebook set, and recognizing identity information of the user according to the Euclidean distance;

extracting a Mel-frequency cepstral coefficient set from the text-independent voiceprint vector sequence set;

receiving voiceprint voice content of a user in a text-independent voiceprint recognition scene, extracting a Mel frequency cepstrum coefficient of the user according to the voiceprint voice content of the user, and recognizing identity information of the user according to the Mel frequency cepstrum coefficient set.

Optionally, the preprocessing the voiceprint in the voiceprint library to obtain a voiceprint vector sequence set includes:

pre-emphasis is carried out on the voiceprints in the voiceprint library through a digital filter, and a high-frequency voiceprint set is obtained;

performing framing processing on the high-frequency voiceprint set according to a preset voiceprint frame length to obtain a framed high-frequency voiceprint set;

windowing the framed high-frequency voiceprint set by using a Hamming window to obtain a framed high-frequency voiceprint component sequence set, and denoising the voiceprint component sequence set by using double-threshold endpoint detection to obtain the voiceprint vector sequence set.

Optionally, the method for calculating the Euclidean distance between the voiceprint speech content and the voiceprint codebook set includes:

$$d(X,Y)=\sqrt{\sum_{i=1}^{n}(x_i-y_i)^2}$$

wherein X represents the user's voiceprint speech content, Y represents a voiceprint codebook in the voiceprint codebook set, $x_i$ represents the i-th voiceprint content of the user, and $y_i$ represents the i-th voiceprint codebook in the voiceprint codebook set.

Optionally, the extracting of Mel-frequency cepstral coefficients from the text-independent voiceprint vector sequence set comprises:

carrying out Fourier transform on the text-independent voiceprint vector sequence set to obtain a frequency spectrum of the text-independent voiceprint vector sequence set, and calculating a power spectrum of the frequency spectrum;

and filtering the power spectrum by using a triangular filter, and performing power conversion on the filtered power spectrum to obtain the Mel frequency cepstrum coefficient.

Optionally, the method of performing power conversion on the filtered power spectrum to obtain the Mel-frequency cepstral coefficients includes:

$$C_i(k)=\sum_{m=1}^{M}\log\big(P_i(m)\big)\cos\!\left(\frac{k\pi(m-0.5)}{M}\right),\quad k=1,2,\dots,L$$

wherein $C_i(k)$ represents the Mel-frequency cepstral coefficients, L represents the order of the MFCC, $P_i(m)$ represents the filtered power spectrum, m indexes the Mel-frequency cepstrum sequence, and M is the number of triangular filters.

In addition, in order to achieve the above object, the present invention further provides an identification apparatus, which includes a memory and a processor, wherein the memory stores an identification program operable on the processor, and the identification program, when executed by the processor, implements the following steps:

collecting a voiceprint sample set, and establishing a voiceprint library, wherein the voiceprint library comprises a text-related voiceprint set and a text-unrelated voiceprint set;

carrying out preprocessing operation on the voiceprints in the voiceprint library to obtain a voiceprint vector sequence set, wherein the voiceprint vector sequence set comprises a text-related voiceprint vector sequence set and a text-unrelated voiceprint vector sequence set;

compressing the text-related voiceprint vector sequence set to obtain a voiceprint codebook set;

based on receiving voiceprint voice content of a user in a text-related voiceprint recognition scene, calculating Euclidean distance between the voiceprint voice content and the voiceprint codebook set, and recognizing identity information of the user according to the Euclidean distance;

extracting a Mel-frequency cepstral coefficient set from the text-independent voiceprint vector sequence set;

receiving voiceprint voice content of a user in a text-independent voiceprint recognition scene, extracting a Mel frequency cepstrum coefficient of the user according to the voiceprint voice content of the user, and recognizing identity information of the user according to the Mel frequency cepstrum coefficient set.

Optionally, the preprocessing the voiceprint in the voiceprint library to obtain a voiceprint vector sequence set includes:

pre-emphasis is carried out on the voiceprints in the voiceprint library through a digital filter, and a high-frequency voiceprint set is obtained;

performing framing processing on the high-frequency voiceprint set according to a preset voiceprint frame length to obtain a framed high-frequency voiceprint set;

windowing the framed high-frequency voiceprint set by using a Hamming window to obtain a framed high-frequency voiceprint component sequence set, and denoising the voiceprint component sequence set by using double-threshold endpoint detection to obtain the voiceprint vector sequence set.

Optionally, the method for calculating the Euclidean distance between the voiceprint speech content and the voiceprint codebook set includes:

$$d(X,Y)=\sqrt{\sum_{i=1}^{n}(x_i-y_i)^2}$$

wherein X represents the user's voiceprint speech content, Y represents a voiceprint codebook in the voiceprint codebook set, $x_i$ represents the i-th voiceprint content of the user, and $y_i$ represents the i-th voiceprint codebook in the voiceprint codebook set.

Optionally, the extracting of Mel-frequency cepstral coefficients from the text-independent voiceprint vector sequence set comprises:

carrying out Fourier transform on the text-independent voiceprint vector sequence set to obtain a frequency spectrum of the text-independent voiceprint vector sequence set, and calculating a power spectrum of the frequency spectrum;

and filtering the power spectrum by using a triangular filter, and performing power conversion on the filtered power spectrum to obtain the Mel frequency cepstrum coefficient.

In addition, to achieve the above object, the present invention also provides a computer readable storage medium, which stores an identification program, wherein the identification program can be executed by one or more processors to implement the steps of the identification method as described above.

According to the identity recognition method, the identity recognition device and the computer readable storage medium, when a user performs identity recognition, a voiceprint library is established for a collected voiceprint sample set, and the identities of the user in a text-related voiceprint recognition scene and a text-unrelated voiceprint recognition scene are obtained by combining Euclidean distance and Mel frequency cepstrum coefficient, so that an accurate identity recognition result can be presented to the user.

Drawings

Fig. 1 is a schematic flow chart of an identity recognition method according to an embodiment of the present invention;

fig. 2 is a schematic diagram of an internal structure of an identification apparatus according to an embodiment of the present invention;

fig. 3 is a schematic block diagram of an identity recognition program in an identity recognition device according to an embodiment of the present invention.

The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.

Detailed Description

It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

The invention provides an identity recognition method. Fig. 1 is a schematic flow chart of an identity recognition method according to an embodiment of the present invention. The method may be performed by an apparatus, which may be implemented by software and/or hardware.

In this embodiment, the identity recognition method includes:

and S1, collecting a voiceprint sample set, and establishing a voiceprint library, wherein the voiceprint library comprises a text-related voiceprint set and a text-unrelated voiceprint set.

In a preferred embodiment of the present invention, the voiceprint sample set can be a voiceprint set of a student at a school. Preferably, the voiceprint library is obtained by recording voiceprints of all students at school, and the recorded voiceprints of all students at school are stored in wav format.

Further, the preferred embodiment of the present invention divides the voiceprint library into a text-dependent voiceprint set and a text-independent voiceprint set. The text-dependent voiceprint set is obtained by having the students read and record preset text content, and is used for identity recognition in text-dependent voiceprint recognition scenarios. A text-dependent voiceprint recognition scenario may be student identity recognition at the entrance of a campus dormitory, so the preset text content may be "XXX of a certain dormitory"; when a student enters or leaves the dormitory, the student speaks the previously recorded text content and can then pass effectively. The text-independent voiceprint set is obtained by recording arbitrary speech from the students, without requiring them to read preset text content.

S2, preprocessing the voiceprints in the voiceprint library to obtain a voiceprint vector sequence set, wherein the voiceprint vector sequence set comprises a text-related voiceprint vector sequence set and a text-unrelated voiceprint vector sequence set.

In the preferred embodiment of the present invention, the preprocessing operations include pre-emphasis, framing, windowing, and denoising. The preprocessing operation comprises the following steps: pre-emphasizing the voiceprints in the voiceprint library through a digital filter to obtain a high-frequency voiceprint set; performing framing processing on the high-frequency voiceprint set according to a preset voiceprint frame length to obtain a framed high-frequency voiceprint set; windowing the framed high-frequency voiceprint set by using a Hamming window to obtain a framed high-frequency voiceprint component sequence set, and denoising the voiceprint component sequence set by using double-threshold endpoint detection to obtain the voiceprint vector sequence set.

Preferably, the digital filter of the present invention is: $H(z)=1-\mu z^{-1}$, where z is the voiceprint signal in the z-domain and the value of μ ranges from 0.9 to 1.0. Preferably, in the preferred embodiment of the present invention, μ is 0.97. The pre-emphasis is used to enhance the high-frequency part of the voiceprint.

The preset voiceprint frame length may be chosen such that the frame shift is 0 to 0.5 times the frame length.

The windowing comprises multiplying the framed high-frequency voiceprints in the framed high-frequency voiceprint set by the window function of a Hamming window to form the framed high-frequency voiceprint component sequence set. Preferably, in the present invention, the window function of the Hamming window is:

$$\omega(n)=0.54-0.46\cos\!\left(\frac{2\pi n}{N-1}\right),\quad 0\le n\le N-1$$

where ω(n) represents the n-th value of the Hamming window function and N represents the window length.

Further, in a preferred embodiment of the present invention, the voiceprint library is divided into a text-dependent voiceprint set and a text-independent voiceprint set, and the voiceprint vector sequence set is correspondingly divided into a text-dependent voiceprint vector sequence set and a text-independent voiceprint vector sequence set.

And S3, compressing the text related voiceprint vector sequence set to obtain a voiceprint codebook set.

The compression process in the preferred embodiment of the present invention comprises: converting the text-dependent voiceprint vector sequence set into a vector set, and mapping the vector set through vector quantization to obtain a plurality of discrete vectors that form the voiceprint codebook set. In detail, for each vector x of the space, vector quantization maps x to one of L discrete vectors $y_i$ (1 ≤ i ≤ L). The $y_i$ are called code vectors, and their set is called the codebook.
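The vector quantization step can be sketched with a simple k-means-style codebook trainer. This is an illustrative stand-in: the patent does not specify the training algorithm, and classic voiceprint systems often use the LBG algorithm for this purpose instead.

```python
import numpy as np

def train_codebook(vectors, L=16, iters=20, seed=0):
    """Vector quantization: map each training vector to one of L code vectors
    by iterative nearest-neighbour assignment and centroid update (k-means style)."""
    rng = np.random.default_rng(seed)
    # Initialize the codebook from L distinct training vectors.
    codebook = vectors[rng.choice(len(vectors), L, replace=False)]
    for _ in range(iters):
        # Assign each vector to its nearest code vector (Euclidean distance).
        d = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each code vector to the centroid of its assigned vectors.
        for i in range(L):
            if np.any(labels == i):
                codebook[i] = vectors[labels == i].mean(axis=0)
    return codebook
```

One codebook would be trained per enrolled user from that user's text-dependent voiceprint vectors.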

S4, based on the voiceprint voice content of the user received in the text-related voiceprint recognition scene, calculating the Euclidean distance between the voiceprint voice content and the voiceprint codebook, and recognizing the identity information of the user according to the Euclidean distance.

As described above, the text-dependent voiceprint recognition scenario may be student identity recognition at a campus dormitory entrance. Preferably, in the present invention, the above-mentioned preprocessing operation is performed on the received voiceprint speech content of the user to obtain a voiceprint vector sequence, and the Euclidean distance between this voiceprint vector sequence and the voiceprint codebook set is calculated using the Euclidean distance formula. When the Euclidean distance is smaller than a preset threshold, the identity information of the user is verified and the user may enter or exit the campus dormitory; when the Euclidean distance is greater than or equal to the preset threshold, verification of the identity information of the user fails, and the user cannot enter or exit the campus dormitory. Preferably, in the present invention, the preset threshold is 0.2, and the Euclidean distance formula is:

$$d(X,Y)=\sqrt{\sum_{i=1}^{n}(x_i-y_i)^2}$$

wherein X represents the user's voiceprint speech content, Y represents a voiceprint codebook in the voiceprint codebook set, $x_i$ represents the i-th voiceprint speech content of the user, and $y_i$ represents the i-th voiceprint codebook in the voiceprint codebook set.
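A minimal sketch of the text-dependent decision rule, assuming each enrolled user already has a trained codebook and using the 0.2 threshold mentioned above. Taking the minimum distance over a user's code vectors is an illustrative choice, since the patent does not spell out how distances to the codebook set are aggregated:

```python
import numpy as np

def identify(query, codebooks, threshold=0.2):
    """Score a query voiceprint vector against each enrolled user's codebook.
    The score is the minimum Euclidean distance d(X, Y) = sqrt(sum((x_i - y_i)^2))
    over the user's code vectors; accept only when the best score is below threshold."""
    best_user, best_dist = None, float("inf")
    for user, codebook in codebooks.items():
        d = np.linalg.norm(codebook - query, axis=1).min()
        if d < best_dist:
            best_user, best_dist = user, d
    # Below threshold: identity verified; otherwise verification fails.
    return (best_user, best_dist) if best_dist < threshold else (None, best_dist)
```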

And S5, extracting a Mel-frequency cepstral coefficient set from the text-independent voiceprint vector sequence set.

In a preferred embodiment of the present invention, Mel-frequency cepstral coefficients (MFCC) describe the shape of the vocal tract that produces a speaker's voice.

Preferably, in the present invention, the steps of performing MFCC extraction include: carrying out a Fourier transform on the text-independent voiceprint vector sequence set to obtain the spectrum of the text-independent voiceprint vector sequence set, and calculating the power spectrum of the spectrum; filtering the power spectrum with a Mel-scale triangular filter bank; and performing power conversion on the filtered power spectrum through a discrete cosine transform to obtain the MFCC. The Mel filter bank is a set of non-linearly distributed filters; for example, applying a bank of 128 filters to a frame can convert an 883-dimensional vector into a 128-dimensional vector.

Wherein the Fourier transform comprises:

$$X_i(k)=\sum_{n=0}^{N-1}x(n)\,e^{-j2\pi kn/N},\quad 0\le k\le N-1$$

wherein $X_i(k)$ represents the spectrum of the text-independent voiceprint vector sequence set, x(n) represents the input voiceprint vector sequence, N represents the number of points of the Fourier transform, and e is the base of the natural logarithm.

The method of calculating the power spectrum of the frequency spectrum comprises:

$$P_i(k)=\frac{1}{N}\left|X_i(k)\right|^2$$

wherein $P_i(k)$ represents the power spectrum.

The discrete cosine transform comprises:

$$C_i(k)=\sum_{m=1}^{M}\log\big(P_i(m)\big)\cos\!\left(\frac{k\pi(m-0.5)}{M}\right),\quad k=1,2,\dots,L$$

wherein $C_i(k)$ represents the MFCC; L represents the order of the MFCC, ranging from 12 to 16 (preferably 14 in the present invention); $P_i(m)$ represents the filtered power spectrum; m indexes the Mel-frequency cepstrum sequence; and M represents the number of triangular filters.
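The S5 pipeline (FFT, power spectrum, Mel triangular filter bank, log, DCT) can be sketched as below. The filter-bank construction, the 512-point FFT, and the 16 kHz sampling rate are conventional choices, not specifics from the patent; only the order of 14 coefficients comes from the text above.

```python
import numpy as np

def mfcc(frame, sr=16000, n_filters=26, n_ceps=14, n_fft=512):
    """MFCC of one windowed frame: FFT -> power spectrum -> Mel triangular
    filter bank -> log -> DCT, keeping the first n_ceps coefficients."""
    # Power spectrum P_i(k) = |X_i(k)|^2 / N.
    power = np.abs(np.fft.rfft(frame, n_fft)) ** 2 / n_fft
    # Triangular filters spaced evenly on the Mel scale m = 2595*log10(1 + f/700).
    mel_pts = np.linspace(0, 2595 * np.log10(1 + sr / 2 / 700), n_filters + 2)
    hz_pts = 700 * (10 ** (mel_pts / 2595) - 1)
    bins = np.floor((n_fft + 1) * hz_pts / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)  # rising edge
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)  # falling edge
    log_energy = np.log(fbank @ power + 1e-10)
    # DCT: C(k) = sum_m log(P(m)) * cos(pi*k*(m - 0.5)/M), k = 1..n_ceps.
    m_idx = np.arange(1, n_filters + 1)
    return np.array([np.sum(log_energy * np.cos(np.pi * k * (m_idx - 0.5) / n_filters))
                     for k in range(1, n_ceps + 1)])
```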

S6, based on the voiceprint voice content of the user received in the text-independent voiceprint recognition scene, extracting the Mel frequency cepstrum coefficient of the user according to the voiceprint voice content of the user, and recognizing the identity information of the user according to the Mel frequency cepstrum coefficient set.

In a preferred embodiment of the present invention, the text-independent recognition scenarios include classroom check-in scenarios, library borrowing scenarios, and the like. In the classroom attendance scenario, an attendance QR code is preset at each seat in the classroom; students log in through a campus applet or mobile phone APP and then scan the code to record their voiceprint. The students' personal information then forms a two-dimensional seating view, i.e., the corresponding person at the corresponding position, which can be used to count the classroom occupancy rate, the student absence rate, and the like. In the library borrowing scenario, a borrowing QR code is set on each book; likewise, a student can log in through the applet or mobile phone APP, scan the code, and record their voiceprint for identification. The borrower of the book is confirmed according to the matching result, the book is marked as borrowed under that student, and the exit alarm is suppressed; the student can then carry the borrowed book directly out of the library.

Preferably, in the present invention, for a user S, the MFCC of user S is extracted according to the MFCC extraction in S5, the posterior probability of the MFCC of user S is calculated through a Gaussian mixture model, and the user corresponding to the highest posterior probability is taken as the target user. The posterior probability here refers to the probability, evaluated after an observation is received, that the observation was produced by a given source.

Further, the Gaussian mixture model comprises:

$$p(X\mid\lambda)=\prod_{t=1}^{T}\sum_{k=1}^{M}\omega_k\,p_k(x_t)$$

wherein each component density is a Gaussian:

$$p_k(x)=\frac{1}{(2\pi)^{D/2}\left|\Sigma_k\right|^{1/2}}\exp\!\left(-\frac{1}{2}(x-\mu_k)^{\mathrm T}\Sigma_k^{-1}(x-\mu_k)\right)$$

wherein $p(X\mid\lambda)$ represents the posterior probability of the user, T represents the sequence length of the MFCC, M represents the number of components of the Gaussian mixture model, D represents the dimension of the MFCC vectors, and $\omega_k$ is the mixture weight of the k-th component, with a value range of 0 to 1. Preferably, in the present invention, M takes the value 16, $\omega_k$ takes the value 0.7, and $\mu_k$ takes the value 0.98.
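A hypothetical sketch of scoring an MFCC sequence against enrolled Gaussian mixture models with diagonal covariances and picking the best-scoring speaker. The log-sum-exp trick is used for numerical stability; training the models themselves (e.g. via EM) is out of scope here:

```python
import numpy as np

def gmm_log_likelihood(X, weights, means, variances):
    """Log-likelihood of an MFCC sequence X (T x D) under a diagonal-covariance
    Gaussian mixture: log p(X|lambda) = sum_t log sum_k w_k N(x_t; mu_k, var_k)."""
    log_probs = []
    for w, mu, var in zip(weights, means, variances):
        # Log-density of one diagonal Gaussian, evaluated for every frame at once.
        ll = -0.5 * np.sum((X - mu) ** 2 / var + np.log(2 * np.pi * var), axis=1)
        log_probs.append(np.log(w) + ll)
    # Log-sum-exp over the K components, then sum over the T frames.
    stacked = np.stack(log_probs)
    mx = stacked.max(axis=0)
    return float(np.sum(mx + np.log(np.exp(stacked - mx).sum(axis=0))))

def identify_speaker(X, models):
    """Pick the enrolled model with the highest likelihood for the utterance."""
    return max(models, key=lambda u: gmm_log_likelihood(X, *models[u]))
```

With equal priors over enrolled users, the maximum-likelihood model is also the maximum-posterior-probability model, matching the selection rule described above.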

The invention also provides an identity recognition device. Fig. 2 is a schematic diagram of an internal structure of an identification apparatus according to an embodiment of the present invention.

In the present embodiment, the identification apparatus 1 may be a PC (Personal Computer), a terminal device such as a smart phone, a tablet Computer, or a mobile Computer, or may be a server. The identification device 1 comprises at least a memory 11, a processor 12, a communication bus 13, and a network interface 14.

The memory 11 includes at least one type of readable storage medium, including flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a magnetic memory, a magnetic disk, an optical disk, and the like. The memory 11 may in some embodiments be an internal storage unit of the identification apparatus 1, for example a hard disk of the identification apparatus 1. The memory 11 may also be an external storage device of the identification apparatus 1 in other embodiments, such as a plug-in hard disk provided on the identification apparatus 1, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash memory card (Flash Card), and the like. Further, the memory 11 may also comprise both an internal storage unit and an external storage device of the identification apparatus 1. The memory 11 may be used not only to store application software installed in the identification apparatus 1 and various types of data, such as the code of the identification program 01, but also to temporarily store data that has been output or is to be output.

The processor 12, which in some embodiments may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data processing chip, is configured to execute program code or process data stored in the memory 11, such as executing the identification program 01.

The communication bus 13 is used to realize connection communication between these components.

The network interface 14 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface), typically used to establish a communication link between the apparatus 1 and other electronic devices.

Optionally, the apparatus 1 may further comprise a user interface, which may comprise a Display (Display), an input unit such as a Keyboard (Keyboard), and optionally a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is suitable for displaying information processed in the identification device 1 and for displaying a visual user interface.

While fig. 2 only shows the identification apparatus 1 with the components 11-14 and the identification program 01, it will be understood by those skilled in the art that the structure shown in fig. 2 does not constitute a limitation of the identification apparatus 1, which may comprise fewer or more components than shown, or combine some components, or arrange the components differently.

In the embodiment of the device 1 shown in fig. 2, the memory 11 stores an identification program 01; the processor 12, when executing the identification program 01 stored in the memory 11, implements the following steps:

step one, collecting a voiceprint sample set and establishing a voiceprint library, wherein the voiceprint library comprises a text-related voiceprint set and a text-unrelated voiceprint set.

In a preferred embodiment of the present invention, the voiceprint sample set may be a set of voiceprints of students at a school. Preferably, the voiceprint library is obtained by recording the voiceprints of all students at the school, and the recorded voiceprints are stored in wav format.

Further, the preferred embodiment of the present invention divides the voiceprint library into a text-related voiceprint set and a text-independent voiceprint set. The text-related voiceprint set is obtained by having the students read and record preset text content, and is used for identity recognition in text-related voiceprint recognition scenes. A text-related voiceprint recognition scene may be student identity recognition at the entrance of a campus dormitory: the preset text content may be, for example, "XXX of a certain dormitory", and a student who speaks the previously recorded text content when entering or exiting the dormitory can pass effectively. The text-independent voiceprint set is obtained by recording arbitrary voiceprint speech of the students, without requiring them to read preset text content.

And secondly, preprocessing the voiceprints in the voiceprint library to obtain a voiceprint vector sequence set, wherein the voiceprint vector sequence set comprises a text-related voiceprint vector sequence set and a text-unrelated voiceprint vector sequence set.

In the preferred embodiment of the present invention, the preprocessing operations include pre-emphasis, framing, windowing, and denoising, implemented in the following steps: pre-emphasizing the voiceprints in the voiceprint library through a digital filter to obtain a high-frequency voiceprint set; framing the high-frequency voiceprint set according to a preset voiceprint frame length to obtain a framed high-frequency voiceprint set; windowing the framed high-frequency voiceprint set with a Hamming window to obtain a framed high-frequency voiceprint component sequence set; and denoising the voiceprint component sequence set by double-threshold endpoint detection to obtain the voiceprint vector sequence set.

Preferably, the digital filter of the present invention is: H(z) = 1 − μz⁻¹, where z represents the voiceprint signal in the z-domain and μ has a value range of 0.9 to 1.0. Preferably, in the preferred embodiment of the present invention, μ = 0.97. The pre-emphasis is used to enhance the high-frequency part of the voiceprint.
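The pre-emphasis filter above can be sketched as a first-order difference; this is an illustrative implementation only, assuming the input is a NumPy array of samples (the function name is hypothetical):

```python
import numpy as np

def pre_emphasis(signal, mu=0.97):
    """Apply H(z) = 1 - mu * z^-1, i.e. y[n] = x[n] - mu * x[n-1].

    mu is typically chosen in the 0.9-1.0 range; 0.97 follows the text.
    """
    signal = np.asarray(signal, dtype=float)
    # keep the first sample unchanged, difference the remaining samples
    return np.append(signal[0], signal[1:] - mu * signal[:-1])
```

Because the filter subtracts a scaled copy of the previous sample, slowly varying (low-frequency) content cancels and the high-frequency part of the voiceprint is boosted, which is exactly the purpose stated above.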

The overlap between adjacent frames under the preset voiceprint frame length may be 0 to 0.5 times the frame length.

The windowing comprises multiplying the framed high-frequency voiceprints in the framed high-frequency voiceprint set by the window function of a Hamming window to form the framed high-frequency voiceprint component sequence set. Preferably, in the present invention, the window function of the Hamming window is as follows:

ω(n) = 0.54 − 0.46·cos(2πn / (N − 1)), 0 ≤ n ≤ N − 1

where ω(n) represents the value of the window function at the nth sample and N represents the window length.

Further, in a preferred embodiment of the present invention, the voiceprint library is divided into a text-dependent voiceprint set and a text-independent voiceprint set, and the voiceprint vector sequence set is correspondingly divided into a text-dependent voiceprint vector sequence set and a text-independent voiceprint vector sequence set.

And step three, compressing the text related voiceprint vector sequence set to obtain a voiceprint codebook set.

The compression process in the preferred embodiment of the present invention comprises: converting the text-related voiceprint vector sequence set into a vector set, and mapping the vector set through vector quantization to a number of discrete vectors, thereby forming the voiceprint codebook set. In detail, for each vector x in the space, vector quantization maps x to one of L discrete vectors yᵢ (1 ≤ i ≤ L), where each yᵢ is called a code vector and the set of code vectors is called the codebook.
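Codebook construction by vector quantization can be sketched with a simple Lloyd (k-means style) procedure. This is an illustration under assumed names and a deterministic farthest-point initialization, not the patented compression method itself:

```python
import numpy as np

def train_codebook(vectors, L=4, iters=20):
    """Quantize a vector set into an L-entry codebook: each x is mapped to the
    nearest code vector y_i, and code vectors are refined as cluster means."""
    vectors = np.asarray(vectors, dtype=float)
    # farthest-point initialization keeps the sketch deterministic
    codebook = [vectors[0]]
    for _ in range(L - 1):
        d = np.min([((vectors - c) ** 2).sum(-1) for c in codebook], axis=0)
        codebook.append(vectors[np.argmax(d)])
    codebook = np.array(codebook)
    for _ in range(iters):
        # assign every vector to its nearest code vector, then recompute means
        idx = np.argmin(((vectors[:, None] - codebook[None]) ** 2).sum(-1), axis=1)
        for i in range(L):
            if np.any(idx == i):
                codebook[i] = vectors[idx == i].mean(axis=0)
    return codebook
```

The returned array plays the role of one enrolled user's voiceprint codebook: L code vectors summarizing that user's text-related voiceprint vector sequence.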

And fourthly, receiving the voiceprint voice content of the user in a text-related voiceprint recognition scene, calculating the Euclidean distance between the voiceprint voice content and the voiceprint codebook set, and recognizing the identity information of the user according to the Euclidean distance.

As described above, the text-related voiceprint recognition scene may be a student identity recognition scene at a campus dormitory. Preferably, in the present invention, the above-mentioned preprocessing operation is performed on the received voiceprint voice content of the user to obtain a voiceprint vector sequence of the user's voiceprint voice content, and the Euclidean distance between this voiceprint vector sequence and the voiceprint codebook set is calculated by the Euclidean distance formula. When the Euclidean distance is smaller than a preset threshold value, the identity information of the user is verified, and the user may enter or exit the campus dormitory; when the Euclidean distance is greater than or equal to the preset threshold value, the authentication of the identity information of the user fails, and the user cannot enter or exit the campus dormitory. Preferably, in the present invention, the preset threshold is 0.2, and the Euclidean distance calculation formula is:

d(X, Y) = √( Σ_{i=1}^{n} (xᵢ − yᵢ)² )

where X represents the user's voiceprint speech content, Y represents a voiceprint codebook in the voiceprint codebook set, xᵢ represents the i-th voiceprint voice content of the user, and yᵢ represents the i-th voiceprint codebook in the voiceprint codebook set.
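The distance-based decision with the 0.2 threshold can be sketched as follows. The codebook set is simplified here to a dict from identity to one representative vector per enrolled user; the names and data shapes are illustrative assumptions:

```python
import numpy as np

def identify(user_vec, codebooks, threshold=0.2):
    """Match a user's voiceprint vector against each enrolled codebook vector.

    Returns (identity, distance) for the nearest match if its Euclidean
    distance is below the threshold, otherwise (None, distance): rejection.
    """
    best_id, best_d = None, float("inf")
    for identity, y in codebooks.items():
        d = np.sqrt(((np.asarray(user_vec, float) - np.asarray(y, float)) ** 2).sum())
        if d < best_d:
            best_id, best_d = identity, d
    return (best_id, best_d) if best_d < threshold else (None, best_d)
```

A rejected result (distance at or above 0.2) corresponds to the failed-authentication case described above, where the user cannot enter or exit the dormitory.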

And step five, extracting a Mel-frequency cepstral coefficient set from the text-independent voiceprint vector sequence set.

In a preferred embodiment of the present invention, the Mel-frequency cepstral coefficients (MFCC) describe the shape of the vocal tract that produces the speaker's voice.

Preferably, in the present invention, the steps of MFCC extraction include: performing a Fourier transform on the text-independent voiceprint vector sequence set to obtain the frequency spectrum of the set, and calculating the power spectrum of the frequency spectrum; filtering the power spectrum with a Mel-scale triangular filter bank; and applying a discrete cosine transform to the filtered power spectrum to obtain the MFCC. The Mel filter bank is a set of non-linearly distributed filters; for example, applying a set of 128 filters to a frame can convert an 883-dimensional vector into a 128-dimensional vector.

Wherein the Fourier transform comprises:

Xᵢ(k) = Σ_{n=0}^{N−1} x(n)·e^(−j2πnk/N), 0 ≤ k ≤ N − 1

where Xᵢ(k) represents the frequency spectrum of the text-independent voiceprint vector sequence set, x(n) represents the input voiceprint vector sequence, N represents the number of points of the Fourier transform, and e represents the base of the natural logarithm.

The method of calculating the power spectrum of the frequency spectrum comprises:

Pᵢ(k) = |Xᵢ(k)|² / N

where Pᵢ(k) represents the power spectrum.

The discrete cosine transform comprises:

Cᵢ(n) = Σ_{m=1}^{M} log(Pᵢ(m))·cos(πn(m − 0.5)/M), 1 ≤ n ≤ L

where Cᵢ(n) represents the MFCC, L represents the order of the MFCC with a range of 12 to 16 (preferably, the invention takes 14), Pᵢ(m) represents the filtered power spectrum, m represents the index in the Mel cepstrum sequence, and M represents the number of triangular filters.
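The FFT → power spectrum → Mel filter bank → DCT pipeline can be sketched for a single frame. The parameter values here (26 filters, order 14, 16 kHz sampling rate) and the inline filter-bank construction are illustrative assumptions, not values fixed by the text:

```python
import numpy as np

def mfcc_frame(frame, n_filters=26, n_coeffs=14, sample_rate=16000):
    """MFCC sketch for one frame: FFT -> power spectrum -> Mel triangular
    filter bank -> log -> DCT, keeping the first n_coeffs coefficients."""
    frame = np.asarray(frame, dtype=float)
    N = len(frame)
    spectrum = np.fft.rfft(frame)            # X_i(k)
    power = (np.abs(spectrum) ** 2) / N      # P_i(k) = |X_i(k)|^2 / N
    # build triangular filters spaced evenly on the Mel scale
    mel_max = 2595 * np.log10(1 + (sample_rate / 2) / 700)
    mel_pts = np.linspace(0, mel_max, n_filters + 2)
    hz_pts = 700 * (10 ** (mel_pts / 2595) - 1)
    bins = np.floor((N + 1) * hz_pts / sample_rate).astype(int)
    fbank = np.zeros((n_filters, len(power)))
    for f in range(1, n_filters + 1):
        left, center, right = bins[f - 1], bins[f], bins[f + 1]
        for k in range(left, center):
            fbank[f - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[f - 1, k] = (right - k) / max(right - center, 1)
    energies = np.maximum(fbank @ power, 1e-10)  # floor avoids log(0)
    # DCT of the log filter-bank energies: C(n) = sum_m log(E_m) cos(pi*n*(m-0.5)/M)
    n = np.arange(1, n_coeffs + 1)[:, None]
    m = np.arange(1, n_filters + 1)[None, :]
    return np.cos(np.pi * n * (m - 0.5) / n_filters) @ np.log(energies)
```

The final matrix product is exactly the discrete cosine transform given above, with M = n_filters and L = n_coeffs.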

And step six, receiving the voiceprint voice content of the user in the text-independent voiceprint recognition scene, extracting the Mel frequency cepstrum coefficient of the user according to the voiceprint voice content of the user, and recognizing the identity information of the user according to the Mel frequency cepstrum coefficient set.

In a preferred embodiment of the present invention, the text-independent recognition scenes include a classroom check-in scene, a library borrowing scene, and the like. In the classroom check-in scene, a check-in two-dimensional code is preset at each seat in the classroom; a student logs in through a campus applet or mobile-phone APP and then scans the code to trigger voiceprint recording. The student's personal information then forms a two-dimensional seating view, i.e., the corresponding person at the corresponding position, which can be used to count classroom occupancy rate, student absence rate, and the like. In the library borrowing scene, a borrowing two-dimensional code is placed on each book; similarly, a student logs in through the applet or mobile-phone APP and scans the code to perform voiceprint recording and recognition. The borrower of the book is confirmed according to the matching result, the borrowed book is recorded under that student's account, and the exit alarm is disabled, so the student can leave the library directly with the borrowed books.

Preferably, in the present invention, assuming the user is user S, the MFCC of user S is obtained by the MFCC extraction described in step five, the posterior probability of the MFCC of user S is calculated through a Gaussian mixture model, and the user corresponding to the highest posterior probability is taken as the target user. Here, the posterior probability refers to the probability that a given message was sent, as assessed by the receiving end after receiving the message.

Further, the Gaussian mixture model comprises:

p(X|λ) = ∏_{t=1}^{T} Σ_{k=1}^{M} ω_k·g(x_t; μ_k, Σ_k)

where p(X|λ) represents the posterior probability of the user, T represents the sequence length of the MFCC, M represents the number of components of the Gaussian mixture model, and ω_k represents the mixture weight of the Gaussian mixture model, with a value range of 0 to 1. Preferably, in the invention, M takes the value 16, ω_k takes the value 0.7, and μ_k takes the value 0.98.
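Scoring a user's MFCC sequence against enrolled speaker models can be sketched as below. The diagonal-covariance form, the model dictionary, and all names are illustrative assumptions; the text's "posterior probability" is computed here as a log-likelihood score, the common practical form, and a real system would train the GMM parameters on enrollment data:

```python
import numpy as np

def gmm_log_likelihood(X, weights, means, variances):
    """Log-likelihood of an MFCC sequence X (T x D) under a diagonal GMM:
    sum_t log sum_k w_k * N(x_t; mu_k, sigma_k^2)."""
    total = 0.0
    for x in np.atleast_2d(X):
        comp = []
        for w, mu, var in zip(weights, means, variances):
            # log of a diagonal-covariance Gaussian density
            log_n = -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
            comp.append(np.log(w) + log_n)
        total += np.logaddexp.reduce(comp)  # stable log-sum-exp over components
    return total

def identify_speaker(X, speaker_models):
    """Pick the enrolled speaker whose GMM scores the sequence X highest."""
    return max(speaker_models, key=lambda s: gmm_log_likelihood(X, *speaker_models[s]))
```

Taking the maximum over enrolled speakers mirrors the step above of selecting the user with the highest posterior probability as the target user.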

Alternatively, in other embodiments, the identification program may be divided into one or more modules, and the one or more modules are stored in the memory 11 and executed by one or more processors (in this embodiment, the processor 12) to implement the present invention.

For example, referring to fig. 3, a schematic diagram of program modules of an identification program in an embodiment of the identification apparatus of the present invention is shown, in this embodiment, the identification program may be divided into a voiceprint preprocessing module 10, a calculation recognition module 20, and an extraction recognition module 30, and exemplarily:

the voiceprint preprocessing module 10 is configured to: collecting a voiceprint sample set, and establishing a voiceprint library, wherein the voiceprint library comprises a text-related voiceprint set and a text-unrelated voiceprint set; and preprocessing the voiceprints in the voiceprint library to obtain a voiceprint vector sequence set, wherein the voiceprint vector sequence set comprises a text-related voiceprint vector sequence set and a text-unrelated voiceprint vector sequence set.

The calculation identification module 20 is configured to: compressing the text-related voiceprint vector sequence set to obtain a voiceprint codebook set; based on receiving voiceprint voice content of a user in a text-related voiceprint recognition scene, calculating Euclidean distance between the voiceprint voice content and the voiceprint codebook set, and recognizing identity information of the user according to the Euclidean distance.

The extraction identification module 30 is configured to: extract a Mel-frequency cepstral coefficient set from the text-independent voiceprint vector sequence set; receive voiceprint voice content of a user in a text-independent voiceprint recognition scene, extract the Mel-frequency cepstral coefficients of the user according to the voiceprint voice content of the user, and recognize the identity information of the user according to the Mel-frequency cepstral coefficient set.

The functions or operation steps of the program modules such as the voiceprint preprocessing module 10, the calculation recognition module 20, and the extraction recognition module 30 when executed are substantially the same as those of the above embodiments, and are not described herein again.

Furthermore, an embodiment of the present invention further provides a computer-readable storage medium, where an identity recognition program is stored on the computer-readable storage medium, where the identity recognition program is executable by one or more processors to implement the following operations:

collecting a voiceprint sample set, and establishing a voiceprint library, wherein the voiceprint library comprises a text-related voiceprint set and a text-unrelated voiceprint set;

carrying out preprocessing operation on the voiceprints in the voiceprint library to obtain a voiceprint vector sequence set, wherein the voiceprint vector sequence set comprises a text-related voiceprint vector sequence set and a text-unrelated voiceprint vector sequence set;

compressing the text-related voiceprint vector sequence set to obtain a voiceprint codebook set;

based on receiving voiceprint voice content of a user in a text-related voiceprint recognition scene, calculating Euclidean distance between the voiceprint voice content and the voiceprint codebook set, and recognizing identity information of the user according to the Euclidean distance;

extracting a Mel-frequency cepstral coefficient set from the text-independent voiceprint vector sequence set;

receiving voiceprint voice content of a user in a text-independent voiceprint recognition scene, extracting a Mel frequency cepstrum coefficient of the user according to the voiceprint voice content of the user, and recognizing identity information of the user according to the Mel frequency cepstrum coefficient set.

The embodiment of the computer readable storage medium of the present invention is substantially the same as the embodiments of the identity recognition apparatus and method, and will not be described herein again.

It should be noted that the above-mentioned numbers of the embodiments of the present invention are merely for description, and do not represent the merits of the embodiments. And the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, apparatus, article, or method that includes the element.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.

The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.
