Speech recognition method based on isolated words and range hood applying same

Document No.: 1650315. Publication date: 2019-12-24.

Abstract: This technology, "Speech recognition method based on isolated words and range hood applying same" (一种基于孤立词的语音识别方法及应用有该方法的吸油烟机), was designed by 杜杉杉 on 2018-05-28. The invention relates to a speech recognition method based on isolated words, comprising the following steps: training template voices to form a voice template library; and comparing the voice to be recognized against the voice template library to recognize it. An endpoint detection algorithm extracts each isolated word voice from the template voices and from the voice to be recognized, and the characteristic parameters of each isolated word are calculated. The characteristic parameters of the different isolated words in the template voices are stored to form an isolated word voice characteristic parameter library. The characteristic parameter vector corresponding to each template voice is stored to form the voice template library. The characteristic parameter vector corresponding to the voice to be recognized is obtained, and it is judged whether that vector exists in the voice template library; if so, the voice to be recognized is recognized. The method can greatly reduce the amount of data to be processed. A range hood applying the method has low cost and a high command voice recognition rate.

1. A speech recognition method based on isolated words comprises the following steps:

recording each template voice, and training to form a voice template library;

collecting a voice to be recognized;

comparing and calculating the voice to be recognized with a voice template library, and recognizing the template voice content corresponding to the voice to be recognized;

the method is characterized in that:

for the recorded template voice and the collected voice to be recognized, using an endpoint detection algorithm to detect the start point and end point of each isolated word voice therein, extracting each isolated word voice accordingly, and processing each isolated word voice to obtain the characteristic parameters of each isolated word;

when training a voice template library, storing the characteristic parameters of different isolated words to further form an isolated word voice characteristic parameter library;

for each template voice, acquiring the corresponding isolated word voice characteristic parameters, forming from them the characteristic parameter vector of that template voice, and storing the characteristic parameter vector of each template voice to form the voice template library;

obtaining the characteristic parameters of each isolated word voice in the voice to be recognized, comparing them with the characteristic parameters stored in the isolated word voice characteristic parameter library to find, for each isolated word voice in the voice to be recognized, the corresponding isolated word voice characteristic parameters in the library, thereby obtaining the characteristic parameter vector corresponding to the voice to be recognized, and judging whether that vector exists in the voice template library; if so, the voice to be recognized is recognized.

2. The isolated word-based speech recognition method of claim 1, wherein: when the voice template library is trained, the isolated word voice characteristic parameters corresponding to the current template voice are obtained, each of them is compared with the isolated word voice characteristic parameters already stored in the isolated word voice characteristic parameter library, and any isolated word voice characteristic parameters not yet present are stored in the library.

3. The isolated word-based speech recognition method according to claim 1 or 2, wherein: the comparison calculation of the isolated word voice characteristic parameters is performed by a DTW (dynamic time warping) algorithm.

4. The isolated word-based speech recognition method according to claim 1 or 2, wherein: each extracted isolated word voice is processed to obtain its Mel spectrum, cepstral analysis is performed on the Mel spectrum to obtain the Mel-frequency cepstral coefficients (MFCC) of the isolated word voice, and the MFCC are used as the characteristic parameters of the corresponding isolated word.

5. The isolated word-based speech recognition method according to claim 1 or 2, wherein: in the process of training the voice template library, it is judged whether the detection of a section of template voice is complete, so that each section of template voice is trained separately;

and in the process of recognizing the voice to be recognized, it is judged whether the detection of a section of voice to be recognized is complete, so that each section of voice to be recognized is recognized separately.

6. The isolated word-based speech recognition method of claim 5, wherein: before calculating the characteristic parameter vector of a section of template voice or voice to be recognized, judging whether the calculation of the characteristic parameters of all isolated word voices in the section of template voice or the voice to be recognized is finished or not, and if so, calculating the characteristic parameter vector of the section of template voice or the voice to be recognized;

the method for judging whether the feature parameter calculation of all isolated word voices in a section of template voice or voice to be recognized is finished comprises the following steps:

detecting whether isolated word speech exists after the current isolated word speech in the template speech or the speech to be recognized;

if not, judging that the calculation of the characteristic parameters of all isolated word voices in the section of template voice or voice to be recognized is finished;

if yes, calculating the time interval t between the current isolated word voice and the next isolated word voice;

if t is less than or equal to the set time T, judging that the calculation of the characteristic parameters of all isolated word voices in the section of template voice or voice to be recognized is not finished;

and if t is greater than the set time T, judging that the calculation of the characteristic parameters of all isolated word voices in the section of template voice or voice to be recognized is finished.

7. A range hood to which the speech recognition method based on isolated words as claimed in any one of claims 1 to 6 is applied, characterized in that: the range hood comprises a range hood body, a sound acquisition unit and a control circuit board, wherein the sound acquisition unit and the control circuit board are arranged on the range hood body, and the sound acquisition unit is electrically connected with the control circuit board.

Technical Field

The invention relates to the technical field of voice recognition, in particular to a voice recognition method based on isolated words and a range hood applying the voice recognition method.

Background

With the rapid development of voice recognition technology, more and more home appliances offer voice control. However, appliances are used in many different regions, dialects vary widely between regions, and local speakers' Mandarin is often non-standard. Existing voice chips usually recognize voice against a fixed template library, so common voice recognition modules perform poorly on dialects. In addition, because of its large data volume, the recognition template library usually requires an external storage module, which adds cost.

Chinese patent application publication No. CN106997762A (application No. 201710134617.6) discloses a method and apparatus for voice control of a home appliance, in which a target voice recognition engine is selected from a plurality of pre-trained voice recognition engines according to a user instruction, and the home appliance is voice controlled through the target voice recognition engine. According to the user's needs, the method selects from the pre-trained engines a target engine able to recognize the language the user commonly uses, and recognizes the user's voice input through that engine to control the appliance. The appliance thus supports voice control in different dialects, which widens its user base and improves user stickiness. However, the method requires a plurality of voice recognition engines to be provided in advance, which places high demands on data storage and correspondingly increases the cost of the appliance. Moreover, pronunciation also differs considerably between people within the same region, so the accuracy of voice recognition cannot be effectively guaranteed.

Chinese invention patent application publication No. CN106971721A (application No. 201710198053.2) discloses a local accent voice recognition system based on embedded mobile equipment. The system comprises a model training module, a feature extraction module and a pattern matching module. In use, the model training module collects and trains on local accent voice to obtain an entry model of the local accent, the feature extraction module extracts voice features from the input local accent voice, and the pattern matching module performs matching calculation on the voice features against the entry model to obtain the recognition result. The system can recognize both isolated words and connected words, for both specific and non-specific speakers. However, the publication discloses only the operating principle and proposes no specific implementation. In practice, if existing training, feature extraction and feature matching methods are adopted, the amount of data to be processed remains large and the cost high, which makes the approach unsuitable for mass-produced household appliances.

Disclosure of Invention

The first technical problem to be solved by the present invention is to provide a speech recognition method based on isolated words that greatly reduces the amount of data to be processed while still taking recognition accuracy into account.

The second technical problem to be solved by the present invention is to provide a range hood capable of implementing voice control at a lower cost in view of the above prior art.

The technical scheme adopted by the invention for solving the first technical problem is as follows: a speech recognition method based on isolated words comprises the following steps:

recording each template voice, and training to form a voice template library;

collecting a voice to be recognized;

comparing and calculating the voice to be recognized with a voice template library, and recognizing the template voice content corresponding to the voice to be recognized;

the method is characterized in that:

for the recorded template voice and the collected voice to be recognized, using an endpoint detection algorithm to detect the start point and end point of each isolated word voice therein, extracting each isolated word voice accordingly, and processing each isolated word voice to obtain the characteristic parameters of each isolated word;

when training a voice template library, storing the characteristic parameters of different isolated words to further form an isolated word voice characteristic parameter library;

for each template voice, acquiring the corresponding isolated word voice characteristic parameters, forming from them the characteristic parameter vector of that template voice, and storing the characteristic parameter vector of each template voice to form the voice template library;

obtaining the characteristic parameters of each isolated word voice in the voice to be recognized, comparing them with the characteristic parameters stored in the isolated word voice characteristic parameter library to find, for each isolated word voice in the voice to be recognized, the corresponding isolated word voice characteristic parameters in the library, thereby obtaining the characteristic parameter vector corresponding to the voice to be recognized, and judging whether that vector exists in the voice template library; if so, the voice to be recognized is recognized.
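The text does not say which endpoint detection algorithm is used. As one possible illustration only, the sketch below marks frames whose short-time energy exceeds a threshold as speech and returns the start and end sample index of each run of such frames. The function name `detect_endpoints`, the frame length and the threshold are assumptions made for this example, not values from the source.

```python
def detect_endpoints(samples, frame_len=4, energy_threshold=0.1):
    """Return (start, end) sample indices of each isolated word; end is exclusive.

    Assumption: a frame is 'speech' when its mean squared amplitude exceeds
    energy_threshold. Real detectors usually add zero-crossing rate and hangover.
    """
    segments = []
    start_frame = None
    n_frames = len(samples) // frame_len
    for f in range(n_frames):
        frame = samples[f * frame_len:(f + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len
        if energy > energy_threshold:
            if start_frame is None:
                start_frame = f          # start point of an isolated word
        elif start_frame is not None:
            segments.append((start_frame * frame_len, f * frame_len))  # end point
            start_frame = None
    if start_frame is not None:          # word runs to the end of the buffer
        segments.append((start_frame * frame_len, n_frames * frame_len))
    return segments


# Toy signal: silence, a loud burst, silence, a second burst, silence.
signal = [0.0] * 8 + [1.0, -1.0] * 4 + [0.0] * 8 + [0.8, -0.8] * 6 + [0.0] * 4
print(detect_endpoints(signal))
```

Each returned pair delimits one isolated word voice, which can then be cut out of the signal and passed on to feature extraction.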

To reduce data storage, when the voice template library is trained, the isolated word voice characteristic parameters corresponding to the current template voice are obtained, each of them is compared with the isolated word voice characteristic parameters already stored in the isolated word voice characteristic parameter library, and any isolated word voice characteristic parameters not yet present are stored in the library.

Preferably, the comparison calculation of the isolated word voice characteristic parameters is performed by a DTW (dynamic time warping) algorithm.
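For readers unfamiliar with DTW, the following is a minimal textbook sketch of the distance computation between two feature sequences of possibly different lengths. The function name `dtw_distance` and the choice of Euclidean distance between frames are assumptions for illustration; the source does not specify the local distance or any path constraints.

```python
import math


def dtw_distance(seq_a, seq_b):
    """Classic dynamic time warping distance between two sequences of frames.

    Each frame is a list of floats (for example one MFCC vector).
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # stretch seq_b
                                     cost[i][j - 1],      # stretch seq_a
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]


a = [[0.0], [1.0], [2.0]]
b = [[0.0], [1.0], [1.0], [2.0]]   # same shape as a, spoken more slowly
c = [[5.0], [5.0], [5.0]]
print(dtw_distance(a, b), dtw_distance(a, c))
```

Because the warping path may repeat frames, the slower utterance `b` aligns with `a` at zero cost, which is why DTW suits isolated words spoken at varying speeds.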

Preferably, each extracted isolated word voice is processed to obtain its Mel spectrum, cepstral analysis is performed on the Mel spectrum to obtain the Mel-frequency cepstral coefficients (MFCC) of the isolated word voice, and the MFCC are used as the characteristic parameters of the corresponding isolated word.
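The usual MFCC pipeline (window, power spectrum, Mel filterbank, logarithm, discrete cosine transform) can be sketched in pure Python as below. This is a simplified illustration under assumed settings: the sample rate, frame length, hop, filter count and coefficient count are all hypothetical, and a naive DFT replaces the FFT a real implementation would use.

```python
import cmath
import math


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Power spectrum (bins 0..N/2) of one Hamming-windowed frame, naive DFT."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed))) ** 2 / n
            for k in range(n // 2 + 1)]


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centres are evenly spaced on the Mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    mels = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    bank = []
    for f in range(1, n_filters + 1):
        lo, mid, hi = bins[f - 1], bins[f], bins[f + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(lo, mid):
            row[k] = (k - lo) / (mid - lo)   # rising edge
        for k in range(mid, hi):
            row[k] = (hi - k) / (hi - mid)   # falling edge
        bank.append(row)
    return bank


def mfcc_frame(frame, sample_rate, n_filters=12, n_coeffs=6):
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # Log Mel spectrum; the floor avoids log(0) for empty filters.
    log_mel = [math.log(max(sum(w * p for w, p in zip(row, spec)), 1e-10))
               for row in bank]
    # Cepstral analysis: DCT-II of the log Mel energies.
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_mel))
            for c in range(n_coeffs)]


def isolated_word_features(samples, sample_rate=8000, frame_len=64, hop=32):
    """One MFCC vector per overlapping frame of an isolated word voice."""
    return [mfcc_frame(samples[s:s + frame_len], sample_rate)
            for s in range(0, len(samples) - frame_len + 1, hop)]


word = [math.sin(2.0 * math.pi * 440.0 * i / 8000.0) for i in range(256)]
features = isolated_word_features(word)
print(len(features), len(features[0]))
```

The result is a sequence of coefficient vectors, which matches the statement elsewhere in the text that each characteristic parameter comprises a plurality of parameter data.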

To improve recognition accuracy, in the process of training the voice template library it is judged whether the detection of a section of template voice is complete, so that each section of template voice is trained separately;

and in the process of recognizing the voice to be recognized, it is judged whether the detection of a section of voice to be recognized is complete, so that each section of voice to be recognized is recognized separately.

For simplicity, before the characteristic parameter vector of a section of template voice or voice to be recognized is calculated, it is judged whether the calculation of the characteristic parameters of all isolated word voices in that section is finished; if so, the characteristic parameter vector of the section is calculated;

the method for judging whether the feature parameter calculation of all isolated word voices in a section of template voice or voice to be recognized is finished comprises the following steps:

detecting whether isolated word speech exists after the current isolated word speech in the template speech or the speech to be recognized;

if not, judging that the calculation of the characteristic parameters of all isolated word voices in the section of template voice or voice to be recognized is finished;

if yes, calculating the time interval t between the current isolated word voice and the next isolated word voice;

if t is less than or equal to the set time T, judging that the calculation of the characteristic parameters of all isolated word voices in the section of template voice or voice to be recognized is not finished;

and if t is greater than the set time T, judging that the calculation of the characteristic parameters of all isolated word voices in the section of template voice or voice to be recognized is finished.
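The completion rule just described can be sketched as follows. The names `segment_finished` and `split_into_segments`, and the representation of each isolated word voice as a `(start, end)` time pair in seconds, are assumptions made for this example; the value chosen for the set time T is likewise hypothetical.

```python
def segment_finished(current_end, next_start, set_time):
    """Apply the rule: no following word, or gap t > T, means the section is done."""
    if next_start is None:        # no isolated word voice after the current one
        return True
    t = next_start - current_end  # time interval t between the two words
    return t > set_time           # t <= T means the section is not finished


def split_into_segments(words, set_time):
    """Group (start, end) word intervals into sections of voice."""
    segments, current = [], []
    for idx, (start, end) in enumerate(words):
        current.append((start, end))
        next_start = words[idx + 1][0] if idx + 1 < len(words) else None
        if segment_finished(end, next_start, set_time):
            segments.append(current)
            current = []
    return segments


# Four words close together, a long pause, then two more words.
words = [(0.0, 0.3), (0.4, 0.7), (0.8, 1.1), (1.2, 1.5), (3.0, 3.3), (3.4, 3.7)]
sections = split_into_segments(words, set_time=0.5)
print([len(s) for s in sections])
```

With T set to 0.5 s, the 1.5 s pause closes the first section, so the two commands are processed separately.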

The technical scheme adopted by the invention for solving the second technical problem is as follows: a range hood applied with a speech recognition method based on isolated words is characterized in that: the range hood comprises a range hood body, a sound acquisition unit and a control circuit board, wherein the sound acquisition unit and the control circuit board are arranged on the range hood body, and the sound acquisition unit is electrically connected with the control circuit board.

Compared with the prior art, the invention has the following advantages. The speech recognition method based on isolated words stores the characteristic parameters corresponding to each isolated word voice to form an isolated word voice characteristic parameter library, and uses as the voice template library the vector data formed from the isolated word voice characteristic parameters of each template voice. The full feature data of each template voice therefore need not be stored, which greatly reduces the amount of stored data, lowers the demands on data storage, and correspondingly lowers hardware cost, so the method lends itself to wide application. It is particularly suitable for household appliances and similar devices controlled by command voices, in which each voice contains only a few isolated words; there it offers a high recognition rate, low cost, and suitability for mass production. A range hood applying the method can be trained with template voices in the various kinds of speech used by family members, recognizes them effectively, and has strong voice recognition capability. While cooking, the user need not operate the range hood by hand but controls it by voice, which is convenient, hygienic and practical.
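To make the storage argument concrete, the following back-of-the-envelope sketch compares storing full feature data per template with storing shared per-word features plus label vectors. Every number in it (template count, words per template, frames per word, coefficients per frame, byte sizes) is a hypothetical assumption chosen for illustration; none comes from the source.

```python
def storage_bytes(n_templates, words_per_template, n_distinct_words,
                  frames_per_word, coeffs_per_frame,
                  bytes_per_value=4, bytes_per_label=1):
    """Return (full, shared) storage in bytes under the stated assumptions."""
    per_word = frames_per_word * coeffs_per_frame * bytes_per_value
    # Naive scheme: every template stores the features of all its words.
    full = n_templates * words_per_template * per_word
    # Shared scheme: each distinct word stored once, templates store labels only.
    shared = (n_distinct_words * per_word
              + n_templates * words_per_template * bytes_per_label)
    return full, shared


full, shared = storage_bytes(n_templates=10, words_per_template=4,
                             n_distinct_words=12, frames_per_word=30,
                             coeffs_per_frame=13)
print(full, shared)
```

The saving grows with the number of templates that reuse the same isolated words, which is the situation the text describes for appliance commands.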

Drawings

FIG. 1 is a flow chart of a method for training template speech in a speech template library according to an embodiment of the present invention;

FIG. 2 is a flowchart illustrating a method for recognizing a speech to be recognized according to an embodiment of the present invention.

Detailed Description

The invention is described in further detail below with reference to the accompanying drawings and embodiments.

The speech recognition method based on isolated words in this embodiment comprises, as a whole, the following steps: recording each template voice, and training to form a voice template library;

collecting a voice to be recognized;

and comparing and calculating the voice to be recognized with the voice template library, and recognizing the template voice content corresponding to the voice to be recognized.

The method is suitable for many applications, but is best suited to command-type voice recognition in which each voice contains few isolated words. For example, in household appliances the voice to be recognized is usually a command such as "turn on", "turn off", "raise" or "lower"; few isolated words are involved, and recognition accuracy is high. This embodiment takes the use of the method in a range hood as an example. The range hood comprises a range hood body, a sound acquisition unit and a control circuit board, wherein the sound acquisition unit and the control circuit board are arranged on the range hood body, and the sound acquisition unit is electrically connected with the control circuit board. The sound acquisition unit collects the recorded template voices and the voice to be recognized, and the training of the voice template library and the recognition of the voice to be recognized are carried out in a controller on the control circuit board. In use, the command voice spoken by the user is the voice to be recognized; once recognition is complete, the control circuit board controls the corresponding components of the range hood to respond accordingly.

The specific implementation method of the speech recognition method based on the isolated words in the range hood is as follows.

As shown in fig. 1, the method for training template voices in a voice template library includes the following steps:

S1, after the training mode is started, the range hood prompts the user to record a voice command, for example "turn on the range hood" (打开烟机) or "turn off the range hood" (关闭烟机); the user records the command, which serves as the template voice of that command; the recorded template voice may be in any variety of speech, such as Mandarin or a dialect;

S2, the controller uses an endpoint detection algorithm to search for the start point and end point of each isolated word voice in the template voice; for example, "打开烟机" (turn on the range hood) contains 4 isolated word voices, and "关闭烟机" (turn off the range hood) also contains 4 isolated word voices;

S3, extracting the detected isolated word voices in their time order within the template voice, applying a mathematical transform to the current isolated word voice to obtain its Mel spectrum, performing cepstral analysis on the Mel spectrum to obtain the Mel-frequency cepstral coefficients (MFCC) of the isolated word voice, and taking the MFCC as the characteristic parameter $A_i$ of the isolated word, where i is a natural number denoting the label of the current isolated word voice and $A_i$ comprises a plurality of parameter data;

S4, using the DTW algorithm to compare the current isolated word voice characteristic parameter $A_i$ with all isolated word voice characteristic parameters in the current isolated word voice characteristic parameter library (in the initial state the library contains no isolated word voice characteristic parameters); if $A_i$ does not exist in the library, storing $A_i$ in the library and recording its label in the library; if $A_i$ already exists in the library, recording the label of the corresponding stored isolated word voice characteristic parameter $A_b$, $b \in i$;

for example, once the characteristic parameters of the isolated word voices "打", "开", "烟" and "机" have been stored in the isolated word voice characteristic parameter library, the characteristic parameters corresponding to "关" and "闭" in the template voice "关闭烟机" (turn off the range hood) still need to be stored in the library, whereas those corresponding to "烟" and "机" in that template voice need not be stored again;

S5, detecting whether there is a further isolated word voice after the current isolated word voice;

if not, judging that the calculation of the characteristic parameters of all the isolated word voices in the template voice is finished, and performing S6;

if yes, calculating the time interval t between the current isolated word voice and the next isolated word voice;

if t is less than or equal to the set time T, judging that the calculation of the characteristic parameters of all the isolated word voices in the template voice is not finished, and returning to S3;

if t is greater than the set time T, judging that the calculation of the characteristic parameters of all the isolated word voices in the template voice is finished, and performing S6;

S6, obtaining the characteristic parameter vector $B_m$ corresponding to the template voice, $B_m = [A_{s1}, A_{s2}, A_{s3}, \ldots, A_{si}, \ldots, A_{sn}]$, where m, si and sn are natural numbers, m denotes the label of the current template voice, sn denotes the number of isolated word voices in the m-th template voice, $s1 \le si \le sn$, and $si \in i$; the labels of the isolated word characteristic parameters in the characteristic parameter vector $B_m$ may repeat;

for example: let the characteristic parameters corresponding to the isolated word voices "打", "开", "烟", "机", "关" and "闭" be $A_1$, $A_2$, $A_3$, $A_4$, $A_5$ and $A_6$ respectively; then the characteristic parameter vector of the template voice "打开烟机" (turn on the range hood) can be written $B_1 = [A_1, A_2, A_3, A_4]$, and that of the template voice "关闭烟机" (turn off the range hood) can be written $B_2 = [A_5, A_6, A_3, A_4]$; the characteristic parameter vector $B_m$ stores only the labels of the characteristic parameters $A_i$ it contains, not the parameter data contained in each $A_i$, which greatly reduces the amount of stored data, correspondingly reduces product cost, and facilitates widespread use of the voice recognition method;

S7, judging whether the characteristic parameter vector $B_m$ corresponding to this section of template voice already exists in the current voice template library; if not, storing $B_m$ in the voice template library;

and S8, repeating S1 to S7 to complete the training of the isolated word voice characteristic parameter library and of each template voice.
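The bookkeeping in steps S4, S6 and S7 can be sketched as below. This is an illustration under assumptions, not the patented implementation: the source does not state how "already exists in the library" is decided, so a distance threshold is assumed here, and `toy_distance` is a simple stand-in for the DTW comparison. The names `match_label` and `train_template` and the one-number toy features are hypothetical.

```python
def toy_distance(a, b):
    """Stand-in for DTW on equal-length toy features (assumption for the demo)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def match_label(features, library, distance, threshold):
    """Label of the closest stored word within threshold, else None (S4)."""
    best_label, best_dist = None, threshold
    for label, stored in library.items():
        d = distance(features, stored)
        if d <= best_dist:
            best_label, best_dist = label, d
    return best_label


def train_template(word_features, command, library, templates,
                   distance=toy_distance, threshold=0.25):
    """Update the word library and template library with one template voice."""
    vector = []
    for features in word_features:
        label = match_label(features, library, distance, threshold)
        if label is None:                  # new isolated word: store it once
            label = len(library) + 1
            library[label] = features
        vector.append(label)               # B_m holds labels only (S6)
    vector = tuple(vector)
    if vector not in templates:            # S7: store only unseen vectors
        templates[vector] = command
    return vector


library, templates = {}, {}
b1 = train_template([[1.0], [2.0], [3.0], [4.0]], "turn on", library, templates)
b2 = train_template([[5.1], [6.0], [3.1], [3.9]], "turn off", library, templates)
print(b1, b2, len(library))
```

In the toy run the second command reuses labels 3 and 4 for its last two words, so only six word entries are stored for eight spoken words.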

Because the users of a household appliance are relatively fixed, different people can each record their own template voices, completing the training for each specific speaker so that each can conveniently control the appliance by voice.

As shown in fig. 2, after the isolated word voice characteristic parameter library and the voice template library have been trained, a user operating the range hood issues a command voice to it, and the controller on the control circuit board treats this command voice as the voice to be recognized. The method for recognizing the voice to be recognized includes the following steps:

S10, the sound acquisition unit in the range hood collects the voice to be recognized uttered by the user, for example "打开烟机" (turn on the range hood); the voice to be recognized uses the same variety of speech as the template voice recorded during training: if the template voice was recorded in Mandarin, the voice to be recognized is also in Mandarin, and if the template voice was recorded in a dialect, the voice to be recognized is also in that dialect;

S20, the controller uses an endpoint detection algorithm to search for the start point and end point of each isolated word voice in the voice to be recognized; for example, "打开烟机" (turn on the range hood) contains 4 isolated word voices;

S30, extracting the detected isolated word voices in their time order within the voice to be recognized, applying a mathematical transform to the current isolated word voice to obtain its Mel spectrum, performing cepstral analysis on the Mel spectrum to obtain the Mel-frequency cepstral coefficients (MFCC) of the isolated word voice, and taking the MFCC as the characteristic parameter $C_k$ of the isolated word, where k is a natural number denoting the label of the current isolated word voice and $C_k$ comprises a plurality of parameter data;

S40, using the DTW algorithm to compare the current isolated word voice characteristic parameter $C_k$ with all isolated word voice characteristic parameters in the isolated word voice characteristic parameter library, determining the isolated word voice characteristic parameter $A_{ai}$ in the library that corresponds to this isolated word voice, where $ai \in i$, and recording the label of $A_{ai}$;

S50, detecting whether there is a further isolated word voice after the current isolated word voice;

if not, the determination of the corresponding isolated word voice characteristic parameters in the library for all isolated word voices in the voice to be recognized is finished, and S60 is performed;

if yes, calculating the time interval t0 between the current isolated word voice and the next isolated word voice;

if t0 is less than or equal to the set time T, the determination of the corresponding isolated word voice characteristic parameters in the library for all isolated word voices in the voice to be recognized is not finished, and the method returns to S30;

if t0 is greater than the set time T, the determination of the corresponding isolated word voice characteristic parameters in the library for all isolated word voices in the voice to be recognized is finished, and S60 is performed;

S60, obtaining the characteristic parameter vector D corresponding to the voice to be recognized, $D = [A_{a1}, A_{a2}, A_{a3}, \ldots, A_{ai}, \ldots, A_{al}]$, where al is a natural number denoting the number of isolated word voices in the voice to be recognized, and $a1 \le ai \le al$; the labels of the isolated word voice characteristic parameters in the characteristic parameter vector D may repeat;

S70, comparing the characteristic parameter vector D corresponding to the voice to be recognized with the characteristic parameter vector $B_m$ of each template voice in the voice template library, and judging whether D exists in the voice template library; if so, the voice to be recognized is recognized according to the content of the matching template voice, and the corresponding components of the range hood are controlled to operate; if not, the voice is judged to be an invalid command, and the method waits for the next section of voice to be recognized to be collected.
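Steps S40 to S70 can be sketched in the same spirit. The sketch assumes the word library is a non-empty mapping from labels to stored features and the template library maps label vectors to command names; both layouts, the `recognize` function and the nearest-neighbour rule are assumptions for illustration. `toy_distance` again stands in for the DTW comparison.

```python
def toy_distance(a, b):
    """Stand-in for DTW on equal-length toy features (assumption for the demo)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def recognize(word_features, library, templates, distance=toy_distance):
    """Map each word to its nearest library label, then look up the vector D.

    Returns the matching command, or None for an invalid instruction.
    Assumes library is non-empty (training has been completed).
    """
    vector = []
    for features in word_features:
        # S40: compare with every stored word and keep the closest label.
        label = min(library, key=lambda lab: distance(features, library[lab]))
        vector.append(label)
    # S70: D is recognized only if it equals some stored template vector.
    return templates.get(tuple(vector))


library = {1: [1.0], 2: [2.0], 3: [3.0], 4: [4.0], 5: [5.0], 6: [6.0]}
templates = {(1, 2, 3, 4): "turn on", (5, 6, 3, 4): "turn off"}

print(recognize([[1.1], [1.9], [3.2], [4.1]], library, templates))
print(recognize([[3.0], [3.0], [1.0], [1.0]], library, templates))
```

Because the final comparison is on short label vectors rather than full feature data, the lookup itself is cheap; the cost lies in the per-word comparisons against the library.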
