Speech recognition robustness enhancement method based on fusion of multi-band speech signal features

Document No.: 344499    Publication date: 2021-12-03

Reading note: This technology, "Speech recognition robustness enhancement method based on fusion of multi-band speech signal features", was created by Cao Fen, Feng Xuan, Li Yonglong, Xiao Jianshu, Tong Shiqi and Qin Shaoming on 2021-09-07. Its main content is as follows: the invention discloses a speech recognition robustness enhancement method based on the fusion of multi-band speech signal features, which comprises the following steps: S1, information is input through a voice input module and features are selected through a feature selection module; a speech signal feature extraction unit then extracts PNCC features from the speech signal in each frequency band, and a speech multi-band feature fusion unit multiplies each band's PNCC features by a weight coefficient and fuses them. The invention relates to the technical field of power grids. With this speech recognition robustness enhancement method, the power grid dispatching control center consumes far less of a dispatcher's time and energy; when a serious grid abnormality occurs, multi-level dispatching coordination is not required; when multiple units are commanded to handle an abnormality jointly, severe information blockage does not occur; and the grid abnormality handling process is recorded automatically.

1. A speech recognition robustness enhancement method based on the fusion of multi-band speech signal features, characterized in that the method comprises the following steps:

S1, information is input through a voice input module and features are selected through a feature selection module; a speech signal feature extraction unit then extracts PNCC (power-normalized cepstral coefficient) features from the speech signal in each frequency band, and a speech multi-band feature fusion unit multiplies each band's PNCC features by a weight coefficient and fuses them;

S2, after the fused PNCC features are produced by the speech multi-band feature fusion unit, a speech recognition acoustic model training unit generates and trains the acoustic model;

S3, the robustness of the speech recognition model is improved through a speech signal z-score normalization unit;

S4, a vector space model (VSM) is constructed: power grid fault cases are segmented into words using a power grid fault dictionary, word frequencies are counted, and a power grid fault entity weight table is applied to obtain the feature vector of each fault case;

S5, the TF-IDF weight of each term in the vector is calculated;

S6, the cosine similarity is calculated;

and S7, similarity is judged from the calculated values: the larger the value, the more similar the information in the two fault case feature vectors, and the more likely the two cases describe the same fault recorded twice or under different names.

2. The method for speech recognition robustness enhancement based on fused multi-band speech signal features of claim 1, wherein: the speech recognition acoustic model training unit in step S2 comprises a training sample acquisition module, an original acoustic model acquisition module, an acoustic feature determination module, a state description model generation module and an acoustic model generation module; after the original acoustic model is acquired by the original acoustic model acquisition module, the acoustic state in the original acoustic model corresponding to each training text is determined.

3. The method for speech recognition robustness enhancement based on fused multi-band speech signal features of claim 2, wherein: the acoustic feature determination module then determines the acoustic features corresponding to each acoustic state from the acoustic states and acoustic features of the training texts, and the state description model generation module retrains with the acoustic features of each acoustic state to obtain a state description model for that state.

4. The method for speech recognition robustness enhancement based on fused multi-band speech signal features of claim 2, wherein: finally, the acoustic model generation module replaces the original state description models in the original acoustic model with the retrained state description models to obtain the updated acoustic model.

5. The method for speech recognition robustness enhancement based on fused multi-band speech signal features of claim 1, wherein: the speech signal z-score normalization unit in step S3 comprises a low-frequency signal enhancement module, a high-frequency signal suppression module and a band weight coefficient adjustment module; the low-frequency signal enhancement module enhances the features of the low-frequency (human speech) signal while the high-frequency signal suppression module suppresses the features of the high-frequency (non-speech) signal, and the weight coefficients of the speech signals in the different frequency bands are adjusted accordingly, thereby improving the robustness of the speech recognition model.

6. The method for speech recognition robustness enhancement based on fused multi-band speech signal features of claim 1, wherein: in step S4, the grid fault case feature vector consists of n weights, one per word; the weight a word carries in a document determines how strongly that word influences the document's relevance.

7. The method for speech recognition robustness enhancement based on fused multi-band speech signal features of claim 1, wherein: TF-IDF in step S5 denotes the product of TF (term frequency) and IDF (inverse document frequency), that is, TF-IDF = term frequency (TF) × inverse document frequency (IDF).

Technical Field

The invention relates to the technical field of power grids, and in particular to a speech recognition robustness enhancement method based on the fusion of multi-band speech signal features.

Background

With the rapid development of the ultra-high-voltage AC/DC synchronous power grid, the large-scale centralized integration of new energy and the deepening reform of the electric power system, the coordinated handling of power grid abnormalities faces serious challenges. To further improve the emergency handling of grid abnormalities, integrated regulation and control and global allocation of resources must be adopted; in particular, under abnormal grid conditions, joint early warning, coordinated pre-control and overall handling must be achieved, which places very high demands on the cooperation of organizations at all levels. The grid regulation and control centers at each level of the power system organize, command, guide and coordinate grid operation, and the dispatchers of these centers, as the direct commanders of grid operation, stand on the front line of that work. As the national interconnection scale keeps expanding and voltage grades keep rising, the power grid has surpassed the traditional ultra-high-voltage AC grid in the number of devices, interconnection modes, coupling characteristics and complexity; the grid is increasingly affected by natural, human and internal factors; and the workload, complexity and pressure borne by dispatchers grow by the day. Therefore, the power sector applies artificial intelligence so that dispatchers can interact with machines through speech for dispatching operations: routine and recurring work can be processed under the supervision and command of the dispatcher, highly repetitive work can be taken over by machines in place of people, the intelligence level of dispatching operations is raised, work efficiency is improved, human error is eliminated, and the safe and stable operation of the power grid is ensured.

The existing power grid dispatching control center consumes a great deal of a dispatcher's time and energy: when a serious grid abnormality occurs, multi-level dispatching coordination is needed to command multiple units to handle the abnormality jointly; busy telephone lines cause severe information blockage; and the grid abnormality handling process has to be recorded manually.

Disclosure of Invention

Aiming at the defects of the prior art, the invention provides a speech recognition robustness enhancement method based on the fusion of multi-band speech signal features, which solves the problems that the existing power grid dispatching control center consumes a great deal of a dispatcher's time and energy, that multi-level dispatching coordination is required when a serious grid abnormality occurs and multiple units must be commanded to handle the abnormality jointly, that busy telephone lines cause severe information blockage, and that the grid abnormality handling process must be recorded manually.

In order to achieve the above purpose, the invention is realized through the following technical scheme: the speech recognition robustness enhancement method based on the fusion of multi-band speech signal features comprises the following steps:

S1, information is input through a voice input module and features are selected through a feature selection module; a speech signal feature extraction unit then extracts PNCC features from the speech signal in each frequency band, and a speech multi-band feature fusion unit multiplies each band's PNCC features by a weight coefficient and fuses them;

S2, after the fused PNCC features are produced by the speech multi-band feature fusion unit, a speech recognition acoustic model training unit generates and trains the acoustic model;

S3, the robustness of the speech recognition model is improved through a speech signal z-score normalization unit;

S4, a vector space model (VSM) is constructed: power grid fault cases are segmented into words using a power grid fault dictionary, word frequencies are counted, and a power grid fault entity weight table is applied to obtain the feature vector of each fault case;

S5, the TF-IDF weight of each term in the vector is calculated;

S6, the cosine similarity is calculated;

and S7, similarity is judged from the calculated values: the larger the value, the more similar the information in the two fault case feature vectors, and the more likely the two cases describe the same fault recorded twice or under different names.

Preferably, the speech recognition acoustic model training unit in step S2 comprises a training sample acquisition module, an original acoustic model acquisition module, an acoustic feature determination module, a state description model generation module and an acoustic model generation module; after the original acoustic model is acquired by the original acoustic model acquisition module, the acoustic state in the original acoustic model corresponding to each training text is determined.

Preferably, the acoustic feature determination module then determines the acoustic features corresponding to each acoustic state from the acoustic states and acoustic features of the training texts, and the state description model generation module retrains with the acoustic features of each acoustic state to obtain a state description model for that state.

Preferably, the acoustic model generation module finally replaces the original state description models in the original acoustic model with the retrained state description models to obtain the updated acoustic model.

Preferably, the speech signal z-score normalization unit in step S3 comprises a low-frequency signal enhancement module, a high-frequency signal suppression module and a band weight coefficient adjustment module; the low-frequency signal enhancement module enhances the features of the low-frequency (human speech) signal while the high-frequency signal suppression module suppresses the features of the high-frequency (non-speech) signal, and the weight coefficients of the speech signals in the different frequency bands are adjusted accordingly, thereby improving the robustness of the speech recognition model.

Preferably, in step S4, the grid fault case feature vector consists of n weights, one per word; the weight a word carries in a document determines how strongly that word influences the document's relevance.

Preferably, TF-IDF in step S5 denotes the product of TF (term frequency) and IDF (inverse document frequency), that is, TF-IDF = term frequency (TF) × inverse document frequency (IDF).

Advantageous effects

The invention provides a speech recognition robustness enhancement method based on the fusion of multi-band speech signal features. Compared with the prior art, the method has the following beneficial effects:

The speech recognition robustness enhancement method based on the fusion of multi-band speech signal features comprises the following steps: S1, information is input through a voice input module and features are selected through a feature selection module; a speech signal feature extraction unit then extracts PNCC features from the speech signal in each frequency band, and a speech multi-band feature fusion unit multiplies each band's PNCC features by a weight coefficient and fuses them; S2, after the fused PNCC features are produced by the speech multi-band feature fusion unit, a speech recognition acoustic model training unit generates and trains the acoustic model; S3, the robustness of the speech recognition model is improved through a speech signal z-score normalization unit; S4, a vector space model (VSM) is constructed: power grid fault cases are segmented into words using a power grid fault dictionary, word frequencies are counted, and a power grid fault entity weight table is applied to obtain the feature vector of each fault case; S5, the TF-IDF weight of each term in the vector is calculated; S6, the cosine similarity is calculated; S7, similarity is judged from the calculated values: the larger the value, the more similar the information in the two fault case feature vectors, and the more likely the two cases describe the same fault recorded twice or under different names. As a result, the power grid dispatching control center consumes far less of a dispatcher's time and energy; when a serious grid abnormality occurs, multi-level dispatching coordination is no longer required; when multiple units are commanded to handle an abnormality jointly, severe information blockage does not occur even if telephone lines are busy; and the grid abnormality handling process is recorded automatically.

Drawings

FIG. 1 is a schematic view of the overall structure of the present invention;

FIG. 2 is a schematic diagram of a speech recognition acoustic model training unit according to the present invention;

FIG. 3 is a schematic diagram of the speech signal z-score normalization unit of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Referring to FIGS. 1-3, the present invention provides a technical solution: the speech recognition robustness enhancement method based on the fusion of multi-band speech signal features comprises the following steps:

S1, information is input through a voice input module and features are selected through a feature selection module; a speech signal feature extraction unit then extracts PNCC features from the speech signal in each frequency band, and a speech multi-band feature fusion unit multiplies each band's PNCC features by a weight coefficient and fuses them (a minimal sketch of this fusion follows the step list);

S2, after the fused PNCC features are produced by the speech multi-band feature fusion unit, a speech recognition acoustic model training unit generates and trains the acoustic model;

S3, the robustness of the speech recognition model is improved through a speech signal z-score normalization unit;

S4, a vector space model (VSM) is constructed: power grid fault cases are segmented into words using a power grid fault dictionary, word frequencies are counted, and a power grid fault entity weight table is applied to obtain the feature vector of each fault case;

S5, the TF-IDF weight of each term in the vector is calculated;

S6, the cosine similarity is calculated;

and S7, similarity is judged from the calculated values: the larger the value, the more similar the information in the two fault case feature vectors, and the more likely the two cases describe the same fault recorded twice or under different names.
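
The weighted fusion of steps S1 and S2 can be illustrated with a minimal sketch. It assumes that the PNCC features of each frequency band have already been extracted as a frames-by-coefficients matrix and that fusion is a weighted sum of the per-band feature matrices; the band weights below are illustrative values, not taken from this disclosure.

```python
import numpy as np

def fuse_multiband_pncc(band_features, band_weights):
    """Multiply each band's PNCC feature matrix by its weight coefficient and sum them.

    band_features: list of (num_frames, num_coeffs) arrays, one per frequency band.
    band_weights:  list of scalar weight coefficients, one per band.
    """
    if len(band_features) != len(band_weights):
        raise ValueError("one weight coefficient is required per frequency band")
    fused = np.zeros_like(band_features[0])
    for feats, weight in zip(band_features, band_weights):
        fused += weight * feats
    return fused

# Illustrative usage: three frequency bands, 200 frames, 13 PNCC coefficients each.
rng = np.random.default_rng(0)
bands = [rng.normal(size=(200, 13)) for _ in range(3)]
weights = [0.5, 0.3, 0.2]            # hypothetical low/mid/high band weights
fused_pncc = fuse_multiband_pncc(bands, weights)
print(fused_pncc.shape)              # (200, 13), ready for acoustic model training (S2)
```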

In this embodiment of the present invention, the speech recognition acoustic model training unit in step S2 comprises a training sample acquisition module, an original acoustic model acquisition module, an acoustic feature determination module, a state description model generation module and an acoustic model generation module; after the original acoustic model is acquired by the original acoustic model acquisition module, the acoustic state in the original acoustic model corresponding to each training text is determined.

In the embodiment of the invention, the acoustic feature determination module then determines the acoustic features corresponding to each acoustic state from the acoustic states and acoustic features of the training texts, and the state description model generation module retrains with the acoustic features of each acoustic state to obtain a state description model for that state.
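
A simplified sketch of this per-state retraining is given below. It assumes the original acoustic model has already aligned each training utterance's frames to acoustic states, and it approximates each state description model with a single diagonal Gaussian; the disclosure does not specify the model family, so this is only an assumption for illustration.

```python
import numpy as np
from collections import defaultdict

def retrain_state_models(aligned_frames):
    """aligned_frames: iterable of (state_id, 1-D feature vector) pairs
    produced by aligning each training text's frames to acoustic states."""
    per_state = defaultdict(list)
    for state_id, feats in aligned_frames:
        per_state[state_id].append(feats)
    state_models = {}
    for state_id, frames in per_state.items():
        stacked = np.vstack(frames)
        # Diagonal-Gaussian state description model: mean and variance per coefficient.
        state_models[state_id] = {
            "mean": stacked.mean(axis=0),
            "var": stacked.var(axis=0) + 1e-6,   # variance floor to avoid zeros
        }
    return state_models

def update_acoustic_model(original_model, retrained_state_models):
    """Replace the original state description models with the retrained ones
    (the acoustic model generation step described next)."""
    updated = dict(original_model)
    updated.update(retrained_state_models)
    return updated

# Illustrative usage with two states and random 13-dimensional features.
rng = np.random.default_rng(0)
aligned = [("s1", rng.normal(size=13)) for _ in range(50)] + \
          [("s2", rng.normal(size=13)) for _ in range(50)]
models = retrain_state_models(aligned)
print(sorted(models))                 # ['s1', 's2']
```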

In the embodiment of the invention, the acoustic model generation module then replaces the original state description models in the original acoustic model with the retrained state description models to obtain the updated acoustic model.

In this embodiment of the present invention, the speech signal z-score normalization unit in step S3 comprises a low-frequency signal enhancement module, a high-frequency signal suppression module and a band weight coefficient adjustment module; the low-frequency signal enhancement module enhances the features of the low-frequency (human speech) signal while the high-frequency signal suppression module suppresses the features of the high-frequency (non-speech) signal, and the weight coefficients of the speech signals in the different frequency bands are adjusted accordingly, thereby improving the robustness of the speech recognition model.
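
The following sketch illustrates one possible reading of the z-score normalization unit: per-band features are standardized to zero mean and unit variance, then the weight coefficients of bands below a cutoff frequency (human speech) are boosted and those above it (non-speech) are attenuated. The cutoff frequency and gain values are hypothetical, not taken from this disclosure.

```python
import numpy as np

def zscore(features):
    """Standardize each coefficient to zero mean and unit variance (step S3)."""
    return (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-8)

def adjust_band_weights(band_weights, band_centers_hz,
                        cutoff_hz=4000.0, low_gain=1.2, high_gain=0.6):
    """Boost bands below cutoff_hz (human speech) and attenuate bands above it,
    then renormalize so the weight coefficients sum to one."""
    adjusted = [w * (low_gain if center < cutoff_hz else high_gain)
                for w, center in zip(band_weights, band_centers_hz)]
    total = sum(adjusted)
    return [w / total for w in adjusted]

# Illustrative usage with three bands centered at 500 Hz, 2 kHz and 6 kHz.
rng = np.random.default_rng(0)
band_feats = rng.normal(loc=3.0, scale=2.0, size=(200, 13))
normalized = zscore(band_feats)                       # per-coefficient mean ~0, std ~1
new_weights = adjust_band_weights([1/3, 1/3, 1/3], [500.0, 2000.0, 6000.0])
print(normalized.mean(axis=0).round(3))
print([round(w, 3) for w in new_weights])             # low/mid bands boosted, high band suppressed
```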

In the embodiment of the present invention, the grid fault case feature vector in step S4 consists of n weights, one per word; the weight a word carries in a document determines how strongly that word influences the document's relevance.
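
A minimal sketch of the step S4 feature vector follows. The fault entity weight table, vocabulary and example tokens are hypothetical placeholders; a real system would segment fault reports with the power grid fault dictionary and use its own entity weight table.

```python
from collections import Counter

# Hypothetical power grid fault entity weight table (illustrative values only).
FAULT_ENTITY_WEIGHTS = {
    "transformer": 2.0,
    "busbar": 1.5,
    "overload": 1.8,
    "trip": 1.2,
}

def case_feature_vector(tokens, vocabulary):
    """Term frequency of each vocabulary word in one fault case,
    scaled by its entity weight (words outside the table get weight 1.0)."""
    counts = Counter(tokens)
    return [counts[word] * FAULT_ENTITY_WEIGHTS.get(word, 1.0) for word in vocabulary]

# Illustrative usage: tokens as produced by segmenting one fault case with the fault dictionary.
vocabulary = sorted(["transformer", "busbar", "overload", "trip", "alarm"])
tokens = ["transformer", "overload", "trip", "transformer", "alarm"]
print(case_feature_vector(tokens, vocabulary))
```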

In the embodiment of the present invention, TF-IDF in step S5 denotes the product of TF (term frequency) and IDF (inverse document frequency), that is, TF-IDF = term frequency (TF) × inverse document frequency (IDF).
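
Steps S5 to S7 can be sketched with the standard TF-IDF and cosine similarity formulas; the disclosure does not specify a TF-IDF variant, so the usual smoothed definition is assumed here, and the two example cases are hypothetical.

```python
import math
from collections import Counter

def tf_idf_vector(tokens, corpus, vocabulary):
    """TF-IDF weight of every vocabulary term for one segmented fault case (step S5)."""
    n_docs = len(corpus)
    counts = Counter(tokens)
    vector = []
    for word in vocabulary:
        tf = counts[word] / max(len(tokens), 1)                # term frequency
        df = sum(1 for doc in corpus if word in doc)           # document frequency
        idf = math.log((n_docs + 1) / (df + 1)) + 1.0          # smoothed inverse document frequency
        vector.append(tf * idf)
    return vector

def cosine_similarity(a, b):
    """Cosine similarity between two fault case feature vectors (step S6)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Step S7: a value close to 1 suggests the two cases describe the same fault.
case1 = ["transformer", "overload", "trip"]
case2 = ["transformer", "trip", "alarm"]
corpus = [case1, case2]
vocabulary = sorted(set(case1) | set(case2))
v1 = tf_idf_vector(case1, corpus, vocabulary)
v2 = tf_idf_vector(case2, corpus, vocabulary)
print(round(cosine_similarity(v1, v2), 3))
```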

Those details not described in this specification are well within the skill of those in the art.

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.

Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
