Voice denoising method, voice recognition method and computer readable storage medium

Document No.: 1339709 Publication date: 2020-07-17

Abstract: The technology "Voice denoising method, voice recognition method and computer readable storage medium" (一种语音去噪方法、语音识别方法及计算机可读存储介质) was created by 张广学, 肖龙源, 蔡振华, 李稀敏 and 刘晓葳 on 2020-03-03. The invention relates to the technical field of computers and provides a voice denoising method comprising the following steps: acquiring a voice to be processed; determining the type and the corresponding position range of each noise contained in the voice to be processed; acquiring feature information of each noise based on the type of that noise; determining the start-stop position of each noise based on the feature information and the position range; and denoising the noise based on the feature information and the start-stop position. The method recognizes the various noises in the voice to be processed and then applies reverse compensation to them based on their position information and feature information, thereby denoising the voice.

1. A method for denoising speech, the method comprising the steps of:

acquiring a voice to be processed;

determining the type and the corresponding position range of each noise contained in the voice to be processed;

respectively acquiring characteristic information of each noise based on the type of the noise;

specifying a start-stop position of the noise based on the feature information and the position range;

and denoising the noise based on the characteristic information and the start-stop position.

2. The method of claim 1, wherein the determining the type and corresponding location range of each noise included in the speech to be processed specifically comprises:

and identifying the voice to be processed based on a noise identification model so as to obtain each noise type and corresponding position range contained in the voice to be processed.

3. The method of claim 1, wherein the noise identification model is obtained by a method comprising:

collecting noise corpora and marking the type of the noise corpora;

constructing a noise identification algorithm based on a time delay neural network;

and training the noise recognition algorithm based on the marked noise corpus to obtain the noise recognition model.

4. The method of claim 1, wherein the obtaining the feature information of each noise based on the type of the noise comprises:

and inquiring a preset noise library based on the type of each noise so as to obtain the characteristic information of each noise.

5. The method of claim 1, wherein the characteristic information comprises one or more of a fixed frequency, a periodic variation, and a trend of variation.

6. The method of claim 1, wherein said specifying a start-stop position of the noise based on the feature information and the position range comprises:

determining the voice segment to be processed corresponding to the noise based on the position range;

and matching the feature information of the noise against the feature information contained in the segment so as to determine the start-stop position of the noise in the voice to be processed.

7. The method of claim 1, wherein the denoising the noise based on the feature information and the start-stop position specifically comprises:

determining the voice data to be processed corresponding to the noise based on the starting and stopping positions;

and performing reverse compensation on the voice data to be processed based on the characteristic information.

8. A speech recognition method, characterized in that the method specifically comprises the steps of:

denoising the speech to be recognized based on the speech denoising method of any one of claims 1 to 7;

and recognizing the denoised voice to be recognized based on a voice recognition model.

9. A computer-readable storage medium, storing a computer program, wherein the computer program, when executed by a processor, implements the speech denoising method according to any one of claims 1 through 7.

10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the speech recognition method of claim 8.

Technical Field

The present invention relates to the field of computer information technologies, and in particular, to a speech denoising method, a speech recognition method, and a computer-readable storage medium.

Background

Speech recognition technology, also known as Automatic Speech Recognition (ASR), aims to convert the vocabulary content of human speech into computer-readable input such as keystrokes, binary codes or character sequences. It differs from speaker recognition and speaker verification, which attempt to recognize or verify the speaker who uttered the speech rather than the vocabulary content contained in it.

With the progress of data processing technology and the rapid spread of the mobile internet, computer technology is widely applied in various fields of society and mass data is generated. Among these data, voice data is receiving more and more attention. Speech recognition is an interdisciplinary field. Over the last two decades, speech recognition technology has made significant progress and has started to move from the laboratory to the market. It is expected that within the next 10 years speech recognition technology will enter fields such as industry, home appliances, communications, automotive electronics, medical care, home services and consumer electronics.

The increasing application requirements also put forward higher requirements on the accuracy of voice recognition, and how to implement voice denoising to purify data to be recognized so as to improve recognition accuracy becomes an important research topic in the industry.

Disclosure of Invention

In view of the above problem, an embodiment of the present invention provides a speech denoising method, including: acquiring a voice to be processed; determining the type and the corresponding position range of each noise contained in the voice to be processed; respectively acquiring characteristic information of each noise based on the type of the noise; specifying a start-stop position of the noise based on the feature information and the position range; and denoising the noise based on the characteristic information and the start-stop position. The voice denoising method provided by this embodiment recognizes various noises in the voice to be processed, and further performs reverse compensation on the noises based on the position information and the feature information of the noises, thereby achieving the purpose of voice denoising.

Based on the same inventive concept, the embodiment of the invention also provides a voice recognition method, which specifically comprises the following steps: denoising the voice to be recognized based on the voice denoising method; and recognizing the denoised voice to be recognized based on a voice recognition model.

An embodiment of the invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above speech denoising method and/or the above speech recognition method.

In one implementation, determining the type and the corresponding position range of each noise contained in the voice to be processed specifically includes: recognizing the voice to be processed based on a noise identification model, so as to obtain each noise type and the corresponding position range contained in the voice to be processed.

In one implementation, the method for acquiring the noise identification model includes: collecting noise corpora and marking the type of the noise corpora; constructing a noise identification algorithm based on a time delay neural network; and training the noise recognition algorithm based on the marked noise corpus to obtain the noise recognition model.

In one implementation, the characteristic information includes one or more of a fixed frequency, a periodic variation, and a variation trend.

In one implementation, determining the start-stop position of the noise based on the feature information and the position range specifically includes: determining the voice segment to be processed corresponding to the noise based on the position range; and matching the feature information of the noise against the feature information contained in the segment so as to determine the start-stop position of the noise in the voice to be processed.

In one implementation, the denoising the noise based on the feature information and the start-stop position specifically includes: determining the voice data to be processed corresponding to the noise based on the starting and stopping positions; and performing reverse compensation on the voice data to be processed based on the characteristic information.

Drawings

One or more embodiments are illustrated by way of example in the accompanying drawings, in which like reference numerals refer to similar elements. The drawings are not to scale unless otherwise specified.

FIG. 1 is a flow chart of a speech denoising method according to a first embodiment of the present invention;

FIG. 2 is a flow chart showing a method for obtaining a noise identification model according to the first embodiment;

FIG. 3 is a flow chart of a method for determining the start-stop position of the noise according to the first embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments of the present invention will be described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will appreciate that numerous technical details are set forth in the embodiments only to provide a better understanding of the present application. The technical solution claimed in the present application can be implemented without these technical details, and with various changes and modifications based on the following embodiments.

In a first embodiment of the present invention, a speech denoising method is provided in which the noise contained in the speech to be processed is first preliminarily identified, the feature information of each noise is then acquired, and denoising is performed based on that feature information, so as to improve the accuracy of noise recognition.

Referring to FIG. 1, FIG. 1 is a flowchart of the voice denoising method according to the first embodiment of the present invention. As shown in FIG. 1, the method specifically includes the following steps:

Step 101, obtaining a voice to be processed.

In this embodiment, the speech to be processed may be speech data that needs to be subjected to denoising processing alone, or may also be speech data to be recognized, which is not limited in the present invention. The voice to be processed may be a voice data segment or a set of complete voice data. The service terminal can directly acquire the voice to be processed from the sound source through the acquisition equipment, and can also acquire the voice from other equipment or applications.

Step 102, determining the type and the corresponding position range of each noise included in the speech to be processed.

In this embodiment, the speech to be processed may be recognized based on a noise recognition model to obtain each noise type and the corresponding position range contained in the speech to be processed. The noise recognition model can be constructed and trained in advance.

Specifically, referring to FIG. 2, FIG. 2 is a flowchart of a method for obtaining the noise identification model. The method comprises the following steps:

Step 201, collecting noise corpora, and marking the type of the noise corpora.

Specifically, the noise corpus may contain non-speech sounds from various everyday scenes, such as car horns, doors closing, knocking on a table, coughing and thunder. The noise may be collected directly from a sound source or obtained from a network resource library. While collecting noise corpora, the collected corpora can also be screened. For example, they can be tested against a voice recognition model: if the recognition rate of the voice recognition model is low in a certain noise environment, that noise corpus can be added to the training set; if the recognition rate is not affected, the noise corpus can be screened out.
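The screening rule described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `screen_noise_corpora`, the `min_drop` threshold, the clean-speech baseline and the recognition rates are all hypothetical and are not specified in the source.

```python
from typing import Callable, Dict, List


def screen_noise_corpora(
    noise_names: List[str],
    recognition_rate: Callable[[str], float],
    clean_rate: float,
    min_drop: float = 0.05,  # assumed threshold, not taken from the source
) -> List[str]:
    """Keep the noises that lower the recognition rate by at least `min_drop`."""
    kept = []
    for name in noise_names:
        drop = clean_rate - recognition_rate(name)
        if drop >= min_drop:
            kept.append(name)  # harmful to recognition: add to the training set
    return kept


# Hypothetical recognition rates measured with each noise mixed into test speech.
measured: Dict[str, float] = {"car_horn": 0.71, "door_closing": 0.80, "page_turn": 0.94}

training_set = screen_noise_corpora(list(measured), measured.__getitem__, clean_rate=0.95)
print(training_set)
```

In this toy data, the noise that barely changes the recognition rate is left out of the training set.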

After the noise corpora are collected and screened, they can be classified. The classification dimensions can be chosen based on actual application requirements. For example, noise can be classified by the physical characteristics of the noise source, into aerodynamic noise, mechanical noise, electromagnetic noise and the like; by the time characteristics of the noise source, into steady-state noise, unsteady-state noise, impulse noise and the like; or by frequency components, into low-frequency noise, intermediate-frequency noise, high-frequency noise and the like. Preferably, the noise corpus is classified in multiple dimensions.

After the classification of the noise corpus is determined, the collected noise corpus can be marked, and different identification values can be set for each classification, so that the noise corpus can be marked based on the identification values corresponding to the classifications to which the noise corpuses belong.
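As a minimal sketch of multi-dimensional marking, each classification can be mapped to an identification value and each recording tagged with one value per dimension. The concrete identification values, the `NoiseLabel` type and the file names below are hypothetical; the source only states that different identification values can be set for each classification.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical identification values for the three classification dimensions.
PHYSICAL = {"aerodynamic": 0, "mechanical": 1, "electromagnetic": 2}
TEMPORAL = {"steady": 0, "unsteady": 1, "impulse": 2}
FREQUENCY = {"low": 0, "mid": 1, "high": 2}


@dataclass(frozen=True)
class NoiseLabel:
    physical: int
    temporal: int
    frequency: int


def mark(physical: str, temporal: str, frequency: str) -> NoiseLabel:
    """Translate the class names of one recording into identification values."""
    return NoiseLabel(PHYSICAL[physical], TEMPORAL[temporal], FREQUENCY[frequency])


# Hypothetical corpus files and their classes.
labels: Dict[str, NoiseLabel] = {
    "door_closing.wav": mark("mechanical", "impulse", "low"),
    "fan.wav": mark("aerodynamic", "steady", "mid"),
}
print(labels["door_closing.wav"])
```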

Step 202, a noise identification algorithm is constructed based on the time delay neural network.

In this embodiment, a Time-Delay Neural Network (TDNN) may be used to construct the noise identification algorithm.
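For readers unfamiliar with TDNNs, the core operation of one time-delay layer can be sketched in plain Python: every output frame is computed from the input frames at a fixed set of time offsets, with the same weights shared across time. The layer size, the offsets and the ReLU activation are illustrative assumptions; the source does not describe the network configuration.

```python
from typing import List, Sequence

Frame = List[float]


def tdnn_layer(frames: List[Frame], offsets: Sequence[int],
               weights: List[List[float]], bias: List[float]) -> List[Frame]:
    """One time-delay layer: output frame t sees the input frames at t + offset."""
    first = max(0, -min(offsets))            # earliest t with full left context
    last = len(frames) - max(0, max(offsets))  # one past the latest t with full right context
    outputs = []
    for t in range(first, last):
        # Splice the context frames into one vector.
        spliced = [x for off in offsets for x in frames[t + off]]
        # Shared affine transform followed by ReLU.
        out = [max(0.0, sum(w * x for w, x in zip(row, spliced)) + b)
               for row, b in zip(weights, bias)]
        outputs.append(out)
    return outputs


# Four 1-dimensional frames, context {-1, 0, +1}, a single output unit that sums its inputs.
frames = [[1.0], [2.0], [3.0], [4.0]]
out = tdnn_layer(frames, offsets=(-1, 0, 1), weights=[[1.0, 1.0, 1.0]], bias=[0.0])
print(out)
```

Stacking several such layers with wider offsets lets deeper layers cover a longer time span, which is what makes the structure suitable for locating noise within an utterance.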

Step 203, training the noise recognition algorithm based on the labeled noise corpus to obtain the noise recognition model.

In this implementation, the noise corpus may be converted into frequency-domain features, and the frequency-domain features are then used directly for model training, so that the noise features are not lost.
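One simple way to obtain frequency-domain features is a framed magnitude spectrum. The sketch below uses a naive DFT from the standard library so that it stays self-contained; the frame length, the hop size and the choice of a plain magnitude spectrum are assumptions, since the source does not name a specific feature type.

```python
import cmath
import math
from typing import List


def frame_spectrum(frame: List[float]) -> List[float]:
    """Magnitudes of DFT bins 0..N/2 of one frame (naive O(N^2) DFT)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
        mags.append(abs(acc))
    return mags


def spectrogram(samples: List[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split the samples into frames and compute the magnitude spectrum of each."""
    return [frame_spectrum(samples[s:s + frame_len])
            for s in range(0, len(samples) - frame_len + 1, hop)]


# A test tone that completes exactly two cycles per 16-sample frame.
samples = [math.sin(2 * math.pi * 2 * i / 16) for i in range(32)]
feats = spectrogram(samples, frame_len=16, hop=16)
peak_bin = max(range(len(feats[0])), key=feats[0].__getitem__)
print(len(feats), peak_bin)
```

In practice an FFT routine would replace the naive DFT; the output layout (one feature vector per frame) is what matters for training.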

In the trained noise recognition model, the input layer receives voice data, and the output layer gives the noise types and the position ranges of the noise contained in the voice data.

It should be noted that the speech to be processed may include a plurality of noises, and the noises included in the speech to be processed may be identified by the noise identification model, and the noise type and the position range in the speech to be processed of each noise may be correspondingly output.

Step 103, acquiring feature information of each noise based on the type of the noise.

In this implementation, a preset noise library may be queried based on the type of each noise to obtain the feature information of each noise. The data in the noise library is collected in advance and may be derived from the noise corpus collected for the noise identification model. The library includes a noise type field and a feature information field and records the feature information corresponding to each noise type, where the feature information may include one or more of a fixed frequency, a periodic variation, and a variation trend.
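The noise library lookup can be pictured as a keyed table. In the sketch below, the `NoiseFeature` fields mirror the three kinds of feature information named in the text, while the noise type names, the numeric values and the detection format `(noise_type, (start_s, end_s))` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class NoiseFeature:
    fixed_frequency_hz: Optional[float] = None  # fixed frequency, if any
    period_s: Optional[float] = None            # periodic variation, if any
    trend: Optional[str] = None                 # variation trend, if any


# Hypothetical preset noise library: noise type -> feature information.
NOISE_LIBRARY: Dict[str, NoiseFeature] = {
    "mains_hum": NoiseFeature(fixed_frequency_hz=50.0),
    "alarm_beep": NoiseFeature(fixed_frequency_hz=1000.0, period_s=0.5),
    "thunder": NoiseFeature(trend="decaying"),
}

Detection = Tuple[str, Tuple[float, float]]  # (noise type, coarse position range in seconds)


def lookup_features(detections: List[Detection]):
    """Attach the library's feature information to each detected noise."""
    return [(kind, rng, NOISE_LIBRARY[kind])
            for kind, rng in detections if kind in NOISE_LIBRARY]


found = lookup_features([("alarm_beep", (1.2, 2.0)), ("unknown", (3.0, 3.5))])
print(found)
```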

Step 104, determining the start-stop position of the noise based on the feature information and the position range.

Through the above steps, the noise type, position range and feature information of each noise contained in the speech to be processed are determined. In this step, the method shown in FIG. 3 can be used to determine the start-stop position of the noise based on the feature information and the position range.

As shown in FIG. 3, the method comprises the following steps:

Step 301, determining the speech segment to be processed corresponding to the noise based on the position range.

Specifically, a speech segment including the noise may be determined from the speech to be processed based on the position range, and it is understood that, if the speech to be processed includes a plurality of noises, the speech segment including the noises may be acquired based on the position range corresponding to each noise.

Step 302, matching the feature information of the noise against the feature information contained in the segment, so as to determine the start-stop position of the noise in the speech to be processed.

As described above, the feature information may include one or more of a fixed frequency, a periodic variation, and a variation trend. In this step, the corresponding feature information of the voice segment is extracted for each item contained in the feature information of the noise. By matching the two, the specific position of the noise within the voice segment can be determined, and the start-stop position of the noise in the voice to be processed can then be obtained accurately.
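For the fixed-frequency case, the matching can be sketched as a frame-by-frame power measurement at the known frequency inside the coarse position range. This is one possible reading of the step, not the method prescribed by the source; the frame length, the power threshold and the function names are assumptions, and the result is only accurate to one frame.

```python
import math
from typing import List, Optional, Tuple


def tone_power(frame: List[float], freq_hz: float, rate: int) -> float:
    """Normalized power of `frame` at `freq_hz` (a single-bin DFT)."""
    re = sum(x * math.cos(2 * math.pi * freq_hz * i / rate) for i, x in enumerate(frame))
    im = sum(x * math.sin(2 * math.pi * freq_hz * i / rate) for i, x in enumerate(frame))
    return (re * re + im * im) / (len(frame) ** 2)


def refine_start_stop(samples: List[float], coarse: Tuple[int, int], freq_hz: float,
                      rate: int, frame_len: int = 80,
                      threshold: float = 0.01) -> Optional[Tuple[int, int]]:
    """Narrow a coarse sample range to the frames that contain the tone."""
    start, stop = coarse
    hits = [s for s in range(start, stop - frame_len + 1, frame_len)
            if tone_power(samples[s:s + frame_len], freq_hz, rate) > threshold]
    if not hits:
        return None  # the feature was not found in the segment
    return hits[0], hits[-1] + frame_len


# Toy signal: silence with a 1000 Hz tone occupying samples 240..559 at 8 kHz.
RATE = 8000
signal = [math.sin(2 * math.pi * 1000 * n / RATE) if 240 <= n < 560 else 0.0
          for n in range(800)]
refined = refine_start_stop(signal, coarse=(160, 640), freq_hz=1000.0, rate=RATE)
print(refined)
```

Periodic variation or a variation trend would need a different matcher, for example one that compares the energy envelope of the segment with the expected pattern.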

Step 105, denoising the noise based on the feature information and the start-stop position.

In this implementation, the to-be-processed voice data corresponding to each noise can be determined based on the start-stop position corresponding to each noise, so that the to-be-processed voice data can be subjected to voice reverse compensation based on the feature information of each noise, thereby implementing denoising processing.
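The source does not spell out how reverse compensation is computed. One plausible interpretation for a fixed-frequency noise, shown here purely as a sketch under that assumption, is to estimate the amplitude and phase of the tone inside the start-stop span and subtract the estimate, which is equivalent to adding a phase-inverted copy. The function name and the toy signal are hypothetical.

```python
import math
from typing import List, Tuple


def reverse_compensate(samples: List[float], span: Tuple[int, int],
                       freq_hz: float, rate: int) -> List[float]:
    """Subtract the best-fitting sinusoid of `freq_hz` from samples[start:stop]."""
    start, stop = span
    n = stop - start
    w = 2 * math.pi * freq_hz / rate
    # Project the span onto cos and sin; this equals the least-squares fit
    # when the span covers a whole number of periods of the tone.
    a = 2 / n * sum(samples[start + i] * math.cos(w * i) for i in range(n))
    b = 2 / n * sum(samples[start + i] * math.sin(w * i) for i in range(n))
    out = list(samples)
    for i in range(n):
        out[start + i] -= a * math.cos(w * i) + b * math.sin(w * i)
    return out


# Toy data: a 250 Hz stand-in for speech plus a 1000 Hz noise in samples 240..559.
RATE = 8000
clean = [0.2 * math.sin(2 * math.pi * 250 * n / RATE) for n in range(800)]
noisy = [c + (0.5 * math.sin(2 * math.pi * 1000 * n / RATE + 0.3) if 240 <= n < 560 else 0.0)
         for n, c in enumerate(clean)]
denoised = reverse_compensate(noisy, span=(240, 560), freq_hz=1000.0, rate=RATE)
residual = max(abs(c - d) for c, d in zip(clean, denoised))
print(residual)
```

Note the limitation of this approach: any speech energy at exactly the noise frequency inside the span is removed together with the noise, which is one reason an accurate start-stop position matters.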

It should be noted that the voice to be processed may contain multiple noises, and these noises may occur in different position ranges, so the same piece of voice data to be processed may contain several noises to be removed. In this case, the noises are preferably reverse-compensated one after another, and after each reverse compensation the feature information is re-matched to determine whether the feature information of the other noises to be processed still exists in the voice data. This prevents the data characteristics of the same voice data from being damaged by repeated compensation.
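The compensate-then-re-match loop can be sketched with a toy model in which each noise is represented by a signature vector, matching is a normalized correlation, and compensation subtracts the projection onto the signature. All of these are simplifying assumptions made for illustration; the helper names and the threshold are hypothetical.

```python
from typing import List, Sequence, Tuple

Signal = List[float]
Noise = Tuple[str, Sequence[float]]  # (name, signature), a toy stand-in for feature information


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def still_present(data: Signal, noise: Noise, threshold: float = 0.1) -> bool:
    """Re-match: does the data still correlate with the noise signature?"""
    _, sig = noise
    return abs(dot(data, sig)) / len(sig) > threshold


def compensate(data: Signal, noise: Noise) -> Signal:
    """Toy reverse compensation: subtract the projection onto the signature."""
    _, sig = noise
    gain = dot(data, sig) / dot(sig, sig)
    return [x - gain * s for x, s in zip(data, sig)]


def compensate_sequentially(data: Signal, noises: List[Noise]):
    applied = []
    for noise in noises:
        if not still_present(data, noise):
            continue  # already removed by an earlier pass: do not compensate again
        data = compensate(data, noise)
        applied.append(noise[0])
    return data, applied


speech = [0.2, 0.2, -0.2, -0.2]
hum = [1.0, -1.0, 1.0, -1.0]
noisy = [s + 0.5 * h for s, h in zip(speech, hum)]
# The same noise reported twice, e.g. by two overlapping position ranges.
cleaned, applied = compensate_sequentially(noisy, [("hum", hum), ("hum_overlap", hum)])
print(applied)
```

Because the second entry no longer matches after the first compensation, it is skipped and the data is not compensated twice.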

With the voice denoising method provided by this embodiment, the types and position ranges of the noises contained in the voice to be processed are recognized by the noise recognition model, the feature information of each noise is then obtained from the preset noise library, and the start-stop position of each noise is further determined based on the feature information and the position range. The voice to be processed can then be reverse-compensated based on the start-stop positions and the feature information, which achieves denoising. Combining the noise recognition model with matching against specific feature data improves the accuracy of noise recognition and avoids the low efficiency or low accuracy that results from relying on the recognition model alone or on data matching alone.

Based on the same inventive concept, a second embodiment of the present invention provides a speech recognition method, which may include performing denoising processing on a speech to be recognized through the above method embodiment, and then recognizing the denoised speech to be recognized based on a speech recognition model.

By carrying out denoising processing on the voice to be recognized in advance, the noise which influences the recognition accuracy rate in the voice to be recognized can be removed, and therefore the voice recognition accuracy rate is improved.

Based on the same inventive concept, another embodiment of the present invention relates to a computer-readable storage medium storing a computer program. The computer program, when executed by a processor, implements the method of the first embodiment.

Those skilled in the art can understand that all or part of the steps of the methods in the above embodiments may be implemented by a program instructing the related hardware. The program is stored in a storage medium and includes several instructions that cause a device (which may be a single-chip microcomputer, a chip, etc.) or a processor to execute all or part of the steps of the methods in the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
