Voice noise reduction method, device, equipment and medium

Document No.: 1339712 | Published: 2020-07-17

Reading note: the invention "Voice noise reduction method, device, equipment and medium" was created by 丁大为, 王哲 and 嵇望 on 2020-06-10. Abstract: The invention discloses a voice noise reduction method in the field of machine learning that addresses the heavy computation and high resource usage of existing voice noise reduction. The method comprises the following steps: acquiring voice data; preprocessing the voice data and extracting multi-dimensional features of the preprocessed voice data; inputting the multi-dimensional features into a preset voice noise reduction model to obtain a frequency band gain coefficient; dividing the voice data into a plurality of frequency bands and filtering the noise data in the frequency bands according to the frequency band gain coefficient; and restoring the filtered voice data into a voice data stream and outputting the voice data stream. The invention also discloses a voice noise reduction device, an electronic device, and a computer storage medium. The invention achieves voice noise reduction by calculating the frequency band gain coefficient.

1. A method for speech noise reduction, comprising the steps of:

acquiring voice data;

preprocessing the voice data, and extracting multidimensional characteristics of the preprocessed voice data;

inputting the multidimensional characteristics into a preset voice noise reduction model to obtain a frequency band gain coefficient and voice activity detection parameters;

when the voice activity detection parameter is 1, dividing the voice data into a plurality of frequency bands, and filtering noise data in the frequency bands according to the frequency band gain coefficient;

when the voice activity detection parameter is 0, setting the frequency band gain coefficient to be 0, and filtering noise data in the frequency band;

and restoring the filtered voice data into a voice data stream, and outputting the voice data stream.

2. The voice noise reduction method of claim 1, wherein obtaining voice data comprises the steps of:

acquiring one frame of voice data every 10 ms at a sampling rate of 48 kHz.

3. The speech noise reduction method of claim 1, wherein the preprocessing comprises performing an FFT on the voice data.

4. The method of speech noise reduction according to claim 1, wherein the step of dividing the speech data into a plurality of frequency bands and filtering the noise data in the frequency bands according to the band gain coefficients comprises the steps of:

filtering the voice data through a comb filter, and dividing the filtered voice data into a plurality of frequency bands according to the number of the frequency bands in the preset voice noise reduction model;

and filtering the voice data of each frequency band according to the frequency band gain coefficient.

5. The method of speech noise reduction according to claim 1, wherein extracting multi-dimensional features of the preprocessed speech data comprises the steps of:

dividing the frequency spectrum of each frame of the voice data into 22 unequal frequency bands, and performing a DCT on the energy of each frequency band to obtain 22 Bark-frequency cepstral coefficients as a first feature;

extracting the first 6 dimensions of the first feature, and calculating their first and second derivatives to obtain 12-dimensional features as a second feature;

extracting the first 6 frequency bands, and performing a pitch-period DCT to obtain six-dimensional features and 1 pitch period coefficient as a third feature;

extracting the first 8 frequency bands, and calculating the sum of the difference values of these eight frequency bands to obtain 1 stability coefficient as a fourth feature;

calculating, for each frame of the voice data, the frame energy, the zero-crossing rate, the normalized autocorrelation coefficient at a lag of one sample, the first coefficient of a 12th-order linear prediction, and the 12th-order linear prediction error as a fifth feature;

the multi-dimensional features include the first feature, the second feature, the third feature, the fourth feature, and the fifth feature.
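
As an illustration of the first and second features, the following Python sketch computes cepstral coefficients as a DCT of band energies and appends first and second differences of the first six coefficients. Taking the logarithm of the energies before the DCT, using a DCT-II, and approximating the derivatives by differences across consecutive frames are assumptions, as are the helper names `dct_ii`, `bfcc`, and `deltas`. The text does not give the band edges, so the band energies are taken as given.

```python
import math


def dct_ii(values):
    """Unnormalized DCT-II of a list of real values."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * (i + 0.5) * k / n) for i, v in enumerate(values))
        for k in range(n)
    ]


def bfcc(band_energies, floor=1e-10):
    """Cepstral coefficients: DCT of the log band energies (log is an assumption)."""
    return dct_ii([math.log10(e + floor) for e in band_energies])


def deltas(prev2, prev1, cur, count=6):
    """First and second differences over time of the first `count` coefficients."""
    first = [cur[i] - prev1[i] for i in range(count)]
    second = [cur[i] - 2 * prev1[i] + prev2[i] for i in range(count)]
    return first + second


# Hypothetical input: 22 equal band energies for one frame.
coeffs = bfcc([10.0] * 22)
second_feature = deltas([0.0] * 22, [1.0] * 22, [3.0] * 22)
```

With equal energy in every band, only the zeroth coefficient is non-zero, which is a quick sanity check for the transform.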

6. The method of claim 1, wherein the training process of the predetermined speech noise reduction model comprises the steps of:

acquiring a pre-constructed recurrent neural network, wherein the recurrent neural network comprises 3 fully connected layers and 3 GRU layers;

acquiring training data, wherein the training data comprises clean voice data and noise data;

performing framing processing on the training data, and extracting multi-dimensional features of each frame of training data;

initializing network parameters of the recurrent neural network, inputting the multidimensional characteristics of each frame of training data into the recurrent neural network for model training, and performing model optimization according to a loss function to obtain the preset voice noise reduction model.

7. The method of claim 6, wherein the multidimensional features are input into the preset voice noise reduction model to obtain a frequency band gain coefficient, and the frequency band gain coefficient is calculated by the following formula: $g_b = \sqrt{E_s(b) / E_x(b)}$, wherein $g_b$ is the frequency band gain coefficient, $E_s(b)$ is the energy of the clean voice, and $E_x(b)$ is the energy of the noisy voice;

the loss function is calculated as follows: $L(g_b, \hat{g}_b) = (g_b^{\gamma} - \hat{g}_b^{\gamma})^2$, wherein $\hat{g}_b$ is the estimated gain, $\gamma$ is a perceptual parameter, $g_b^{\gamma}$ is the perceptual value of the frequency band gain coefficient, and $\hat{g}_b^{\gamma}$ is the perceptual value of the gain estimate.

8. A speech noise reduction apparatus, comprising:

the acquisition module is used for acquiring voice data;

the filtering module is used for preprocessing the voice data and extracting the multidimensional characteristics and voice activity detection parameters of the preprocessed voice data; when the voice activity detection parameter is 1, dividing the voice data into a plurality of frequency bands, and filtering noise data in the frequency bands according to the frequency band gain coefficient; when the voice activity detection parameter is 0, setting the frequency band gain coefficient to be 0, and filtering noise data in the frequency band;

and the output module is used for recovering the filtered voice data into a voice data stream and outputting the voice data stream.

9. An electronic device comprising a processor, a storage medium, and a computer program, the computer program being stored in the storage medium, wherein the computer program, when executed by the processor, implements the speech noise reduction method of any of claims 1 to 7.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of speech noise reduction according to any one of claims 1 to 7.

Technical Field

The invention relates to the technical field of machine learning, in particular to a voice noise reduction method, device, equipment and medium.

Background

Noise suppression has been a topic of great interest since the 1970s. Traditional noise suppression algorithms rely on a noise spectrum estimator driven by a Voice Activity Detector (VAD) or a similar algorithm. Each component of the noise spectrum estimator requires an accurate estimator and a large amount of manual parameter tuning, which is inefficient, and the noise reduction effect degrades as soon as a single parameter is not accurate enough.

Disclosure of Invention

To overcome the disadvantages of the prior art, a first object of the present invention is to provide a voice noise reduction method that extracts multi-dimensional features of voice data, inputs them into a voice noise reduction model to obtain a frequency band gain coefficient, and performs band-by-band noise reduction on the voice data according to the frequency band gain coefficient.

The first object of the invention is achieved by the following technical solution:

a method of speech noise reduction comprising the steps of:

acquiring voice data;

preprocessing the voice data, and extracting multidimensional characteristics of the preprocessed voice data;

inputting the multidimensional characteristics into a preset voice noise reduction model to obtain a frequency band gain coefficient and voice activity detection parameters;

when the voice activity detection parameter is 1, dividing the voice data into a plurality of frequency bands, and filtering noise data in the frequency bands according to the frequency band gain coefficient;

when the voice activity detection parameter is 0, setting the frequency band gain coefficient to be 0, and filtering noise data in the frequency band;

and restoring the filtered voice data into a voice data stream, and outputting the voice data stream.
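
The two voice activity branches above can be sketched in Python as follows. The function name `effective_gains` is hypothetical, and reading "setting the frequency band gain coefficient to be 0" as zeroing every band of the frame is an assumption.

```python
def effective_gains(band_gains, vad):
    """Return the gains to apply to one frame.

    vad == 1: speech detected, use the model's band gains.
    vad == 0: no speech, every band gain is set to 0.
    """
    if vad not in (0, 1):
        raise ValueError("vad must be 0 or 1")
    if vad == 1:
        return list(band_gains)
    return [0.0] * len(band_gains)


speech_frame = effective_gains([0.9, 0.4, 0.1], vad=1)
silent_frame = effective_gains([0.9, 0.4, 0.1], vad=0)
```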

Further, acquiring the voice data comprises the following step:

acquiring one frame of voice data every 10 ms at a sampling rate of 48 kHz.
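
At 48 kHz, a 10 ms frame holds 480 samples. The Python sketch below shows that arithmetic and a simple framing loop. The function name `split_into_frames` is hypothetical, and using non-overlapping frames and dropping a trailing partial frame are assumptions.

```python
SAMPLE_RATE_HZ = 48_000  # from the text
FRAME_MS = 10            # from the text
FRAME_LEN = SAMPLE_RATE_HZ * FRAME_MS // 1000  # samples per frame


def split_into_frames(samples, frame_len=FRAME_LEN):
    """Yield consecutive non-overlapping frames; a trailing partial frame is dropped."""
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        yield samples[start:start + frame_len]


# One second of silence splits into 100 frames of 480 samples each.
frames = list(split_into_frames([0.0] * SAMPLE_RATE_HZ))
```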

Further, the preprocessing comprises performing an FFT on the voice data.
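
To show what the transform produces for one frame, the Python sketch below uses a direct discrete Fourier transform. This is not a fast algorithm: it is an O(N^2) stand-in chosen for readability, and an FFT returns the same values more quickly. The function name `dft` and the 32-sample test tone are assumptions for illustration.

```python
import cmath
import math


def dft(frame):
    """Direct O(N^2) discrete Fourier transform of a real or complex frame."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


# A cosine that completes 3 cycles in a 32-sample frame peaks at bin 3.
N = 32
tone = [math.cos(2 * math.pi * 3 * t / N) for t in range(N)]
spectrum = dft(tone)
magnitudes = [abs(x) for x in spectrum[: N // 2 + 1]]
peak_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__)
```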

Further, dividing the voice data into a plurality of frequency bands and filtering the noise data in the frequency bands according to the frequency band gain coefficient comprises the following steps:

filtering the voice data through a comb filter, and dividing the filtered voice data into a plurality of frequency bands according to the number of the frequency bands in the preset voice noise reduction model;

filtering the voice data of each frequency band according to the frequency band gain coefficient to remove the noise data.
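
The per-band filtering step can be sketched in Python as a multiplication of each spectral bin by the gain of the band it belongs to. The comb filter is not shown. The function name `apply_band_gains`, the rectangular (non-overlapping) bands, and the three-band layout in the example are assumptions; the text does not specify the band edges.

```python
def apply_band_gains(spectrum, band_edges, gains):
    """Scale every bin of band b by gains[b].

    band_edges has len(gains) + 1 entries; band b covers bins
    band_edges[b] up to, but not including, band_edges[b + 1].
    """
    if len(band_edges) != len(gains) + 1:
        raise ValueError("band_edges must have one more entry than gains")
    out = list(spectrum)
    for b, gain in enumerate(gains):
        for k in range(band_edges[b], band_edges[b + 1]):
            out[k] = spectrum[k] * gain
    return out


# Hypothetical layout: three unequal bands, narrower at low frequencies.
bins = [1 + 1j] * 8
edges = [0, 2, 4, 8]
filtered = apply_band_gains(bins, edges, [1.0, 0.5, 0.0])
```

A gain of 1 leaves a band untouched, a gain of 0 removes it, and values in between attenuate it.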

Further, extracting the multi-dimensional features of the preprocessed voice data comprises the following steps:

dividing the frequency spectrum of each frame of the voice data into 22 unequal frequency bands, and performing a DCT on the energy of each frequency band to obtain 22 Bark-frequency cepstral coefficients as a first feature;

extracting the first 6 dimensions of the first feature, and calculating their first and second derivatives to obtain 12-dimensional features as a second feature;

extracting the first 6 frequency bands, and performing a pitch-period DCT to obtain six-dimensional features and 1 pitch period coefficient as a third feature;

extracting the first 8 frequency bands, and calculating the sum of the difference values of these eight frequency bands to obtain 1 stability coefficient as a fourth feature;

calculating, for each frame of the voice data, the frame energy, the zero-crossing rate, the normalized autocorrelation coefficient at a lag of one sample, the first coefficient of a 12th-order linear prediction, and the 12th-order linear prediction error as a fifth feature;

the multi-dimensional features include the first feature, the second feature, the third feature, the fourth feature, and the fifth feature.
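
Three of the scalar quantities in the fifth feature are simple to compute directly from the samples of a frame, as the Python sketch below shows. The linear prediction terms are omitted. The function names and the exact normalizations (crossings per sample pair, autocorrelation divided by frame energy) are assumptions. The sketch also totals the feature dimensions listed above, which come to 47 if the third feature is counted as 6 + 1 values and the fifth as 5 values.

```python
def frame_energy(frame):
    """Sum of squared samples."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def lag1_autocorrelation(frame):
    """Autocorrelation at a lag of one sample, normalized by the frame energy."""
    energy = frame_energy(frame)
    if energy == 0:
        return 0.0
    return sum(a * b for a, b in zip(frame, frame[1:])) / energy


# Dimensions of the five feature groups as listed in the text.
FEATURE_DIMS = {"first": 22, "second": 12, "third": 6 + 1, "fourth": 1, "fifth": 5}
TOTAL_DIMS = sum(FEATURE_DIMS.values())

alternating = [1.0, -1.0, 1.0, -1.0]
```

For the alternating frame, every adjacent pair changes sign, so the zero-crossing rate is 1 and the lag-one autocorrelation is negative.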

Further, the training process of the preset speech noise reduction model comprises the following steps:

acquiring a pre-constructed recurrent neural network, wherein the recurrent neural network comprises 3 fully connected layers and 3 GRU layers;

acquiring training data, wherein the training data comprises clean voice data and noise data;

performing framing processing on the training data, and extracting multi-dimensional features of each frame of training data;

initializing network parameters of the recurrent neural network, inputting the multidimensional characteristics of each frame of training data into the recurrent neural network for model training, and performing model optimization according to a loss function to obtain the preset voice noise reduction model.
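
For readers unfamiliar with GRU layers, the Python sketch below runs the forward pass of a single GRU cell over a few frames using the standard gate equations. It is not the network described above: the wiring of the 3 fully connected layers and 3 GRU layers is not given in the text, and training (backpropagation) is not shown. The input size of 47 (the sum 22 + 12 + 7 + 1 + 5 of the listed feature dimensions), the hidden size of 8, the uniform initialization range, and all function and parameter names are assumptions.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def add(*vectors):
    return [sum(values) for values in zip(*vectors)]


def gru_step(x, h, p):
    """One GRU time step: returns the new hidden state."""
    r = [sigmoid(a) for a in add(matvec(p["W_r"], x), matvec(p["U_r"], h), p["b_r"])]
    z = [sigmoid(a) for a in add(matvec(p["W_z"], x), matvec(p["U_z"], h), p["b_z"])]
    wx = add(matvec(p["W_n"], x), p["b_n"])
    uh = matvec(p["U_n"], h)
    n = [math.tanh(a + ri * u) for a, ri, u in zip(wx, r, uh)]
    return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


def init_params(input_dim, hidden_dim, rng):
    """Small random weights and zero biases for the three gates."""
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    params = {}
    for gate in "rzn":
        params[f"W_{gate}"] = mat(hidden_dim, input_dim)
        params[f"U_{gate}"] = mat(hidden_dim, hidden_dim)
        params[f"b_{gate}"] = [0.0] * hidden_dim
    return params


rng = random.Random(0)
params = init_params(input_dim=47, hidden_dim=8, rng=rng)
h = [0.0] * 8
for _ in range(3):  # three frames of hypothetical feature vectors
    x = [rng.uniform(-1.0, 1.0) for _ in range(47)]
    h = gru_step(x, h, params)
```

The hidden state carries information from earlier frames to later ones, which is why a recurrent layer suits frame-by-frame audio.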

Further, the multidimensional features are input into the preset voice noise reduction model to obtain a frequency band gain coefficient, and the frequency band gain coefficient is calculated by the following formula: $g_b = \sqrt{E_s(b) / E_x(b)}$, wherein $g_b$ is the frequency band gain coefficient, $E_s(b)$ is the energy of the clean voice, and $E_x(b)$ is the energy of the noisy voice;

the loss function is calculated as follows: $L(g_b, \hat{g}_b) = (g_b^{\gamma} - \hat{g}_b^{\gamma})^2$, wherein $\hat{g}_b$ is the estimated gain, $\gamma$ is a perceptual parameter, $g_b^{\gamma}$ is the perceptual value of the frequency band gain coefficient, and $\hat{g}_b^{\gamma}$ is the perceptual value of the gain estimate.
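
The following Python sketch computes a target gain and a loss value, assuming the band gain is the square root of the ratio of clean to noisy band energy and the loss is the squared difference of the gains raised to the power γ. The value γ = 0.5, the clipping of the gain to at most 1, the small `eps` guard, the averaging over bands, and the function names are assumptions.

```python
import math


def target_gain(clean_energy, noisy_energy, eps=1e-12):
    """Band gain sqrt(E_clean / E_noisy), clipped to at most 1."""
    return min(1.0, math.sqrt(clean_energy / (noisy_energy + eps)))


def perceptual_loss(target, estimate, gamma=0.5):
    """Mean over bands of (g**gamma - g_hat**gamma)**2."""
    return sum((g ** gamma - e ** gamma) ** 2 for g, e in zip(target, estimate)) / len(target)


# A band whose noisy energy is four times its clean energy gets a gain near 0.5.
gain = target_gain(1.0, 4.0)
loss = perceptual_loss([0.25], [1.0])
```

Raising the gains to a power below 1 stretches the low-gain range, so errors on heavily attenuated bands count for more than they would with a plain squared error.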

A second object of the present invention is to provide a voice noise reduction apparatus that extracts multi-dimensional features of voice data, inputs them into a voice noise reduction model to obtain a frequency band gain coefficient, and performs band-by-band noise reduction on the voice data according to the frequency band gain coefficient.

The second object of the invention is achieved by the following technical solution:

a speech noise reduction apparatus, comprising:

the acquisition module is used for acquiring voice data;

the filtering module is used for preprocessing the voice data and extracting the multidimensional characteristics of the preprocessed voice data; inputting the multidimensional characteristics into a preset voice noise reduction model to obtain a frequency band gain coefficient and voice activity detection parameters; when the voice activity detection parameter is 1, dividing the voice data into a plurality of frequency bands, and filtering noise data in the frequency bands according to the frequency band gain coefficient; when the voice activity detection parameter is 0, setting the frequency band gain coefficient to be 0, and filtering noise data in the frequency band;

and the output module is used for recovering the filtered voice data into a voice data stream and outputting the voice data stream.

A third object of the present invention is to provide an electronic device comprising a processor, a storage medium, and a computer program stored in the storage medium, wherein the computer program, when executed by the processor, implements the above voice noise reduction method.

A fourth object of the present invention is to provide a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the above voice noise reduction method.

Compared with the prior art, the invention has the beneficial effects that:

The invention performs voice noise reduction through the frequency band gain coefficient: it only needs to divide the voice data into frequency bands and filter the noise in each band. Filtering is applied only to voice data in which a voice signal has been detected, which reduces the amount of calculation when no voice signal is present. As a result, the computational complexity of voice noise reduction is greatly reduced, the noise reduction efficiency is high, real-time noise reduction can be achieved, and a large amount of resources is not occupied.

Drawings

FIG. 1 is a flow chart of a voice denoising method according to the first embodiment;

FIG. 2 is a flowchart of a multi-dimensional feature extraction method according to the first embodiment;

FIG. 3 is a flowchart of a model training method according to the second embodiment;

FIG. 4 is a block diagram showing the structure of a voice noise reduction apparatus according to the third embodiment;

FIG. 5 is a block diagram of the electronic device according to the fourth embodiment.

Detailed Description

The present invention will now be described in more detail with reference to the accompanying drawings, in which the description of the invention is given by way of illustration and not of limitation. The various embodiments may be combined with each other to form other embodiments not shown in the following description.
