Intelligent voice noise reduction device based on storage and calculation integration, voice input equipment and system

Document No.: 719643  Publication date: 2021-04-16

Reading note: This technology, "Intelligent voice noise reduction device based on storage and calculation integration, voice input equipment and system" (基于存算一体的智能语音降噪装置、语音输入设备、系统), was designed by Wang Shaodi (王绍迪) on 2019-10-15. Main content: The invention provides an intelligent voice noise reduction device based on storage and calculation integration, a voice input device, and an electronic system. The device comprises a preprocessing module for receiving the voice with noise to be processed, a noise reduction module connected with the preprocessing module, and a reconstruction module connected with both the preprocessing module and the noise reduction module. The preprocessing module converts the voice with noise to be processed into a frequency domain signal; the noise reduction module, based on a storage and calculation integrated architecture, performs noise reduction on the amplitude spectrum of the frequency domain signal to obtain a noise-reduced amplitude spectrum; and the reconstruction module reconstructs the signal from the phase spectrum of the frequency domain signal and the noise-reduced amplitude spectrum to obtain noise-reduced voice. A deep learning noise reduction model is pre-stored in the noise reduction module. Implementing the deep learning noise reduction model with storage and calculation integration technology gives good scalability and simple hardware, so the device can be applied directly to resource-limited offline terminal scenarios.

1. An intelligent voice noise reduction device based on storage and calculation integration, characterized by comprising: a preprocessing module for receiving voice with noise to be processed, a noise reduction module connected with the preprocessing module, and a reconstruction module connected with the preprocessing module and the noise reduction module;

the preprocessing module converts the voice with noise to be processed into a frequency domain signal;

the noise reduction module is used for carrying out noise reduction processing on the amplitude spectrum of the frequency domain signal based on a storage and calculation integrated framework to obtain a noise-reduced amplitude spectrum;

the reconstruction module carries out signal reconstruction according to the phase spectrum of the frequency domain signal and the amplitude spectrum after noise reduction to obtain noise-reduced voice;

wherein a deep learning noise reduction model is pre-stored in the noise reduction module.

2. The intelligent voice noise reduction device based on storage and calculation integration according to claim 1, wherein the operation mode of the noise reduction module comprises: a processing mode and a programming mode;

the noise reduction module performs noise reduction processing on the amplitude spectrum of the frequency domain signal in the processing mode to obtain the noise-reduced amplitude spectrum, and updates the deep learning noise reduction model in the programming mode.

3. The intelligent voice noise reduction device based on storage and calculation integration according to claim 1, wherein the noise reduction module comprises: a filter circuit, a flash memory processing array, an analog-to-digital conversion module, and a post-processing module connected in sequence.

4. The intelligent voice noise reduction device based on storage and calculation integration according to claim 3, wherein the flash memory processing array comprises a plurality of flash memory units arranged in an array, and the flash memory units are programmable semiconductor devices with adjustable threshold voltages.

5. The intelligent voice noise reduction device based on storage and calculation integration according to claim 3, wherein the flash memory processing array comprises a plurality of flash memory units arranged in an array, each flash memory unit comprising: a programmable semiconductor device with an adjustable threshold voltage for storing long-term data, and an analog capacitor unit for storing temporary data, the programmable semiconductor device being connected in parallel with the analog capacitor unit.

6. The intelligent voice noise reduction device based on storage and calculation integration according to claim 4 or 5, wherein the noise reduction module further comprises: a programming circuit for adjusting the data stored in the flash memory units so as to update the neural network parameters of the deep learning noise reduction model.

7. The intelligent voice noise reduction device based on storage and calculation integration according to claim 1, wherein the preprocessing module comprises: a windowing unit and a Fourier transform unit;

the windowing unit is used for windowing the voice with noise to be processed;

and the Fourier transform unit is used for converting the windowed noisy speech into the frequency domain signal.

8. The intelligent voice noise reduction device based on storage and calculation integration according to claim 1, wherein the reconstruction module comprises: a power spectrum compensation unit and an inverse Fourier transform unit;

the power spectrum compensation unit is used for compensating the amplitude spectrum after noise reduction;

and the inverse Fourier transform unit is used for performing inverse Fourier transform on the compensated amplitude spectrum and the phase spectrum of the frequency domain signal to obtain noise-reduced voice.

9. A voice input device, characterized by comprising: a voice acquisition device and the intelligent voice noise reduction device based on storage and calculation integration according to any one of claims 1 to 8, connected with the voice acquisition device;

the voice acquisition device is used for acquiring voice signals, and the intelligent voice noise reduction device is used for carrying out noise reduction processing on the voice signals.

10. An electronic system, characterized by comprising the intelligent voice noise reduction device based on storage and calculation integration according to any one of claims 1 to 8.

Technical Field

The invention relates to the technical field of voice processing, in particular to an intelligent voice noise reduction device based on storage and calculation integration, voice input equipment and a system.

Background

Artificial Intelligence (AI) is a new technical science to study and develop theories, methods, techniques and application systems for simulating, extending and expanding human Intelligence. Artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence, a field of research that includes robotics, language recognition, image recognition, natural language processing, and expert systems, among others.

With the rapid development of artificial intelligence, intelligent speech recognition technology has wide application, for example: in electronic devices or systems such as smart phones, wireless headsets, intelligent robots, vehicle-mounted devices and the like, the voice input by a user needs to be accurately recognized by adopting an intelligent voice recognition technology. However, due to the interference of environmental noise and other device signals, the input speech contains noise, which affects the accuracy of speech recognition.

Traditional voice noise reduction methods are usually based on Bayesian estimation with a statistical model, spectral subtraction, and the like. These methods have high resource costs and complex hardware, and are difficult to apply directly to resource-limited offline terminal scenarios.

Disclosure of Invention

Aiming at the problems in the prior art, the invention provides an intelligent voice noise reduction device based on storage and calculation integration, a voice input device and an electronic system, which can at least partially solve the problems in the prior art.

In order to achieve the purpose, the invention adopts the following technical scheme:

In a first aspect, an intelligent voice noise reduction device based on storage and calculation integration is provided, comprising: a preprocessing module for receiving the voice with noise to be processed, a noise reduction module connected with the preprocessing module, and a reconstruction module connected with the preprocessing module and the noise reduction module;

the preprocessing module converts the voice with noise to be processed into a frequency domain signal;

wherein, the noise reduction module prestores a deep learning noise reduction model.

Further, the working mode of the noise reduction module comprises a processing mode and a programming mode.

Further, the noise reduction module includes: a filter circuit, a flash memory processing array, an analog-to-digital conversion module, and a post-processing module connected in sequence.

Furthermore, the flash memory processing array comprises a plurality of flash memory units which are arranged in an array, and the flash memory units are programmable semiconductor devices with adjustable threshold voltage.

Further, the flash memory processing array includes a plurality of flash memory units arranged in an array, and the flash memory units include: the programmable semiconductor device is used for storing long-term data and has an adjustable threshold voltage, and the analog capacitor unit is used for storing temporary data and is connected with the programmable semiconductor device in parallel.

Further, the noise reduction module further comprises: and the programming circuit is used for regulating and controlling the data stored in the flash memory unit so as to update the neural network parameters of the deep learning noise reduction model.

Further, the preprocessing module includes: a windowing unit and a Fourier transform unit;

the windowing unit is used for windowing the voice with noise to be processed;

the Fourier transform unit is used for converting the windowed noisy speech into the frequency domain signal.

Further, the reconstruction module includes: a power spectrum compensation unit and an inverse Fourier transform unit;

the power spectrum compensation unit is used for compensating the amplitude spectrum after noise reduction;

the inverse Fourier transform unit is used for performing inverse Fourier transform on the compensated amplitude spectrum and the phase spectrum of the frequency domain signal to obtain noise-reduced voice.

In a second aspect, a voice input device is provided, comprising: a voice acquisition device and the above intelligent voice noise reduction device based on storage and calculation integration, connected with the voice acquisition device;

the voice acquisition device is used for acquiring voice signals, and the intelligent voice noise reduction device is used for performing noise reduction on the voice signals.

In a third aspect, an electronic system is provided, comprising the above intelligent voice noise reduction device based on storage and calculation integration.

The invention provides an intelligent voice noise reduction device based on storage and calculation integration, a voice input device, and an electronic system. The device comprises: a preprocessing module for receiving the voice with noise to be processed, a noise reduction module connected with the preprocessing module, and a reconstruction module connected with the preprocessing module and the noise reduction module. The preprocessing module converts the voice with noise to be processed into a frequency domain signal; the noise reduction module performs noise reduction on the amplitude spectrum of the frequency domain signal based on a storage and calculation integrated architecture to obtain a noise-reduced amplitude spectrum; and the reconstruction module reconstructs the signal from the phase spectrum of the frequency domain signal and the noise-reduced amplitude spectrum to obtain noise-reduced voice. A deep learning noise reduction model is pre-stored in the noise reduction module. Because the deep learning noise reduction model is implemented with storage and calculation integration technology, the device has good scalability and simple hardware, and can be applied directly to resource-limited offline terminal scenarios.

In order to make the aforementioned and other objects, features and advantages of the invention comprehensible, preferred embodiments accompanied with figures are described in detail below.

Drawings

To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort. In the drawings:

FIG. 1 is a block diagram illustrating a structure of an intelligent speech noise reduction apparatus based on a storage and computation integration according to an embodiment of the present invention;

FIG. 2 is a block diagram illustrating a noise reduction module in an intelligent voice noise reduction device based on storage and calculation integration according to an embodiment of the present invention;

FIG. 3 shows a circuit diagram of a flash memory processing array in an embodiment of the invention;

FIG. 4 is a diagram illustrating a structure of a deep learning noise reduction model according to an embodiment of the present invention;

FIG. 5 shows a circuit diagram of a filter circuit in an embodiment of the invention;

FIG. 6 is a circuit diagram of a flash memory cell in another flash memory processing array in an embodiment of the invention;

FIG. 7 is a diagram illustrating a detailed structure of a pre-processing module in an embodiment of the present invention;

FIG. 8 is a diagram illustrating a detailed structure of a reconstruction module according to an embodiment of the present invention;

FIG. 9 is a block diagram showing the configuration of a voice input device in the embodiment of the present invention;

FIG. 10 shows a block diagram of an electronic system according to an embodiment of the present invention.

Detailed Description

In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

FIG. 1 shows an intelligent voice noise reduction device based on storage and calculation integration provided by an embodiment of the present invention. As shown in FIG. 1, the device is disposed between a voice input device and a voice processing device, or is integrated behind the voice input device of an electronic device, and comprises: a preprocessing module 1 for receiving the voice with noise to be processed, a noise reduction module 2 connected with the preprocessing module 1, and a reconstruction module 3 connected with the preprocessing module 1 and the noise reduction module 2;

the preprocessing module 1 converts the voice with noise to be processed into a frequency domain signal;

the noise reduction module 2 is used for carrying out noise reduction processing on the amplitude spectrum of the frequency domain signal based on the storage and calculation integrated framework to obtain a noise-reduced amplitude spectrum;

the reconstruction module 3 carries out signal reconstruction according to the phase spectrum of the frequency domain signal and the amplitude spectrum after noise reduction to obtain noise-reduced voice;

and the noise reduction module prestores a deep learning noise reduction model.

It is worth mentioning that the operation mode of the noise reduction module 2 includes: a processing mode and a programming mode; and the noise reduction module performs noise reduction processing on the amplitude spectrum of the frequency domain signal in a processing mode to obtain a noise-reduced amplitude spectrum, and updates the deep learning noise reduction model in a programming mode.

After processing by the deep learning noise reduction model, the separated voice amplitude spectrum and noise amplitude spectrum are output.
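The signal path of FIG. 1 can be illustrated with a minimal software sketch. It is not the claimed hardware: NumPy stands in for the preprocessing and reconstruction modules, and `denoise_magnitude` is a hypothetical placeholder (a crude spectral floor gate) for the deep learning noise reduction model stored in the noise reduction module 2. The frame length, hop size, and Hann window are likewise assumptions, since the text does not specify them.

```python
import numpy as np

FRAME = 256   # assumed frame length in samples
HOP = 128     # assumed hop size (50% overlap)


def preprocess(noisy):
    """Preprocessing module 1: window each frame, then take its FFT."""
    window = np.hanning(FRAME)
    n_frames = 1 + (len(noisy) - FRAME) // HOP
    frames = np.stack([noisy[i * HOP:i * HOP + FRAME] * window
                       for i in range(n_frames)])
    spectrum = np.fft.rfft(frames, axis=1)
    # Amplitude spectrum goes to the noise reduction module,
    # phase spectrum goes straight to the reconstruction module.
    return np.abs(spectrum), np.angle(spectrum)


def denoise_magnitude(magnitude):
    """Hypothetical stand-in for the deep learning noise reduction model."""
    floor = np.median(magnitude, axis=0, keepdims=True)
    return np.maximum(magnitude - floor, 0.0)


def reconstruct(magnitude, phase, length):
    """Reconstruction module 3: rebuild frames from the denoised amplitude
    and the original phase, then overlap-add them."""
    frames = np.fft.irfft(magnitude * np.exp(1j * phase), n=FRAME, axis=1)
    out = np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * HOP:i * HOP + FRAME] += frame
    return out


def denoise(noisy):
    magnitude, phase = preprocess(noisy)
    return reconstruct(denoise_magnitude(magnitude), phase, len(noisy))
```

The point of the sketch is the split the text describes: only the amplitude spectrum passes through the model, while the phase spectrum of the noisy input is reused unchanged during reconstruction.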

The voice with noise to be processed can be acquired by voice signal acquisition devices such as a microphone, a bone conduction sensor, or a sound sensor. It is worth noting that the microphone picks up all audio in the environment, including environmental noise, so the analog voice signal is noticeably noisy. Before the analog voice signal is recognized, it is denoised to reduce the signal noise and improve the clarity of the voice, which in turn improves the accuracy of voice recognition. In addition, the bone conduction sensor can be arranged on an in-ear earphone and can accurately detect whether the user is speaking.

The intelligent voice noise reduction device based on storage and calculation integration provided by the embodiment of the invention uses a noise reduction module that is small, fast, and low in cost to perform noise reduction on voice with the deep learning noise reduction model, which helps to miniaturize the equipment. The noise reduction process does not need to be carried out on a server, does not require complicated operations such as noise estimation, and does not depend on a communication network, so processing delay is reduced and user experience is improved. The device has good scalability and simple hardware, can be applied directly to resource-limited offline terminal scenarios, and is suitable for AI noise reduction applications.

In addition, the noise reduction module processes the voice based on the nonvolatile storage device, so that data cannot be lost even if power is off, and the reliability of voice input is effectively improved.

The intelligent voice noise reduction device can be used for noise reduction of voice input in applications such as human voice detection, keyword wake-up, command recognition, voice recognition, voiceprint recognition, and voice synthesis. For example, the electronic device can be woken by a keyword in the user's voice so as to save standby power. The device can be used in a navigator, a vehicle-mounted control terminal, an intelligent home, a wearable device, a government affairs platform, a mobile phone, an operator platform, finance, education, logistics, a hotel, real estate, intelligent seat service, an intelligent robot, and the like.

The noise reduction module can be a separately arranged chip, and can also be arranged in electronic equipment such as mobile equipment, a tablet computer, a notebook computer and a desktop computer. In some embodiments, the mobile device may include a smart home apparatus, a wearable device, a virtual reality apparatus, an augmented reality apparatus, a mobile phone, a personal digital assistant, a gaming apparatus, a navigation apparatus, a point-of-sale POS machine, the like, or any combination of the above. In some embodiments, the smart home devices may include smart lighting devices, control devices for smart appliances, smart monitoring devices, smart televisions, smart cameras, interphones, and the like or any combination of the above. In some embodiments, the wearable device may include a smart bracelet, smart footwear, smart glasses, smart helmet, smart watch, smart clothing, smart backpack, smart accessory, the like, or any combination thereof. In some embodiments, the virtual reality device and/or augmented reality device may include a virtual reality helmet, virtual reality glasses, virtual reality eyecups, augmented reality helmets, augmented reality glasses, augmented reality eyecups, and the like, or any combination thereof. For example, the virtual reality device and/or augmented reality device may include google glasses, virtual reality glasses, holographic lenses, virtual reality helmets, and the like.

FIG. 2 is a block diagram of the noise reduction module in the intelligent voice noise reduction device based on storage and calculation integration according to an embodiment of the present invention. As shown in FIG. 2, the noise reduction module comprises: an input interface 10, a filter circuit 20 connected with the input interface 10, a demultiplexer 30 whose input end is connected with the filter circuit 20, a flash memory processing array 40 connected with the output ends of the demultiplexer 30, a multiplexer 50 whose input ends are connected with the flash memory processing array 40, an analog-to-digital conversion module 60 connected with the output end of the multiplexer 50, a post-processing module 70 connected with the analog-to-digital conversion module 60, a demultiplexer 80 connected with the post-processing module 70, an output interface 90 and a register 100 connected with the two output ends of the demultiplexer 80, and a control module 110 connected with each circuit module.

It is noted that the flash memory processing array 40 may include a plurality of flash memory processing sub-arrays 40₁ to 40ₙ, each performing a different analog vector-matrix multiplication operation.

The flash memory processing sub-arrays 40₁ to 40ₙ may have the same structure, or their structures may differ according to actual application requirements. For example, the number of rows and the number of columns of each flash memory processing sub-array may be set according to actual application requirements, which is not limited in this embodiment of the present invention.

The input end of the register 100 is connected to the demultiplexer 80, and its output end is connected to the input end of the demultiplexer 30. The plurality of output ends of the demultiplexer 30 are respectively connected to the flash memory processing sub-arrays 40₁ to 40ₙ included in the flash memory processing array 40, and the outputs of the flash memory processing sub-arrays 40₁ to 40ₙ are respectively connected to the plurality of input ends of the multiplexer 50.

The input interface 10 receives the amplitude spectrum of the frequency domain signal. The filter circuit 20 filters the amplitude spectrum to remove environmental noise from it. The demultiplexer 30 selectively outputs either the output signal of the filter circuit 20 or the output signal of the register 100 to one or more of the flash memory processing sub-arrays, which process the received signal. The processed signal is output through the multiplexer 50 to the analog-to-digital conversion module 60 for analog-to-digital conversion, is further operated on by the post-processing module 70, and is then transmitted to the demultiplexer 80. The demultiplexer 80 selectively transmits the received signal either to the output interface 90 for output, or to the register 100 as intermediate data, from which it is passed to the demultiplexer 30 to participate in the next round of operation.
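The round-based data flow described here, in which intermediate results return through the register 100 for another pass, can be sketched in software as follows. This is only an illustration under stated assumptions: each round is modeled as one vector-matrix product, the analog-to-digital conversion module 60 is modeled as uniform quantization with an assumed step size, and the post-processing module 70 is assumed to apply a ReLU. None of these specifics come from the text.

```python
import numpy as np


def adc(analog, step=1 / 256):
    """Analog-to-digital conversion module 60 as uniform quantization.
    The step size is an assumption."""
    return np.round(analog / step) * step


def run_rounds(filtered_input, subarray_weights):
    """Run one flash memory processing sub-array per round.

    filtered_input   : output of the filter circuit 20
    subarray_weights : list of weight matrices, one per round (hypothetical)
    """
    # Demultiplexer 30 selects the filter output for the first round.
    data = filtered_input
    for weights in subarray_weights:
        analog = data @ weights            # selected sub-array: analog VMM
        digital = adc(analog)              # multiplexer 50 -> ADC 60
        result = np.maximum(digital, 0.0)  # post-processing 70 (assumed ReLU)
        # Demultiplexer 80 sends intermediate data to register 100, and
        # demultiplexer 30 selects the register for the next round.
        data = result
    # After the last round, demultiplexer 80 routes to output interface 90.
    return data
```

With two rounds, for example, the first weight matrix plays the role of one network layer and the second of the next, which is how a single array can serve a multi-layer model.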

In an optional embodiment, the flash memory processing array 40 includes a plurality of flash memory units arranged in an array, and the flash memory units may be divided into the above-mentioned flash memory processing sub-arrays. The flash memory units are programmable semiconductor devices with adjustable threshold voltages; they perform neural network operations on received data in the processing mode, and are programmed in the programming mode to adjust the parameters of the neural network model. The threshold voltage of each programmable semiconductor device is adjustable, that is, its transconductance is adjustable, which is equivalent to storing a variable analog weight in each device. The programmable semiconductor devices in the flash memory processing array thus form an analog data array in which every entry can be adjusted freely. By Ohm's law, the output current of each programmable semiconductor device equals the input analog data multiplied by the analog weight stored in that device; by Kirchhoff's law, the combined output current of a plurality of programmable semiconductor devices equals the sum of their individual output currents. In this way, various calculations are performed directly in the flash memory processing array.

In an alternative embodiment, the post-processing module 70 includes a plurality of programmable arithmetic operation units 70₁ to 70ₙ, each implementing a different arithmetic operation.
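The way Ohm's law and Kirchhoff's law combine into a vector-matrix multiplication can be shown numerically. The sketch below assumes an idealized, perfectly linear cell whose current is simply the input voltage times a stored transconductance; real flash cells are not this linear, and the numbers are made up for illustration.

```python
import numpy as np


def cell_currents(voltages, conductances):
    """Ohm's law per cell: I[i, j] = V[i] * G[i, j]."""
    return voltages[:, None] * conductances


def column_currents(voltages, conductances):
    """Kirchhoff's current law: currents on a shared column line add up."""
    return cell_currents(voltages, conductances).sum(axis=0)


# Hypothetical values: 3 input rows, 2 output columns.
V = np.array([0.2, 0.5, 0.1])      # input voltages (analog input data)
G = np.array([[1.0, 0.0],
              [2.0, 1.0],
              [0.0, 4.0]])         # cell transconductances (analog weights)

I = column_currents(V, G)

# The summed column currents are exactly the vector-matrix product V @ G.
assert np.allclose(I, V @ G)
```

Because the multiplication happens in each cell and the addition happens on the shared wire, no weight has to be moved out of memory to compute the product.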

The programmable arithmetic operation unit is implemented in hardware for performing a specific arithmetic operation.

Wherein the arithmetic operation comprises: one or more of multiplication, addition, subtraction, division, shift, activation function, maximum value, minimum value, average value, pooling, etc.

The control module 110 performs a combination configuration of the demultiplexer 30, the flash processing array 40, the multiplexer 50, the programmable arithmetic operation module 70, and the demultiplexer 80 according to the configuration information and the finite state machine information, so that the noise reduction module can perform various operations, such as complex neural network operations.

The configuration information and the finite-state machine information can be obtained through a compiling tool according to the actual application requirements.

The configuration information is usually static, such as the state of each module participating in the task, the configuration size of each unit; configuration information is typically stored in memory and scheduled before the task runs. The finite state machine information is dynamic in general, and controls the time sequence and the state of the actual task when the task runs.
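The difference between static configuration information and dynamic finite state machine information can be sketched as follows. All field names and states here are hypothetical, chosen only to illustrate the distinction; the text does not define a concrete format for either.

```python
from dataclasses import dataclass
from enum import Enum, auto


@dataclass(frozen=True)
class Configuration:
    """Static information, fixed before the task runs (hypothetical fields)."""
    active_subarrays: tuple
    active_arithmetic_units: tuple
    rounds: int


class State(Enum):
    IDLE = auto()
    COMPUTE = auto()
    STORE = auto()    # intermediate result goes to the register
    OUTPUT = auto()   # final result goes to the output interface


def run_state_machine(config):
    """Dynamic information: the sequence of states visited at run time."""
    trace = [State.IDLE]
    for round_index in range(config.rounds):
        trace.append(State.COMPUTE)
        is_last = round_index == config.rounds - 1
        trace.append(State.OUTPUT if is_last else State.STORE)
    trace.append(State.IDLE)
    return trace
```

The frozen dataclass mirrors the idea that the configuration does not change during a task, while the returned trace changes from step to step as the task executes.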

Specifically, the control module 110 performs combined configuration on the plurality of flash memory processing sub-arrays and the plurality of programmable arithmetic operation units according to the configuration information, selects the flash memory processing sub-arrays and the programmable arithmetic operation units which are put into operation, and controls a combined pairing manner of the flash memory processing sub-arrays and the programmable arithmetic operation units to realize specific operation.

It can be understood that each programmable arithmetic operation unit in the plurality of programmable arithmetic operation units can realize one or more arithmetic operations, and the plurality of programmable arithmetic operation units can be arranged and combined to form a plurality of composite operations, and can realize a plurality of combination configurations in cooperation with the plurality of flash memory processing sub-arrays, thereby realizing complex operation functions.

The control module 110 divides the flash memory processing array 40 into a plurality of flash memory processing sub-arrays 40₁ to 40ₙ and controls the operation timing of the sub-arrays 40₁ to 40ₙ. It also controls the operating state of the selector corresponding to each programmable arithmetic operation unit, so that the programmable arithmetic operation units can participate in the operation in any combination, and controls the on/off states of the demultiplexer 30, the multiplexer 50, and the demultiplexer 80.

The frequency domain signal received through the input interface is filtered by the filter circuit 20 and is then selectively connected by the demultiplexer (DEMUX) 30 to one or some of the plurality of flash memory processing sub-arrays (40₁ to 40ₙ), which perform processing such as voice feature extraction. The control module 110 is coupled to the demultiplexer 30 and controls it according to the configuration information, thereby selecting which flash memory processing sub-arrays take part in the operation.

The outputs of the plurality of flash memory processing sub-arrays (40₁ to 40ₙ) are connected to the analog-to-digital conversion module 60 via the multiplexer 50. The control module 110 is connected to the multiplexer 50 and controls it according to the configuration information, so as to select which flash memory processing sub-array has its output connected to the input of the analog-to-digital conversion module 60; that is, the output of the flash memory processing sub-array taking part in the operation is connected to the input of the analog-to-digital conversion module 60.

The input terminal of the programmable arithmetic operation module 70 is connected to the output terminal of the analog-to-digital conversion module 60.

A plurality of the programmable arithmetic operation units 70 of the programmable arithmetic operation module 70₁～70_nSerially connected, each of the programmable arithmetic units comprising: a demultiplexer, an arithmetic operation subunit and a multiplexer.

The input end of the demultiplexer in the programmable arithmetic operation unit is connected with a programmable arithmetic operation unit or the analog-to-digital conversion module 60, one output end is connected with the arithmetic operation subunit, the output end of the arithmetic operation subunit and the other output end of the demultiplexer are connected with the next programmable arithmetic operation unit or the demultiplexer 80 through the demultiplexer in the programmable arithmetic operation unit, and in addition, the control ends of the demultiplexer and the demultiplexer in the programmable arithmetic operation unit are connected with the control module.

Specifically, the input end of the demultiplexer in the first programmable arithmetic operation unit is connected to the output end of the analog-to-digital conversion module 60, one of the output ends is connected to the input end of the arithmetic operation subunit in the first programmable arithmetic operation unit, and the other output end and the output end of the arithmetic operation subunit are connected to the input end of the second programmable arithmetic operation unit through the demultiplexer in the programmable arithmetic operation unit.

The input end of the demultiplexer in the second programmable arithmetic operation unit is connected with the output end of the first programmable arithmetic operation unit, one of its output ends is connected with the input end of the arithmetic operation subunit in the second programmable arithmetic operation unit, and the other output end and the output end of the arithmetic operation subunit are connected with the input end of the third programmable arithmetic operation unit through the multiplexer in the second programmable arithmetic operation unit. And so on, until the nth programmable arithmetic operation unit: the input end of the demultiplexer in the nth programmable arithmetic operation unit is connected with the output end of the (n-1)th programmable arithmetic operation unit, one of its output ends is connected with the input end of the arithmetic operation subunit in the nth programmable arithmetic operation unit, and the other output end and the output end of the arithmetic operation subunit are connected with the input end of the demultiplexer 80 through the multiplexer of the nth programmable arithmetic operation unit.

The control module is connected with the demultiplexer and the multiplexer in each programmable arithmetic operation unit, and controls the demultiplexer and the multiplexer in each programmable arithmetic operation unit according to the configuration information to select whether the arithmetic operation subunit in the programmable arithmetic operation unit participates in the operation or not, thereby realizing the arrangement and combination configuration of a plurality of programmable arithmetic operation units, realizing different complex operations and flexibly configuring the arithmetic operation function.

In an alternative embodiment, each of the programmable arithmetic operation subunits may include a plurality of arithmetic operators arranged side by side, such as one or more of a multiplier, an adder, a subtractor, a divider, a shifter, an activation function, a maximum value calculator, a minimum value calculator, a mean value calculator and a pooling device, and the arithmetic operators are connected in parallel, and have input terminals connected to the output terminals of the corresponding demultiplexers respectively and output terminals connected to the input terminals of the corresponding multiplexers respectively.
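The configurable chain described above can be illustrated with a small software model. This is only a sketch under stated assumptions: the class `ProgrammableUnit`, the function `run_chain`, and the operator names `scale2` and `relu` are hypothetical and do not come from the source; a configuration entry of `None` stands for the bypass path from the demultiplexer straight to the multiplexer.

```python
from typing import Callable, Dict, List, Optional

Vector = List[float]
Operator = Callable[[Vector], Vector]


def relu(values: Vector) -> Vector:
    """Example activation operator (hypothetical choice)."""
    return [max(0.0, v) for v in values]


def scale2(values: Vector) -> Vector:
    """Example multiplier operator with a fixed factor of 2 (hypothetical)."""
    return [2.0 * v for v in values]


class ProgrammableUnit:
    """One unit: demultiplexer -> (selected operator | bypass) -> multiplexer."""

    def __init__(self, operators: Dict[str, Operator]) -> None:
        # Operators sit side by side; the configuration picks at most one.
        self.operators = operators

    def run(self, data: Vector, selected: Optional[str]) -> Vector:
        if selected is None:
            # Bypass: the subunit does not participate in the operation.
            return list(data)
        return self.operators[selected](data)


def run_chain(units: List[ProgrammableUnit],
              config: List[Optional[str]],
              data: Vector) -> Vector:
    """Pass data through the serially connected units under one configuration."""
    if len(units) != len(config):
        raise ValueError("one configuration entry is needed per unit")
    for unit, selected in zip(units, config):
        data = unit.run(data, selected)
    return data


OPS: Dict[str, Operator] = {"relu": relu, "scale2": scale2}
UNITS = [ProgrammableUnit(OPS) for _ in range(3)]
```

For example, `run_chain(UNITS, ["scale2", None, "relu"], [-1.0, 2.0])` doubles the input, skips the second unit, and applies the activation in the third.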

Two output terminals of the demultiplexer 80 are connected to the input terminal of the output interface 90 and the input terminal of the register 100, respectively. The control module is connected to the demultiplexer 80, and controls the operating state of the demultiplexer 80 according to the configuration information to select whether to output the output result of the post-processing module 70 to the output interface 90 or the register 100, and when the output result is selected to be output to the register 100, it means that a new round of operation processing will be performed on the output result.

In an alternative embodiment, each of the flash memory processing sub-arrays employs a source-coupled, drain-summed topology, see fig. 3, including a plurality of programmable semiconductor devices (also referred to as flash memory cells) arranged in an array.

The sources of all the programmable semiconductor devices in each column are connected to the same analog voltage input end, and the multiple columns of programmable semiconductor devices are correspondingly connected with a plurality of analog voltage input ends; the drains of all the programmable semiconductor devices in each row are connected to the same analog current output end, and the multiple rows of programmable semiconductor devices are correspondingly connected with a plurality of analog current output ends; the gates of all the programmable semiconductor devices in each row are connected to the same bias voltage input end, and the multiple rows of programmable semiconductor devices are correspondingly connected with a plurality of bias voltage input ends; wherein the threshold voltage of each of the programmable semiconductor devices is adjustable.

In another alternative embodiment, each of the flash memory processing sub-arrays includes a plurality of programmable semiconductor devices arranged in an array; the gates of all the programmable semiconductor devices in each row are connected to the same analog voltage input end, and the multiple rows of programmable semiconductor devices are correspondingly connected with a plurality of analog voltage input ends; the drains of all the programmable semiconductor devices in each row are connected to the same first end, and the multiple rows of programmable semiconductor devices are correspondingly connected with a plurality of first ends; the sources of all the programmable semiconductor devices in each row are connected to the same second end, and the multiple rows of programmable semiconductor devices are correspondingly connected with a plurality of second ends; the threshold voltage of each programmable semiconductor device is adjustable. When the first end is a bias voltage input end and the second end is an analog current output end, a gate-coupled, source-summed topology is realized; when the first end is an analog current output end and the second end is a bias voltage input end, a gate-coupled, drain-summed topology is realized.
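In all of these topologies the array computes a vector-matrix product: each device contributes a current that depends on its input voltage and its programmed threshold voltage, and the currents on a shared output line add up. The following is a minimal sketch of that idea, assuming an idealised linear device in which each weight acts as a simple multiplier; real devices are nonlinear, and the function name `array_mac` is hypothetical.

```python
from typing import List, Sequence


def array_mac(weights: Sequence[Sequence[float]],
              voltages: Sequence[float]) -> List[float]:
    """Idealised in-array multiply-accumulate.

    weights[i][j] models the device on input line i and output line j
    (assumption: current = weight * input voltage).
    Returns the summed current on each output line.
    """
    if not weights or len(weights) != len(voltages):
        raise ValueError("need one input voltage per input line")
    n_out = len(weights[0])
    currents = [0.0] * n_out
    for i, voltage in enumerate(voltages):
        if len(weights[i]) != n_out:
            raise ValueError("all input lines must cross every output line")
        for j in range(n_out):
            # Currents of devices sharing an output line are summed.
            currents[j] += weights[i][j] * voltage
    return currents
```

With two input lines and two output lines, `array_mac([[1.0, 2.0], [3.0, 4.0]], [1.0, 0.5])` yields the two summed output currents in a single analog step, which is what replaces the digital multiply-add loop.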

In an optional embodiment, the noise reduction module may further include: a programming circuit.

The programming circuit is connected with the source electrode, the grid electrode and/or the substrate of each programmable semiconductor device in the flash memory processing array and is used for regulating and controlling the threshold voltage of the programmable semiconductor device.

Wherein the programming circuit comprises: a voltage generating circuit for generating a programming voltage or an erase voltage and a voltage control circuit for applying the programming voltage to a selected programmable semiconductor device.

Specifically, the programming circuit utilizes the hot electron injection effect to apply a high voltage to the source of the programmable semiconductor device according to the threshold voltage requirement data of the programmable semiconductor device, and channel electrons are accelerated to a high speed so as to increase the threshold voltage of the programmable semiconductor device.

And the programming circuit applies high voltage to the grid electrode or the substrate of the programmable semiconductor device by utilizing the tunneling effect according to the threshold voltage requirement data of the programmable semiconductor device, thereby reducing the threshold voltage of the programmable semiconductor device.

In addition, the control module is connected with the programming circuit and used for controlling the programming circuit according to the configuration information so as to adjust the weight stored in the flash memory processing array.

In an optional embodiment, the noise reduction module may further include: a row-column decoder.

The row-column decoder is connected with the flash memory processing array and the control module and is used for performing row-column decoding on the flash memory processing array under the control of the control module.

In an alternative embodiment, the programmable semiconductor device may be implemented using floating gate transistors.

Wherein the flash memory processing array may be a NOR-type flash memory processing array or a NAND-type flash memory processing array, although the invention is not limited thereto.

To help those skilled in the art better understand the present invention, the following describes, with reference to the deep learning noise reduction model shown in fig. 4, a scenario in which the noise reduction module performs neural network noise reduction on the to-be-processed noisy speech acquired by the speech signal acquisition device.

The deep learning noise reduction model is used for denoising the to-be-processed noisy speech P, and comprises: an input layer, a plurality of hidden layers and an output layer, wherein each layer comprises a plurality of neurons, each layer of neurons mainly performs vector-matrix multiplication, and the neurons of each layer are connected through certain arithmetic operations.

It should be noted that the training of the deep learning noise reduction model is usually completed at the server side. The training process uses noisy speech as learning samples and the clean speech corresponding to the noisy speech as labels; because the clean speech serves as supervision, training yields neural network parameters capable of distinguishing the speech spectrum from the noise spectrum. After training, the noise reduction module directly prestores the trained and qualified deep learning noise reduction model and uses it for noise reduction processing.

In the model shown in fig. 4, most of the operations are vector matrix multiply-add operations, which would cause huge operation overhead and response delay if implemented by using a conventional chip. In order to improve user experience, the embodiment of the invention adopts the noise reduction module to realize the vector matrix multiply-add operation scheme, so that the operation overhead and the response delay can be reduced.

Aiming at the deep learning noise reduction model, the working process of the noise reduction module is as follows:

the control module obtains configuration information and finite state machine information, where the configuration information and the finite state machine information include R periods of configuration information and finite state machine information, where the R periods correspond to operations (such as convolution, pooling) of R layer neurons of the neural network, and each period corresponds to an operation of one layer of neurons. The configuration information for each cycle includes: configuration information of the flash memory processing subarray, configuration information of the programmable arithmetic operation unit, configuration information of the demultiplexer 30, configuration information of the multiplexer 50, configuration information of the demultiplexer 80, and the like. The control module divides the flash memory processing array into R flash memory processing sub-arrays according to the configuration information, each flash memory processing sub-array corresponds to one period, namely each flash memory processing sub-array realizes the operation of one layer of the neural network, and then the control module controls the working time sequence of each circuit module according to the information of the finite state machine.

The input interface receives the amplitude spectrum P of the frequency domain signal and transmits the amplitude spectrum P to the filter circuit 20 for filtering;

the control module controls the demultiplexer 30 according to the configuration information of the first period and the finite state machine information, so that the filter circuit is connected to the flash memory processing sub-array 1 corresponding to the first layer of the deep learning noise reduction model; controls the multiplexer 50 at the rear end of the flash memory processing sub-arrays, so that the flash memory processing sub-array 1 is connected to the analog-to-digital conversion module 60; controls the demultiplexer and the multiplexer of each programmable arithmetic operation unit of the programmable arithmetic operation module 70, so that after the output result of the flash memory processing sub-array 1 is converted into a digital signal, the arithmetic operation 1 corresponding to the first layer of the deep learning noise reduction model is performed; and controls the demultiplexer 80 to output the result of the arithmetic operation 1 to the register 100 and further to the input end of the demultiplexer 30 for the operation processing of the second period;

then the control module controls the demultiplexer 30 to pass the output of the register to the flash memory processing sub-array 2 corresponding to the second layer of the deep learning noise reduction model; controls the multiplexer 50 at the rear end of the flash memory processing sub-arrays to connect the flash memory processing sub-array 2 with the analog-to-digital conversion module 60; controls the demultiplexer and the multiplexer of each programmable arithmetic operation unit of the programmable arithmetic operation module 70, so that after the output result of the flash memory processing sub-array 2 is converted into a digital signal, the arithmetic operation 2 corresponding to the second layer of the deep learning noise reduction model is performed; and controls the demultiplexer 80 to output the result of the arithmetic operation 2 to the register 100 and further to the input end of the demultiplexer 30 for the operation processing of the third cycle. This continues until the operation processing of the last cycle; after the post-processing module obtains the operation result of the last cycle, the operation result is output to the output interface 90 by controlling the demultiplexer 80.
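The cycle-by-cycle flow above can be sketched in software as a loop in which each cycle selects one sub-array, digitises its output, applies the configured arithmetic operation, and feeds the result back as the next input. This is an illustrative model only: the names `run_cycles`, `sub_array_mac` and `adc`, the uniform quantisation step, and the example layer weights are all assumptions, not details from the source.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Matrix = Sequence[Sequence[float]]
Layer = Tuple[Matrix, Callable[[Vector], Vector]]


def sub_array_mac(weights: Matrix, inputs: Vector) -> Vector:
    """Idealised flash sub-array: one weighted sum per output line."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def adc(values: Vector, step: float = 2 ** -8) -> Vector:
    """Stand-in for module 60: uniform quantisation (step size is assumed)."""
    return [round(v / step) * step for v in values]


def relu(values: Vector) -> Vector:
    return [max(0.0, v) for v in values]


def identity(values: Vector) -> Vector:
    return list(values)


def run_cycles(layers: Sequence[Layer], magnitude: Vector) -> Vector:
    """Run one cycle per layer, looping the result back through a register."""
    if not layers:
        raise ValueError("at least one cycle is required")
    data = list(magnitude)
    for weights, post_op in layers:
        analog = sub_array_mac(weights, data)  # sub-array chosen via 30 and 50
        digital = adc(analog)                  # analog-to-digital conversion 60
        data = post_op(digital)                # programmable arithmetic module 70
        # After every cycle but the last, `data` plays the role of register 100
        # feeding demultiplexer 30; after the last cycle it is the output.
    return data


LAYERS: List[Layer] = [
    ([[1.0, -1.0], [0.5, 0.5]], relu),  # cycle 1: first layer
    ([[2.0, 1.0]], identity),           # cycle 2: second (last) layer
]
```

Calling `run_cycles(LAYERS, [2.0, 3.0])` runs both cycles in order; the same hardware path is reused for each layer, with only the selected sub-array and arithmetic operation changing.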

In addition, as can be understood by those skilled in the art, when the configuration information is generated according to the actual application requirement, the configuration information may be implemented according to a preset instruction-architecture correspondence table.

It should be noted that, when the configuration information is generated according to the actual application requirement, the number of flash memory processing sub-arrays to be used and the scale of each flash memory processing sub-array may be known, and at this time, a dividing instruction of the flash memory processing array may be obtained according to the actual application requirement, and then the flash memory processing array may be divided into a plurality of flash memory processing sub-arrays according to the dividing instruction, corresponding to a plurality of matrix multiplication scales.
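One way to picture the dividing instruction is as a list of regions carved out of the physical array, one region per matrix multiplication scale. The sketch below assumes a simple side-by-side placement along the columns; the source does not specify a placement strategy, and the function name `partition` and the region tuple layout are hypothetical.

```python
from typing import List, Sequence, Tuple

# (row_start, row_end, col_start, col_end), end indices exclusive
Region = Tuple[int, int, int, int]


def partition(array_rows: int, array_cols: int,
              shapes: Sequence[Tuple[int, int]]) -> List[Region]:
    """Assign each (rows, cols) matrix a region of the flash array.

    Assumption: sub-arrays are placed next to each other along the
    column direction, each starting at row 0.
    """
    regions: List[Region] = []
    next_col = 0
    for rows, cols in shapes:
        if rows > array_rows or next_col + cols > array_cols:
            raise ValueError("requested sub-arrays do not fit in the array")
        regions.append((0, rows, next_col, next_col + cols))
        next_col += cols
    return regions
```

For a 256 x 512 array and layers of scale 256 x 128 and 128 x 64, `partition(256, 512, [(256, 128), (128, 64)])` returns two non-overlapping regions; a request that exceeds the array raises an error instead of silently overlapping.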

It can be understood by those skilled in the art that, when the embodiment of the present invention is applied to perform a plurality of cycle operations, the flash memory processing sub-array corresponding to each cycle may be programmed in each cycle, or each flash memory processing sub-array may be uniformly programmed according to a programming instruction before performing each cycle operation.

In an alternative embodiment, referring to fig. 5, the filter circuit 20 may include: resistors R1 to R4, a capacitor C1, a capacitor C2 and an amplifier A, wherein one end of the resistor R1 is used as an input end for receiving input signals, the other end of the resistor R2 is connected with one end of the capacitor C1, the other end of the capacitor C1 is grounded, the other end of the resistor R2 is connected with one end of the capacitor C2 and the positive input end of the amplifier, the other end of the capacitor C2 is grounded, one end of the resistor R3 is grounded, the other end of the resistor R3 is connected with the negative input end of the amplifier, one end of the resistor R4.

It is worth mentioning that the filter circuit has high precision, and the precision of the analog voice signal can be further improved by adopting the filter circuit, thereby being beneficial to subsequent processing.

In an alternative embodiment, the noise reduction module may further include a reading circuit for reading data stored in each programmable semiconductor device in the flash memory processing array for reference in weight adjustment.

In an alternative embodiment, the flash memory processing array in the noise reduction module includes a plurality of flash memory units arranged in an array. Referring to fig. 6, each flash memory unit includes: a programmable semiconductor device F₀ for storing long-term data and an analog capacitor unit T1 for storing temporary data, the programmable semiconductor device F₀ being connected in parallel with the analog capacitor unit T1.

The threshold voltage of the programmable semiconductor device is adjustable, and the programmable semiconductor device can be realized by a floating gate transistor, including but not limited to a SONOS-type floating-gate transistor, a split-gate transistor, or a charge-transfer floating-gate transistor; all flash memory transistor devices employed in flash memories fall within the scope of the embodiments of the present invention.

The gate of the programmable semiconductor device F₀ is connected to the word line WL, the drain is connected to the bit line BL, and the source is connected to the source line SL.

Wherein the analog capacitor unit T1 is connected in parallel with the programmable semiconductor device F₀. When erasing or writing, the voltage value that needs to be increased or decreased (corresponding to short-term data) is applied to the analog capacitor unit, that is, the voltage of the analog capacitor unit is adjusted. The output current of the flash memory unit is the sum of the output current of the programmable semiconductor device F₀ and the output current of the analog capacitor unit T1, so the output current (i.e. the weight) of the flash memory unit can be adjusted by adjusting the voltage of the analog capacitor unit. When the erase/write count of the analog capacitor unit reaches a preset count (e.g. 10, 50, 100, 300, etc., which is not limited by the embodiment of the present invention) or its voltage reaches a preset voltage (e.g. a voltage value in the range of 0.01 V to 2 V, such as 0.05 V, 0.1 V, 0.5 V, 0.8 V or 1 V, which is not limited by the embodiment of the present invention), the voltage of the analog capacitor unit (corresponding to the stored data) is transferred to the programmable semiconductor device F₀, thereby reducing the number of erase/write operations on the programmable semiconductor device F₀ and avoiding aging of the programmable semiconductor device F₀.

In an alternative embodiment, the analog capacitor unit T1 of the flash memory cell includes: an output transistor N₀, a charging transistor P₀, a discharge transistor Q₀ and a capacitor C₀.

Wherein the drain of the output transistor N₀ is connected to the programmable semiconductor device F₀, its source is connected to the source of the programmable semiconductor device F₀, and its gate is connected to one end of the capacitor C₀;

the charging transistor P₀ has a source connected to a high voltage, a gate connected to a first control voltage Set, and a drain connected to the other end of the capacitor C₀;

the discharge transistor Q₀ has a source connected to a low voltage, a gate connected to a second control voltage Reset, and a drain connected to the other end of the capacitor C₀.

Wherein the charging transistor P₀ is implemented with a PMOS transistor, which is turned on at a negative voltage; the discharge transistor Q₀ is implemented with an NMOS transistor, which is turned on at a positive voltage.

When the weight (i.e., output current) of the flash memory cell needs to be increased, the first control voltage Set and the second control voltage Reset may both be set low, so that the capacitor C₀ is charged through the charging transistor P₀ and the voltage of the capacitor C₀ rises, thereby raising the gate voltage of the output transistor N₀. Because the output current of the output transistor N₀ is a function of its gate voltage, the output current of the output transistor N₀ increases, that is, the output current of the analog capacitor unit T1 increases, and the weight of the flash memory cell increases.

When the weight (i.e., output current) of the flash memory cell needs to be reduced, the first control voltage Set and the second control voltage Reset may both be set high, so that the capacitor C₀ is discharged through the discharge transistor Q₀ and the voltage of the capacitor C₀ falls, thereby lowering the gate voltage of the output transistor N₀. Because the output current of the output transistor N₀ is a function of its gate voltage, the output current of the output transistor N₀ decreases, that is, the output current of the analog capacitor unit T1 decreases, and the weight of the flash memory cell is reduced.

In summary, in the embodiments of the invention, the output current of the flash memory cell can be adjusted by adjusting the voltage of the analog capacitor unit T1, and when the erase/write count of the analog capacitor unit T1 reaches the preset count or its voltage reaches the preset voltage, the voltage of the analog capacitor unit T1 (corresponding to the stored data) is transferred to the programmable semiconductor device F₀, thereby reducing the number of erase/write operations on the programmable semiconductor device F₀ and avoiding aging of the programmable semiconductor device F₀.
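The accumulate-then-transfer behaviour can be modelled as a cell with a long-term value and a short-term offset. The sketch below is a simplified behavioural model, not a circuit simulation: the class `HybridCell`, its attribute names, and the default thresholds are hypothetical, and the weight is treated as a plain sum of the two stored values.

```python
class HybridCell:
    """Flash cell F0 (long-term) paired with a capacitor unit T1 (short-term)."""

    def __init__(self, stored: float = 0.0,
                 max_updates: int = 10, max_offset: float = 0.5) -> None:
        self.stored = stored            # long-term data held by F0
        self.offset = 0.0               # short-term data held by T1
        self.updates = 0                # updates applied to T1 since last transfer
        self.flash_writes = 0           # how often F0 itself was rewritten
        self.max_updates = max_updates  # preset count (assumed value)
        self.max_offset = max_offset    # preset voltage magnitude (assumed value)

    def weight(self) -> float:
        """Output is the sum of the F0 and T1 contributions."""
        return self.stored + self.offset

    def adjust(self, delta: float) -> None:
        """Apply a small weight change to T1; transfer to F0 at a threshold."""
        self.offset += delta
        self.updates += 1
        if self.updates >= self.max_updates or abs(self.offset) >= self.max_offset:
            self.stored += self.offset  # one flash write absorbs many updates
            self.offset = 0.0
            self.updates = 0
            self.flash_writes += 1
```

With `max_updates=3`, three small adjustments cost a single write to F₀ rather than three, which is the wear-reduction effect described above.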

In an optional embodiment, the noise reduction module includes a programming circuit, and the programming circuit is connected to the flash memory unit and is configured to regulate and control data stored in the flash memory unit to update the parameters of the deep learning noise reduction model. For a specific implementation scheme, reference may be made to the above description, which is not repeated herein.

In some embodiments, the deep learning noise reduction model may be a perceptron, convolutional neural network, deconvolution network, deep convolutional inverse graphics network, generative adversarial network, recurrent neural network, long short-term memory network, Hopfield network, Boltzmann machine network, restricted Boltzmann machine network, support vector machine, deep belief network, deep autoencoder, or the like. The user can construct a corresponding deep learning noise reduction model based on actual requirements.

In some embodiments, at the server side, after the deep learning model is determined, the deep learning model may be further trained and tested, parameters of the deep learning model are determined, and then the weight of the deep learning model that is qualified in training is written into the flash memory processing array through the programming circuit.

Because the noise reduction module uses a nonvolatile memory device array to execute operations, the stored data and functions are not lost when the power is off, and the deep learning network parameters do not need to be written repeatedly. When the parameters of the deep learning network need to be adjusted, only the adjusted parameter data needs to be programmed into the noise reduction module again.

In an alternative embodiment, referring to fig. 7, the preprocessing module of the intelligent voice noise reduction device based on storage and calculation integration comprises: a windowing unit 1a and a Fourier transform unit 1b.

The windowing unit 1a is used for windowing the voice with noise to be processed; the fourier transform unit 1b is configured to convert the windowed noisy speech into a frequency domain signal.

It should be noted that the windowing (also referred to as smoothing) unit 1a and the Fourier transform unit 1b are implemented by digital circuits; the windowing may be implemented by band-pass filters and can be understood as selecting a period in the time domain.
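The preprocessing step can be sketched as windowing a frame and splitting its discrete Fourier transform into an amplitude spectrum and a phase spectrum. The Hann window and the direct O(n²) transform used here are illustrative assumptions; the source does not name a window function or a transform algorithm, and the function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def hann(n: int) -> List[float]:
    """Symmetric Hann window of length n (n > 1); an assumed window choice."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def dft(samples: Sequence[float]) -> List[complex]:
    """Direct discrete Fourier transform, written for clarity rather than speed."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def preprocess(frame: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Window one frame, then return (amplitude spectrum, phase spectrum)."""
    window = hann(len(frame))
    spectrum = dft([s * w for s, w in zip(frame, window)])
    amplitude = [abs(c) for c in spectrum]
    phase = [cmath.phase(c) for c in spectrum]
    return amplitude, phase
```

Only the amplitude spectrum would be passed to the noise reduction module; the phase spectrum is kept aside for reconstruction.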

In an alternative embodiment, referring to fig. 8, the reconstruction module comprises: a power spectrum compensation unit 3a and an inverse Fourier transform unit 3b.

It should be noted that the power spectrum compensation unit 3a and the inverse fourier transform unit 3b are implemented by digital circuits.

The power spectrum compensation unit 3a is used for compensating the amplitude spectrum after noise reduction; and the inverse Fourier transform unit is used for performing inverse Fourier transform on the compensated amplitude spectrum and the phase spectrum of the frequency domain signal to obtain the noise-reduced voice.
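The reconstruction step recombines an amplitude spectrum with the original phase spectrum and applies an inverse transform. The sketch below shows only that recombination, under the assumption of a direct inverse discrete Fourier transform; the compensation stage is left out, and the function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def dft(samples: Sequence[float]) -> List[complex]:
    """Direct discrete Fourier transform (used here to build a test spectrum)."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum: Sequence[complex]) -> List[float]:
    """Direct inverse transform; returns the real part of each time sample."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def reconstruct(amplitude: Sequence[float], phase: Sequence[float]) -> List[float]:
    """Combine a (noise-reduced) amplitude spectrum with the original phase."""
    if len(amplitude) != len(phase):
        raise ValueError("amplitude and phase spectra must have equal length")
    spectrum = [cmath.rect(a, p) for a, p in zip(amplitude, phase)]
    return idft(spectrum)
```

If the amplitude spectrum is passed through unchanged, `reconstruct` returns the original frame up to rounding error, which makes it a convenient check that the phase is being reused correctly.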

Specifically, the main function of the power spectrum compensation is to compensate the voice amplitude spectrum based on the relative size of the voice amplitude spectrum and the noise amplitude spectrum output by the noise reduction module, and the compensation can be realized by adopting the following formula:

(Formula not reproduced in this text.) Its terms denote the speech magnitude spectrum, the noise magnitude spectrum and the total power spectrum z_t(f), together with the speech power spectrum and the noise power spectrum.

This compensation is applied to the noise-reduced amplitude spectrum and can improve the clarity of the noise-reduced speech.

It should be noted that the noise reduction module may be implemented as a single chip based on the storage and calculation integrated architecture (i.e., the chip architecture and its peripheral circuits shown in fig. 2), that is, as a NOR flash storage and calculation integrated chip, in which case the preprocessing module and the reconstruction module may be two separate chips or may be integrated in an MCU. Alternatively, the noise reduction module based on the storage and calculation integrated architecture, the preprocessing module and the reconstruction module may be integrated on one chip, which is not limited in this embodiment of the present invention.

Fig. 9 shows a block diagram of the configuration of the voice input apparatus in the embodiment of the present invention. As shown in fig. 9, the voice input apparatus includes: a voice collecting device SS1 and an intelligent voice noise reduction device SS2 based on storage and calculation integration, connected with the voice collecting device.

The voice acquisition device is used for acquiring voice signals, and the intelligent voice noise reduction device is used for carrying out noise reduction processing on the voice signals.

Specifically, the voice collecting device may be a microphone, a bone conduction sensor, a sound sensor, or the like.

It should be noted that the voice input device can be used as a voice input device for an electronic device or an external electronic system.

In addition, signals can be transmitted between the voice acquisition device and the intelligent voice noise reduction device through a wired network or a wireless network. The network may include a signal transmission network formed by signal transmission lines disposed on a circuit board, a cable network, a wire network, an optical fiber network, a telecommunication network, an intranet, the Internet, a local area network LAN, a wide area network WAN, a wireless local area network WLAN, a metropolitan area network MAN, a public switched telephone network PSTN, a Bluetooth network, a wireless personal area network, a near field communication NFC network, a Global System for Mobile communications (GSM) network, a Code Division Multiple Access (CDMA) network, a Time Division Multiple Access (TDMA) network, a General Packet Radio Service (GPRS) network, an enhanced data Rate for GSM evolution (EDGE) network, a Wideband Code Division Multiple Access (WCDMA) network, a High Speed Downlink Packet Access (HSDPA) network, a Long Term Evolution (LTE) network, a User Datagram Protocol (UDP) network, a Transmission control protocol/Internet protocol/TCP/IP network, a short, Infrared, and the like, or any combination thereof.

Fig. 10 shows a block diagram of an electronic system according to an embodiment of the present invention. As shown in fig. 10, the electronic system may include the intelligent voice noise reduction device SS2 based on storage and calculation integration as described above, together with other electronic modules SS3 that implement the electronic system's own functions; the intelligent voice noise reduction device SS2 is integrated into the electronic system.

It is noted that the electronic system may be a navigator, a vehicle control terminal, a wearable device, a mobile phone, a smart robot, a tablet computer, a notebook computer, a desktop computer, a virtual reality device, an augmented reality device, a game device, a point-of-sale POS device, a factory production line device, and the like or any combination thereof.

It should be noted that the foregoing is provided for illustrative purposes only and is not intended to limit the scope of the present application. Various modifications and adaptations may occur to those skilled in the art, given the benefit of this disclosure. However, variations and modifications may be made without departing from the scope of the present application.

Having thus described the basic concepts, it will be apparent to those of ordinary skill in the art having read this application that the foregoing disclosure is to be construed as illustrative only and is not limiting of the application. Various alterations, improvements, and modifications may be suggested to one skilled in the art, though not expressly stated herein. Such modifications, improvements and adaptations are proposed in the present application and thus fall within the spirit and scope of the exemplary embodiments of the present application.

Also, this application uses specific language to describe embodiments of the application. For example, the terms "one embodiment," "an embodiment," and/or "some embodiments" mean that a particular feature, structure, or characteristic described in connection with the embodiments is included in at least one embodiment of the present application. Therefore, it is emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, some features, structures, or characteristics of one or more embodiments of the present application may be combined as appropriate.

Moreover, those of ordinary skill in the art will understand that aspects of the present application may be illustrated and described in terms of several patentable species or contexts, including any new and useful combination of processes, machines, articles, or materials, or any new and useful modification thereof. Accordingly, various aspects of the present application may be embodied entirely in hardware, entirely in software (including firmware, resident software, micro-code, etc.) or in a combination of hardware and software. The above hardware or software may be referred to as a "data block," "module," "engine," "unit," "component," or "system." Furthermore, aspects of the present application may be represented as a computer product, including computer readable program code, embodied in one or more computer readable media.

A computer readable signal medium may comprise a propagated data signal with computer program code embodied therein, for example, on a baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including electromagnetic, optical, and the like, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code on a computer readable signal medium may be propagated over any suitable medium, including radio, electrical cable, fiber optic cable, RF, or the like, or any combination of the preceding.

Additionally, the order in which elements and sequences of the processes described herein are processed, the use of alphanumeric characters, or the use of other designations, is not intended to limit the order of the processes and methods described herein, unless explicitly claimed. While various presently contemplated embodiments of the invention have been discussed in the foregoing disclosure by way of example, it is to be understood that such detail is solely for that purpose and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover all modifications and equivalent arrangements that are within the spirit and scope of the embodiments herein. For example, although the system components described above may be implemented by hardware devices, they may also be implemented by software-only solutions, such as installing the described system on an existing server or mobile device.

Similarly, it should be noted that in the preceding description of embodiments of the present application, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the embodiments. This method of disclosure, however, is not intended to require more features than are expressly recited in the claims. Indeed, the embodiments may be characterized as having less than all of the features of a single embodiment disclosed above.
