Method for recognizing coughs and sneezes in a real-time voice stream

Document No.: 1364317 · Published: 2020-08-11 · Views: 40 · Chinese

Reading note: This technique, "Method for recognizing coughs and sneezes in a real-time voice stream", was designed and created by Sun Baoshi on 2020-03-24. Its main content is as follows: The invention discloses a method for recognizing coughs and sneezes in a real-time voice stream using two-domain characterization and queue acceleration. The method comprises: continuously acquiring voice signals and framing the acquired voice data; performing endpoint detection on the voice data frames to locate the starting frame of candidate target speech, where the endpoint detection adopts a three-threshold method; employing both time-domain and frequency-domain features, with feature values selected for the characteristics of coughs and sneezes; accelerating the processing of the feature vector queue; and flexibly switching among three working modes, together forming a complete operation and continuous-optimization workflow. Beneficial effects of the invention: 1. High recognition rate: the method includes several key innovations and treats coughs and sneezes specially, so its recognition rate is markedly higher than that of other existing methods.

1. A method for recognizing coughs and sneezes in a real-time voice stream using two-domain characterization and queue acceleration, comprising:

continuously acquiring voice signals, and framing the acquired voice data;

carrying out endpoint detection on the voice data frames to locate the starting frame of candidate target speech; the endpoint detection adopts a three-threshold method, namely:

(1) the average energy of the frame's samples is greater than threshold 1, and

(2) the frame's zero-crossing rate (the percentage of frame sample values greater than zero) is greater than threshold 2, and

(3) the average energy difference between the frame and the previous frame (the absolute value of the difference between the two frames' average energies) is greater than threshold 3. If the frame satisfies all three thresholds, the frame's Frame Mute Flag (FMF) is set to 1 (true); otherwise the FMF is set to 0 (false);

updating the Recognition Activated Flag (RAF), whose initial value is 0 (not activated): RAF = RAF | FMF;

checking the RAF: if the RAF is false, the recognition process is not activated, so the current frame is discarded and processing jumps back to the beginning to continue voice sampling;

if the RAF is true, the recognition process is activated, and the current frame is characterized to obtain a frame feature vector of 20 feature values;

adding the frame feature vector to the tail of the feature vector queue;

if the length of the feature vector queue reaches the recognizable length (RecoLen), feeding the feature vector queue (a RecoLen × 20 feature matrix) into a machine learning model trained in advance for recognition; otherwise, continuing voice sampling;

the recognizable length RecoLen is one dimension of the machine learning model's two-dimensional input sample and indicates how many data frames one input sample contains; RecoLen takes values between 20 and 32 frames, corresponding to about 1.25-2 seconds of speech data, which is essentially the time window of a single cough or sneeze;

if the confidence level (CL) of the recognition result exceeds a recognition threshold set by the system, considering that one cough or sneeze has been effectively detected, counting it, outputting the recognition result, emptying the feature vector queue, and setting the RAF to 0; then jumping to the beginning to start a new recognition process;

if the confidence level (CL) of the recognition result does not exceed the recognition threshold set by the system, considering that no cough or sneeze has been effectively detected, and performing acceleration processing on the feature vector queue according to the specific value of the CL;

after the acceleration process is completed, a new recognition process is started.

2. The method according to claim 1, wherein the whole processing flow constitutes the "operation mode" of the method; together with a "training mode" and a "collection mode" there are three working modes in total, and the working mode is controlled by system parameters;

if the device works in a training mode, the frame feature vectors need to be reported to a server or a cloud platform while being enqueued;

if the system works in the acquisition mode, the framed voice data needs to be uploaded to a server or a cloud platform.

3. The method of claim 1, wherein threshold 1 performs absolute-silence filtering, threshold 2 performs relative-silence filtering, and threshold 3 targets the abrupt energy changes characteristic of coughs and sneezes so as to filter out smoother normal speech.

4. The method of claim 1, wherein the acceleration processing of the feature vector queue comprises:

(1) acceleration 1: removing the leading (100% - CL) proportion of frames from the feature vector queue; for example, assuming RecoLen is 20, if recognition yields a CL of 60%, the leading 40% of the frames, i.e. 8 frames, are removed from the queue;

(2) acceleration 2: finding the first frame whose FMF is 1 (true) among the remaining frames in the feature vector queue and discarding all frames before it; if no frame with FMF = 1 (true) is found, emptying the feature vector queue and setting the RAF to 0.

5. The method for recognizing coughs and sneezes in a real-time voice stream using two-domain characterization and queue acceleration according to claim 1, wherein the training process of the cough and sneeze machine learning model comprises:

the training process is divided into off-line training and on-line training, and can be used independently or cooperatively;

the off-line training can obtain voice data from external sources, or the recognition device can be set to collection mode to collect raw voice data;

preprocessing the voice data by dividing it into segments whose length equals RecoLen frames; the preprocessing can be done manually or with dedicated voice-file processing software;

classifying and labeling voice files, comprising: coughing, sneezing, and others;

extracting the feature vector queue of each voice segment using the framing and characterization methods described in the recognition process; if the length is less than RecoLen, padding with zero vectors, and if the length exceeds RecoLen, truncating;

on a server or cloud platform, feeding the feature values and labels into the model in batches for training and validation;

importing the satisfactorily trained model into the recognition device and updating the recognition model;

when online learning is carried out, the operation mode of the recognition device is set to training mode so as to directly obtain the feature vectors of the voice data frames;

the feature vectors are uploaded online to a server or cloud platform;

the server or cloud platform treats every RecoLen consecutive feature vectors as one training sample;

if a recognition result is received, a new training sample is started; if the previous sample's length is less than RecoLen, it is padded with zero vectors;

meanwhile, the samples are manually labeled online with the classes: coughing, sneezing, and others;

the existing model is incrementally optimized with newly obtained training samples using a transfer learning method;

the recognition results of the optimized model can be compared with those of the existing model to evaluate the optimization effect;

and the satisfactorily trained model is imported into the MCU recognition device, updating the recognition model.

6. The method of claim 1, wherein the voice data frame characterization process comprises:

Respectively performing time domain characterization and frequency domain characterization on an input voice data frame;

time-domain characterization: based on the instantaneous amplitude changes characteristic of cough and sneeze sounds, three feature values are calculated, including:

(1) the frame's sampling fluctuation value = maximum sample value - minimum sample value;

(2) the energy difference between the current frame and the previous frame = abs(average of the current frame's samples - average of the previous frame's samples), where abs is the absolute-value function;

(3) the energy variance of the frame's slices, representing the energy fluctuation within the frame;

frequency-domain characterization, comprising two parts: the first part is the Mel-Frequency Cepstral Coefficients (MFCC), standard for frequency-domain analysis of voice signals and consisting mainly of a Fast Fourier Transform (FFT), a mel-frequency filter bank, and a Discrete Cosine Transform (DCT);

the second part of the frequency-domain characterization takes the 16 feature values of the first part and computes the band energy variance using the standard-deviation formula, yielding one additional feature value.

7. The method for recognizing coughs and sneezes in a real-time voice stream using two-domain characterization and queue acceleration according to claim 1, wherein the "machine learning model" includes but is not limited to a two-dimensional convolutional neural network (2D CNN), a long short-term memory network (LSTM), and a random forest (RF).

8. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of any of claims 1 to 7 are implemented when the program is executed by the processor.

9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.

10. A processor, characterized in that the processor is configured to run a program, wherein the program when running performs the method of any of claims 1 to 7.

Technical Field

The invention relates to the field of cough and sneeze recognition, and in particular to a method for recognizing coughs and sneezes in a real-time voice stream using two-domain characterization and queue acceleration.

Background

Coughing and sneezing are common symptoms of respiratory diseases. In public places such as classrooms, offices, meeting rooms, and restaurants, automatically detecting the coughs and sneezes of unspecified people makes it possible to spot potential disease sources in time and carry out effective prevention and control.

The prior art for this problem can be regarded as a special application of general speech recognition technology: frequency-domain features of the audio signal are extracted by methods such as the Short-Time Fourier Transform (STFT) and Mel-Frequency Cepstral Coefficients (MFCC), and feature matching is then performed by pattern recognition or machine learning.

Meanwhile, cough and sneeze detection products that fit the above scenario (unspecified people in public places) are hardly available on the market. The products that can be found include medical contact-type personal cough detectors and cough-detection mobile apps (aimed at individuals).


For the scenario described above, the conventional technology has the following problems:

1. Low accuracy: in the prior art, frequency-domain features of the audio signal are extracted by methods such as the Short-Time Fourier Transform (STFT) and Mel-Frequency Cepstral Coefficients (MFCC), and feature matching is then performed by pattern recognition or machine learning. Existing methods lack special treatment of coughs and sneezes and of the public-place, unspecified-crowd environment, so in practical applications their accuracy and robustness are not high.

2. Poor practicability: existing methods, especially those described in academic articles, basically operate in relatively ideal experimental environments and are optimized only for individual metrics. The complex environments and large-scale deployment of real applications are not comprehensively considered, so the related methods are difficult to put into practice.

3. Heavy resource usage, making offline recognition difficult: existing methods involve many invalid operations and highly redundant feature data, demanding substantial computing and storage resources; they can hardly run standalone on a conventional microcontroller, so offline recognition is difficult to achieve and the range of application is greatly limited.

Disclosure of Invention

The technical problem to be solved by the invention is to provide a method for recognizing coughs and sneezes in a real-time voice stream using two-domain characterization and queue acceleration, which comprises: three-threshold endpoint detection; time-domain plus frequency-domain (two-domain) characterization that optimizes the feature vector and compresses its dimension; feature-vector-queue acceleration; flexible switching among three working modes; and a complete operation and continuous-optimization workflow. The method offers high recognition efficiency and accuracy, low resource usage, good robustness, and suitability for large-scale deployment.

In order to solve the above technical problems, the present invention provides a method for recognizing coughs and sneezes in a real-time voice stream using two-domain characterization and queue acceleration, comprising:

continuously acquiring voice signals, and framing the acquired voice data;

carrying out endpoint detection on the voice data frames to locate the starting frame of candidate target speech; the endpoint detection adopts a three-threshold method, namely:

(1) the average energy of the frame's samples is greater than threshold 1, and

(2) the frame's zero-crossing rate (the percentage of frame sample values greater than zero) is greater than threshold 2, and

(3) the average energy difference between the frame and the previous frame (the absolute value of the difference between the two frames' average energies) is greater than threshold 3;

if the frame satisfies all three thresholds, the frame's Frame Mute Flag (FMF) is set to 1 (true); otherwise the FMF is set to 0 (false);

updating the Recognition Activated Flag (RAF), whose initial value is 0 (not activated): RAF = RAF | FMF;

checking the RAF: if the RAF is false, the recognition process is not activated, so the current frame is discarded and processing jumps back to the beginning to continue voice sampling;

if the RAF is true, the recognition process is activated, and the current frame is characterized to obtain a frame feature vector of 20 feature values;

adding the frame feature vector to the tail of the feature vector queue;

if the length of the feature vector queue reaches the recognizable length (RecoLen), feeding the feature vector queue (a RecoLen × 20 feature matrix) into a machine learning model trained in advance for recognition; otherwise, continuing voice sampling;

the recognizable length RecoLen is one dimension of the machine learning model's two-dimensional input sample and indicates how many data frames one input sample contains; RecoLen takes values between 20 and 32 frames, corresponding to about 1.25-2 seconds of speech data, which is essentially the time window of a single cough or sneeze;

if the confidence level (CL) of the recognition result exceeds a recognition threshold set by the system, considering that one cough or sneeze has been effectively detected, counting it, outputting the recognition result, emptying the feature vector queue, and setting the RAF to 0; then jumping to the beginning to start a new recognition process;

if the confidence level (CL) of the recognition result does not exceed the recognition threshold set by the system, considering that no cough or sneeze has been effectively detected, and performing acceleration processing on the feature vector queue according to the specific value of the CL;

after the acceleration process is completed, a new recognition process is started.

In one embodiment, the whole processing flow constitutes the "operation mode" of the method; together with a "training mode" and a "collection mode" there are three working modes in total, and the working mode is controlled by system parameters;

if the device works in a training mode, the frame feature vectors need to be reported to a server or a cloud platform while being enqueued;

if the system works in the acquisition mode, the framed voice data needs to be uploaded to a server or a cloud platform.

In one embodiment, threshold 1 performs absolute-silence filtering, threshold 2 performs relative-silence filtering, and threshold 3 targets the abrupt energy changes characteristic of coughs and sneezes so as to filter out smoother normal speech.

In one embodiment, the acceleration processing of the feature vector queue specifically comprises:

(1) acceleration 1: removing the leading (100% - CL) proportion of frames from the feature vector queue; for example, assuming RecoLen is 20, if recognition yields a CL of 60%, the leading 40% of the frames, i.e. 8 frames, are removed from the queue;

(2) acceleration 2: finding the first frame whose FMF is 1 (true) among the remaining frames in the feature vector queue and discarding all frames before it; if no frame with FMF = 1 (true) is found, emptying the feature vector queue and setting the RAF to 0.

In one embodiment, the training process of the cough and sneeze machine learning model comprises:

the training process is divided into off-line training and on-line training, and can be used independently or cooperatively;

the off-line training can obtain voice data from external sources, or the recognition device can be set to collection mode to collect raw voice data;

preprocessing the voice data by dividing it into segments whose length equals RecoLen frames; the preprocessing can be done manually or with dedicated voice-file processing software;

classifying and labeling voice files, comprising: coughing, sneezing, and others;

extracting the feature vector queue of each voice segment using the framing and characterization methods described in the recognition process; if the length is less than RecoLen, padding with zero vectors, and if the length exceeds RecoLen, truncating;

on a server or cloud platform, feeding the feature values and labels into the model in batches for training and validation;

importing the satisfactorily trained model into the recognition device and updating the recognition model;

when online learning is carried out, the operation mode of the recognition device is set to training mode so as to directly obtain the feature vectors of the voice data frames;

the feature vectors are uploaded online to a server or cloud platform;

the server or cloud platform treats every RecoLen consecutive feature vectors as one training sample;

if a recognition result is received, a new training sample is started; if the previous sample's length is less than RecoLen, it is padded with zero vectors;

meanwhile, the samples are manually labeled online with the classes: coughing, sneezing, and others;

the existing model is incrementally optimized with newly obtained training samples using a transfer learning method;

the recognition results of the optimized model can be compared with those of the existing model to evaluate the optimization effect;

and the satisfactorily trained model is imported into the MCU recognition device, updating the recognition model.

In one embodiment, the voice data frame characterization process comprises:

Respectively performing time domain characterization and frequency domain characterization on an input voice data frame;

time-domain characterization: based on the instantaneous amplitude changes characteristic of cough and sneeze sounds, three feature values are calculated, including:

(1) the frame's sampling fluctuation value = maximum sample value - minimum sample value;

(2) the energy difference between the current frame and the previous frame = abs(average of the current frame's samples - average of the previous frame's samples), where abs is the absolute-value function;

(3) the energy variance of the frame's slices, representing the energy fluctuation within the frame;

frequency-domain characterization, comprising two parts: the first part is the Mel-Frequency Cepstral Coefficients (MFCC), standard for frequency-domain analysis of voice signals and consisting mainly of a Fast Fourier Transform (FFT), a mel-frequency filter bank, and a Discrete Cosine Transform (DCT);

the second part of the frequency-domain characterization takes the 16 feature values of the first part and computes the band energy variance using the standard-deviation formula, yielding one additional feature value.

In one embodiment, the "machine learning model" includes but is not limited to a two-dimensional convolutional neural network (2D CNN), a long short-term memory network (LSTM), and a random forest (RF).

Based on the same inventive concept, the present application also provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of any of the methods when executing the program.

Based on the same inventive concept, the present application also provides a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of any of the methods.

Based on the same inventive concept, the present application further provides a processor for executing a program, wherein the program executes to perform any one of the methods.

The invention has the beneficial effects that:

1. High recognition rate: the method includes several key innovations and treats coughs and sneezes specially, so its recognition rate is markedly higher than that of other existing methods;

2. High efficiency: invalid processing is greatly reduced, the feature vector dimension is heavily compressed, and the feature-vector-queue acceleration technique makes the operating efficiency markedly higher than that of other existing methods;

3. Low resource usage: the method is simple in structure and highly efficient, can run standalone on a general-purpose microcontroller (MCU), and offers dual advantages of function and cost.

Drawings

FIG. 1 is a flow chart of the method of the present invention for recognizing coughs and sneezes in a real-time voice stream using two-domain characterization and queue acceleration.

FIG. 2 is a flow chart of the voice data frame characterization process of the method.

FIG. 3 is a diagram of the training process of the cough and sneeze machine learning algorithm of the method.

Detailed Description

The present invention is further described below in conjunction with the figures and specific examples so that those skilled in the art may better understand and practice it; the examples are not intended to limit the invention.

The invention discloses a method for efficiently and automatically recognizing cough and sneeze segments in a real-time voice stream; the method can run on a mainstream 32-bit microcontroller (MCU) such as an STM32. The flow is shown in FIG. 1, and the concrete steps include:

A microphone (digital or analog) continuously collects the voice signal and feeds it into an MCU port; the port includes but is not limited to an analog-to-digital converter (A/D), a Serial Peripheral Interface (SPI), an I2C bus interface, and the like. The sampling rate is 16 kHz with 16-bit samples.

The MCU frames the collected voice data: each frame is 125 milliseconds and overlaps the previous frame by 50%, corresponding to 2048 sample values per frame with an advance of 1024 sample values each time.
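
The framing step above can be sketched as follows; the function name is illustrative, not from the patent, and the sketch assumes the stated 2048-sample frames with a 1024-sample hop.

```python
import numpy as np

FRAME_LEN = 2048  # samples per frame (the patent's ~125 ms frame)
HOP = 1024        # 50% overlap: each new frame advances half a frame

def frame_signal(samples: np.ndarray) -> np.ndarray:
    """Split a 1-D sample array into overlapping frames of FRAME_LEN samples."""
    n_frames = max(0, (len(samples) - FRAME_LEN) // HOP + 1)
    if n_frames == 0:
        return np.empty((0, FRAME_LEN), dtype=samples.dtype)
    return np.stack([samples[i * HOP : i * HOP + FRAME_LEN] for i in range(n_frames)])

signal = np.arange(16000, dtype=np.int16)  # one second of dummy samples at 16 kHz
frames = frame_signal(signal)              # each frame starts 1024 samples after the previous one
```

On an actual MCU the same layout is usually achieved with a circular buffer rather than copying, but the frame/hop geometry is identical.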

Endpoint detection is performed on the voice data frames in order to skip silent frames and frames that are determinably not target speech (the targets being coughs and sneezes), so as to locate the starting frame of candidate target speech. The endpoint detection adopts a three-threshold method, namely:

(1) the average energy of the frame's samples is greater than threshold 1, and

(2) the frame's zero-crossing rate (the percentage of frame sample values greater than zero) is greater than threshold 2, and

(3) the average energy difference between the frame and the previous frame (the absolute value of the difference between the two frames' average energies) is greater than threshold 3.

Note: threshold 1 performs absolute-silence filtering; threshold 2 performs relative-silence filtering; threshold 3 targets the abrupt energy changes characteristic of coughs and sneezes and filters out smoother normal sounds such as background noise, speech, and music.

If the frame satisfies all three thresholds, the frame's Frame Mute Flag (FMF) is set to 1 (true); otherwise the FMF is set to 0 (false).

Update the Recognition Activated Flag (RAF), whose initial value is 0 (not activated): RAF = RAF | FMF.

Check the RAF: if the RAF is false, the recognition process is not activated, so the current frame is discarded and processing jumps back to the beginning to continue voice sampling;

if the RAF is true, the recognition process is activated and the current frame is characterized (even if the frame's FMF is 0), yielding a frame feature vector of 20 values; the characterization flow is described with reference to FIG. 2 and the related description.
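
The three-threshold test and the FMF/RAF bookkeeping described above can be sketched as follows. The threshold values are illustrative assumptions (the source does not specify them), and the average-energy and zero-crossing formulas follow the definitions given in the text.

```python
import numpy as np

def frame_mute_flag(frame, prev_frame, thr1=500.0, thr2=0.4, thr3=200.0):
    """Three-threshold endpoint test for one frame; thr1/thr2/thr3 are
    placeholder values, not values taken from the patent."""
    energy = np.mean(np.abs(frame))                    # (1) average frame energy
    zcr = np.mean(frame > 0)                           # (2) share of samples > 0, per the definition above
    delta = abs(energy - np.mean(np.abs(prev_frame)))  # (3) energy jump vs. previous frame
    return int(energy > thr1 and zcr > thr2 and delta > thr3)

raf = 0  # Recognition Activated Flag, initially 0 (not activated)
quiet = np.zeros(2048)
loud = np.full(2048, 1000.0)
fmf = frame_mute_flag(loud, quiet)  # a loud onset frame trips all three thresholds
raf = raf | fmf                     # RAF = RAF | FMF: stays set once activated
```

Because the update is a bitwise OR, the RAF remains 1 for later frames whose FMF is 0, which is why those frames are still characterized above.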

Adding the frame feature vector to the tail of the feature vector queue;

if the length of the feature vector queue reaches the recognizable length (RecoLen), feeding the feature vector queue (a RecoLen × 20 feature matrix) into a machine learning model trained in advance for recognition; otherwise, jumping to the beginning to continue voice sampling.

Note: the "machine learning model" includes but is not limited to a two-dimensional convolutional neural network (2D CNN), a long short-term memory network (LSTM), a random forest (RF), etc.; its training process is described with reference to FIG. 3 and the related description.

The recognizable length RecoLen is one dimension of the machine learning model's two-dimensional input sample, indicating how many data frames one input sample contains. RecoLen takes values between 20 and 32 frames, corresponding to about 1.25-2 seconds of speech data, which is essentially the time window of a single cough or sneeze.
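
A quick arithmetic check of the RecoLen range, under the framing parameters given earlier (1024-sample hop at 16 kHz, so each new frame advances 64 ms; the extra length of the first frame is ignored here):

```python
HOP_SECONDS = 1024 / 16000  # seconds advanced per frame at the stated rate
for reco_len in (20, 32):
    # RecoLen frames cover roughly RecoLen * 64 ms of audio
    print(reco_len, "frames ->", round(reco_len * HOP_SECONDS, 3), "s")
```

20 frames come to roughly 1.28 s and 32 frames to roughly 2.05 s, consistent with the "about 1.25-2 seconds" stated above.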

If the confidence level (CL) of the recognition result exceeds the recognition threshold set by the system, it is considered that one cough or sneeze has been effectively detected: it is counted, the recognition result is output, the feature vector queue is emptied, and the RAF is set to 0. Processing then jumps to the beginning to start a new recognition process.

Note: the "output identification result" includes but is not limited to sending a message, driving an indicator light, an alarm device, a display or other peripheral devices.

If the confidence level (CL) of the recognition result does not exceed the recognition threshold set by the system, it is considered that no cough or sneeze has been effectively detected, and the feature vector queue undergoes acceleration processing according to the specific value of the CL. This specifically comprises:

(1) acceleration 1: removing the leading (100% - CL) proportion of frames from the feature vector queue; for example, assuming RecoLen is 20, if recognition yields a CL of 60%, the leading 40% of the frames, i.e. 8 frames, are removed from the queue;

(2) acceleration 2: the first frame whose FMF is 1 (true) is found among the remaining frames in the feature vector queue, and all frames before it are discarded. If no frame with FMF = 1 (true) is found, the feature vector queue is emptied and the RAF is set to 0.
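
The two acceleration steps can be sketched as one function; the names are illustrative, and plain lists stand in for the on-device queue.

```python
def accelerate_queue(queue, fmfs, cl, reco_len):
    """Feature-vector-queue acceleration after a low-confidence result.

    queue -- list of frame feature vectors
    fmfs  -- the matching Frame Mute Flags, one per queued frame
    cl    -- confidence level of the failed recognition, in [0, 1]
    Returns (queue, fmfs, raf)."""
    # Acceleration 1: drop the leading (100% - CL) share of the frames.
    drop = round(reco_len * (1.0 - cl))
    queue, fmfs = queue[drop:], fmfs[drop:]
    # Acceleration 2: keep only frames from the first FMF == 1 frame onward.
    for i, flag in enumerate(fmfs):
        if flag == 1:
            return queue[i:], fmfs[i:], 1
    return [], [], 0  # no active frame left: empty the queue, RAF = 0

# The example above: RecoLen = 20, CL = 60% -> the leading 8 frames go.
q = [[float(i)] for i in range(20)]
flags = [0] * 10 + [1] + [0] * 9
q2, f2, raf = accelerate_queue(q, flags, 0.60, 20)
```

Compared with clearing the queue outright, this keeps the most recent, still-plausible frames, so a cough that straddles two recognition windows is not lost.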

And after the acceleration processing is finished, jumping to the beginning, and starting a new identification process.

The whole processing flow above is the "operation mode" of the method; together with a "training mode" and a "collection mode" there are three working modes in total, controlled by system parameters; their specific purposes and characteristics are shown in Table 1.

If the device works in a training mode, the frame feature vectors need to be reported to a server or a cloud platform while being enqueued;

if the system works in the acquisition mode, the framed voice data needs to be uploaded to a server or a cloud platform.

Table 1: comparison of three working modes of the method

The voice data frame characterization processing flow comprises:

A frame of input voice data (125 ms, 2048 sample values) undergoes time-domain characterization and frequency-domain characterization, respectively.

Time-domain characterization: based on the instantaneous amplitude changes characteristic of cough and sneeze sounds, three feature values are calculated, including:

(1) the frame's sampling fluctuation value = maximum sample value - minimum sample value;

(2) the energy difference between the current frame and the previous frame = abs(average of the current frame's samples - average of the previous frame's samples), where abs is the absolute-value function;

(3) the energy variance of the frame's slices represents the energy fluctuation within the frame. Specifically, the 2048 sample values of the frame are divided evenly into several slices (4 to 10), and the variance is computed using the standard-deviation formula.
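
The three time-domain values can be sketched as follows; n_slices = 8 is one arbitrary choice within the 4-10 range given above, and the spread of slice energies is taken with the standard-deviation formula as the text describes.

```python
import numpy as np

def time_domain_features(frame, prev_frame, n_slices=8):
    """The three time-domain feature values described above (sketch)."""
    fluctuation = float(frame.max() - frame.min())              # (1) max sample - min sample
    energy_diff = abs(float(frame.mean() - prev_frame.mean()))  # (2) |mean(current) - mean(previous)|
    slices = np.abs(frame.reshape(n_slices, -1)).mean(axis=1)   # per-slice average energy
    energy_var = float(np.std(slices))                          # (3) spread of slice energies
    return [fluctuation, energy_diff, energy_var]

prev = np.zeros(2048)
burst = np.zeros(2048)
burst[:256] = 1000.0  # an abrupt onset concentrated in the first slice
feats = time_domain_features(burst, prev)
```

All three values spike for a sharp onset like a cough, while staying small for smooth speech or steady background noise, which is exactly what the three-threshold filter exploits.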

Frequency-domain characterization comprises two parts. The first part is the Mel-Frequency Cepstral Coefficients (MFCC), standard for frequency-domain analysis of voice signals and consisting mainly of a Fast Fourier Transform (FFT), a mel-frequency filter bank, and a Discrete Cosine Transform (DCT). Since this is a standard method, it is not described again here. Specifically, a 16-band mel-frequency filter bank is adopted, yielding 16 feature values;

the second part of the frequency-domain characterization takes the 16 feature values of the first part and computes the band energy variance using the standard-deviation formula, yielding one additional feature value.
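
How the 20 values fit together can be sketched as follows (3 time-domain + 16 mel-band + 1 band variance). The inputs are placeholders standing in for the real time-domain and MFCC outputs; the full MFCC chain itself is the standard method referenced above and is not re-implemented here.

```python
import numpy as np

def assemble_feature_vector(time_feats, mel_feats):
    """Combine 3 time-domain values with 16 mel-band values and their
    spread (the 'band energy variance' above, here via np.std) into the
    20-value frame feature vector."""
    assert len(time_feats) == 3 and len(mel_feats) == 16
    band_var = float(np.std(mel_feats))  # 17th frequency-domain value
    return list(time_feats) + list(mel_feats) + [band_var]

vec = assemble_feature_vector([1.0, 2.0, 3.0], [float(i) for i in range(16)])
```

Keeping the vector at 20 values per frame is what lets a RecoLen × 20 sample fit comfortably in MCU memory.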

The training process of the cough and sneeze machine learning model comprises the following steps:

the training process is divided into off-line training and on-line training, and can be used independently or cooperatively;

the off-line training can obtain voice data from external sources, or the MCU recognition device can be set to collection mode to collect raw voice data;

preprocessing the voice data by dividing it into segments whose length equals RecoLen frames; the preprocessing can be done manually or with dedicated voice-file processing software;

classifying and labeling voice files, comprising: coughing, sneezing, and others;

extracting the feature vector queue of each voice segment using the framing shown in the recognition flow of FIG. 1 and the characterization method shown in FIG. 2; if the length is less than RecoLen, padding with zero vectors, and if it exceeds RecoLen, truncating;

on a server or cloud platform, feeding the feature values and labels into the model in batches for training and validation;

importing the satisfactorily trained model into the MCU recognition device and updating the recognition model.

When online learning is carried out, the operation mode of the MCU recognition device is set to training mode so as to directly obtain the feature vectors of the voice data frames;

the feature vectors are uploaded online to a server or cloud platform;

the server or cloud platform treats every RecoLen consecutive feature vectors as one training sample;

if an MCU recognition result is received, a new training sample is started; if the previous sample's length is less than RecoLen, it is padded with zero vectors;
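
The server-side grouping of uploaded frame vectors into RecoLen-sized samples can be sketched as follows; the function name and the result_marks representation (indices at which a recognition result arrived) are illustrative assumptions.

```python
def cut_training_samples(stream, reco_len, result_marks):
    """Group uploaded frame feature vectors into RecoLen-sized training
    samples. A recognition result at index i (listed in result_marks)
    closes the current sample early; short samples are padded with
    zero vectors, as described above."""
    dim = len(stream[0]) if stream else 0
    samples, current = [], []
    for i, vec in enumerate(stream):
        current.append(vec)
        if len(current) == reco_len or i in result_marks:
            current += [[0.0] * dim] * (reco_len - len(current))  # zero-vector fill
            samples.append(current)
            current = []
    return samples

# Five uploaded vectors, RecoLen = 3, a recognition result arriving at index 3:
samples = cut_training_samples([[1.0]] * 5, 3, {3})
```

Closing a sample on each recognition result keeps the online training samples aligned with the windows the MCU actually classified.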

Meanwhile, the samples are manually labeled online with the classes: coughing, sneezing, and others;

the existing model is incrementally optimized with newly obtained training samples using a transfer learning method; transfer learning is a publicly available method and toolset in the machine learning field and is not described again here.

The recognition results of the optimized model can be compared with those of the existing model to evaluate the optimization effect;

and the satisfactorily trained model is imported into the MCU recognition device, updating the recognition model.

An application scenario of the present invention is given below:
