Depression recognition method, system and equipment based on voice analysis

Document No.: 1822761  Publication date: 2021-11-12

Reading note: This technology, "Depression recognition method, system and equipment based on voice analysis" (一种基于语音分析的抑郁症识别方法、系统和设备), was created by 范青, 戴伟辉, 吕娜, 叶惠玲, 李踔然 and 李欣 on 2021-07-08. Its main content is as follows: the invention provides a depression recognition method, system and equipment based on voice analysis. By applying intelligent voice analysis technology to depression patients, the patients' voice information can be collected in daily life, providing real-time monitoring of changes in depressive symptoms and technical support for depression recognition, relapse monitoring and related tasks. This helps depression patients and doctors achieve better disease management and allows rehabilitation plans to be adjusted in time, while greatly reducing the cost of disease assessment. Patients can obtain accurate symptom monitoring in daily-life and home-care environments, and early warnings of relapse can be issued, reducing the negative consequences of a recurrence of depression as far as possible and supporting the patients' recovery.

1. A depression recognition method based on voice analysis is characterized by comprising the following steps:

dividing the subjects into a healthy group and a confirmed group, and respectively obtaining voice signals and depression degree scores of the subjects, wherein the depression degree scores are set according to a Hamilton depression scale;

preprocessing the voice signals to enable the signals of subsequent voice analysis to be more uniform and smooth;

extracting voice features representing depression states from the preprocessed voice signals, and constructing a voice feature set, wherein the voice features comprise prosodic features and spectral features;

dividing the voice feature set into a training set and a verification set, training a machine learning model based on a decision tree by adopting the training set, and establishing a classification decision tree to identify a depression patient and a healthy subject based on the voice feature;

the model is examined and optimized using a validation set to build an optimal decision tree model.

2. The depression recognition method based on speech analysis as claimed in claim 1, wherein in the speech feature extraction process, the following steps are included:

smoothing the voice signal by adopting a moving average filter;

constructing a voice feature set, and extracting features in the voice signal based on the voice feature set;

calculating a first derivative reflecting the dynamic change of the voice characteristics, and further obtaining long-term characteristics based on the overall change of the voice;

and combining the long-term features into the voice feature set to form a complete voice feature set.

3. The method for identifying depression based on speech analysis according to claim 2, wherein the speech feature set comprises: root mean square signal frame energy, 12 mel-frequency cepstrum coefficients, short-time zero-crossing rate, harmonic signal-to-noise ratio and fundamental frequency.

4. The method of claim 2, wherein the long-term features comprise: maximum, minimum, full range, arithmetic mean, standard deviation, kurtosis, and skewness.

5. The method for identifying depression based on speech analysis as claimed in claim 2, wherein the preprocessing and the acoustic feature extraction are performed in batch by using speech analysis software openSMILE and python algorithms.

6. The method of claim 1, wherein the subject is labeled according to the Hamilton Depression Scale, with a total score of greater than 24 for major depression, greater than 17 for mild to moderate depression, and less than 7 for no symptoms of depression.

7. A system for identifying depression based on speech analysis, comprising:

the voice acquisition module is used for acquiring a voice signal of the subject and inputting a corresponding depression degree score of the subject;

the preprocessing module is used for preprocessing the voice signals to enable the signals of subsequent voice analysis to be more uniform and smooth;

the feature extraction module is used for extracting voice features representing depression states from the preprocessed voice signals and constructing a voice feature set, wherein the voice features comprise prosodic features and spectral features;

the classification learning module is used for dividing the voice characteristic set into a training set and a verification set, training a machine learning model based on a decision tree by adopting the training set, and establishing a classification decision tree to identify a depression patient and a healthy subject based on the voice characteristic;

and the verification module is used for verifying and optimizing the model by using the verification set so as to establish an optimal decision tree model.

8. The system of claim 7, wherein the set of speech characteristics comprises: root mean square signal frame energy, 12 mel-frequency cepstrum coefficients, short-time zero-crossing rate, harmonic signal-to-noise ratio and fundamental frequency.

9. A computer device comprising a memory and a processor, wherein the memory stores a computer program which, when executed by the processor, causes the processor to carry out the steps of the method according to any one of claims 1 to 6.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to carry out the steps of the method according to any one of claims 1 to 6.

Technical Field

The invention relates to the technical field of affective computing and speech analysis, in particular to a depression recognition method, system and equipment based on speech analysis.

Background

Depressive disorder is a group of mood disorders characterized by depressed mood, loss of interest and lack of energy, accompanied by clinical symptoms such as slowed responses, inattention, reduced activity and insomnia. According to a report of the World Health Organization, the number of people suffering from depression worldwide reached 350 million in 2015, and the affected population continues to expand. According to the latest published results of the China Mental Health Survey (CMHS), the lifetime prevalence of depressive disorder in China has reached 6.8%, making it the most common mood disorder. Depressive disorders exhibit high prevalence, high disability rates, high relapse rates and high suicide rates, and are a prominent mental health problem in today's society.

As with other psychiatric disorders, current assessment of depressive disorders requires substantial human and material resources and relies mainly on intermittent reports by patients or their caregivers. Such reports are often subjective, and the assessment results may be disturbed by factors such as retrospective memory bias (e.g., underestimated or overestimated symptoms), cognitive limitations (e.g., recall of events and contexts, causal inference) and the patient's experience of social stigma. There is therefore an urgent need for objective diagnosis, long-term monitoring and evidence-based intervention for patients with mental disorders, including depressive disorders, particularly for groups that cannot access traditional mental health services because of geographical, economic or other factors. Systematic and objective assessment will promote the development of remote and personalized medicine, and thereby improve clinical services in psychiatric practice. Voice signals are non-invasive, easy to obtain and objective, and speech behavior reflects complex underlying brain and muscle activity, so voice has great potential for assessing psychological states. In recent years, the use of speech features for the recognition and prediction of depression, combined with techniques such as machine learning and artificial intelligence, has become a hot direction in the field of affective computing.

Speech features can reflect the depressive state to some extent. Paralinguistic analyses of depressed speech suggest that the physiological symptoms associated with depression affect vocal tract characteristics, while changes in cognitive ability affect measures related to speech rate. Previous clinical observations and research results also indicate that depression patients do have some distinctive speech characteristics that can be applied to the identification of depression. However, voice-based depression recognition technology is still at the research stage, most of the research has been conducted abroad, and it has not yet been applied to clinical practice in China. In addition, there is a lack of techniques that use the speech characteristics of depression patients as an index for monitoring disease progression and for early warning of relapse. Applying intelligent voice analysis technology to depression recognition and relapse warning fills this technical gap and offers promise for the treatment and rehabilitation of depression patients.

Disclosure of Invention

The invention aims to provide a depression recognition method, a depression recognition system and depression recognition equipment based on voice analysis, so that depression recognition and recurrence early warning are realized.

In order to achieve the above object, an aspect of the present invention provides a depression recognition method based on speech analysis, including the steps of:

dividing the subjects into a healthy group and a confirmed group, and respectively obtaining voice signals and depression degree scores of the subjects, wherein the depression degree scores are set according to a Hamilton depression scale;

preprocessing the voice signals to enable the signals of subsequent voice analysis to be more uniform and smooth;

extracting voice features representing depression states from the preprocessed voice signals, and constructing a voice feature set, wherein the voice features comprise prosodic features and spectral features;

dividing the voice feature set into a training set and a verification set, training a machine learning model based on a decision tree by adopting the training set, and establishing a classification decision tree to identify a depression patient and a healthy subject based on the voice feature;

the model is examined and optimized using a validation set to build an optimal decision tree model.

Further, in the process of extracting the voice feature, the method comprises the following steps:

smoothing the voice signal by adopting a moving average filter;

constructing a voice feature set, and extracting features in the voice signal based on the voice feature set;

calculating a first derivative reflecting the dynamic change of the voice characteristics, and further obtaining long-term characteristics based on the overall change of the voice;

and combining the long-term features into the voice feature set to form a complete voice feature set.

Further, the speech feature set includes: root mean square signal frame energy, 12 mel-frequency cepstrum coefficients, short-time zero-crossing rate, harmonic signal-to-noise ratio and fundamental frequency.

Further, the long-term characteristics include: maximum, minimum, full range, arithmetic mean, standard deviation, kurtosis, and skewness.

Furthermore, the method adopts the speech analysis software openSMILE and python algorithms to carry out preprocessing and acoustic feature extraction in batch.

Further, the method sets the subject's label according to the Hamilton Depression Scale: a total score greater than 24 indicates major depression, greater than 17 indicates mild to moderate depression, and less than 7 indicates no depressive symptoms.
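As a minimal illustration of this labeling rule, the hypothetical helper below maps a Hamilton Depression Scale total score to a coarse label; the thresholds follow the text above, and the handling of the intermediate 7-17 band (not assigned by the text) is an assumption.

```python
def hamd_label(total_score: int) -> str:
    """Map a Hamilton Depression Scale total score to a coarse label."""
    if total_score > 24:
        return "major_depression"   # > 24: major depression
    if total_score > 17:
        return "mild_to_moderate"   # 18-24: mild to moderate depression
    if total_score < 7:
        return "no_symptoms"        # < 7: no depressive symptoms
    return "subthreshold"           # 7-17: not assigned a label in the text
```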

In another aspect, the present invention also provides a depression recognition system based on speech analysis, including:

the voice acquisition module is used for acquiring a voice signal of the subject and inputting a corresponding depression degree score of the subject;

the preprocessing module is used for preprocessing the voice signals to enable the signals of subsequent voice analysis to be more uniform and smooth;

the feature extraction module is used for extracting voice features representing depression states from the preprocessed voice signals and constructing a voice feature set, wherein the voice features comprise prosodic features and spectral features;

the classification learning module is used for dividing the voice characteristic set into a training set and a verification set, training a machine learning model based on a decision tree by adopting the training set, and establishing a classification decision tree to identify a depression patient and a healthy subject based on the voice characteristic;

and the verification module is used for verifying and optimizing the model by using the verification set so as to establish an optimal decision tree model.

Further, the speech feature set includes: root mean square signal frame energy, 12 mel-frequency cepstrum coefficients, short-time zero-crossing rate, harmonic signal-to-noise ratio and fundamental frequency.

In another aspect, the invention also provides a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the method.

In another aspect, the invention also provides a computer readable storage medium storing a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the steps of the method.

Compared with the prior art, the invention has the following progress and technical effects:

the invention provides a depression recognition method, system and equipment based on voice analysis, in which intelligent voice analysis technology is applied to depression patients. The patients' voice information is collected in daily life, giving real-time monitoring of changes in depressive symptoms and providing technical support for depression recognition, relapse monitoring and related tasks. This helps depression patients and doctors achieve better disease management and allows patients' rehabilitation plans to be adjusted in time, while greatly reducing the cost of disease assessment. Patients can obtain accurate symptom monitoring in daily-life and home-care environments, and early warnings of disease relapse can be given, reducing the negative consequences of a recurrence of depression as far as possible and supporting the patients' recovery.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.

Fig. 1 is a flow chart of a depression recognition method based on speech analysis according to the present invention.

Fig. 2 is a frame diagram of the depression recognition system based on speech analysis according to the present invention.

Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.

Fig. 1 is a flow chart of a method for identifying depression based on speech analysis according to the present invention. As shown in fig. 1, a depression recognition method based on speech analysis of the present invention includes the following steps:

and S100, acquiring a voice signal and a depression degree score of the subject.

And S200, preprocessing the voice signal.

And S300, extracting the voice characteristics representing the depression state from the preprocessed voice signals.

And S400, training a machine learning model based on the decision tree.

And S500, checking and optimizing the model by using a verification set.

In step S100, the patient's voice signal is collected in daily life in order to monitor changes in the patient's depressive symptoms in real time.

In one embodiment, a voice acquisition device is configured to periodically capture the patient's voice signal as feedback on the efficacy of the rehabilitation program. The acquisition schedule can later be adjusted according to the rehabilitation plan to match the patient's treatment progress. The voice acquisition device may be a dedicated audio acquisition device, or an electronic device with an audio acquisition function such as a mobile phone or tablet computer.

In step S200, voice preprocessing is performed on the voice signal to ensure that the signal of the subsequent voice analysis is more uniform and smooth, and provide high-quality parameters for voice feature extraction, so as to improve the quality of voice analysis.
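A minimal sketch of a typical preprocessing chain is given below (pre-emphasis, framing and Hamming windowing); the specific operations and all parameter values are assumptions for illustration, since the text above does not enumerate them.

```python
import numpy as np

def preprocess(signal: np.ndarray, sr: int,
               frame_ms: float = 25.0, hop_ms: float = 10.0) -> np.ndarray:
    """Pre-emphasize, frame and window a speech signal (illustrative values)."""
    # Pre-emphasis flattens the spectral tilt of voiced speech.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    frame_len = int(sr * frame_ms / 1000)
    hop_len = int(sr * hop_ms / 1000)
    assert len(emphasized) >= frame_len, "signal shorter than one frame"
    n_frames = 1 + (len(emphasized) - frame_len) // hop_len
    frames = np.stack([emphasized[i * hop_len: i * hop_len + frame_len]
                       for i in range(n_frames)])
    # A Hamming window smooths frame edges before spectral analysis.
    return frames * np.hamming(frame_len)
```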

In step S300, speech features characterizing a depression state are extracted from the preprocessed speech signals, and a speech feature set is constructed, wherein the speech features include prosodic features and spectral features.

Specifically, the method comprises the steps of preprocessing a voice signal and then extracting acoustic features. In one embodiment, the method adopts speech analysis software openSMILE and python algorithms to carry out preprocessing and acoustic feature extraction in batch.
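A minimal sketch of such batch extraction is shown below, driving the openSMILE command-line extractor from Python. The executable name, the config path and the use of the classic IS09 emotion configuration (which yields a 384-feature set like the one described later) are assumptions that would need to be adapted to the local openSMILE installation.

```python
import subprocess
from pathlib import Path

SMILEXTRACT = "SMILExtract"                      # assumed to be on PATH
CONFIG = "config/is09-13/IS09_emotion.conf"      # assumed config location

def extract_batch(wav_dir: str, out_dir: str) -> None:
    """Run openSMILE over every .wav file in a directory (illustrative)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for wav in sorted(Path(wav_dir).glob("*.wav")):
        # -C selects the feature configuration, -I the input file,
        # -O the output file (its format is defined by the configuration).
        subprocess.run([SMILEXTRACT, "-C", CONFIG,
                        "-I", str(wav),
                        "-O", str(out / (wav.stem + ".arff"))],
                       check=True)
```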

Specifically, in the process of extracting the voice feature, the method further comprises the following steps:

and smoothing the voice signals by adopting a moving average filter. In one embodiment, the speech feature parameters are smoothed by a moving average filter with a window length of 3.

And constructing a voice feature set, and extracting features from the voice signal based on the voice feature set. In one embodiment, the voice feature set comprises 16 acoustic low-level descriptors commonly used in the field of emotion recognition; an acoustic low-level descriptor is a speech feature parameter calculated at the short-time frame level of the audio signal, i.e. at a single point in time.

The 16 acoustic low-level descriptors include: root-mean-square signal frame energy, 12 mel-frequency cepstral coefficients (MFCC 1-12), short-time zero-crossing rate (ZCR), harmonic signal-to-noise ratio (SNR) and fundamental frequency (F0). The fundamental frequency is calculated using the cepstrum method.
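As an illustration of the cepstrum method named above for the fundamental frequency, the sketch below estimates F0 for a single short-time frame; the window and the pitch search range (50-400 Hz) are assumptions, and the frame must be at least one longest pitch period (about 25 ms at 16 kHz) for the search to be meaningful.

```python
import numpy as np

def cepstral_f0(frame: np.ndarray, sr: int,
                fmin: float = 50.0, fmax: float = 400.0) -> float:
    """Estimate the fundamental frequency of one frame via the real cepstrum."""
    spectrum = np.fft.rfft(frame * np.hamming(len(frame)))
    log_mag = np.log(np.abs(spectrum) + 1e-10)
    cepstrum = np.fft.irfft(log_mag)
    q_min = int(sr / fmax)                     # shortest plausible pitch period
    q_max = int(sr / fmin)                     # longest plausible pitch period
    peak = q_min + int(np.argmax(cepstrum[q_min:q_max]))
    return sr / peak                           # quefrency of the peak -> F0
```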

And calculating a first derivative reflecting the dynamic change of the voice characteristics, and further obtaining long-term characteristics based on the overall change of the voice.

In one embodiment, the long-term features include: maximum, minimum, full range, arithmetic mean, standard deviation, kurtosis, skewness and other statistical indexes.

And finally, the long-term features are merged into the voice feature set; 384 voice features are extracted in total, forming the complete voice feature set.
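A minimal sketch of how the frame-level descriptors, their first derivatives and the long-term statistics could be combined into one utterance-level vector is given below. It assumes the 16 low-level descriptors are stacked as a (frames × 16) array; with 12 functionals per contour this yields 16 × 2 × 12 = 384 features, although only the seven functionals named in the text are spelled out here.

```python
import numpy as np
from scipy.stats import kurtosis, skew

def functionals(track: np.ndarray) -> np.ndarray:
    """Long-term statistics over one feature contour."""
    return np.array([
        track.max(), track.min(), track.max() - track.min(),  # max, min, full range
        track.mean(), track.std(),
        kurtosis(track), skew(track),
        # ...remaining functionals (e.g. regression slope, quartiles) would be
        # added here to reach 12 statistics per contour, as assumed above
    ])

def utterance_vector(lld: np.ndarray) -> np.ndarray:
    """lld: (n_frames, 16) low-level descriptors for one utterance."""
    delta = np.diff(lld, axis=0)               # first derivative over time
    contours = np.hstack([lld[1:], delta])     # 16 LLDs + 16 deltas = 32 contours
    return np.concatenate([functionals(contours[:, i])
                           for i in range(contours.shape[1])])
```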

In step S400, the method divides the set of speech features into a training set and a verification set, trains a decision tree-based machine learning model using the training set, and builds a classification decision tree to identify the depressive disorder patient and the healthy subject based on the speech features.
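A minimal sketch of this step with scikit-learn is shown below; the file names, the 70/30 split ratio and the binary labels (1 = diagnosed, 0 = healthy) are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# X: (n_subjects, 384) utterance-level feature vectors; y: 1 = diagnosed, 0 = healthy.
X = np.load("speech_features.npy")   # hypothetical file names
y = np.load("labels.npy")

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)
print("validation accuracy:", clf.score(X_val, y_val))
```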

In step S500, the method uses the validation set to verify and optimize the model to build an optimal decision tree model.
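Continuing the sketch above, the verification and optimization step could be implemented as a hyper-parameter search over the tree; the patent does not name the optimization criteria, so the grid below is an assumption.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

grid = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [3, 5, 7, None],
                "min_samples_leaf": [1, 5, 10],
                "criterion": ["gini", "entropy"]},
    cv=5, scoring="accuracy")
grid.fit(X_train, y_train)             # X_train/y_train from the sketch above

best_tree = grid.best_estimator_       # the "optimal decision tree model"
print("validation accuracy after tuning:", best_tree.score(X_val, y_val))
```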

Fig. 2 is a frame diagram of the depression recognition system based on speech analysis according to the present invention. As shown in fig. 2, the depression recognition system based on speech analysis of the present invention includes:

the voice acquisition module 1 is used for acquiring voice signals of the subjects and inputting corresponding depression degree scores of the subjects.

The preprocessing module 2 is used for preprocessing the voice signals to make the signals of the subsequent voice analysis more uniform and smooth.

The feature extraction module 3 is used for extracting voice features representing depression states from the preprocessed voice signals and constructing a voice feature set, wherein the voice features comprise prosodic features and spectral features.

And the classification learning module 4 is used for dividing the voice feature set into a training set and a verification set, training a machine learning model based on a decision tree by adopting the training set, and establishing a classification decision tree to identify depression patients and healthy subjects based on the voice features.

And the verification module 5 is used for verifying and optimizing the model by using a verification set so as to establish an optimal decision tree model.

Specifically, the speech feature set includes 16 acoustic low-level descriptors commonly used in the field of emotion recognition: root-mean-square signal frame energy, 12 mel-frequency cepstral coefficients (MFCC 1-12), short-time zero-crossing rate (ZCR), harmonic signal-to-noise ratio (SNR) and fundamental frequency (F0). The fundamental frequency is calculated using the cepstrum method.

Specifically, the long-term features include: maximum, minimum, full range, arithmetic mean, standard deviation, kurtosis, skewness and other statistical indexes.

Fig. 3 is a schematic structural diagram of a computer device according to an embodiment of the present invention. As shown in fig. 3, an electronic device of one embodiment of the invention includes one or more processors 1000, one or more input devices 2000, one or more output devices 3000, and a memory 4000.

In one embodiment of the invention, the processor 1000, the input device 2000, the output device 3000, and the memory 4000 may be connected by a bus or other means. The input device 2000, the output device 3000 may be a standard wired or wireless communication interface.

The processor 1000 may be a Central Processing Unit (CPU), or another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.

Memory 4000 may be a high speed RAM memory or a non-volatile memory such as a disk memory. The memory 4000 is used to store a set of computer programs, and the input device 2000, the output device 3000, and the processor 1000 may call the program codes stored in the memory 4000.

The memory 4000 stores a computer program comprising program instructions that, when executed by the processor, cause the processor to perform the steps of the depression recognition method based on voice analysis as described in the above embodiments.

An embodiment of the present invention also provides a computer-readable storage medium. The computer-readable storage medium may be a high-speed RAM memory or a non-volatile memory such as a disk memory. The computer-readable storage medium may be connected to an external computing device or accessed over a network so that the computer program stored on it can be read. The computer program stored on the computer-readable storage medium comprises program instructions which, when executed by a processor, cause the processor to perform the steps of the method as described in the above embodiments.

Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
