Auxiliary rescue communication method and device based on unvoiced instruction recognition of facial surface muscle signals

Document No.: 1506828 Publication date: 2020-02-07

Abstract (designed by 杨梦� on 2019-11-18): The invention discloses an auxiliary rescue communication method and device based on unvoiced instruction recognition of facial surface muscle signals. The method first acquires, in real time, the muscle electrical signals at the corresponding positions through measuring electrodes attached in advance to the user's facial skin. A data processing device then performs preprocessing, feature extraction and classification on the acquired signals and identifies the corresponding instruction word. A language processing device converts the recognized instruction word into artificial speech, which a radio frequency device sends to an operator's earphone, while the speech text is sent wirelessly to a command center. The method and device avoid the inherent problem that conventional speech recognition results are disturbed by environmental background noise, and are suitable for application scenarios with high background noise or in which sound cannot be picked up.

1. An auxiliary rescue communication method based on unvoiced instruction recognition of facial surface muscle signals, the method comprising:

step 1, firstly, acquiring muscle electric signals of corresponding positions in real time through a measuring electrode which is attached to the facial skin of a user in advance;

step 2, carrying out preprocessing, feature extraction and classification identification operations on the collected muscle electric signals by a data processing device to identify corresponding instruction words;

step 3, the recognized instruction words are converted into artificial voices by the language processing device;

step 4, sending the artificial voice to an earphone of an operator through a radio frequency device, and sending the voice text to a command center in a wireless mode.

2. The method according to claim 1, wherein in step 1, the measuring electrodes comprise five channels, and the measuring electrodes of the five channels are respectively attached to set positions on the facial skin of the user.

3. The method according to claim 1, wherein in step 2, the instruction words to be identified are preset according to the use environment and requirements, and the instruction words are single words or two-word phrases, specifically including forward, backward, fire extinguishing, danger and assemble.

4. The method according to claim 1, wherein in step 2, the specific processing procedure of the data processing device is:

firstly, filtering and denoising collected muscle electric signals;

then, judging the muscle activity state based on the preprocessed muscle electric signals, and segmenting the effective signals generated during muscle activity, namely locating the starting point and the ending point of the muscle activity state;

then, extracting and optimizing the features of the effective signals within the muscle activity state;

and then classifying and identifying the feature-extracted effective signals by adopting a consistency prediction algorithm based on a random forest, and identifying the corresponding instruction words.

5. The method according to claim 4, wherein the muscle activity state is determined based on the preprocessed muscle electrical signals, and the process of segmenting the effective signals generated during the muscle activity is specifically as follows:

firstly, sequentially windowing the preprocessed muscle electric signals with a window length of 200 ms, computing the standard deviation of each window and comparing it with a threshold value, the threshold value being calculated as:

Th=mean(rest)+μ*std(rest)

wherein Th is the threshold value; rest is the signal within the first 100 ms of a window; mean is the mean of the signal; std is the standard deviation of the signal; μ is a sensitivity value, and tests show that the result is best when μ is 3;

obtaining the corresponding effective signal data after detecting that the muscle is in an active state, the length of the effective signal being between 200 ms and 400 ms;

and further stretching the effective signal to 400 ms by cubic interpolation for output.

6. The method according to claim 4, wherein the process of feature extraction and optimization of the valid signals within the muscle activity state is specifically as follows:

the features used for feature extraction are time-domain features; firstly, the effective signals are framed with a frame length of 30 ms and a frame shift of 15 ms;

for each moving window, four feature values are extracted, which are respectively:

Figure FDA0002277491400000021

Figure FDA0002277491400000024

wherein N represents the number of values contained in a moving window; $x_i$ is the i-th value of the current window; the feature dimension obtained after windowed feature extraction on the 5-channel effective signal is 520;

then, a linear discriminant analysis method is used to reduce the feature dimension from 520 to 50, so that a small number of feature dimensions retains the high-dimensional information.

7. The method as claimed in claim 4, wherein the step of classifying and identifying the effective signals extracted by the features by using a consistency prediction algorithm based on a random forest is specifically as follows:

first, $A_n$ is defined on the basis of a random forest: a proximity value (random forest proximity) $P(i,j)$, $i,j=1,\dots,n$, between two samples is calculated in the random forest; the proximity value represents the similarity between the two samples without considering their true labels. Using the proximity value, a singular measurement function $A$ of the consistency prediction algorithm can be defined, which is used to calculate the singular value (strangeness score) $\alpha$ of a sample; the singular value of the i-th sample $z_i$ can be calculated as follows:

$\alpha_i = A_n(\{z_1,\dots,z_{i-1},z_{i+1},\dots,z_n\}, z_i)$

wherein the data in { } are unordered; $z_i=(x_i,y_i)$; $x_i$ represents the features of the i-th sample in the sample set; $y_i$ represents the label of that sample;

the singular measurement function $A$ based on random forests is defined as follows:

for a sample $(x_i,y_i)$:

$A(x_i,y_i) = A(x_i,y_i)^- / A(x_i,y_i)^+$

wherein

Figure FDA0002277491400000031

$t_s$ denotes the s-th largest proximity value between the sample $(x_i,y_i)$ and the samples in the sample subsequence that have the same label as it; $j_s$ denotes the s-th largest proximity value between the sample $(x_i,y_i)$ and the samples in the sample subsequence whose labels differ from its label;

then, classifying and identifying the effective signals extracted by the features based on a consistency prediction algorithm of the random forest, wherein the specific process is as follows:

for a valid signal $x_n$, a hypothetical instruction label $y_n=y$ is set, and together with the pre-collected and processed instruction sample set $Z_t=\{(x_1,y_1),\dots,(x_{n-1},y_{n-1})\}$ a new sample sequence $Z=\{Z_t,(x_n,y)\}$ is generated;

the random-forest-based $A_n$ is applied to the new sample sequence $Z$, and $A_n$ assigns a singular value to each sample in the sample sequence to form a singular value sequence

Figure FDA0002277491400000032

then a new hypothetical instruction label is set for the valid signal $x_n$, and the above steps are repeated until every instruction has been used as the hypothetical instruction label, whereupon the hypothetical instruction label y with the largest $p_y$ value is taken as the recognition result.

8. An auxiliary rescue communication device based on unvoiced instruction recognition of facial surface muscle signals, the device comprising a collection device, a data processing device, a language processing device and a radio frequency device, wherein:

the acquisition device consists of five channels of measuring electrodes, and the measuring electrodes are attached to the skin of the face of a user and acquire muscle electric signals at corresponding positions in real time;

the acquisition device is in wired connection with the data processing device, and the data processing device receives the muscle electric signals transmitted by the acquisition device, performs preprocessing, feature extraction and classification identification operations on the muscle electric signals and identifies corresponding instruction words;

the language processing device is electrically connected with the data processing device and is used for converting the recognized instruction words into artificial voice;

the language processing device is in wired connection with the radio frequency device, the radio frequency device is used for receiving and sending an identification result, sending the artificial voice to an earphone of an operator through the radio frequency device, and sending a voice text to a command center in a wireless mode.

9. The device of claim 8, wherein the device is integrally configured as a stand-alone neck-worn device;

or the device is combined with a breathing mask, and specifically comprises:

the acquisition device is in wired connection with the breathing mask, and the breathing mask is in wired connection with the battery; the data processing device, the language processing device and the radio frequency device are arranged on the bearing equipment.

Technical Field

The invention relates to the technical field of rescue communication, in particular to an auxiliary rescue communication method and device based on the unvoiced instruction recognition of facial surface muscle signals.

Background

Speech recognition in environments with strong background noise, such as disasters and wars, has always been one of the important problems in the field of speech recognition, and it remains an inherent problem that has not been fully solved. When communication is required in such an environment, for example when firefighters or divers must issue instructions according to the current situation or communicate with teammates, the performance of speech recognition becomes even more important.

In coal mine safety production and rescue work, rescue or reconnaissance teams need to communicate effectively in harsh and extreme environments. However, on-site engine and operating noise masks the speech exchanged between team members. Even when existing digital voice communication technology is used and a microphone is worn to pick up sound, the background sound cannot be completely removed from the captured speech, for example the intermittent hissing emitted by the breathing equipment the team members carry; moreover, a breathing apparatus covering the face distorts the voice and makes speech unclear. Many approaches to speech recognition in such extreme environments have been studied in the prior art, such as noise reduction and bone conduction, but they are still not suited to complex and changeable real-world environments.

Disclosure of Invention

The invention aims to provide an auxiliary rescue communication method and device based on unvoiced instruction recognition of facial surface muscle signals, which avoid the inherent problem that conventional speech recognition results are disturbed by environmental background noise and are suitable for application scenarios with high background noise or in which sound cannot be picked up.

The purpose of the invention is realized by the following technical scheme:

an assisted rescue communication method based on unvoiced instruction recognition of facial surface muscle signals, the method comprising:

step 1, firstly, acquiring muscle electric signals of corresponding positions in real time through a measuring electrode which is attached to the facial skin of a user in advance;

step 2, carrying out preprocessing, feature extraction and classification identification operations on the collected muscle electric signals by a data processing device to identify corresponding instruction words;

step 3, the recognized instruction words are converted into artificial voices by the language processing device;

step 4, sending the artificial voice to an earphone of an operator through a radio frequency device, and sending the voice text to a command center in a wireless mode.

The invention also provides an auxiliary rescue communication device based on the non-sounding instruction identification of the facial surface muscle signal, which comprises a collecting device, a data processing device, a language processing device and a radio frequency device, wherein:

the acquisition device consists of five channels of measuring electrodes, and the measuring electrodes are attached to the skin of the face of a user and acquire muscle electric signals at corresponding positions in real time;

the acquisition device is in wired connection with the data processing device, and the data processing device receives the muscle electric signals transmitted by the acquisition device, performs preprocessing, feature extraction and classification identification operations on the muscle electric signals and identifies corresponding instruction words;

the language processing device is electrically connected with the data processing device and is used for converting the recognized instruction words into artificial voice;

the language processing device is in wired connection with the radio frequency device, the radio frequency device is used for receiving and sending an identification result, sending the artificial voice to an earphone of an operator through the radio frequency device, and sending a voice text to a command center in a wireless mode.

As can be seen from the technical solution provided by the invention, the method and the device avoid the inherent problem that conventional speech recognition results are disturbed by environmental background noise, and are suitable for application scenarios with high background noise or in which sound cannot be picked up.

Drawings

In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed for the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic flow chart of an auxiliary rescue communication method based on unvoiced instruction recognition of facial surface muscle signals according to an embodiment of the present invention;

Fig. 2 is a schematic view of the skin-contact positions of the measuring electrodes according to an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of a device according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.

The embodiments of the present invention are described in further detail below with reference to the accompanying drawings. Fig. 1 shows a schematic flow chart of the auxiliary rescue communication method based on unvoiced instruction recognition of facial surface muscle signals provided by an embodiment of the present invention; the method includes:

step 1, firstly, acquiring muscle electric signals of corresponding positions in real time through a measuring electrode which is attached to the facial skin of a user in advance;

in this step, Fig. 2 shows the skin-contact positions of the measuring electrodes according to an embodiment of the present invention: the measuring electrodes comprise five channels, which are respectively attached to set positions on the facial skin of the user.

Step 2, carrying out preprocessing, feature extraction and classification identification operations on the collected muscle electric signals by a data processing device to identify corresponding instruction words;

in this step, the instruction words to be recognized must be preset according to the use environment and requirements. The instruction words can be single words or two-word phrases, such as forward, backward, fire extinguishing, danger and assemble. Before the device is used, signals are collected and processed to generate an instruction sample set, which is then stored in the data processing device for use in the classification and recognition operation.

In a specific implementation, the specific processing procedure of the data processing apparatus is as follows:

1) firstly, filtering and denoising collected muscle electric signals;

because the collected signals are subject to interference from other physiological electrical signals and from electronic equipment, they need to be preprocessed; the embodiment of the invention uses a 20 Hz high-pass filter and a 50 Hz comb notch filter to eliminate noise.
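As an illustration of this preprocessing step, the following minimal sketch cascades a 20 Hz high-pass biquad with notch biquads placed at 50 Hz and its first harmonics, which approximates a comb notch filter. The 1000 Hz sampling rate, the Q factors, the number of harmonics and all function names are assumptions made for the example; the source specifies only the 20 Hz and 50 Hz values.

```python
import math

def biquad(kind, f0, fs, q):
    """Normalised (b, a) coefficients of a second-order section,
    following the widely used audio-EQ cookbook formulas."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    if kind == "highpass":
        b = [(1 + cos_w0) / 2, -(1 + cos_w0), (1 + cos_w0) / 2]
    elif kind == "notch":
        b = [1.0, -2 * cos_w0, 1.0]
    else:
        raise ValueError(kind)
    a = [1 + alpha, -2 * cos_w0, 1 - alpha]
    return [c / a[0] for c in b], [c / a[0] for c in a]

def run_biquad(x, b, a):
    """Apply one second-order section (direct form I) to the sequence x."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y

def preprocess(x, fs=1000.0, hp_cut=20.0, mains=50.0, harmonics=3):
    """High-pass at hp_cut, then notch out mains and its harmonics."""
    b, a = biquad("highpass", hp_cut, fs, q=0.707)
    y = run_biquad(x, b, a)
    for h in range(1, harmonics + 1):
        f = mains * h
        if f < fs / 2:  # skip harmonics at or above the Nyquist frequency
            b, a = biquad("notch", f, fs, q=30.0)
            y = run_biquad(y, b, a)
    return y
```

After the filter transients have died out, a pure 50 Hz input is suppressed almost completely, while a tone away from the notch frequencies (for example 80 Hz) passes nearly unchanged.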

2) Then, judging the muscle activity state based on the preprocessed muscle electric signals, and segmenting effective signals generated during muscle activity, namely the starting point and the ending point of the muscle activity state, wherein the specific process is as follows:

firstly, the preprocessed muscle electric signals are windowed sequentially with a window length of 200 ms, and the standard deviation of each window is computed and compared with a threshold value, which is calculated as:

Th=mean(rest)+μ*std(rest)

wherein Th is the threshold value; rest is the signal within the first 100 ms of a window; mean is the mean of the signal; std is the standard deviation of the signal; μ is a sensitivity value, and tests show that the result is best when μ is 3;
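The threshold test can be sketched as follows. This is a minimal illustration under stated assumptions: a 1000 Hz sampling rate, non-overlapping windows, and the population standard deviation; the function names are hypothetical and the source fixes only the 200 ms window, the 100 ms rest segment and μ = 3.

```python
from statistics import mean, pstdev

def window_is_active(window, fs=1000, rest_ms=100, mu=3.0):
    """Compare the window's standard deviation with
    Th = mean(rest) + mu * std(rest), rest = first rest_ms of the window."""
    rest = window[: int(fs * rest_ms / 1000)]
    threshold = mean(rest) + mu * pstdev(rest)
    return pstdev(window) > threshold

def active_windows(signal, fs=1000, window_ms=200, mu=3.0):
    """Flag each consecutive (non-overlapping, assumed) window as active or not."""
    size = int(fs * window_ms / 1000)
    flags = []
    for start in range(0, len(signal) - size + 1, size):
        flags.append(window_is_active(signal[start:start + size], fs, mu=mu))
    return flags
```

A window whose first 100 ms is quiet and whose second half contains a burst of much larger amplitude is flagged as active, whereas a uniformly quiet window is not.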

the corresponding signal data are obtained after the muscle is detected to be in an active state; because users speak at different speeds, the extracted signals differ in length, typically between 200 ms and 400 ms;

the signal is then stretched to 400 ms by cubic interpolation and used as the effective signal.
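The stretching step can be sketched as a resampling of the segment onto a longer grid. The source says only "cubic interpolation", so the Catmull-Rom cubic used here is an assumption, as are the function name and the linear extrapolation of the two phantom end points.

```python
def stretch_cubic(signal, target_len):
    """Resample `signal` to `target_len` points with Catmull-Rom cubic
    interpolation (an assumed variant of 'cubic interpolation')."""
    n = len(signal)
    if n < 2 or target_len < 2:
        raise ValueError("need at least two input and two output points")

    def sample(k):
        # Phantom points beyond the ends are linearly extrapolated.
        if k < 0:
            return 2 * signal[0] - signal[1]
        if k >= n:
            return 2 * signal[-1] - signal[-2]
        return signal[k]

    out = []
    for m in range(target_len):
        pos = m * (n - 1) / (target_len - 1)   # position on the input grid
        i = min(int(pos), n - 2)               # left neighbour index
        t = pos - i                            # fractional offset in [0, 1]
        p0, p1, p2, p3 = sample(i - 1), sample(i), sample(i + 1), sample(i + 2)
        out.append(0.5 * (2 * p1
                          + (-p0 + p2) * t
                          + (2 * p0 - 5 * p1 + 4 * p2 - p3) * t * t
                          + (-p0 + 3 * p1 - 3 * p2 + p3) * t ** 3))
    return out
```

At an assumed 1000 Hz sampling rate, a 200 ms segment of 200 samples would be stretched with `stretch_cubic(segment, 400)`; the first and last samples are preserved.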

3) And then carrying out feature extraction and optimization on the effective signals in the muscle activity state, wherein the specific process is as follows:

the features used for feature extraction are time-domain features; the effective signals are first framed with a frame length of 30 ms and a frame shift of 15 ms;

for each moving window, extracting four characteristic values, which are respectively:

Figure BDA0002277491410000041

Figure BDA0002277491410000042

Figure BDA0002277491410000043

Figure BDA0002277491410000044

wherein N represents the number of values contained in a moving window; $x_i$ is the i-th value of the current window; the feature dimension obtained after windowed feature extraction on the 5-channel effective signal is 520;
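The figure of 520 can be checked by arithmetic: 5 channels × 4 features × 26 frames = 520. A 400 ms signal yields 26 frames of 30 ms at a 15 ms shift only if the final, partially filled frame is kept (for example by padding); dropping it would give 25 frames and 500 dimensions. The source does not state how the last frame is handled, so the padding rule in this sketch is an assumption chosen because it reproduces the stated dimension; the function names are hypothetical.

```python
import math

def frame_count(signal_ms=400, frame_ms=30, shift_ms=15):
    """Number of frames when the last partial frame is kept (assumed)."""
    return math.ceil((signal_ms - frame_ms) / shift_ms) + 1

def feature_dimension(channels=5, features_per_frame=4, **framing):
    """Total feature dimension over all channels and frames."""
    return channels * features_per_frame * frame_count(**framing)

print(frame_count(), feature_dimension())  # prints: 26 520
```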

here, because the activity of one muscle group affects the activity of the surrounding muscle groups, the features are inherently correlated, and working from a small number of effective features improves the efficiency of the algorithm. Linear discriminant analysis (LDA) is therefore used to reduce the feature dimension from 520 to 50, so that a small number of feature dimensions retains the high-dimensional information and a classification result of similar or even identical quality is obtained.

4) Then, classifying and identifying the effective signals extracted by the features by adopting a consistency prediction algorithm based on random forests, and identifying corresponding instruction words, wherein the specific process comprises the following steps:

first, $A_n$ is defined on the basis of a random forest: a proximity value (random forest proximity) $P(i,j)$, $i,j=1,\dots,n$, between two samples is calculated in the random forest; the proximity value represents the similarity between the two samples without considering their true labels. Using the proximity value, a singular measurement function $A$ of the consistency prediction algorithm can be defined, which is used to calculate the singular value (strangeness score) $\alpha$ of a sample; the singular value of the i-th sample $z_i$ can be calculated as follows:

$\alpha_i = A_n(\{z_1,\dots,z_{i-1},z_{i+1},\dots,z_n\}, z_i)$

wherein the data in { } are unordered; $z_i=(x_i,y_i)$; $x_i$ represents the features of the i-th sample in the sample set; $y_i$ represents the label of that sample;

the singular measurement function $A$ based on random forests is defined as follows:

for a sample $(x_i,y_i)$:

$A(x_i,y_i) = A(x_i,y_i)^- / A(x_i,y_i)^+$

wherein

Figure BDA0002277491410000051

$t_s$ denotes the s-th largest proximity value between the sample $(x_i,y_i)$ and the samples in the sample subsequence that have the same label as it; $j_s$ denotes the s-th largest proximity value between the sample $(x_i,y_i)$ and the samples in the sample subsequence whose labels differ from its label;
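A minimal sketch of such a singular value computation is given below. The formula image is not reproduced in this text, so the sketch assumes that $A^-$ is the sum of the k largest proximities $j_s$ to samples with a different label and $A^+$ is the sum of the k largest proximities $t_s$ to samples with the same label, in analogy with nearest-neighbour measures; the value of k, the function name and the handling of a zero denominator are likewise assumptions. The proximity values are taken as given inputs.

```python
def strangeness(proximity_row, labels, index, k=3):
    """Singular value of sample `index`, given its proximities to all samples.

    proximity_row[j] is the proximity between sample `index` and sample j;
    labels[j] is the (possibly hypothetical) label of sample j.
    """
    own_label = labels[index]
    same, different = [], []
    for j, (p, label) in enumerate(zip(proximity_row, labels)):
        if j == index:
            continue  # a sample is not compared with itself
        (same if label == own_label else different).append(p)
    a_plus = sum(sorted(same, reverse=True)[:k])        # assumed: sum of t_s
    a_minus = sum(sorted(different, reverse=True)[:k])  # assumed: sum of j_s
    return a_minus / a_plus if a_plus > 0 else float("inf")
```

Under these assumptions a sample that lies close to samples sharing its label receives a small value, and the same sample given a label that disagrees with its neighbours receives a large one.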

then, classifying and identifying the effective signals extracted by the features based on a consistency prediction algorithm of the random forest, wherein the specific process is as follows:

for a valid signal $x_n$, a hypothetical instruction label $y_n=y$ is set, and together with the pre-collected and processed instruction sample set $Z_t=\{(x_1,y_1),\dots,(x_{n-1},y_{n-1})\}$ a new sample sequence $Z=\{Z_t,(x_n,y)\}$ is generated;

the random-forest-based $A_n$ is applied to the new sample sequence $Z$, and $A_n$ assigns a singular value to each sample in the sample sequence to form a singular value sequence

Figure BDA0002277491410000052

By comparing the singular value of $(x_n,y)$ in the sequence with the other singular values

Figure BDA0002277491410000054

the confidence level $p_y$ of the current hypothetical instruction label y for the valid signal $x_n$ is obtained:

Figure BDA0002277491410000055

The above $p_y$ indicates how consistent the sample is with the remaining samples in the sample sequence when the label of $x_n$ is y; the larger $p_y$ is, the higher the degree of consistency;

then a new hypothetical instruction label is set for the valid signal $x_n$, and the above steps are repeated until every instruction has been used as the hypothetical instruction label, whereupon the hypothetical instruction label y with the largest $p_y$ value is taken as the recognition result.
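The selection loop can be sketched as follows. Because the formula image for $p_y$ is not reproduced in this text, the sketch assumes the usual p-value of this family of algorithms, namely the fraction of samples in the sequence whose singular value is at least as large as that of the new sample. The function names and the example values are hypothetical, and the singular value sequences are taken as precomputed inputs.

```python
def p_value(alphas):
    """Confidence for one hypothetical label.

    `alphas` is the singular value sequence; its last entry belongs to the
    new sample (x_n, y). Assumed definition: share of entries >= that value.
    """
    new = alphas[-1]
    return sum(1 for a in alphas if a >= new) / len(alphas)

def recognise(alpha_sequences):
    """Pick the hypothetical label with the largest p-value.

    `alpha_sequences` maps each candidate instruction label to the singular
    value sequence obtained under that hypothesis.
    """
    p = {label: p_value(seq) for label, seq in alpha_sequences.items()}
    best = max(p, key=p.get)
    return best, p

# Made-up singular values for two candidate instruction words.
best, p = recognise({
    "forward": [0.2, 0.3, 0.25, 0.1],
    "backward": [0.2, 0.3, 0.25, 5.0],
})
```

In this made-up example the new sample is the least strange one under the label "forward" and the strangest one under "backward", so "forward" is returned.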

The above method performs the analysis by acquiring surface electromyography (sEMG) signals through electrodes attached over the facial muscles, and may also be called silent speech recognition. Surface electromyography converts the weak potential differences produced by muscle fiber contraction into digital signals that reflect the neuromuscular state; the potential difference changes with the structure and function of the muscle, so the electromyographic signals produced by the contraction of different muscle fibers differ accordingly. Speech is produced by the complex cooperation of a series of muscle groups in the face and elsewhere, and different sounds or characters involve different muscle groups, so instruction words can be recognized from the surface electromyographic signals. Because the muscle electrical signals arise from the contraction of the muscle groups and do not depend on actual vocalization, the instruction words can be spoken quietly or mouthed silently without any sound, which solves the problem of speech being masked by background noise or distorted by breathing equipment.

Step 3, the recognized instruction words are converted into artificial voices by the language processing device;

Step 4, sending the artificial voice to an earphone of an operator through a radio frequency device, and sending the voice text to a command center in a wireless mode.

Based on the above method, an embodiment of the present invention further provides an auxiliary rescue communication device based on unvoiced instruction recognition of facial surface muscle signals. Fig. 3 is a schematic structural diagram of the device according to the embodiment of the present invention; the device mainly includes an acquisition device, a data processing device, a language processing device and a radio frequency device, wherein:

the acquisition device consists of five channels of measuring electrodes, and the measuring electrodes are attached to the skin of the face of a user and acquire muscle electric signals at corresponding positions in real time;

the acquisition device is in wired connection with the data processing device, and the data processing device receives the muscle electric signals transmitted by the acquisition device, performs preprocessing, feature extraction and classification identification operations on the muscle electric signals and identifies corresponding instruction words;

the language processing device is electrically connected with the data processing device and is used for converting the recognized instruction words into artificial voice;

the language processing device is in wired connection with the radio frequency device, the radio frequency device is used for receiving and sending an identification result, sending the artificial voice to an earphone of an operator through the radio frequency device, and sending a voice text to a command center in a wireless mode.

The specific implementation of each component in the above-described apparatus is described in the above-described method embodiment.

In a specific implementation, the device can be built as a single stand-alone neck-worn unit; or the device is combined with a breathing mask, specifically:

the acquisition device is in wired connection with the breathing mask, and the breathing mask is in wired connection with the battery; the data processing device, the language processing device and the radio frequency device are arranged on the bearing equipment.

The working process of the auxiliary rescue communication device is as follows:

(1) the device is turned on, and connection (Bluetooth, wireless) is established;

(2) the acquisition device acquires real-time electromyographic signals, and the electromyographic signals are transmitted to the data processing device through a cable;

(3) the data processing device processes the electromyographic signals in real time. After initialization it enters a monitoring state; when the wake-up word is detected it enters the recognition state, and the radio frequency device sends a prompt tone to the user's earphone to indicate that an instruction can now be spoken. The prompt tone is also sent through the radio frequency device to the earphones of the other teammates, signalling that an instruction is about to be issued;

(4) the user speaks the instruction word, either aloud or silently, and the acquisition device acquires the corresponding electromyographic signals and transmits them to the data processing device;

(5) the data processing device in the identification state carries out identification operation on the electromyographic signals and identifies corresponding instruction words;

(6) the language processing device converts the recognized instruction words into artificial voice, the artificial voice is sent to the earphones of the other teammates through the radio frequency device, and the voice text is sent to the command center in a wireless mode.
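The working process above amounts to a small two-state controller, which can be sketched as follows. The class and event names and the wake-up word are hypothetical, and the return to the monitoring state after one recognized instruction is an assumption, since the source does not say when the recognition state ends.

```python
from enum import Enum

class State(Enum):
    MONITORING = "monitoring"
    RECOGNITION = "recognition"

class CommandSession:
    """Tracks the monitoring/recognition state for recognized words."""

    def __init__(self, wake_word, commands):
        self.wake_word = wake_word
        self.commands = set(commands)
        self.state = State.MONITORING  # state entered after initialization

    def on_word(self, word):
        """Handle one recognized word and return the events to emit."""
        if self.state is State.MONITORING:
            if word == self.wake_word:
                self.state = State.RECOGNITION
                return ["prompt_tone"]  # sent to user and teammates
            return []                   # anything else is ignored
        if word in self.commands:
            # Assumption: fall back to monitoring after one instruction.
            self.state = State.MONITORING
            return ["speech:" + word, "text:" + word]
        return []
```

For example, with a hypothetical wake-up word "wake", the word "forward" is ignored until "wake" has been detected; after that it produces a speech event for the earphones and a text event for the command center.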

It is noted that matters not described in detail in the embodiments of the present invention belong to the prior art known to those skilled in the art.

The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
