An SOS system for helping deaf-mute people

Document No.: 1775477    Publication date: 2019-12-03

Note: This invention, an SOS system for helping deaf-mute people, was designed and created by 穆斯塔法, 艾扎达, 李业芃, 姚凯 and 阿克巴 on 2019-07-10. Its main content is as follows: an SOS system for helping deaf-mute people comprises two monochrome infrared cameras, three infrared LEDs, a high-definition condenser microphone for speech recognition, a GPRS real-time voice call system, a display screen, and a processor. The processor is connected to the two monochrome infrared cameras, the three infrared LEDs, the microphone, the GPRS real-time voice call system, and the display screen. In the processor, the user's signs are converted into audio for the responder, and the responder's voice is simultaneously rendered as sign language for the caller as a real-time service. The invention helps deaf-mute people call for help quickly and without barriers.

1. An SOS system for helping deaf-mute people, characterized in that it comprises two monochrome infrared cameras, three infrared LEDs, a high-definition condenser microphone for speech recognition, a GPRS real-time voice call system, a display screen, and a processor; the processor is connected to the two monochrome infrared cameras, the three infrared LEDs, the high-definition condenser microphone for speech recognition, the GPRS real-time voice call system, and the display screen; in the processor, the user's signs are converted into audio for the responder, and the responder's voice is simultaneously rendered as sign language for the caller as a real-time service.

2. The SOS system for helping deaf-mute people according to claim 1, characterized in that the SOS system is a mobile communication terminal compatible with the Android and iOS operating systems, and shaking the terminal a set number of times and then shaking it once more starts an SOS call.

3. The SOS system for helping deaf-mute people according to claim 2, characterized in that the mobile communication terminal further includes a module that uses a pre-trained model to convert between sign language and audio.

4. The SOS system for helping deaf-mute people according to claim 3, characterized in that the model is built in the following steps:

1. build a 3D animation data set;

2. perform analog-to-digital conversion;

3. train an unsupervised learning model;

4. repeat step 3 until a local minimum error is reached;

5. forward the internal representation to the supervised learning model;

6. compare the supervised model's estimate with the actual output;

7. repeat step 5 until accurate and efficient output is obtained;

8. generate the translated sign language as a labeled animated 3D character;

9. export the trained model to the standalone device and to the mobile communication terminal app;

After the sign language movements for the different languages have been generated, they are stored in memory.

5. The SOS system for helping deaf-mute people according to claim 4, characterized in that in step 1, the two monochrome infrared cameras and three infrared LEDs are used to capture all the alphabet sets of the different sign languages, which are stored in memory;

In step 2, an analog-to-digital converter using 8-bit pulse-length modulation produces digital input to a computer for training, testing, and validation;

In step 3, the audio bit stream is forwarded to the input layer of an autoencoder neural network and unsupervised learning begins: the network encodes the input content and then decodes it to reproduce it, thereby generating an internal representation;

The autoencoder is a three-layer neural network with an input, a hidden, and an output layer. The input layer updates its weights using the instar learning rule, as shown in formula (1):

dw = lr * a * (p' - w)   (1)

The output layer uses the outstar learning rule, as shown in formula (2):

dw = lr * (a - w) * p'   (2)

where dw is the weight-change matrix, lr is the learning rate (an adjustable non-negative factor), a is the output vector of this layer, w is the weight matrix, and p' is the input vector;

In step 4, step 3 is repeated until a local minimum error is reached by reducing the mean absolute error MAE, as shown in formula (3):

error = output vector - input vector   (3)

where error is the mean absolute error MAE, output vector is the autoencoder's output vector, and input vector is its input vector;

In step 5, after unsupervised learning is complete, the internal representation of the autoencoder is forwarded to a quantum neural network for supervised learning;

This network is a three-layer quantum neural network that processes the internal representation of the autoencoder using formula (4),

where U_N is the unitary matrix of a qubit, σi for i ∈ {1, 2, 3} are the Pauli matrices, σ0 is the 2 × 2 identity matrix, and aj is the actual output;

The Pauli matrices, also called Pauli spin matrices, are the complex matrices that arise in quantum mechanics when treating spin; they are defined by equation (5):

σ0 = ((1, 0), (0, 1)),  σ1 = ((0, 1), (1, 0)),  σ2 = ((0, -i), (i, 0)),  σ3 = ((1, 0), (0, -1))   (5)

where σ0 is the 2 × 2 identity matrix and σi for i ∈ {1, 2, 3} are the Pauli matrices;

In step 6, the desired output is the 3D-animated sign language for the given input audio content; equations (6) and (7) compare it with the actual output using the gradient descent algorithm:

Δw = -lr * ∂C/∂w   (6)

where Δw is the weight update, lr is the learning rate in formula (2), aj is the actual output value, and ∂C/∂w is the partial derivative of the cost function with respect to the weights;

C = (1/2) * Σj (aj - dj)²   (7)

where C is the cost function, defined using the mean squared error, aj is the actual output, and dj is the desired output;

In step 7, step 5 is repeated until the smallest global error is reached by reducing the mean squared error of equation (7);

In step 8, the trained model is ready to associate each sign language, and the sign language is stored as a database together with its related input audio content;

In step 9, the user can customize his or her own 3D character, which is the 3D animation object exported by the application of the standalone device or the mobile communication terminal.

6. The SOS system for helping deaf-mute people according to any one of claims 1 to 4, characterized in that the processor is a Raspberry Pi 3B+.

Technical field

The present invention relates to an SOS system that uses artificial intelligence to convert any given audio content into the sign language of a customizable 3D animation model, so as to help deaf-mute people call for help quickly and without barriers.

Background art

Deaf-mute people can neither speak nor hear, a condition usually caused by a speech disorder or by surgery, and the resulting inconvenience makes them more reluctant to communicate in certain social situations. Tracheal intubation, tracheostomy, or damage to the vocal cords or trachea from disease or injury can leave patients deeply frustrated. According to statistics, 8 out of every 10,000 people are born deaf-mute, but there is no precise figure for how many deaf-mute people there are in the world today. Muteness is often caused by injury to, or problems with, Broca's area of the brain.

A person is considered to have hearing loss when the hearing threshold in both ears is 25 dB or higher. "Hearing impaired" refers to people whose hearing loss ranges from mild to severe; most deaf people have severe hearing loss, meaning they can hear almost nothing. About 466 million people worldwide live with hearing loss, more than 5% of the world's population. It is estimated that by 2050 more than 900 million people, or one in every ten, will have disabling hearing loss and will have to communicate using sign language.

The problem therefore arises when a hearing-impaired person urgently needs to call the fire brigade, the police, or an ambulance. In an emergency every second counts, and it is sometimes a matter of life and death. Many people with hearing or speech impairments (deaf-mutes) find themselves unable to communicate effectively under immense pressure and panic. There should therefore be a life-saving translation service such as an SOS system. In Britain it has been said that deaf-mute people who cannot communicate verbally can send text messages and contact an SOS service centre via eSMS; but the emergency text service website itself states: "You (the deaf-mute) need about two minutes to send your emergency message. If the other side does not reply within three minutes, we suggest you send another message."

Although the average answering time of an SOS voice call is only about 7 seconds, a text service cannot be mentioned in the same breath as a voice-based service. Deaf-mute people, or anyone with asthma or breathing difficulty, need help immediately when they call an emergency centre for the police, an ambulance, or the fire service.

An SOS signal, for its part, is a continuous Morse code string of three dots, three dashes, and three dots, with no spaces or full stops between them (... --- ...). Since in international Morse code three dots denote "S" and three dashes denote "O", the signal is called "SOS" for convenience.

As a large and important part of the community, deaf-mute people need special services that translate sign language into audio and audio into sign language, to help them understand what is happening around them, especially in public places with audio content such as police stations, hospitals, and fire stations; in various emergencies; in streets and crowded places; or anywhere else an emergency occurs and needs immediate attention.

Summary of the invention

To overcome the shortcoming of the prior art that deaf-mute people cannot make an SOS call in time, the present invention provides an SOS system for helping deaf-mute people that assists them in making an SOS call promptly.

The technical solution adopted by the present invention to solve this technical problem is as follows:

An SOS system for helping deaf-mute people comprises two monochrome infrared cameras, three infrared LEDs, a high-definition condenser microphone for speech recognition, a GPRS real-time voice call system, a display screen, and a processor. The processor is connected to the two monochrome infrared cameras, the three infrared LEDs, the high-definition condenser microphone for speech recognition, the GPRS real-time voice call system, and the display screen. In the processor, the user's signs are converted into audio for the responder, and the responder's voice is simultaneously rendered as sign language for the caller as a real-time service.

Further, the SOS system is a mobile communication terminal compatible with the Android and iOS operating systems; shaking the terminal a set number of times and then shaking it once more starts an SOS call.

Further, the mobile communication terminal also includes a module that uses a pre-trained model to convert between sign language and audio.

The model is built in the following steps:

1. build a 3D animation data set;

2. perform analog-to-digital conversion;

3. train an unsupervised learning model;

4. repeat step 3 until a local minimum error is reached;

5. forward the internal representation to the supervised learning model;

6. compare the supervised model's estimate with the actual output;

7. repeat step 5 until accurate and efficient output is obtained;

8. generate the translated sign language as a labeled animated 3D character;

9. export the trained model to the standalone device and to the mobile communication terminal app.

After the sign language movements for the different languages have been generated, they are stored in memory; the pre-trained model described above can convert any input audio content into sign language.

In step 1, the two monochrome infrared cameras and three infrared LEDs are used to capture all the alphabet sets of different sign languages (such as Arabic, Chinese, English, and Russian), which are stored in memory.

In step 2, an analog-to-digital converter using 8-bit pulse-length modulation produces digital input to a computer for training, testing, and validation; a minimal sketch of this digitization step is given below.
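The patent gives no code for this step; the following is a minimal Python sketch, assuming the microphone signal has already been read as floating-point samples in [-1, 1] (numpy and the 16 kHz test tone are illustrative assumptions):

```python
import numpy as np

def quantize_8bit(samples):
    """Quantize audio samples in [-1.0, 1.0] to 8-bit unsigned integers."""
    clipped = np.clip(samples, -1.0, 1.0)
    # Map [-1, 1] onto the 256 levels an 8-bit converter can represent.
    return np.round((clipped + 1.0) * 127.5).astype(np.uint8)

# Example: digitize one second of a 440 Hz test tone sampled at 16 kHz.
t = np.linspace(0.0, 1.0, 16000, endpoint=False)
digital_input = quantize_8bit(np.sin(2.0 * np.pi * 440.0 * t))
```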

In step 3, the audio bit stream is forwarded to the input layer of an autoencoder neural network and unsupervised learning begins: the network encodes the input content and then decodes it to reproduce it, thereby generating an internal representation;

The autoencoder is a three-layer neural network with an input, a hidden, and an output layer. The input layer updates its weights using the instar learning rule, as shown in formula (1):

dw = lr * a * (p' - w)   (1)

The output layer uses the outstar learning rule, as shown in formula (2):

dw = lr * (a - w) * p'   (2)

where dw is the weight-change matrix, lr is the learning rate (an adjustable non-negative factor), a is the output vector of this layer, w is the weight matrix, and p' is the input vector; the two update rules are sketched below.
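As a minimal numpy sketch of formulas (1) and (2), with assumed layer sizes (the patent specifies only the rules, not an implementation):

```python
import numpy as np

def instar_update(w, p, a, lr=0.1):
    # Formula (1): dw[i, j] = lr * a[i] * (p'[j] - w[i, j]).
    return lr * a[:, None] * (p[None, :] - w)

def outstar_update(w, p, a, lr=0.1):
    # Formula (2): dw[i, j] = lr * (a[i] - w[i, j]) * p'[j].
    return lr * (a[:, None] - w) * p[None, :]

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 16))   # 8 units, 16-dimensional input (assumed sizes)
p = rng.normal(size=16)        # input vector p'
a = np.tanh(w @ p)             # output vector a of this layer
w += instar_update(w, p, a)    # one unsupervised weight update
```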

In step 4, step 3 is repeated until a local minimum error is reached by reducing the mean absolute error MAE, as shown in formula (3):

error = output vector - input vector   (3)

where error is the mean absolute error MAE, output vector is the autoencoder's output vector, and input vector is its input vector;

In step 5, after unsupervised learning is complete, the internal representation of the autoencoder is forwarded to a quantum neural network for supervised learning;

This network is a three-layer quantum neural network that processes the internal representation of the autoencoder using formula (4),

where U_N is the unitary matrix of a qubit, σi for i ∈ {1, 2, 3} are the Pauli matrices, σ0 is the 2 × 2 identity matrix, and aj is the actual output;

The Pauli matrices, also called Pauli spin matrices, are the complex matrices that arise in quantum mechanics when treating spin; they are defined by equation (5):

σ0 = ((1, 0), (0, 1)),  σ1 = ((0, 1), (1, 0)),  σ2 = ((0, -i), (i, 0)),  σ3 = ((1, 0), (0, -1))   (5)

where σ0 is the 2 × 2 identity matrix and σi for i ∈ {1, 2, 3} are the Pauli matrices; a numerical sketch of these matrices is given below.
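For concreteness, a numerical sketch of equation (5), together with one standard way such matrices generate a qubit unitary; the exponential form of U_N is an assumption, since formula (4) itself is not reproduced in the source:

```python
import numpy as np
from scipy.linalg import expm

# The four matrices of equation (5): sigma_0 (identity) plus the Pauli matrices.
sigma0 = np.eye(2, dtype=complex)
sigma1 = np.array([[0, 1], [1, 0]], dtype=complex)
sigma2 = np.array([[0, -1j], [1j, 0]], dtype=complex)
sigma3 = np.array([[1, 0], [0, -1]], dtype=complex)

def qubit_unitary(theta1, theta2, theta3):
    # One standard single-qubit unitary: U = exp(i * (t1*s1 + t2*s2 + t3*s3)).
    h = theta1 * sigma1 + theta2 * sigma2 + theta3 * sigma3  # Hermitian generator
    return expm(1j * h)

U = qubit_unitary(0.3, 0.1, 0.7)
assert np.allclose(U @ U.conj().T, sigma0)  # unitarity check: U U† = I
```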

In step 6, the desired output is the 3D-animated sign language for the given input audio content; equations (6) and (7) compare it with the actual output using the gradient descent algorithm:

Δw = -lr * ∂C/∂w   (6)

where Δw is the weight update, lr is the learning rate in formula (2), aj is the actual output value, and ∂C/∂w is the partial derivative of the cost function with respect to the weights;

C = (1/2) * Σj (aj - dj)²   (7)

where C is the cost function, defined using the mean squared error, aj is the actual output, and dj is the desired output;

In step 7, step 5 is repeated until the smallest global error is reached by reducing the mean squared error of equation (7); a sketch of this descent loop is given below.
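A minimal sketch of the descent loop of equations (6) and (7); the quantum layer is stood in for by a plain linear map, purely so the gradient can be written explicitly:

```python
import numpy as np

def mse_cost(a, d):
    # Equation (7): C = 0.5 * sum_j (a_j - d_j)^2.
    return 0.5 * np.sum((a - d) ** 2)

def train_step(w, x, d, lr=0.05):
    # Equation (6) for a linear output layer a = w @ x: dC/dw = (a - d) x^T.
    a = w @ x                      # actual output a_j
    grad = np.outer(a - d, x)      # partial derivative of the cost
    return w - lr * grad, mse_cost(a, d)

rng = np.random.default_rng(1)
w = rng.normal(size=(4, 10))   # 4 outputs fed by a 10-dim internal representation
x = rng.normal(size=10)        # internal representation from the autoencoder
d = rng.normal(size=4)         # desired output
for _ in range(200):           # step 7: repeat until the error is small
    w, cost = train_step(w, x, d)
```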

In step 8, the trained model is ready to associate each sign language, and the sign language is stored as a database together with its related input audio content;

In step 9, the user can customize his or her own 3D character (skin colour, body shape, outfit, facial expression, style, and so on), which is the 3D animation object exported by the application of the standalone device or the mobile communication terminal.

Further, the processor is a Raspberry Pi 3B+.

The beneficial effect of the present invention is mainly that it helps deaf-mute people call for help quickly and without barriers.

Description of the drawings

Fig. 1 is a schematic diagram of the autoencoder neural network (the unsupervised learning model).

Fig. 2 is a schematic diagram of the quantum neural network (the supervised learning model).

Specific embodiment

The invention will be further described below in conjunction with the accompanying drawings.

Referring to Figures 1 and 2, an SOS system for helping deaf-mute people comprises two monochrome infrared cameras, three infrared LEDs, a high-definition condenser microphone for speech recognition, a GPRS real-time voice call system, a display screen, and a processor. The processor is connected to the two monochrome infrared cameras, the three infrared LEDs, the high-definition condenser microphone for speech recognition, the GPRS real-time voice call system, and the display screen. In the processor, the user's signs are converted into audio for the responder, and the responder's voice is simultaneously rendered as sign language for the caller as a real-time service.

Further, the SOS system is a mobile communication terminal compatible with the Android and iOS operating systems; shaking the terminal a set number of times and then shaking it once more starts an SOS call.

Further, the mobile communication terminal also includes a module that uses a pre-trained model to convert between sign language and audio.

The model is built in the following steps:

1. build a 3D animation data set;

2. perform analog-to-digital conversion;

3. train an unsupervised learning model;

4. repeat step 3 until a local minimum error is reached;

5. forward the internal representation to the supervised learning model;

6. compare the supervised model's estimate with the actual output;

7. repeat step 5 until accurate and efficient output is obtained;

8. generate the translated sign language as a labeled animated 3D character;

9. export the trained model to the standalone device and to the mobile communication terminal app.

After the sign language movements for the different languages have been generated, they are stored in memory; the pre-trained model described above can convert any input audio content into sign language.

In step 1, the two monochrome infrared cameras and three infrared LEDs are used to capture all the alphabet sets of different sign languages (such as Arabic, Chinese, English, and Russian), which are stored in memory.

In step 2, an analog-to-digital converter using 8-bit pulse-length modulation produces digital input to a computer for training, testing, and validation.

In step 3, the audio bit stream is forwarded to the input layer of the autoencoder neural network (Fig. 1) and unsupervised learning begins: the network encodes the input content and then decodes it to reproduce it, thereby generating an internal representation;

The autoencoder is a three-layer neural network with an input, a hidden, and an output layer. The input layer updates its weights using the instar learning rule, as shown in formula (1):

dw = lr * a * (p' - w)   (1)

The output layer uses the outstar learning rule, as shown in formula (2):

dw = lr * (a - w) * p'   (2)

where dw is the weight-change matrix, lr is the learning rate (an adjustable non-negative factor), a is the output vector of this layer, w is the weight matrix, and p' is the input vector;

In step 4, step 3 is repeated until a local minimum error is reached by reducing the mean absolute error MAE, as shown in formula (3):

error = output vector - input vector   (3)

where error is the mean absolute error MAE, output vector is the autoencoder's output vector, and input vector is its input vector;

In step 5, after unsupervised learning is complete, the internal representation of the autoencoder is forwarded to the quantum neural network (Fig. 2) for supervised learning;

This network is a three-layer quantum neural network that processes the internal representation of the autoencoder using formula (4),

where U_N is the unitary matrix of a qubit, σi for i ∈ {1, 2, 3} are the Pauli matrices, σ0 is the 2 × 2 identity matrix, and aj is the actual output;

The Pauli matrices, also called Pauli spin matrices, are the complex matrices that arise in quantum mechanics when treating spin; they are defined by equation (5):

σ0 = ((1, 0), (0, 1)),  σ1 = ((0, 1), (1, 0)),  σ2 = ((0, -i), (i, 0)),  σ3 = ((1, 0), (0, -1))   (5)

where σ0 is the 2 × 2 identity matrix and σi for i ∈ {1, 2, 3} are the Pauli matrices;

In step 6, the desired output is the 3D-animated sign language for the given input audio content; equations (6) and (7) compare it with the actual output using the gradient descent algorithm:

Δw = -lr * ∂C/∂w   (6)

where Δw is the weight update, lr is the learning rate in formula (2), aj is the actual output value, and ∂C/∂w is the partial derivative of the cost function with respect to the weights;

C = (1/2) * Σj (aj - dj)²   (7)

where C is the cost function, defined using the mean squared error, aj is the actual output, and dj is the desired output;

In step 7, step 5 is repeated until the smallest global error is reached by reducing the mean squared error of equation (7);

In step 8, the trained model is ready to associate each sign language, and the sign language is stored as a database together with its related input audio content;

In step 9, the user can customize his or her own 3D character (skin colour, body shape, outfit, facial expression, style, and so on), which is the 3D animation object exported by the application of the standalone device or the mobile communication terminal.

Further, the processor is a Raspberry Pi 3B+.

In this embodiment, the model is trained with the same program and steps as the previous model, except that steps 2 and 8 are adjusted as follows:

Step 2 becomes image processing for object detection:

2.1 use the two monochrome infrared cameras and three infrared LEDs to compensate for background objects (such as the head) and ambient illumination;

2.2 match the tracking-layer data to extract tracking information, such as the positions of the fingers and hands;

2.3 generate the input data as a vector; a sketch of this step is given below.
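A minimal sketch of step 2.3, assuming the tracking layer delivers fingertip and palm coordinates; the shapes and the helper function are hypothetical, since the patent does not define the tracking data format:

```python
import numpy as np

def frame_to_feature_vector(finger_tips, palm_center):
    """Flatten one tracked hand frame into an input vector (step 2.3).

    finger_tips: (5, 3) fingertip coordinates from the stereo IR cameras;
    palm_center: (3,) palm position. Both shapes are assumptions.
    """
    relative = finger_tips - palm_center  # translation-invariant geometry
    return np.concatenate([relative.ravel(), palm_center])

# Hypothetical tracked frame, coordinates in metres.
tips = np.array([[0.02, 0.09, 0.30], [0.04, 0.10, 0.31], [0.05, 0.10, 0.30],
                 [0.06, 0.09, 0.30], [0.07, 0.07, 0.29]])
x = frame_to_feature_vector(tips, np.array([0.04, 0.02, 0.31]))  # shape (18,)
```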

Step 8 becomes digital-to-analog conversion:

8.1 use the supervised output as the label for each input letter;

8.2 generate the output audio using a text-to-speech API; a sketch is given below.
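A minimal sketch of step 8.2 using pyttsx3, one freely available offline text-to-speech library; the patent names only a text-to-speech API, not a specific one:

```python
import pyttsx3  # offline text-to-speech engine; one possible API choice

def speak_label(label):
    """Voice the recognized sign label (step 8.2) for the hearing responder."""
    engine = pyttsx3.init()
    engine.say(label)
    engine.runAndWait()

speak_label("HELP")  # e.g. the word recovered from the signed letters H-E-L-P
```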

In the present embodiment, the SOS system for being used to help deaf-mute can be a independent equipment, be obtained using solar energy Clean energy resource provides gesture identification function by two monochromatic infrared cameras and three infrared LEDs.It also has knows for voice Other high definition Electret Condencer Microphone.The machine has GPRS real-time voice call system and 7 inches of high-definition display screens.Internal processor (raspberry pi 3B+) connects all terminal parts and handles input/output signal.Hardware platform can be " anchor type " and " movable type " (static and packaged type).Mobile model has GPS to obtain the accurate coordinates of user.The hardware platform can will be used The mark at family is converted to the audio of respondent, and the voice of respondent can be used as real time service while be caller's signature.

The device is also realized as a mobile communication terminal application compatible with the Android and iOS operating systems. To use it, the phone is shaken three times (this number can be changed) and then shaken again. The app uses the smartphone's own integrated camera, microphone, GPS, GPRS, display, and power supply, and it requires permission to use these components. The application also uses the pre-trained model for sign language and audio conversion. A sketch of the shake trigger is given below.
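A minimal sketch of the shake-to-call trigger logic; the accelerometer feed, the threshold value, and the call itself are platform-specific assumptions:

```python
import math

SHAKE_G = 2.5       # acceleration magnitude, in g, counted as one shake (assumed)
SHAKES_TO_ARM = 3   # the configurable "shake three times" of the patent

class ShakeTrigger:
    """Counts shakes and fires the SOS call on the extra confirming shake."""

    def __init__(self):
        self.count = 0

    def on_sample(self, ax, ay, az):
        # One accelerometer sample in g; the feed itself is platform-specific.
        if math.sqrt(ax * ax + ay * ay + az * az) >= SHAKE_G:
            self.count += 1
            if self.count > SHAKES_TO_ARM:  # the "shake again" confirmation
                self.count = 0
                return True                 # caller should start the SOS call
        return False

trigger = ShakeTrigger()
if trigger.on_sample(0.1, 3.2, 0.4):  # a strong jolt registers as one shake
    pass  # start_sos_call() would go here
```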
