Cloud computing all-in-one machine and voice interaction system

Document No.: 170855. Published: 2021-10-29.

Reading note: This technology, a cloud computing all-in-one machine and voice interaction system, was designed by 生桂勇, 唐明军, 王超, 刘方, 王平泉 and 陆延 on 2021-08-13. Its main content is as follows. The invention belongs to the technical field of voice interaction and discloses a cloud computing all-in-one machine and a voice interaction system. The voice interaction system comprises: a voice acquisition module for acquiring voice data of a target user; a voice analysis module for recognizing voice information from the voice data and parsing the voice information into a first text; a video acquisition module for acquiring facial image data of the target user; a video analysis module for extracting lip features from the facial image data, recognizing lip language information from the lip features, and parsing the lip language information into a second text; a judging module for judging the similarity between the first text and the second text; and a main control module that obtains the second text when the similarity between the first text and the second text exceeds a threshold and executes control according to the text information of the second text. This dual verification effectively improves the accuracy of voice interaction.

1. A voice interaction system, comprising:

a voice acquisition module configured to acquire voice data of a target user;

a voice analysis module configured to recognize voice information from the voice data and parse the voice information into a first text;

a video acquisition module configured to acquire facial image data of the target user;

a video analysis module configured to extract lip features from the facial image data, recognize lip language information from the lip features, and parse the lip language information into a second text;

a judging module configured to judge the similarity between the first text and the second text; and

a main control module that obtains the second text when the similarity between the first text and the second text exceeds a threshold, and executes control according to the text information of the second text.

2. The voice interaction system of claim 1, further comprising:

a camera connected to the video acquisition module and configured to capture images of a preset area in the current environment.

3. The voice interaction system of claim 2, wherein:

the voice acquisition module sends a start notification to the video acquisition module when it starts to acquire voice data of the target user; and

the video acquisition module enters a data acquisition working state upon receiving the start notification.

4. The voice interaction system of claim 3, wherein:

the video acquisition module sends a sleep notification to the voice acquisition module when it does not acquire facial image data of the target user; and

the voice acquisition module enters a sleep preparation state upon receiving the sleep notification.

5. The voice interaction system of claim 3, wherein:

the voice acquisition module, when it does not acquire voice data of the target user, sends a sleep notification to the video acquisition module and continues voice data acquisition; and

the video acquisition module enters a sleep preparation state upon receiving the sleep notification.

6. The voice interaction system of claim 1, further comprising:

an extraction module configured to extract partial data from the facial image data and the voice data and to transmit the partial data to the video analysis module and the voice analysis module, respectively, for recognition and parsing.

7. The voice interaction system of claim 6, wherein the extraction module extracts from the facial image data and the voice data according to the same time starting point and the same time ending point.

8. The voice interaction system of claim 1, further comprising a login module and a marking module, wherein:

the login module is configured to receive a login instruction and wake the marking module according to the login instruction; and

the marking module marks the voiceprint of the login instruction as the target and determines the user logged in with the current voiceprint to be the target user.

9. The voice interaction system of claim 1, further comprising:

a voice playing module configured to play control feedback information from the main control module.

10. A cloud computing all-in-one machine, comprising the voice interaction system of any one of claims 1 to 9.

Technical Field

The invention belongs to the technical field of voice interaction, and particularly relates to a cloud computing all-in-one machine and a voice interaction system.

Background

The cloud computing all-in-one machine is an integrated device that combines computing, storage, virtualization and management; its maintenance and basic control usually rely on manual operation. With the continuous development of voice technology, voice interaction control devices of many kinds keep emerging and are widely applied in fields such as finance, home furnishing, manufacturing, construction and medical treatment, bringing great convenience to daily production and life.

In summary, voice interaction technology can be combined with the cloud computing all-in-one machine to provide an integrated device that is convenient to operate. However, existing voice interaction has poor recognition accuracy, so control errors occur easily.

Disclosure of Invention

In view of the above, in order to solve the problems in the background art, the present invention provides a cloud computing all-in-one machine and a voice interaction system.

To achieve this purpose, the invention provides the following technical solution:

A voice interaction system, comprising:

a voice acquisition module configured to acquire voice data of a target user;

a voice analysis module configured to recognize voice information from the voice data and parse the voice information into a first text;

a video acquisition module configured to acquire facial image data of the target user;

a video analysis module configured to extract lip features from the facial image data, recognize lip language information from the lip features, and parse the lip language information into a second text;

a judging module configured to judge the similarity between the first text and the second text; and

a main control module that obtains the second text when the similarity between the first text and the second text exceeds a threshold, and executes control according to the text information of the second text.

Preferably, the voice interaction system further comprises a camera connected to the video acquisition module and configured to capture images of a preset area in the current environment.

Preferably, the voice acquisition module sends a start notification to the video acquisition module when it starts to acquire voice data of the target user, and the video acquisition module enters a data acquisition working state upon receiving the start notification.

Preferably, the video acquisition module sends a sleep notification to the voice acquisition module when it does not acquire facial image data of the target user, and the voice acquisition module enters a sleep preparation state upon receiving the sleep notification.

Preferably, when the voice acquisition module does not acquire voice data of the target user, it sends a sleep notification to the video acquisition module and continues voice data acquisition, and the video acquisition module enters a sleep preparation state upon receiving the sleep notification.

Preferably, the voice interaction system further comprises an extraction module configured to extract partial data from the facial image data and the voice data and to transmit the partial data to the video analysis module and the voice analysis module, respectively, for recognition and parsing.

Preferably, the extraction module extracts from the facial image data and the voice data according to the same time starting point and the same time ending point.

Preferably, the voice interaction system further comprises a login module and a marking module. The login module is configured to receive a login instruction and wake the marking module according to the login instruction. The marking module marks the voiceprint of the login instruction as the target and determines the user logged in with the current voiceprint to be the target user.

Preferably, the voice interaction system further comprises a voice playing module configured to play control feedback information from the main control module.

A cloud computing all-in-one machine comprising the voice interaction system disclosed above.

Compared with the prior art, the invention has the following beneficial effects:

In the invention, dual verification is performed based on voice recognition and lip motion recognition, which effectively ensures the accuracy of voice interaction instruction recognition, avoids false recognition and improves the user experience. In particular, a segment comparison verification mode is provided for the dual verification, which can further improve verification accuracy.

In addition, the invention constrains how the voice acquisition module and the video acquisition module respond to each other, so that the cloud computing all-in-one machine and the voice interaction system as a whole acquire facial image data only after voice data has been acquired. This effectively ensures the integrity of the voice data and reduces power consumption.

Drawings

FIG. 1 is a block diagram of a voice interaction system according to the first embodiment of the present invention;

FIG. 2 is a schematic structural diagram of a cloud computing all-in-one machine according to the third embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Embodiment one:

Referring to FIG. 1, this embodiment provides a voice interaction system, comprising:

a voice acquisition module 10 configured to acquire voice data of a target user;

a voice analysis module 20 configured to recognize voice information from the voice data and parse the voice information into a first text;

a video acquisition module 30 configured to acquire facial image data of the target user;

a video analysis module 40 configured to extract lip features from the facial image data, recognize lip language information from the lip features, and parse the lip language information into a second text;

a judging module 50 configured to judge the similarity between the first text and the second text;

a main control module 60 that obtains the second text when the similarity between the first text and the second text exceeds the threshold, and executes control according to the text information of the second text;

a camera 70 connected to the video acquisition module 30 and configured to capture images of a preset area in the current environment;

an extraction module 80 configured to extract partial data from the facial image data and the voice data and to transmit the partial data to the video analysis module 40 and the voice analysis module 20, respectively, for recognition and parsing;

a login module 90 configured to receive a login instruction and wake the marking module 100 according to the login instruction;

a marking module 100 configured to mark the voiceprint of the login instruction as the target and determine the user logged in with the current voiceprint to be the target user; and

a voice playing module 110 configured to play control feedback information from the main control module 60.

Specifically, the voice interaction system includes the following implementations:

Implementation one:

the user logs in through the login module 90 and is marked through the marking module 100;

when the voice acquisition module 10 starts to acquire voice data of the target user, it sends a start notification to the video acquisition module 30; the video acquisition module 30 enters the data acquisition working state upon receiving the start notification.

When the voice acquisition module 10 does not acquire voice data of the target user, it sends a sleep notification to the video acquisition module 30 and continues voice data acquisition; the video acquisition module 30 enters the sleep preparation state upon receiving the sleep notification.

In this way, a segment of voice data and a segment of facial image data of the target user are obtained;

the voice analysis module 20 and the video analysis module 40 parse the voice data and the facial image data in full, respectively, to obtain a complete first text and a complete second text;

the judging module 50 compares the complete first text with the complete second text, and when the similarity exceeds 90%, the main control module 60 executes control according to the text information of the second text.
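For illustration only, the threshold check described above can be sketched in Python. The source does not say how the similarity is computed; this sketch assumes the character-level ratio from the standard `difflib` module, and the names `text_similarity` and `select_command` are hypothetical. Only the 90% threshold and the choice of the second text as the control input come from the description.

```python
from difflib import SequenceMatcher
from typing import Optional

# The 90% figure comes from the embodiment; everything else is an assumption.
SIMILARITY_THRESHOLD = 0.90


def text_similarity(first_text: str, second_text: str) -> float:
    """Return a similarity ratio in [0, 1] (assumed metric: difflib ratio)."""
    return SequenceMatcher(None, first_text, second_text).ratio()


def select_command(first_text: str, second_text: str,
                   threshold: float = SIMILARITY_THRESHOLD) -> Optional[str]:
    """Return the second (lip-reading) text if the two texts agree, else None.

    first_text stands for the speech-recognition result and second_text for
    the lip-reading result; control is executed only on a non-None return.
    """
    if text_similarity(first_text, second_text) > threshold:
        return second_text
    return None
```

A caller standing in for the main control module would execute control only when `select_command` returns a text, and would otherwise discard the utterance.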

Implementation two:

the user logs in through the login module 90 and is marked through the marking module 100;

when the voice acquisition module 10 starts to acquire voice data of the target user, it sends a start notification to the video acquisition module 30; the video acquisition module 30 enters the data acquisition working state upon receiving the start notification.

When the voice acquisition module 10 does not acquire voice data of the target user, it sends a sleep notification to the video acquisition module 30 and continues voice data acquisition; the video acquisition module 30 enters the sleep preparation state upon receiving the sleep notification.

In this way, a segment of voice data and a segment of facial image data of the target user are obtained;

the extraction module 80 extracts segments from the complete voice data and the complete facial image data according to the same time starting point and the same time ending point; for example, if the complete data lasts 5 min, the voice segment and the facial image segment from the 30th to the 40th are extracted;

the voice analysis module 20 and the video analysis module 40 parse the voice segment and the facial image segment, respectively, to obtain a partial first text and a partial second text;

the judging module 50 compares the partial first text with the partial second text; when the similarity exceeds 90%, the complete second text is obtained through the voice analysis module 20, and the main control module 60 executes control according to the text information of the complete second text.
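The extraction step with a shared time starting point and ending point can be sketched as follows. This is a minimal illustration under stated assumptions: the voice data is taken to be a sequence of samples at a known sample rate, the facial image data a sequence of frames at a known frame rate, and the window is given in seconds (the source does not state the unit of its "30th to the 40th" example). The function name `extract_window` and all parameters are hypothetical.

```python
from typing import List, Sequence, Tuple


def extract_window(audio: Sequence[float], sample_rate: int,
                   frames: Sequence[object], fps: float,
                   start_s: float, end_s: float
                   ) -> Tuple[List[float], List[object]]:
    """Cut the same [start_s, end_s) time window out of both streams.

    Both streams are assumed to start at the same instant, so one pair of
    timestamps selects matching audio samples and video frames.
    """
    if not 0 <= start_s < end_s:
        raise ValueError("window must satisfy 0 <= start_s < end_s")
    # Convert the shared timestamps to an index range for each stream.
    audio_seg = list(audio[int(start_s * sample_rate):int(end_s * sample_rate)])
    frame_seg = list(frames[int(start_s * fps):int(end_s * fps)])
    return audio_seg, frame_seg
```

Because both segments are cut with the same timestamps, the partial texts parsed from them cover the same stretch of speech, which is what makes a segment-level comparison meaningful.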

Implementation three:

the user logs in through the login module 90 and is marked through the marking module 100;

when the voice acquisition module 10 starts to acquire voice data of the target user, it sends a start notification to the video acquisition module 30; the video acquisition module 30 enters the data acquisition working state upon receiving the start notification.

When the video acquisition module 30 does not acquire facial image data of the target user, it sends a sleep notification to the voice acquisition module 10; the voice acquisition module 10 enters the sleep preparation state upon receiving the sleep notification.

In summary, this implementation ensures that the voice interaction system as a whole enters the working state only when both the voice acquisition module 10 and the video acquisition module 30 can acquire data, which further avoids interference from voices outside the preset area.
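The start and sleep notifications exchanged by the two acquisition modules can be pictured as a small state machine. The sketch below is an assumption-laden illustration, not the patented implementation: the class names, the `poll` methods, the `IDLE` state and the boolean detection flags are all hypothetical, and only the two notifications and the two named states are taken from the description.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"  # hypothetical initial state, not named in the source
    ACQUIRING = "data acquisition working state"
    SLEEP_PREP = "sleep preparation state"


class AcquisitionModule:
    """Behaviour shared by both acquisition modules (hypothetical class)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state = State.IDLE
        self.peer = None  # the other acquisition module, set by make_pair()

    def on_start_notification(self) -> None:
        self.state = State.ACQUIRING

    def on_sleep_notification(self) -> None:
        self.state = State.SLEEP_PREP


class VoiceAcquisitionModule(AcquisitionModule):
    def poll(self, voice_detected: bool) -> None:
        if voice_detected:
            # Voice data is arriving: start working and wake the video side.
            self.state = State.ACQUIRING
            self.peer.on_start_notification()
        else:
            # No voice data: tell the video side to prepare to sleep, while
            # this module keeps its own state and goes on listening.
            self.peer.on_sleep_notification()


class VideoAcquisitionModule(AcquisitionModule):
    def poll(self, face_detected: bool) -> None:
        # Only a working video module reports that no face was captured.
        if self.state is State.ACQUIRING and not face_detected:
            self.peer.on_sleep_notification()


def make_pair():
    """Create the two modules and link them as each other's peer."""
    voice = VoiceAcquisitionModule("voice acquisition module 10")
    video = VideoAcquisitionModule("video acquisition module 30")
    voice.peer, video.peer = video, voice
    return voice, video
```

With this wiring, a voice heard while no face is in the preset area puts the voice side into the sleep preparation state, which matches the stated goal of ignoring speakers outside the camera's view.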

Embodiment two:

This embodiment provides a cloud computing all-in-one machine comprising the voice interaction system disclosed in embodiment one. In this embodiment the voice interaction system performs local interaction, so every module of the voice interaction system is installed in the case of the cloud computing all-in-one machine.

Embodiment three:

Referring to FIG. 2, this embodiment provides a cloud computing all-in-one machine comprising the voice interaction system disclosed in embodiment one. In this embodiment the voice interaction system performs remote interaction, so the modules of the voice interaction system are divided between a local terminal and a remote terminal, where:

the remote terminal is installed on a mobile electronic device and includes the login module 90, the marking module 100, the voice acquisition module 10, the camera 70, the video acquisition module 30 and the voice playing module 110.

The local terminal is installed in the case of the cloud computing all-in-one machine and includes the voice analysis module 20, the video analysis module 40, the judging module 50, the main control module 60 and the extraction module 80.

The remote terminal is connected to the modules in the local terminal over a wireless network.
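The split between the two terminals can be written down as a simple placement table, which makes it easy to see which data paths traverse the wireless network. The table below only restates the module lists given above; the constant names and the helper `crosses_wireless_link` are hypothetical.

```python
REMOTE = "remote terminal (mobile electronic device)"
LOCAL = "local terminal (all-in-one machine case)"

# Placement of each module in the remote-interaction embodiment.
PLACEMENT = {
    "login module 90": REMOTE,
    "marking module 100": REMOTE,
    "voice acquisition module 10": REMOTE,
    "camera 70": REMOTE,
    "video acquisition module 30": REMOTE,
    "voice playing module 110": REMOTE,
    "voice analysis module 20": LOCAL,
    "video analysis module 40": LOCAL,
    "judging module 50": LOCAL,
    "main control module 60": LOCAL,
    "extraction module 80": LOCAL,
}


def crosses_wireless_link(sender: str, receiver: str) -> bool:
    """True if data between the two modules must travel over the network."""
    return PLACEMENT[sender] != PLACEMENT[receiver]
```

For example, raw voice data passing from the voice acquisition module 10 to the extraction module 80 crosses the wireless link, while the hand-off from the judging module 50 to the main control module 60 stays inside the case.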

Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
