Intelligent voice interaction system and method

Document No.: 276380    Publication date: 2021-11-19

Note: This invention, "Intelligent voice interaction system and method" (一种智能语音交互系统及方法), was designed and created by 周继敏 and 张晓地 on 2021-09-30. Its main content is as follows: The invention discloses an intelligent voice interaction system and method. The system comprises a user terminal and an artificial voice terminal which are in communication connection and exchange voice information; an intelligent voice robot which exchanges voice information with the artificial voice terminal, performs user intention analysis on the received voice, generates a reply voice and transmits it to the artificial voice terminal; and a robot headset unit which monitors the artificial voice terminal, analyzes the voice A sent by the user terminal and encodes the audio stream of voice A into voice B, or receives the voice C sent by the intelligent voice robot and encodes the audio stream of voice C into voice D, the artificial voice terminal then sending voice D to the user terminal. The robot headset unit is in communication connection with the artificial voice terminal. The invention is decoupled from complicated interfaces, protocols, information security and processes, cuts in at the underlying technical layer, and realizes seamless docking of the intelligent voice robot on top of any CTI-based application.

1. An intelligent voice interaction system is characterized by comprising

a user terminal (1) and an artificial voice terminal (2) which are in communication connection and exchange voice information;

the intelligent voice robot (3) is used for realizing voice information interaction with the artificial voice terminal (2), and the intelligent voice robot (3) is used for analyzing the user intention of the received voice, generating reply voice and transmitting the reply voice to the artificial voice terminal (2);

the artificial voice terminal (2) is also provided with a robot headset unit (4) which is used for monitoring the artificial voice terminal (2), analyzing the voice A sent by the user terminal (1), and streaming and encapsulating the audio stream of the voice A into the voice B which can be received by the intelligent voice robot (3) through an audio coding technology; or receiving the voice C sent by the intelligent voice robot (3), and streaming and encapsulating the audio stream of the voice C into the voice D which can be received by the user terminal (1) through an audio coding technology; the artificial voice terminal (2) sends the voice D to the user terminal (1);

the robot headset unit is in communication connection with the artificial voice terminal (2).

2. The intelligent voice interaction system according to claim 1, characterized in that the robotic headset unit (4) comprises

The voice monitoring module (41) is used for monitoring whether the artificial voice terminal (2) has voice A input;

a first encoding module (42) for encoding the audio stream of the voice A monitored by the voice monitoring module (41) into a voice B;

the communication module (43) is used for realizing voice communication between the user terminal (1) and the intelligent voice robot (3);

the voice sending module (44) is used for sending the voice B to the intelligent voice robot (3) or sending the voice D to the user terminal (1);

the voice receiving module (45) is used for receiving the voice A monitored by the voice monitoring module (41) or the voice C fed back by the intelligent voice robot (3);

the second coding module (49) is used for coding the audio stream of the voice C fed back by the intelligent voice robot (3) into voice D;

the voice sending module (44) is used for sending the voice B to the intelligent voice robot (3) or sending the voice D to the user terminal (1).

3. The intelligent voice interaction system according to claim 2, characterized in that

The intelligent voice robot (3) comprises a voice receiving unit (31), a voice analysis unit (32), a communication unit (34), and a script library (33) which is index-associated with the voice analysis unit (32);

the voice receiving unit (31) is used for receiving the voice B sent by the artificial voice terminal (2);

the voice analysis unit (32) is used for analyzing the semantics of the voice B, retrieving the corresponding script from the script library (33) according to the semantics to form a voice C, and sending the voice C back to the voice receiving module (45) of the artificial voice terminal (2) through the communication unit (34).

4. An intelligent voice interaction system according to claim 2 or 3, characterised in that the robotic headset unit (4) further comprises

A calling module (46) and an alarm module (48);

the intelligent voice robot (3) also comprises a call response unit (35) which is arranged corresponding to the call module (46);

the calling module (46) is used for sending prompt information to the intelligent voice robot (3) when voice A is input, determining whether the intelligent voice robot (3) is on line and can receive voice, and when the intelligent voice robot (3) receives the prompt information and returns response information to the voice receiving module (45) through the calling response unit (35), the voice sending module (44) sends voice B to the intelligent voice robot (3); if the voice receiving module (45) does not receive the response information returned by the call response unit 35 within the set time, the alarm module (48) sends out an alarm to prompt an artificial agent.

5. The intelligent voice interaction system of any one of claims 1-3,

the artificial voice terminal (2) is also provided with a switch control unit (5) by which the artificial seat controls whether the robot headset unit (4) intervenes in the whole voice exchange process;

when the artificial seat closes the robot headset unit (4) through the switch control unit (5), the system is switched to an artificial voice mode, and the artificial seat holds the voice conversation with the user; when the artificial seat opens the robot headset unit (4) through the switch control unit (5), the system is switched to a robot voice mode, and the intelligent voice robot (3) is connected to the system to complete voice processing and conversation.

6. The intelligent voice interaction system of claim 5,

the robot headset unit (4) further comprises an artificial seat monitoring module (47), and when the robot headset unit (4) is in a closed state, the artificial seat monitoring module is used for monitoring whether an artificial voice terminal (2) of an artificial seat has artificial seat voice input in real time;

the intelligent voice robot (3) further comprises an artificial warning unit (37), and a forbidden word library (35) and a semantic violation word library (36) which are index-associated with the voice analysis unit (32);

when artificial seat voice is input, the artificial seat voice is output to the user terminal (1) through the input device and the communication module (43), and the artificial seat monitoring module (47) also sends the artificial seat voice to the intelligent voice robot (3); after the voice receiving unit (31) receives the artificial seat voice, the voice analysis unit (32) analyzes it, extracts keywords from the voice, and compares the keywords against the forbidden word library (35) and the semantic violation word library (36) to analyze whether improper terms, illegal terms or forbidden terms are present; if so, the artificial warning module (37) sends warning information to the artificial voice terminal (2).

7. The intelligent voice interaction system according to claim 5, wherein the intelligent voice robot (3) further comprises a dialog suggestion unit (38);

according to the result of the analysis of the artificial seat voice by the voice analysis unit (32), a suggested script is generated for the user intention obtained from the analysis and sent to the artificial voice terminal (2) through the dialog suggestion unit (38); or a suggested script is generated for improper/illegal/forbidden terms and sent to the artificial voice terminal (2) through the dialog suggestion unit (38).

8. The intelligent voice interaction system according to claim 3, wherein the communication module (43) and the communication unit (34) implement voice communication by using SIP communication technology.

9. The intelligent voice interaction system of claim 1, wherein the audio coding technique employs a PCMU 8K coding technique.

10. An intelligent voice interaction method using the intelligent voice interaction system according to any one of claims 1 to 9, comprising the steps of:

the artificial voice terminal receives a voice A input by a user terminal; a robot headset unit arranged on the artificial voice terminal streams and encapsulates the audio stream of the voice A into voice B which can be received by the intelligent voice robot through an audio coding technology, and transmits the voice B to the intelligent voice robot; the intelligent voice robot analyzes the received voice B, generates a reply voice C and transmits the reply voice C to the artificial voice terminal; and the robot headset unit streams and encapsulates the audio stream of the voice C into voice D which can be received by the user terminal through an audio coding technology, and sends the voice D to the user terminal.

11. The intelligent voice interaction method of claim 10, further comprising the step of

The robot headset unit encodes the voice A when it monitors that the voice A is input, and at the same time sends prompt information to the intelligent voice robot to determine whether the intelligent voice robot is online and able to receive voice; it sends the voice B to the intelligent voice robot after receiving the response information returned by the intelligent voice robot, and if the response information is not received within a set time limit, it sends out an alarm to prompt the artificial seat so that the seat can take over the call in time and check for faults.

12. The intelligent voice interaction method of claim 10, further comprising

The switch control unit is used by the artificial seat to control whether the robot headset unit intervenes in the whole voice exchange process;

if the switch control unit is set to off, the system is switched to the artificial voice mode, and the artificial seat holds the voice conversation with the user;

if the switch control unit is set to on, the system is switched to the robot voice mode, and the intelligent voice robot accesses the system to complete voice processing and conversation.

13. The intelligent voice interaction method of claim 12, wherein the method further comprises

When the switch control unit is off, the system also realizes the following steps:

the robot headset unit monitors whether an input device of an artificial seat inputs artificial seat voice in real time, if the artificial seat voice inputs, the artificial seat voice is output to a user terminal through the input device and a communication module, meanwhile, the artificial seat voice is sent to the intelligent voice robot, the intelligent voice robot analyzes the artificial seat voice, keywords in the voice are extracted, whether illegal terms, illegal terms and forbidden terms exist or not are analyzed through the keywords and compared with a forbidden word bank and a semantic illegal word bank respectively, if the intelligent voice robot exists, warning information is sent to the artificial voice terminal and information content is sent to the artificial voice terminal to be displayed, and the information content comprises but is not limited to use of the illegal/forbidden terms.

14. The intelligent voice interaction method of claim 13, further comprising the step of

According to the result of the artificial seat voice analysis, the intelligent voice robot generates a suggested script for the user intention obtained from the analysis and sends it to the artificial voice terminal for display; or it generates a suggested script for the improper/illegal/forbidden terms and sends it to the artificial voice terminal for display.

15. Computer device comprising a memory, a processor and a computer program stored in said memory and executable on said processor, characterized in that said processor implements the intelligent voice interaction method according to any of claims 10 to 14 when executing said computer program.

16. Computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the intelligent voice interaction method according to any one of claims 10 to 14.

Technical Field

The invention relates to the technical field of voice interaction, in particular to an intelligent voice interaction system and method.

Background

Computer Telephony Integration (CTI) is an application technology that integrates computer and telephone technologies, combining the two functions of telephone communication and computer information processing. It is the core technology of call center systems and of video/voice instant communication systems (such as WeChat video/voice, Tencent Meeting, Zoom, etc.).

Intelligent voice interaction is a very important branch in the field of artificial intelligence, with wide application and prospects in industries such as manual customer service and instant voice communication. The intelligent voice robot is a system that realizes natural speech dialogue between robot and human through artificial intelligence; it combines technologies such as intelligent speech recognition, natural language understanding, a dialogue interaction engine and intelligent speech synthesis with computer information technology, and the resulting application technology is the core technology for further intelligentization of call center systems.

As shown in fig. 1, in the field of traditional call/customer service/video/voice conference, a manual call with a user needs to be implemented on a manual terminal by means of a system based on CTI technology, such as a call center system, a voice instant messaging system, an IP phone, and the like.

In view of the above, it is desirable to provide an intelligent voice interaction system and method capable of avoiding the technical limitations caused by docking, modifying and upgrading old and new systems, and of docking voice instant calls and CTI systems to an intelligent voice robot in an 'imperceptible' manner.

Disclosure of Invention

In order to solve the technical problem, the technical scheme adopted by the invention is to provide an intelligent voice interaction system, which comprises

The user terminal and the artificial voice terminal are in communication connection and realize voice information interaction;

the intelligent voice robot performs user intention analysis on the received voice, generates reply voice and transmits the reply voice to the artificial voice terminal;

the artificial voice terminal is also provided with a robot headset unit which is used for monitoring the artificial voice terminal, analyzing the voice A sent by the user terminal, and streaming and encapsulating the audio stream of the voice A into the voice B which can be received by the intelligent voice robot through an audio coding technology; or receiving voice C sent by the intelligent voice robot, and streaming and encapsulating the audio stream of the voice C into voice D which can be received by the user terminal through an audio coding technology; the artificial voice terminal sends the voice D to the user terminal;

the robot headset unit is in communication connection with the artificial voice terminal.

In the above aspect, the robot headset unit includes

The voice monitoring module is used for monitoring whether the artificial voice terminal has voice A input;

the first coding module is used for coding the audio stream of the voice A monitored by the voice monitoring module into a voice B;

the communication module is used for realizing voice communication between the user terminal and the intelligent voice robot;

the voice sending module is used for sending the voice B to the intelligent voice robot or sending the voice D to the user terminal;

the voice receiving module is used for receiving the voice A monitored by the voice monitoring module or the voice C fed back by the intelligent voice robot;

the second coding module is used for coding the audio stream of the voice C fed back by the intelligent voice robot into a voice D;

the voice sending module is used for sending the voice B to the intelligent voice robot or sending the voice D to the user terminal.

In the above scheme, the intelligent voice robot comprises a voice receiving unit, a voice analysis unit, a communication unit, and a script library which is index-associated with the voice analysis unit;

the voice receiving unit is used for receiving a voice B sent by the artificial voice terminal;

the voice analysis unit is used for analyzing the semantics of the voice B and retrieving the corresponding script from the script library according to the semantics to form a voice C, and the voice C is sent back to the voice receiving module of the artificial voice terminal through the communication unit.

In the above scheme, the robot headset unit further comprises a calling module and an alarm module;

the intelligent voice robot also comprises a call response unit which is arranged corresponding to the call module;

the calling module is used for sending prompt information to the intelligent voice robot when voice A is input, determining whether the intelligent voice robot is on line and can receive voice, and when the intelligent voice robot receives the prompt information and then returns response information to the voice receiving module through the calling response unit, the voice sending module sends voice B to the intelligent voice robot; if the voice receiving module does not receive the response information returned by the call response unit 35 within the set time, the alarm module will send out an alarm to prompt the human agent.

In the above scheme, the artificial voice terminal is also provided with a switch control unit by which the artificial seat controls whether the robot headset unit intervenes in the whole voice exchange process;

when the artificial seat closes the robot headset unit through the switch control unit, the system is switched to the artificial voice mode, and the artificial seat holds the voice conversation with the user; when the artificial seat opens the robot headset unit through the switch control unit, the system is switched to the robot voice mode, and the intelligent voice robot accesses the system to complete voice processing and conversation.

In the above scheme, the robot headset unit further includes an artificial seat monitoring module, and when the robot headset unit is in a closed state, the artificial seat monitoring module is used for monitoring whether an artificial voice terminal of an artificial seat has artificial seat voice input in real time;

the intelligent voice robot also comprises an artificial warning unit, and a forbidden word library and a semantic violation word library which are index-associated with the voice analysis unit;

when artificial seat voice is input, the artificial seat voice is output to the user terminal through the input device and the communication module, and the artificial seat monitoring module also sends the artificial seat voice to the intelligent voice robot; after the voice receiving unit receives the artificial seat voice, the voice analysis unit analyzes it, extracts keywords from the voice, and compares the keywords against the forbidden word library and the semantic violation word library to analyze whether improper terms, illegal terms or forbidden terms are present; if so, the artificial warning module sends warning information to the artificial voice terminal.

In the above scheme, the intelligent voice robot further comprises a dialogue suggestion unit;

according to the result of the artificial seat voice analysis by the voice analysis unit, a suggested script is generated for the user intention obtained from the analysis and sent to the artificial voice terminal through the dialogue suggestion unit; or a suggested script is generated for improper/illegal/forbidden terms and sent to the artificial voice terminal through the dialogue suggestion unit.

In the above scheme, the communication module and the communication unit implement voice communication by using an SIP communication technology.

In the above scheme, the audio coding technique adopts a PCMU 8K coding technique.

The invention also provides an intelligent voice interaction method, which adopts the intelligent voice interaction system and comprises the following steps:

the artificial voice terminal receives a voice A input by a user terminal; a robot headset unit arranged on the artificial voice terminal streams and encapsulates the audio stream of the voice A into voice B which can be received by the intelligent voice robot through an audio coding technology, and transmits the voice B to the intelligent voice robot; the intelligent voice robot analyzes the received voice B, generates a reply voice C and transmits the reply voice C to the artificial voice terminal; and the robot headset unit streams and encapsulates the audio stream of the voice C into voice D which can be received by the user terminal through an audio coding technology, and sends the voice D to the user terminal.

The method further comprises the step that when the robot headset unit monitors that the voice A is input, it encodes the voice A and at the same time sends prompt information to the intelligent voice robot to determine whether the intelligent voice robot is online and able to receive voice; if the response information returned by the intelligent voice robot is received, the voice B is sent to the intelligent voice robot; if the response information is not received within a set time limit, the robot headset unit sends out an alarm to prompt the artificial seat, so that the artificial seat can take over the call in time and check for faults.

In the above method, the system also comprises a switch control unit used by the artificial seat to control whether the robot headset unit intervenes in the whole voice exchange process;

if the switch control unit is set to off, the system is switched to the artificial voice mode, and the artificial seat holds the voice conversation with the user;

if the switch control unit is set to on, the system is switched to the robot voice mode, and the intelligent voice robot accesses the system to complete voice processing and conversation.

In the above method, when the switch control unit is off, the system further implements the following steps:

the robot headset unit monitors in real time whether the input device of the artificial seat has artificial seat voice input; if artificial seat voice is input, it is output to the user terminal through the input device and the communication module, and at the same time the artificial seat voice is sent to the intelligent voice robot; the intelligent voice robot analyzes the artificial seat voice, extracts keywords from the voice, and compares the keywords against the forbidden word library and the semantic violation word library to analyze whether improper terms, illegal terms or forbidden terms are present; if so, warning information and the related information content are sent to the artificial voice terminal for display, the information content including but not limited to the improper/illegal/forbidden terms used.

According to the result of the artificial seat voice analysis, the intelligent voice robot generates a suggested script for the user intention obtained from the analysis and sends it to the artificial voice terminal for display; or it generates a suggested script for the improper/illegal/forbidden terms and sends it to the artificial voice terminal for display.

The invention also provides computer equipment which comprises a memory, a processor and a computer program which is stored in the memory and can run on the processor, wherein the processor realizes the intelligent voice interaction method when executing the computer program.

The invention also provides a computer readable storage medium, which stores a computer program, and the computer program realizes the intelligent voice interaction method when being executed by a processor.

The invention arranges the robot headset unit in the artificial voice terminal as a medium and accesses the intelligent voice robot into the whole voice exchange system, so that the user terminal communicates directly with the intelligent voice robot. The invention breaks away from the technical limitations of docking, transforming or upgrading old and new systems, is decoupled from complicated interfaces, protocols, information security and processes, cuts in at the underlying technical layer (the audio layer), and realizes seamless docking of the intelligent voice robot on top of any CTI-based application.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.

FIG. 1 is a schematic diagram of a conventional voice interchange system architecture provided in the present invention;

FIG. 2 is a schematic diagram of a voice interchange system provided in the present invention;

FIG. 3 is a schematic diagram of a voice interchange system according to the present invention;

FIG. 4 is a flow chart of an intelligent voice interaction method provided by the present invention;

FIG. 5 is a block diagram schematically illustrating the structure of a computer apparatus according to the present invention;

description of reference numerals:

1: a user terminal;

2: an artificial voice terminal; 21: an information receiving and displaying unit;

3: an intelligent voice robot; 31: a voice receiving unit; 32: a voice analysis unit; 33: a script library; 34: a communication unit; 35: a call response unit; 36: a semantic violation word library; 37: an artificial warning module; 38: a dialog suggestion unit;

4: a robot headset unit; 41: a voice monitoring module; 42: a first encoding module; 43: a communication module; 44: a voice sending module; 45: a voice receiving module; 46: a calling module; 47: an artificial seat monitoring module; 48: an alarm module; 49: a second encoding module;

5: and a switch control unit.

Detailed Description

The technical solutions of the present invention will be described clearly and completely with reference to the following embodiments, and it should be understood that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In the description of the present invention, it is to be understood that the terms "center", "longitudinal", "lateral", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", "counterclockwise", and the like, indicate orientations and positional relationships based on those shown in the drawings, and are used only for convenience of description and simplicity of description, and do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be considered as limiting the present invention.

Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, features defined as "first" or "second" may explicitly or implicitly include one or more of the described features. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise. Furthermore, the terms "mounted," "connected," and "coupled" are to be construed broadly and may, for example, mean fixedly connected, detachably connected, or integrally connected; mechanically or electrically connected; directly connected or indirectly connected through intervening media; or an internal communication between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art in specific cases.

The invention is described in detail below with reference to specific embodiments and the accompanying drawings.

As shown in fig. 2-3, the present invention provides an intelligent voice interaction system, comprising:

the user terminal 1 and the artificial voice terminal 2 are in communication connection and realize voice information interaction;

the intelligent voice robot 3 is in communication connection with the artificial voice terminal 2 to achieve voice information interaction, and the intelligent voice robot 3 performs user intention analysis on received voice and generates reply voice to be transmitted to the artificial voice terminal 2.

The system also comprises a robot headset unit 4 arranged on the artificial voice terminal 2 and used for monitoring the input device and output device of the artificial voice terminal 2, analyzing the voice A sent by the user terminal 1 and received on the output device, and streaming and encapsulating the audio stream of the voice A into the voice B which can be received by the intelligent voice robot 3 through an audio coding technology; or receiving the voice C sent by the intelligent voice robot 3, and streaming and encapsulating the audio stream of the voice C into the voice D which can be received by the user terminal 1 through an audio coding technology;

in this embodiment, the input device is a device for receiving and playing voice, such as an earphone, of the artificial voice terminal 2; the output device is a device used by the artificial voice terminal 2 for voice input, such as a microphone.

In this embodiment, the robot headset unit 4 monitors and reads the output device of the artificial voice terminal 2 in real time, acquires the audio stream of voice A, and encodes the audio stream of voice A through an audio coding technology into voice B that can be collected by the intelligent voice robot 3; the robot headset unit 4 then outputs the voice B continuously to the intelligent voice robot 3 as a voice call. The intelligent voice robot 3 analyzes the received voice B and generates a reply voice C which is transmitted to the artificial voice terminal 2; the robot headset unit 4 receives the voice C sent by the intelligent voice robot 3, encodes and converts the audio stream of the voice C into a voice D that can be received by the input device of the artificial voice terminal 2, and sends the voice D to the input device used for communicating with the user terminal 1, so that the user terminal 1 hears the voice D directly.

In the invention, the robot headset unit 4 is arranged in the artificial voice terminal 2 as a medium, and the intelligent voice robot 3 is accessed into the whole voice exchange system, so that the user terminal 1 communicates directly with the intelligent voice robot 3. The invention breaks away from the technical limitations of docking, transforming or upgrading old and new systems, is decoupled from complicated interfaces, protocols, information security and processes, cuts in at the underlying technical layer (the audio layer), and realizes seamless docking of the intelligent voice robot on top of any CTI-based application.

In this embodiment, preferably, in order to improve the voice call quality between the artificial voice terminal 2 and the intelligent voice robot 3, the PCMU 8K coding technology is adopted in the above audio coding technology.

In this embodiment, voice a and voice B, or voice C and voice D are the same sound, but are encoded differently to be compatible with different terminal receiving devices.
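For illustration only: the patent names PCMU 8K (ITU-T G.711 µ-law at an 8 kHz sampling rate) but gives no implementation, so the following Python sketch shows one way the re-encoding step could look using the standard-library audioop module (part of the standard library through Python 3.12). The function names and the assumption that the terminal delivers 16-bit linear PCM are ours, not the patent's.

```python
import audioop


def pcm16_to_pcmu(pcm16: bytes, src_rate: int, channels: int = 1) -> bytes:
    """Re-encode 16-bit linear PCM into 8 kHz G.711 u-law (PCMU).

    This mirrors the 'voice A -> voice B' / 'voice C -> voice D' step:
    the sound is unchanged, only the encoding differs so that the
    receiving side (robot or user terminal) can consume it.
    """
    if channels == 2:                      # fold a stereo capture down to mono
        pcm16 = audioop.tomono(pcm16, 2, 0.5, 0.5)
    if src_rate != 8000:                   # PCMU 8K means an 8 kHz sampling rate
        pcm16, _ = audioop.ratecv(pcm16, 2, 1, src_rate, 8000, None)
    return audioop.lin2ulaw(pcm16, 2)      # compand to 8-bit u-law


def pcmu_to_pcm16(pcmu: bytes) -> bytes:
    """Inverse step: expand PCMU back to 16-bit linear PCM for playback."""
    return audioop.ulaw2lin(pcmu, 2)
```

In a real deployment the resulting µ-law frames would typically be packetized (for example as RTP payload type 0) and carried inside the SIP call described later.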

Preferably, in this embodiment, the robot headset unit 4 includes a voice monitoring module 41, a first encoding module 42, a communication module 43, a voice sending module 44, a voice receiving module 45, and a second encoding module 49;

the voice monitoring module 41 is used for monitoring whether the artificial voice terminal 2 has voice a input;

the first encoding module 42 is configured to encode the audio stream of the speech a monitored by the speech monitoring module 41 into speech B;

the communication module 43 is used for realizing voice communication between the user terminal 1 and the intelligent voice robot 3; the communication module 43 uses the SIP (Session Initiation Protocol) communication technology to realize voice communication with the intelligent voice robot 3, and uses an Internet communication mode to realize voice communication with the user terminal 1;

the voice sending module 44 sends the voice B to the intelligent voice robot 3, or sends the voice D to the user terminal 1;

the voice receiving module 45 is configured to receive the voice a monitored by the voice monitoring module 41 or the voice C fed back by the intelligent voice robot 3;

the second coding module 49 is configured to code the audio stream of the voice C fed back by the intelligent voice robot 3 into a voice D;

the voice sending module 44 is configured to send the voice B to the intelligent voice robot 3 or send the voice D to the user terminal 1;

A user inputs voice A through the user terminal 1; the voice monitoring module 41 monitors the input of the voice A, the first encoding module 42 encodes the audio stream of the voice A into voice B, and the voice sending module 44 sends the voice B to the intelligent voice robot 3 through the communication module 43; the intelligent voice robot 3 analyzes the voice B, generates a reply script to form voice C and sends it to the voice receiving module 45; the second encoding module 49 encodes the audio stream of the voice C into voice D, which is sent to the user terminal 1 through the voice sending module 44. This module-level flow is sketched below.
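The module interplay described above can be summarized in a purely illustrative skeleton; the class name, callback signatures and the use of a queue for the temporary storage of voice B are assumptions made for this sketch, not details fixed by the patent.

```python
from dataclasses import dataclass, field
from queue import Queue
from typing import Callable


@dataclass
class RobotHeadsetUnit:
    """Illustrative skeleton of the robot headset unit (4); all names are hypothetical.

    encode_a_to_b / encode_c_to_d stand in for the first (42) and second (49)
    encoding modules; send_to_robot / send_to_user stand in for the voice
    sending module (44) acting through the communication module (43).
    """
    encode_a_to_b: Callable[[bytes], bytes]
    encode_c_to_d: Callable[[bytes], bytes]
    send_to_robot: Callable[[bytes], None]
    send_to_user: Callable[[bytes], None]
    pending_b: Queue = field(default_factory=Queue)  # buffered voice B (temporary storage)

    def on_voice_a(self, voice_a: bytes) -> None:
        """Voice monitoring module (41): user speech A detected on the terminal."""
        self.pending_b.put(self.encode_a_to_b(voice_a))

    def flush_to_robot(self) -> None:
        """Voice sending module (44): forward buffered voice B once the robot is reachable."""
        while not self.pending_b.empty():
            self.send_to_robot(self.pending_b.get())

    def on_voice_c(self, voice_c: bytes) -> None:
        """Voice receiving module (45) + second encoder (49): relay the robot's reply as voice D."""
        self.send_to_user(self.encode_c_to_d(voice_c))
```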

The intelligent voice robot 3 comprises a voice receiving unit 31, a voice analysis unit 32, a communication unit 34, and a script library 33 connected to the voice analysis unit 32.

The voice receiving unit 31 is used for receiving the voice B sent by the artificial voice terminal 2;

the voice analysis unit 32 is used for analyzing the semantics of the voice B, retrieving the corresponding script from the script library 33 according to the semantics to form a voice C, and sending the voice C back to the voice receiving module 45 of the artificial voice terminal 2 through the communication unit 34;

the communication unit 34 and the communication module 43 implement voice communication with each other using the SIP communication protocol.

It should be noted that, in this embodiment, the communication between the artificial voice terminal and the intelligent voice robot takes place between the communication module 43 built into the robot headset unit and the communication unit 34 of the intelligent voice robot over the SIP communication protocol; the communication between the artificial voice terminal and the user terminal is realized through the original communication module of the artificial voice terminal, which is not described herein. A minimal illustration of this SIP signalling is given below.
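The patent only names SIP as the signalling protocol and does not show the messages, so the following sketch hand-builds a minimal INVITE offering a PCMU/8000 audio stream over UDP. All addresses, ports and identifiers are placeholders; a real system would rely on a complete SIP stack handling registration, responses, ACK/BYE and the RTP media itself.

```python
import socket
import uuid


def send_sip_invite(local_ip: str, local_port: int,
                    robot_ip: str, robot_port: int = 5060) -> None:
    """Send a bare-bones SIP INVITE offering a PCMU/8000 audio stream."""
    call_id = uuid.uuid4().hex
    sdp = (
        "v=0\r\n"
        f"o=headset 0 0 IN IP4 {local_ip}\r\n"
        "s=robot-headset-call\r\n"
        f"c=IN IP4 {local_ip}\r\n"
        "t=0 0\r\n"
        "m=audio 40000 RTP/AVP 0\r\n"          # payload type 0 = PCMU/8000
        "a=rtpmap:0 PCMU/8000\r\n"
    )
    invite = (
        f"INVITE sip:robot@{robot_ip}:{robot_port} SIP/2.0\r\n"
        f"Via: SIP/2.0/UDP {local_ip}:{local_port};branch=z9hG4bK{call_id[:8]}\r\n"
        "Max-Forwards: 70\r\n"
        f"From: <sip:headset@{local_ip}>;tag={call_id[:6]}\r\n"
        f"To: <sip:robot@{robot_ip}>\r\n"
        f"Call-ID: {call_id}\r\n"
        "CSeq: 1 INVITE\r\n"
        f"Contact: <sip:headset@{local_ip}:{local_port}>\r\n"
        "Content-Type: application/sdp\r\n"
        f"Content-Length: {len(sdp)}\r\n\r\n"
        f"{sdp}"
    )
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((local_ip, local_port))
        sock.sendto(invite.encode("ascii"), (robot_ip, robot_port))
```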

In addition, for the voice analysis unit 32 in this embodiment, parsing the voice B and retrieving the corresponding script from the script library 33 according to its semantics to form voice C can be implemented with existing intelligent voice robot speech analysis technology: converting speech into text by real-time speech recognition (ASR), determining the intent of the text by natural language understanding (NLU), selecting the script corresponding to the intent from the script library (by a data processing/query technology such as a database or cache database), and converting the text into speech for playback by speech synthesis (TTS), as sketched below.
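A compact, hypothetical rendering of that ASR/NLU/script/TTS chain follows; the engine callables and the fallback script are stand-ins, since the patent leaves the concrete components open.

```python
from typing import Callable, Dict


def reply_to_voice_b(voice_b: bytes,
                     asr: Callable[[bytes], str],
                     nlu: Callable[[str], str],
                     script_library: Dict[str, str],
                     tts: Callable[[str], bytes]) -> bytes:
    """Illustrative ASR -> NLU -> script lookup -> TTS pipeline for the voice analysis unit (32)."""
    text = asr(voice_b)                           # speech B -> text
    intent = nlu(text)                            # text -> user intent
    script = script_library.get(intent,           # intent -> scripted reply
                                "Sorry, could you say that again?")
    return tts(script)                            # script text -> reply voice C
```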

The robot headset unit 4 further comprises a calling module 46 and an alarm module 48; the intelligent voice robot 3 further includes a call response unit 35 provided in correspondence with the call module 46.

The calling module 46 is configured to send prompt information to the intelligent voice robot 3 when the voice A is input, so as to determine whether the intelligent voice robot 3 is online and able to receive voice; when the intelligent voice robot 3 receives the prompt information and returns response information to the voice receiving module 45 through the call response unit 35, the voice sending module 44 sends the voice B to the intelligent voice robot 3. If the voice receiving module 45 does not receive the response information returned by the call response unit 35 within the set time, the alarm module 48 sends out an alarm to prompt the artificial seat, since the intelligent voice robot 3 may be in a fault state and unable to work normally, so that the artificial seat can come online and take over the call in time.

Preferably, the voice sending module 45 is provided with a voice temporary storage sub-module 451, which stores the encoded voice B in time order; because the voice sending module must first receive a response from the intelligent voice robot 3 before sending the voice B to it, the encoded voice B is temporarily stored, which ensures that the user's voice A can still be received in real time. The sketch below illustrates this handshake and buffering.
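The handshake and buffering just described might look roughly as follows; the callback names and the 2-second timeout are assumptions for this sketch only.

```python
from queue import Queue
from typing import Callable


def call_robot_and_send(voice_b_buffer: Queue,
                        send_prompt: Callable[[], None],
                        wait_for_response: Callable[[float], bool],
                        send_voice_b: Callable[[bytes], None],
                        raise_alarm: Callable[[str], None],
                        timeout_s: float = 2.0) -> None:
    """Sketch of the calling module (46) / call response unit (35) handshake.

    The patent only says that voice B is buffered until the robot answers and
    that an alarm is raised if no answer arrives within a set time.
    """
    send_prompt()                           # "are you online and able to receive voice?"
    if wait_for_response(timeout_s):        # answer from the call response unit (35)
        while not voice_b_buffer.empty():   # flush temporarily stored voice B in order
            send_voice_b(voice_b_buffer.get())
    else:
        raise_alarm("intelligent voice robot did not respond in time; "
                    "alert the artificial seat and check for faults")
```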

The artificial voice terminal 2 is also provided with a switch control unit 5 by which the artificial seat controls whether the robot headset unit 4 intervenes in the whole voice exchange process. When the artificial seat closes the robot headset unit 4 through the switch control unit 5, the system is switched to the artificial voice mode, and the artificial seat holds the voice conversation with the user; when the artificial seat opens the robot headset unit 4 through the switch control unit 5, the system is switched to the robot voice mode, and the intelligent voice robot 3 is connected to the system to complete voice processing and conversation. The artificial seat thus controls whether the robot headset unit 4 intervenes in the voice conversation and can switch freely between the artificial seat and the robot seat: with the robot headset unit 4 closed, the artificial voice mode handles complex-service conversations between the agent and the client; with the robot headset unit 4 opened, the robot voice mode lets the intelligent voice robot 3 carry the whole conversation for simple services.
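Functionally the switch control unit reduces to a toggle between two modes, which can be stated in a few lines; the enum and function below are illustrative only and not part of the patent.

```python
from enum import Enum


class VoiceMode(Enum):
    ARTIFICIAL = "artificial seat talks to the user"
    ROBOT = "intelligent voice robot talks to the user"


def select_mode(headset_unit_on: bool) -> VoiceMode:
    """Switch control unit (5): the on/off state of the robot headset unit (4)
    decides which side of the system holds the conversation."""
    return VoiceMode.ROBOT if headset_unit_on else VoiceMode.ARTIFICIAL
```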

In this embodiment, the artificial voice terminal 2 further includes an information receiving and displaying unit 21 for receiving and displaying various information or messages transmitted by the user terminal 1 or the intelligent voice robot 3.

The robot headset unit 4 further includes an artificial seat monitoring module 47. When the robot headset unit 4 is in the closed state, this module monitors in real time whether the input device (microphone) of the artificial seat has artificial seat voice input; if artificial seat voice is input, then while the artificial seat voice is output to the user terminal 1 through the input device and the communication module, the artificial seat monitoring module 47 also sends the artificial seat voice to the intelligent voice robot 3.

The intelligent voice robot 3 further comprises a forbidden word library 35 and a semantic violation word library 36 which are index-associated with the voice analysis unit 32, and an artificial warning unit 37. When the voice receiving unit 31 receives the artificial seat voice, the voice analysis unit 32 analyzes it, extracts keywords from the voice, and compares the keywords against the forbidden word library 35 and the semantic violation word library 36 to analyze whether inappropriate terms, illegal terms or forbidden terms are present; if so, the artificial warning module 37 sends warning information to the artificial voice terminal 2 and sends the information content to the information receiving and displaying unit 21, the information content including but not limited to the inappropriate/illegal/forbidden terms used. A simple illustration of this check is given below.
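One possible, deliberately simplified form of this keyword comparison is sketched below; a real system would likely combine exact matching with the semantic analysis the patent alludes to, and the lexicon contents are hypothetical.

```python
from typing import Iterable, List


def check_agent_speech(agent_text: str,
                       forbidden_terms: Iterable[str],
                       violation_terms: Iterable[str]) -> List[str]:
    """Sketch of the compliance check performed by the voice analysis unit (32).

    agent_text is assumed to be the ASR transcript of the artificial seat's
    speech; the two lexicons stand for the forbidden word library (35) and the
    semantic violation word library (36).
    """
    warnings = []
    for term in forbidden_terms:
        if term in agent_text:
            warnings.append(f"forbidden term used: {term!r}")
    for term in violation_terms:
        if term in agent_text:
            warnings.append(f"improper/illegal term used: {term!r}")
    return warnings   # pushed to the information receiving and displaying unit (21)
```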

In this embodiment, preferably, the intelligent voice robot 3 further includes a dialog suggestion unit 38, which, according to the result of the artificial seat voice analysis performed by the voice analysis unit 32, generates a suggested script for the user intention obtained from the analysis and sends it to the information receiving and displaying unit 21; or generates a suggested script for inappropriate/illegal/forbidden terms and sends it to the information receiving and displaying unit 21. The advantage is that the end user's speech (voice A) is transmitted to the intelligent voice robot 3 in real time (as voice B); after analyzing the user's intention, the intelligent voice robot 3 provides a dialog suggestion and pushes it to the artificial seat in real time, helping the artificial seat improve service capability and service quality; meanwhile, alarms and suggestions for improper, illegal or optimizable wording are given in real time, further improving the service quality of the artificial seat. A minimal sketch of the suggestion step follows.
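A minimal sketch of how such a suggestion could be chosen is given below, assuming a simple mapping from intents and compliance warnings to recommended scripts; the library contents and priority rule are assumptions.

```python
from typing import Dict, List, Optional


def suggest_script(intent: Optional[str],
                   warnings: List[str],
                   suggestion_library: Dict[str, str]) -> Optional[str]:
    """Sketch of the dialog suggestion unit (38).

    A compliance warning takes priority (the agent is told how to rephrase);
    otherwise an intent-specific script is pushed, if one exists.
    """
    if warnings:
        return suggestion_library.get(
            "compliance", "Please rephrase the last sentence without the flagged wording.")
    if intent is not None:
        return suggestion_library.get(intent)   # intent-specific recommended script
    return None
```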

In this embodiment, the voice technology and communication mode of the existing user terminal and artificial voice terminal do not need to be changed; the intelligent voice robot 3 is accessed into the system through the robot headset unit 4, cutting in at the audio layer and realizing seamless docking of the intelligent voice robot on any application based on CTI technology.

As shown in fig. 4, the present invention further provides an intelligent voice interaction method, which is implemented based on the intelligent voice interaction system provided in the foregoing embodiment, and includes the steps of:

s1, the artificial voice terminal receives the voice A input by the user terminal;

s2, the robot headset unit arranged on the artificial voice terminal fluidizes and encapsulates the audio stream of the voice A into a voice B which can be received by the intelligent voice robot through an audio coding technology, and transmits the voice B to the intelligent voice robot;

s3, the intelligent voice robot analyzes the received voice B, generates a reply voice C and transmits the reply voice C to the artificial voice terminal;

and S4, the robot headset unit streams and encapsulates the audio stream of the voice C into voice D which can be received by the user terminal through an audio coding technology, and sends the voice D to the user terminal.

The invention arranges the robot headset unit in the artificial voice terminal as a medium and accesses the intelligent voice robot into the whole voice exchange system, so that the user terminal communicates directly with the intelligent voice robot. The invention breaks away from the technical limitations of docking, transforming or upgrading old and new systems, is decoupled from complicated interfaces, protocols, information security and processes, cuts in at the underlying technical layer (the audio layer), and realizes seamless docking of the intelligent voice robot on top of any CTI-based application.

In this embodiment, preferably, in order to improve the voice communication quality between the artificial voice terminal and the intelligent voice robot, the audio coding technology adopts the PCMU 8K coding technology; the artificial voice terminal and the intelligent voice robot are both provided with communication modules, which realize voice communication using the SIP (Session Initiation Protocol) communication protocol.

In this embodiment, the intelligent voice robot parses the received voice B and retrieves the corresponding script from the script library according to the semantics to form a voice C, which is returned to the artificial voice terminal.

Preferably, in this embodiment, the robot headset unit further implements the following steps:

the robot headset unit encodes the voice A when it monitors that the voice A is input, and at the same time sends prompt information to the intelligent voice robot to determine whether the intelligent voice robot is online and able to receive voice; it sends the voice B to the intelligent voice robot after receiving the response information returned by the intelligent voice robot, and if the response information is not received within a set time limit, it sends out an alarm to prompt the artificial seat so that the seat can take over the call in time and check for faults.

Preferably, in this embodiment, the artificial seat controls, through the switch control unit, whether the robot headset unit intervenes in the whole voice exchange process: if the switch control unit is set to off, the system switches to the artificial voice mode and the artificial seat holds the voice conversation with the user; if the switch control unit is set to on, the system switches to the robot voice mode and the intelligent voice robot accesses the system to complete voice processing and conversation.

Preferably, in this embodiment, when the switch control unit is off, the system further implements the following steps:

the robot headset unit monitors in real time whether artificial seat voice is input into the input device (microphone) of the artificial seat; if artificial seat voice is input, it is output to the user terminal through the input device and the communication module, and at the same time the artificial seat voice is sent to the intelligent voice robot; the intelligent voice robot analyzes the artificial seat voice, extracts keywords from the voice, and compares the keywords against the forbidden word library and the semantic violation word library to analyze whether improper terms, illegal terms or forbidden terms are present; if so, warning information and the related information content are sent to the artificial voice terminal for display, the information content including but not limited to the improper/illegal/forbidden terms used.

Preferably, according to the result of analyzing the artificial seat voice, the intelligent voice robot generates a suggested script for the user intention obtained from the analysis and sends it to the artificial voice terminal for display; or it generates a suggested script for the improper/illegal/forbidden terms and sends it to the artificial voice terminal for display.

As shown in fig. 5, the present invention also provides a computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the intelligent voice interaction method of the above embodiments when executing the computer program; the invention further provides a computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the intelligent voice interaction method of the above embodiments.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

The embodiments in this specification are described in a progressive manner; the same and similar parts among the embodiments can be referred to each other, and each embodiment focuses on its differences from the other embodiments. In particular, since the apparatus and system embodiments are substantially similar to the method embodiments, their description is relatively brief, and reference may be made to the corresponding parts of the method embodiments. The above-described apparatus and system embodiments are merely illustrative; the units described as separate parts may or may not be physically separate, and the parts shown as units may or may not be physical units, i.e. they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement this without inventive effort.

It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The foregoing are merely exemplary embodiments of the present invention, which enable those skilled in the art to understand or practice the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
