Real-time talkback intervention and alarm platform based on AI technology

Document No.: 156108 | Published: 2021-10-26

Summary: This invention, a real-time talkback intervention and alarm platform based on AI technology (一种基于AI技术的实时对讲干预与告警平台), was devised by 谢建华, 张伟雄, 戴东旭, 蔡存忠 and 陈秋林 on 2021-01-27. The invention provides a real-time talkback intervention and alarm platform based on AI technology, comprising a communication server, a media resource control server and an AI voice training and recognition platform. The communication server provides the communication service, converts communication content into an audio media stream in real time and sends that stream to the media resource control server. The media resource control server converts the audio media stream into text content and sends the text content to the AI voice training and recognition platform. The AI voice training and recognition platform recognizes sensitive information in the text content, classifies the audio associated with the text content, sends alarm information to the communication server and thereby starts the intervention module in the communication server. The invention identifies risk information such as sensitive words, violent threats, cries for help and abnormal sounds across multiple service scenarios and starts the corresponding intervention action, so as to keep the conversation environment clean and to handle unexpected events promptly.

1. A real-time talkback intervention and alarm platform based on AI technology is characterized by comprising a communication server (1), a media resource control server (2) and an AI voice training and recognition platform (3), wherein,

the communication server (1) is electrically connected and in signal connection with the media resource control server (2), and the communication server (1) comprises an MRCP client (11), a user agent (12), a session communication component (13) and an intervention module (14), wherein the user agent (12) is used for accessing a plurality of user terminals, the session communication component (13) is used for acquiring communication contents of the user terminals in real time and converting the communication contents into audio media streams, and the MRCP client (11) is used for pulling the audio media streams in real time and sending the audio media streams to the media resource control server (2);

the media resource control server (2) is electrically connected and in signal connection with the AI voice training and recognition platform (3), and the media resource control server (2) is used for converting an audio media stream into text contents and sending the text contents to the AI voice training and recognition platform (3);

the AI voice training and recognition platform (3) is electrically connected and in signal connection with the communication server (1), and the AI voice training and recognition platform (3) comprises a voice recognition engine (31), a training module (32) and an alarm module (33), wherein the voice recognition engine (31) is used for receiving text contents of the media resource control server (2), the training module (32) is used for recognizing sensitive information related to the text contents through a voice recognition model and for classifying audio related to the text contents through a sound classification model, and the alarm module (33) is used for generating alarm information with message codes corresponding to classes and sending the alarm information to the communication server (1); an intervention module (14) in the communication server (1) starts corresponding intervention actions according to the alarm information with the message codes.

2. The AI-technology-based real-time intercom intervention and alarm platform as claimed in claim 1, wherein the media resource control server (2) comprises a master server (21) and a plurality of slave servers (22), the MRCP client (11) communicates with the master server (21), the master server (21) communicates with the plurality of slave servers (22) so that the MRCP client (11) sends the IP address and port number of the user terminal to the master server (21), and the master server (21) controls the idle slave servers (22) to establish a communication connection with the MRCP client (11).

3. The AI technology based real-time intercom intervention and alarm platform of claim 1, wherein the speech recognition engine (31) comprises a word segmentation module (311) and a semantic analysis module (312); the word segmentation module (311) is used for dividing the text content into a word vector set according to the word segmentation set and transmitting the word vector set to the semantic analysis module (312), and the semantic analysis module (312) is used for performing semantic analysis on the word vector set, preliminarily determining the classification category corresponding to the word vector set and transmitting the classification category to the training module (32).

4. The AI technology based real-time intercom intervention and alarm platform of claim 1, wherein the alarm module (33) comprises an encoder (331), an alarm information generating module (332) and an alarm information transmitting module (333); the training module (32) is connected with the encoder (331), the encoder (331) is connected with the alarm information generating module (332), the alarm information generating module (332) is connected with the alarm information transmitting module (333), and the alarm information transmitting module (333) is connected with the intervention module (14); the encoder (331) is used for receiving the sensitive information and the audio classification result of the training module (32), generating a corresponding message code and sending the message code to the alarm information generating module (332), the alarm information generating module (332) is used for generating alarm information with the message code after receiving the message code, and the alarm information transmitting module (333) is used for sending the alarm information with the message code to the intervention module (14).

5. The AI technology based real-time intercom intervention and alarm platform of claim 4, wherein the intervention module (14) comprises an alarm information receiving module (141), a decoder (142), an interruption intervention module (143), a reminding intervention module (144) and a keyword silencing module (145); the alarm information transmitting module (333) is connected with the alarm information receiving module (141), the alarm information receiving module (141) is connected with the decoder (142), and the decoder (142) is respectively connected with the interruption intervention module (143), the reminding intervention module (144) and the keyword silencing module (145); the alarm information receiving module (141) is used for receiving the alarm information with message codes from the alarm information transmitting module (333) and transmitting the alarm information to the decoder (142), the decoder (142) is used for analyzing the message codes and starting the interruption intervention module (143), the reminding intervention module (144) or the keyword silencing module (145) according to the message codes, the interruption intervention module (143) is used for cutting off the conversation of a user terminal, the reminding intervention module (144) is used for sending a text warning or inserting voice to the user terminal, and the keyword silencing module (145) is used for silencing sensitive words in the communication content of the user terminal.

6. The AI technology based real-time intercom intervention and alarm platform of claim 1, wherein the AI voice training and recognition platform (3) comprises a database module (34), the database module (34) being configured to store the sensitive word data set and the audio classification data set, and to provide the training module (32) with model training data of a training set and a testing set.

7. The AI technology based real-time intercom intervention and alarm platform of claim 1, characterized in that the communication server (1), the media resource control server (2) and the AI voice training and recognition platform (3) are connected by real-time media streaming.

8. The AI technology based real-time intercom intervention and alarm platform of claim 1, characterized in that the communication server (1) communicates with the media resource control server (2) through the SIP protocol.

9. The AI technology-based real-time intercom intervention and alarm platform of claim 1, wherein said speech recognition model and said sound classification model are deployed on a private CPU/GPU server.

10. The AI technology-based real-time intercom intervention and alarm platform of claim 1, wherein said speech recognition model and said sound classification model are the PaddlePaddle Fluid and Kaldi based speech recognition system DeepASR.

Technical Field

The invention belongs to the technical field of talkback intervention and alarm, and particularly relates to a real-time talkback intervention and alarm platform based on an AI technology.

Background

In daily life, communication is ubiquitous and takes many forms. In certain scenarios, even when an administrator monitors a call as it happens, it is difficult to detect sensitive information in the call in real time and to intervene.

Disclosure of Invention

To overcome the shortcomings of the prior art, the invention aims to provide a real-time talkback intervention and alarm platform based on AI technology.

To achieve this aim, the invention provides a real-time talkback intervention and alarm platform based on AI technology, which comprises a communication server, a media resource control server and an AI voice training and recognition platform. The communication server is electrically connected and in signal connection with the media resource control server and comprises an MRCP client, a user agent, a session communication component and an intervention module; the user agent is used for accessing a plurality of user terminals, the session communication component is used for acquiring communication contents of the user terminals in real time and converting them into audio media streams, and the MRCP client is used for pulling the audio media streams in real time and sending them to the media resource control server. The media resource control server is electrically connected and in signal connection with the AI voice training and recognition platform and is used for converting the audio media stream into text content and sending the text content to the AI voice training and recognition platform. The AI voice training and recognition platform is electrically connected and in signal connection with the communication server and comprises a voice recognition engine, a training module and an alarm module; the voice recognition engine is used for receiving the text content from the media resource control server, the training module is used for recognizing sensitive information in the text content through a voice recognition model and for classifying the audio associated with the text content through a sound classification model, and the alarm module is used for generating alarm information carrying a message code of the corresponding class and sending the alarm information to the communication server. The intervention module in the communication server starts the corresponding intervention action according to the alarm information carrying the message code.

With this technical scheme, when two user terminals are in a conversation, the conversation content is converted into an audio stream in real time and checked for sensitive words. When sensitive information is detected, the platform can also recognize what kind of sound the current audio is, or what state or scene it comes from, and sends alarm information of the corresponding category to the communication server so that intervention is timely. The platform can thus recognize risk information such as sensitive words, cries for help and abnormal sounds across multiple business scenarios and start the corresponding intervention action, keeping the conversation environment clean and handling unexpected events promptly.

As a further description of the AI technology based real-time intercom intervention and alarm platform of the present invention, preferably, the media resource control server includes a master server and a plurality of slave servers, the MRCP client communicates with the master server, the master server communicates with the plurality of slave servers, so that the MRCP client sends the IP address and the port number of the user terminal to the master server, and the master server controls the idle slave servers to establish a communication connection with the MRCP client.

As a further description of the real-time intercom intervention and alarm platform based on the AI technology, preferably, the speech recognition engine includes a word segmentation module and a semantic analysis module; the word segmentation module is used for dividing the text content into a word vector set according to the word segmentation set and transmitting the word vector set to the semantic analysis module, and the semantic analysis module is used for performing semantic analysis on the word vector set, preliminarily determining classification categories corresponding to the word vector set and transmitting the classification categories to the training module.

As a further description of the real-time intercom intervention and alarm platform based on the AI technology, preferably, the alarm module includes an encoder, an alarm information generating module and an alarm information transmitting module; the training module is connected with the encoder, the encoder is connected with the alarm information generating module, the alarm information generating module is connected with the alarm information transmitting module, and the alarm information transmitting module is connected with the intervention module; the encoder is used for receiving the sensitive information and the audio classification result of the training module, generating a corresponding message code and sending the message code to the alarm information generating module, the alarm information generating module is used for generating alarm information with the message code after receiving the message code, and the alarm information transmitting module is used for sending the alarm information with the message code to the intervention module.

As a further description of the AI technology based real-time intercom intervention and alarm platform of the present invention, preferably, the intervention module includes an alarm information receiving module, a decoder, an interruption intervention module, a reminding intervention module, and a keyword silencing module; the alarm information transmitting module is connected with the alarm information receiving module, the alarm information receiving module is connected with the decoder, and the decoder is respectively connected with the interruption intervention module, the reminding intervention module and the keyword silencing module; the alarm information receiving module is used for receiving the alarm information with message codes from the alarm information transmitting module and transmitting the alarm information to the decoder, the decoder is used for analyzing the message codes and starting the interruption intervention module, the reminding intervention module or the keyword silencing module according to the message codes, the interruption intervention module is used for cutting off the conversation of a user terminal, the reminding intervention module is used for sending a text warning or inserting voice to the user terminal, and the keyword silencing module is used for silencing sensitive words in the communication content of the user terminal.

As a further description of the real-time intercom intervention and alarm platform based on the AI technology described in the present invention, preferably, the AI voice training and recognition platform includes a database module, and the database module is configured to store the sensitive word data set and the audio classification data set, and provide model training data of a training set and a test set for the training module.

As a further description of the AI technology based real-time intercom intervention and alarm platform of the present invention, preferably, the communication server, the media resource control server and the AI voice training and recognition platform are connected by real-time media stream transmission.

As a further description of the AI technology based real-time intercom intervention and alarm platform of the present invention, preferably, the communication server communicates with the media resource control server through an SIP protocol.

As a further description of the AI technology based real-time intercom intervention and alarm platform of the present invention, preferably, the speech recognition model and the sound classification model are deployed on a private CPU/GPU server.

With this technical scheme, the models are used on an intranet or in an offline environment, which safeguards data privacy.

As a further description of the AI technology-based real-time intercom intervention and alarm platform of the present invention, preferably, the speech recognition model and the sound classification model are the PaddlePaddle Fluid and Kaldi based speech recognition system DeepASR.

With this technical scheme, DeepASR uses the Fluid framework to configure and train the acoustic model for speech recognition and integrates a Kaldi decoder, which enables fast, large-scale training of the acoustic model, while Kaldi handles the complex speech data preprocessing and the final decoding process.

The beneficial effects of the invention are as follows. The invention provides an intervention and alarm platform that supports real-time talkback and can be accessed by a plurality of different user terminals through the communication server. When two user terminals are in a conversation, the communication connections established among the communication server, the media resource control server and the AI voice training and recognition platform allow the conversation content to be converted into an audio stream in real time and checked for sensitive words. When sensitive information is detected, the platform can also recognize what kind of sound the current audio is, or what state or scene it comes from, and sends alarm information of the corresponding category to the communication server so that intervention is timely. The platform can thus recognize risk information such as sensitive words, cries for help and abnormal sounds across multiple service scenarios and start the corresponding intervention action, keeping the conversation environment clean and handling unexpected events promptly.

Drawings

Fig. 1 is a schematic structural diagram of the real-time intercom intervention and alarm platform based on the AI technology of the present invention.

Fig. 2 is a schematic structural diagram of a media resource control server according to the present invention.

Fig. 3 is a schematic diagram of the structure of the speech recognition engine of the present invention.

Fig. 4 is a schematic structural diagram of an alarm module and an intervention module of the present invention.

Detailed Description

To further understand the structure, characteristics and other objects of the present invention, the following detailed description is given with reference to the accompanying preferred embodiments, which are only used to illustrate the technical solutions of the present invention and are not to limit the present invention.

First, referring to fig. 1, fig. 1 is a schematic structural diagram of a real-time intercom intervention and alarm platform based on AI technology according to the present invention. The real-time talkback intervention and alarm platform based on the AI technology comprises a communication server 1, a media resource control server 2 and an AI voice training and recognition platform 3.

The communication server 1 is electrically connected and in signal connection with the media resource control server 2, the communication server is used for providing communication service, and the communication server 1 comprises an MRCP client 11, a user agent 12, a session communication component 13 and an intervention module 14; the user agent 12 is used for accessing a plurality of user terminals, the session communication component 13 is connected with the user agent 12, the session communication component 13 is used for acquiring communication contents of the user terminals in real time and converting the communication contents into audio media streams, the MRCP client 11 is connected with the session communication component 13, the MRCP client 11 is connected with the media resource control server 2, and the MRCP client 11 is used for pulling the audio media streams in real time and sending the audio media streams to the media resource control server 2.

The media resource control server 2 is electrically connected and in signal connection with the AI voice training and recognition platform 3, and the media resource control server 2 is used for converting an audio media stream into text contents and sending the text contents to the AI voice training and recognition platform 3. As shown in fig. 2, the media resource control server 2 includes a master server 21 and a plurality of slave servers 22, the MRCP client 11 communicates with the master server 21, the master server 21 communicates with the plurality of slave servers 22, so that the MRCP client 11 sends the IP address and the port number of the user terminal to the master server 21, and the master server 21 controls the idle slave servers 22 to establish a communication connection with the MRCP client 11.
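The master/slave hand-off described above can be pictured with a minimal Python sketch. It assumes a first-idle selection rule and in-memory bookkeeping; the class names, the `assign` and `release` methods and the sample addresses are hypothetical and do not come from the patent, which does not say how the master server picks among idle slave servers.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SlaveServer:
    """Hypothetical stand-in for one slave server 22."""
    name: str
    busy: bool = False
    peer: Optional[Tuple[str, int]] = None  # (ip, port) reported by the MRCP client


@dataclass
class MasterServer:
    """Hypothetical stand-in for the master server 21."""
    slaves: List[SlaveServer] = field(default_factory=list)

    def assign(self, ip: str, port: int) -> Optional[SlaveServer]:
        # Assumed policy: give the request to the first idle slave.
        for slave in self.slaves:
            if not slave.busy:
                slave.busy = True
                slave.peer = (ip, port)
                return slave
        return None  # every slave is busy

    def release(self, slave: SlaveServer) -> None:
        # Mark the slave idle again once its session ends.
        slave.busy = False
        slave.peer = None


master = MasterServer([SlaveServer("slave-a"), SlaveServer("slave-b")])
first = master.assign("10.0.0.5", 5060)
second = master.assign("10.0.0.6", 5062)
third = master.assign("10.0.0.7", 5064)  # no idle slave left
```

In a real deployment the selected slave would then open the connection to the MRCP client; the sketch only covers the selection step.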

The AI voice training and recognition platform 3 is electrically connected and in signal connection with the communication server 1, and the AI voice training and recognition platform 3 comprises a voice recognition engine 31, a training module 32 and an alarm module 33; the media resource control server 2 is connected to the speech recognition engine 31, the speech recognition engine 31 is configured to receive text content of the media resource control server 2, and the speech recognition engine 31 is connected to the training module 32. As shown in fig. 3, the speech recognition engine 31 includes a word segmentation module 311 and a semantic analysis module 312; the word segmentation module 311 is configured to divide the text content into a word vector set according to the word segmentation set and transmit the word vector set to the semantic analysis module 312, and the semantic analysis module 312 is configured to perform semantic analysis on the word vector set, preliminarily determine a classification category corresponding to the word vector set, and transmit the classification category to the training module 32. 
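As an illustration of the word segmentation and preliminary classification steps, the following Python sketch uses forward maximum matching against a small lexicon and a keyword lookup. Both techniques are stand-ins chosen for brevity: the patent does not name a segmentation algorithm, and the sketch produces plain tokens rather than word vectors. `LEXICON`, `CATEGORY_HINTS` and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set

# Hypothetical word segmentation set and category hints.
LEXICON: Set[str] = {"救命", "有人", "抢劫", "你好"}
CATEGORY_HINTS = {"救命": "distress", "抢劫": "violence"}


def segment(text: str, lexicon: Set[str]) -> List[str]:
    """Forward maximum matching: take the longest lexicon entry at each position."""
    max_len = max((len(word) for word in lexicon), default=1)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when nothing in the lexicon matches.
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


def preliminary_category(tokens: Iterable[str]) -> str:
    """Return the most frequent hinted category, or "normal" when no hint fires."""
    hits = Counter(CATEGORY_HINTS[t] for t in tokens if t in CATEGORY_HINTS)
    return hits.most_common(1)[0][0] if hits else "normal"
```

For example, `segment("救命有人抢劫", LEXICON)` splits the text into the three lexicon words, and `preliminary_category` then maps the tokens to a coarse category that a downstream model could refine.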
The training module 32 comprises a voice recognition model and a sound classification model, the training module 32 is used for recognizing sensitive information related to text content through the voice recognition model and recognizing audio related to the text content through the sound classification model for classification, the training module 32 is connected with the alarm module 33, the alarm module 33 is connected with the intervention module 14, and the alarm module 33 is used for generating alarm information with message codes of corresponding classes and sending the alarm information to the communication server 1 when the training module 32 detects that the sensitive information exists; the intervention module 14 in the communication server 1 initiates a corresponding intervention action according to the alarm information with the message code. As shown in fig. 4, the alarm module 33 includes an encoder 331, an alarm information generating module 332, and an alarm information transmitting module 333; the training module 32 is connected with the encoder 331, the encoder 331 is connected with the alarm information generating module 332, the alarm information generating module 332 is connected with the alarm information transmitting module 333, and the alarm information transmitting module 333 is connected with the intervention module 14; the encoder 331 is configured to receive the sensitive information and the audio classification result of the training module 32, generate a corresponding message code, and send the message code to the warning information generating module 332, the warning information generating module 332 is configured to generate warning information with the message code after receiving the message code, and the warning information transmitting module 333 is configured to send the warning information with the message code to the intervention module 14. 
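One way to picture the encoder and the alarm information generating module is the Python sketch below. The numeric codes, category names and JSON layout are assumptions made for illustration; the patent only states that each class has a corresponding message code and does not define the codes or the wire format.

```python
import json
from enum import IntEnum


class MessageCode(IntEnum):
    """Hypothetical message codes, one per risk class."""
    SENSITIVE_WORD = 1
    VIOLENT_THREAT = 2
    DISTRESS_CALL = 3
    ABNORMAL_SOUND = 4


# Assumed mapping from the training module's class label to a message code.
CATEGORY_TO_CODE = {
    "sensitive_word": MessageCode.SENSITIVE_WORD,
    "violence": MessageCode.VIOLENT_THREAT,
    "distress": MessageCode.DISTRESS_CALL,
    "abnormal_sound": MessageCode.ABNORMAL_SOUND,
}


def encode(category: str) -> MessageCode:
    """Encoder step: turn a classification result into a message code."""
    return CATEGORY_TO_CODE[category]


def build_alarm(category: str, session_id: str, detail: str) -> str:
    """Generation step: wrap the message code in an alarm message (JSON here)."""
    return json.dumps({
        "code": int(encode(category)),
        "category": category,
        "session": session_id,
        "detail": detail,
    })
```

A call such as `build_alarm("distress", "call-42", "help")` yields a string that the transmitting module could send to the intervention module over whatever channel the deployment uses.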
The intervention module 14 comprises an alarm information receiving module 141, a decoder 142, an interruption intervention module 143, a reminding intervention module 144 and a keyword silencing module 145; the alarm information transmitting module 333 is connected with the alarm information receiving module 141, the alarm information receiving module 141 is connected with the decoder 142, and the decoder 142 is respectively connected with the interruption intervention module 143, the reminding intervention module 144 and the keyword silencing module 145; the alarm information receiving module 141 is configured to receive alarm information with a message code from the alarm information transmitting module 333 and send the alarm information to the decoder 142, the decoder 142 is configured to analyze the message code and start the interruption intervention module 143, the reminding intervention module 144, or the keyword silencing module 145 according to the message code, the interruption intervention module 143 is configured to cut off a call of the user terminal, the reminding intervention module 144 is configured to send a text warning or insert a voice to the user terminal, and the keyword silencing module 145 is configured to perform silencing processing on a sensitive word in communication content of the user terminal. Therefore, the intervention actions initiated by the intervention module 14 include cutting off the call, issuing a warning, inserting voice, and silencing sensitive words in the communication content of the user terminal. The encoder 331 in the alarm module 33 is matched with the decoder 142 in the intervention module 14, and the alarm information transmitting module 333 in the alarm module 33 is matched with the alarm information receiving module 141 in the intervention module 14, so as to ensure the correct transmission of the alarm information and the accuracy of encoding and decoding, and to improve security. 
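The decoder's dispatch to the three intervention modules might look like the following Python sketch. The message codes, the JSON alarm layout and the `Call` class are hypothetical, and silencing is shown on a text transcript as a stand-in for muting the matching span of audio.

```python
import json
import re
from typing import Callable, Dict, List


class Call:
    """Hypothetical stand-in for one ongoing session on the communication server."""

    def __init__(self, transcript: str) -> None:
        self.transcript = transcript
        self.active = True
        self.notices: List[str] = []


def interrupt(call: Call, alarm: dict) -> None:
    # Interruption intervention: cut off the call.
    call.active = False


def remind(call: Call, alarm: dict) -> None:
    # Reminding intervention: queue a text warning for the user terminal.
    call.notices.append("warning: " + alarm["detail"])


def silence(call: Call, alarm: dict) -> None:
    # Keyword silencing: mask the sensitive word (text stand-in for muting audio).
    pattern = re.escape(alarm["detail"])
    call.transcript = re.sub(pattern, lambda m: "*" * len(m.group()), call.transcript)


# Assumed code-to-action table; the patent does not define the codes.
HANDLERS: Dict[int, Callable[[Call, dict], None]] = {1: silence, 2: interrupt, 3: remind}


def intervene(call: Call, alarm_json: str) -> None:
    """Decoder step: parse the message code and start the matching module."""
    alarm = json.loads(alarm_json)
    handler = HANDLERS.get(alarm["code"])
    if handler is None:
        raise ValueError("unknown message code: %r" % alarm["code"])
    handler(call, alarm)
```

Keeping the table in one place mirrors the requirement that the encoder and decoder be matched: both sides must agree on what each code means.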
When the AI voice training and recognition platform detects sensitive information, it can recognize what kind of sound the current audio is, or what state or scene it comes from, and sends alarm information of the corresponding category to the communication server for timely intervention. The platform can thus recognize risk information such as sensitive words, cries for help and abnormal sounds across multiple service scenarios and start the corresponding intervention action, keeping the conversation environment clean and handling unexpected events promptly.

Preferably, the AI speech training and recognition platform 3 further includes a database module 34, the database module 34 is connected to the training module 32, and the database module 34 is configured to store the sensitive word data set and the audio classification data set, and provide the training module 32 with model training data of a training set and a testing set. The AI speech training and recognition platform 3 is provided with a real-time speech transcription interface that connects over the WebSocket protocol, so that the audio stream is recognized as text in real time and the recognition result is returned while the audio is still being uploaded. The speech recognition model and the sound classification model are the PaddlePaddle Fluid and Kaldi based speech recognition system DeepASR. DeepASR uses the Fluid framework to configure and train the acoustic model for speech recognition and integrates a Kaldi decoder, which enables fast, large-scale training of the acoustic model, and it uses the Kaldi decoder to complete the complex speech data preprocessing and the final decoding process. The trained voice recognition model and the trained sound classification model are deployed on a private CPU/GPU server, and the models are used on an intranet or in an offline environment, which safeguards data privacy. The models may also be published as an API and used by calling that API.
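How the database module 34 might hand the training module a training set and a testing set can be sketched in Python as a seeded shuffle followed by a split. The 80/20 ratio, the seed and the `split_dataset` name are assumptions; the patent does not specify how the two sets are drawn.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(
    samples: Sequence[T], test_ratio: float = 0.2, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle reproducibly, then cut into (training set, testing set)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    # Keep at least one sample for testing when the data set is tiny.
    n_test = max(1, int(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical labelled audio clips: (clip id, class label).
samples = [("clip-%d" % i, "distress" if i % 2 else "normal") for i in range(10)]
train, test = split_dataset(samples)
```

A fixed seed keeps the split stable between runs, so a model evaluated on the testing set is never scored on clips it was trained on.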

Preferably, the communication server 1, the media resource control server 2 and the AI voice training and recognition platform 3 are connected through real-time media streaming. The communication server 1 communicates with the media resource control server 2 by the SIP protocol. The MRCP client 11 includes an SIP protocol stack and an MRCP protocol stack, where the MRCP protocol stack of the MRCP client 11 is used to call an API interface of the media resource control server 2, and the API interface creates an SIP dialog through the SIP protocol stack of the MRCP client 11 and carries information of the media resource control server 2; the SIP protocol stack of the MRCP client 11 is used to initialize a media session to the media resource control server 2 through RTP, and create a control session to the media resource control server 2 through the MRCP protocol stack of the MRCP client 11. The media resource control server 2 also comprises an MRCP protocol stack and an SIP protocol stack, and the media resource control server 2 comprises various media resources such as speech recognition, speech synthesis, speech recording, speaker verification, voiceprint matching.

The real-time talkback intervention and alarm platform based on the AI technology can be applied to live-broadcast audio. The live-broadcast equipment is connected to the user agent 12 of the communication server 1, the communication server 1 sends the live-room audio to the media resource control server 2, the media resource control server 2 converts the audio into text and sends the text to the AI voice training and recognition platform 3, and the AI voice training and recognition platform 3 detects whether the text of the live-room audio contains sensitive words. When it does, the platform sends an alarm message to the communication server 1, which either silences the sensitive words or cuts off the live broadcast and sends a warning to the live room. This saves the cost of manual supervision, ensures the content safety of the live room and keeps the network environment clean. The platform can also be applied to recognizing conversation content and intervening promptly when unexpected events occur, and to public places such as schools and banks, where the AI voice training and recognition platform 3 recognizes the audio content so that unexpected events are handled promptly.

It should be noted that the above summary and the detailed description are intended to demonstrate the practical application of the technical solutions provided by the present invention, and should not be construed as limiting the scope of the present invention. Various modifications, equivalent substitutions, or improvements may be made by those skilled in the art within the spirit and principles of the invention. The scope of the invention is to be determined by the appended claims.
