Audio and video conversation method, device, equipment and storage medium based on artificial intelligence

Document No.: 1893500; publication date: 2021-11-26

Reading note: this technology, "Audio and video conversation method, device, equipment and storage medium based on artificial intelligence" (基于人工智能的音视频对话方法、装置、设备及存储介质), was designed by Huang Liangbin (黄良斌) on 2021-08-31. Abstract: The application relates to the technical field of artificial intelligence and discloses an artificial-intelligence-based audio and video conversation method, device, equipment and storage medium. The method comprises: starting a browser through a first chat client in response to a conversation start request to obtain a target browser; calling the target browser to load a first meeting room page according to a target conversation invitation link; sending, through the first meeting room page, first audio and video data input by a user to a conference server, the conference server being used to send the first audio and video data to a second meeting room page of a target chat client and to update the first audio and video data into an audio and video conversation database; and acquiring, through the first meeting room page, second audio and video data corresponding to the second meeting room page from the conference server for display. Because the chat client calls a browser to load a meeting room page that communicates with the conference server to carry out the audio and video conversation, and the audio and video data are sent to the conference server, the enterprise can archive the audio and video data.

1. An artificial-intelligence-based audio and video conversation method, characterized in that the method comprises:

obtaining a conversation start request through a first chat client, wherein the conversation start request carries a target conversation invitation link;

starting a browser through the first chat client in response to the conversation start request, to obtain a target browser;

calling the target browser and loading a meeting room page according to the target conversation invitation link, to obtain a first meeting room page;

acquiring, through the first meeting room page, first audio and video data input by a user, and sending the first audio and video data to a conference server, wherein the conference server is used for sending the first audio and video data to a second meeting room page of a target chat client and for updating the first audio and video data into an audio and video conversation database; and

acquiring, through the first meeting room page, second audio and video data corresponding to the second meeting room page from the conference server, and displaying the second audio and video data.

2. The artificial-intelligence-based audio and video conversation method according to claim 1, wherein before the step of obtaining the conversation start request through the first chat client, the method further comprises:

calling a conference terminal through a second chat client to acquire a conversation reservation request, wherein the conversation reservation request carries conversation configuration data;

generating, through the conference terminal, a meeting room reservation creation request according to the conversation configuration data, and sending the meeting room reservation creation request to the conference server;

acquiring, through the conference terminal, the target conversation invitation link sent by the conference server according to the conversation configuration data; and

sending, through the conference terminal, the target conversation invitation link to a message dialog box of the second chat client, wherein the target conversation invitation link is to be sent, through the second chat client, to the message dialog box corresponding to the target chat client.

3. The artificial-intelligence-based audio and video conversation method according to claim 2, wherein after the step of generating, through the conference terminal, a meeting room reservation creation request according to the conversation configuration data and sending the meeting room reservation creation request to the conference server, the method further comprises:

acquiring, through the conference terminal, a meeting room reservation record sent by the conference server, and updating a meeting room reservation record table according to the meeting room reservation record;

acquiring, through the conference terminal, a meeting room start request according to the meeting room reservation record table, wherein the meeting room start request carries a target meeting room identifier that is the same as the meeting room identifier in the target conversation invitation link;

sending, through the conference terminal, the meeting room start request to the conference server, wherein the conference server starts the meeting room according to the target meeting room identifier in the meeting room start request to obtain a target meeting room;

loading, through the conference terminal, a page corresponding to the target meeting room to obtain a meeting room page corresponding to the conference terminal;

and the step of calling the target browser and loading a meeting room page according to the target conversation invitation link to obtain a first meeting room page comprises:

calling the target browser and loading the page corresponding to the target meeting room according to the target conversation invitation link, to obtain the first meeting room page.

4. The artificial-intelligence-based audio and video conversation method according to claim 1, wherein the step of calling the target browser and loading a meeting room page according to the target conversation invitation link to obtain a first meeting room page comprises:

calling the target browser to generate a resource acquisition request according to a meeting room link address in the target conversation invitation link, and sending the resource acquisition request to the conference server;

calling the target browser to acquire meeting room client resources sent by the conference server according to the resource acquisition request;

calling the target browser to load a meeting room client plug-in according to the meeting room client resources; and

calling the target browser to execute the meeting room client plug-in and load a meeting room page according to the meeting room identifier and the meeting room password in the target conversation invitation link, to obtain the first meeting room page.

5. The artificial-intelligence-based audio and video conversation method according to claim 4, wherein the step of acquiring, through the first meeting room page, first audio and video data input by a user and sending the first audio and video data to a conference server comprises:

acquiring, through the first meeting room page, to-be-processed audio and video data input by the user;

calling the meeting room client plug-in through the first meeting room page to perform denoising and echo cancellation on the to-be-processed audio and video data, to obtain to-be-encoded audio and video data;

calling the meeting room client plug-in through the first meeting room page to acquire a preset coding mode determination rule;

calling the meeting room client plug-in through the first meeting room page to determine a coding mode according to the to-be-encoded audio and video data and the preset coding mode determination rule, to obtain a target coding mode;

calling the meeting room client plug-in through the first meeting room page to encode the to-be-encoded audio and video data according to the coding rule of the target coding mode, to obtain the first audio and video data; and

calling the meeting room client plug-in through the first meeting room page to send the first audio and video data to the conference server according to the transmission channel identifier of the target coding mode.
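Claim 5 leaves the "preset coding mode determination rule" unspecified. A minimal Python sketch, assuming the rule is an ordered list of predicates over the to-be-encoded data, each bound to a coding mode that carries its own transmission channel identifier. The mode names, channel identifiers, and the 720p threshold are hypothetical, and the actual encoding step is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class MediaChunk:
    """To-be-encoded audio/video data after denoising and echo cancellation (hypothetical shape)."""
    has_video: bool
    width: int = 0
    height: int = 0


@dataclass(frozen=True)
class CodingMode:
    name: str        # hypothetical coding profile name
    channel_id: str  # transmission channel identifier bound to this mode


# Hypothetical "preset coding mode determination rule": ordered (predicate, mode)
# pairs evaluated top to bottom; the first matching predicate wins.
Rule = List[Tuple[Callable[[MediaChunk], bool], CodingMode]]

PRESET_RULE: Rule = [
    (lambda c: not c.has_video, CodingMode("audio-only", "ch-audio")),
    (lambda c: c.width * c.height >= 1280 * 720, CodingMode("video-hd", "ch-video-hd")),
    (lambda c: True, CodingMode("video-sd", "ch-video-sd")),
]


def determine_coding_mode(chunk: MediaChunk, rule: Rule) -> Optional[CodingMode]:
    """Return the target coding mode for the chunk, or None if no predicate matches."""
    for predicate, mode in rule:
        if predicate(chunk):
            return mode
    return None


def route(chunk: MediaChunk, rule: Rule) -> str:
    """Pick the transmission channel the encoded chunk would be sent on."""
    mode = determine_coding_mode(chunk, rule)
    if mode is None:
        raise ValueError("no coding mode matches this chunk")
    return mode.channel_id
```

Keeping the rule as data rather than code means it can be replaced without changing the plug-in logic that applies it.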

6. The artificial-intelligence-based audio and video conversation method according to claim 1, wherein after the step of calling the target browser and loading a meeting room page according to the target conversation invitation link to obtain a first meeting room page, the method further comprises:

acquiring a screen sharing request through the first meeting room page, sending the screen sharing request to the conference server, and acquiring a screen sharing start signal sent by the conference server according to the screen sharing request;

acquiring, through the first meeting room page, preset screen capture configuration data in response to the screen sharing start signal;

calling, through the first meeting room page, a screen capture module of the meeting room client plug-in to capture the screen according to the preset screen capture configuration data, to obtain an i-th screen capture image;

calling the meeting room client plug-in to acquire the (i-1)-th screen capture image;

calling an image processing module of the meeting room client plug-in to compute image difference data from the i-th screen capture image and the (i-1)-th screen capture image, to obtain to-be-encrypted image difference data;

calling a symmetric encryption module of the meeting room client plug-in to symmetrically encrypt the to-be-encrypted image difference data, to obtain i-th image difference data and a symmetric encryption key;

calling an asymmetric encryption module of the meeting room client plug-in to asymmetrically encrypt the symmetric encryption key, to obtain a target encryption key; and

sending the i-th image difference data and the target encryption key to the conference server through the first meeting room page.
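Claim 6 combines frame differencing with hybrid (symmetric plus asymmetric) encryption. The Python sketch below shows only the data flow, under stated assumptions: captures are equal-length raw byte buffers compared in fixed-size blocks; the symmetric cipher is a toy SHA-256 counter-mode keystream and the asymmetric step is textbook RSA over two Mersenne primes. Both primitives are stand-ins chosen so the block runs on the Python standard library alone and are not secure; a real plug-in would use vetted primitives such as AES-GCM and RSA-OAEP. All function names are hypothetical, and it is assumed that the conference server holds the private key.

```python
import hashlib
import json
import secrets
from typing import Dict, Tuple

BLOCK = 4  # bytes per comparison block; tiny so the example stays readable

# Toy RSA key pair built from the Mersenne primes 2**127 - 1 and 2**89 - 1.
# NOT secure (textbook RSA, no padding); assumed to belong to the conference server.
_P, _Q = 2**127 - 1, 2**89 - 1
N, E = _P * _Q, 65537
_D = pow(E, -1, (_P - 1) * (_Q - 1))  # private exponent (Python 3.8+)


def image_diff(prev: bytes, curr: bytes, block: int = BLOCK) -> Dict[int, bytes]:
    """Offsets and contents of the blocks in the i-th capture that differ from the (i-1)-th."""
    if len(prev) != len(curr):
        raise ValueError("captures must have the same size")
    return {
        off: curr[off:off + block]
        for off in range(0, len(curr), block)
        if curr[off:off + block] != prev[off:off + block]
    }


def apply_diff(prev: bytes, diff: Dict[int, bytes]) -> bytes:
    """Rebuild the i-th capture from the (i-1)-th capture and the difference data."""
    buf = bytearray(prev)
    for off, chunk in diff.items():
        buf[off:off + len(chunk)] = chunk
    return bytes(buf)


def _keystream(key: bytes, length: int) -> bytes:
    """SHA-256 in counter mode as a toy stream-cipher keystream."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Symmetric step: XOR with the keystream (the same call encrypts and decrypts)."""
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))


def package_frame(prev: bytes, curr: bytes) -> Tuple[bytes, int]:
    """Client side: diff, encrypt with a fresh symmetric key, wrap the key with RSA."""
    diff = image_diff(prev, curr)
    payload = json.dumps({str(off): chunk.hex() for off, chunk in diff.items()}).encode()
    key = secrets.token_bytes(16)  # new key per frame, so a keystream is never reused
    wrapped_key = pow(int.from_bytes(key, "big"), E, N)  # the "target encryption key"
    return xor_cipher(key, payload), wrapped_key


def unpack_frame(prev: bytes, ciphertext: bytes, wrapped_key: int) -> bytes:
    """Server side: unwrap the key with the private exponent, decrypt, apply the diff."""
    key = pow(wrapped_key, _D, N).to_bytes(16, "big")
    payload = json.loads(xor_cipher(key, ciphertext))
    return apply_diff(prev, {int(off): bytes.fromhex(h) for off, h in payload.items()})
```

Sending only changed blocks keeps each shared-screen update small when most of the screen is static, and the per-frame symmetric key means only a short key needs the slower asymmetric operation.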

7. The artificial-intelligence-based audio and video conversation method according to claim 1, wherein, when the first chat client is configured with a conference terminal, after the step of calling the target browser and loading a meeting room page according to the target conversation invitation link to obtain a first meeting room page, the method further comprises:

calling a conversation assistant of the conference terminal through the first meeting room page to acquire a real-time user portrait request;

acquiring, by the conversation assistant, data from the conference server according to portrait configuration data carried by the real-time user portrait request, to obtain data to be portrayed;

calling a preset user portrait model through the conversation assistant to build a user portrait from the data to be portrayed, to obtain a target portrait result;

sending the target portrait result to the first meeting room page through the conversation assistant; and

displaying the target portrait result through the first meeting room page.

8. An artificial-intelligence-based audio and video conversation device, characterized in that the device comprises:

a request acquisition module, configured to acquire a conversation start request through a first chat client, wherein the conversation start request carries a target conversation invitation link;

a target browser determining module, configured to start a browser through the first chat client in response to the conversation start request, to obtain a target browser;

a first meeting room page determining module, configured to call the target browser and load a meeting room page according to the target conversation invitation link, to obtain a first meeting room page;

an audio and video data uploading module, configured to acquire, through the first meeting room page, first audio and video data input by a user and send the first audio and video data to a conference server, wherein the conference server is used for sending the first audio and video data to a second meeting room page of a target chat client and for updating the first audio and video data into an audio and video conversation database; and

an audio and video data display module, configured to acquire, through the first meeting room page, second audio and video data corresponding to the second meeting room page from the conference server, and display the second audio and video data.

9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.

Technical Field

The present application relates to the field of artificial intelligence technologies, and in particular, to an audio/video dialog method, apparatus, device, and storage medium based on artificial intelligence.

Background

With the development of public chat tools, companies have begun to use them to interconnect the company, its employees, and its customers. On the one hand, this helps companies manage employees, making tasks such as flexible clock-in, efficient meetings, and handover when an employee leaves more convenient; on the other hand, it helps employees serve customers under the enterprise identity, reaching customers and converting them. Although public chat tools provide audio and video functions such as voice calls, video calls, conferencing, and live streaming, the audio and video data cannot be synchronized to the enterprise, which hinders the enterprise from archiving them. Customer service delivered through a general-purpose chat tool is therefore unsuitable for enterprises that emphasize information archiving and security compliance.

Disclosure of Invention

The main objective of the present application is to provide an artificial-intelligence-based audio and video conversation method, device, equipment and storage medium, so as to solve the technical problem that, when customer service is provided through a general-purpose chat tool, the audio and video data cannot be synchronized to the enterprise, which hinders the enterprise from archiving them and makes this approach unsuitable for enterprises that emphasize information archiving and security compliance.

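To achieve the above objective, the present application provides an artificial-intelligence-based audio and video conversation method, the method comprising:

obtaining a conversation start request through a first chat client, wherein the conversation start request carries a target conversation invitation link;

starting a browser through the first chat client in response to the conversation start request, to obtain a target browser;

calling the target browser and loading a meeting room page according to the target conversation invitation link, to obtain a first meeting room page;

acquiring, through the first meeting room page, first audio and video data input by a user, and sending the first audio and video data to a conference server, wherein the conference server is used for sending the first audio and video data to a second meeting room page of a target chat client and for updating the first audio and video data into an audio and video conversation database; and

acquiring, through the first meeting room page, second audio and video data corresponding to the second meeting room page from the conference server, and displaying the second audio and video data.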

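The present application also provides an artificial-intelligence-based audio and video conversation device, the device comprising:

a request acquisition module, configured to acquire a conversation start request through a first chat client, wherein the conversation start request carries a target conversation invitation link;

a target browser determining module, configured to start a browser through the first chat client in response to the conversation start request, to obtain a target browser;

a first meeting room page determining module, configured to call the target browser and load a meeting room page according to the target conversation invitation link, to obtain a first meeting room page;

an audio and video data uploading module, configured to acquire, through the first meeting room page, first audio and video data input by a user and send the first audio and video data to a conference server, wherein the conference server is used for sending the first audio and video data to a second meeting room page of a target chat client and for updating the first audio and video data into an audio and video conversation database; and

an audio and video data display module, configured to acquire, through the first meeting room page, second audio and video data corresponding to the second meeting room page from the conference server, and display the second audio and video data.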

The present application further proposes a computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the steps of any of the above methods when executing the computer program.

The present application also proposes a computer-readable storage medium having stored thereon a computer program which, when being executed by a processor, carries out the steps of the method of any of the above.

In the present application, a conversation start request carrying a target conversation invitation link is first obtained through a first chat client; the first chat client starts a browser in response to the request to obtain a target browser; the target browser is called to load a meeting room page according to the target conversation invitation link, yielding a first meeting room page; first audio and video data input by a user are then acquired through the first meeting room page and sent to a conference server, which sends them to a second meeting room page of the target chat client and updates them into an audio and video conversation database; and second audio and video data corresponding to the second meeting room page are acquired from the conference server through the first meeting room page and displayed. In this way, the chat client calls a browser to load a meeting room page that communicates with the conference server to carry out the audio and video conversation, and the audio and video data are sent to the conference server, so that the enterprise can archive them through the conference server. Customer service delivered through a general-purpose chat tool thus becomes suitable for enterprises that emphasize information archiving and security compliance.

Drawings

FIG. 1 is a schematic flow chart of an audio-video conversation method based on artificial intelligence according to an embodiment of the present application;

FIG. 2 is a schematic block diagram of a structure of an audio/video conversation device based on artificial intelligence according to an embodiment of the present application;

FIG. 3 is a block diagram illustrating a structure of a computer device according to an embodiment of the present application.

The realization of the objectives, the functional features, and the advantages of the present application will be further explained with reference to the embodiments and the accompanying drawings.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

Referring to fig. 1, an embodiment of the present application provides an audio/video conversation method based on artificial intelligence, where the method includes:

S1: obtaining a conversation start request through a first chat client, wherein the conversation start request carries a target conversation invitation link;

S2: starting a browser through the first chat client in response to the conversation start request, to obtain a target browser;

S3: calling the target browser and loading a meeting room page according to the target conversation invitation link, to obtain a first meeting room page;

S4: acquiring, through the first meeting room page, first audio and video data input by a user, and sending the first audio and video data to a conference server, wherein the conference server is used for sending the first audio and video data to a second meeting room page of a target chat client and for updating the first audio and video data into an audio and video conversation database;

S5: acquiring, through the first meeting room page, second audio and video data corresponding to the second meeting room page from the conference server, and displaying the second audio and video data.

In this embodiment, the first chat client obtains a conversation start request carrying a target conversation invitation link and, in response, starts a browser to obtain a target browser; the target browser is called to load a meeting room page according to the target conversation invitation link, yielding a first meeting room page; first audio and video data input by a user are acquired through the first meeting room page and sent to the conference server, which sends them to the second meeting room page of the target chat client and updates them into the audio and video conversation database; and second audio and video data corresponding to the second meeting room page are acquired from the conference server through the first meeting room page and displayed. The chat client thus calls a browser whose loaded meeting room page communicates with the conference server to carry out the audio and video conversation, and because the audio and video data pass through the conference server, the enterprise can archive them there. This makes customer service through a general-purpose chat tool suitable for enterprises that emphasize information archiving and security compliance.

For S1, the user clicks the target conversation invitation link in the first chat client, which triggers the conversation start request; when the conversation start request is triggered, the target conversation invitation link is passed as a parameter of the request.

The first chat client is the client of the chat tool through which the audio and video conversation is to be carried out.

Clients of the chat tool include, but are not limited to, Enterprise WeChat (WeCom) clients and personal WeChat clients.

The conversation start request is a request to start an audio and video conversation.

The target conversation invitation link includes a meeting room link address and meeting room information. The meeting room information includes, but is not limited to: the conversation topic, conversation start time, meeting room identifier, and meeting room password. The meeting room identifier may be any data that uniquely identifies a meeting room, such as a meeting room ID or a meeting room name. It will be appreciated that the user clicking the meeting room link address in the first chat client triggers the conversation start request.
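The invitation link bundles a link address with the meeting room information. A minimal Python sketch, assuming that information is carried as URL query parameters; the URL, the parameter names (`topic`, `start`, `roomId`, `pwd`), and the class name are hypothetical, and putting the password in a query string is done here only for illustration.

```python
from dataclasses import dataclass
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


@dataclass(frozen=True)
class InvitationLink:
    link_address: str   # meeting room link address (scheme://host/path)
    topic: str          # conversation topic
    start_time: str     # conversation start time
    room_id: str        # meeting room identifier
    room_password: str  # meeting room password


def build_link(inv: InvitationLink) -> str:
    """Serialize the meeting room information into the link's query string."""
    query = urlencode({
        "topic": inv.topic,
        "start": inv.start_time,
        "roomId": inv.room_id,
        "pwd": inv.room_password,
    })
    return f"{inv.link_address}?{query}"


def parse_link(url: str) -> InvitationLink:
    """Split a clicked link back into its link address and meeting room information."""
    parts = urlsplit(url)
    query = {k: v[0] for k, v in parse_qs(parts.query, keep_blank_values=True).items()}
    address = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return InvitationLink(
        link_address=address,
        topic=query.get("topic", ""),
        start_time=query.get("start", ""),
        room_id=query["roomId"],
        room_password=query["pwd"],
    )
```

When the link is clicked, `parse_link` would yield the link address used in S3 to fetch client resources, plus the identifier and password used to open the room.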

For S2, upon receiving the conversation start request, the first chat client starts a browser, and the started browser is taken as the target browser.

For S3, based on WebRTC (Web Real-Time Communication), the first chat client calls the target browser to load the meeting room link address of the target conversation invitation link; meeting room client resources are obtained from the conference server according to the meeting room link address and loaded, giving a target browser with the meeting room client plug-in installed. This browser then loads the meeting room page according to the meeting room identifier and meeting room password in the target conversation invitation link, and the loaded page is taken as the first meeting room page.
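The order of operations in S3 can be sketched in Python with an in-memory stand-in for the conference server. `FakeConferenceServer`, `Browser`, and the resource string are hypothetical; a real implementation would download and execute WebRTC client code in the browser rather than record a string.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FakeConferenceServer:
    """In-memory stand-in for the conference server (hypothetical)."""
    rooms: Dict[str, str]                                     # meeting room ID -> password
    resources: Dict[str, str] = field(default_factory=dict)  # link address -> client resource

    def get_client_resource(self, link_address: str) -> str:
        # Answers the resource acquisition request built from the link address.
        return self.resources[link_address]

    def check_room(self, room_id: str, password: str) -> bool:
        return self.rooms.get(room_id) == password


@dataclass
class Browser:
    plugins: List[str] = field(default_factory=list)

    def load_meeting_room_page(
        self, server: FakeConferenceServer, link_address: str, room_id: str, password: str
    ) -> str:
        resource = server.get_client_resource(link_address)  # 1. request client resources
        if resource not in self.plugins:                     # 2. install the plug-in once
            self.plugins.append(resource)
        if not server.check_room(room_id, password):         # 3. open the room with ID + password
            raise PermissionError("wrong meeting room ID or password")
        return f"meeting-room-page:{room_id}"                # 4. the first meeting room page
```

Installing the plug-in before checking the room mirrors the order in S3, where the plug-in is what performs the room login.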

For S4, the conference server is a server for conference management.

The meeting room client plug-in is called through the first meeting room page to acquire the first audio and video data input by the user and send them to the conference server; the conference server sends the first audio and video data to the second meeting room page of the target chat client, and the second meeting room page displays the received data, thereby realizing the audio and video conversation.

The target chat client is likewise a client of the chat tool.

The conference server updates the first audio and video data into the audio and video conversation database, thereby backing them up. Enterprises that emphasize information archiving and security compliance can therefore archive the audio and video data, which meets their archiving and compliance requirements.
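The relay-and-archive role of the conference server in S4 can be sketched in Python with an in-memory SQLite table standing in for the audio and video conversation database. The class, table, and column names are hypothetical, and real media would travel over WebRTC transports rather than method calls; the point is only that every uploaded chunk is both forwarded to the other pages in the meeting room and persisted.

```python
import sqlite3
from collections import defaultdict, deque
from typing import Deque, Dict, List, Tuple


class ConferenceServer:
    """Hypothetical relay that forwards chunks to other room members and archives them."""

    def __init__(self) -> None:
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE av_conversation "
            "(seq INTEGER PRIMARY KEY, room_id TEXT, sender TEXT, payload BLOB)"
        )
        self.members: Dict[str, List[str]] = defaultdict(list)
        self.inbox: Dict[str, Deque[Tuple[str, bytes]]] = defaultdict(deque)

    def join(self, room_id: str, page_id: str) -> None:
        self.members[room_id].append(page_id)

    def upload(self, room_id: str, sender: str, payload: bytes) -> None:
        # Archive first, so the enterprise copy exists even if forwarding fails.
        with self.db:  # commits the transaction on success
            self.db.execute(
                "INSERT INTO av_conversation (room_id, sender, payload) VALUES (?, ?, ?)",
                (room_id, sender, payload),
            )
        for page_id in self.members[room_id]:
            if page_id != sender:
                self.inbox[page_id].append((sender, payload))

    def fetch(self, page_id: str) -> List[Tuple[str, bytes]]:
        """Drain and return the chunks waiting for one meeting room page."""
        items = list(self.inbox[page_id])
        self.inbox[page_id].clear()
        return items

    def archived(self, room_id: str) -> List[Tuple[str, bytes]]:
        rows = self.db.execute(
            "SELECT sender, payload FROM av_conversation WHERE room_id = ? ORDER BY seq",
            (room_id,),
        )
        return [(sender, bytes(payload)) for sender, payload in rows]
```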

Enterprises that emphasize information archiving and security compliance include, but are not limited to, insurance companies and banks.

It will be appreciated that the second meeting room page may be a meeting room page loaded by another user through a first chat client on another electronic device.

For S5, audio and video data are acquired from the conference server in real time through the first meeting room page as the second audio and video data; the video of the second audio and video data is displayed on the screen of the local device, and the audio is played through the loudspeaker of the local device. The second audio and video data are the audio and video data sent to the conference server by the second meeting room page, so that an audio and video conversation between the first and second meeting room pages is achieved.

It is to be understood that steps S4 and S5 are executed repeatedly until a conversation end request is acquired.
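A minimal sketch of the S4/S5 cycle as a synchronous loop, assuming callbacks for capture, upload, fetch, render, and end detection. All names are hypothetical; a browser implementation would be event-driven (WebRTC tracks and callbacks) rather than polling, so this only illustrates that sending and receiving alternate until a conversation end request arrives.

```python
from typing import Callable, Iterator, List


def run_conversation(
    capture: Iterator[bytes],
    upload: Callable[[bytes], None],
    fetch: Callable[[], List[bytes]],
    render: Callable[[bytes], None],
    end_requested: Callable[[], bool],
) -> int:
    """Alternate S4 (send local data) and S5 (show remote data) until the end request; return rounds."""
    rounds = 0
    while not end_requested():
        local = next(capture, None)
        if local is not None:
            upload(local)          # S4: send first audio/video data to the conference server
        for remote in fetch():     # S5: pull second audio/video data from the server
            render(remote)         # e.g. draw video on the screen, play audio on the speaker
        rounds += 1
    return rounds
```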

It is to be understood that the first chat client may or may not be a chat tool client configured with a conference terminal.

The conference terminal is a client for conference management.

In an embodiment, before the step of obtaining the conversation start request through the first chat client, the method further includes:

S11: calling a conference terminal through a second chat client to acquire a conversation reservation request, wherein the conversation reservation request carries conversation configuration data;

S12: generating, through the conference terminal, a meeting room reservation creation request according to the conversation configuration data, and sending the meeting room reservation creation request to the conference server;

S13: acquiring, through the conference terminal, the target conversation invitation link sent by the conference server according to the conversation configuration data;

S14: sending, through the conference terminal, the target conversation invitation link to a message dialog box of the second chat client, wherein the target conversation invitation link is to be sent, through the second chat client, to the message dialog box corresponding to the target chat client.

In this embodiment, the second chat client calls the conference terminal to generate the conversation reservation and the target conversation invitation link, and the link is finally sent through the second chat client to the message dialog box corresponding to the target chat client. Conversation reservations and invitation links can thus be shared through a general-purpose chat tool client, which improves the efficiency of customer service delivered through such a tool.

For S11, the conference terminal is called through the second chat client to obtain the conversation reservation request input by the user.

The conversation reservation request is a request to reserve a meeting room.

The conversation configuration data include, but are not limited to: the conversation topic, conversation start time, customer name, and customer identifier. The customer identifier may be a customer ID that uniquely identifies a customer.

The second chat client is the client of the chat tool through which the conference reservation is to be made.

The second chat client is loaded with the conference terminal.

The conference terminal is a third-party application developed according to the application integration specification of the second chat client and is configured in the chat toolbar of the second chat client. Authorized users can perform operations such as conversation reservation, audio and video conversation, conversation data archiving, and conversation summarization in the conference terminal of the second chat client.

It can be understood that the conference terminal is an application developed based on WebRTC technology.

It will be appreciated that the second chat client can also conduct audio and video conversations through meeting room pages loaded as described in steps S1 to S5. That is, the first meeting room page may be a meeting room page loaded by a user through the second chat client, and so may the second meeting room page.

For S12, the conference terminal generates a meeting room reservation creation request, passing the conversation configuration data as a parameter of the request, and sends the request to the conference server through its communication connection with the conference server.

The meeting room reservation creation request is a request to reserve and create a meeting room.

It can be understood that the conference server reserves and creates the meeting room and generates the target conversation invitation link according to the conversation configuration data carried by the meeting room reservation creation request, thereby obtaining a meeting room reservation record and the target conversation invitation link.

The meeting room reservation record includes, but is not limited to: the conversation topic, conversation start time, meeting room identifier, customer name, customer identifier, and conversation initiator identifier. The conversation initiator identifier is the user identifier corresponding to the conversation reservation request.
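The server-side handling of the meeting room reservation creation request can be sketched as follows, assuming the conference server allocates a random meeting room identifier and a six-digit password and encodes the link as query parameters. The field names, base URL, and identifier formats are hypothetical; the `secrets` module is used so the password is not predictable.

```python
import secrets
from dataclasses import dataclass
from typing import Tuple
from urllib.parse import urlencode


@dataclass(frozen=True)
class ConversationConfig:
    topic: str
    start_time: str
    customer_name: str
    customer_id: str


@dataclass(frozen=True)
class ReservationRecord:
    topic: str
    start_time: str
    room_id: str
    customer_name: str
    customer_id: str
    initiator_id: str  # user identifier behind the conversation reservation request


def create_reservation(
    config: ConversationConfig,
    initiator_id: str,
    base_url: str = "https://meet.example.com/room",  # hypothetical link address
) -> Tuple[ReservationRecord, str]:
    """Reserve a meeting room and build the target conversation invitation link."""
    room_id = "R" + secrets.token_hex(4)           # hypothetical identifier format
    password = f"{secrets.randbelow(10**6):06d}"  # six-digit meeting room password
    record = ReservationRecord(
        config.topic, config.start_time, room_id,
        config.customer_name, config.customer_id, initiator_id,
    )
    query = urlencode({
        "topic": config.topic,
        "start": config.start_time,
        "roomId": room_id,
        "pwd": password,
    })
    return record, f"{base_url}?{query}"
```

Following the field list above, the record does not include the password; a real server would still need to store it to check it when the room is joined.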

For S13, the conference terminal obtains, through its communication connection with the conference server, the target conversation invitation link sent by the conference server according to the conversation configuration data.

For S14, the conference terminal sends the target conversation invitation link to the message dialog box of the second chat client, thereby synchronizing the link to the second chat client. Using the sharing function in the message dialog box of the second chat client, the user sends the link to the message dialog box corresponding to the target user identifier; via the communication connection between the second chat client and the target chat client, the link is then displayed in the message dialog box of the target chat client corresponding to the target user identifier.

In an embodiment, after the step of generating, by the conference terminal, a conference room reservation creation request according to the session configuration data and sending the conference room reservation creation request to the conference server, the method further includes:

S121: obtaining, through the conference terminal, the meeting room reservation record sent by the conference server, and updating a meeting room reservation record table according to the meeting room reservation record;

S122: obtaining, through the conference terminal, a meeting room start request according to the meeting room reservation record table, where the meeting room start request carries a target meeting room identifier that is the same as the meeting room identifier in the target session invitation link;

S123: sending the meeting room start request to the conference server through the conference terminal, where the conference server starts a meeting room according to the target meeting room identifier in the meeting room start request to obtain a target meeting room;

S124: loading, through the conference terminal, the page corresponding to the target meeting room to obtain the meeting room page corresponding to the conference terminal;

the step of invoking the target browser and loading the meeting room page according to the target session invitation link to obtain a first meeting room page includes:

S125: invoking the target browser and loading the page corresponding to the target meeting room according to the target session invitation link to obtain the first meeting room page.

In this embodiment, the conference terminal obtains the meeting room start request to determine the target meeting room, which supports quickly starting the audio/video conversation and further improves the efficiency of customer service conducted through a general-purpose chat tool.

For S121, through its communication connection with the conference server, the conference terminal obtains the meeting room reservation records that the conference server sends according to the user identifier corresponding to the conference terminal, and updates them into the meeting room reservation record table of the conference terminal.

For S122, the user clicks the "Initiate conference" button corresponding to the target meeting room identifier in the meeting room reservation record table of the conference terminal, starting the meeting room with one click. Clicking this button triggers the meeting room start request, with the target meeting room identifier passed as a parameter of the request.
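The reservation record table and the one-click start request of S121 and S122 can be sketched as follows. This is a minimal illustration under stated assumptions: the names `ReservationRecord`, `ReservationTable` and `start_request`, and the dictionary shape of the request, are hypothetical, since the description does not define a data format.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ReservationRecord:
    # Fields mirror the reservation record contents listed above;
    # the Python attribute names are hypothetical.
    topic: str
    start_time: datetime
    room_id: str
    customer_name: str
    customer_id: str
    initiator_id: str


class ReservationTable:
    """Hypothetical reservation record table kept by the conference terminal."""

    def __init__(self) -> None:
        self._records: dict[str, ReservationRecord] = {}

    def update(self, record: ReservationRecord) -> None:
        # S121: insert or overwrite the record, keyed by meeting room identifier.
        self._records[record.room_id] = record

    def start_request(self, room_id: str) -> dict:
        # S122: clicking "Initiate conference" on a row builds a start request
        # whose parameter is that row's meeting room identifier.
        if room_id not in self._records:
            raise KeyError(f"no reservation for meeting room {room_id}")
        return {"type": "start_meeting_room", "room_id": room_id}
```

Keying the table by meeting room identifier keeps S121 updates idempotent: a re-sent record for the same room simply replaces the old one.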

For S123, the conference terminal sends the meeting room start request to the conference server through its communication connection with the conference server, requesting the conference server to start the meeting room. On receiving the request, the conference server starts the meeting room according to the target meeting room identifier carried in it, takes the started meeting room as the target meeting room, and generates a meeting room start completion signal.

In another embodiment, the conference server may start the meeting room automatically, based on a preset automatic start condition and the session start time in the meeting room reservation record corresponding to the target meeting room identifier, and take the started meeting room as the target meeting room. For example, the automatic start condition may be to start the meeting room 10 minutes before the session start time; this example is not limiting.
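A sketch of the automatic start check, assuming the condition is expressed as a fixed lead time before the reserved session start time. The 10-minute value is the example from the text; the function name `should_auto_start` is hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical automatic start condition: start the meeting room this long
# before the reserved session start time (10 minutes, as in the example).
AUTO_START_LEAD = timedelta(minutes=10)


def should_auto_start(session_start: datetime, now: datetime,
                      lead: timedelta = AUTO_START_LEAD) -> bool:
    """Return True once `now` has reached `lead` before the session start."""
    return now >= session_start - lead
```

A server-side scheduler could poll this check for each reservation record and start any meeting room for which it first becomes true.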

For S124, the conference terminal obtains, through its communication connection with the conference server, the meeting room start completion signal sent by the conference server, loads the page corresponding to the target meeting room according to this signal, and takes the loaded page as the meeting room page corresponding to the conference terminal. Through this page, the user can then hold an audio/video conversation with the first meeting room page and the second meeting room page.

For S125, the target browser is invoked, the page corresponding to the target meeting room is loaded according to the target session invitation link, and the loaded page is taken as the first meeting room page.

In an embodiment, the step of invoking the target browser and loading the meeting room page according to the target session invitation link to obtain the first meeting room page includes:

S31: invoking the target browser, generating a resource acquisition request according to the meeting room link address in the target session invitation link, and sending the resource acquisition request to the conference server;

S32: invoking the target browser to obtain the meeting room client resources sent by the conference server according to the resource acquisition request;

S33: invoking the target browser to load a meeting room client plug-in according to the meeting room client resources;

S34: invoking the target browser to execute the meeting room client plug-in, and loading the meeting room page according to the meeting room identifier and meeting room password in the target session invitation link to obtain the first meeting room page.

In this embodiment, the target browser is invoked to load the meeting room client plug-in according to the target session invitation link, and the browser with the plug-in loaded then loads the meeting room page according to the meeting room identifier and meeting room password. Because both the plug-in and the meeting room are loaded automatically, the customer has fewer operations to perform, the user experience improves, and customer service through a general-purpose chat tool becomes more efficient.

For S31, the target browser is invoked to load the meeting room link address in the target session invitation link and to generate a resource acquisition request from it; the target browser then sends the request to the conference server through its communication connection with the conference server.

On receiving the resource acquisition request, the conference server sends the corresponding meeting room client resources to the target browser through its communication connection with the target browser.

The meeting room client resources include, but are not limited to, audio and video module resources.

The target browser provides WebRTC-based APIs for real-time voice and/or video conversation. Loading the meeting room client plug-in from the meeting room client resources completes the loading of the audio and video module, so that the page, through the target browser with the plug-in loaded, can call the audio and video module for real-time voice and/or video conversation.

For S32, the target browser is invoked to obtain, through its communication connection with the conference server, the meeting room client resources that the conference server sends according to the resource acquisition request.

It is understood that the meeting room client resources may further include other resources, such as CSS files and JS files; no specific limitation is imposed here.

For S33, the target browser is invoked to load the meeting room client plug-in according to the meeting room client resources. The plug-in contains an audio and video module (that is, the module obtained by installing the audio and video module resources), and because the target browser provides WebRTC-based APIs for real-time voice and/or video conversation, the target browser can call the audio and video module to carry out real-time voice and/or video conversation.

For S34, the target browser is invoked to execute the meeting room client plug-in, load the meeting room page according to the meeting room identifier and meeting room password in the target session invitation link, and take the loaded page as the first meeting room page. While the page loads, the information entry module of the meeting room client plug-in automatically fills the meeting room identifier and password into the page, so the meeting room is entered automatically, reducing user operations and improving the user experience.
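The automatic fill performed by the information entry module can be illustrated as below. The description does not specify the link format, so this sketch assumes the meeting room identifier and password travel as query parameters named `roomId` and `pwd`; those names, the example host and the helper functions are all hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def parse_invitation_link(link: str) -> dict:
    """Split a (hypothetical) invitation link into the meeting room link
    address, the meeting room identifier and the meeting room password."""
    parts = urlsplit(link)
    query = parse_qs(parts.query)
    # Assumed query parameter names; the real link format is not given.
    room_id = query.get("roomId", [None])[0]
    password = query.get("pwd", [None])[0]
    if room_id is None or password is None:
        raise ValueError("invitation link lacks room identifier or password")
    address = f"{parts.scheme}://{parts.netloc}{parts.path}"
    return {"address": address, "room_id": room_id, "password": password}


def autofill_join_form(form: dict, link: str) -> dict:
    """Mimic the information entry module: copy the identifier and password
    into the join form so the user does not have to type them."""
    info = parse_invitation_link(link)
    filled = dict(form)  # leave the caller's form untouched
    filled["room_id"] = info["room_id"]
    filled["password"] = info["password"]
    return filled
```

For instance, `parse_invitation_link("https://meeting.example.com/room?roomId=R001&pwd=123456")` yields the address `https://meeting.example.com/room` (the target of the S31 resource request) together with the identifier and password used in S34.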

In an embodiment, the step of obtaining first audio/video data input by a user through the first meeting room page and sending the first audio/video data to the conference server includes:

S51: obtaining, through the first meeting room page, audio/video data to be processed that is input by the user;

S52: calling the meeting room client plug-in through the first meeting room page to perform denoising and echo cancellation on the audio/video data to be processed, obtaining audio/video data to be encoded;

S53: calling the meeting room client plug-in through the first meeting room page to obtain a preset encoding mode determination rule;

S54: calling the meeting room client plug-in through the first meeting room page to determine an encoding mode according to the audio/video data to be encoded and the preset encoding mode determination rule, obtaining a target encoding mode;

S55: calling the meeting room client plug-in through the first meeting room page to encode the audio/video data to be encoded according to the coding rule of the target encoding mode, obtaining the first audio/video data;

S56: calling the meeting room client plug-in through the first meeting room page to send the first audio/video data to the conference server according to the transmission channel identifier of the target encoding mode.

In this embodiment, the audio/video data to be processed is first denoised and echo-cancelled; the meeting room client plug-in is then called through the first meeting room page to determine the encoding mode from the audio/video data to be encoded and the preset encoding mode determination rule; finally, the data is encoded in the determined mode and sent to the conference server. Because the encoding mode is chosen according to the data actually being sent, it better fits actual needs, which makes the audio/video conversation smoother and improves customer service conducted through a general-purpose chat tool.

For S51, the audio and video module of the meeting room client plug-in is called through the first meeting room page to obtain the audio/video data that the user inputs through the camera and/or microphone of the electronic device on which the first meeting room page runs, and the obtained data is taken as the audio/video data to be processed.

For S52, the first meeting room page calls the audio and video module of the meeting room client plug-in to denoise the audio/video data to be processed, and then performs echo cancellation on the denoised data to obtain the audio/video data to be encoded. This raises the quality of the data to be encoded, which improves both customer service through a general-purpose chat tool and the quality of the archived audio/video data.

For S53, the first meeting room page calls the meeting room client plug-in to execute the audio and video module, which obtains the preset encoding mode determination rule from the storage space of the plug-in.

The preset encoding mode determination rule includes a mode determination rule and an encoding mode. The mode determination rule includes a data volume range.

The data volume range includes a start value and an end value of the data volume, where the data volume is the size of the audio/video data.

The encoding mode includes a coding rule and a transmission channel identifier. When the coding rule is a narrowband coding rule, the transmission channel identifier may be a narrowband identifier; when the coding rule is a wideband coding rule, the transmission channel identifier may be a wideband identifier.

It is understood that the mode determination rule may also adopt other rules; for example, it may include both a data volume range and a network parameter range. No specific limitation is imposed here.

For S54, the first meeting room page calls the meeting room client plug-in to execute the audio and video module, which matches the data volume of the audio/video data to be encoded against the data volume ranges of the preset encoding mode determination rule and takes the encoding mode corresponding to the matched range as the target encoding mode.
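The range matching of S54 can be sketched as a table lookup. The description gives no concrete thresholds and does not say which range maps to which mode, so the 64 KiB boundary and the narrowband/wideband assignment in `PRESET_RULES` below are assumptions, as are all the names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncodingMode:
    coding_rule: str  # e.g. "narrowband" or "wideband"
    channel_id: str   # transmission channel identifier used in S56


@dataclass(frozen=True)
class ModeRule:
    start_bytes: int  # start value of the data volume range (inclusive)
    end_bytes: int    # end value of the data volume range (exclusive)
    mode: EncodingMode


# Hypothetical preset rules: thresholds and mode assignment are assumptions.
PRESET_RULES = [
    ModeRule(0, 64 * 1024, EncodingMode("narrowband", "narrowband")),
    ModeRule(64 * 1024, 2**63, EncodingMode("wideband", "wideband")),
]


def select_encoding_mode(data: bytes, rules=PRESET_RULES) -> EncodingMode:
    """S54: match the data volume against each range and return its mode."""
    size = len(data)
    for rule in rules:
        if rule.start_bytes <= size < rule.end_bytes:
            return rule.mode
    raise ValueError(f"no encoding mode covers {size} bytes")
```

Using half-open ranges avoids ambiguity when a size falls exactly on a boundary; extending the rule with a network parameter range, as mentioned above, would add a second condition to the `if`.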

For S55, the meeting room client plug-in is called to execute the audio and video module, which encodes the audio/video data to be encoded according to the coding rule of the target encoding mode; the encoded data is taken as the first audio/video data.

For S56, the meeting room client plug-in is called through the first meeting room page to execute the audio and video module, which sends the first audio/video data to the conference server over the transmission channel corresponding to the transmission channel identifier of the target encoding mode.

In an embodiment, after the step of invoking the target browser and loading the meeting room page according to the target session invitation link to obtain the first meeting room page, the method further includes:

S611: obtaining a screen sharing request through the first meeting room page, sending the screen sharing request to the conference server, and obtaining a screen sharing start signal sent by the conference server according to the screen sharing request;

S612: in response to the screen sharing start signal, obtaining preset screen capture configuration data through the first meeting room page;

S613: calling the screen capture module of the meeting room client plug-in through the first meeting room page to capture the screen according to the preset screen capture configuration data, obtaining the ith screen capture image;

S614: calling the meeting room client plug-in to obtain the (i-1)th screen capture image;

S615: calling the image processing module of the meeting room client plug-in to obtain image difference data from the ith screen capture image and the (i-1)th screen capture image, obtaining image difference data to be encrypted;

S616: calling the symmetric encryption module of the meeting room client plug-in to symmetrically encrypt the image difference data to be encrypted, obtaining the ith image difference data and a symmetric encryption key;

S617: calling the asymmetric encryption module of the meeting room client plug-in to asymmetrically encrypt the symmetric encryption key, obtaining a target encryption key;

S618: sending the ith image difference data and the target encryption key to the conference server through the first meeting room page.

In this embodiment, screen sharing is performed by calling the screen capture module of the meeting room client plug-in; the image difference data is symmetrically encrypted by the symmetric encryption module, and the symmetric encryption key is asymmetrically encrypted by the asymmetric encryption module. Transmitting only the image difference data reduces the volume of data sent and improves the real-time performance of screen sharing, while combining symmetric and asymmetric encryption improves the security of the shared data. This makes the application suitable for enterprises that emphasize information archiving and security compliance.

For S611, a screen sharing request input by the user is obtained through the first meeting room page; the first meeting room page sends the request to the conference server through its communication connection with the conference server and obtains the screen sharing start signal that the conference server sends according to the request.

The conference server sends the screen sharing start signal to every meeting room page in the same meeting room according to the screen sharing request.

For S612, the screen capture module of the meeting room client plug-in is called through the first meeting room page, and when the screen sharing start signal is received, the preset screen capture configuration data is obtained from the storage space of the plug-in.

The preset screen capture configuration data includes, but is not limited to, the screen capture interval duration.

For S613, the screen capture module of the meeting room client plug-in is called through the first meeting room page to capture the screen according to the preset screen capture configuration data, and the image obtained by the ith capture is taken as the ith screen capture image.

For S614, the screen capture module of the meeting room client plug-in is called to obtain the (i-1)th screen capture image from the cache.

The (i-1)th screen capture image is the image obtained by the screen capture module in the (i-1)th capture.

For S615, the image processing module of the meeting room client plug-in is called to obtain the image difference data between the ith screen capture image and the (i-1)th screen capture image, and the obtained data is taken as the image difference data to be encrypted.
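One simple way to compute and reapply image difference data is a pixel-level diff. The description does not specify the difference format, so this sketch assumes screenshots are equally sized grids of integer pixel values and represents the difference as a list of `(row, column, new_value)` entries; the type aliases and function names are hypothetical.

```python
Image = list[list[int]]            # rows of pixel values (assumed representation)
Diff = list[tuple[int, int, int]]  # (row, column, new pixel value)


def image_diff(prev: Image, curr: Image) -> Diff:
    """S615: keep only the pixels that changed between capture i-1 and i."""
    if len(prev) != len(curr) or any(len(a) != len(b) for a, b in zip(prev, curr)):
        raise ValueError("screenshots must have the same dimensions")
    return [
        (r, c, new)
        for r, (old_row, new_row) in enumerate(zip(prev, curr))
        for c, (old, new) in enumerate(zip(old_row, new_row))
        if old != new
    ]


def apply_diff(prev: Image, diff: Diff) -> Image:
    """Receiver side: rebuild capture i from capture i-1 plus the difference."""
    updated = [row[:] for row in prev]  # copy so the cached frame is untouched
    for r, c, value in diff:
        updated[r][c] = value
    return updated
```

When only a small part of the screen changes between captures, the diff is much smaller than the full frame, which is the data-volume saving the embodiment relies on; a practical plug-in would more likely diff rectangular tiles than single pixels.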

For S616, the symmetric encryption module of the meeting room client plug-in is called to symmetrically encrypt the image difference data to be encrypted; the encrypted data is taken as the ith image difference data, and the key used for the symmetric encryption is taken as the symmetric encryption key.

For S617, the asymmetric encryption module of the meeting room client plug-in is called to asymmetrically encrypt the symmetric encryption key, and the encrypted result is taken as the target encryption key.
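S616 and S617 together follow the familiar hybrid (envelope) encryption pattern. The description names no algorithms, so the sketch below substitutes deliberately simple stand-ins: a SHA-256 counter-mode keystream for the symmetric step and textbook RSA over two publicly known Mersenne primes for the asymmetric step. It is a toy that shows the data flow only and is not secure. Sender and receiver share one module-level key pair purely for illustration, since key distribution is not described, and all function names are hypothetical.

```python
import hashlib
import secrets

# Toy RSA key pair from two well-known Mersenne primes (2**127 - 1, 2**521 - 1).
# The primes are public, so this offers NO security; it only shows key wrapping.
P = 2**127 - 1
Q = 2**521 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (Python 3.8+)


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Symmetric step (S616): XOR data with a SHA-256 counter-mode keystream.

    Encryption and decryption are the same operation. A fresh random key is
    drawn per difference frame, so no keystream is ever reused. A production
    plug-in would use a vetted authenticated cipher such as AES-GCM instead.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        block = hashlib.sha256(key + counter).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)


def encrypt_difference(diff: bytes) -> tuple[bytes, int]:
    """Sender: returns (ith image difference data, target encryption key)."""
    sym_key = secrets.token_bytes(32)                        # symmetric encryption key
    ciphertext = keystream_xor(sym_key, diff)                # S616
    target_key = pow(int.from_bytes(sym_key, "big"), E, N)   # S617, textbook RSA
    return ciphertext, target_key


def decrypt_difference(ciphertext: bytes, target_key: int) -> bytes:
    """Receiver: asymmetric decryption of the key, then symmetric decryption."""
    sym_key = pow(target_key, D, N).to_bytes(32, "big")
    return keystream_xor(sym_key, ciphertext)
```

The design point the sketch illustrates is that the bulky difference data goes through the fast symmetric cipher, while the slow asymmetric operation touches only the 32-byte key.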

For S618, the first meeting room page sends the ith image difference data and the target encryption key to the conference server through its communication connection with the conference server.

The conference server forwards the ith image difference data and the target encryption key to the other meeting room pages. Each of them first asymmetrically decrypts the target encryption key to recover the symmetric encryption key, then uses that key to decrypt the ith image difference data, and finally updates the displayed shared-screen image with the decrypted image difference data.

It is understood that, in another embodiment, the step of calling the meeting room client plug-in to obtain the (i-1)th screen capture image is followed by: calling the image processing module of the meeting room client plug-in to obtain image difference data from the ith and (i-1)th screen capture images, obtaining image difference data to be sent; and sending the image difference data to be sent to the conference server through the first meeting room page.

In an embodiment, when the first chat client is configured with a conference terminal, after the step of invoking the target browser and loading the meeting room page according to the target session invitation link to obtain the first meeting room page, the method further includes:

S71: calling a conversation assistant of the conference terminal through the first meeting room page to obtain a real-time profile request;

S72: obtaining, by the conversation assistant, data from the conference server according to the profile configuration data carried by the real-time profile request, to obtain data to be profiled;

S73: calling, by the conversation assistant, a preset user profile model to build a profile from the data to be profiled, obtaining a target profile result;

S74: sending, by the conversation assistant, the target profile result to the first meeting room page;

S75: displaying the target profile result through the first meeting room page.

In this embodiment, the conversation assistant of the conference terminal profiles the customer, assisting enterprise staff in customer service and further improving its effectiveness.

For S71, the conversation assistant of the conference terminal is called through the first meeting room page to obtain the real-time profile request input by the user.

The real-time profile request is a request to profile a person in the meeting room.

For S72, the conversation assistant obtains data from the conference server according to the profile configuration data carried by the real-time profile request, and takes the obtained data as the data to be profiled.

The profile configuration data includes, but is not limited to, a user identifier. The user identifier may be a user name, a user ID, or other data that uniquely identifies a user (for example, an employee or a customer).

The data to be profiled includes, but is not limited to, basic user information and historical purchase information. The basic user information includes, but is not limited to, the user identifier, name and age. The historical purchase information includes, but is not limited to, the purchase time, product identifier, purchase quantity and purchase amount. The product identifier may be a product name, a product ID, or other data that uniquely identifies a product.

For S73, the conversation assistant calls the preset user profile model to build a profile from the data to be profiled, and takes the resulting profile as the target profile result.

The preset user profile model is a model obtained by training a convolutional neural network; the specific training method is not described here.

For S74, the conversation assistant sends the target profile result to the first meeting room page corresponding to it, providing a basis for quickly showing the result to the user.

For S75, the first meeting room page displays the target profile result according to a preset profile display rule, presenting it to the user and thereby assisting enterprise staff in customer service.

In an embodiment, when the first chat client is configured with a conference terminal, after the step of obtaining, through the first meeting room page, the second audio/video data corresponding to the second meeting room page from the conference server, the method further includes:

S811: calling a conversation assistant through the conference terminal to convert the second audio/video data into text, obtaining text data to be analyzed;

S812: calling, by the conversation assistant, a preset customer intention recognition model to perform customer intention recognition on the text data to be analyzed, obtaining a customer intention recognition result;

S813: retrieving, by the conversation assistant, response scripts from a script knowledge base according to the customer intention recognition result, obtaining a candidate script set;

S814: screening, by the conversation assistant, the candidate script set with a preset script screening rule, obtaining target scripts;

S815: sending, by the conversation assistant, the target scripts to the first meeting room page;

S816: displaying the target scripts through the first meeting room page.

In this embodiment, response scripts are matched to the second audio/video data and the matched scripts are displayed on the first meeting room page, helping enterprise staff serve the customer quickly and effectively and further improving customer service.

For S811, the conversation assistant called by the conference terminal converts the speech in the second audio/video data into text using ASR technology, and takes the converted text as the text data to be analyzed.

ASR refers to automatic speech recognition technology.

For S812, the conversation assistant calls the preset customer intention recognition model to perform customer intention recognition on the text data to be analyzed, and takes the recognition result as the customer intention recognition result.

The preset customer intention recognition model is an intention recognition model built on a classification model.

For S813, the conversation assistant matches scripts in the script knowledge base according to the customer intention recognition result, and takes all matched scripts as the candidate script set.

The script knowledge base includes intention sets and scripts.

For S814, the conversation assistant screens the candidate script set with the preset script screening rule and takes the screened scripts as the target scripts. There may be one or more target scripts.

The preset script screening rule is: select a preset number of scripts with the highest usage frequency.
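S813 and S814 can be sketched as an intention lookup followed by a top-N selection by usage frequency. The knowledge base layout (intention mapped to a list of scripts), the use of a `Counter` for usage counts, the default of two scripts, and the function names are assumptions for illustration.

```python
from collections import Counter


def candidate_scripts(knowledge_base: dict[str, list[str]], intent: str) -> list[str]:
    """S813: all scripts associated with the recognized customer intention."""
    return list(knowledge_base.get(intent, []))


def screen_scripts(candidates: list[str], usage: Counter, top_n: int = 2) -> list[str]:
    """S814: keep the top_n candidate scripts with the highest usage frequency.

    sorted() is stable even with reverse=True, so scripts with equal usage
    keep their knowledge-base order; unseen scripts count as zero uses.
    """
    return sorted(candidates, key=lambda s: usage[s], reverse=True)[:top_n]
```

For example, with candidates `["A", "B", "C"]` used 3, 10 and 7 times, `screen_scripts` returns `["B", "C"]`.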

For S815, the conversation assistant sends each target script to the first meeting room page, providing a basis for quickly presenting each target script to the user.

For S816, the first meeting room page displays each target script according to a preset script display rule, presenting the scripts to the user and thereby assisting enterprise staff in customer service.

In an embodiment, the method further includes:

S821: calling a conversation assistant through the conference terminal to obtain a user concentration analysis request;

S822: obtaining, by the conversation assistant, data from the conference server according to the concentration configuration data carried by the user concentration analysis request, to obtain data to be analyzed;

S823: calling, by the conversation assistant, a preset concentration prediction model to perform concentration analysis on the data to be analyzed, obtaining a concentration prediction result;

S824: sending, by the conversation assistant, the concentration prediction result to the first meeting room page;

S825: displaying the concentration prediction result through the first meeting room page.

In this embodiment, the conversation assistant of the conference terminal analyzes the customer's concentration, assisting enterprise staff in customer service and further improving its effectiveness.

For S821, the conversation assistant of the conference terminal is called through the first meeting room page to obtain the user concentration analysis request input by the user.

The user concentration analysis request is a request to analyze the concentration of a person in the meeting room.

For S822, the conversation assistant obtains data from the conference server according to the concentration configuration data carried by the user concentration analysis request, and takes the obtained data as the data to be analyzed.

The concentration configuration data includes, but is not limited to, a user identifier and an analysis time range.

The data to be analyzed includes, but is not limited to, the communication text data corresponding to the user identifier and the analysis time range. The communication text data is text obtained by the conference server converting the audio in the audio/video data into text.
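Selecting the data to be analyzed in S822 amounts to filtering the server's converted communication text by user identifier and analysis time range. A minimal sketch follows, assuming each converted text fragment is stored with its speaker's user identifier and a timestamp; the `Utterance` record and the `data_to_analyze` function are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Utterance:
    # One piece of communication text from the conference server's
    # audio-to-text conversion (field names are hypothetical).
    user_id: str
    timestamp: datetime
    text: str


def data_to_analyze(records: list[Utterance], user_id: str,
                    start: datetime, end: datetime) -> list[str]:
    """S822: texts of one user within the analysis time range, oldest first."""
    return [
        u.text
        for u in sorted(records, key=lambda r: r.timestamp)
        if u.user_id == user_id and start <= u.timestamp <= end
    ]
```

Sorting by timestamp keeps the texts in conversational order, which a text-based concentration model would typically expect.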

For S823, the conversation assistant calls the preset concentration prediction model to perform concentration analysis on the data to be analyzed, and takes the result as the concentration prediction result.

For the preset concentration prediction model, any existing model that predicts concentration from communication text may be selected; details are not given here.

The concentration prediction result is a multi-dimensional vector describing the user's attention indices toward the product and toward the communication; these attention indices serve as quality metrics.

For S824, the conversation assistant sends the concentration prediction result to the first meeting room page corresponding to it, providing a basis for quickly displaying the result to the user.

For S825, the first meeting room page displays the concentration prediction result according to a preset concentration display rule, presenting it to the user and thereby assisting enterprise staff in customer service.

Referring to FIG. 2, the present application further provides an artificial intelligence based audio/video conversation device, the device comprising:

a request obtaining module 100, configured to obtain a conversation start request through a first chat client, where the conversation start request carries a target session invitation link;

a target browser determining module 200, configured to start a browser through the first chat client in response to the conversation start request, to obtain a target browser;

a first meeting room page determining module 300, configured to invoke the target browser and load a meeting room page according to the target session invitation link to obtain a first meeting room page;

an audio and video data uploading module 400, configured to obtain first audio/video data input by a user through the first meeting room page and send the first audio/video data to a conference server, where the conference server is configured to send the first audio/video data to a second meeting room page of a target chat client and update the first audio/video data into an audio/video conversation database; and

an audio and video data display module 500, configured to obtain, through the first meeting room page, second audio/video data corresponding to the second meeting room page from the conference server and display the second audio/video data.

Referring to FIG. 3, an embodiment of the present application further provides a computer device, which may be a server and whose internal structure may be as shown in FIG. 3. The computer device includes a processor, a memory, a network interface and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores data used by the artificial intelligence based audio/video conversation method. The network interface of the computer device communicates with external terminals over a network connection. When executed by the processor, the computer program implements an artificial intelligence based audio/video conversation method.
The artificial intelligence based audio and video conversation method comprises the following steps: obtaining a conversation starting request through a first chat client, wherein the conversation starting request carries a target conversation invitation link; responding to the conversation starting request through the first chat client by starting a browser to obtain a target browser; invoking the target browser and loading a meeting room page according to the target conversation invitation link to obtain a first meeting room page; acquiring first audio and video data input by a user through the first meeting room page, and sending the first audio and video data to a conference server, wherein the conference server is used for sending the first audio and video data to a second meeting room page of a target chat client and updating the first audio and video data into an audio and video conversation database; and acquiring, through the first meeting room page, second audio and video data corresponding to the second meeting room page from the conference server, and displaying the second audio and video data.
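The relay-and-archive role of the conference server in these steps can be illustrated with a small in-memory Python sketch. It is not the implementation described in the application: `ConferenceServer`, `MediaChunk`, the room and page identifiers, and the plain list standing in for the audio and video conversation database are all assumptions made for illustration, and real media would normally travel as encoded streams rather than byte strings.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MediaChunk:
    """One unit of audio/video data (hypothetical; real systems carry encoded frames)."""
    room_id: str
    sender: str
    payload: bytes


class ConferenceServer:
    """In-memory sketch: forwards chunks to the other pages and archives them."""

    def __init__(self) -> None:
        # room_id -> IDs of the meeting room pages joined to that room
        self.rooms = defaultdict(set)
        # page ID -> chunks waiting to be fetched by that page
        self.inboxes = defaultdict(list)
        # stand-in for the audio and video conversation database
        self.archive = []

    def join(self, room_id: str, page_id: str) -> None:
        self.rooms[room_id].add(page_id)

    def upload(self, chunk: MediaChunk) -> None:
        # Forward the chunk to every other page in the same room...
        for page_id in self.rooms[chunk.room_id]:
            if page_id != chunk.sender:
                self.inboxes[page_id].append(chunk)
        # ...and write the same chunk to the archive on the same path.
        self.archive.append(chunk)

    def fetch(self, page_id: str) -> list:
        # Hand over the pending chunks for this page and clear its inbox.
        pending, self.inboxes[page_id] = self.inboxes[page_id], []
        return pending


server = ConferenceServer()
server.join("abc123", "page-1")  # first meeting room page (first chat client)
server.join("abc123", "page-2")  # second meeting room page (target chat client)

server.upload(MediaChunk("abc123", "page-1", b"first-av-data"))
server.upload(MediaChunk("abc123", "page-2", b"second-av-data"))

received_by_first = [c.payload for c in server.fetch("page-1")]
received_by_second = [c.payload for c in server.fetch("page-2")]
print(received_by_first)   # [b'second-av-data']
print(received_by_second)  # [b'first-av-data']
print(len(server.archive))  # 2
```

Because each chunk is archived on the same code path that forwards it, nothing reaches the other participant without also being stored, which is the property the enterprise relies on for archiving.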

An embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements an artificial intelligence based audio and video conversation method comprising the steps of: obtaining a conversation starting request through a first chat client, wherein the conversation starting request carries a target conversation invitation link; responding to the conversation starting request through the first chat client by starting a browser to obtain a target browser; invoking the target browser and loading a meeting room page according to the target conversation invitation link to obtain a first meeting room page; acquiring first audio and video data input by a user through the first meeting room page, and sending the first audio and video data to a conference server, wherein the conference server is used for sending the first audio and video data to a second meeting room page of a target chat client and updating the first audio and video data into an audio and video conversation database; and acquiring, through the first meeting room page, second audio and video data corresponding to the second meeting room page from the conference server, and displaying the second audio and video data.

In the artificial intelligence based audio and video conversation method, a conversation starting request carrying a target conversation invitation link is first obtained through a first chat client; the first chat client responds to the request by starting a browser to obtain a target browser; and the target browser is invoked to load a meeting room page according to the target conversation invitation link, yielding a first meeting room page. First audio and video data input by the user are then obtained through the first meeting room page and sent to a conference server, which sends them to a second meeting room page of the target chat client and updates them into an audio and video conversation database. Finally, second audio and video data corresponding to the second meeting room page are obtained from the conference server through the first meeting room page and displayed. Because the chat client invokes a browser to load the meeting room page, and that page communicates with the conference server to carry out the audio and video conversation, all audio and video data pass through the conference server, where the enterprise can archive them. This makes customer service through a general-purpose chat tool suitable for enterprises that attach importance to information archiving and security compliance.

It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments can be implemented by a computer program instructing relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium provided herein and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Erasable Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM).

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements includes not only those elements but also other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, apparatus, article, or method that includes the element.

The above description is only a preferred embodiment of the present application and is not intended to limit the scope of the present application. All equivalent structural or process modifications made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, are likewise included in the scope of protection of the present application.
