Method and system for prompting, transcribing and storing important content of a video conference

Document No.: 575330 Published: 2021-05-21

Abstract: This technology, "Method and system for prompting, transcribing and storing important content of a video conference" (一种视频会议重要内容提示及转写存储的方法及系统), was designed by 王卫 (Wang Wei), 杨艳芳 (Yang Yanfang) and 于波 (Yu Bo) on 2019-11-21. The invention discloses a method for prompting, transcribing and storing important content of a video conference. When explaining important content, a speaker can issue an important content prompt instruction at the local client; the instruction triggers a prompt event, and the video session boxes of all conference member clients indicate that the current content is important. The important content prompt instruction and the audio stream are transmitted to the server through a data channel. The server transcribes the audio stream and returns the output text to the conference member clients, which display it in real time as subtitles in the video session box. According to the important content prompt instruction, the server processes the text transcribed from important content differently from other text, then generates a text file in conference record format and stores it on the server. The important content prompt makes it easier for conference members to catch important content; the distinguishing processing makes it convenient to review the important content after the meeting; and the subtitles allow people with hearing impairments to take part in the conference.

1. A method for prompting, transcribing and storing important contents of a video conference is characterized by comprising the following steps:

a client of a conference initiator enables a local conference management module, inputs related description information of a conference and sends the related description information to a server, and simultaneously initiates a request for establishing a conference room to the server;

the client of the conference initiator initiates a conference invitation request to invite other conference members to join a conference room, and the clients of all the conference members establish connection;

the client calls a WebRTC audio and video stream acquisition module to acquire audio and video streams and output the audio and video streams to the server; the server calls the relay mixed flow module to process the received audio and video stream in real time, performs voice transcription word processing on the received audio stream to form conference content information, and sends the conference content information back to all the clients and stores the conference content information into a conference record;

all the clients and the server check in real time whether each piece of conference content information carries an important content prompt instruction; if so, the server marks that piece of conference content information and instructs the participant clients to display a prompt in their local video session frames, and the server stores the marked conference content information in the conference record.

2. The method for prompting, transcribing and storing important contents of a video conference as claimed in claim 1, further comprising: the client sends a conference recording request to the server and receives a conference record returned by the conference recording file generation module of the server; the conference record includes: the related description information of the conference, the names of the speakers and the content of their speech.

3. The method for prompting and transcribing the storage of the important contents of the video conference as claimed in claim 1, wherein the conference-related description information comprises conference subject, conference time, conference participants, names of the conference participants and ID binding conditions of the used equipment.

4. The method for prompting, transcribing and storing the important contents of the video conference according to claim 1, wherein the relay mixed flow module mixes the multiple paths of audio and video streams received by the server to process the multiple paths of audio and video streams into one path of audio and video streams, and then transmits the audio and video streams.

5. The method for prompting, transcribing and storing important contents of a video conference as claimed in claim 1 or 2, wherein the voice transcription word processing comprises: the server calls the voice transcription word processing module to perform voice recognition, semantic understanding and correction processing on the received audio stream to form an output text, the output text serving as the conference content information.

6. The method for prompting, transcribing and storing important contents of a video conference according to claim 1, wherein all the clients and the server check whether each piece of conference content information carries an important content prompting instruction in real time, and if so, the server marks the piece of conference content information and instructs the clients to prompt in a local video session frame, including:

when a speaker inputs an important content prompt instruction to a client side of the speaker, the speaker client side enables an important content prompt instruction sending module to trigger an important content prompt event and output the important content prompt event to a server;

when the server detects that the received meeting content information carries an important content prompt instruction, an important content prompt event is triggered, an important content text processing module is called to mark and display the meeting content information for distinguishing from other non-important meeting content information, the meeting content information is stored in a meeting record file generation module, and the meeting content information is sent to all clients; when the server detects that the received conference content information does not carry an important content prompt instruction, the server does not additionally mark the conference content information, directly stores the conference content information into a conference record file generation module and sends the conference content information to all clients;

the client receives the conference content information and detects whether the conference content information carries an important content prompt instruction, if so, the important content information prompt module triggers an important content prompt event, and the client marks and displays the conference content information in a subtitle form for prompting participants; otherwise, no prompt is given.

7. The method of claim 6, wherein marking for display means highlighting, bolding or enlarging the output text of the conference content information to distinguish it from non-important content text.

8. A system for prompting, transcribing and storing important contents of a video conference is characterized by comprising: a conference member client and a server;

the conference member client comprises a conference management module, a subtitle display processing module, a module for sending an important content prompt instruction, an important content information prompt module, a conference recording request module and a WebRTC audio and video stream acquisition module, and is used for acquiring audio and video streams and conference related description information, outputting, receiving and displaying conference content information of other conference members; the conference member client is divided into an initiator client, a speaker client and a participant client according to roles;

the server side comprises a relay mixed flow module, a voice transcription word processing module, an important content text processing module and a conference recording file generating module, and is used for receiving an audio and video file of the client side, processing the audio and video file into conference content information, displaying a mark, sending the conference content information back to the client side and storing the conference content information to the server side.

9. The system for prompting, transcribing and storing important contents of a video conference as claimed in claim 8, wherein:

the conference management module is used for inputting the related description information of the conference, including the conference subject, the conference time, the conference participants, the names of the conference participants and the ID binding condition of the used equipment, and is used for subsequently generating a conference recording text file, and only the initiator client has the authority to use the conference management module;

the caption display processing module is used for processing text information which is returned from the server and is obtained after voice transcription word processing, so that the text information is displayed in a video session frame of a conference member client in a caption mode in real time;

the important content prompt instruction sending module is used for sending an important content prompt instruction by the speaker client, and only the speaker client has the permission to use the important content prompt instruction sending module;

the important content information prompting module is used for processing a received important content prompting instruction sent by the speaker client, triggering an important content prompting event and prompting that the current speaking content is the important content in a video session frame of the conference member client;

and the conference recording request module is used for initiating a request for recording the conference by the conference member client.

10. The system for prompting, transcribing and storing important contents of a video conference as claimed in claim 8, wherein:

the voice transcription word processing module comprises a voice recognition module and a semantic understanding and correcting module; the voice recognition module is used for processing the received audio stream and converting the audio content into a text; the semantic understanding and correcting module is used for processing the transcribed text, analyzing the conference content and understanding the semantic information of the conference content;

the important content text processing module is used for processing the important content so that it is distinguished from the non-important content, by highlighting, bolding or enlarging;

the conference recording file generation module is used for generating a text file in a conference recording format from the output text subjected to the character transcription processing by the server and storing the text file in the server;

the relay mixed flow module is used for receiving the multiple audio and video streams transmitted from the conference member clients, mixing them into a single audio and video stream, and then sending that stream back to the conference member clients.

Technical Field

The invention relates to the technical field of WebRTC real-time video calls, in particular to a method and a system for prompting, transcribing and storing important content based on a WebRTC video conference.

Background

WebRTC (Web Real-Time Communication) is an API that enables Web browsers to carry out real-time voice or video conversations. Its main goal is to enable web-based video conferencing, so that Web developers can easily and quickly build rich, browser-based real-time multimedia applications without downloading or installing any plug-in. Developers do not need to deal with the digital signal processing of the multimedia and can implement such applications by writing a simple JavaScript program. WebRTC provides the core technology of a video conference, including audio and video capture, encoding and decoding, network transmission and display, and it is cross-platform, supporting Windows, Linux, Mac, Android, and iOS.

Summary of the Invention

The invention provides a method and a system for prompting, transcribing and storing important content. It aims to emphasize and store what is said in a video conference, so that participants can quickly and accurately catch the important content of the conference and avoid missing it.

The invention adopts the following technical scheme for realizing the purpose: a method for prompting, transcribing and storing important contents of a video conference comprises the following steps:

a client of a conference initiator enables a local conference management module, inputs related description information of a conference and sends the related description information to a server, and simultaneously initiates a request for establishing a conference room to the server;

the client of the conference initiator initiates a conference invitation request to invite other conference members to join a conference room, and the clients of all the conference members establish connection;

the client calls a WebRTC audio and video stream acquisition module to acquire audio and video streams and output the audio and video streams to the server; the server calls the relay mixed flow module to process the received audio and video stream in real time, performs voice transcription word processing on the received audio stream to form conference content information, and sends the conference content information back to all the clients and stores the conference content information into a conference record;

all the clients and the server check in real time whether each piece of conference content information carries an important content prompt instruction; if so, the server marks that piece of conference content information and instructs the participant clients to display a prompt in their local video session frames, and the server stores the marked conference content information in the conference record.

Further comprising: the client sends a conference recording request to the server and receives a conference record returned by a conference recording file generation module of the server; the conference recording includes: the related description information of the conference, the name of the speaker and the speaking content thereof.

The conference related description information comprises conference subjects, conference time, conference participants, names of the conference participants and the ID binding condition of the used equipment.

The relay mixed flow module is used for carrying out mixed flow processing on the multi-path audio and video stream received by the server into one path of audio and video stream and then sending the audio and video stream.

The voice transcription word processing comprises: the server calls the voice transcription word processing module to perform voice recognition, semantic understanding and correction processing on the received audio stream to form an output text, the output text serving as the conference content information.

All clients and servers check whether each piece of conference content information carries an important content prompt instruction in real time, if so, the server marks the piece of conference content information and instructs the clients to prompt in a local video session frame, and the method comprises the following steps:

when a speaker inputs an important content prompt instruction to a client side of the speaker, the speaker client side enables an important content prompt instruction sending module to trigger an important content prompt event and output the important content prompt event to a server;

when the server detects that the received meeting content information carries an important content prompt instruction, an important content prompt event is triggered, an important content text processing module is called to mark and display the meeting content information for distinguishing from other non-important meeting content information, the meeting content information is stored in a meeting record file generation module, and the meeting content information is sent to all clients; when the server detects that the received conference content information does not carry an important content prompt instruction, the server does not additionally mark the conference content information, directly stores the conference content information into a conference record file generation module and sends the conference content information to all clients;

the client receives the conference content information and detects whether the conference content information carries an important content prompt instruction, if so, the important content information prompt module triggers an important content prompt event, and the client marks and displays the conference content information in a subtitle form for prompting participants; otherwise, no prompt is given.

Marking for display means highlighting, bolding or enlarging the output text of the conference content information to distinguish it from non-important content text.
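As a rough illustration of this marking, the sketch below wraps an output text in simple markup only when it carries the important content flag. The `ContentInfo` type, its `important` field, and the HTML-like tags are assumptions made for this example; the source does not specify a message format or a rendering technology.

```python
from dataclasses import dataclass

# Hypothetical styles; the source only names highlighting, bolding and enlarging.
STYLES = {
    "highlight": "<mark>{}</mark>",
    "bold": "<b>{}</b>",
    "enlarge": "<big>{}</big>",
}


@dataclass
class ContentInfo:
    speaker: str
    text: str
    important: bool = False  # assumed representation of the prompt instruction


def mark_for_display(info: ContentInfo, style: str = "bold") -> str:
    """Return the subtitle text, marked only when it is flagged important."""
    if not info.important:
        return info.text
    return STYLES[style].format(info.text)
```

Keeping the flag on the message rather than embedding markup in the text would let the server and each client choose their own presentation, which is one possible reading of the separate server-side and client-side marking steps.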

A system for prompting and transcribing storage of important contents of a video conference comprises: a conference member client and a server;

the conference member client comprises a conference management module, a subtitle display processing module, a module for sending an important content prompt instruction, an important content information prompt module, a conference recording request module and a WebRTC audio and video stream acquisition module, and is used for acquiring audio and video streams and conference related description information, outputting, receiving and displaying conference content information of other conference members; the conference member client is divided into an initiator client, a speaker client and a participant client according to roles;

the server side comprises a relay mixed flow module, a voice transcription word processing module, an important content text processing module and a conference recording file generating module, and is used for receiving an audio and video file of the client side, processing the audio and video file into conference content information, displaying a mark, sending the conference content information back to the client side and storing the conference content information to the server side.

The conference management module is used for inputting the related description information of the conference, including the conference subject, the conference time, the conference participants, the names of the conference participants and the ID binding condition of the used equipment, and is used for subsequently generating a conference recording text file, and only the initiator client has the authority to use the conference management module;

the caption display processing module is used for processing text information which is returned from the server and is obtained after voice transcription word processing, so that the text information is displayed in a video session frame of a conference member client in a caption mode in real time;

the important content prompt instruction sending module is used for sending an important content prompt instruction by the speaker client, and only the speaker client has the permission to use the important content prompt instruction sending module;

the important content information prompting module is used for processing a received important content prompting instruction sent by the speaker client, triggering an important content prompting event and prompting that the current speaking content is the important content in a video session frame of the conference member client;

the conference recording request module is used for initiating a request for recording a conference by a conference member client;

the voice transcription word processing module comprises a voice recognition module and a semantic understanding and correcting module; the voice recognition module is used for processing the received audio stream and converting the audio content into a text; the semantic understanding and correcting module is used for processing the transcribed text, analyzing the conference content and understanding the semantic information of the conference content;

the important content text processing module is used for processing the important content so that it is distinguished from the non-important content, by highlighting, bolding or enlarging;

the conference recording file generation module is used for generating a text file in a conference recording format from the output text subjected to the character transcription processing by the server and storing the text file in the server;

the relay mixed flow module is used for receiving the multiple audio and video streams transmitted from the conference member clients, mixing them into a single audio and video stream, and then sending that stream back to the conference member clients.
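The role restrictions stated in the module descriptions (only the initiator client may use the conference management module, and only the speaker client may send the important content prompt instruction) could be checked along the following lines. This is a minimal sketch: the module keys are hypothetical names, and it assumes each client holds exactly one role, which the source does not state.

```python
from enum import Enum


class Role(Enum):
    INITIATOR = "initiator"
    SPEAKER = "speaker"
    PARTICIPANT = "participant"


# Restricted modules and the roles allowed to use them (hypothetical keys).
# Modules not listed here are assumed to be available to every role.
RESTRICTED = {
    "conference_management": {Role.INITIATOR},
    "send_important_prompt": {Role.SPEAKER},
}


def may_use(role: Role, module: str) -> bool:
    """Return True if a client with this role may use the given module."""
    allowed = RESTRICTED.get(module)
    return allowed is None or role in allowed
```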

The invention has the following beneficial effects and advantages:

1. An important content prompt is given, so that participants can catch the important content the speaker wants to convey more quickly and accurately. In a long conference in particular, it is difficult to stay focused the whole time; giving a prompt when the speaker's content is important effectively prevents participants from missing the important content of the conference.

2. The text produced by the voice transcription word processing is generated into a text file in conference record format and stored on the server, so that the conference content can be conveniently reviewed afterwards, while saving the labor, time and other costs of compiling conference records.

3. Text whose speech content is important is processed differently from non-important text, for example by highlighting, bolding or enlarging, so that participants can easily find the important content after the conference and summarize it conveniently.

4. The text obtained by the voice transcription word processing is sent to the conference member clients for processing and displayed in real time as subtitles in the video session frame, so that people with hearing impairments can also take part in the conference.

Description of the Drawings

FIG. 1 is a schematic diagram of the video conference subtitle display process;

FIG. 2 is a schematic diagram of the video conference important content prompting process;

FIG. 3 is a schematic diagram of the conference record generation and request process;

FIG. 4 is a schematic diagram of the system module composition.

Detailed Description

The invention is described in further detail below with reference to the figures and specific embodiments.

The specific embodiment is carried out on the basis of establishing peer-to-peer connection among the conference member clients, and the peer-to-peer connection is established through the following steps:

a conference initiator inputs the related description information of the conference through a conference management module of a local client, sends the related description information of the conference to a server and initiates a request for establishing a conference room to the server;

the client of the conference initiator initiates a conference invitation request to invite other conference members to join the conference room, and peer-to-peer connection is established among all the clients of the conference members.

An embodiment of the present invention provides a method for displaying subtitles in a video conference which, as shown in FIG. 1, includes the following steps:

Step 1.0: The audio stream of the speaker client is transmitted to the server;

Step 1.1: The voice transcription word processing module of the server processes the audio stream and outputs a text;

Step 1.2: The server sends the resulting output text to all the conference member clients;

Step 1.3: The subtitle display processing module of each conference member client processes the text information received from the server and displays it in real time as subtitles in the video session box.
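Steps 1.0 to 1.3 can be sketched as a small in-process simulation. Everything here is illustrative: the `Client` and `Server` classes are hypothetical, the network is replaced by direct method calls, and `fake_transcribe` merely decodes bytes as text in place of real speech recognition.

```python
from typing import Callable, List


class Client:
    """Stand-in for a conference member client with a subtitle area."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.subtitles: List[str] = []

    def show_subtitle(self, text: str) -> None:
        # Step 1.3: display the received text as a subtitle.
        self.subtitles.append(text)


class Server:
    def __init__(self, transcribe: Callable[[bytes], str]) -> None:
        self.transcribe = transcribe  # placeholder for the transcription module
        self.clients: List[Client] = []

    def on_audio(self, audio_chunk: bytes) -> str:
        text = self.transcribe(audio_chunk)  # Step 1.1
        for client in self.clients:          # Step 1.2
            client.show_subtitle(text)
        return text


def fake_transcribe(chunk: bytes) -> str:
    """Toy recogniser: treats the bytes as UTF-8 text. Not real speech recognition."""
    return chunk.decode("utf-8")
```

Passing the transcriber in as a callable keeps the broadcast logic independent of whichever speech recognition service is actually used.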

An embodiment of the present invention provides a method for prompting important content of a video conference which, as shown in FIG. 2, includes the following steps:

Step 2.0: The speaker client issues an important content prompt instruction;

Step 2.1: The speaker client sets the value of a Boolean variable flag from false to true;

Step 2.2: The video session frame of the speaker client indicates that the current speaking content is important content;

Step 2.3: The speaker client sends the important content prompt instruction to the server through the data channel;

Step 2.4: The server receives the important content prompt instruction and sets the value of its Boolean variable flag from false to true;

Step 2.5: The server sends the important content prompt instruction to the participant clients;

Step 2.6: Each participant client receives the important content prompt instruction and sets the value of its Boolean variable flag from false to true;

Step 2.7: The video session box of each participant client indicates that the current speaking content is important content.
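The flag propagation in steps 2.0 to 2.7 might look like the following sketch. The class names are hypothetical and the data channel is replaced by a direct method call; the source does not describe when or how the flag is set back to false, so that is left out.

```python
from typing import List


class Participant:
    def __init__(self) -> None:
        self.flag = False
        self.prompt_visible = False

    def on_instruction(self) -> None:
        self.flag = True            # Step 2.6
        self.prompt_visible = True  # Step 2.7


class RelayServer:
    def __init__(self, participants: List[Participant]) -> None:
        self.flag = False
        self.participants = participants

    def on_instruction(self) -> None:
        self.flag = True                       # Step 2.4
        for participant in self.participants:  # Step 2.5
            participant.on_instruction()


class Speaker:
    def __init__(self, server: RelayServer) -> None:
        self.flag = False
        self.prompt_visible = False
        self.server = server

    def mark_important(self) -> None:          # Step 2.0
        self.flag = True                       # Step 2.1
        self.prompt_visible = True             # Step 2.2
        self.server.on_instruction()           # Step 2.3 (direct call, no real channel)
```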

An embodiment of the present invention provides a method for generating and requesting a conference record which, as shown in FIG. 3, includes the following steps:

Step 3.0: The audio stream of the speaker client is transmitted to the server;

Step 3.1: The voice transcription word processing module of the server processes the audio stream and outputs a text;

Step 3.2: The server determines whether it has received an important content prompt instruction sent from the speaker client;

Step 3.3: If the server has not received the important content prompt instruction, the value of the Boolean variable flag remains false;

Step 3.4: If the server has received the important content prompt instruction, it sets the value of the Boolean variable flag from false to true;

Step 3.5: When the value of the Boolean variable flag is true, the important content text processing module of the server processes the important content text differently from non-important content text, for example by highlighting, bolding or enlarging it;

Step 3.6: The conference recording file generation module of the server processes the resulting text information;

Step 3.7: The processed conference record text file is stored on the server;

Step 3.8: A conference member client sends the server a request to view the conference record text file;

Step 3.9: The server sends the conference record text file to the client that requested it.
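The branch in steps 3.2 to 3.6 can be illustrated with a small sketch. `RecordBuilder` is a hypothetical name, and the `**...**` wrapper is only a stand-in for whichever of highlighting, bolding or enlarging is used. Since the source does not say when the flag returns to false, the flag here simply stays set once the instruction arrives.

```python
from typing import List


class RecordBuilder:
    """Collects transcribed lines; important ones are marked (here with ** **)."""

    def __init__(self) -> None:
        self.flag = False  # becomes True when the prompt instruction arrives
        self.lines: List[str] = []

    def on_instruction(self) -> None:
        self.flag = True  # Step 3.4

    def add_text(self, speaker: str, text: str) -> None:
        if self.flag:  # Step 3.5: mark important text differently
            text = f"**{text}**"
        self.lines.append(f"{speaker}: {text}")  # Step 3.6

    def render(self) -> str:
        # Steps 3.7 to 3.9 would store this text and return it on request.
        return "\n".join(self.lines)
```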

The invention provides a system for prompting, transcribing and storing important contents based on a WebRTC video conference, as shown in FIG. 4, the module composition and interaction among the modules are as follows:

The client comprises a WebRTC audio and video stream acquisition module (an audio stream module and a video stream module), a subtitle display processing module, an important content prompt instruction sending module, an important content information prompt module, a conference management module and a conference recording request module. Note that the WebRTC audio and video stream acquisition module is internal to WebRTC; it is listed in the figure only to describe the interaction between the system modules more clearly.

The server comprises a relay mixed flow module, a voice transcription word processing module (a voice recognition module and a semantic understanding and correcting module), an important content text processing module and a conference record file generating module.

The WebRTC audio and video stream acquisition module of each client transmits the audio and video streams it captures to the server. The relay mixed flow module of the server mixes the multiple audio and video streams into a single audio and video stream, which is then sent back to the clients.
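The source does not say how the streams are mixed. One common approach for the audio part, shown here purely as an assumption, is to sum time-aligned 16-bit PCM samples and clip the result to the valid range; video mixing is not covered by this sketch.

```python
from typing import List, Sequence

INT16_MIN, INT16_MAX = -32768, 32767


def mix_audio(streams: Sequence[Sequence[int]]) -> List[int]:
    """Mix equal-rate 16-bit PCM streams by summing samples and clipping."""
    if not streams:
        return []
    length = max(len(s) for s in streams)
    mixed: List[int] = []
    for i in range(length):
        # Streams shorter than the longest one contribute silence past their end.
        total = sum(s[i] for s in streams if i < len(s))
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))
    return mixed
```

For example, `mix_audio([[100, 200], [50, -300, 7]])` gives `[150, -100, 7]`, and two samples of 30000 clip to 32767.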

The audio stream obtained by the audio stream module within the client's WebRTC audio and video stream acquisition module is transmitted to the server, where it is handled by the voice transcription word processing module: first the voice recognition module transcribes the audio content into text, then the semantic understanding and correcting module processes and corrects that text to produce the output text. On the one hand, the output text is sent to the clients, whose subtitle display processing modules process it and display it in real time as subtitles in the video session box. On the other hand, the server determines whether it has received an important content prompt instruction sent by the client's important content prompt instruction sending module. If the instruction has been received, the output text is passed to the important content text processing module and then to the conference record file generating module; if not, the output text is passed directly to the conference record file generating module.

The conference management module processes the conference-related description information it has collected and transmits it to the server. The server receives the conference-related description information from the client together with the text produced by the voice transcription word processing module and the important content text processing module; the conference record file generating module processes these to produce a conference record text file, which is stored on the server for viewing after the conference.
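A conference record that combines the description information with the speakers' names and their speech could be assembled as in the sketch below. The field names and the text layout are assumptions; the source lists what the record contains but does not define a file format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ConferenceRecord:
    subject: str
    time: str
    participants: List[str]
    entries: List[Tuple[str, str]] = field(default_factory=list)  # (speaker, text)

    def add(self, speaker: str, text: str) -> None:
        """Append one transcribed utterance to the record."""
        self.entries.append((speaker, text))

    def to_text(self) -> str:
        """Render the description header followed by one line per utterance."""
        header = [
            f"Subject: {self.subject}",
            f"Time: {self.time}",
            f"Participants: {', '.join(self.participants)}",
            "",
        ]
        body = [f"{speaker}: {text}" for speaker, text in self.entries]
        return "\n".join(header + body)
```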

The client initiates a request to view the conference record through the conference recording request module; the conference record file generation module of the server processes the request and sends the conference record text file to the requesting client.

While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.
