Information processing method, device and computer readable storage medium

Document No.: 1617234 Publication date: 2020-01-10

Reading note: This technology, "Information processing method, device and computer readable storage medium" (信息处理方法、装置及计算机可读存储介质), was designed by Chen Haoliang (陈昊亮) and Xu Minqiang (许敏强) on 2019-10-15. Abstract: The invention discloses an information processing method comprising the following steps: acquiring audio information received by a video conference system and user information corresponding to the audio information; determining, based on the audio information and the user information, the text information spoken by the current speaker of the video conference; and displaying the text information in a video text box in a display screen of the video conference system. The invention also discloses an information processing device and a computer-readable storage medium. The invention converts the speaker's current audio information, together with the speaker's user information, into text information in real time and displays that text on the display screen of the video conference system in real time. This addresses the problem that participants easily miss important conference content, allows the conference record to be output to the display screen quickly, improves timeliness and practicality, and helps participants follow and retain the conference content.

1. An information processing method characterized by comprising the steps of:

acquiring audio information received by a video conference system and user information corresponding to the audio information;

determining the text information spoken by the current speaker of the video conference based on the audio information and the user information;

displaying the text information in a video text box in a display screen of the video conference system based on the text information.

2. The information processing method according to claim 1, wherein the step of acquiring the audio information received by the video conference system and the user information corresponding to the audio information comprises:

acquiring the audio information received by the video conference system;

determining voiceprint characteristic information in the audio information based on the audio information;

and determining user information matched with the voiceprint characteristic information in a preset voiceprint information base based on the voiceprint characteristic information.

3. The information processing method according to claim 2, wherein the step of determining, based on the voiceprint characteristic information, user information in a preset voiceprint information base that matches the voiceprint characteristic information comprises:

detecting whether user information matched with the voiceprint characteristic information exists in the preset voiceprint information base or not;

if the user information matched with the voiceprint characteristic information exists in the preset voiceprint information base, the user information is obtained;

and if the user information matched with the voiceprint characteristic information does not exist in the preset voiceprint information base, creating user information corresponding to the voiceprint characteristic information in the preset voiceprint information base, and correspondingly storing the voiceprint characteristic information.

4. The information processing method according to claim 1, wherein the step of determining the text information spoken by the current speaker of the video conference based on the audio information and the user information comprises:

determining audio track information corresponding to the audio information based on the audio information;

determining a plurality of sentence blocks corresponding to the audio information based on the audio track information;

and determining the text information spoken by the current speaker of the video conference based on the plurality of sentence blocks and the user information.

5. The information processing method according to claim 4, wherein the plurality of sentence blocks includes a first sentence block, a second sentence block, or a third sentence block, and the step of determining the plurality of sentence blocks corresponding to the audio information based on the audio track information includes:

detecting pause information in the audio track information;

if the pause information is greater than or equal to a first preset threshold, determining the first sentence block corresponding to the audio information;

if the pause information is smaller than the first preset threshold and larger than a second preset threshold, determining the second sentence block corresponding to the audio information, wherein the second preset threshold is smaller than the first preset threshold;

and if the pause information is smaller than or equal to the second preset threshold, determining the third sentence block corresponding to the audio information.

6. The information processing method according to claim 1, wherein after the step of determining the text information spoken by the current speaker of the video conference based on the audio information and the user information, the method further comprises:

acquiring conference template information in the video conference system;

determining the conference record content of the video conference based on the text information and the conference template information;

and determining the conference record text of the video conference based on the conference record content.

7. The information processing method according to claim 1, wherein before the step of acquiring the audio information received by the video conference system and the user information corresponding to the audio information, the method further comprises:

if a first opening instruction for the video text box is detected, displaying a first preset area and a second preset area in a display screen of the video conference system, displaying a first video image of the video conference in the first preset area, and displaying the video text box in the second preset area.

8. The information processing method according to any one of claims 1 to 7, wherein after the step of displaying the text information in a video text box in a display screen of the video conference system based on the text information, the method further comprises:

if a second opening instruction for the video text box is detected, displaying a second video image of the video conference in the display screen of the video conference system, and displaying the video text box on the second video image.

9. An information processing apparatus characterized by comprising: a memory, a processor, and an information processing program stored on the memory and executable on the processor, wherein the information processing program, when executed by the processor, implements the steps of the information processing method according to any one of claims 1 to 8.

10. A computer-readable storage medium, characterized in that an information processing program is stored thereon, which when executed by a processor implements the steps of the information processing method according to any one of claims 1 to 8.

Technical Field

The present invention relates to the field of communications technologies, and in particular, to an information processing method and apparatus, and a computer-readable storage medium.

Background

A video conference system offers a user-oriented design and a user interface that supports multi-party interaction. Users can conveniently and independently hold and control a conference from their own offices or from company conference rooms, which brings great convenience to enterprises and individual users.

In current video conferences, however, after a user registers and logs in to an account of the video conference system, the key points of a remote video conference must be typed manually on a keyboard and output to a public screen for the participants to view. In practice, manual typing is slow and speakers say a great deal, so important content of the conference is easily missed.

The above is only for the purpose of assisting understanding of the technical aspects of the present invention, and does not represent an admission that the above is prior art.

Disclosure of Invention

The invention mainly aims to provide an information processing method, an information processing device and a computer readable storage medium, and aims to solve the technical problem that important contents of a conference are easily missed.

In order to achieve the above object, the present invention provides an information processing method including the steps of:

acquiring audio information received by a video conference system and user information corresponding to the audio information;

determining the text information spoken by the current speaker of the video conference based on the audio information and the user information;

displaying the text information in a video text box in a display screen of the video conference system based on the text information.
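The three steps above can be sketched as a small pipeline. This is only an illustrative sketch, not the implementation of the invention: the names AudioChunk, VideoTextBox, process_chunk and fake_transcribe are hypothetical, the user information is assumed to arrive already attached to the audio, and the speech recognizer is replaced by a stand-in that decodes bytes.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AudioChunk:
    """Audio received by the conference system plus its user information."""
    samples: bytes      # raw audio payload (hypothetical representation)
    user_name: str      # user information corresponding to the audio


@dataclass
class VideoTextBox:
    """Stand-in for the on-screen video text box; it only collects lines."""
    lines: List[str] = field(default_factory=list)

    def show(self, line: str) -> None:
        self.lines.append(line)


def process_chunk(chunk: AudioChunk,
                  transcribe: Callable[[bytes], str],
                  text_box: VideoTextBox) -> str:
    # Step 1 (acquisition) is represented by the chunk passed in.
    # Step 2: determine the text spoken by the current speaker.
    spoken = transcribe(chunk.samples)
    line = f"{chunk.user_name}: {spoken}"
    # Step 3: display the text in the video text box.
    text_box.show(line)
    return line


def fake_transcribe(samples: bytes) -> str:
    # Placeholder for a real speech recognizer (assumption for the sketch).
    return samples.decode("utf-8")
```

Passing the recognizer in as a parameter keeps the sketch independent of any particular speech-to-text engine, which the text does not specify.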

In an embodiment, the step of acquiring the audio information received by the video conference system and the user information corresponding to the audio information includes:

acquiring the audio information received by the video conference system;

determining voiceprint characteristic information in the audio information based on the audio information;

and determining user information matched with the voiceprint characteristic information in a preset voiceprint information base based on the voiceprint characteristic information.

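In an embodiment, the step of determining, based on the voiceprint characteristic information, user information in a preset voiceprint information base that matches the voiceprint characteristic information includes:

detecting whether user information matched with the voiceprint characteristic information exists in the preset voiceprint information base;

if the user information matched with the voiceprint characteristic information exists in the preset voiceprint information base, obtaining the user information;

and if the user information matched with the voiceprint characteristic information does not exist in the preset voiceprint information base, creating user information corresponding to the voiceprint characteristic information in the preset voiceprint information base, and correspondingly storing the voiceprint characteristic information.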
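The lookup in the preset voiceprint information base, together with the enrollment branch recited in claim 3 for an unmatched voiceprint, might look like the sketch below. It rests on several assumptions that the text does not state: voiceprint characteristic information is represented as a numeric vector, matching uses cosine similarity against a hypothetical threshold of 0.9, and new users receive placeholder names such as "Speaker 1".

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class VoiceprintBase:
    """Hypothetical in-memory voiceprint information base."""

    def __init__(self, threshold: float = 0.9) -> None:
        self.threshold = threshold                 # assumed match threshold
        self.entries: Dict[str, List[float]] = {}  # user name -> voiceprint

    def find(self, features: Sequence[float]) -> Optional[str]:
        # Return the best-matching user at or above the threshold, if any.
        best_user, best_score = None, self.threshold
        for user, stored in self.entries.items():
            score = cosine_similarity(features, stored)
            if score >= best_score:
                best_user, best_score = user, score
        return best_user

    def find_or_create(self, features: Sequence[float]) -> str:
        user = self.find(features)
        if user is None:
            # No match: create user information and store the voiceprint.
            user = f"Speaker {len(self.entries) + 1}"
            self.entries[user] = list(features)
        return user
```

A production system would extract the vectors with a speaker-embedding model and persist the base; both are outside the scope of this sketch.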

In an embodiment, the step of determining text information spoken by a current speaker of the video conference based on the audio information and the user information includes:

determining audio track information corresponding to the audio information based on the audio information;

determining a plurality of sentence blocks corresponding to the audio information based on the audio track information;

and determining the text information spoken by the current speaker of the video conference based on the plurality of sentence blocks and the user information.
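One way to picture the path from audio track information to sentence blocks to displayed text is sketched below. The representation is an assumption: the audio track is modeled as a list of already-recognized voiced segments with start and end times, and segments are grouped into one block whenever the silence between them is shorter than a hypothetical min_pause value. The names Segment, split_into_blocks and blocks_to_text are illustrative only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    start: float   # seconds from the start of the track
    end: float
    text: str      # recognized text for this voiced segment (assumed given)


def split_into_blocks(segments: List[Segment],
                      min_pause: float = 0.5) -> List[List[Segment]]:
    """Group consecutive segments; a pause >= min_pause starts a new block."""
    blocks: List[List[Segment]] = []
    for seg in segments:
        if blocks and seg.start - blocks[-1][-1].end < min_pause:
            blocks[-1].append(seg)
        else:
            blocks.append([seg])
    return blocks


def blocks_to_text(blocks: List[List[Segment]], user_name: str) -> List[str]:
    """Attach the speaker's user information to each sentence block."""
    return [f"{user_name}: " + " ".join(s.text for s in block)
            for block in blocks]
```

For example, segments "good" (0.0 to 1.0 s), "morning" (1.1 to 2.0 s) and "agenda" (3.0 to 4.0 s) would form two blocks, because only the second gap reaches the assumed 0.5 s pause.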

In an embodiment, the plurality of sentence blocks includes a first sentence block, a second sentence block or a third sentence block, and the step of determining the plurality of sentence blocks corresponding to the audio information based on the audio track information includes:

detecting pause information in the audio track information;

if the pause information is greater than or equal to a first preset threshold, determining the first sentence block corresponding to the audio information;

if the pause information is smaller than the first preset threshold and larger than a second preset threshold, determining the second sentence block corresponding to the audio information, wherein the second preset threshold is smaller than the first preset threshold;

and if the pause information is smaller than or equal to the second preset threshold, determining the third sentence block corresponding to the audio information.
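The three-way classification by pause, recited in full in claim 5, reduces to two comparisons. The sketch below assumes the pause information is a duration in seconds; the threshold values 1.0 and 0.3 are hypothetical defaults chosen only for illustration, since the text gives no concrete values.

```python
def classify_pause(pause_s: float,
                   first_threshold_s: float = 1.0,
                   second_threshold_s: float = 0.3) -> str:
    """Map a pause duration to the first, second or third sentence block."""
    # The second preset threshold must be smaller than the first.
    if second_threshold_s >= first_threshold_s:
        raise ValueError("second threshold must be smaller than the first")
    if pause_s >= first_threshold_s:
        return "first"    # pause >= first preset threshold
    if pause_s > second_threshold_s:
        return "second"   # second threshold < pause < first threshold
    return "third"        # pause <= second preset threshold
```

Note the boundary handling: a pause exactly equal to the first threshold falls in the first block, and a pause exactly equal to the second threshold falls in the third block.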

In an embodiment, after the step of determining text information spoken by a current speaker of the video conference based on the audio information and the user information, the method further includes:

acquiring conference template information in the video conference system;

determining the conference record content of the video conference based on the text information and the conference template information;

and determining the conference record text of the video conference based on the conference record content.
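Combining the text information with conference template information could be as simple as filling placeholders in a template. The sketch below uses Python's standard string.Template; the template layout, the field names title, date and body, and the function build_record are all assumptions, since the text does not describe what the conference template contains.

```python
from string import Template
from typing import List

# Hypothetical conference template; the real template format is not specified.
DEFAULT_TEMPLATE = Template(
    "Meeting: $title\n"
    "Date: $date\n"
    "\n"
    "Minutes:\n"
    "$body\n"
)


def build_record(lines: List[str], title: str, date: str,
                 template: Template = DEFAULT_TEMPLATE) -> str:
    """Merge transcribed lines into the template to form the record text."""
    # Conference record content: one bullet per transcribed line.
    body = "\n".join(f"- {line}" for line in lines)
    # Conference record text: the template with all placeholders filled.
    return template.substitute(title=title, date=date, body=body)
```

Template.substitute raises KeyError if a placeholder is left unfilled, which makes an incomplete record fail loudly rather than silently.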

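In an embodiment, before the step of acquiring the audio information received by the video conference system and the user information corresponding to the audio information, the method further includes:

if a first opening instruction for the video text box is detected, displaying a first preset area and a second preset area in a display screen of the video conference system, displaying a first video image of the video conference in the first preset area, and displaying the video text box in the second preset area.

In an embodiment, after the step of displaying the text information in a video text box in a display screen of the video conference system based on the text information, the method further includes:

if a second opening instruction for the video text box is detected, displaying a second video image of the video conference in the display screen of the video conference system, and displaying the video text box on the second video image.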

Further, to achieve the above object, the present invention also provides an information processing apparatus comprising: a memory, a processor, and an information processing program stored on the memory and executable on the processor, wherein the information processing program, when executed by the processor, implements the steps of the information processing method described above.

Further, to achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon an information processing program which, when executed by a processor, realizes the steps of the information processing method as described above.

The invention acquires the audio information received by the video conference system and the user information corresponding to the audio information, determines the text information spoken by the current speaker of the video conference based on the audio information and the user information, and displays the text information in a video text box in a display screen of the video conference system. The audio information of the current speech and the user information of the speaker are thus converted into text information in real time and displayed on the display screen of the video conference system in real time, so that participants can see both what the current speaker is saying and who the speaker is. This solves the problem that participants easily miss important conference content because manual typing is slow and speakers say a great deal. The conference record can also be output to the display screen quickly, which improves timeliness and practicality and helps participants follow and retain the conference content.

Drawings

FIG. 1 is a schematic diagram of an information processing apparatus in a hardware operating environment according to an embodiment of the present invention;

FIG. 2 is a flowchart illustrating an information processing method according to a first embodiment of the present invention.

The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.

Detailed Description

It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

As shown in fig. 1, fig. 1 is a schematic structural diagram of an information processing apparatus in a hardware operating environment according to an embodiment of the present invention.

As shown in fig. 1, the information processing apparatus may include: a processor 1001 (such as a CPU), a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 enables connection and communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). Optionally, the memory 1005 may be a storage device separate from the processor 1001.

Optionally, the information processing apparatus may further include a camera, an RF (Radio Frequency) circuit, a sensor, an audio circuit, a Wi-Fi module, and the like.

Those skilled in the art will appreciate that the configuration shown in fig. 1 does not limit the information processing apparatus, which may include more or fewer components than those shown, combine some of the components, or arrange the components differently.

As shown in fig. 1, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and an information processing program.

In the information processing apparatus shown in fig. 1, the network interface 1004 is mainly used for connecting to a backend server and performing data communication with the backend server; the user interface 1003 is mainly used for connecting to a client (user side) and performing data communication with the client; and the processor 1001 may be used to call the information processing program stored in the memory 1005.

In the present embodiment, an information processing apparatus includes: a memory 1005, a processor 1001 and an information processing program stored in the memory 1005 and operable on the processor 1001, wherein when the processor 1001 calls the information processing program stored in the memory 1005, the following operations are performed:

acquiring audio information received by a video conference system and user information corresponding to the audio information;

determining the text information spoken by the current speaker of the video conference based on the audio information and the user information;

displaying the text information in a video text box in a display screen of the video conference system based on the text information.