Information processing method, device and computer readable storage medium

Document No.: 1617234. Publication date: 2020-01-10.

Reading note: this technology, "Information processing method, device and computer readable storage medium" (信息处理方法、装置及计算机可读存储介质), was created by 陈昊亮 (Chen Haoliang) and 许敏强 (Xu Minqiang) on 2019-10-15. Main content: The invention discloses an information processing method comprising the following steps: acquiring audio information received by a video conference system and user information corresponding to the audio information; determining the text information spoken by the current speaker of the video conference based on the audio information and the user information; and displaying the text information in a video text box in a display screen of the video conference system based on the text information. The invention also discloses an information processing device and a computer readable storage medium. The invention converts the audio information of the current speaker, together with the speaker's user information, into text information in real time and displays the text information on the display screen of the video conference system in real time. This solves the problem that participants easily miss important content of the conference, allows the content of the conference record to be output quickly on the display screen, improves timeliness and practicality, and helps participants better understand and grasp the conference content.

1. An information processing method characterized by comprising the steps of:

acquiring audio information received by a video conference system and user information corresponding to the audio information;

determining the text information spoken by the current speaker of the video conference based on the audio information and the user information;

displaying the text information in a video text box in a display screen of the video conference system based on the text information.

2. The information processing method according to claim 1, wherein the step of acquiring the audio information received by the video conference system and the user information corresponding to the audio information comprises:

acquiring the audio information received by the video conference system;

determining voiceprint characteristic information in the audio information based on the audio information;

and determining user information matched with the voiceprint characteristic information in a preset voiceprint information base based on the voiceprint characteristic information.

3. The information processing method according to claim 2, wherein the step of determining, based on the voiceprint feature information, user information in a preset voiceprint information base that matches the voiceprint feature information comprises:

detecting whether user information matched with the voiceprint characteristic information exists in the preset voiceprint information base or not;

if the user information matched with the voiceprint characteristic information exists in the preset voiceprint information base, the user information is obtained;

and if the user information matched with the voiceprint characteristic information does not exist in the preset voiceprint information base, creating user information corresponding to the voiceprint characteristic information in the preset voiceprint information base, and correspondingly storing the voiceprint characteristic information.

4. The information processing method of claim 1, wherein the step of determining text information spoken by a current speaker of the videoconference based on the audio information and the user information comprises:

determining audio track information corresponding to the audio information based on the audio information;

determining a plurality of sentence blocks corresponding to the audio information based on the audio track information;

and determining the text information spoken by the current speaker of the video conference based on the plurality of sentence blocks and the user information.

5. The information processing method according to claim 4, wherein the plurality of sentence blocks includes a first sentence block, a second sentence block, or a third sentence block, and the step of determining the plurality of sentence blocks corresponding to the audio information based on the audio track information includes:

detecting pause information in the audio track information;

if the pause information is greater than or equal to a first preset threshold, determining the first sentence block corresponding to the audio information;

if the pause information is smaller than the first preset threshold and larger than a second preset threshold, determining the second sentence block corresponding to the audio information, wherein the second preset threshold is smaller than the first preset threshold;

and if the pause information is smaller than or equal to the second preset threshold, determining the third sentence block corresponding to the audio information.

6. The information processing method of claim 1, wherein after the step of determining text information spoken by a current speaker of the video conference based on the audio information and the user information, the method further comprises:

acquiring conference template information in the video conference system;

determining the conference recording content of the video conference based on the text information and the conference template information;

and determining the meeting record text of the video meeting process based on the meeting record content.

7. The information processing method according to claim 1, wherein before the step of obtaining the audio information received by the video conference system and the user information corresponding to the audio information, the method further comprises:

if the first opening instruction of the video text box is detected, displaying a first preset area and a second preset area in a display screen of the video conference system, displaying a first video image of the video conference in the first preset area, and displaying the video text box in the second preset area.

8. The information processing method according to any one of claims 1 to 7, wherein after the step of displaying the text information in a video text box in a display screen of the video conference system based on the text information, the method further comprises:

and if the second opening instruction of the video text box is detected, displaying a second video image of the video conference in a display screen of the video conference system, and displaying the video text box on the video image.

9. An information processing apparatus characterized by comprising: a memory, a processor, and an information processing program stored on the memory and executable on the processor, wherein the information processing program, when executed by the processor, implements the steps of the information processing method according to any one of claims 1 to 8.

10. A computer-readable storage medium, characterized in that an information processing program is stored thereon, which when executed by a processor implements the steps of the information processing method according to any one of claims 1 to 8.

Technical Field

The present invention relates to the field of communications technologies, and in particular, to an information processing method and apparatus, and a computer-readable storage medium.

Background

The video conference has a design idea facing users and a user interface with multi-party interaction, and users can conveniently and independently hold a conference and carry out conference control in own offices or conference rooms of companies, thereby bringing great convenience to enterprises or users.

However, in the current video conference, after a user registers and logs in an account of a video conference system, in the process of performing a remote video conference, the user needs to manually type through a keyboard in the video conference system to output content points of the conference process to a public screen for viewing by conference participants. However, in practice, the input by manual typing is slow, and the content of the speaker in the conference is too much, so that the important content of the conference is easily missed.

The above is only for the purpose of assisting understanding of the technical aspects of the present invention, and does not represent an admission that the above is prior art.

Disclosure of Invention

The invention mainly aims to provide an information processing method, an information processing device and a computer readable storage medium, and aims to solve the technical problem that important contents of a conference are easily missed.

In order to achieve the above object, the present invention provides an information processing method including the steps of:

acquiring audio information received by a video conference system and user information corresponding to the audio information;

determining the text information spoken by the current speaker of the video conference based on the audio information and the user information;

displaying the text information in a video text box in a display screen of the video conference system based on the text information.

In an embodiment, the step of acquiring the audio information received by the video conference system and the user information corresponding to the audio information includes:

acquiring the audio information received by the video conference system;

determining voiceprint characteristic information in the audio information based on the audio information;

and determining user information matched with the voiceprint characteristic information in a preset voiceprint information base based on the voiceprint characteristic information.

In an embodiment, the step of determining, based on the voiceprint feature information, user information in a preset voiceprint information base that matches the voiceprint feature information includes:

detecting whether user information matched with the voiceprint characteristic information exists in the preset voiceprint information base or not;

if the user information matched with the voiceprint characteristic information exists in the preset voiceprint information base, the user information is obtained;

and if the user information matched with the voiceprint characteristic information does not exist in the preset voiceprint information base, creating user information corresponding to the voiceprint characteristic information in the preset voiceprint information base, and correspondingly storing the voiceprint characteristic information.

In an embodiment, the step of determining text information spoken by a current speaker of the video conference based on the audio information and the user information includes:

determining audio track information corresponding to the audio information based on the audio information;

determining a plurality of sentence blocks corresponding to the audio information based on the audio track information;

and determining the text information spoken by the current speaker of the video conference based on the plurality of sentence blocks and the user information.

In an embodiment, the plurality of sentence blocks includes a first sentence block, a second sentence block, or a third sentence block, and the step of determining the plurality of sentence blocks corresponding to the audio information based on the audio track information includes:

detecting pause information in the audio track information;

if the pause information is greater than or equal to a first preset threshold, determining the first sentence block corresponding to the audio information;

if the pause information is smaller than the first preset threshold and larger than a second preset threshold, determining the second sentence block corresponding to the audio information, wherein the second preset threshold is smaller than the first preset threshold;

and if the pause information is smaller than or equal to the second preset threshold, determining the third sentence block corresponding to the audio information.

In an embodiment, after the step of determining text information spoken by a current speaker of the video conference based on the audio information and the user information, the method further includes:

acquiring conference template information in the video conference system;

determining the conference recording content of the video conference based on the text information and the conference template information;

and determining the meeting record text of the video meeting process based on the meeting record content.

In an embodiment, before the step of obtaining the audio information received by the video conference system and the user information corresponding to the audio information, the method further includes:

if the first opening instruction of the video text box is detected, displaying a first preset area and a second preset area in a display screen of the video conference system, displaying a first video image of the video conference in the first preset area, and displaying the video text box in the second preset area.

In one embodiment, after the step of displaying the text information in a video text box in a display screen of the video conference system based on the text information, the method further includes:

and if the second opening instruction of the video text box is detected, displaying a second video image of the video conference in a display screen of the video conference system, and displaying the video text box on the video image.

Further, to achieve the above object, the present invention also provides an information processing apparatus comprising: a memory, a processor, and an information processing program stored on the memory and executable on the processor, wherein the information processing program, when executed by the processor, implements the steps of the information processing method described above.

Further, to achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon an information processing program which, when executed by a processor, realizes the steps of the information processing method as described above.

By acquiring the audio information received by the video conference system and the user information corresponding to the audio information, the invention determines the text information spoken by the current speaker of the video conference and displays that text information in a video text box in a display screen of the video conference system. The audio information of the current speaker, together with the speaker's user information, is converted into text information in real time and displayed on the display screen of the video conference system in real time, so that participants can see both the content spoken by the current speaker and the speaker's identity. This solves the problem that participants easily miss important content of the conference because manual typing is slow and the speaker says too much. The content of the conference record can also be output quickly on the display screen, which improves timeliness and practicality and helps participants better understand and grasp the conference content.

Drawings

FIG. 1 is a schematic diagram of an information processing apparatus in a hardware operating environment according to an embodiment of the present invention;

FIG. 2 is a flowchart illustrating an information processing method according to a first embodiment of the present invention.

The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.

Detailed Description

It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

FIG. 1 is a schematic structural diagram of an information processing apparatus in a hardware operating environment according to an embodiment of the present invention.

As shown in FIG. 1, the information processing apparatus may include a processor 1001 (such as a CPU), a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 enables connection and communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, it may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed RAM or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a storage device separate from the processor 1001.

Optionally, the information processing apparatus may further include a camera, an RF (Radio Frequency) circuit, a sensor, an audio circuit, a Wi-Fi module, and the like.

Those skilled in the art will appreciate that the configuration shown in FIG. 1 does not limit the information processing apparatus, which may include more or fewer components than those shown, may combine some of the components, or may arrange the components differently.

As shown in fig. 1, a memory 1005, which is a kind of computer storage medium, may include therein an operating system, a network communication module, a user interface module, and an information processing program.

In the information processing apparatus shown in fig. 1, the network interface 1004 is mainly used for connecting to a backend server and performing data communication with the backend server; the user interface 1003 is mainly used for connecting a client (user side) and performing data communication with the client; and the processor 1001 may be used to call up an information processing program stored in the memory 1005.

In the present embodiment, an information processing apparatus includes: a memory 1005, a processor 1001 and an information processing program stored in the memory 1005 and operable on the processor 1001, wherein when the processor 1001 calls the information processing program stored in the memory 1005, the following operations are performed:

acquiring audio information received by a video conference system and user information corresponding to the audio information;

determining the text information spoken by the current speaker of the video conference based on the audio information and the user information;

displaying the text information in a video text box in a display screen of the video conference system based on the text information.

Further, the processor 1001 may call the information processing program stored in the memory 1005, and also perform the following operations:

acquiring the audio information received by the video conference system;

determining voiceprint characteristic information in the audio information based on the audio information;

and determining user information matched with the voiceprint characteristic information in a preset voiceprint information base based on the voiceprint characteristic information.

Further, the processor 1001 may call the information processing program stored in the memory 1005, and also perform the following operations:

detecting whether user information matched with the voiceprint characteristic information exists in the preset voiceprint information base or not;

if the user information matched with the voiceprint characteristic information exists in the preset voiceprint information base, the user information is obtained;

and if the user information matched with the voiceprint characteristic information does not exist in the preset voiceprint information base, creating user information corresponding to the voiceprint characteristic information in the preset voiceprint information base, and correspondingly storing the voiceprint characteristic information.
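As a non-limiting illustration, the match-or-create operations above can be sketched in Python as follows. The sketch assumes that voiceprint feature information is a numeric vector and that matching is done by cosine similarity against a cut-off; neither is specified in this description. The names `VoiceprintLibrary`, `match_or_create`, and `threshold`, and the generated placeholder names, are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class VoiceprintLibrary:
    """Hypothetical in-memory stand-in for the preset voiceprint information base."""

    threshold: float = 0.8  # assumed similarity cut-off; the description gives no value
    entries: dict = field(default_factory=dict)  # user_id -> (voiceprint, user_info)

    @staticmethod
    def _cosine(a, b):
        # Cosine similarity is an assumption; the description names no matching metric.
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def match_or_create(self, voiceprint):
        """Return (user_info, created) for the given voiceprint feature vector."""
        best_id, best_score = None, -1.0
        for user_id, (stored, _info) in self.entries.items():
            score = self._cosine(voiceprint, stored)
            if score > best_score:
                best_id, best_score = user_id, score
        if best_id is not None and best_score >= self.threshold:
            # Matching user information exists: obtain it.
            return self.entries[best_id][1], False
        # No match: create user information and store the voiceprint alongside it.
        index = len(self.entries) + 1
        user_id = f"user-{index}"
        user_info = {"id": user_id, "name": f"Speaker {index}"}
        self.entries[user_id] = (list(voiceprint), user_info)
        return user_info, True
```

A first call with an unseen voiceprint creates an entry, and a later call with a sufficiently similar vector returns the same user information without creating a new one.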

Further, the processor 1001 may call the information processing program stored in the memory 1005, and also perform the following operations:

determining audio track information corresponding to the audio information based on the audio information;

determining a plurality of sentence blocks corresponding to the audio information based on the audio track information;

and determining the text information spoken by the current speaker of the video conference based on the plurality of sentence blocks and the user information.

Further, the processor 1001 may call the information processing program stored in the memory 1005, and also perform the following operations:

detecting pause information in the audio track information;

if the pause information is greater than or equal to a first preset threshold, determining the first sentence block corresponding to the audio information;

if the pause information is smaller than the first preset threshold and larger than a second preset threshold, determining the second sentence block corresponding to the audio information, wherein the second preset threshold is smaller than the first preset threshold;

and if the pause information is smaller than or equal to the second preset threshold, determining the third sentence block corresponding to the audio information.
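The three-way comparison above can be sketched as follows. This is a minimal illustration that assumes the pause information is a duration in seconds; the default threshold values are placeholders, since the description does not give any. The names `BlockKind` and `classify_pause` are hypothetical.

```python
from enum import Enum


class BlockKind(Enum):
    FIRST = "first sentence block"
    SECOND = "second sentence block"
    THIRD = "third sentence block"


def classify_pause(pause_seconds, first_threshold=1.0, second_threshold=0.3):
    """Map a detected pause to one of the three sentence blocks.

    The threshold defaults are assumed values for illustration only.
    """
    if second_threshold >= first_threshold:
        # The second preset threshold must be smaller than the first.
        raise ValueError("second_threshold must be smaller than first_threshold")
    if pause_seconds >= first_threshold:
        return BlockKind.FIRST
    if pause_seconds > second_threshold:
        return BlockKind.SECOND
    return BlockKind.THIRD
```

Note that the boundaries follow the text exactly: a pause equal to the first threshold yields the first sentence block, and a pause equal to the second threshold yields the third sentence block.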

Further, the processor 1001 may call the information processing program stored in the memory 1005, and also perform the following operations:

acquiring conference template information in the video conference system;

determining the conference recording content of the video conference based on the text information and the conference template information;

and determining the meeting record text of the video meeting process based on the meeting record content.
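One possible way to combine the text information with conference template information is sketched below. The description does not define the format of the template, so the sketch assumes a plain-text template with `$title`, `$date`, and `$body` placeholders; `DEFAULT_TEMPLATE` and `build_meeting_record` are hypothetical names.

```python
from string import Template

# Assumed template layout; the description does not define the template fields.
DEFAULT_TEMPLATE = Template("Meeting: $title\nDate: $date\n\nRecord:\n$body\n")


def build_meeting_record(text_lines, template=DEFAULT_TEMPLATE, title="Untitled", date=""):
    """Fill the conference template with the recognized text information."""
    # Meeting record content: one line of text information per utterance.
    body = "\n".join(text_lines)
    # Meeting record text: the template with the content substituted in.
    return template.substitute(title=title, date=date, body=body)
```

For example, `build_meeting_record(["A: hi", "B: hello"], title="Weekly sync", date="2019-10-15")` produces a record whose body lists the two utterances in order.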

Further, the processor 1001 may call the information processing program stored in the memory 1005, and also perform the following operations:

if the first opening instruction of the video text box is detected, displaying a first preset area and a second preset area in a display screen of the video conference system, displaying a first video image of the video conference in the first preset area, and displaying the video text box in the second preset area.

Further, the processor 1001 may call the information processing program stored in the memory 1005, and also perform the following operations:

and if the second opening instruction of the video text box is detected, displaying a second video image of the video conference in a display screen of the video conference system, and displaying the video text box on the video image.
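The two display modes triggered by the first and second opening instructions can be illustrated with a small layout calculation. This is a sketch only: the mode names `"split"` and `"overlay"`, the `Rect` type, and the `text_fraction` parameter are assumptions, and the description does not prescribe where the preset areas lie or how large they are.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    width: int
    height: int


def layout(mode, screen_w, screen_h, text_fraction=0.25):
    """Return the video and text-box rectangles for a display mode."""
    if mode == "split":
        # First opening instruction: video in a first area, text box in a second area.
        text_w = int(screen_w * text_fraction)
        video = Rect(0, 0, screen_w - text_w, screen_h)
        text_box = Rect(screen_w - text_w, 0, text_w, screen_h)
    elif mode == "overlay":
        # Second opening instruction: full-screen video with the text box drawn on top.
        text_h = int(screen_h * text_fraction)
        video = Rect(0, 0, screen_w, screen_h)
        text_box = Rect(0, screen_h - text_h, screen_w, text_h)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return {"video": video, "text_box": text_box}
```

In the split mode the two rectangles do not overlap, while in the overlay mode the text box lies inside the video rectangle.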

The present invention further provides an information processing method. Referring to FIG. 2, which is a schematic flowchart of a first embodiment of the information processing method of the present invention, the information processing method includes:

Step S10, acquiring audio information received by a video conference system and user information corresponding to the audio information;

In this embodiment, when one or more users in different geographic areas need to hold a remote video conference, the video conference system establishes connections between the conference sites in those areas so that the remote video conference can start. During the conference, participants can see and hear, synchronously and in real time, the images and sound of participants at other conference sites through the display screens of the video conference system. After the video conference system is started, the remote desktops of the conference sites taking part in the video conference connect to it. The video conference system then stops the non-speaking branch conference sites from speaking, that is, it turns off audio acquisition at those sites and turns on audio acquisition at the speaking main conference site. At the speaking main conference site, the speaker talks into the microphone of the video conference system, and the audio information produced by the speaker is acquired through the audio information acquisition system.

Alternatively, after the main conference site acquires the audio information, it converts the audio information into original audio information that can be transmitted over a data channel. The video conference system at a branch conference site first obtains this original audio information, which carries what the speaker said at the main conference site, from the data channel. It then converts the original audio information into the audio information it requires, and thereby obtains the audio information.

Further, each speaker at a speaking conference site corresponds to one piece of user information, so each set of audio information has matching user information. After the audio information is acquired, it is identified by an audio identification system, and a user information matching system detects the user information that matches it. Thus, once the audio information received by the video conference system has been acquired, the user information matching system can obtain the corresponding user information.

It can be understood that the user information of speakers at the main conference site is stored in the video conference system of the main conference site, and the user information of each branch conference site is stored at that branch site. The conference sites may share their stored speaker user information with one another, and the video conference system at any site may also be set to refuse to share user information.

Step S20, determining the text information spoken by the current speaker of the video conference based on the audio information and the user information;

In this embodiment, after the audio information and the user information are acquired, they are converted into text content through an audio/user-text system: the name of the speaker is identified from the user information and placed in front of the text content, and the name and the text content are separated by a symbol. In this way, the text information spoken by the current speaker of the video conference is determined. The text information comprises the name of the speaker, the symbol, and the content spoken by the speaker. The symbol can be any punctuation mark or other symbol, such as a colon or a bar. The name of the speaker may be a Chinese name, an English name, a temporarily assigned name, and so on.

Specifically, if the received user information includes gender information or position information of the speaker, the audio/user-text system also identifies the gender or position while identifying the name of the speaker, and adds a form of address such as "Mr." or "Ms.", or a position such as "manager" or "CEO", after the name of the speaker. If the received user information includes neither gender information nor position information, the audio/user-text system identifies only the name of the speaker.

For example, suppose the first manager speaks at the main conference site. The audio information acquisition system in the video conference system of the main conference site acquires the audio information currently spoken by the first manager, and the user information matching system in the video conference system acquires the user information that matches the first manager. The audio information and the user information are then each sent to the audio/user-text system. If the content of the audio information is "Leaders and colleagues, good afternoon", the audio/user-text system converts the received audio information into the text content "Leaders and colleagues, good afternoon", extracts the name "first manager" from the user information, places "first manager" in front of the text content, and joins the two with a colon (":"). The resulting text information is "first manager: Leaders and colleagues, good afternoon".
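The assembly of the text information from the speaker's name, a separator symbol, and the recognized content can be sketched as follows. The dictionary keys `name`, `position`, and `honorific`, the fallback label, and the function name `format_text_information` are hypothetical; speech recognition itself is outside the scope of the sketch and is assumed to have produced `content` already.

```python
def format_text_information(content, user_info, separator=": "):
    """Prefix recognized speech with the speaker's name and optional title."""
    # The fallback label is an assumption for speakers without a stored name.
    name = user_info.get("name", "Unknown speaker")
    # A position or form of address, if present, is added after the name.
    title = user_info.get("position") or user_info.get("honorific")
    label = f"{name} {title}" if title else name
    return f"{label}{separator}{content}"
```

With user information `{"name": "first", "position": "manager"}` and the content from the example above, the function returns the same "first manager: ..." string; passing `separator=" | "` would use a bar instead of a colon.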

Step S30, displaying the text information in a video text box in a display screen of the video conference system based on the text information.

In this embodiment, after the text information spoken by the current speaker of the video conference is determined, a text display system in the video conference system acquires the text information and displays it in a video text box in a display screen of the video conference system, so that participants at every conference site can read what the speaker is saying while the speaker speaks. The video text box is preset by the user in the display screen of the video conference system and displays the content spoken at the speaking main conference site. It can be placed to the right of, to the left of, above, or below the video image, or set to float over the video image.

In the information processing method provided by this embodiment, the audio information received by the video conference system and the user information corresponding to the audio information are acquired, the text information spoken by the current speaker of the video conference is determined based on them, and the text information is displayed in a video text box in a display screen of the video conference system. The audio information of the current speaker, together with the speaker's user information, is converted into text information in real time and displayed on the display screen of the video conference system in real time, so that participants can see both the content spoken by the current speaker and the speaker's identity. This solves the problem that participants easily miss important content of the conference because manual typing is slow and the speaker says too much. The content of the conference record can also be output quickly on the display screen, which improves timeliness and practicality and helps participants better understand and grasp the conference content.

A second embodiment of the information processing method of the present invention is proposed based on the first embodiment, and in this embodiment, step S10 includes:

Step a, acquiring the audio information received by the video conference system;

In this embodiment, during the video conference, participants can see and hear, synchronously and in real time, the images and sound of participants at other conference sites through the display screen of the video conference system. After the video conference system is started, the remote desktops of the conference sites taking part in the video conference connect to it. The video conference system then stops the non-speaking branch conference sites from speaking, that is, it turns off audio acquisition at those sites and turns on audio acquisition at the speaking main conference site. At the speaking main conference site, the speaker talks into the microphone of the video conference system, and the audio information produced by the speaker is acquired through the audio information acquisition system. Alternatively, after the main conference site acquires the audio information, it converts the audio information into original audio information that can be transmitted over a data channel. The video conference system at a branch conference site first obtains this original audio information, which carries what the speaker said at the main conference site, from the data channel. It then converts the original audio information into the audio information it requires, and thereby obtains the audio information.

Step b, determining voiceprint characteristic information in the audio information based on the audio information;

In this embodiment, after the video conference system acquires the audio information, it analyzes the audio information. Because the audio information of the speaker contains voiceprint feature information, the voiceprint feature information of the speaker can be extracted from the audio information, so that the user information or identity information of the speaker can be identified in the following steps.

Step c, determining user information corresponding to the voiceprint characteristic information in a preset voiceprint information base based on the voiceprint characteristic information.

In this embodiment, after the voiceprint feature information is determined, the user information matching system obtains the voiceprint feature information corresponding to the audio information of the speaker and searches the preset voiceprint information base of the video conference system for user information matching the voiceprint feature information, that is, for the user information matching the audio information.

The conference system of each conference site has its own preset voiceprint information base, which stores the voiceprint feature information of the speakers at that site and the user information corresponding to that voiceprint feature information. That is, the voiceprint feature information and user information of the speakers at the main site are stored in the preset voiceprint information base of the main site, and the voiceprint feature information and user information of every other conference site are likewise stored in that site's own preset voiceprint information base. Each site's preset voiceprint information base holds only that site's voiceprint feature information, and the bases are not shared between sites.

Further, in an embodiment, the step of determining, based on the voiceprint feature information, user information corresponding to the voiceprint feature information in a preset voiceprint information base includes:

Step d, detecting whether user information matched with the voiceprint characteristic information exists in the preset voiceprint information base;

In this embodiment, after the voiceprint feature information in the audio information of the speaker is obtained, the user information is determined by detecting whether the preset voiceprint information base contains user information matching the voiceprint feature information. Because the user information and voiceprint features of speakers are stored in the preset voiceprint information base in advance, this check shows whether the user information of the speaker has already been stored in the video conference system, so that the user information of the speaker can then be determined.

Step e, if the user information matched with the voiceprint characteristic information exists in the preset voiceprint information base, acquiring the user information;

In this embodiment, if user information matching the voiceprint feature information exists in the preset voiceprint information base, the user information of the speaker was stored in the preset voiceprint information base in advance, that is, the voiceprint features and user information of the speaker were previously entered into the current video conference system. In this case, the user information of the speaker is obtained.

Step f, if the user information matched with the voiceprint characteristic information does not exist in the preset voiceprint information base, creating the user information corresponding to the voiceprint characteristic information in the preset voiceprint information base, and correspondingly storing the voiceprint characteristic information.

In this embodiment, if no user information matching the voiceprint feature information exists in the preset voiceprint information base, the voiceprint feature information and user information of the speaker were not stored in the preset voiceprint information base in advance, that is, they were not previously entered into the current video conference system. In this case, a piece of user information is created in the preset voiceprint information base, the voiceprint feature information is stored in correspondence with it, and the user information is obtained.
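
Steps d to f can be sketched as a lookup-or-create routine. The following is only an illustrative sketch: the source does not say how voiceprint feature information is represented or compared, so the fixed-length feature vector, the cosine-similarity comparison, the 0.8 threshold, and the names `VoiceprintBase` and `identify` are all assumptions made for this example.

```python
import math
from dataclasses import dataclass, field


@dataclass
class VoiceprintBase:
    """Hypothetical per-site store of (user_info, voiceprint vector) pairs."""
    threshold: float = 0.8  # assumed similarity cut-off, not from the source
    entries: list = field(default_factory=list)

    @staticmethod
    def _cosine(a, b):
        # Cosine similarity between two feature vectors (assumed metric).
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def identify(self, voiceprint):
        """Return (user_info, created), following steps d, e and f."""
        # Step d: look for the best-matching stored voiceprint.
        best_user, best_score = None, -1.0
        for user_info, stored in self.entries:
            score = self._cosine(voiceprint, stored)
            if score > best_score:
                best_user, best_score = user_info, score
        # Step e: a match exists, so return the stored user information.
        if best_user is not None and best_score >= self.threshold:
            return best_user, False
        # Step f: no match, so create user information and store the voiceprint.
        new_user = {"name": f"Speaker {len(self.entries) + 1}"}
        self.entries.append((new_user, list(voiceprint)))
        return new_user, True
```

Because each conference site keeps its own base, one `VoiceprintBase` instance per site would mirror the non-shared arrangement described above.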

According to the information processing method provided by this embodiment, the audio information received by the video conference system is acquired, the voiceprint feature information in the audio information is determined based on the audio information, and the user information matching the voiceprint feature information in the preset voiceprint information base is determined based on the voiceprint feature information. That is, the voiceprint feature information in the audio information of the speaker is obtained and then checked against the voiceprint feature information and user information stored in the preset voiceprint information base, so that the identity of the speaker, and whether that identity is legitimate, can be accurately determined. This improves the usability, practicability, and safety of the video conference system, facilitates the management of video conference systems for large-scale conferences, and makes the video conference system more intelligent and convenient.

A third embodiment of the information processing method of the present invention is proposed based on the first embodiment, and in this embodiment, step S20 includes:

Step g, determining the audio track information corresponding to the audio information based on the audio information;

In this embodiment, after the audio information of the speaker is obtained, the audio information is analyzed and the audio track information is extracted from it, so that the audio track of the audio signal can be analyzed in the subsequent steps.

Step h, determining a plurality of statement blocks corresponding to the audio information based on the audio track information;

In this embodiment, in the process of converting the audio information into the text information, after the audio track information corresponding to the audio information is determined, the audio track information, that is, the audio track of the audio signal, is analyzed. The analysis divides the audio into a plurality of sentence blocks, so that the sentences spoken by the speaker are punctuated.

Step i, determining the text information spoken by the current speaker of the video conference based on the plurality of statement blocks and the user information.

In this embodiment, in the process of converting the audio information into text content, after the plurality of sentence blocks is obtained, the sentence blocks and the user information are combined into the text information. That is, the name of the speaker is read from the user information and placed in front of the sentence blocks, with a symbol separating the name of the speaker from the sentence blocks, so that the text information spoken by the current speaker of the video conference is determined.
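
As an illustration of step i, the combination of the speaker's name and the sentence blocks might look like the sketch below. The dictionary layout of the user information, the `": "` separator, and the function name `build_text_information` are assumptions chosen for this example; the source only states that a symbol separates the name from the sentence blocks.

```python
def build_text_information(user_info, sentence_blocks, separator=": "):
    """Prefix the joined sentence blocks with the speaker's name.

    user_info is assumed to be a dict with a "name" key; sentence_blocks
    is assumed to be a list of already punctuated strings.
    """
    name = user_info.get("name", "Unknown speaker")
    return f"{name}{separator}{''.join(sentence_blocks)}"
```

For instance, `build_text_information({"name": "Speaker A"}, ["Hello everyone, ", "let us begin."])` yields `"Speaker A: Hello everyone, let us begin."`.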

Further, in an embodiment, the plurality of sentence blocks includes a first sentence block, a second sentence block, or a third sentence block, and the determining, based on the audio track information, the plurality of sentence blocks corresponding to the audio information includes:

Step j, detecting pause information in the audio track information;

In this embodiment, in the process of determining the plurality of sentence blocks corresponding to the audio information, the audio track information is first obtained and analyzed to detect the pause information in it. By detecting the pause information in the audio track information, the speech content of the speaker can be punctuated.

Step k, if the pause information is greater than or equal to a first preset threshold, determining the first statement block corresponding to the audio information;

In this embodiment, after the pause information in the audio track information is detected, if the pause information is greater than or equal to the first preset threshold, the pause falls in the longest category. The first sentence block corresponding to the audio information is then determined, and a line break is inserted after the first sentence block. The first preset threshold may be a time value, namely the pause duration in the audio track information, or an energy value, namely the audio energy.

Step l, if the pause information is smaller than the first preset threshold and larger than a second preset threshold, determining the second statement block corresponding to the audio information, wherein the second preset threshold is smaller than the first preset threshold;

In this embodiment, after the pause information in the audio track information is detected, if the pause information is smaller than the first preset threshold and larger than the second preset threshold, the pause is of intermediate length. The second sentence block corresponding to the audio information is then determined, and a period is added after the second sentence block. The second preset threshold is smaller than the first preset threshold; it may be a time value, namely the pause duration in the audio track information, or an energy value, namely the audio energy.

Step m, if the pause information is less than or equal to the second preset threshold, determining the third statement block corresponding to the audio information.

In this embodiment, after the pause information in the audio track information is detected, if the pause information is less than or equal to the second preset threshold, the pause is short. The third sentence block corresponding to the audio information is then determined, and a comma is added after the third sentence block. Any number of first, second, and third sentence blocks together form the text information spoken by the current speaker of the video conference.
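
The three-way comparison in steps k to m can be sketched as follows. This sketch assumes that the pause information is a duration in seconds (the source also allows an energy value) and that each segment of recognized text arrives together with the pause that follows it. The threshold values 1.5 and 0.6 and the names `classify_pause` and `punctuate` are hypothetical.

```python
FIRST_THRESHOLD = 1.5   # seconds; assumed value, not from the source
SECOND_THRESHOLD = 0.6  # seconds; assumed value, must be below the first

# Terminator appended after each kind of sentence block, as described above:
# line break after a first block, period after a second, comma after a third.
TERMINATORS = {"first": "\n", "second": ". ", "third": ", "}


def classify_pause(pause, first=FIRST_THRESHOLD, second=SECOND_THRESHOLD):
    """Map a pause duration to the kind of sentence block it closes."""
    if second >= first:
        raise ValueError("second threshold must be smaller than the first")
    if pause >= first:       # step k
        return "first"
    if pause > second:       # step l
        return "second"
    return "third"           # step m


def punctuate(segments):
    """Join (text, pause_after_seconds) pairs into punctuated text."""
    return "".join(
        text + TERMINATORS[classify_pause(pause)] for text, pause in segments
    )
```

Note that the boundary cases follow the wording of the steps: a pause exactly equal to the first threshold closes a first sentence block, and a pause exactly equal to the second threshold closes a third sentence block.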

According to the information processing method provided by this embodiment, the audio track information corresponding to the audio information is determined based on the audio information, the plurality of sentence blocks corresponding to the audio information is determined based on the audio track information, and the text information spoken by the current speaker of the video conference is determined based on the plurality of sentence blocks and the user information. By analyzing the audio track information in the audio information, the speech content of the speaker is divided into a plurality of sentence blocks and punctuated, so that the text information spoken by the current speaker of the video conference can be determined completely. This increases the readability of the text content, improves the practicability of the video conference system, and makes the video conference system more intelligent and convenient.

A fourth embodiment of the information processing method of the present invention is proposed based on the first embodiment, and in this embodiment, after step S20, the method further includes:

Step o, acquiring conference template information in the video conference system;

In this embodiment, conference template information is stored in the video conference system in advance; it may be downloaded from the Internet or uploaded locally to the video conference system. Because the conference template information determines the layout of the output conference summary, the conference template information stored in the video conference system is acquired before the conference summary is output.

Step p, determining the conference recording content of the video conference based on the text information and the conference template information;

In this embodiment, during the video conference, the video conference system records the text information of each speaker, with each speaker corresponding to different text information. Finally, the text information of all speakers in the video conference is organized according to the conference summary template preset in the video conference system, that is, the conference recording content of the video conference is determined.

Step q, determining a conference recording text of the video conference process based on the conference recording content.

In this embodiment, in the process of a video conference, after determining the conference recording content of the video conference, the video conference system sends the conference recording content to the printing terminal, and the printing terminal can print the conference recording content into a paper text, that is, the conference recording text in the process of the video conference is determined, so that the participants can view the text, and can better understand and master the conference content.
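
Steps o to q can be illustrated with a simple text template. The source does not specify the format of the conference template information, so the sketch below assumes a plain-text template with title, date, and body placeholders and uses Python's standard `string.Template`; the names `MINUTES_TEMPLATE` and `build_conference_record` are hypothetical.

```python
from string import Template

# Hypothetical conference template; the real template format is not given.
MINUTES_TEMPLATE = Template("Meeting: $title\nDate: $date\n\n$body\n")


def build_conference_record(template, title, date, text_entries):
    """Fill the template with the text information of all speakers.

    text_entries is assumed to be a list of strings, one per utterance,
    each already prefixed with the speaker's name.
    """
    body = "\n".join(text_entries)
    return template.substitute(title=title, date=date, body=body)
```

The returned string corresponds to the conference recording content, which could then be sent to a printing terminal or displayed to the participants.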

Further, in an embodiment, before the step of acquiring the audio information received by the video conference system and the user information corresponding to the audio information, the method further includes:

Step r, if a first opening instruction of the video text box is detected, displaying a first preset area and a second preset area in a display screen of the video conference system, displaying a first video image of the video conference in the first preset area, and displaying the video text box in the second preset area.

In this embodiment, the first opening instruction is one form of instruction for opening the video text box. If the first opening instruction is detected, that is, when the user opens the video text box in this way, a first preset area and a second preset area are displayed in the display screen of the video conference system, the first video image of the video conference is displayed in the first preset area, and the video text box is displayed in the second preset area. For example, the video image is displayed in the left area of the display screen and the video text box is displayed in the right area.

Further, in an embodiment, after the step of displaying the text information in a video text box in a display screen of the video conference system based on the text information, the method further includes:

Step s, if a second opening instruction of the video text box is detected, displaying a second video image of the video conference in the display screen of the video conference system, and displaying the video text box on the video image.

In this embodiment, the second opening instruction is another form of instruction for opening the video text box. When the user performs the second operation for opening the video text box, a second video image of the video conference is displayed in the display screen of the video conference system, and the video text box is displayed on top of the video image. For example, if the video text box floats over the video image, the image of the video conference remains visible beneath the text content displayed in the video text box.
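
The two display arrangements of steps r and s can be sketched as a layout calculation. The mode names `"split"` and `"overlay"`, the `Rect` type, the 25% share given to the text box, and the choice to place the floating text box along the bottom edge are all assumptions for illustration; the source only gives the left/right split and the floating text box as examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    width: int
    height: int


def layout(screen_w, screen_h, mode, text_ratio=0.25):
    """Return (video_rect, text_box_rect) for the given display mode."""
    if mode == "split":
        # First opening instruction: video on the left, text box on the right.
        text_w = int(screen_w * text_ratio)
        video = Rect(0, 0, screen_w - text_w, screen_h)
        text_box = Rect(screen_w - text_w, 0, text_w, screen_h)
    elif mode == "overlay":
        # Second opening instruction: full-screen video, text box floating on it.
        text_h = int(screen_h * text_ratio)
        video = Rect(0, 0, screen_w, screen_h)
        text_box = Rect(0, screen_h - text_h, screen_w, text_h)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return video, text_box
```

In the overlay case the text box rectangle lies inside the video rectangle, so a renderer would need to draw it with some transparency for the video image to remain visible beneath the text.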

According to the information processing method provided by this embodiment, the conference template information in the video conference system is acquired, the conference record content of the video conference is determined based on the text information and the conference template information, and the conference record text of the video conference process is determined based on the conference record content. That is, the text information of all speakers is arranged according to the configured conference template information to form the conference record text of the whole video conference, and the conference summary is output for the participants to check, so that the participants can better understand and master the conference content. This improves the practicability of the video conference system and makes the video conference system more intelligent and convenient.

Furthermore, an embodiment of the present invention further provides a computer-readable storage medium, where an information processing program is stored on the computer-readable storage medium, and when executed by a processor, the information processing program implements the following operations:

acquiring audio information received by a video conference system and user information corresponding to the audio information;

determining the text information spoken by the current speaker of the video conference based on the audio information and the user information;

displaying the text information in a video text box in a display screen of the video conference system based on the text information.

Further, the information processing program, when executed by the processor, further implements operations of:

acquiring the audio information received by the video conference system;

determining voiceprint characteristic information in the audio information based on the audio information;

and determining user information matched with the voiceprint characteristic information in a preset voiceprint information base based on the voiceprint characteristic information.

Further, the information processing program, when executed by the processor, further implements operations of:

detecting whether user information matched with the voiceprint characteristic information exists in the preset voiceprint information base or not;

if the user information matched with the voiceprint characteristic information exists in the preset voiceprint information base, the user information is obtained;

and if the user information matched with the voiceprint characteristic information does not exist in the preset voiceprint information base, creating user information corresponding to the voiceprint characteristic information in the preset voiceprint information base, and correspondingly storing the voiceprint characteristic information.

Further, the information processing program, when executed by the processor, further implements operations of:

determining audio track information corresponding to the audio information based on the audio information;

determining a plurality of sentence blocks corresponding to the audio information based on the audio track information;

and determining the text information spoken by the current speaker of the video conference based on the plurality of statement blocks and the user information.

Further, the information processing program, when executed by the processor, further implements operations of:

detecting pause information in the audio track information;

if the pause information is greater than or equal to a first preset threshold value, determining the first statement block corresponding to the audio information;

if the pause information is smaller than the first preset threshold and larger than a second preset threshold, determining the second statement block corresponding to the audio information, wherein the second preset threshold is smaller than the first preset threshold;

and if the pause information is smaller than or equal to the second preset threshold, determining the third statement block corresponding to the audio information.

Further, the information processing program, when executed by the processor, further implements operations of:

acquiring conference template information in the video conference system;

determining the conference recording content of the video conference based on the text information and the conference template information;

and determining the meeting record text of the video meeting process based on the meeting record content.

Further, the information processing program, when executed by the processor, further implements operations of:

if the first opening instruction of the video text box is detected, displaying a first preset area and a second preset area in a display screen of the video conference system, displaying a first video image of the video conference in the first preset area, and displaying the video text box in the second preset area.

Further, the information processing program, when executed by the processor, further implements operations of:

and if the second opening instruction of the video text box is detected, displaying a second video image of the video conference in a display screen of the video conference system, and displaying the video text box on the video image.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.

The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.
