Conference voice data processing method and system

Document No.: 1891592 · Publication date: 2021-11-26

Reading note: This technology, "Conference voice data processing method and system", was designed by 王钰勋 on 2021-09-06. Main content: The invention provides a conference voice data processing method and system and relates to the field of speech recognition. The method comprises the following steps: a plurality of acquisition modules are arranged near different participants, and each acquisition module acquires the identity information and initial voiceprint features of its corresponding participant so as to capture that participant's speaking voice; the voice contents of the plurality of speaking voices are recognized and it is judged whether they are the same, and when they are the same, the sound intensities of the plurality of voice contents are analyzed and the voice content with the maximum sound intensity is selected; voice feature models of the plurality of participants are established according to the identity information and the initial voiceprint features, and the selected speaking voice is input into the voice feature models to obtain an identity matching result; whether the identity information matches the identity matching result is judged for the acquisition module, and when it does not match, the same voice content from the corresponding acquisition module is selected according to the identity matching result. The method improves the accuracy with which participants' speech is captured and improves the quality of the conference record.

1. A conference voice data processing method, characterized by comprising the following steps:

arranging a plurality of acquisition modules near different participants, each acquisition module acquiring the identity information and initial voiceprint features of its corresponding participant so as to capture that participant's speaking voice;

recognizing the voice contents of the plurality of speaking voices and judging whether they are the same, and when they are the same, analyzing the sound intensity of the plurality of voice contents and selecting the voice content with the maximum sound intensity;

establishing voice feature models of the plurality of participants according to the identity information and the initial voiceprint features, and inputting the selected speaking voice into the voice feature models to obtain an identity matching result;

and judging, for the acquisition module, whether the identity information matches the identity matching result, and when it does not match, selecting the same voice content from the corresponding acquisition module according to the identity matching result.

2. The conference voice data processing method as claimed in claim 1, wherein after the step of judging, for the acquisition module, whether the identity information matches the identity matching result, the method further comprises:

denoising the speaking voice, and converting the denoised speaking voice into text information.

3. The conference voice data processing method as claimed in claim 2, wherein after the step of converting the denoised speaking voice into text information, the method further comprises:

recording the text information of different participants by using the identity matching result.

4. The conference voice data processing method as claimed in claim 3, wherein after the step of recording the text information of different participants by using the identity matching result, the method further comprises:

sorting a plurality of pieces of the text information by speaking time to generate a conference record.

5. A conference voice data processing system, characterized by comprising an error correction module, a confirmation module, an identity comparison module and a plurality of acquisition modules:

the plurality of acquisition modules are arranged near different participants, and are used for acquiring the identity information and the initial voiceprint characteristics of the corresponding participants according to the different acquisition modules so as to acquire the speaking voice of the corresponding participants;

the error correction module is used for identifying and judging whether the voice contents of the plurality of speaking voices are the same or not, analyzing the sound intensity of the plurality of voice contents when the voice contents of the plurality of speaking voices are the same, and selecting the voice content with the maximum sound intensity;

the identity comparison module is used for establishing voice feature models of a plurality of participants according to the identity information and the initial voiceprint features, and inputting the selected speaking voice into the voice feature models to obtain an identity matching result;

the confirmation module is used for judging, for the acquisition module, whether the identity information matches the identity matching result, and when it does not match, selecting the same voice content from the corresponding acquisition module according to the identity matching result.

6. The conference voice data processing system as claimed in claim 5, further comprising a text conversion module, wherein the text conversion module is configured to denoise the speaking voice and convert the denoised speaking voice into text information.

7. The conference voice data processing system as claimed in claim 6, further comprising a storage module configured to record the text information of different participants by using the identity matching result.

8. The conference voice data processing system as claimed in claim 6, further comprising a conference recording module configured to sort a plurality of pieces of the text information by speaking time to generate a conference record.

9. An electronic device, comprising:

a memory for storing one or more programs;

a processor;

the one or more programs, when executed by the processor, implement the method of any of claims 1-4.

10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-4.

Technical Field

The invention relates to the technical field of voice recognition, in particular to a conference voice data processing method and system.

Background

A meeting is an organized, led, purposeful activity carried out at a defined time and place according to a set agenda. Meetings are generally recorded. At present, the record is either taken by hand by a note-taker during the meeting or captured with camera equipment, but both approaches are cumbersome and inconvenient. Meeting speech is also recorded with audio recording equipment; however, because there are many speakers, they are easily confused in the recording, so that organizing the material afterwards consumes considerable manpower and time, and the record is inconvenient to search and retrieve later.

Disclosure of Invention

The invention aims to provide a conference voice data processing method and system, which can improve the accuracy of voice acquisition of each participant during a conference and improve the conference recording effect.

The embodiments of the invention are implemented as follows:

In a first aspect, an embodiment of the present application provides a conference voice data processing method, including the following steps: a plurality of acquisition modules are arranged near different participants, and each acquisition module acquires the identity information and initial voiceprint features of its corresponding participant so as to capture that participant's speaking voice; the voice contents of the plurality of speaking voices are recognized and it is judged whether they are the same, and if so, the sound intensities of the plurality of voice contents are analyzed and the voice content with the maximum sound intensity is selected; voice feature models of the plurality of participants are established according to the identity information and the initial voiceprint features, and the selected speaking voice is input into the voice feature models to obtain an identity matching result; and whether the identity information matches the identity matching result is judged for the acquisition module, and when it does not match, the same voice content from the corresponding acquisition module is selected according to the identity matching result.

In some embodiments of the present invention, after the step of judging, for the acquisition module, whether the identity information matches the identity matching result, the method further includes: denoising the speaking voice, and converting the denoised speaking voice into text information.

In some embodiments of the present invention, after the step of converting the denoised speaking voice into text information, the method further includes: recording the text information of different participants by using the identity matching result.

In some embodiments of the present invention, after the step of recording the text information of different participants by using the identity matching result, the method further includes: sorting a plurality of pieces of the text information by speaking time to generate a conference record.

In a second aspect, an embodiment of the present application provides a conference voice data processing system, which includes an error correction module, a confirmation module, an identity comparison module, and a plurality of acquisition modules. The plurality of acquisition modules are arranged near different participants, and each is used for acquiring the identity information and initial voiceprint features of its corresponding participant so as to capture that participant's speaking voice. The error correction module is used for recognizing the voice contents of the plurality of speaking voices and judging whether they are the same, and when they are the same, analyzing the sound intensities of the plurality of voice contents and selecting the voice content with the maximum sound intensity. The identity comparison module is used for establishing voice feature models of the plurality of participants according to the identity information and the initial voiceprint features, and inputting the selected speaking voice into the voice feature models to obtain an identity matching result. The confirmation module is used for judging, for the acquisition module, whether the identity information matches the identity matching result, and when it does not match, selecting the same voice content from the corresponding acquisition module according to the identity matching result.

In some embodiments of the present invention, the conference voice data processing system further includes a text conversion module, and the text conversion module is configured to perform denoising processing on the speaking voice and convert the speaking voice after the denoising processing into text information.

In some embodiments of the present invention, the conference voice data processing system further includes a storage module, and the storage module is configured to record the text information of different participants by using the identity matching result.

In some embodiments of the present invention, the conference voice data processing system further includes a conference recording module, and the conference recording module is configured to sort a plurality of pieces of the text information by speaking time to generate a conference record.

In a third aspect, an embodiment of the present application provides an electronic device, which includes a processor and a memory for storing one or more programs. The one or more programs, when executed by the processor, implement the method of any implementation of the first aspect described above.

In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored; the computer program, when executed by a processor, implements the method of any implementation of the first aspect described above.

Compared with the prior art, the embodiment of the invention has at least the following advantages or beneficial effects:

With regard to the first aspect, the plurality of acquisition modules respectively acquire the identity information and initial voiceprint features of the participants and capture their speaking voices, which makes it convenient to keep track of the participants' information and to record the speech of different participants. The voice contents of the plurality of speaking voices are recognized and compared, and when they are the same, the sound intensities of the plurality of voice contents are analyzed and the voice content with the maximum sound intensity is selected. This ensures that the current speech is captured by the acquisition module that is closest to, and corresponds to, the speaker, maintaining a one-to-one correspondence, preventing other acquisition modules from capturing the speech by mistake, and avoiding confusion in the captured voice data. Voice feature models of the plurality of participants are established according to the identity information and the initial voiceprint features, and the selected speaking voice is input into the voice feature models to obtain an identity matching result; the captured speaking voice can thus be associated with the participant who uttered it, which facilitates comparison and recording of information and prevents confusion. Finally, whether the identity information matches the identity matching result is judged for the acquisition module, and when it does not match, the same voice content from the corresponding acquisition module is selected according to the identity matching result. This confirms the association between each acquisition module and its corresponding participant and ensures that the acquisition modules remain in one-to-one correspondence with the participants: even when a participant changes position, the corresponding acquisition module still captures that participant's speech, which improves the accuracy of the captured voice content.

With respect to the second to fourth aspects, the principle and advantageous effects of the embodiments of the present application are the same as those of the first aspect, and a repeated description thereof is not necessary.

Drawings

In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. It should be understood that the following drawings show only some embodiments of the present invention and should therefore not be regarded as limiting its scope; those skilled in the art can derive other related drawings from them without inventive effort.

Fig. 1 is a flowchart of a conference voice data processing method according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of a conference voice data processing system according to an embodiment of the present invention;

Fig. 3 is a schematic structural block diagram of an electronic device according to an embodiment of the present invention.

Reference numerals: 101-memory, 102-processor, 103-communication interface, 200-conference voice data processing system, 201-acquisition module, 202-error correction module, 203-identity comparison module, 204-confirmation module, 205-text conversion module, 206-storage module, 207-conference recording module.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.

Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.

It is noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

In the description of the present application, it should be noted that the terms "upper", "lower", "inner", "outer", and the like indicate orientations or positional relationships based on orientations or positional relationships shown in the drawings or orientations or positional relationships conventionally found in use of products of the application, and are used only for convenience in describing the present application and for simplification of description, but do not indicate or imply that the referred devices or elements must have a specific orientation, be constructed in a specific orientation, and be operated, and thus should not be construed as limiting the present application.

In the description of the present application, it is also to be noted that, unless otherwise explicitly specified or limited, the terms "disposed" and "connected" are to be interpreted broadly, e.g., as being either fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meaning of the above terms in the present application can be understood in a specific case by those of ordinary skill in the art.

Some embodiments of the present application will be described in detail below with reference to the accompanying drawings. The embodiments described below and the individual features of the embodiments can be combined with one another without conflict.

Example 1

A conference voice data processing method comprises the following steps:

S110: a plurality of acquisition modules 201 are arranged near different participants, and each acquisition module 201 acquires the identity information and initial voiceprint features of its corresponding participant so as to capture that participant's speaking voice;

S120: the voice contents of the plurality of speaking voices are recognized and it is judged whether they are the same, and if so, the sound intensities of the plurality of voice contents are analyzed and the voice content with the maximum sound intensity is selected;

S130: voice feature models of the plurality of participants are established according to the identity information and the initial voiceprint features, and the selected speaking voice is input into the voice feature models to obtain an identity matching result;

S140: whether the identity information matches the identity matching result is judged for the acquisition module 201, and when it does not match, the same voice content from the corresponding acquisition module 201 is selected according to the identity matching result.

In the above embodiment, each acquisition module 201 may include a camera, a microphone and an identity registration unit: the camera is used to capture images of the participant, the microphone is used to acquire the participant's initial voiceprint features, and the identity registration unit is used to register identity information such as the participant's name.

In detail, the plurality of acquisition modules 201 are arranged beside different participants, so that when a participant speaks, the corresponding acquisition module captures that participant's voice content. Because there are a plurality of acquisition modules 201, several of them may capture the voice content of the current speaker. After the voice content is captured, it is therefore judged whether the voice contents captured by the plurality of acquisition modules 201 are the same; this judgment can be made from the loudness, pitch, spectrum and waveform of the captured voice content. If they are judged to be the same, the sound intensities of the captured voice contents are analyzed to obtain a ranking by sound intensity, and the voice content with the maximum sound intensity is selected. The closer the acquisition module is to the speaker, the greater the sound intensity of the captured speech, so this ensures that the current speech is taken from the acquisition module 201 that is nearest to, and corresponds to, the speaker. The one-to-one correspondence is thereby maintained, other acquisition modules 201 are prevented from capturing the speech by mistake, and confusion among the captured speaking voices is avoided.
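
By way of illustration only, the selection in step S120 might be sketched in Python as follows. The sketch rests on several assumptions that are not part of the disclosure: each acquisition module delivers a time-aligned list of samples, "same content" is approximated by a normalized correlation at zero lag reaching a hypothetical threshold of 0.9 (the description names loudness, pitch, spectrum and waveform but fixes no metric), and sound intensity is taken as the RMS amplitude. The names `rms`, `same_content` and `select_loudest` are illustrative.

```python
import math

def rms(samples):
    """Root-mean-square amplitude, used here as the sound-intensity measure."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def same_content(a, b, threshold=0.9):
    """Treat two time-aligned captures as the same speech when their
    normalized correlation at zero lag reaches the (assumed) threshold."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return norm > 0 and dot / norm >= threshold

def select_loudest(captures):
    """captures: dict mapping module id -> list of samples.
    Returns (loudest module id, ids of the modules that captured the same
    content as the loudest one, ordered from loudest to quietest)."""
    ranked = sorted(captures, key=lambda m: rms(captures[m]), reverse=True)
    loudest = ranked[0]
    group = [m for m in ranked if same_content(captures[loudest], captures[m])]
    return loudest, group
```

The ranked group is returned alongside the loudest module so that a later step can fall back to the module with the next-highest intensity if the identity check fails.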

Voice feature models of the plurality of participants are established respectively according to the participants' identity information and initial voiceprint features, and the selected voice content is input into the voice feature models to obtain an identity matching result. The identity matching result places the captured speaking voice in one-to-one correspondence with the identity information of the corresponding speaker. Each speaking voice captured in the conference can thus be attributed to its speaker, which ensures the accuracy of the information and facilitates recording.
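
The identity matching of step S130 can be sketched, purely as an example, with a nearest-voiceprint lookup. The description does not specify the form of the voice feature model; the sketch assumes that each voiceprint is a fixed-length feature vector and that similarity is measured by cosine similarity. The names `build_models` and `match_identity` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def build_models(enrollments):
    """enrollments: list of (identity, voiceprint vector) pairs gathered
    when the participants register at their acquisition modules."""
    return {identity: list(vector) for identity, vector in enrollments}

def match_identity(models, features):
    """Return the enrolled identity whose voiceprint is most similar to
    the features extracted from the selected speaking voice."""
    return max(models, key=lambda identity: cosine(models[identity], features))
```

For instance, with enrollments for "Alice" and "Bob", `match_identity(models, features)` returns whichever of the two names has the closer enrolled voiceprint.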

Whether the identity information matches the identity matching result is judged for the acquisition module 201, and when it does not match, the same voice content from the corresponding acquisition module 201 is selected according to the identity matching result. Participants may move during a conference, so the intensity of the speech captured by an acquisition module 201 varies with its distance from the participant, which can cause an acquisition module 201 to be associated with the wrong participant. Therefore, after the identity matching result is obtained, the identity information acquired by the acquisition module 201 is compared with the identity matching result. If they do not match, the acquisition module 201 with the next-highest sound intensity is selected and the comparison between its acquired identity information and the identity matching result is repeated until a match is found; that acquisition module 201 is then determined to be the one corresponding to the participant. This confirms the association between each acquisition module 201 and its corresponding participant, keeps the acquisition modules 201 in one-to-one correspondence with the participants at all times, and improves the accuracy of the captured voice content.
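
The fallback described for step S140 amounts to walking the acquisition modules in order of decreasing sound intensity until one is found whose registered identity agrees with the identity matching result. A minimal Python sketch follows; the function name `confirm_module` and the dictionary layout are assumptions made for illustration.

```python
def confirm_module(ranked_modules, registered_identity, matched_identity):
    """ranked_modules: module ids ordered from highest to lowest sound intensity.
    registered_identity: dict mapping module id -> identity registered there.
    matched_identity: identity returned by the voice feature model.
    Returns the first module whose registered identity equals the matched
    identity, or None if no module matches."""
    for module in ranked_modules:
        if registered_identity.get(module) == matched_identity:
            return module
    return None
```

If a participant registered at one module has moved next to another, the loudest module fails the check and the loop falls through to the module where that participant is registered.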

In some implementations of this embodiment, after the step of judging, for the acquisition module 201, whether the identity information matches the identity matching result, the method further includes: denoising the speaking voice, and converting the denoised speaking voice into text information.

Optionally, the speaking voice may be denoised using an adaptive filter, spectral subtraction, or Wiener filtering. Denoising mainly reduces or eliminates background sounds at the meeting place, such as other voices and music, and enhances the characteristic part of the speaker's voice, which improves recognition accuracy. After denoising, the speaking voice is converted into text information by a speech recognition method, which facilitates recording and later review of the conference content; any conventional speech recognition method known in the prior art may be used.
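
Of the denoising options named above, spectral subtraction is the simplest to illustrate. The following Python sketch processes a single frame and is an assumption-laden toy rather than the disclosed implementation: it uses a naive O(n²) DFT from the standard library, takes the noise magnitude spectrum as given (for example, estimated from a frame in which nobody speaks), floors negative magnitudes at zero, and keeps the phase of the noisy frame. A practical system would use overlapping windowed frames and an FFT.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse DFT, returning the real part of each reconstructed sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]

def spectral_subtract(frame, noise_magnitude):
    """Subtract an estimated noise magnitude spectrum from one frame,
    flooring the result at zero and keeping the phase of the noisy frame."""
    cleaned = []
    for value, noise in zip(dft(frame), noise_magnitude):
        magnitude = max(abs(value) - noise, 0.0)
        cleaned.append(cmath.rect(magnitude, cmath.phase(value)))
    return idft(cleaned)
```

When the noise occupies frequency bins that the speech does not, as with a steady hum, the subtraction removes the noise bins and leaves the speech bins essentially unchanged.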

In some implementations of this embodiment, after the step of converting the denoised speaking voice into text information, the method further includes: recording the text information of different participants by using the identity matching result.

In detail, the identity matching result associates each speaker with the speech that speaker uttered; after the speech is converted into text information, the converted text information is attributed to the speaker according to the identity matching result. For example, after the speech uttered by participant A has been converted into text information, all of that text is filed under A. Once the speaking voices of the plurality of participants have each been converted into text information and matched to the respective participants, the conference record can be conveniently archived and organized later, which reduces the workload of the minute-taker. In this embodiment, the speaking voice may be stored together with the text information so that it can be retrieved later.
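
Filing the converted text under each speaker can be sketched as a simple grouping keyed by the identity matching result. The function name `record_by_speaker` and the pair layout are assumptions for illustration; the description does not prescribe a storage format.

```python
from collections import defaultdict

def record_by_speaker(utterances):
    """utterances: iterable of (identity, text) pairs, where identity is
    the identity matching result for that piece of text.
    Returns a dict mapping each identity to its texts in arrival order."""
    records = defaultdict(list)
    for identity, text in utterances:
        records[identity].append(text)
    return dict(records)
```

Everything converted from participant A's speech therefore ends up in the list stored under A.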

In some embodiments of this embodiment, after the step of recording the text messages of different participants by using the identity matching result, the method further includes: and sequencing a plurality of text messages according to the speaking time to generate a conference record.

In the above embodiment, the plurality of pieces of text information are sorted by speaking time to generate the conference record. When the acquisition module 201 acquires speech, the time of the current speech can be recorded. After the speech is converted into text information, the converted text information can be ordered according to the recorded time of the speech from which it was converted. Once the speech of the plurality of participants has been converted into text information, the pieces are sorted by their respective recorded times and combined into the conference record of the whole conference. This ensures that the final conference record follows the timeline of the conference, so that the exchanges between the different participants throughout the conference can be clearly reviewed later.
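The time-ordering step can be sketched as below. The `Entry` type, the use of seconds from the start of the conference as the recorded time, and the output line format are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    start_seconds: float  # assumed: time the speech was captured, from conference start
    speaker: str          # identity from the identity matching result
    text: str             # text converted from the speech


def format_time(start_seconds):
    """Render a time offset as MM:SS."""
    minutes, seconds = divmod(int(start_seconds), 60)
    return f"{minutes:02d}:{seconds:02d}"


def build_record(entries):
    """Sort entries by recorded time and render one line per utterance."""
    ordered = sorted(entries, key=lambda e: e.start_seconds)
    return [f"[{format_time(e.start_seconds)}] {e.speaker}: {e.text}" for e in ordered]


# Assumed sample data, deliberately out of order.
entries = [
    Entry(75.0, "B", "I agree."),
    Entry(12.5, "A", "Let us begin."),
    Entry(40.0, "C", "One question first."),
]
record = build_record(entries)
```

Because Python's `sorted` is stable, utterances with the same recorded time keep their input order.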

Example 2

Referring to fig. 2, fig. 2 is a schematic diagram of a conference voice data processing system 200 according to an embodiment of the present invention.

A conference voice data processing system 200 comprises a plurality of acquisition modules 201, an error correction module 202, an identity comparison module 203, and a confirmation module 204. The plurality of acquisition modules 201 are arranged near different participants and are configured to acquire the identity information and the initial voiceprint features of the corresponding participants, and to acquire the speaking voice of the corresponding participants; the error correction module 202 is configured to recognize and determine whether the voice contents of the plurality of speaking voices are the same and, if so, to analyze the sound intensities of the plurality of voice contents and select the voice content with the greatest sound intensity; the identity comparison module 203 is configured to establish voice feature models of the plurality of participants according to the identity information and the initial voiceprint features, and to input the selected speaking voice into the voice feature models to obtain an identity matching result; the confirmation module 204 is configured to determine, according to the acquisition module 201, whether the identity information matches the identity matching result and, when it does not, to select the same voice content from the acquisition module 201 that corresponds to the identity matching result.

In the above embodiment, each acquisition module 201 may include a camera, a microphone, and an identity registration unit. The camera is configured to capture images of the participant, the microphone may be configured to acquire the participant's initial voiceprint features, and the identity registration unit is configured to register identity information such as the participant's name.

In detail, each microphone is placed beside a participant and captures that participant's speech when he or she speaks. Because a plurality of acquisition modules 201 are provided, the speech of the current speaker may be captured by several acquisition modules 201 at once. After the speech is captured, the error correction module 202 determines whether the voice contents captured by the acquisition modules 201 are the same. If they are, the sound intensity of each captured voice content is analyzed to obtain a ranking by sound intensity, and the voice content with the greatest sound intensity is selected. The closer the microphone is to the speaker, the greater the sound intensity of the captured speech, so this ensures that the current speech content is taken from the acquisition module 201 that is closest to, and corresponds to, the speaker. The correspondence therefore remains one-to-one, the captured voice data is not confused, and speech captured by mistake by the other acquisition modules 201 is not used.
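The selection by sound intensity can be sketched as follows. The embodiment does not specify how intensity is measured or how two voice contents are judged to be the same; this sketch assumes root-mean-square amplitude as the intensity measure and exact equality of the recognized text as the sameness test. The names `rms` and `select_loudest` are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square amplitude, used here as an assumed measure of sound intensity."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def select_loudest(captures):
    """For each distinct recognized text, keep the module that captured it loudest.

    captures: list of (module_id, recognized_text, samples) tuples.
    Returns a dict mapping recognized_text -> module_id.
    """
    best = {}
    for module_id, text, samples in captures:
        level = rms(samples)
        if text not in best or level > best[text][1]:
            best[text] = (module_id, level)
    return {text: module_id for text, (module_id, _level) in best.items()}


# Assumed sample data: modules 1 and 2 both hear the same sentence, module 1 louder.
captures = [
    (1, "hello everyone", [0.5, -0.5, 0.5, -0.5]),
    (2, "hello everyone", [0.1, -0.1, 0.1, -0.1]),
    (3, "next item", [0.3, -0.3, 0.3, -0.3]),
]
selected = select_loudest(captures)
```

A practical system would likely compare the recognized content with some tolerance for recognition errors rather than by exact string equality.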

The identity comparison module 203 establishes a voice feature model for each of the plurality of participants according to the participants' identity information and initial voiceprint features, and the selected voice content is input into the voice feature models to obtain the identity matching result. The identity matching result associates the captured speech one-to-one with the identity information of the person who spoke it. This ensures that the captured speech corresponds to the speaker, keeps the information accurate, and keeps the speech captured during the conference in order.
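One possible form of the matching step is sketched below. The embodiment does not define the voice feature model. This sketch assumes that each participant's model is a single enrolled feature vector and that matching is done by cosine similarity with an acceptance threshold. The vectors, the threshold value and the function names are illustrative assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_identity(models, features, threshold=0.8):
    """Return the enrolled identity most similar to the features, or None.

    models: dict mapping identity -> enrolled voiceprint feature vector.
    features: feature vector extracted from the selected speech.
    threshold: assumed minimum similarity for accepting a match.
    """
    best_name, best_score = None, threshold
    for name, enrolled in models.items():
        score = cosine(enrolled, features)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Assumed toy voiceprints for two participants.
models = {"A": [1.0, 0.1, 0.0], "B": [0.0, 1.0, 0.2]}
result = match_identity(models, [0.9, 0.2, 0.0])
unknown = match_identity(models, [0.0, 0.0, 1.0])
```

Returning `None` when no enrolled voiceprint is similar enough keeps an unknown voice from being forced onto a registered participant.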

The confirmation module 204 determines, according to the acquisition module 201, whether the identity information matches the identity matching result, and when it does not, selects the same voice content from the acquisition module 201 that corresponds to the identity matching result. Participants may move during the conference. The speech captured by an acquisition module 201 varies with its distance from the participant, so the captured sound intensity changes, and the acquisition module 201 may then be associated with the wrong participant. After the identity matching result is obtained, the identity comparison module 203 feeds it back to the acquisition module 201 corresponding to the participant, and the confirmation module 204 compares the identity information acquired by that acquisition module 201 with the identity matching result. If they do not match, the acquisition module 201 with the second-highest sound intensity is selected, and the confirmation module 204 repeats the comparison with the identity information acquired by the replacement acquisition module 201. This continues until a match is found, and the matching acquisition module 201 is determined to be the one corresponding to the participant. Confirming the association between each acquisition module 201 and its participant keeps the acquisition modules 201 in one-to-one correspondence with the participants at all times. When a participant's position changes, the corresponding acquisition module 201 still provides that participant's speech content, which improves the accuracy of the captured speech content.
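The fallback described above can be sketched as a loop over the candidate acquisition modules in descending order of sound intensity. The data shapes and the name `confirm_module` are assumptions for illustration, not part of the embodiment.

```python
def confirm_module(candidates, registered_identity, matched_identity):
    """Pick the loudest module whose registered identity equals the match result.

    candidates: list of (module_id, sound_intensity) for modules that captured
        the same voice content.
    registered_identity: dict mapping module_id -> identity registered at that module.
    matched_identity: identity returned by the identity comparison step.
    Returns the confirmed module_id, or None if no candidate matches.
    """
    by_intensity = sorted(candidates, key=lambda c: c[1], reverse=True)
    for module_id, _intensity in by_intensity:
        if registered_identity.get(module_id) == matched_identity:
            return module_id  # first match in loudness order
    return None


# Assumed scenario: participant B has moved next to A's microphone (module 1),
# so module 1 is loudest, but the voiceprint match says the speaker is B.
candidates = [(1, 0.9), (2, 0.6), (3, 0.2)]
registered_identity = {1: "A", 2: "B", 3: "C"}
confirmed = confirm_module(candidates, registered_identity, "B")
```

In this scenario the loudest module is rejected because it is registered to A, and the second-loudest module, registered to B, is confirmed.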

In some implementations of this embodiment, the conference voice data processing system 200 further includes a text conversion module 205, which is configured to denoise the speaking voice and convert the denoised speaking voice into text information.

Optionally, the text conversion module 205 may denoise the speech by adaptive filtering, spectral subtraction, or Wiener filtering. Denoising mainly reduces or eliminates background sounds at the meeting place, such as other people's voices and music, and enhances the characteristic parts of the speaker's voice, which improves recognition accuracy. After denoising, the speech is converted into text information by a speech recognition method, which facilitates recording and later review of the meeting content. The speech recognition method may be any conventional method known in the prior art.

In some implementations of this embodiment, the conference voice data processing system 200 further includes a storage module 206, which is configured to record the text information of the different participants by using the identity matching result.

In the above embodiment, the identity matching result obtained by the identity comparison module 203 is used to associate each speaker with the speech that he or she utters. After the speech content is converted into text information, the storage module 206 assigns the converted text information to the corresponding person according to the identity matching result. For example, once all speech uttered by participant A has been converted into text information, that text is filed under A. After the speech of the plurality of participants has been converted into text information, each piece of text information is matched with its participant, which makes later filing and arrangement of the conference record easier and reduces the workload of the person taking the minutes. In this embodiment, the speech itself may be recorded together with the text information so that it can be retrieved later.

In some implementations of this embodiment, the conference voice data processing system 200 further includes a conference recording module 207, which is configured to sort the plurality of pieces of text information by speaking time to generate a conference record.

In detail, the conference recording module 207 is configured to sort the plurality of pieces of text information by speaking time to generate a conference record. When the acquisition module 201 acquires speech, the current speaking time is recorded. The text conversion module 205 converts the speech into text information and sends it to the conference recording module 207, which orders the converted text information according to the recorded time of the speech from which it was converted. Once the speech of the plurality of participants has been converted into text information, the pieces are sorted by their respective recorded times and combined into the conference record of the whole conference. This ensures that the final conference record follows the timeline of the conference. When the record is reviewed later, the exchanges between the different participants can be followed in time order, and the corresponding text information can be located by its time point.

Example 3

Referring to fig. 3, fig. 3 is a schematic structural block diagram of an electronic device according to an embodiment of the present application. The electronic device comprises a memory 101, a processor 102 and a communication interface 103, which are electrically connected to one another, directly or indirectly, to enable data transmission or interaction. For example, the components may be electrically connected to each other via one or more communication buses or signal lines. The memory 101 may be used to store software programs and modules, such as the program instructions/modules corresponding to the conference voice data processing system 200 provided in the embodiments of the present application, and the processor 102 executes the software programs and modules stored in the memory 101 to perform various functional applications and data processing. The communication interface 103 may be used for communicating signaling or data with other node devices.

The memory 101 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like.

The processor 102 may be an integrated circuit chip having signal processing capabilities. The processor 102 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

It will be appreciated that the configuration shown in fig. 2 is merely illustrative, and that the conference voice data processing system 200 may include more or fewer components than shown in fig. 2, or may have a different configuration from that shown in fig. 2. The components shown in fig. 2 may be implemented in hardware, software, or a combination thereof.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.

The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

In summary, in the conference voice data processing method and system provided by the embodiments of the present application, the plurality of acquisition modules 201 acquire the identity information and the initial voiceprint features of the participants and also acquire their speaking voice, which makes it easier to keep track of participant information and to record the conference. Whether the voice contents of the plurality of speaking voices are the same is recognized and judged, and when they are the same, the sound intensities of the plurality of voice contents are analyzed and the voice content with the greatest sound intensity is selected. This ensures that the current speech content is taken from the acquisition module 201 that is closest to, and corresponds to, the speaker, keeps the correspondence one-to-one, prevents the other acquisition modules 201 from capturing the speech by mistake, and avoids confusion in the acquired voice data. Voice feature models of the plurality of participants are established according to the identity information and the initial voiceprint features, and the selected speaking voice is input into the voice feature models to obtain an identity matching result, so that the acquired speaking voice can be associated with the participant who uttered it, which facilitates comparison and recording of information and prevents confusion. Whether the identity information matches the identity matching result is judged according to the acquisition module 201, and when it does not match, the same voice content of the corresponding acquisition module 201 is selected according to the identity matching result. The association between each acquisition module 201 and its corresponding participant is thereby confirmed, so that the plurality of acquisition modules 201 always remain in one-to-one correspondence with the plurality of participants. When a participant's position changes, the corresponding acquisition module 201 can still acquire that participant's speech content, which improves the accuracy of the acquired speech content.

The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
