Conference voice data acquisition method and system

Document No.: 193361. Published: 2021-11-02.

Abstract: "Conference voice data acquisition method and system" (一种会议语音数据采集方法及系统) was designed by 王钰勋 on 2021-09-03. The invention provides a conference voice data acquisition method and system in the technical field of intelligent voice recognition. The method comprises: collecting initial voice information and converting it into an audio signal; eliminating invalid voice signals from the audio signal through a filter to obtain a valid voice signal; performing analog-to-digital conversion on the valid voice signal and sending it to a voice information processing module to obtain processed voice information; dividing the processed voice information into multiple sections by time interval; extracting voice features from the sections and classifying the sections with the same voice features to obtain classified voice information; and arranging the classified voice information in time sequence and performing digital-to-analog conversion to obtain the voice data of each person in the conference. The invention can improve the accuracy of voice collection in a conference and can distinguish the voice information of the individual participants.

1. A conference voice data acquisition method is characterized by comprising the following steps:

step S01, collecting initial voice information and converting the initial voice information into an audio signal;

step S02, eliminating invalid voice signals in the audio signals through a filter to obtain valid voice signals;

step S03, performing analog-to-digital conversion on the valid voice signal and sending it to a voice information processing module to obtain processed voice information;

step S04, dividing the processed voice information into a plurality of sections of voice information through time intervals;

step S05, extracting the voice characteristics of the multiple sections of voice information, and classifying the voice information with the same voice characteristics to obtain classified voice information;

and step S06, arranging the classified voice information according to time sequence, and then performing digital-to-analog conversion to obtain voice data of each person in the conference.

2. The method of claim 1, wherein the invalid speech signal comprises an ambient noise and a blank speech signal having a time duration greater than a predetermined time threshold.

3. The method as claimed in claim 1, wherein the valid voice signal includes human voice information in the initial voice information.

4. The conference voice data collection method according to claim 3, wherein the human voice information includes voice information of all persons in a conference.

5. The method for acquiring conference voice data according to claim 1, wherein the step S03 further comprises sending the valid voice signal after analog-to-digital conversion to an external storage unit.

6. A conference voice data acquisition system, comprising:

the voice acquisition module is used for acquiring initial voice information of conference personnel and converting the initial voice information into an audio signal;

the filter module is used for eliminating invalid voice signals in the audio signals so as to obtain valid voice signals;

the analog-to-digital conversion module is used for converting the valid voice signal into a valid voice digital signal;

the voice segmentation module is used for dividing the valid voice digital signal into a plurality of sections of voice information through a time interval;

the voice extraction module is used for extracting voice characteristics of the multiple sections of voice information and classifying the voice information with the same voice characteristics;

the digital-to-analog conversion module is used for converting the classified voice information into an analog signal to finish the acquisition of voice data;

the voice acquisition module is connected with the filter module, the analog-to-digital conversion module, the voice segmentation module, the voice extraction module and the digital-to-analog conversion module in sequence.

7. The system as claimed in claim 6, wherein a voice information processing module is connected between the analog-to-digital conversion module and the voice segmentation module, and the voice information processing module is configured to perform data processing on the valid voice digital signal so as to increase a signal processing speed.

8. The system as claimed in claim 7, wherein the voice information processing module is connected to an external storage unit, and the external storage unit is used for storing voice data in real time for later management.

9. The system according to claim 6, wherein the voice extracting module and the digital-to-analog converting module are connected to a voice arranging module, and the voice arranging module is configured to arrange the classified voice information according to a time sequence.

10. The system as claimed in claim 6, wherein the digital-to-analog conversion module is connected to a power amplification module, and the power amplification module is configured to amplify the converted analog signal.

Technical Field

The invention relates to the technical field of intelligent voice recognition, in particular to a conference voice data acquisition method and system.

Background

With the development and popularization of artificial intelligence and communication technology, more and more enterprises and users adopt the audio and video conference system to carry out local and multiparty conference communication. The application of the audio and video conference not only greatly reduces the user communication cost and time, but also improves the production and working efficiency of enterprises and users; meanwhile, in the audio and video conference system, more and more artificial intelligent algorithms of images and voices are adopted, such as face recognition, OCR, voice recognition, voiceprint recognition, role separation, sound source separation and the like, and the efficiency of the digital conference summary is further improved.

Existing conference systems need to capture the voice of the conference conversation, which involves a number of voice-related technologies. The accuracy of voice acquisition is difficult to bring to a practical level because of factors such as the quality of the captured speech, spectral attenuation under far-field conditions, reverberation that depends on the size of the conference room, echo caused when loudspeaker output is picked up by the microphone, overlapping speech when several participants talk at the same time, and other environmental noise.

Disclosure of Invention

The invention aims to provide a conference voice data acquisition method and system, which can improve the accuracy of acquiring voice information in a conference system and can specifically distinguish the voice information of each participant.

The embodiment of the invention is realized by the following steps:

in a first aspect, an embodiment of the present application provides a conference voice data acquisition method, which includes the following steps:

step S01, collecting initial voice information and converting the initial voice information into an audio signal;

step S02, eliminating the invalid voice signal in the audio signal through a filter to obtain a valid voice signal;

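step S03, performing analog-to-digital conversion on the valid voice signal and sending it to a voice information processing module to obtain processed voice information;

step S04, dividing the processed voice information into a plurality of sections of voice information through time intervals;

step S05, extracting the voice characteristics of the multiple sections of voice information, and classifying the voice information with the same voice characteristics to obtain classified voice information;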

and step S06, arranging the classified voice information according to time sequence, and then performing digital-to-analog conversion to obtain voice data of each person in the conference.

The conference voice data acquisition method uses the filter to eliminate invalid voice signals, such as environmental noise, in a conference system, and treats the remaining voice information of the participants as valid voice information. By classifying voice information with the same characteristics, the voice information of the same person is grouped into one class, so the method can identify which participant produced each piece of voice information. By dividing the voice information into time intervals and arranging it in time sequence, the method can further determine in which time period of the conference each participant spoke.

In some embodiments of the present invention, the invalid speech signal includes an ambient noise and a blank speech signal having a time length greater than a predetermined time threshold.

In some embodiments of the present invention, the valid voice signal includes human voice information in the initial voice information.

In some embodiments of the present invention, the human voice information includes voice information of all persons in the conference.

In some embodiments of the present invention, the step S03 further includes sending the analog-to-digital converted valid voice signal to an external storage unit.

In a second aspect, an embodiment of the present application provides a conference voice data acquisition system, which includes:

the voice acquisition module is used for acquiring initial voice information of conference personnel and converting the initial voice information into an audio signal;

the filter module is used for eliminating invalid voice signals in the audio signals so as to obtain valid voice signals;

the analog-to-digital conversion module is used for converting the valid voice signal into a valid voice digital signal;

the voice segmentation module is used for dividing the valid voice digital signal into a plurality of sections of voice information through a time interval;

the voice extraction module is used for extracting voice characteristics of the multiple sections of voice information and classifying the voice information with the same voice characteristics;

the digital-to-analog conversion module is used for converting the classified voice information into an analog signal to finish the acquisition of voice data.

In some embodiments of the present invention, a voice information processing module is connected between the analog-to-digital conversion module and the voice segmentation module, and the voice information processing module is configured to perform data processing on the valid voice digital signal so as to improve a signal processing speed.

In some embodiments of the present invention, the voice information processing module is connected to an external storage unit, and the external storage unit is used for storing voice data in real time, so as to facilitate management in the future.

In some embodiments of the present invention, the voice extracting module and the digital-to-analog converting module are connected to a voice arranging module, and the voice arranging module is configured to arrange the classified voice information according to a time sequence.

In some embodiments of the present invention, the digital-to-analog conversion module is connected to a power amplification module, and the power amplification module is configured to perform signal amplification on the converted analog signal.

Compared with the prior art, the embodiment of the invention has at least the following advantages or beneficial effects:

the invention provides a conference voice data acquisition method and a conference voice data acquisition system, which comprise the following steps: step S01, collecting initial voice information and converting the initial voice information into an audio signal; step S02, eliminating the invalid voice signal in the audio signal through a filter to obtain a valid voice signal; step S03, the effective voice signal is sent to a voice information processing module after being subjected to analog-to-digital conversion to obtain processed voice information; step S04, dividing the processed voice information into a plurality of pieces of voice information by time intervals; step S05, extracting the voice characteristics of the multiple sections of voice information, and classifying the voice information with the same voice characteristics to obtain classified voice information; and step S06, arranging the classified voice information according to time sequence, and then performing digital-to-analog conversion to obtain voice data of each person in the conference.

According to the conference voice data acquisition method and system, invalid voice signals such as environmental noise in a conference system can be eliminated through a filter, and the remaining voice information of the participants is treated as valid voice information. Classifying the voice information with the same characteristics identifies which participant produced each piece of voice information, and dividing by time interval and arranging in time sequence determines in which time period of the conference each participant spoke, so that the accuracy of voice information acquisition for the participants in the conference system is greatly improved.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.

Fig. 1 is a flowchart of a conference voice data acquisition method according to embodiment 1 of the present invention;

fig. 2 is a schematic structural diagram of a conference voice data acquisition system according to embodiment 2 of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.

Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.

It should be noted that, in this document, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.

In the description of the present application, it is also to be noted that, unless otherwise explicitly specified or limited, the term "connected" is to be interpreted broadly, e.g. as a fixed connection, a detachable connection, or an integral connection; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meaning of the above terms in the present application can be understood in a specific case by those of ordinary skill in the art.

Some embodiments of the present application will be described in detail below with reference to the accompanying drawings. The embodiments described below and the individual features of the embodiments can be combined with one another without conflict.

Example 1

Referring to Fig. 1, Fig. 1 is a flowchart of a conference voice data acquisition method according to an embodiment of the present application.

A conference voice acquisition method comprises the following steps:

step S01, collecting initial voice information and converting the initial voice information into an audio signal;

step S02, eliminating the invalid voice signal in the audio signal through a filter to obtain a valid voice signal;

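step S03, performing analog-to-digital conversion on the valid voice signal and sending it to the voice information processing module to obtain the processed voice information;

step S04, dividing the processed voice information into a plurality of sections of voice information through time intervals;

step S05, extracting the voice characteristics of the multiple sections of voice information, and classifying the voice information with the same voice characteristics to obtain classified voice information;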

and step S06, arranging the classified voice information according to time sequence, and then performing digital-to-analog conversion to obtain the voice data of each person in the conference.

According to the conference voice data acquisition method provided by embodiment 1 of the application, the filter eliminates invalid voice signals such as environmental noise in a conference system, and the remaining voice information of the participants is treated as valid voice information. Voice information with the same characteristics is classified together, which identifies the participant who produced it. The voice information is then arranged and converted according to time intervals and time sequence, so that it is clear in which time period of the conference each participant spoke, and the accuracy of voice information acquisition for the participants in the conference system is greatly improved.
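Steps S04 to S06 can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the source does not say which voice characteristics are extracted, so the zero-crossing rate is used here only as a toy stand-in, and the names `split_by_time`, `zero_crossing_rate`, `classify_and_order`, `segment_len`, and `tolerance` are hypothetical. A practical system would more likely compare voiceprint features.

```python
from typing import Dict, List, Tuple

Segment = Tuple[int, List[float]]  # (time number, samples)


def split_by_time(samples: List[float], segment_len: int) -> List[Segment]:
    """Step S04 sketch: cut the signal into consecutive, time-numbered sections."""
    return [(i // segment_len, samples[i:i + segment_len])
            for i in range(0, len(samples), segment_len)]


def zero_crossing_rate(samples: List[float]) -> float:
    """Toy stand-in for a voice feature: share of adjacent pairs that change sign."""
    if len(samples) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    return crossings / (len(samples) - 1)


def classify_and_order(segments: List[Segment],
                       tolerance: float) -> Dict[int, List[Segment]]:
    """Steps S05 and S06 sketch: group sections whose features match, then sort by time."""
    references: List[float] = []            # first-seen feature for each speaker label
    groups: Dict[int, List[Segment]] = {}
    for seg in segments:
        feature = zero_crossing_rate(seg[1])
        for label, ref in enumerate(references):
            if abs(feature - ref) <= tolerance:
                break                       # feature matches an existing label
        else:
            label = len(references)         # no match: open a new label
            references.append(feature)
        groups.setdefault(label, []).append(seg)
    for segs in groups.values():
        segs.sort(key=lambda s: s[0])       # time sequence within each label
    return groups
```

Each key of the returned dictionary stands for one participant, and the list under it holds that participant's sections in order of time number, which is the form step S06 passes on to digital-to-analog conversion.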

As a preferred embodiment, the invalid voice signal includes ambient noise and a blank voice signal having a time length greater than a predetermined time threshold.
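The blank-signal part of this rule can be sketched as follows. The sketch is an assumption-laden illustration rather than the filter of the embodiment: it works on already-sampled frames, treats a frame as blank when its mean squared amplitude falls below a floor, and drops only runs of blank frames longer than a threshold. The names `frame_energy`, `drop_long_blanks`, `energy_floor`, and `max_blank_frames` are hypothetical, and removing ambient noise would need a separate filtering stage that is not shown.

```python
from typing import List


def frame_energy(frame: List[float]) -> float:
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame) if frame else 0.0


def drop_long_blanks(frames: List[List[float]],
                     energy_floor: float,
                     max_blank_frames: int) -> List[List[float]]:
    """Keep voiced frames and short pauses; drop blank runs longer than the threshold.

    energy_floor and max_blank_frames are hypothetical tuning parameters.
    """
    kept: List[List[float]] = []
    blank_run: List[List[float]] = []
    for frame in frames:
        if frame_energy(frame) < energy_floor:
            blank_run.append(frame)          # still inside a blank stretch
            continue
        if len(blank_run) <= max_blank_frames:
            kept.extend(blank_run)           # short pause: keep it with the speech
        blank_run = []
        kept.append(frame)
    if len(blank_run) <= max_blank_frames:
        kept.extend(blank_run)               # trailing short pause
    return kept
```

Keeping short pauses preserves the natural gaps inside a sentence, while a blank stretch longer than the threshold is treated as invalid and removed.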

As a preferred embodiment, the valid speech signal includes human speech information in the initial speech information.

As a preferred embodiment, the human voice information includes voice information of all persons in the conference.

As a preferred embodiment, step S03 further includes sending the analog-to-digital converted valid voice signal to an external storage unit.
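As a rough software analogue of this step, the sketch below quantizes samples to 16-bit values and writes them as a WAV stream using Python's standard `wave` and `struct` modules. It models only the quantization stage of analog-to-digital conversion; the function names `quantize_16bit` and `store_as_wav`, the 16-bit mono format, and the choice of WAV as the storage format are assumptions made for illustration and are not stated in the source.

```python
import io
import struct
import wave
from typing import List


def quantize_16bit(samples: List[float]) -> List[int]:
    """Map samples in [-1.0, 1.0] to signed 16-bit integers, clipping out-of-range values."""
    return [int(max(-1.0, min(1.0, x)) * 32767) for x in samples]


def store_as_wav(samples: List[float], sample_rate: int, target) -> None:
    """Write mono 16-bit PCM to `target` (a path or a binary file-like object)."""
    pcm = quantize_16bit(samples)
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)              # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack("<%dh" % len(pcm), *pcm))


# Usage: an in-memory buffer stands in for the external storage unit.
storage = io.BytesIO()
store_as_wav([0.0, 0.5, -0.5], 8000, storage)
```

Passing a file path instead of the buffer would write the same data to disk.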

It will be appreciated that the flow shown in Fig. 1 is merely illustrative, and that a conference voice data acquisition method may include more or fewer steps than shown in Fig. 1, or a different arrangement of steps. The steps shown in Fig. 1 may be implemented in hardware, software, or a combination thereof.

Example 2

Referring to fig. 2, fig. 2 is a schematic structural diagram of a conference voice data acquisition system according to embodiment 2.

A conference voice data acquisition system comprising:

the voice acquisition module is used for acquiring initial voice information of conference personnel and converting the initial voice information into an audio signal;

the filter module is used for eliminating invalid voice signals in the audio signals so as to obtain valid voice signals;

the analog-to-digital conversion module is used for converting the valid voice signal into a valid voice digital signal;

the voice segmentation module is used for dividing the valid voice digital signal into a plurality of sections of voice information through a time interval;

the voice extraction module is used for extracting voice characteristics of a plurality of sections of voice information and classifying the voice information with the same voice characteristics;

the digital-to-analog conversion module is used for converting the classified voice information into an analog signal to finish the acquisition of voice data;

the voice acquisition module is connected with the filter module, the filter module is connected with the analog-to-digital conversion module, the analog-to-digital conversion module is connected with the voice segmentation module, the voice segmentation module is connected with the voice extraction module, and the voice extraction module is connected with the digital-to-analog conversion module.

In the conference voice data acquisition system provided in embodiment 2 of the present application, the voice acquisition module first collects the initial voice information of the conference participants and converts it into an audio signal. The filter module then filters the audio signal, eliminating invalid voice signals (environmental noise and blank voice signals whose time length is greater than a predetermined time threshold) to obtain the valid voice signal (the human voice information in the initial voice information). The analog-to-digital conversion module converts the valid voice signal into a valid voice digital signal, which the voice segmentation module divides by time interval: each valid voice digital signal is given a time number, and the signals of the same time section are grouped together, yielding a plurality of sections of voice information with time numbers. The voice extraction module extracts the voice characteristics of these time-numbered sections and classifies the sections with the same voice characteristics, so that the classified voice information can be arranged in order of time number. Finally, the digital-to-analog conversion module converts the classified digital voice signals into analog voice signals, completing the acquisition of voice data in the conference. The voice of each participant is thus distinguished by time period, so that it is clear which participant spoke in which time period of the conference, and the accuracy of voice information acquisition for the participants in the conference system is greatly improved.
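The sequential connection of the modules described above can be pictured as a chain of functions in which each module consumes the output of the previous one. The sketch below is only an architectural illustration: `connect`, `filter_module`, and `adc_module` are hypothetical placeholders with deliberately simple bodies, and only two of the modules are shown.

```python
from typing import Callable, List, Sequence

Stage = Callable[[List[float]], List[float]]


def connect(stages: Sequence[Stage]) -> Stage:
    """Chain modules so that each one consumes the previous module's output."""
    def run(signal: List[float]) -> List[float]:
        for stage in stages:
            signal = stage(signal)
        return signal
    return run


# Hypothetical placeholder modules; real ones would be hardware or DSP routines.
def filter_module(signal: List[float]) -> List[float]:
    return [x for x in signal if abs(x) >= 0.05]     # drop near-silent samples


def adc_module(signal: List[float]) -> List[float]:
    return [round(x * 127) / 127 for x in signal]    # coarse quantization to 127 levels per polarity


system = connect([filter_module, adc_module])
```

Further modules, such as segmentation or feature extraction, would be appended to the list in the order in which they are connected.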

As a preferred embodiment, a voice information processing module is connected between the analog-to-digital conversion module and the voice segmentation module, and the voice information processing module is configured to perform data processing on the valid voice digital signal so as to increase the signal processing speed.

As a preferred embodiment, the voice information processing module is connected to an external storage unit, and the external storage unit is used for storing voice data in real time, so that management in the future is facilitated.

As a preferred embodiment, the voice extracting module and the digital-to-analog converting module are connected to a voice arranging module, and the voice arranging module is configured to arrange the classified voice information according to a time sequence.

As a preferred embodiment, the digital-to-analog conversion module is connected to a power amplification module, and the power amplification module is configured to amplify the converted analog signal.
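In the embodiment the power amplification module acts on the analog signal after conversion. As a digital stand-in for illustration only, the sketch below applies a gain expressed in decibels and clips the result to a fixed range; the function name `amplify` and the parameters `gain_db` and `limit` are assumptions, not part of the source.

```python
from typing import List


def amplify(samples: List[float], gain_db: float, limit: float = 1.0) -> List[float]:
    """Scale samples by a gain in decibels and clip the result to +/- limit."""
    factor = 10 ** (gain_db / 20.0)      # amplitude ratio corresponding to a dB gain
    return [max(-limit, min(limit, x * factor)) for x in samples]
```

A gain of 20 dB multiplies the amplitude by ten, so samples that would exceed the limit after scaling are held at the limit instead of overflowing.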

In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.

The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application in essence, or the part of it that contributes over the prior art, or a part of the technical solution, may be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.

To sum up, according to the conference voice data acquisition method and system provided by the embodiments of the application, the filter eliminates invalid voice signals such as environmental noise in a conference system, and the remaining voice information of the participants is treated as valid voice information. Voice information with the same characteristics is classified together, that is, the voice information of the same person is grouped into one class, which identifies the participant who produced each piece of voice information. Arranging and converting the voice information by time interval and time sequence then makes clear in which time period each participant spoke, so that the accuracy of voice information acquisition for the participants in the conference system is greatly improved.

The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
