Voice equipment control method, system, medium and voice equipment

Document number: 154829. Publication date: 2021-10-26.

Abstract: This technology, "Voice equipment control method, system, medium and voice equipment", was designed by 刘修伦 on 2021-07-21. The application provides a voice device control method, comprising: acquiring a multimedia audio signal output by a display device; collecting environmental sound data by using a microphone array to generate a corresponding environmental sound signal, wherein the environmental sound data comprises a voice operation instruction and sound data of the display device; and, if the multimedia audio signal and the environmental sound signal have signal consistency in the same time period, performing echo cancellation on the environmental sound signal based on the multimedia audio signal to obtain the voice operation instruction, so that the voice device executes the operation corresponding to the voice operation instruction. The application can effectively enhance the echo cancellation effect and improve the signal-to-noise ratio of the sound signal, thereby improving user experience. The application also provides a voice device control system, a computer-readable storage medium and a voice device, which have the same beneficial effects.

1. A voice device control method, comprising:

acquiring a multimedia audio signal output by display equipment;

collecting environmental sound data by using a microphone array to generate a corresponding environmental sound signal; the environment sound data comprises voice operation instructions and sound data of the display device;

and if the multimedia audio signal and the environmental sound signal have signal consistency in the same time period, performing echo cancellation on the environmental sound signal based on the multimedia audio signal to obtain the voice operation instruction, so that the voice equipment executes the operation corresponding to the voice operation instruction.

2. The voice device control method according to claim 1, wherein acquiring the multimedia audio signal output by the display device comprises:

acquiring multimedia data output by display equipment;

performing signal separation on the multimedia data to obtain multimedia audio data;

and performing frequency domain conversion on the multimedia audio data to obtain a multimedia audio signal.

3. The speech device control method according to claim 1, wherein generating the corresponding ambient sound signal comprises:

performing impedance matching on the environmental sound data to obtain a sound signal to be processed;

and amplifying the voltage amplitude of the sound signal to be processed by using an amplifying circuit to obtain an environment sound signal corresponding to the environment sound data.

4. The voice device control method according to claim 1, characterized by further comprising:

and if the multimedia audio signal and the environmental sound signal do not have signal consistency in the same time period, generating a noise prompt.

5. The speech device control method according to any one of claims 1 to 4, wherein before performing echo cancellation on the environmental sound signal based on the multimedia audio signal to obtain the speech operation instruction, the method further comprises:

and converting the environmental sound signal from a time domain to a frequency domain, and calculating the consistency difference between the multimedia audio signal and the environmental sound signal in the same time period.

6. The voice device control method according to claim 5, wherein calculating the consistency difference between the multimedia audio signal and the environmental sound signal in the same time period comprises:

intercepting the same number of first and second sample signals within the multimedia audio signal and the ambient sound signal, respectively; wherein the sample signal comprises a frequency, an amplitude and a phase of a signal;

calculating an amplitude difference and a phase difference of the first sample signal and the second sample signal at the same frequency;

if the amplitude difference and the phase difference both meet corresponding confidence intervals under the target frequency, judging that the multimedia audio signal and the environmental sound signal are credible under the target frequency;

and determining confidence level according to the credible frequency number, and determining that the multimedia audio signal and the environmental sound signal have signal consistency when the confidence level is greater than a preset confidence level.

7. The speech device control method according to claim 5, wherein performing echo cancellation on the environmental sound signal based on the multimedia audio signal to obtain the speech operation command comprises:

and taking the multimedia audio signal as a reference signal, and performing echo cancellation on the environmental sound signal by using a voice self-adaptive echo cancellation algorithm to obtain a voice operation instruction.

8. A voice device control system, comprising:

the multimedia signal acquisition module is used for acquiring a multimedia audio signal output by the display equipment;

the environment sound acquisition module is used for acquiring environment sound data by using the microphone array and generating a corresponding environment sound signal; the environment sound data comprises voice operation instructions and sound data of the display device;

and the signal comparison control module is used for carrying out echo cancellation on the environment sound signal based on the multimedia audio signal if the multimedia audio signal and the environment sound signal have signal consistency in the same time period to obtain the voice operation instruction so that the voice equipment can execute the operation corresponding to the voice operation instruction.

9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the speech device control method according to any one of claims 1 to 7.

10. A voice device comprising a memory in which a computer program is stored and a processor which, when calling the computer program in the memory, implements the steps of the voice device control method according to any one of claims 1 to 7.

Technical Field

The present application relates to the field of acoustics, and in particular, to a method and a system for controlling a voice device, a computer-readable storage medium, and a voice device.

Background

At present, some sound boxes with voice functions on the market can be connected to a household television or other display device to achieve a better user experience. However, when the television acts as a sound source, the sound box and the television are two independent devices, and the sound box cannot capture the television's sound signal as a loopback reference. As a result, the sound box cannot tell which sound is noise and which is the voice control source, and its wake-up rate suffers. When other noise is also present in the environment, the wake-up performance of the sound box degrades further, which affects the user experience.

Disclosure of Invention

The application aims to provide a voice device control method, a voice device control system, a computer-readable storage medium and a voice device, which reduce the interference of multimedia audio with voice control of the voice device by comparing the consistency of the multimedia audio signal and the environmental sound.

In order to solve the above technical problem, the present application provides a voice device control method, which has the following specific technical scheme:

acquiring a multimedia audio signal output by display equipment;

collecting environmental sound data by using a microphone array to generate a corresponding environmental sound signal; the environment sound data comprises voice operation instructions and sound data of the display device;

and if the multimedia audio signal and the environmental sound signal have signal consistency in the same time period, performing echo cancellation on the environmental sound signal based on the multimedia audio signal to obtain the voice operation instruction, so that the voice equipment executes the operation corresponding to the voice operation instruction.

Optionally, the acquiring the multimedia audio signal output by the display device includes:

acquiring multimedia data output by display equipment;

performing signal separation on the multimedia data to obtain multimedia audio data;

and performing frequency domain conversion on the multimedia audio data to obtain a multimedia audio signal.

Optionally, generating the corresponding ambient sound signal includes:

performing impedance matching on the environmental sound data to obtain a sound signal to be processed;

and amplifying the voltage amplitude of the sound signal to be processed by using an amplifying circuit to obtain an environment sound signal corresponding to the environment sound data.

Optionally, the method further includes:

and if the multimedia audio signal and the environmental sound signal do not have signal consistency in the same time period, generating a noise prompt.

Optionally, before performing echo cancellation on the environmental sound signal based on the multimedia audio signal and obtaining a voice operation instruction, the method further includes:

and converting the environmental sound signal from a time domain to a frequency domain, and calculating the consistency difference between the multimedia audio signal and the environmental sound signal in the same time period.

Optionally, calculating a difference between the consistency of the multimedia audio signal and the environmental sound signal in the same time period includes:

intercepting the same number of first and second sample signals within the multimedia audio signal and the ambient sound signal, respectively; wherein the sample signal comprises a frequency, an amplitude and a phase of a signal;

calculating an amplitude difference and a phase difference of the first sample signal and the second sample signal at the same frequency;

if the amplitude difference and the phase difference both meet corresponding confidence intervals under the target frequency, judging that the multimedia audio signal and the environmental sound signal are credible under the target frequency;

and determining confidence level according to the credible frequency number, and determining that the multimedia audio signal and the environmental sound signal have signal consistency when the confidence level is greater than a preset confidence level.

Optionally, performing echo cancellation on the environmental sound signal based on the multimedia audio signal, and obtaining a voice operation instruction includes:

and taking the multimedia audio signal as a reference signal, and performing echo cancellation on the environmental sound signal by using a voice self-adaptive echo cancellation algorithm to obtain a voice operation instruction.

The present application further provides a voice device control system, including:

the multimedia signal acquisition module is used for acquiring a multimedia audio signal output by the display equipment;

the environment sound acquisition module is used for acquiring environment sound data by using the microphone array and generating a corresponding environment sound signal; the environment sound data comprises voice operation instructions and sound data of the display device;

and the signal comparison control module is used for carrying out echo cancellation on the environment sound signal based on the multimedia audio signal if the multimedia audio signal and the environment sound signal have signal consistency in the same time period to obtain the voice operation instruction so that the voice equipment can execute the operation corresponding to the voice operation instruction.

The present application also provides a computer-readable storage medium having stored thereon a computer program which, when being executed by a processor, carries out the steps of the method as set forth above.

The present application further provides a speech device comprising a memory and a processor, wherein the memory stores a computer program, and the processor implements the steps of the method as described above when calling the computer program in the memory.

The application provides a voice device control method, which comprises the following steps: acquiring a multimedia audio signal output by display equipment; collecting environmental sound data by using a microphone array to generate a corresponding environmental sound signal; the environment sound data comprises voice operation instructions and sound data of the display device; and if the multimedia audio signal and the environmental sound signal have signal consistency in the same time period, performing echo cancellation on the environmental sound signal based on the multimedia audio signal to obtain the voice operation instruction, so that the voice equipment executes the operation corresponding to the voice operation instruction.

The present application acquires the multimedia audio signal output by the display device and the environmental sound data collected by the microphone array, which includes the sound emitted by the display device as well as other sounds, and then calculates the signal consistency of the two signals over the same time period. If the two have signal consistency, echo cancellation can be performed directly on the environmental sound signal acquired by the microphone array based on the multimedia audio signal. This can effectively enhance the echo cancellation effect and improve the signal-to-noise ratio of the sound signal, thereby improving user experience.

The application also provides a voice device control system, a computer readable storage medium and a voice device, which have the above beneficial effects and are not described herein again.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.

Fig. 1 is a flowchart of a voice device control method according to an embodiment of the present application;

fig. 2 is a flowchart of a signal consistency comparison process provided in the present embodiment;

fig. 3 is a schematic structural diagram of the voice device and the display device provided in this embodiment when performing voice control interaction;

fig. 4 is a schematic structural diagram of a speech device control system according to an embodiment of the present application;

fig. 5 is a schematic structural diagram of a speech device according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Referring to fig. 1, fig. 1 is a flowchart of a voice device control method according to an embodiment of the present application, where the method includes:

S101: acquiring a multimedia audio signal output by display equipment;

This step aims to acquire the multimedia audio signal output by the display device. How the signal is acquired is not particularly limited here; it may be acquired over an established connection between the voice device and the display device, including but not limited to wired or wireless transmission. When wired transmission is adopted, the interface is chosen according to the type of data output by the display device, for example an HDMI (High Definition Multimedia Interface) interface or a DVI (Digital Visual Interface) interface. When wireless transmission is adopted, the options include but are not limited to Bluetooth, Zigbee, and cellular networks.

It is noted that this step aims to obtain a multimedia audio signal. Usually, however, a display device outputs only multimedia data and does not distinguish between video signals and audio signals. In that case, this step includes processing the multimedia data to obtain the multimedia audio signal, and a preferred implementation is as follows:

S1011: acquiring multimedia data output by display equipment;

S1012: performing signal separation on the multimedia data to obtain multimedia audio data;

S1013: and performing frequency domain conversion on the multimedia audio data to obtain a multimedia audio signal.

Firstly, multimedia data output by display equipment is directly obtained, signal separation is carried out on the multimedia data, and therefore multimedia audio data are obtained, and then frequency domain conversion can be carried out to convert the data into multimedia audio signals.
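The frequency domain conversion in S1013 can be illustrated with a minimal sketch. It assumes the separated audio data is already available as a list of time-domain samples; the function name `dft_spectrum`, the sample rate and the test tone are hypothetical and are not taken from the application. A naive discrete Fourier transform stands in for the FFT a real device would use.

```python
import cmath
import math

def dft_spectrum(samples, sample_rate):
    """Return (frequency_hz, amplitude, phase_deg) for each non-negative frequency bin."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        # Naive O(n^2) discrete Fourier transform; a real device would use an FFT.
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(samples))
        spectrum.append((k * sample_rate / n,             # bin frequency in Hz
                         abs(acc) / n,                    # normalised amplitude
                         math.degrees(cmath.phase(acc)))) # phase in degrees
    return spectrum

# Hypothetical input: a 1 kHz sine tone sampled at 8 kHz for 64 samples.
rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / rate) for t in range(64)]
bins = dft_spectrum(tone, rate)
peak = max(bins, key=lambda b: b[1])  # the strongest bin sits at 1000 Hz
```

Each tuple carries the frequency, amplitude and phase that the later consistency comparison operates on.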

S102: collecting environmental sound data by using a microphone array to generate a corresponding environmental sound signal;

This step aims to collect environmental sound data and generate a corresponding environmental sound signal. The environmental sound data comprises the voice operation instruction and the sound data of the display device; that is, all environmental sounds detected by the microphone array in the current environment are converted into corresponding environmental sound signals.

The specific parameters of the microphone array used in this step are not limited, and the specific parameters include the microphone type, the sampling frequency, the maximum sound pickup distance, the microphone cascade mode, and the like, and can be set by those skilled in the art according to the specific application environment of the speech device.

In addition, the environmental sound can be optimized while the corresponding environmental sound signal is generated. For example, impedance matching and voltage amplification may be performed so that subsequent stages can process the environmental sound signal more easily. The microphone array is mainly used to collect the user's voice instructions, but when the display device is on, the array inevitably picks up the sound emitted by the display device together with the user's voice and other environmental sound. If the microphone array comprises an electret microphone, the output impedance of the electret microphone is high and its output cannot be connected directly to the downstream module, so impedance matching is required. Moreover, the sensitivity of an electret microphone is generally -60 to -30 dB, so the voltage amplitude of the environmental sound signal at the microphone output is low, only a few millivolts. The system cannot process such a small signal directly without risking a large error. To achieve better wake-up and recognition performance, an amplifying circuit is added, which improves the accuracy with which the environmental sound signal is processed. Generating the corresponding environmental sound signal in this step can then be divided into the following two steps:

S1021: carrying out impedance matching on the environmental sound data to obtain a sound signal to be processed;

S1022: and amplifying the voltage amplitude of the sound signal to be processed by using an amplifying circuit to obtain an environment sound signal corresponding to the environment sound data.

Of course, the microphone array may also adopt other microphones such as a condenser microphone, and when generating the corresponding ambient sound signal, different ambient sound optimization manners may be adopted, which is not illustrated herein.
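The need for the amplifying circuit can be illustrated with a small calculation. The sketch below assumes the quoted sensitivity is expressed relative to 1 V/Pa, which the application does not state; the function names and the 1000 mV target level are likewise hypothetical.

```python
import math

def sensitivity_to_mv_per_pa(sensitivity_db):
    """Convert a sensitivity in dB (assumed re 1 V/Pa) to millivolts per pascal."""
    return 1000.0 * 10 ** (sensitivity_db / 20.0)

def required_gain_db(mic_mv, target_mv):
    """Voltage gain in dB needed to raise mic_mv to target_mv."""
    return 20.0 * math.log10(target_mv / mic_mv)

low_mv = sensitivity_to_mv_per_pa(-60.0)   # least sensitive end of the range
high_mv = sensitivity_to_mv_per_pa(-30.0)  # most sensitive end of the range
gain_db = required_gain_db(low_mv, 1000.0) # gain to reach a hypothetical 1 V level
```

Under this assumption the microphone output is on the order of millivolts to tens of millivolts per pascal, which is consistent with the description's remark that the signal is too small to process directly.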

S103: and if the multimedia audio signal and the environmental sound signal have signal consistency in the same time period, performing echo cancellation on the environmental sound signal based on the multimedia audio signal to obtain the voice operation instruction, so that the voice equipment executes the operation corresponding to the voice operation instruction.

In this step, when the multimedia audio signal and the environmental sound signal have signal consistency in the same time period, echo cancellation is performed on the environmental sound signal. This reduces the interference of the display device's audio, as contained in the environmental sound signal, with the user's voice instruction. It is easily understood that the process of determining whether the multimedia audio signal and the environmental sound signal have signal consistency in the same time period is performed before or during this step. The present embodiment does not specifically limit how signal consistency is judged. For example, the two signals may be compared on signal parameters including but not limited to frequency, amplitude and phase; comparing at least one of these parameters determines whether the multimedia audio signal and the environmental sound signal have signal consistency.

On the basis of the present embodiment, the environmental sound signal may be optimized before echo cancellation is performed, for example by applying microphone gain. That is, the environmental sound signal is first converted from the time domain to the frequency domain, microphone gain is then applied to the converted signal, and the consistency difference between the multimedia audio signal and the environmental sound signal in the same time period can then be calculated. Microphone gain greatly extends the usable input range of the environmental sound signal: gain is applied when a small signal is input, and the amplitude is not clipped when a large signal is input. When a small signal is input, the gain value is increased so that the signal is amplified sufficiently. When a large signal is input, the gain value is reduced so that the signal is not distorted.
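The gain behaviour described above can be sketched as a simple automatic gain rule. This is an illustrative assumption rather than the application's circuit: the function `apply_gain`, the target peak of 0.5 and the gain ceiling of 20 are hypothetical, and the sketch works on a frame of amplitude values for simplicity.

```python
def apply_gain(frame, target_peak=0.5, max_gain=20.0):
    """Scale a frame toward target_peak: boost small signals, attenuate large ones."""
    peak = max(abs(x) for x in frame)
    if peak == 0.0:
        return list(frame), 1.0          # silent frame: leave unchanged
    gain = min(target_peak / peak, max_gain)  # cap the boost for very small signals
    return [x * gain for x in frame], gain

# Hypothetical frames: one quiet, one close to full scale.
quiet_out, quiet_gain = apply_gain([0.01, -0.02, 0.015])
loud_out, loud_gain = apply_gain([0.9, -1.0, 0.8])
```

The quiet frame receives the maximum gain, while the loud frame is scaled down so that its peak stays at the target level instead of being clipped.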

When the multimedia audio signal and the environmental sound signal have signal consistency, echo cancellation is performed. How echo cancellation is performed is not particularly limited. For example, the echo signal may be simulated by a voice adaptive algorithm, and the simulated echo is subtracted from the environmental sound signal collected by the microphone array to achieve echo cancellation. Specifically, in this step, the multimedia audio signal may be used as a reference signal, and the voice adaptive echo cancellation algorithm is applied to the environmental sound signal to obtain the voice operation instruction. The resulting voice operation instruction no longer contains the multimedia audio signal, which is equivalent to removing the sound emitted by the display device from the environmental sound data received by the microphone array. This reduces the interference of sound played aloud by the display device with the voice device control process.
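The application does not name a specific adaptive algorithm. As one possible illustration, the sketch below uses a normalized least mean squares (NLMS) filter, a common choice for adaptive echo cancellation, in the time domain. The function name, tap count, step size and the three-coefficient echo path are all assumptions made for the example.

```python
import random

def nlms_echo_cancel(reference, microphone, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of `reference` from `microphone`."""
    weights = [0.0] * taps
    history = [0.0] * taps               # recent reference samples, newest first
    residual = []
    for ref, mic in zip(reference, microphone):
        history = [ref] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = mic - echo_estimate      # what remains after removing the simulated echo
        norm = sum(h * h for h in history) + eps
        # NLMS update: step along the reference history, scaled by its energy.
        weights = [w + mu * error * h / norm for w, h in zip(weights, history)]
        residual.append(error)
    return residual, weights

# Hypothetical data: the microphone hears only the reference through a short echo path.
rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(4000)]
room = [0.6, 0.3, -0.2]                  # assumed echo path coefficients
echo = [sum(room[k] * reference[i - k] for k in range(len(room)) if i - k >= 0)
        for i in range(len(reference))]
residual, weights = nlms_echo_cancel(reference, echo)
```

In this toy setup the residual decays toward zero as the filter learns the echo path. With a user's voice added to the microphone signal, the residual would instead approximate that voice, which is the signal passed on for instruction recognition.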

If the multimedia audio signal and the environmental sound signal do not have signal consistency in the same time period, a noise prompt can be generated to inform the user that the display device is currently too loud and is affecting the voice device's recognition of the user's voice command. The noise prompt can be issued directly by the voice device through its sound box, or returned to the display device so that a corresponding text prompt is displayed there.
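The two branches of S103 can be summarized in a short control-flow sketch. The function `process_frame` and its callback parameters are hypothetical names introduced for illustration; the application does not prescribe this interface.

```python
def process_frame(is_consistent, cancel_echo, notify_noise):
    """Branch on the consistency result: cancel echo and execute, or raise a noise prompt."""
    if is_consistent:
        # Consistent signals: the multimedia audio is a valid reference for echo cancellation.
        return ("execute", cancel_echo())
    # Inconsistent signals: warn the user instead of attempting echo cancellation.
    notify_noise()
    return ("noise_prompt", None)

# Hypothetical callbacks standing in for the echo canceller and the prompt output.
prompts = []
ok = process_frame(True, lambda: "volume up", lambda: prompts.append("noise"))
bad = process_frame(False, lambda: "volume up", lambda: prompts.append("noise"))
```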

The embodiment of the application acquires the multimedia audio signal output by the display device and the environmental sound data collected by the microphone array, which includes the sound emitted by the display device as well as other sounds, and then calculates the signal consistency of the two signals over the same time period. If the two signals have signal consistency, echo cancellation can be performed directly on the environmental sound signal acquired by the microphone array based on the multimedia audio signal. This can effectively enhance the echo cancellation effect and improve the signal-to-noise ratio of the sound signal, thereby improving user experience.

Hereinafter, how to calculate the difference between the consistency of the multimedia audio signal and the environmental sound signal in the same time period is described in detail, referring to fig. 2, where fig. 2 is a flowchart of a signal consistency comparison process provided by this embodiment, the process may include the following steps:

S201: intercepting a same number of first sample signals and second sample signals within the multimedia audio signal and the ambient sound signal, respectively; wherein the sample signal comprises a frequency, an amplitude and a phase of the signal;

S202: calculating the amplitude difference and the phase difference of the first sample signal and the second sample signal under the same frequency;

S203: if the amplitude difference and the phase difference both meet the corresponding confidence interval under the target frequency, judging that the multimedia audio signal and the environmental sound signal are credible at the target frequency;

S204: and determining a confidence level according to the credible frequency number, and determining that the multimedia audio signal and the environmental sound signal have signal consistency when the confidence level is greater than a preset confidence level.

The first sample signal is derived from the multimedia audio signal and the second sample signal is derived from the environmental sound signal. The frequency range from which samples are taken can be set freely by a person skilled in the art; typically the frequency range of the human voice is used. The sample signals may be taken at equal or unequal intervals. The confidence interval and the preset confidence level are not limited herein and can be set by those skilled in the art. It is easily understood that the higher the preset confidence level, the higher the similarity required between the multimedia audio signal and the environmental sound signal.

The above process is exemplified below:

Since the frequency f of the human voice ranges from 100 Hz to 10 kHz, 1000 samples are taken at equal intervals within this range from each of the multimedia audio signal and the environmental sound signal:

The frequency, amplitude and phase of the multimedia audio signal are as follows:

Frequency: $F_1, F_2, F_3, \dots, F_{1000}$;

Amplitude: $A_1, A_2, A_3, \dots, A_{1000}$;

Phase: $B_1, B_2, B_3, \dots, B_{1000}$;

The frequency, amplitude and phase of the environmental sound signal are as follows:

Frequency: $f_1, f_2, f_3, \dots, f_{1000}$, where $F_1 = f_1, F_2 = f_2, \dots, F_{1000} = f_{1000}$;

Amplitude: $a_1, a_2, a_3, \dots, a_{1000}$;

Phase: $b_1, b_2, b_3, \dots, b_{1000}$;

The amplitude difference $\Delta M$ and the phase difference $\Delta N$ between the multimedia audio signal and the environmental sound signal are calculated in the frequency domain for the 1000 samples, namely:

$\Delta M_1 = A_1 - a_1$;

$\Delta M_2 = A_2 - a_2$;

……

$\Delta M_{1000} = A_{1000} - a_{1000}$;

$\Delta N_1 = B_1 - b_1$;

$\Delta N_2 = B_2 - b_2$;

……

$\Delta N_{1000} = B_{1000} - b_{1000}$;

Here, confidence intervals are assumed for the amplitude difference and the phase difference, as shown in Table 1: the confidence interval of the amplitude difference is $0\ \mathrm{dB} < \Delta M_x < m\ \mathrm{dB}$, and the confidence interval of the phase difference is $0^\circ < \Delta N_x < n^\circ$.

TABLE 1 Confidence intervals of the amplitude difference and the phase difference

| Quantity | Lower limit of confidence interval | Upper limit of confidence interval |
| Amplitude difference $\Delta M_x$ | 0 dB | m dB |
| Phase difference $\Delta N_x$ | 0° | n° |

If the amplitude difference $\Delta M_x$ and the phase difference $\Delta N_x$ at the target frequency $f_x$ both fall within their preset confidence intervals, the multimedia audio signal and the environmental sound signal are considered credible at the frequency point $f_x$. If either the amplitude difference or the phase difference falls outside its confidence interval, the target frequency $f_x$ is considered not credible. The number of credible frequencies $P$ is counted, and the confidence $\delta$ is computed as $\delta = P/1000$. When $\delta > 95\%$, the multimedia audio signal and the environmental sound signal have consistency, and the multimedia audio signal can serve as a valid reference signal for performing echo cancellation on the environmental sound signal. Otherwise, the two signals do not have consistency.
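The worked example above can be sketched in code as follows. The function name `signal_consistency`, the interval bounds of 3 dB and 10°, and the sample values are hypothetical and chosen only to exercise the rule; the 95% threshold follows the example.

```python
def signal_consistency(ref_amp, ref_phase, env_amp, env_phase,
                       m_db, n_deg, threshold=0.95):
    """Return (confidence, is_consistent) following steps S201 to S204."""
    credible = 0
    for A, B, a, b in zip(ref_amp, ref_phase, env_amp, env_phase):
        delta_m = A - a                  # amplitude difference at this frequency
        delta_n = B - b                  # phase difference at this frequency
        # A frequency is credible only if both differences lie inside their intervals.
        if 0.0 < delta_m < m_db and 0.0 < delta_n < n_deg:
            credible += 1
    confidence = credible / len(ref_amp)
    return confidence, confidence > threshold

# Hypothetical data: 1000 frequency samples, 30 of which disagree strongly in amplitude.
ref_amp = [10.0] * 1000
env_amp = [2.0] * 30 + [9.0] * 970
ref_phase = [30.0] * 1000
env_phase = [25.0] * 1000
confidence, consistent = signal_consistency(ref_amp, ref_phase, env_amp, env_phase,
                                            m_db=3.0, n_deg=10.0)
```

With 970 of 1000 frequencies credible, the confidence exceeds the 95% threshold and the signals are treated as consistent. With 60 disagreeing samples instead of 30, the same call would report inconsistency.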

Referring to fig. 3, fig. 3 is a schematic structural diagram of the voice device and the display device provided in this embodiment when performing voice control interaction. Fig. 3 mainly includes a voice device and a display device, where the voice device includes a front-end processing module and a back-end processing module. Specifically, in the voice device control process of this embodiment, the display device outputs multimedia data through its HDMI interface to the HDMI input module of the back-end processing module of the voice device, and the data is then sent to the system terminal processing module for processing. The system terminal processing module separates the multimedia audio data from the multimedia data, converts the audio from the time domain to the frequency domain to obtain the multimedia audio signal, and sends the multimedia audio signal to the signal comparison module.

On the other hand, the front-end processing module of the voice device comprises a microphone array and an impedance matching and amplifying circuit. As can be seen from fig. 3, the microphone array receives the sound played aloud by the display device, external noise and the user's voice. These are processed by the impedance matching and amplifying circuit and then transmitted to the AD sampling module of the back-end processing module of the voice device.

In the back-end processing module of the voice device, the AD sampling module performs data sampling, which greatly extends the input range of the voice signal, so that wake-up remains possible when a small signal is input and large signals are not clipped. The Fast Fourier Transform (FFT) module converts the sampled and amplified signal from the time domain to the frequency domain.

The signal comparison module is used for calculating the trend consistency difference between the multimedia audio signal and the environmental sound signal and giving a judgment result. If the multimedia audio signal and the environmental sound signal have signal consistency in the same time period, the echo cancellation module performs echo cancellation on the environmental sound signal based on the multimedia audio signal, and then the voice operation execution module can perform user voice recognition processing. In addition, the data processing executed by the voice operation execution module can also be executed by the system terminal processing module.

The voice device control system provided by the embodiment of the present application is introduced below. The voice device control system described below and the voice device control method described above may be referred to in correspondence with each other.

Referring to fig. 4, fig. 4 is a schematic structural diagram of a voice device control system according to an embodiment of the present application. The present application further provides a voice device control system, including:

a multimedia signal obtaining module 100, configured to obtain a multimedia audio signal output by a display device;

an ambient sound collection module 200, configured to collect ambient sound data by using a microphone array, and generate a corresponding ambient sound signal; the environment sound data comprises voice operation instructions and sound data of the display device;

and the signal comparison control module 300 is configured to perform echo cancellation on the environment sound signal based on the multimedia audio signal if the multimedia audio signal and the environment sound signal have signal consistency in the same time period, so as to obtain the voice operation instruction, so that a voice device executes an operation corresponding to the voice operation instruction.

Based on the above embodiment, as a preferred embodiment, the multimedia signal acquiring module 100 includes:

the acquisition unit is used for acquiring multimedia data output by the display equipment;

the signal separation unit is used for carrying out signal separation on the multimedia data to obtain multimedia audio data;

and the signal processing unit is used for carrying out frequency domain conversion on the multimedia audio data to obtain a multimedia audio signal.

Based on the above-described embodiment, as a preferred embodiment, the ambient sound collection module 200 includes:

the impedance matching unit is used for performing impedance matching on the environmental sound data to obtain a sound signal to be processed;

and the signal amplification unit is used for amplifying the voltage amplitude of the sound signal to be processed by using an amplification circuit to obtain an environment sound signal corresponding to the environment sound data.

Based on the above embodiment, as a preferred embodiment, the system further includes:

an excessive noise prompt module, configured to generate a noise prompt if the multimedia audio signal and the environmental sound signal do not have signal consistency in the same time period.
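The two outcomes of the consistency judgment can be sketched as a small dispatcher. This is only an illustration under stated assumptions: `handle_frame` and its three callables (`cancel_echo`, `recognize`, `notify_noise`) are hypothetical hooks standing in for the echo cancellation, voice recognition and noise prompt steps, and are not names used by the application.

```python
def handle_frame(consistent, cancel_echo, recognize, notify_noise):
    """Dispatch one time period according to the consistency judgment.

    consistent   -- result of the signal consistency judgment for this period
    cancel_echo  -- hypothetical callable returning the echo-cancelled signal
    recognize    -- hypothetical callable turning that signal into an instruction
    notify_noise -- hypothetical callable that issues the noise prompt
    """
    if consistent:
        # Consistent signals: cancel the echo, then recognize the instruction.
        instruction = recognize(cancel_echo())
        return ("instruction", instruction)
    # Inconsistent signals: skip echo cancellation and prompt the user instead.
    notify_noise()
    return ("noise_prompt", None)
```

Passing the steps in as callables keeps the sketch independent of any particular echo canceller or recognizer.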

Based on the above embodiment, as a preferred embodiment, the system further includes:

a signal processing module, configured to convert the environmental sound signal from the time domain to the frequency domain and to calculate the consistency difference between the multimedia audio signal and the environmental sound signal in the same time period.

Based on the above embodiment, as a preferred embodiment, the signal processing module includes:

a signal intercepting unit, configured to intercept the same number of first sample signals and second sample signals from the multimedia audio signal and the environmental sound signal, respectively; wherein each sample signal comprises the frequency, amplitude and phase of the signal;

a difference calculating unit, configured to calculate the amplitude difference and the phase difference between the first sample signal and the second sample signal at the same frequency;

the credibility judging unit is used for judging that the multimedia audio signal and the environmental sound signal are credible at the target frequency if the amplitude difference and the phase difference both meet corresponding confidence intervals at the target frequency;

and the consistency judging unit is used for determining a confidence level according to the number of credible frequencies, and determining that the multimedia audio signal and the environmental sound signal have signal consistency when the confidence level is greater than the preset confidence level.
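The units listed above can be sketched as one function. The application does not specify how the confidence intervals or the confidence level are computed, so the following choices are assumptions: the intervals are modelled as simple upper bounds (`max_amp_diff`, `max_phase_diff`), and the confidence level is taken as the fraction of compared frequencies that are judged credible. All names are hypothetical.

```python
import math


def wrapped_phase_diff(a, b):
    """Smallest absolute difference between two angles, in radians."""
    d = (a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def signals_consistent(first, second, max_amp_diff, max_phase_diff,
                       preset_confidence):
    """Judge signal consistency from per-frequency sample signals.

    first, second -- equal-length lists of (frequency, amplitude, phase)
                     tuples on the same frequency grid
    Returns (is_consistent, confidence).
    """
    if not first or len(first) != len(second):
        raise ValueError("need the same non-zero number of samples")
    credible = 0
    for (f1, a1, p1), (f2, a2, p2) in zip(first, second):
        if f1 != f2:
            raise ValueError("samples must be compared at the same frequency")
        # A frequency is credible only if both differences are within bounds.
        if (abs(a1 - a2) <= max_amp_diff
                and wrapped_phase_diff(p1, p2) <= max_phase_diff):
            credible += 1
    # Assumption: confidence is the share of credible frequencies.
    confidence = credible / len(first)
    return confidence > preset_confidence, confidence
```

The phase difference is wrapped so that phases near +π and -π are treated as close. In a real device the bounds would presumably have to allow for the delay and attenuation of the acoustic path between the display device and the microphone array.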

Based on the above embodiment, as a preferred embodiment, the signal comparison control module 300 includes:

an echo cancellation unit, configured to perform echo cancellation on the environmental sound signal with a voice adaptive echo cancellation algorithm, using the multimedia audio signal as the reference signal, to obtain the voice operation instruction.
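The application does not name a specific adaptive algorithm. As one common choice, a normalized least-mean-squares (NLMS) filter can serve as an illustration of how a reference signal is used to remove its echo from a microphone signal. The sketch below works on time-domain samples; the function name and the parameters `taps`, `step` and `eps` are assumptions, not values from the application.

```python
def nlms_echo_cancel(reference, microphone, taps=8, step=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of `reference` from `microphone`.

    reference  -- samples of the known playback signal (the reference)
    microphone -- samples picked up by the microphone, same length
    Returns the residual signal, which ideally keeps only the user's voice.
    """
    if len(reference) != len(microphone):
        raise ValueError("reference and microphone must have equal length")
    weights = [0.0] * taps   # current estimate of the echo path
    history = [0.0] * taps   # latest reference samples, newest first
    residual = []
    for x, d in zip(reference, microphone):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate
        # Normalizing by the input energy keeps the update stable for
        # 0 < step < 2 regardless of the reference level.
        norm = sum(h * h for h in history) + eps
        weights = [w + step * e * h / norm for w, h in zip(weights, history)]
        residual.append(e)
    return residual
```

When the microphone signal is purely a delayed, attenuated copy of the reference, the residual decays towards zero as the filter converges. When a user speaks at the same time, the speech is not predictable from the reference and so remains in the residual.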

The present application also provides a computer readable storage medium having stored thereon a computer program which, when executed, can implement the steps provided by the above-described embodiments. The storage medium may include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

The present application further provides a voice device, which may include a memory and a processor, where the memory stores a computer program, and when the processor calls the computer program in the memory, the steps of the voice device control method provided in the above embodiments can be implemented. Of course, the voice device may further include various network interfaces, a power supply, and other components. For example, the voice device may specifically be a smart speaker with a Bluetooth function and a voice recognition function, or a smart speaker with a GPRS function and a voice recognition function. Referring to fig. 5, fig. 5 is a schematic structural diagram of a voice device according to an embodiment of the present application, where the voice device according to the embodiment may include: a processor 2101 and a memory 2102.

Optionally, the voice device may also include a communication interface 2103, an input unit 2104, a display 2105 and a communication component 2106.

The processor 2101, the memory 2102, the communication interface 2103 and the input unit 2104 all communicate with each other via the communication component 2106.

In the embodiment of the present application, the processor 2101 may be a Central Processing Unit (CPU), an Application-Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Field-Programmable Gate Array (FPGA) or another programmable logic device.

The processor may call a program stored in the memory 2102. In particular, the processor may perform the operations performed by the voice device in the above embodiments.

The memory 2102 stores one or more programs, which may include program code including computer operating instructions, and in this embodiment, at least one program for implementing the following functions is stored in the memory:

acquiring a multimedia audio signal output by display equipment;

collecting environmental sound data by using a microphone array to generate a corresponding environmental sound signal; the environment sound data comprises voice operation instructions and sound data of the display device;

and if the multimedia audio signal and the environmental sound signal have signal consistency in the same time period, performing echo cancellation on the environmental sound signal based on the multimedia audio signal to obtain the voice operation instruction, so that the voice equipment executes the operation corresponding to the voice operation instruction.

In one possible implementation, the memory 2102 may include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a topic detection function, etc.), and the like; the data storage area may store data created during use of the computer.

Further, the memory 2102 may include high-speed random access memory, and may also include non-volatile memory, such as at least one disk storage device or another non-volatile solid-state storage device.

The communication interface 2103 may be an interface of a communication module, such as an interface of a GSM module, and may further comprise a corresponding interface for multimedia signals, such as a DVI interface or an HDMI interface, etc.

The structure of the voice device shown in fig. 5 does not constitute a limitation of the voice device in the embodiment of the present application. In practical applications, the voice device may include more or fewer components than those shown in fig. 5, or some components may be combined.

The embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to each other. Because the system provided by the embodiment corresponds to the method provided by the embodiment, its description is relatively brief, and the relevant points can be found in the description of the method.

The principles and embodiments of the present application are explained herein using specific examples, which are provided only to help understand the method and the core idea of the present application. It should be noted that, for those skilled in the art, it is possible to make several improvements and modifications to the present application without departing from the principle of the present application, and such improvements and modifications also fall within the scope of the claims of the present application.

It is further noted that, in the present specification, relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
