Voice broadcasting method and device

Document No.: 1467471 | Published: 2020-02-21

Reading note: this technology, "Voice broadcasting method and device" (语音播报方法及装置), was designed by 宋夏 (Song Xia) on 2019-11-15. Main content: The application provides a voice broadcasting method and device. The method comprises: obtaining a text to be broadcast by voice; determining, according to a sensitive word database, whether the text contains a sensitive word, where a sensitive word is a word whose pronunciation is the same as or similar to that of a preset voice recognition command; and, if the text contains a sensitive word, adjusting parameters of a voice recognition algorithm according to the sensitive word when the voice corresponding to the text is played. This reduces the probability of triggering voice misrecognition and improves the user experience.

1. A voice broadcast method, comprising:

acquiring a text to be broadcasted in a voice mode;

determining whether a text to be subjected to voice broadcast contains a sensitive word according to a sensitive word database, wherein the sensitive word is a word with the same or similar pronunciation as a preset voice recognition command;

and if the text to be broadcasted contains the sensitive words, adjusting parameters of a voice recognition algorithm according to the sensitive words when the voice to be broadcasted corresponding to the text to be broadcasted is broadcasted.

2. The method according to claim 1, wherein the adjusting parameters of a speech recognition algorithm according to the sensitive words when playing the speech to be broadcasted corresponding to the text to be broadcasted comprises:

synthesizing the text to be broadcasted into the voice to be broadcasted, and extracting the time point of the occurrence of the sensitive words according to the voice to be broadcasted; playing the voice to be broadcasted, and adjusting the recognition threshold value of the voice recognition command corresponding to the sensitive word when the time point of the sensitive word is reached;

or,

acquiring voice synthesis parameters of the sensitive words from the sensitive word database, and adjusting the voice synthesis parameters of the sensitive words when synthesizing the text to be voice broadcast into the voice to be broadcast; playing the voice to be broadcasted;

or,

acquiring voice synthesis parameters of the sensitive words from the sensitive word database, adjusting the voice synthesis parameters of the sensitive words when synthesizing the text to be broadcasted into voice to be broadcasted, and extracting time points of the sensitive words according to the voice to be broadcasted; and playing the voice to be broadcasted, and adjusting the recognition threshold value of the voice recognition command corresponding to the sensitive word when the time point of the sensitive word is reached.

3. The method according to claim 1, wherein before determining whether the text to be voice-broadcasted contains the sensitive words according to the sensitive word database, the method further comprises:

and determining that the audio length of the text to be broadcasted after being synthesized into voice is larger than a preset threshold value.

4. The method of claim 3, further comprising:

if the audio length is smaller than the preset threshold value, closing a voice recognition algorithm, synthesizing the text to be broadcasted into voice to be broadcasted, and then broadcasting, and starting the voice recognition algorithm after the voice to be broadcasted is broadcasted.

5. The method of claim 1, further comprising:

monitoring whether voice recognition is triggered or not in the process of playing the voice to be broadcasted;

if the fact that the voice recognition is triggered is monitored, recording a playing word which triggers the voice recognition and a command word which is triggered by the playing word;

determining whether the playing word triggers voice recognition according to the playing word and the command word;

and if the fact that the played words trigger voice recognition is determined, storing the played words and the voice synthesis parameters of the played words into the sensitive word database.

6. The method of claim 5, wherein the determining whether the played word triggers speech recognition based on the played word and the command word comprises:

synthesizing the played words into voice;

inputting the voice synthesized by the played word into the voice recognition algorithm, and calculating the matching score of the played word and the command word through the voice recognition algorithm;

if the matching score is larger than a preset value, determining that the played word triggers voice recognition;

and if the matching score is smaller than a preset value, determining that the speech recognition is not triggered by the played word.

7. The method of claim 5 or 6, further comprising:

and if the played word already exists in the sensitive word database, adjusting the voice synthesis parameters corresponding to the played word that are stored in the sensitive word database.

8. A voice broadcast device, comprising:

the acquisition module is used for acquiring a text to be broadcasted in a voice mode;

the first determining module is used for determining whether the text to be broadcasted by voice contains a sensitive word according to a sensitive word database, wherein the sensitive word is a word with pronunciation the same as or similar to that of a preset voice recognition command;

and the processing module is used for, when the first determining module determines that the text to be broadcasted contains the sensitive words, adjusting parameters of a voice recognition algorithm according to the sensitive words when the voice to be broadcasted corresponding to the text to be broadcasted is broadcasted.

9. The apparatus of claim 8, wherein the processing module is configured to:

synthesizing the text to be broadcasted into the voice to be broadcasted, and extracting the time point of the occurrence of the sensitive words according to the voice to be broadcasted; playing the voice to be broadcasted, and adjusting the recognition threshold value of the voice recognition command corresponding to the sensitive word when the time point of the sensitive word is reached;

or,

acquiring voice synthesis parameters of the sensitive words from the sensitive word database, and adjusting the voice synthesis parameters of the sensitive words when synthesizing the text to be voice broadcast into the voice to be broadcast; playing the voice to be broadcasted;

or,

acquiring voice synthesis parameters of the sensitive words from the sensitive word database, adjusting the voice synthesis parameters of the sensitive words when synthesizing the text to be broadcasted into voice to be broadcasted, and extracting time points of the sensitive words according to the voice to be broadcasted; and playing the voice to be broadcasted, and adjusting the recognition threshold value of the voice recognition command corresponding to the sensitive word when the time point of the sensitive word is reached.

10. The apparatus of claim 8, further comprising:

and the second determining module is used for determining that the audio length of the text to be broadcasted after being synthesized into voice is larger than a preset threshold value before the first determining module determines whether the text to be broadcasted contains the sensitive words according to the sensitive word database.

11. The apparatus of claim 10, wherein the processing module is further configured to:

and when the second determining module determines that the audio length is smaller than the preset threshold value, closing a voice recognition algorithm, synthesizing the text to be broadcasted into voice to be broadcasted and then broadcasting the voice, and starting the voice recognition algorithm after the voice to be broadcasted is broadcasted.

12. The apparatus of claim 8, further comprising:

the monitoring module is used for monitoring whether voice recognition is triggered or not in the process of playing the voice to be broadcasted;

the processing module is further configured to: if the monitoring module monitors that voice recognition is triggered, recording a playing word which triggers the voice recognition and a command word which is triggered by the playing word;

a third determining module, configured to determine whether voice recognition is triggered by the playing word according to the playing word and the command word;

the processing module is further configured to: and if the third determining module determines that the played word triggers voice recognition, storing the played word and the voice synthesis parameter of the played word into the sensitive word database.

13. The apparatus of claim 12, wherein the third determining module is configured to:

synthesizing the played words into voice;

inputting the voice synthesized by the played word into the voice recognition algorithm, and calculating the matching score of the played word and the command word through the voice recognition algorithm;

if the matching score is larger than a preset value, determining that the played word triggers voice recognition;

and if the matching score is smaller than a preset value, determining that the speech recognition is not triggered by the played word.

14. The apparatus of claim 12 or 13, wherein the processing module is further configured to:

and if the played word already exists in the sensitive word database, adjusting the voice synthesis parameters corresponding to the played word that are stored in the sensitive word database.

15. A computer-readable storage medium on which a computer program is stored, the computer program being characterized by implementing the voice broadcasting method according to any one of claims 1 to 7 when executed by a processor.

16. An electronic device, comprising:

a processor; and

a memory for storing executable instructions of the processor;

wherein the processor is configured to perform the voice broadcasting method of any one of claims 1 to 7 via execution of the executable instructions.

Technical Field

The present application relates to the field of communications technologies, and in particular, to a voice broadcast method and apparatus.

Background

At present, many smart household appliances provide both voice recognition and voice broadcast functions. To prevent an appliance's own voice broadcast from falsely triggering its own voice recognition, the prior art usually collects the audio signal played by the loudspeaker as a reference signal and uses an echo cancellation algorithm to "subtract" that signal from the audio signal collected by the microphone. However, because the indoor environments of household appliances differ, and the structural positions of the loudspeaker and the microphone differ between appliances, the echo cancellation algorithm cannot guarantee that the loudspeaker signal is completely cancelled from the microphone signal. Voice misrecognition is therefore occasionally triggered, which degrades the user experience.

Disclosure of Invention

The application provides a voice broadcasting method and device, which can reduce the probability of triggering voice misrecognition and improve user experience.

In a first aspect, the present application provides a voice broadcast method, including:

acquiring a text to be broadcasted in a voice mode;

determining whether the text to be subjected to voice broadcast contains sensitive words according to a sensitive word database, wherein the sensitive words are words with pronunciation the same as or similar to that of a preset voice recognition command;

and if the text to be broadcasted contains the sensitive words, adjusting parameters of a voice recognition algorithm according to the sensitive words when the voice to be broadcasted corresponding to the text to be broadcasted is broadcasted.

Optionally, adjusting parameters of a speech recognition algorithm according to the sensitive words when the speech to be broadcasted corresponding to the text to be broadcasted is broadcasted includes:

synthesizing the text to be broadcasted into the voice to be broadcasted, and extracting the time point of the occurrence of the sensitive words according to the voice to be broadcasted; playing the voice to be broadcasted, and adjusting the recognition threshold value of the voice recognition command corresponding to the sensitive word when the time point of the sensitive word is reached;

or,

acquiring voice synthesis parameters of the sensitive words from the sensitive word database, and adjusting the voice synthesis parameters of the sensitive words when synthesizing the text to be voice broadcast into the voice to be broadcast; playing the voice to be broadcasted;

or,

acquiring voice synthesis parameters of the sensitive words from the sensitive word database, adjusting the voice synthesis parameters of the sensitive words when synthesizing the text to be broadcasted into voice to be broadcasted, and extracting time points of the sensitive words according to the voice to be broadcasted; and playing the voice to be broadcasted, and adjusting the recognition threshold value of the voice recognition command corresponding to the sensitive word when the time point of the sensitive word is reached.

Optionally, before determining whether the text to be voice-broadcasted includes the sensitive word according to the sensitive word database, the method further includes:

and determining that the audio length of the text to be broadcasted after being synthesized into voice is larger than a preset threshold value.

Optionally, the method further includes:

if the audio length is smaller than the preset threshold value, closing a voice recognition algorithm, synthesizing the text to be broadcasted into voice to be broadcasted, and then broadcasting, and starting the voice recognition algorithm after the voice to be broadcasted is broadcasted.

Optionally, the method further includes:

monitoring whether voice recognition is triggered or not in the process of playing the voice to be broadcasted;

if the fact that the voice recognition is triggered is monitored, recording a playing word which triggers the voice recognition and a command word which is triggered by the playing word;

determining whether the playing word triggers voice recognition according to the playing word and the command word;

and if the fact that the played words trigger voice recognition is determined, storing the played words and the voice synthesis parameters of the played words into the sensitive word database.

Optionally, the determining whether the playing word triggers voice recognition according to the playing word and the command word includes:

synthesizing the played words into voice;

inputting the voice synthesized by the played word into the voice recognition algorithm, and calculating the matching score of the played word and the command word through the voice recognition algorithm;

if the matching score is larger than a preset value, determining that the played word triggers voice recognition;

and if the matching score is smaller than a preset value, determining that the speech recognition is not triggered by the played word.

Optionally, the method further includes:

and if the played word already exists in the sensitive word database, adjusting the voice synthesis parameters corresponding to the played word that are stored in the sensitive word database.

In a second aspect, the present application provides a device for preventing voice broadcast from causing voice misrecognition, comprising:

the acquisition module is used for acquiring a text to be broadcasted in a voice mode;

the first determining module is used for determining whether the text to be broadcasted by voice contains a sensitive word according to a sensitive word database, wherein the sensitive word is a word with pronunciation the same as or similar to that of a preset voice recognition command;

and the processing module is used for, when the first determining module determines that the text to be broadcasted contains the sensitive words, adjusting parameters of a voice recognition algorithm according to the sensitive words when the voice to be broadcasted corresponding to the text to be broadcasted is broadcasted.

Optionally, the processing module is configured to:

synthesizing the text to be broadcasted into the voice to be broadcasted, and extracting the time point of the occurrence of the sensitive words according to the voice to be broadcasted; playing the voice to be broadcasted, and adjusting the recognition threshold value of the voice recognition command corresponding to the sensitive word when the time point of the sensitive word is reached;

or,

acquiring voice synthesis parameters of the sensitive words from the sensitive word database, and adjusting the voice synthesis parameters of the sensitive words when synthesizing the text to be voice broadcast into the voice to be broadcast; playing the voice to be broadcasted;

or,

acquiring voice synthesis parameters of the sensitive words from the sensitive word database, adjusting the voice synthesis parameters of the sensitive words when synthesizing the text to be broadcasted into voice to be broadcasted, and extracting time points of the sensitive words according to the voice to be broadcasted; and playing the voice to be broadcasted, and adjusting the recognition threshold value of the voice recognition command corresponding to the sensitive word when the time point of the sensitive word is reached.

Optionally, the apparatus further comprises:

and the second determining module is used for determining that the audio length of the text to be broadcasted after being synthesized into voice is larger than a preset threshold value before the first determining module determines whether the text to be broadcasted contains the sensitive words according to the sensitive word database.

Optionally, the processing module is further configured to:

and when the second determining module determines that the audio length is smaller than the preset threshold value, closing a voice recognition algorithm, synthesizing the text to be broadcasted into voice to be broadcasted and then broadcasting the voice, and starting the voice recognition algorithm after the voice to be broadcasted is broadcasted.

Optionally, the apparatus further comprises:

the monitoring module is used for monitoring whether voice recognition is triggered or not in the process of playing the voice to be broadcasted;

the processing module is further configured to: if the monitoring module monitors that voice recognition is triggered, recording a playing word which triggers the voice recognition and a command word which is triggered by the playing word;

a third determining module, configured to determine whether voice recognition is triggered by the playing word according to the playing word and the command word;

the processing module is further configured to: and if the third determining module determines that the played word triggers voice recognition, storing the played word and the voice synthesis parameter of the played word into the sensitive word database.

Optionally, the third determining module is configured to:

synthesizing the played words into voice;

inputting the voice synthesized by the played word into the voice recognition algorithm, and calculating the matching score of the played word and the command word through the voice recognition algorithm;

if the matching score is larger than a preset value, determining that the played word triggers voice recognition;

and if the matching score is smaller than a preset value, determining that the speech recognition is not triggered by the played word.

Optionally, the processing module is further configured to:

and if the played word already exists in the sensitive word database, adjusting the voice synthesis parameters corresponding to the played word that are stored in the sensitive word database.

According to the voice broadcasting method and device, the text to be subjected to voice broadcasting is obtained, whether the text to be subjected to voice broadcasting contains the sensitive words or not is determined according to the sensitive word database, if the text to be subjected to voice broadcasting contains the sensitive words, parameters of a voice recognition algorithm are adjusted according to the sensitive words when the voice to be broadcasted corresponding to the text to be subjected to voice broadcasting is broadcasted, so that the probability of triggering voice misrecognition can be reduced, and user experience is improved.

Drawings

To illustrate the technical solutions of the present application or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present application, and those skilled in the art can obtain other drawings from them without inventive effort.

Fig. 1 is a flowchart of an embodiment of a voice broadcast method provided in the present application;

Fig. 2 is a flowchart of an embodiment of a voice broadcast method provided in the present application;

Fig. 3 is a schematic structural diagram of an embodiment of a voice broadcast device provided in the present application;

Fig. 4 is a schematic structural diagram of an embodiment of a voice broadcast device provided in the present application;

Fig. 5 is a schematic structural diagram of an embodiment of a voice broadcast device provided in the present application;

Fig. 6 is a schematic diagram of a hardware structure of an electronic device provided in the present application.

Detailed Description

To make the purpose, technical solutions and advantages of the present application clearer, the technical solutions in the present application will be clearly and completely described below with reference to the drawings in the present application, and it is obvious that the described embodiments are some, but not all embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

In existing smart household appliances, to prevent the appliance's own voice broadcast from falsely triggering its own voice recognition, the audio signal played by the loudspeaker is generally collected as a reference signal, and an echo cancellation algorithm subtracts it from the audio signal collected by the microphone. The echo cancellation algorithm cannot guarantee that the loudspeaker signal is completely cancelled from the microphone signal, so voice misrecognition is occasionally triggered and the user experience suffers. To solve this problem, the present application provides a voice broadcasting method and device: whether a text to be broadcast by voice contains a sensitive word is determined according to a sensitive word database, and if it does, parameters of a voice recognition algorithm are adjusted according to the sensitive word when the voice corresponding to the text is played. This reduces the probability of triggering voice misrecognition and improves the user experience. The specific implementation of the voice broadcast method is described in detail below through specific embodiments with reference to the accompanying drawings.

Fig. 1 is a flowchart of an embodiment of a voice broadcast method provided in the present application, where an execution subject in the present embodiment may be an intelligent appliance, and as shown in fig. 1, the method of the present embodiment may include:

S101, obtaining a text to be broadcast by voice.

Specifically, the text is the textual content that is to be broadcast.

S102, determining whether the text to be subjected to voice broadcast contains a sensitive word according to the sensitive word database, wherein the sensitive word is a word with the same or similar pronunciation as the preset voice recognition command.

Specifically, a sensitive word is a word whose pronunciation is the same as or similar to that of a preset voice recognition command. For example, if the voice recognition command word "turn on light" is preset in the smart home appliance, a word whose pronunciation is the same as or similar to "turn on light", such as "turn on large light", is a sensitive word. The sensitive word database can be built during the development stage of the household appliance from the product's preset list of voice recognition command words; for example, words whose pronunciation is similar to command words such as "turn on light" can be stored in it. In addition, if voice misrecognition is triggered, the word that triggered it can be stored in the sensitive word database.

Specifically, S102 may be: searching the sensitive word database for words that are the same as words in the text to be broadcast by voice.
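The lookup in S102 can be sketched as follows. This is only an illustration under stated assumptions: the `SensitiveWord` record, the in-memory list used as the database, and the substring match are all hypothetical, and the example entry reuses the played word and command word recorded in S107.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SensitiveWord:
    word: str     # word whose pronunciation resembles a command
    command: str  # preset voice recognition command it resembles


# Hypothetical database contents, for illustration only.
SENSITIVE_WORD_DB = [
    SensitiveWord(word="large light on", command="light on"),
]


def find_sensitive_words(text, database):
    """Return every database entry whose word occurs in the text (S102)."""
    lowered = text.lower()
    return [entry for entry in database if entry.word.lower() in lowered]


hits = find_sensitive_words("Reminder: large light on in the hall", SENSITIVE_WORD_DB)
for hit in hits:
    print(f"sensitive word {hit.word!r} resembles command {hit.command!r}")
```

A real implementation would probably match on word or phoneme boundaries rather than raw substrings; the source does not specify the matching strategy.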

S103, if the text to be broadcasted contains the sensitive words, adjusting parameters of a voice recognition algorithm according to the sensitive words when the voice to be broadcasted corresponding to the text to be broadcasted is broadcasted.

Specifically, if it is determined that the text to be broadcast by voice contains a sensitive word, the parameters of the voice recognition algorithm are adjusted according to the sensitive word when the voice corresponding to the text is played. This embodiment has three implementable modes:

Mode one: synthesizing the text to be broadcast into the voice to be broadcast, extracting from that voice the time point at which the sensitive word occurs, then playing the voice to be broadcast, and adjusting the recognition threshold of the voice recognition command corresponding to the sensitive word when the time point at which the sensitive word occurs is reached.

Specifically, text-to-speech (TTS) synthesis may be used to synthesize the text to be broadcast into the voice to be broadcast. While that voice is played, the recognition threshold of the voice recognition command corresponding to a sensitive word is adjusted when the time point at which the sensitive word occurs is reached. For example, when a sensitive word resembling the command word "turn on light" is about to be played, the recognition threshold of the command word "turn on light" in the voice recognition algorithm is raised synchronously, which prevents the sensitive word from falsely triggering voice recognition.
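A minimal sketch of the time-based threshold adjustment in mode one is shown below. The `WordSpan` record, the two threshold values, and the assumption that the TTS engine reports start and end times for each sensitive word are all hypothetical; the source only states that the threshold is raised when the sensitive word's time point is reached.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordSpan:
    command: str  # command word that the sensitive word resembles
    start: float  # seconds from the start of playback
    end: float    # seconds from the start of playback


BASE_THRESHOLD = 0.60    # assumed default recognition threshold
RAISED_THRESHOLD = 0.85  # assumed threshold while a sensitive word plays


def recognition_threshold(command, playback_time, spans):
    """Return the threshold to apply to `command` at `playback_time`."""
    for span in spans:
        if span.command == command and span.start <= playback_time <= span.end:
            return RAISED_THRESHOLD
    return BASE_THRESHOLD


# Hypothetical time point extracted from the synthesized audio.
spans = [WordSpan(command="light on", start=1.2, end=1.9)]

for t in (0.5, 1.5, 2.5):
    print(t, recognition_threshold("light on", t, spans))
```

Only the command that the sensitive word resembles is affected, so other commands remain recognizable at the normal threshold for the whole broadcast.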

Mode two: acquiring the voice synthesis parameters of the sensitive word from the sensitive word database, adjusting the voice synthesis parameters of the sensitive word when synthesizing the text to be broadcast into the voice to be broadcast, and then playing the voice to be broadcast.

The voice synthesis parameters include volume, speech rate and pitch. The sensitive word database stores the sensitive words together with their voice synthesis parameters. When the text to be broadcast is synthesized into the voice to be broadcast, the voice synthesis parameters of the sensitive word are adjusted, for example by reducing the volume, increasing the speech rate, or raising or lowering the pitch. Many adjustment methods are possible, for example multiplying by a fixed scale factor each time, such as adjusting the volume to 95% of the original volume or the speech rate to 105% of the original speech rate. In this way, the probability of triggering voice misrecognition can be reduced.
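The fixed-scale-factor adjustment can be sketched as below. The `SynthesisParams` type and the normalization of each parameter to 1.0 are assumptions made for illustration; the 95% and 105% factors are the example values given in the text.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SynthesisParams:
    # Relative values; 1.0 means "unchanged" (an assumed convention).
    volume: float = 1.0
    rate: float = 1.0
    pitch: float = 1.0


def adjust(params, volume_factor=0.95, rate_factor=1.05):
    """Scale volume down and speech rate up by fixed factors."""
    return replace(
        params,
        volume=params.volume * volume_factor,
        rate=params.rate * rate_factor,
    )


once = adjust(SynthesisParams())
twice = adjust(once)
print(once)
print(twice)
```

Because the factors are multiplied in on every adjustment, repeated adjustments compound: two rounds bring the volume to about 90% of the original.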

Mode three: acquiring the voice synthesis parameters of the sensitive word from the sensitive word database, adjusting the voice synthesis parameters of the sensitive word when synthesizing the text to be broadcast into the voice to be broadcast, and extracting from that voice the time point at which the sensitive word occurs; then playing the voice to be broadcast, and adjusting the recognition threshold of the voice recognition command corresponding to the sensitive word when the time point at which the sensitive word occurs is reached.

Mode three combines mode one and mode two: the voice synthesis parameters of the sensitive word are adjusted when the text is synthesized into voice, and the recognition threshold of the corresponding voice recognition command is adjusted when the time point of the sensitive word is reached during playback. This further reduces the probability of triggering voice misrecognition.

Further, before determining whether the text to be voice-broadcasted includes the sensitive word according to the sensitive word database in S102, the method of this embodiment may further include:

S104, determining that the audio length of the text to be broadcast by voice, after being synthesized into voice, is greater than a preset threshold. The preset threshold is, for example, 2 seconds or another value. That is, if the audio length after synthesis is determined to be greater than the preset threshold, S102 is executed next.

S105, if the audio length is determined to be smaller than the preset threshold, disabling the voice recognition algorithm, synthesizing the text into the voice to be broadcast and playing it, and re-enabling the voice recognition algorithm after playback finishes. In this embodiment, the voice recognition algorithm is simply disabled for short voice, which prevents voice misrecognition caused by playing the audio.
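The branch between S104 and S105 might look like the sketch below. The `Recognizer` stub and the `play` and `check_sensitive_words` callbacks are hypothetical placeholders. The text does not say what happens when the audio length equals the threshold; this sketch arbitrarily treats that case as long audio.

```python
PRESET_THRESHOLD_S = 2.0  # example value from the text; other values are possible


class Recognizer:
    """Hypothetical stand-in for the appliance's voice recognition algorithm."""

    def __init__(self):
        self.enabled = True


def broadcast(audio_length_s, recognizer, play, check_sensitive_words):
    """Route a broadcast by audio length (S104/S105)."""
    if audio_length_s < PRESET_THRESHOLD_S:
        # Short audio: disable recognition for the whole playback (S105).
        recognizer.enabled = False
        try:
            play()
        finally:
            recognizer.enabled = True  # re-enable even if playback fails
        return "short"
    # Longer audio: keep recognition on and run the sensitive word check (S102).
    check_sensitive_words()
    play()
    return "long"


log = []
rec = Recognizer()
short_result = broadcast(
    1.2, rec, lambda: log.append(("play", rec.enabled)), lambda: log.append("check")
)
long_result = broadcast(
    3.5, rec, lambda: log.append(("play", rec.enabled)), lambda: log.append("check")
)
print(short_result, long_result, log)
```

Disabling recognition is acceptable for short audio because the user loses voice control only briefly; for longer audio the finer-grained sensitive word handling keeps recognition available.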

The above method reduces the probability of triggering voice misrecognition during voice broadcast. If voice misrecognition is nevertheless triggered during playback, the method of this embodiment may further include:

S106, monitoring whether voice recognition is triggered in the process of playing the voice to be broadcasted.

S107, if voice recognition is triggered, recording the played word that triggered the voice recognition and the command word triggered by the played word, for example recording the played word 'large light on' and the command word 'light on'.

S108, determining whether the played word triggers voice recognition according to the played word and the command word.

S108 may specifically be:

synthesizing the played word into voice;

inputting the voice synthesized from the played word into the voice recognition algorithm, and calculating the matching score between the played word and the command word through the voice recognition algorithm;

if the matching score is greater than a preset value, determining that the played word triggered the voice recognition;

and if the matching score is smaller than the preset value, determining that the played word did not trigger the voice recognition.
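The check in S108 can be sketched as below. The synthesizer and the scoring function are passed in as parameters because the text does not specify them. `toy_synthesize` and `toy_score` are illustrative stand-ins invented for this sketch: they compare strings rather than audio and are not a speech recognizer.

```python
from difflib import SequenceMatcher


def played_word_triggers_recognition(played_word, command_word,
                                     synthesize, score, preset_value):
    """S108: re-synthesize the played word and score it against the command word."""
    audio = synthesize(played_word)
    matching_score = score(audio, command_word)
    # Only "greater than" counts as triggered; treating an equal score as
    # not triggered is an assumption, since the text leaves that case open.
    return matching_score > preset_value


def toy_synthesize(word):
    """Illustrative stand-in: the 'audio' is simply the text itself."""
    return word


def toy_score(audio, command_word):
    """Illustrative stand-in: string similarity in [0, 1], not acoustic matching."""
    return SequenceMatcher(None, audio, command_word).ratio()


triggered = played_word_triggers_recognition(
    "large light on", "light on", toy_synthesize, toy_score, preset_value=0.6)
```

In a real system, `score` would be the matching score produced by the voice recognition algorithm for the synthesized audio, and `preset_value` would be chosen for that algorithm.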

S109, if it is determined that the played word triggered the voice recognition, storing the played word and the voice synthesis parameters of the played word in the sensitive word database.

If the played word already exists in the sensitive word database, the voice synthesis parameters stored for the played word in the sensitive word database are adjusted, for example by decreasing the volume, increasing the speech rate, or raising or lowering the pitch. Many adjustment methods are possible, for example multiplying by a fixed scale factor each time, such as adjusting the volume to 95% of its original value and the speech rate to 105% of its original value, thereby reducing the probability that the sensitive word causes voice misrecognition.
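The update of the sensitive word database in S109 can be sketched as follows, using the fixed scale factors mentioned above. The dictionary used as the database, the `SynthesisParams` type, and the default parameter values of 1.0 are assumptions made for illustration.

```python
from dataclasses import dataclass

VOLUME_FACTOR = 0.95  # example from the text: volume to 95% of its previous value
RATE_FACTOR = 1.05    # example from the text: speech rate to 105% of its previous value


@dataclass
class SynthesisParams:
    """Hypothetical parameter set; 1.0 means the synthesizer's default."""
    volume: float = 1.0
    rate: float = 1.0


def record_false_trigger(database, played_word, current_params=None):
    """S109: store a newly confirmed sensitive word, or adjust a known one."""
    if played_word in database:
        # Already known: scale the stored parameters by the fixed factors.
        old = database[played_word]
        database[played_word] = SynthesisParams(
            volume=old.volume * VOLUME_FACTOR,
            rate=old.rate * RATE_FACTOR,
        )
    else:
        # First occurrence: store the word with the parameters it was played with.
        if current_params is None:
            current_params = SynthesisParams()
        database[played_word] = current_params
    return database[played_word]
```

Because the factors are applied multiplicatively, each confirmed false trigger of the same word moves its parameters a little further from the defaults. A practical system would probably also bound the values so that the word stays intelligible, which the text does not discuss.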

According to the voice broadcasting method provided by the embodiment, the text to be subjected to voice broadcasting is obtained, whether the text to be subjected to voice broadcasting contains the sensitive words or not is determined according to the sensitive word database, and if the text to be subjected to voice broadcasting contains the sensitive words, the parameters of the voice recognition algorithm are adjusted according to the sensitive words when the voice to be broadcasted corresponding to the text to be subjected to voice broadcasting is broadcasted, so that the probability of triggering voice misrecognition can be reduced, and the user experience is improved.
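The lookup of sensitive words in the text to be broadcasted can be sketched as a substring search. This is only an illustration: the database is modelled as a hypothetical dictionary from sensitive word to the recognition command it resembles, and exact substring matching is an assumption, since the text does not say how the lookup is performed.

```python
def find_sensitive_words(text, sensitive_word_db):
    """Return (word, start_index) for every database entry that occurs in the text."""
    hits = []
    for word in sensitive_word_db:
        start = text.find(word)
        while start != -1:
            hits.append((word, start))
            # Continue searching after this occurrence's first character.
            start = text.find(word, start + 1)
    # Order the hits by their position in the text.
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical database: sensitive word -> recognition command it sounds like.
sensitive_word_db = {"large light on": "light on"}
hits = find_sensitive_words("please switch the large light on now", sensitive_word_db)
```

An empty result means that the text contains no known sensitive word and can be broadcasted without any adjustment.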

The following describes the technical solution of the embodiment of the method shown in fig. 1 in detail by using a specific embodiment.

Fig. 2 is a flowchart of an embodiment of a voice broadcast method provided in the present application, and as shown in fig. 2, the method of the present embodiment may include:

S201, obtaining a text to be subjected to voice broadcast.

S202, determining whether the audio length of the text to be subjected to voice broadcast after being synthesized into voice is larger than a preset threshold value. If not, S203 is executed, and if yes, S204 is executed.

S203, closing the voice recognition algorithm, synthesizing the text to be broadcasted into voice to be broadcasted, and then broadcasting, and starting the voice recognition algorithm after the voice to be broadcasted is broadcasted.

S204, determining whether the text to be subjected to voice broadcast contains a sensitive word according to the sensitive word database, wherein the sensitive word is a word with the same or similar pronunciation as a preset voice recognition command.

S205, if the text to be broadcasted contains a sensitive word, adjusting parameters of the voice recognition algorithm according to the sensitive word when the voice to be broadcasted corresponding to the text to be broadcasted is played.

Specifically, the parameters of the voice recognition algorithm are adjusted according to the sensitive words when the voice to be broadcasted corresponding to the text to be broadcasted is broadcasted, and three implementable modes are provided:

Mode one: synthesizing the text to be broadcasted into the voice to be broadcasted, extracting the time point at which the sensitive word occurs from the voice to be broadcasted, then playing the voice to be broadcasted, and adjusting the recognition threshold of the voice recognition command corresponding to the sensitive word when that time point is reached. This prevents the sensitive word from falsely triggering voice recognition.

Mode two: acquiring the voice synthesis parameters of the sensitive word from the sensitive word database, adjusting the voice synthesis parameters of the sensitive word when synthesizing the text to be broadcasted into the voice to be broadcasted, and then playing the voice to be broadcasted. This reduces the probability of triggering voice misrecognition.

Mode three: acquiring the voice synthesis parameters of the sensitive word from the sensitive word database, adjusting the voice synthesis parameters of the sensitive word when synthesizing the text to be broadcasted into the voice to be broadcasted, and extracting the time point at which the sensitive word occurs from the voice to be broadcasted; then playing the voice to be broadcasted, and adjusting the recognition threshold of the voice recognition command corresponding to the sensitive word when that time point is reached. Mode three executes mode one and mode two together, so the probability of triggering voice misrecognition is further reduced.

For a detailed description of the above three modes, reference may be made to the description of the embodiment shown in fig. 1, and details are not repeated here.
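The time-dependent threshold adjustment used in modes one and three can be sketched as a lookup over the extracted time points. Several details are assumptions of this sketch: that "adjusting" means raising the threshold so the command is harder to trigger, that each sensitive word has an end time as well as a start time, and the names `SensitiveSpan` and `threshold_at`.

```python
from dataclasses import dataclass


@dataclass
class SensitiveSpan:
    """Where a sensitive word occurs in the synthesized audio (hypothetical type)."""
    command: str    # recognition command the sensitive word resembles
    start_s: float  # time point at which the sensitive word starts
    end_s: float    # assumed end of the sensitive word


def threshold_at(t_s, command, spans, base_threshold, raised_threshold):
    """Return the recognition threshold for a command at playback time t_s."""
    for span in spans:
        if span.command == command and span.start_s <= t_s < span.end_s:
            # The sensitive word is being played: use the adjusted threshold.
            return raised_threshold
    # Outside every sensitive span, or for other commands: normal threshold.
    return base_threshold


# Example: 'light on' is harder to trigger between 1.2 s and 2.0 s of playback.
spans = [SensitiveSpan(command="light on", start_s=1.2, end_s=2.0)]
during = threshold_at(1.5, "light on", spans, base_threshold=0.5, raised_threshold=0.8)
```

Only the command that the sensitive word resembles is affected, and only while that word is playing, so other commands remain recognizable at their normal threshold throughout the broadcast.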

S206, detecting that voice recognition is triggered in the process of playing the voice to be broadcasted.

S207, recording the played word that triggered the voice recognition and the command word triggered by the played word, for example recording the played word 'large light on' and the command word 'light on'.

S208, determining, according to the played word and the command word, that the played word triggers voice recognition.

S208 may specifically be:

synthesizing the played word into voice;

inputting the voice synthesized from the played word into the voice recognition algorithm, and calculating the matching score between the played word and the command word through the voice recognition algorithm;

if the matching score is greater than a preset value, determining that the played word triggered the voice recognition;

and if the matching score is smaller than the preset value, determining that the played word did not trigger the voice recognition.

S209, if it is determined that the played word triggered the voice recognition, storing the played word and the voice synthesis parameters of the played word in the sensitive word database.

If the played word already exists in the sensitive word database, the voice synthesis parameters stored for the played word in the sensitive word database are adjusted, for example by decreasing the volume, increasing the speech rate, or raising or lowering the pitch. Many adjustment methods are possible, for example multiplying by a fixed scale factor each time, such as adjusting the volume to 95% of its original value and the speech rate to 105% of its original value, thereby reducing the probability that the sensitive word causes voice misrecognition.

Fig. 3 is a schematic structural diagram of an embodiment of a voice broadcast device provided in the present application, and as shown in fig. 3, the device of the present embodiment may include: an acquisition module 11, a first determination module 12 and a processing module 13, wherein,

the acquisition module 11 is used for acquiring a text to be broadcasted by voice;

the first determining module 12 is configured to determine whether a text to be voice broadcast contains a sensitive word according to the sensitive word database, where the sensitive word is a word having a pronunciation the same as or similar to that of a preset voice recognition command;

the processing module 13 is configured to, when the first determining module 12 determines that the text to be broadcasted contains a sensitive word, adjust parameters of the voice recognition algorithm according to the sensitive word when the voice to be broadcasted corresponding to the text to be broadcasted is played.

Further, the processing module 13 is configured to:

synthesizing a text to be broadcasted into a voice to be broadcasted, and extracting a time point of occurrence of a sensitive word according to the voice to be broadcasted; playing the voice to be broadcasted, and adjusting the recognition threshold value of the voice recognition command corresponding to the sensitive words when the time point of the occurrence of the sensitive words is reached;

or,

acquiring voice synthesis parameters of the sensitive words from the sensitive word database, and adjusting the voice synthesis parameters of the sensitive words when synthesizing the text to be broadcasted into the voice to be broadcasted; playing the voice to be broadcasted;

or,

acquiring voice synthesis parameters of the sensitive words from a sensitive word database, adjusting the voice synthesis parameters of the sensitive words when synthesizing a text to be broadcasted into voice to be broadcasted, and extracting time points of the occurrence of the sensitive words according to the voice to be broadcasted; and playing the voice to be broadcasted, and adjusting the recognition threshold value of the voice recognition command corresponding to the sensitive word when the time point of the occurrence of the sensitive word is reached.

The apparatus provided in the embodiment of the present application may implement the method embodiment, and specific implementation principles and technical effects thereof may be referred to the method embodiment, which is not described herein again.

Fig. 4 is a schematic structural diagram of an embodiment of a voice broadcast device provided in the present application, and as shown in fig. 4, the device of the present embodiment may further include, on the basis of the device shown in fig. 3: the second determining module 14 is configured to determine that an audio length of a text to be subjected to voice broadcast after being synthesized into voice is greater than a preset threshold before the first determining module 12 determines whether the text to be subjected to voice broadcast includes a sensitive word according to the sensitive word database.

Optionally, the processing module 13 is further configured to:

when the second determining module 14 determines that the audio length is smaller than the preset threshold, the voice recognition algorithm is turned off, the text to be broadcasted is synthesized into the voice to be broadcasted and then broadcasted, and the voice recognition algorithm is turned on after the voice to be broadcasted is broadcasted.

The apparatus provided in the embodiment of the present application may implement the method embodiment, and specific implementation principles and technical effects thereof may be referred to the method embodiment, which is not described herein again.

Fig. 5 is a schematic structural diagram of an embodiment of a voice broadcasting device provided in the present application, and as shown in fig. 5, the device of the present embodiment may further include, on the basis of the device shown in fig. 3: the device comprises a monitoring module 15 and a third determining module 16, wherein the monitoring module 15 is used for monitoring whether voice recognition is triggered or not in the process of playing the voice to be broadcasted;

the processing module 13 is further configured to: if the monitoring module 15 detects that voice recognition is triggered, record the played word that triggered the voice recognition and the command word triggered by the played word;

the third determining module 16 is configured to determine whether the playing word triggers speech recognition according to the playing word and the command word;

the processing module 13 is further configured to: if the third determining module 16 determines that the played word triggers speech recognition, the played word and the speech synthesis parameter of the played word are stored in the sensitive word database.

Further, the third determining module 16 is configured to:

synthesizing the played words into voice;

inputting the voice synthesized by the played words into a voice recognition algorithm, and calculating matching scores of the played words and the command words through the voice recognition algorithm;

if the matching score is larger than a preset value, determining that the speech recognition is triggered by the played word;

and if the matching score is smaller than the preset value, determining that the speech recognition is not triggered by the played word.

Further, the processing module 13 is further configured to:

if the played word already exists in the sensitive word database, adjusting the voice synthesis parameters stored for the played word in the sensitive word database.

The apparatus provided in the embodiment of the present application may implement the method embodiment, and specific implementation principles and technical effects thereof may be referred to the method embodiment, which is not described herein again.

Fig. 6 is a schematic diagram of a hardware structure of an electronic device provided in the present application. As shown in fig. 6, the electronic device 60 of the present embodiment, may include: a memory 61 and a processor 62;

a memory 61 for storing a computer program;

and a processor 62 for executing the computer program stored in the memory to implement the voice broadcasting method in the above embodiments. Reference may be made in particular to the description relating to the method embodiments described above.

Alternatively, the memory 61 may be separate or integrated with the processor 62.

When the memory 61 is a device separate from the processor 62, the electronic device 60 may further include:

a bus 63 for connecting the memory 61 and the processor 62.

Optionally, this embodiment further includes a communication interface 64, which may be connected to the processor 62 via the bus 63. The processor 62 may control the communication interface 64 to implement the receiving and transmitting functions of the electronic device 60.

The electronic device provided by this embodiment can be used to execute the above method, and its implementation manner and technical effect are similar, and this embodiment is not described herein again.

The present application also provides a computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the voice broadcasting method in the above embodiments.

In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described embodiments of the apparatus are merely illustrative, and for example, the division of modules is only one logical division, and other divisions may be realized in practice, for example, a plurality of modules may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or modules, and may be in an electrical, mechanical or other form.

Modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.

In addition, functional modules in the embodiments of the present application may be integrated into one processing unit, or each module may exist alone physically, or two or more modules are integrated into one unit. The unit formed by the modules can be realized in a hardware form, and can also be realized in a form of hardware and a software functional unit.

The integrated module implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device) or a processor (processor) to execute some steps of the methods according to the embodiments of the present application.

It should be understood that the Processor may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present invention may be embodied directly in a hardware processor, or in a combination of the hardware and software modules within the processor.

The memory may comprise a high-speed RAM memory, and may further comprise a non-volatile storage NVM, such as at least one disk memory, and may also be a usb disk, a removable hard disk, a read-only memory, a magnetic or optical disk, etc.

The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, the buses in the figures of the present application are not limited to only one bus or one type of bus.

The computer-readable storage medium may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or a magnetic or optical disk. A storage medium may be any available medium that can be accessed by a general purpose or special purpose computer.

Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.

Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.
