Voice answering method and device

Document No.: 1546156. Publication date: 2020-01-17.

Note: This technology, a voice answering method and device, was created by 李响, 夏龙, 马楠, 高强, 孙梦蕊, 王宏伟, 吴凡 and 李鑫 on 2019-10-25.

Abstract: The application provides a voice answering method and device. The method comprises: S10, receiving an answer instruction, extracting questions to be answered from a target voice question library based on the answer instruction, and generating a question set; S20, sequentially playing the questions to be answered in the question set; S30, continuously collecting user voice data; S40, sequentially recognizing and displaying the current answer information in the user voice data, executing step S30 if no answer information is acquired, and executing step S50 if answer information is acquired; S50, judging whether the current answer information is the correct answer to the question to be answered, and if so executing step S51, otherwise executing step S52; S51, generating a correct-answer prompt and continuing with step S20; S52, generating a wrong-answer prompt and continuing with step S40. The method and device are simple to operate, convenient to use, and widely applicable.

1. A voice answering method, comprising:

S10, receiving an answer instruction, extracting questions to be answered from a target voice question library based on the answer instruction, and generating a question set;

S20, sequentially playing the questions to be answered in the question set;

S30, continuously collecting user voice data;

S40, sequentially recognizing and displaying the current answer information in the user voice data, executing step S30 if no answer information is acquired, and executing step S50 if answer information is acquired;

S50, judging whether the current answer information is the correct answer to the question to be answered, and if so executing step S51, otherwise executing step S52;

S51, generating a correct-answer prompt, and continuing with step S20;

S52, generating a wrong-answer prompt, and continuing with step S40.

2. The voice answering method according to claim 1, further comprising, before step S10:

S01, acquiring original voice data and at least one text question library carrying category information;

S02, synthesizing a corresponding voice question library carrying category information based on the original voice data and the text question library.

3. The voice answering method according to claim 2, wherein step S10 includes:

S11, receiving an answer instruction carrying category information and question number information;

S12, matching, based on the category information carried by the answer instruction, the voice question library whose category information is the same as that of the answer instruction, as the target voice question library;

S13, extracting a target number of questions to be answered from the target voice question library based on the question number information carried by the answer instruction, and generating the question set.

4. The voice answering method according to claim 1, wherein step S40 includes:

S41, processing the user voice data to obtain at least one word unit;

S42, judging whether the current word unit is answer information;

if the current word unit is answer information, displaying the current answer information, and continuing with step S50;

if the current word unit is not answer information, continuing with step S43;

S43, judging whether the current word unit is the last word unit;

if yes, continuing with step S30;

if not, continuing with step S42.

5. The voice answering method according to claim 4, further comprising, after step S20:

S22, judging whether the answering time exceeds a second preset threshold;

if yes, continuing with step S20;

if not, continuing with step S41.

6. The voice answering method according to claim 5, further comprising, before step S22:

S21, judging whether the answering time exceeds a first preset threshold;

if yes, generating a countdown prompt, and continuing with step S22;

if not, continuing with step S22.

7. The voice answering method according to claim 4, wherein step S52 includes:

generating a wrong-answer prompt, and continuing with step S43.

8. A voice answering device, comprising:

a receiving module configured to receive an answer instruction, extract questions to be answered from a target voice question library based on the answer instruction, and generate a question set;

a playing module configured to sequentially play the questions to be answered in the question set;

a collection module configured to continuously collect user voice data;

a recognition module configured to sequentially recognize and display the current answer information in the user voice data, invoke the collection module if no answer information is acquired, and invoke the judging module if answer information is acquired;

a judging module configured to judge whether the current answer information is the correct answer to the question to be answered, and if so invoke the correct module, otherwise invoke the error module;

a correct module configured to generate a correct-answer prompt and then invoke the playing module;

an error module configured to generate a wrong-answer prompt and then invoke the recognition module.

9. A computing device comprising a memory, a processor, and computer instructions stored on the memory and executable on the processor, wherein the processor implements the steps of the method of any one of claims 1-7 when executing the instructions.

10. A computer-readable storage medium storing computer instructions, which when executed by a processor, perform the steps of the method of any one of claims 1 to 7.

Technical Field

The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for answering questions with voice, a computing device, and a computer-readable storage medium.

Background

With the rapid development of computer technology and its growing role in education informatization, tool-type education products are increasingly accepted and used by parents and students. Such products mainly provide technical support and help for students, parents and teachers in teaching and tutoring.

Existing tool-type education products can read questions aloud in several ways, but they do not support answering by voice: users must enter their answers manually. This is unfriendly to preschool users. First, preschool children must already be able to recognize and handwrite numbers and symbols in order to practice. Second, looking at a mobile phone for a long time is harmful to the development of children's eyesight.

Disclosure of Invention

In view of the above, embodiments of the present application provide a method and an apparatus for answering questions with voice, a computing device, and a computer-readable storage medium, so as to solve technical defects in the prior art.

The embodiment of the application discloses a voice answering method, comprising:

S10, receiving an answer instruction, extracting questions to be answered from a target voice question library based on the answer instruction, and generating a question set;

S20, sequentially playing the questions to be answered in the question set;

S30, continuously collecting user voice data;

S40, sequentially recognizing and displaying the current answer information in the user voice data, executing step S30 if no answer information is acquired, and executing step S50 if answer information is acquired;

S50, judging whether the current answer information is the correct answer to the question to be answered, and if so executing step S51, otherwise executing step S52;

S51, generating a correct-answer prompt, and continuing with step S20;

S52, generating a wrong-answer prompt, and continuing with step S40.

Further, before step S10, the method further includes:

S01, acquiring original voice data and at least one text question library carrying category information;

S02, synthesizing a corresponding voice question library carrying category information based on the original voice data and the text question library.

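Further, step S10 includes:

S11, receiving an answer instruction carrying category information and question number information;

S12, matching, based on the category information carried by the answer instruction, the voice question library whose category information is the same as that of the answer instruction, as the target voice question library;

S13, extracting a target number of questions to be answered from the target voice question library based on the question number information carried by the answer instruction, and generating the question set.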

Further, step S40 includes:

S41, processing the user voice data to obtain at least one word unit;

S42, judging whether the current word unit is answer information;

if the current word unit is answer information, displaying the current answer information, and continuing with step S50;

if the current word unit is not answer information, continuing with step S43;

S43, judging whether the current word unit is the last word unit;

if yes, continuing with step S30;

if not, continuing with step S42.

Further, after step S20, the method further includes:

S22, judging whether the answering time exceeds a second preset threshold;

if yes, continuing with step S20;

if not, continuing with step S41.

Further, before step S22, the method further includes:

S21, judging whether the answering time exceeds a first preset threshold;

if yes, generating a countdown prompt, and continuing with step S22;

if not, continuing with step S22.

Further, step S52 includes:

generating a wrong-answer prompt, and continuing with step S43.

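The application also discloses a voice answering device, comprising:

a receiving module configured to receive an answer instruction, extract questions to be answered from a target voice question library based on the answer instruction, and generate a question set;

a playing module configured to sequentially play the questions to be answered in the question set;

a collection module configured to continuously collect user voice data;

a recognition module configured to sequentially recognize and display the current answer information in the user voice data, invoke the collection module if no answer information is acquired, and invoke the judging module if answer information is acquired;

a judging module configured to judge whether the current answer information is the correct answer to the question to be answered, and if so invoke the correct module, otherwise invoke the error module;

a correct module configured to generate a correct-answer prompt and then invoke the playing module;

an error module configured to generate a wrong-answer prompt and then invoke the recognition module.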

The application also discloses a computing device, which comprises a memory, a processor and computer instructions stored on the memory and capable of running on the processor, wherein the processor executes the instructions to realize the steps of the voice answering method.

The application also discloses a computer-readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the voice answering method.

The voice answering method and device provided by the application play questions by voice, collect user voice data, recognize the answer information in that data and judge whether it is correct, thereby supporting both reading questions aloud and answering them by voice. This can effectively reduce the eyesight damage caused by looking at a mobile phone for a long time, and it solves the problem that some users, such as preschool children, cannot conveniently enter answers manually. The method and device are simple to operate, convenient to use, and widely applicable.

Drawings

FIG. 1 is a schematic block diagram of a computing device according to an embodiment of the present application;

FIG. 2 is a schematic flowchart illustrating steps of a voice answering method according to an embodiment of the present application;

FIG. 3 is a schematic flowchart illustrating steps of a voice answering method according to an embodiment of the present application;

FIG. 4 is a schematic structural diagram of a voice answering device according to an embodiment of the present application.

Detailed Description

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. However, the application can be implemented in many ways other than those described herein, and those skilled in the art can make similar extensions without departing from its spirit; the application is therefore not limited to the specific implementations disclosed below.

The terminology used in the description of the one or more embodiments is for the purpose of describing the particular embodiments only and is not intended to be limiting of the description of the one or more embodiments. As used in one or more embodiments of the present specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any and all possible combinations of one or more of the associated listed items.

It will be understood that, although the terms first, second, etc. may be used herein in one or more embodiments to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first can also be referred to as a second and, similarly, a second can also be referred to as a first without departing from the scope of one or more embodiments of the present description. The word "if" as used herein may be interpreted as "at the time of", "when", or "in response to a determination", depending on the context.

First, the terms used in one or more embodiments of the present invention are explained.

Speech synthesis (Text To Speech, TTS): TTS technology draws on several disciplines, including acoustics, linguistics, digital signal processing and multimedia technology, and is a leading-edge technology in the field of Chinese information processing. Speech synthesis converts text into speech and outputs it. The process mainly comprises decomposing the input text into phonemes character by character or word by word, analyzing symbols that need special handling (such as numbers, currency units, word inflections and punctuation), generating digital audio from the phonemes, and then either playing the audio through a loudspeaker or saving it as a sound file to be played by multimedia software.

Voice denoising: also called speech enhancement, a technique that extracts the useful speech signal from background noise and suppresses or reduces the interference when the speech signal is disturbed or even drowned out by noise.

Speech recognition: also known as Automatic Speech Recognition (ASR). Its goal is to convert the lexical content of human speech into computer-readable input, such as keystrokes, binary codes or character sequences. It differs from speaker recognition and speaker verification, which attempt to identify or verify the speaker who uttered the speech rather than the lexical content it contains.

The present application provides a voice answering method and device, a computing device and a computer-readable storage medium, which are described in detail one by one in the following embodiments.

Fig. 1 is a block diagram illustrating a configuration of a computing device 100 according to an embodiment of the present specification. The components of the computing device 100 include, but are not limited to, memory 110 and processor 120. The processor 120 is coupled to the memory 110 via a bus 130 and a database 150 is used to store data.

Computing device 100 also includes access device 140, which enables computing device 100 to communicate via one or more networks 160. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the internet. Access device 140 may include one or more of any type of network interface (e.g., a Network Interface Card (NIC)), whether wired or wireless, such as an IEEE 802.11 Wireless Local Area Network (WLAN) wireless interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and so forth.

In one embodiment of the present description, the above-described components of computing device 100 and other components not shown in FIG. 1 may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device architecture shown in FIG. 1 is for purposes of example only and is not limiting as to the scope of the description. Those skilled in the art may add or replace other components as desired.

Computing device 100 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), a mobile phone (e.g., smartphone), a wearable computing device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 100 may also be a mobile or stationary server.

The processor 120 may perform the steps of the method shown in FIG. 2. FIG. 2 is a schematic flowchart of a voice answering method according to an embodiment of the present application, including steps S201 to S208.

S201, receiving an answer instruction, extracting questions to be answered in a target voice question library based on the answer instruction, and generating a question set.

Specifically, the answer instruction is a computer instruction and may take various forms, such as "answer start" or "READY GO", which is not limited in the present application. A voice question library is a database storing a large number of voice questions, and the target voice question library is the database storing voice questions of the type the user requires.

The answer instruction carries question category information and question number information. The question category information may be subject category information such as "mathematics" and "English", difficulty category information such as "addition and subtraction within ten" and "addition and subtraction within one hundred", grade category information such as "primary school grade one" and "primary school grade two", or any combination thereof; it may be determined according to the specific situation, and the present application is not limited in this respect. The question number information can be chosen by the user, for example "5 questions" or "10 questions". The voice question library matching the question category carried in the answer instruction is selected as the target voice question library, and the corresponding number of questions is randomly extracted from it according to the question number information to form the question set.

Extracting the questions to be answered from the target voice question library based on the answer instruction lets users flexibly choose both the type of question library and the number of questions answered in one session. This gives a high degree of freedom in answering and meets the needs of different users.
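Step S201 can be illustrated with a minimal Python sketch. The names `VoiceQuestion`, `AnswerInstruction` and `build_question_set` are hypothetical and do not come from the application, and the sketch assumes that the voice question libraries are held in memory as a dictionary keyed by category.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceQuestion:
    text: str      # question text, kept for display and logging
    audio: bytes   # synthesized speech for the question
    answer: str    # expected answer


@dataclass(frozen=True)
class AnswerInstruction:
    category: str  # question category information
    count: int     # question number information chosen by the user


def build_question_set(libraries, instruction, rng=None):
    """Pick the target library by category, then draw questions at random."""
    if rng is None:
        rng = random.Random()
    if instruction.category not in libraries:
        raise KeyError(f"no voice question library for {instruction.category!r}")
    target = libraries[instruction.category]
    # Assumption: never ask for more questions than the library holds.
    count = min(instruction.count, len(target))
    # random.sample draws without replacement, so no question repeats.
    return rng.sample(target, count)
```

Passing a seeded `random.Random` as `rng` makes the draw reproducible, which may be convenient for testing; in normal use the argument can be omitted.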

S202, sequentially playing the questions to be answered in the question set.

Specifically, the questions to be answered are played one by one in their order in the question set, with a certain answering time reserved between two adjacent questions. The length of the answering time may vary with the type or difficulty of the question; for example, it may be 10 seconds for a primary school grade-one mathematics question and 20 seconds for a primary school grade-two mathematics question. The present application is not limited in this respect.

Playing the questions by voice effectively spares the user from looking at the mobile phone for a long time, which frees the eyes and protects eyesight.

S203, continuously collecting user voice data.

Specifically, collection of the user's voice data starts after the voice of a question to be answered has been played. It should be noted that in this step the user voice data is collected continuously and in real time once the question has been played.

During this process, the voice capture component of computing device 100 remains active so that the user's voice data can be collected continuously. The voice capture component may be built into the computing device 100, for example a built-in microphone, or may be separate from the computing device 100 and connected to it in a wired or wireless manner, for example an external microphone.

Collecting the user's voice data in real time achieves the purpose of answering by voice and frees the user's hands. For special groups such as preschool users in particular, voice answering solves the problem of being unable to enter answers manually.

S204, sequentially recognizing and displaying the current answer information in the user voice data.

S205, judging whether answer information is acquired.

If not, step S203 is executed.

If yes, step S206 is executed.

The answer information is language information of the type that corresponds to the question to be answered. For example, the answer information is a number when the question is a calculation question and an English word when the question is an English question, and so on for other cases.

Specifically, during recognition the collected user voice data can first be processed with a voice noise reduction technique to eliminate noise, so that the answer information in the user's speech is recognized more accurately.

After the user voice data is collected, its content is recognized in sequence to detect whether answer information has been obtained. If no answer information is obtained within the answering time, collection of user voice data continues. If answer information is obtained, each piece of answer information in the user voice data is displayed and judged in turn, until a piece of answer information is the correct answer or the answering time is used up.

For example, suppose that while a calculation question is being answered, the collected user voice data is "the answer is 3, no, 4". The content is recognized word by word: "the" is not answer information and is discarded, "answer" is not answer information and is discarded, and so on, until "3" is recognized as answer information. "3" is then displayed and judged. If 3 is the correct answer, recognition stops; if 3 is wrong, recognition continues with the remaining content.

Recognizing the answer information in the user voice data filters out content that is unrelated to the question to be answered, which improves the accuracy of answer recognition and judgment. The answer information is displayed whether or not it is correct, which gives the user better feedback and improves the user experience.
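The word-by-word filtering described above can be sketched as follows. This is an illustration only: it assumes that speech recognition has already turned the user voice data into a list of word units, and that for calculation questions answer information is a run of digits. The names `is_number` and `iter_answer_info` are hypothetical.

```python
import re

# Assumption: for calculation questions, answer information is a run of digits.
_NUMBER = re.compile(r"\d+")


def is_number(unit):
    """Return True if the whole word unit is a number."""
    return _NUMBER.fullmatch(unit) is not None


def iter_answer_info(word_units, is_answer=is_number):
    """Yield the word units that qualify as answer information, in spoken order."""
    for unit in word_units:
        if is_answer(unit):
            yield unit
        # Word units that are not answer information are simply discarded.


units = ["the", "answer", "is", "3", "no", "4"]
print(list(iter_answer_info(units)))  # ['3', '4']
```

A different `is_answer` predicate could be supplied for other question types, for example one that accepts English words for English questions.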

S206, judging whether the current answer information is the correct answer to the question to be answered; if so, executing step S207, and if not, executing step S208.

S207, generating a correct-answer prompt, and continuing with step S202.

S208, generating a wrong-answer prompt, and continuing with step S204.

Specifically, when answer information is detected in the user voice data, it is judged whether that answer information is the correct answer. If so, a correct-answer prompt is generated, and the method moves on to the next question and plays it. If not, a wrong-answer prompt is generated, and recognition of answer information in the user voice data continues.

The correct-answer prompt and the wrong-answer prompt may each be a single prompt mode, such as a voice prompt, a text prompt or a vibration prompt, or any combination of these modes, which is not limited in this application.

The present embodiment will be further described with reference to specific examples.

For example, if the received answer instruction carries the question category information "addition and subtraction within ten" and the question number information "2 questions", two questions to be answered are randomly extracted from the "addition and subtraction within ten" voice question library to form a question set.

The 1st question to be answered, "one plus one equals what?", is played; user voice data is then collected and timing starts.

Assume the collected user voice data contains "3" and "2". The answer information in the user voice data is recognized and displayed in sequence. "3" is answer information, so the user's answer "3" is displayed on the answering page; it is judged to be wrong and a wrong-answer prompt is generated. Recognition continues and obtains "2" as answer information, so the user's answer "2" is displayed on the answering page; it is judged to be correct, a correct-answer prompt is generated, and the method jumps to the 2nd question.

The 2nd question to be answered, "two minus one equals what?", is played; user voice data is then collected and timing starts.

Assume the collected user voice data contains "1". The answer information in the user voice data is recognized and displayed in sequence. "1" is answer information, so the user's current answer "1" is displayed on the answering page; it is judged to be correct and a correct-answer prompt is generated. Once all questions in the question set have been played, the answering process ends, and the answering results are counted and displayed.
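The control flow of steps S204 to S208 in this two-question example can be replayed with a short sketch. It is a simplification under stated assumptions: audio playback, speech recognition and timing are left out, each question's recognized word units are supplied as a ready-made list, and the function name `run_session` and the question prompts are hypothetical.

```python
def run_session(questions, units_per_question, is_answer=str.isdigit):
    """Replay a session and return a log of (prompt, shown_answer, verdict).

    questions: list of (prompt, correct_answer) pairs, played in order.
    units_per_question: for each question, the word units recognized from
        the user's speech before the answering time ran out.
    """
    log = []
    for (prompt, correct), units in zip(questions, units_per_question):
        for unit in units:
            if not is_answer(unit):
                continue                          # not answer information: discard
            if unit == correct:
                log.append((prompt, unit, "correct"))
                break                             # move on to the next question
            log.append((prompt, unit, "wrong"))   # keep recognizing
    return log


questions = [("one plus one equals what?", "2"), ("two minus one equals what?", "1")]
spoken = [["3", "2"], ["1"]]
for entry in run_session(questions, spoken):
    print(entry)
```

The log contains one "wrong" entry for "3", followed by "correct" entries for "2" and "1", which matches the sequence of prompts in the example.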

The voice answering method provided by this embodiment plays questions by voice, collects user voice data, recognizes the answer information in that data and judges whether it is correct, thereby supporting both reading questions aloud and answering them by voice. This can effectively reduce the eyesight damage caused by looking at a mobile phone for a long time, and it solves the problem that some users, such as preschool children, cannot conveniently enter answers manually. The method is simple to operate, convenient to use, and widely applicable.

FIG. 3 is a schematic flowchart of a voice answering method according to an embodiment of the present application, including steps S301 to S315.

S301, acquiring original voice data and at least one text question library carrying category information.

The original voice data is a pre-recorded corpus. It may be a female voice corpus, a male voice corpus or a child voice corpus, or a voice corpus of another type such as cartoon-style voices, so as to appeal to users of different ages and widen the audience; the present application is not limited in this respect.

The text question library is a database storing a large number of text questions. Its category information may be subject category information such as "mathematics" and "English", difficulty category information such as "addition and subtraction within ten" and "addition and subtraction within one hundred", grade category information such as "primary school grade one" and "primary school grade two", or any combination thereof; it may be determined according to the specific situation, and the present application is not limited in this respect. The text question library can be updated regularly to enrich the question types and keep the questions fresh.

S302, synthesizing a corresponding voice question library carrying category information based on the original voice data and the text question library.

Specifically, the original voice data and the text questions can be synthesized into voice questions by TTS speech synthesis. With speech synthesis, voices of different styles can be flexibly chosen for questions of different difficulty and type and for audiences of different ages, which makes voice answering more interesting and more attractive to the target audience.

It should be noted that steps S301 and S302 are both preparation performed before voice answering starts. The text question library and the voice question library can be updated periodically, and the two steps need not be repeated before every answering session.
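The preparation in steps S301 and S302 can be sketched as a one-off conversion of categorized text libraries into voice libraries. The application does not name a TTS engine, so the sketch takes the synthesis function as a parameter; `build_voice_libraries` and `fake_tts` are hypothetical names, and `fake_tts` is only a stand-in that returns the text as bytes instead of real audio.

```python
from typing import Callable, Dict, List, Tuple

# (question text, expected answer)
TextQuestion = Tuple[str, str]
# (question text, synthesized audio, expected answer)
VoiceItem = Tuple[str, bytes, str]


def build_voice_libraries(
    text_libraries: Dict[str, List[TextQuestion]],
    synthesize: Callable[[str], bytes],
) -> Dict[str, List[VoiceItem]]:
    """Turn each categorized text library into a voice library."""
    voice_libraries = {}
    for category, questions in text_libraries.items():
        # The category information is carried over unchanged as the key.
        voice_libraries[category] = [
            (text, synthesize(text), answer) for text, answer in questions
        ]
    return voice_libraries


def fake_tts(text: str) -> bytes:
    """Placeholder for a real TTS engine: returns the UTF-8 bytes of the text."""
    return text.encode("utf-8")
```

Because the synthesis function is injected, the same conversion could be rerun with a different voice style whenever the text question library is updated.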

S303, receiving an answer instruction carrying category information and question number information.

The answer instruction carries question category information and question number information. The question category information uses the same categories as the text question library and the voice question library, and may be subject category information, difficulty category information, grade category information, or any combination of these; it may be determined according to the specific situation, and the application is not limited in this respect. The question number information can be chosen by the user, for example "5 questions" or "10 questions". The question category information carried in the answer instruction is used to match and select the corresponding voice question library as the target voice question library, and the question number information is used to randomly extract the corresponding number of questions from the target voice question library to form the question set.

S304, matching, based on the category information carried by the answer instruction, the voice question library whose category information is the same as that of the answer instruction, as the target voice question library.

Specifically, take category information that combines grade category information and subject category information as an example. Assuming that the category information carried in the answer instruction is "primary school grade mathematics", the voice question library whose category information is also "primary school grade mathematics" is matched among all voice question libraries and used as the target voice question library.
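Matching on combined category information can be sketched by keying each voice question library with a set of category tags. This is one possible representation, not one stated in the application; the function name `match_target_library` and the tag values are hypothetical.

```python
def match_target_library(libraries, requested):
    """Return the library whose category information equals the request.

    libraries: dict mapping a frozenset of category tags to a question list.
    requested: iterable of category tags carried by the answer instruction.
    """
    key = frozenset(requested)
    # Exact match: every tag must agree, and tag order does not matter.
    return libraries.get(key)


libraries = {
    frozenset({"grade one", "mathematics"}): ["q1", "q2"],
    frozenset({"grade two", "mathematics"}): ["q3"],
}
print(match_target_library(libraries, ["mathematics", "grade one"]))  # ['q1', 'q2']
```

With this representation a request carrying only part of the category information, such as `["mathematics"]` alone, matches no library and returns `None`.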

S305, extracting a target number of questions to be answered from the target voice question library based on the question quantity information carried by the answer instruction, and generating a question set.

Specifically, taking the question quantity information "20 questions" as an example, and assuming that the target voice question library contains one thousand questions in total, 20 questions are randomly extracted from those one thousand questions to generate the question set.
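The matching and extraction of steps S304 and S305 can be sketched as follows. This is a minimal illustration, not the implementation of the application: the `VoiceQuestion` type, the `build_question_set` function, the dictionary keyed by category and the file names are all hypothetical, and sampling without replacement is an assumption about what "randomly extracted" means.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceQuestion:
    audio_path: str  # hypothetical location of the synthesized audio
    answer: str      # expected answer, stored as text


def build_question_set(libraries, category, count, rng=None):
    """Match the target library by category (S304), then sample questions (S305)."""
    rng = rng or random.Random()
    if category not in libraries:
        raise KeyError(f"no voice question library for category {category!r}")
    target_library = libraries[category]
    if count > len(target_library):
        raise ValueError("requested more questions than the library holds")
    # random.sample draws without replacement, so no question is repeated.
    return rng.sample(target_library, count)


# Hypothetical library of one thousand questions under one category label.
libraries = {
    "demo category": [VoiceQuestion(f"q{i}.wav", str(i % 10)) for i in range(1000)]
}
question_set = build_question_set(libraries, "demo category", 20, random.Random(0))
print(len(question_set))
```

Passing a seeded `random.Random` makes the draw reproducible for testing; in use, the default unseeded generator would give a different question set for each answer instruction.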

S306, sequentially playing the questions to be answered in the question set.

Specifically, the questions to be answered are played one after another in the order in which they appear in the question set. A certain answering time is reserved between every two adjacent questions, and the length of the answering time can differ according to the type or the difficulty of the question. Because the questions are played by voice, the user does not need to look at the mobile phone for a long time, which frees the eyes and protects eyesight.

S307, continuously collecting the voice data of the user.

Collecting the voice data of the user in real time makes it possible to answer by voice, which frees the user's hands. In particular, voice answering solves the problem that preschool users cannot input answers manually.

S308, judging whether the answering time exceeds a first preset threshold value.

If yes, go to step S3081, and then continue to step S309.

If not, the process continues to step S309.

S3081, generating a countdown prompt.

The first preset threshold of the answering time is slightly smaller than the preset maximum answering time. The difference between the two can be 5 seconds, 10 seconds, 15 seconds and so on, as determined by the specific situation. For example, if the preset maximum answering time is 30 seconds and the first preset threshold is 25 seconds, a countdown prompt is generated once the answering time exceeds 25 seconds. The countdown prompt serves as a time reminder that tells the user the answering time is about to end.

It should be noted that the execution processes of step S307 and step S308 may overlap. Specifically, timing is started when voice data collection is started, whether answer time exceeds a first preset threshold value or not can be judged in real time in the voice data collection process, and a prompt is sent out in time.

S309, judging whether the answering time exceeds a second preset threshold value.

If yes, go to step S306.

If not, go to step S310.

The second preset threshold of the answering time is the preset maximum answering time of each question. Its specific value can be set flexibly according to the type and difficulty of the questions to be answered in the question set. For example, the second preset threshold can be 10 seconds for the mathematics questions of one primary school grade and 20 seconds for those of another primary school grade, as determined by the specific situation, and the application is not limited in this respect.

Specifically, timing starts after the voice of the question to be answered has been played, and the answering time of the user is counted. If the answering time exceeds the second preset threshold and the user still has not answered or has answered incorrectly, the question is skipped and the next voice question is played.

Setting the second preset threshold, namely the longest answering time, effectively prevents the user from spending too long on one question without answering and helps the answering session move forward.
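The two time checks of steps S308 and S309 can be summarized in a small helper. This is only a sketch under stated assumptions: the function name `answer_time_state` and the three state labels are hypothetical, and the function returns only the final state for a given elapsed time (in the described flow the countdown prompt would already have been issued before the question is skipped).

```python
def answer_time_state(elapsed_s, first_threshold_s, second_threshold_s):
    """Classify the elapsed answering time against the two preset thresholds."""
    if not 0 < first_threshold_s < second_threshold_s:
        raise ValueError("the first threshold must be positive and below the second")
    if elapsed_s > second_threshold_s:
        return "skip_question"   # S309: longest answering time exceeded, back to S306
    if elapsed_s > first_threshold_s:
        return "countdown"       # S3081: remind the user that time is running out
    return "listening"           # keep collecting and recognizing voice data


# Values from the example in the text: first threshold 25 s, maximum 30 s.
for elapsed in (10, 26, 31):
    print(elapsed, answer_time_state(elapsed, 25, 30))
```

A caller would evaluate this repeatedly while voice data is being collected, which matches the remark that steps S307 and S308 may overlap.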

S310, processing the user voice data to obtain at least one word unit.

Specifically, processing the user voice data includes converting the user voice data into text, and performing sentence segmentation and word segmentation on the text to obtain at least one word unit. For example, if the user voice data is "the answer is 1", processing it yields four word units: "answer", "case", "yes" and "1".
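The segmentation part of step S310 can be illustrated on an already transcribed text. The sketch below assumes that speech-to-text has been done elsewhere and that the transcript is English with integer answers. The regular expressions and the name `to_word_units` are hypothetical choices for illustration. The application does not specify a segmentation algorithm, and its own example appears to segment at a finer granularity than English words.

```python
import re

# Assumption: sentences end with '.', '!' or '?'; answers are whole numbers.
SENTENCE_END = re.compile(r"[.!?]+")
# A word unit is either a run of digits or a run of letters.
WORD_UNIT = re.compile(r"\d+|[^\W\d_]+")


def to_word_units(transcript):
    """Split a transcript into sentences, then each sentence into word units."""
    units = []
    for sentence in SENTENCE_END.split(transcript):
        units.extend(WORD_UNIT.findall(sentence))
    return units


print(to_word_units("the answer is 1"))  # ['the', 'answer', 'is', '1']
```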

S311, judging whether the current word unit is answer information.

If yes, go on to step S313.

If not, the process continues to step S312.

Taking a calculation question as the question to be answered, and assuming that the word units are "answer", "case", "yes" and "1", each word unit is identified in sequence to judge whether it is answer information: the word units "answer", "case" and "yes" are not answer information, and the word unit "1" is answer information.

S312, judging whether the word unit is the last word unit.

If yes, the step S307 is continued.

If not, the step S311 is continuously executed.

Specifically, whether the current word unit is the last word unit or not is judged, that is, whether the user voice data collected in the longest answering time is completely recognized or not is judged, and under the condition that recognition is completed, the user voice data is collected continuously, and under the condition that recognition is not completed, recognition analysis is continued.

Judging whether the word unit is the last word unit can ensure the comprehensiveness and integrity of the recognition and analysis of the user voice data and avoid the omission of key information in the recognition process.

S313, judging whether the current answer information is the correct answer of the question to be answered, if so, executing a step S314, and if not, executing a step S315.

S314, generating a correct answer prompt, and continuing to execute step S306.

S315, generating a wrong answer prompt, and continuing to execute step S312.

Specifically, if the current answer information is the correct answer, a correct answer prompt is generated, the method jumps to the next question, and playback of the questions to be answered continues. If the current answer information is a wrong answer, a wrong answer prompt is generated, and it is then judged whether the current answer information, that is, the current word unit, is the last word unit in the user voice data. If it is, the method jumps to the next question. If it is not, identification continues and the next answer information is checked against the correct answer.
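The loop formed by steps S311 to S315 can be sketched for one batch of word units as follows. This is an illustrative sketch, not the implementation of the application: the function name `judge_word_units`, the event tuples and the result labels are hypothetical, and treating a word unit as answer information when it consists of digits is an assumption that only suits calculation questions.

```python
def judge_word_units(word_units, correct_answer, is_answer_info=str.isdigit):
    """Scan word units in order; stop at the first correct answer."""
    events = []
    for unit in word_units:
        if not is_answer_info(unit):          # S311 "no": move on via S312
            continue
        events.append(("display", unit))      # answer information is displayed
        if unit == correct_answer:            # S313 "yes": S314
            events.append(("prompt", "correct"))
            return "correct", events          # later word units are not examined
        events.append(("prompt", "wrong"))    # S313 "no": S315, then S312
    # Last word unit reached without a correct answer.
    return ("wrong" if events else "no_answer"), events


result, events = judge_word_units(["5", "6", "8"], "6")
print(result)
print(events)
```

With the word units "5", "6" and "8" and the correct answer "6", the scan displays "5", reports it as wrong, displays "6", reports it as correct and stops without looking at "8". What happens after a "wrong" or "no_answer" result (collecting more voice data or skipping the question) is left to the caller, because it depends on the remaining answering time.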

In practical application, after all questions in the question set are answered, the answer conditions of the questions can be counted, a question answering feedback table is generated, and the answer conditions of the questions are fed back to the user.

The present embodiment will be further described with reference to specific examples.

For example, it is assumed that the received answer instruction carries the category information "addition and subtraction within ten" and the question quantity information "5 questions". Based on the category information, the "addition and subtraction within ten" voice question library is matched among a plurality of pre-generated voice question libraries, and five questions are randomly extracted from it to form a question set. The longest answering time of each question is preset to 20 seconds, and a countdown prompt is issued 15 seconds after timing starts. That is, the first preset threshold of the answering time is 15 seconds and the second preset threshold is 20 seconds.

Playback of the 1st question begins: "five equals what?" After playback finishes, user voice data is collected and timing starts. No user voice data has been collected 15 seconds after timing starts, so a countdown prompt is issued. When the countdown ends, the longest answering time of 20 seconds has been reached and still no user voice data has been collected, so the first question is skipped.

Playback of the 2nd question begins: "nine minus three equals what?" After playback finishes, user voice data is collected and timing starts. The user voice data collected within the answering time comprises "5", "6" and "8". It is processed into the three word units "5", "6" and "8", which are identified in sequence. The word unit "5" is answer information but not the correct answer, so "5" is displayed and a wrong answer prompt is generated. "5" is not the last word unit, so identification continues with the next word unit. The word unit "6" is answer information and is the correct answer, so "6" is displayed, a correct answer prompt is generated, identification stops, and the method jumps to the next question.

Playback of the 3rd question begins: "two plus three equals what?" After playback finishes, user voice data is collected and timing starts. The user voice data collected within the answering time is "should be 5 bars". It is processed into five word units, including "should", "5" and "bar", which are identified in sequence. Each word unit before "5" is neither answer information nor the last word unit, so identification continues with the next word unit each time. The word unit "5" is answer information and is the correct answer, so identification stops, "5" is displayed, a correct answer prompt is generated, and the method jumps to the next question.

Playback of the 4th question begins: "seven plus one equals what?" After playback finishes, user voice data is collected and timing starts. The user voice data collected within the answering time is "don't know". It is processed into the three word units "not", "knowing" and "saying", which are identified in sequence. The word unit "not" is neither answer information nor the last word unit, so identification continues with the next word unit. The word unit "knowing" is neither answer information nor the last word unit, so identification continues with the next word unit. The word unit "saying" is not answer information and is the last word unit. No answer information has been identified in the user voice data within the answering time, so the answer is judged to be wrong and the method jumps to the next question.

Playback of the 5th question begins: "eight minus three equals what?" After playback finishes, user voice data is collected and timing starts. The user voice data collected within the answering time comprises "5". It is processed into the word unit "5", which is identified as answer information and as the correct answer, so "5" is displayed and a correct answer prompt is generated. This is the last question in the question set, so voice answering is complete and an answering feedback table is generated: one question skipped, three questions answered correctly and one question answered wrongly.
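The feedback table of this example can be tallied from the per-question outcomes. The sketch below is illustrative only: the outcome labels and the function name `build_feedback` are hypothetical, and the application does not prescribe a format for the feedback table.

```python
from collections import Counter


def build_feedback(outcomes):
    """Count how many questions were answered correctly, wrongly, or skipped."""
    counts = Counter(outcomes)
    return {key: counts.get(key, 0) for key in ("correct", "wrong", "skipped")}


# Outcomes of questions 1 to 5 in the example above.
outcomes = ["skipped", "correct", "correct", "wrong", "correct"]
print(build_feedback(outcomes))  # {'correct': 3, 'wrong': 1, 'skipped': 1}
```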

The voice answering method provided by this embodiment plays the questions by voice, collects user voice data, identifies answer information in the user voice data and judges whether the answer information is correct, so that questions are both read out and answered by voice. Identifying and judging each word unit in the user voice data in sequence helps ensure the accuracy of the identification and judgment results. Once a correct answer has been identified, the subsequent word units are no longer identified or judged, which reduces the amount of computation.

The voice answer method provided by the embodiment can effectively reduce the visual impairment caused by the fact that a user watches a mobile phone for a long time, solves the problem that some special people such as preschool children are inconvenient to manually input answers, and is simple to operate, convenient to use and wide in application range.

A speech answering device comprising:

the receiving module 401 is configured to receive an answer instruction, extract a question to be answered in a target speech question library based on the answer instruction, and generate a question set;

a playing module 402 configured to sequentially play the questions to be answered in the question set;

an acquisition module 403 configured to continuously acquire user voice data;

an identification module 404 configured to sequentially identify and display current answer information in the user voice data, execute the acquisition module 403 when the answer information is not acquired, and execute the judging module 405 when the answer information is acquired;

a judging module 405 configured to judge whether the current answer information is a correct answer to the question to be answered, if so, execute a correct module 406, and if not, execute an error module 407;

a correct module 406 configured to generate a correct answer prompt and continue to execute the playing module 402;

an error module 407 configured to generate a wrong answer prompt and continue to execute the identification module 404.

Optionally, the voice answering device further includes:

an obtaining module configured to obtain original voice data and at least one text question library carrying category information;

and a synthesis module configured to synthesize a corresponding voice question library carrying category information based on the original voice data and the text question library.

Optionally, the receiving module 401 is further configured to:

receiving an answer instruction carrying category information and question quantity information;

matching, based on the category information carried by the answer instruction, the voice question library whose category information is the same as that of the answer instruction as the target voice question library;

and extracting a target number of questions to be answered from the target voice question library based on the question quantity information carried by the answer instruction, and generating a question set.

Optionally, the identification module 404 further includes:

a processing module configured to process the user voice data to obtain at least one word unit;

an answer information judging module configured to judge whether the current word unit is answer information;

in a case where the current word unit is answer information, the current answer information is displayed, and whether the current answer information is the correct answer to the question to be answered is then judged;

in a case where the current word unit is not answer information, whether the word unit is the last word unit is then judged;

if yes, the acquisition module 403 continues to be executed;

if not, the answer information judging module continues to be executed.

Optionally, the voice answering device further includes:

a second time judgment module configured to judge whether the answering time exceeds a second preset threshold value;

if yes, the playing module 402 continues to be executed;

if not, the processing module continues to be executed.

Optionally, the voice answering device further includes:

the first time judgment module is configured to judge whether the answer time exceeds a first preset threshold value;

if yes, generating a countdown prompt, and continuously executing a second time judgment module;

if not, the second time judgment module is continuously executed.

The voice answering device provided by the application plays the questions by voice, collects user voice data, identifies the answer information in the user voice data and judges whether the answer information is correct, so that questions are both read out and answered by voice. It can effectively reduce the harm to eyesight caused by looking at a mobile phone for a long time, solves the problem that some special groups such as preschool children cannot conveniently input answers manually, and is simple to operate, convenient to use and wide in application range.

An embodiment of the present application further provides a computing device, including a memory, a processor, and computer instructions stored on the memory and executable on the processor, where the processor executes the instructions to implement the following steps:

S10, receiving an answer instruction, extracting questions to be answered from the target voice question library based on the answer instruction, and generating a question set.

S20, sequentially playing the questions to be answered in the question set.

S30, continuously collecting the voice data of the user.

S40, sequentially identifying and displaying the current answer information in the user voice data, executing step S30 when the answer information is not acquired, and executing step S50 when the answer information is acquired.

S50, judging whether the current answer information is the correct answer to the question to be answered, if so, executing step S51, and if not, executing step S52.

S51, generating a correct answer prompt, and continuing to execute step S20.

S52, generating a wrong answer prompt, and continuing to execute step S40.

An embodiment of the present application further provides a computer-readable storage medium, which stores computer instructions, and the instructions, when executed by a processor, implement the steps of the voice question answering method.

The above is an illustrative scheme of a computer-readable storage medium of the present embodiment. It should be noted that the technical solution of the storage medium belongs to the same concept as the technical solution of the voice answer method, and details that are not described in detail in the technical solution of the storage medium can be referred to the description of the technical solution of the voice answer method.

The computer instructions comprise computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content of the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in a jurisdiction. For example, in some jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunications signals.

It should be noted that, for the sake of simplicity, the above-mentioned method embodiments are described as a series of acts or combinations, but those skilled in the art should understand that the present application is not limited by the described order of acts, as some steps may be performed in other orders or simultaneously according to the present application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.

In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

The preferred embodiments of the present application disclosed above are intended only to aid in the explanation of the application. Alternative embodiments are not exhaustive and do not limit the invention to the precise embodiments described. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the application and the practical application, to thereby enable others skilled in the art to best understand and utilize the application. The application is limited only by the claims and their full scope and equivalents.
