Voice test method, computer equipment and readable storage medium

Document No.: 116970  Published: 2021-10-19

Reading note: The technology "Voice test method, computer equipment and readable storage medium" was designed by Sun Zhenfang, Huang Shifu and Bai Junjie on 2020-04-10. Main content: The application provides a voice test method, a computer device and a readable storage medium. The method includes: receiving an audio test file, where the audio test file includes a wake-up voice uttered by a calling terminal and a reply voice uttered by a called terminal in response to the wake-up voice; analyzing the audio test file to generate a test audio waveform curve; and calculating, according to the test audio waveform curve, the difference between the ending time of the wake-up voice and the starting time of the reply voice to obtain the response time of the called terminal. This method reduces testing errors and labor costs, and improves both the consistency of test results and testing efficiency.

1. A method for voice testing, the method comprising:

receiving an audio test file, wherein the audio test file comprises a wake-up voice sent by a calling terminal and a reply voice sent by a called terminal responding to the wake-up voice;

analyzing the audio test file to generate a test audio waveform curve;

and calculating, according to the test audio waveform curve, the difference between the ending time of the wake-up voice and the starting time of the reply voice to obtain the response time of the called terminal.

2. The method according to claim 1, wherein, after the step of calculating the difference between the ending time of the wake-up voice and the starting time of the reply voice according to the test audio waveform curve to obtain the response time of the called terminal, the method further comprises:

acquiring a plurality of response times obtained through multiple calculations, and calculating the average of the plurality of response times.

3. The method according to claim 1, wherein the step of calculating a difference between the ending time of the wake-up voice and the starting time of the reply voice according to the test audio waveform curve to obtain the response time of the called terminal comprises:

obtaining the amplitude of the test audio waveform curve;

sequentially selecting, along the time axis of the test audio waveform curve, a first moment and a second moment that are adjacent and at which the absolute value of the amplitude equals a threshold amplitude, wherein the absolute value of the amplitude at the moment immediately before the first moment and the absolute value of the amplitude at the moment immediately after the second moment are both greater than the threshold amplitude;

judging whether the difference between the second moment and the first moment is greater than or equal to a threshold time value;

and if so, taking the difference value as the response time of the called terminal.

4. The method of claim 3, wherein the threshold amplitude is set by a method comprising:

collecting environmental sounds to generate an environmental audio waveform curve;

obtaining the amplitude of the environmental audio waveform curve, and calculating the average of the absolute values of the amplitude as a background noise value;

and adding a preset judgment amplitude to the background noise value, and taking the sum as the threshold amplitude.

5. The method according to claim 1, wherein the step of calculating a difference between the ending time of the wake-up voice and the starting time of the reply voice according to the test audio waveform curve to obtain the response time of the called terminal comprises:

selecting, from the waveform of the test audio waveform curve, a first waveform region and a second waveform region corresponding to the wake-up voice and the reply voice, respectively;

determining, along the time axis of the test audio waveform curve, the time corresponding to the right boundary of the first waveform region as the end time and the time corresponding to the left boundary of the second waveform region as the start time;

and calculating the difference value between the starting time and the ending time to obtain the response time of the called terminal.

6. The method of claim 5, wherein the step of selecting, from the waveform of the test audio waveform curve, a first waveform region and a second waveform region corresponding to the wake-up voice and the reply voice, respectively, comprises:

acquiring a first preset audio waveform curve of the wake-up voice and a second preset audio waveform curve of the reply voice;

and selecting two sections of waveform areas which are respectively matched with the first preset audio waveform curve and the second preset audio waveform curve from the waveforms of the test audio waveform curve as the first waveform area and the second waveform area.

7. The method of claim 1, wherein, before the step of receiving an audio test file, the method further comprises:

sending a wake-up instruction to the calling terminal so that the calling terminal utters the wake-up voice.

8. The method of claim 1, wherein, before the step of receiving an audio test file, the method further comprises:

sending a collecting instruction to a sound pickup so that the sound pickup collects the wake-up voice and the reply voice;

and the step of receiving an audio test file comprises:

receiving the audio test file sent by the sound pickup.

9. A computer device, comprising a processor and a memory, wherein the memory stores computer instructions, the processor is coupled to the memory, and the processor, when operating, executes the computer instructions to implement the method of any one of claims 1-8.

10. A computer-readable storage medium, on which a computer program is stored, the computer program being executable by a processor for implementing the method according to any one of claims 1 to 8.

Technical Field

The present application relates to the field of artificial intelligence technologies, and in particular, to a voice testing method, a computer device, and a readable storage medium.

Background

With the development of artificial intelligence technology, machines can recognize more and more things. This makes daily life more convenient and also opens up more possibilities for voice interaction. Smart hardware with voice interaction keeps appearing on the market, such as mobile phones, smart speakers, in-vehicle head units and household appliances. Yet many users find voice wake-up slow when they use the voice interaction function.

At present, the specific wake-up delay is generally quantified by manual stopwatch timing. This approach has large errors, and different testers obtain different results, so test consistency is poor.

Disclosure of Invention

The application mainly provides a voice testing method, computer equipment and a readable storage medium, which can reduce testing errors, reduce labor cost and improve the consistency of testing results and testing efficiency.

To solve the above technical problem, the application adopts the following technical solution. A voice testing method is provided, comprising: receiving an audio test file, wherein the audio test file comprises a wake-up voice sent by a calling terminal and a reply voice sent by a called terminal in response to the wake-up voice; analyzing the audio test file to generate a test audio waveform curve; and calculating, according to the test audio waveform curve, the difference between the ending time of the wake-up voice and the starting time of the reply voice to obtain the response time of the called terminal.

Wherein, after the step of calculating the response time of the called terminal according to the ending time of the wake-up voice and the starting time of the reply voice, the method further comprises: acquiring a plurality of response times obtained through multiple calculations, and calculating the average of the plurality of response times.

Wherein, the step of calculating the difference between the ending time of the wake-up voice and the starting time of the reply voice according to the test audio waveform curve to obtain the response time of the called terminal comprises: obtaining the amplitude of the test audio waveform curve; sequentially selecting, along the time axis of the test audio waveform curve, a first moment and a second moment that are adjacent and at which the absolute value of the amplitude equals a threshold amplitude, wherein the absolute value of the amplitude at the moment immediately before the first moment and at the moment immediately after the second moment is greater than the threshold amplitude; judging whether the difference between the second moment and the first moment is greater than or equal to a threshold time value; and if so, taking the difference as the response time of the called terminal.

The threshold amplitude is set as follows: collecting environmental sounds to generate an environmental audio waveform curve; obtaining the amplitude of the environmental audio waveform curve and calculating the average of the absolute values of the amplitudes as a background noise value; and adding a preset judgment amplitude to the background noise value and taking the sum as the threshold amplitude.

Wherein, the step of calculating the difference between the ending time of the wake-up voice and the starting time of the reply voice according to the test audio waveform curve to obtain the response time of the called terminal comprises: selecting, from the waveform of the test audio waveform curve, a first waveform region and a second waveform region corresponding to the wake-up voice and the reply voice, respectively; determining, along the time axis of the test audio waveform curve, the time corresponding to the right boundary of the first waveform region as the end time and the time corresponding to the left boundary of the second waveform region as the start time; and calculating the difference between the start time and the end time to obtain the response time of the called terminal.

Wherein, the step of selecting, from the waveform of the test audio waveform curve, the first waveform region and the second waveform region corresponding to the wake-up voice and the reply voice, respectively, comprises: acquiring a first preset audio waveform curve of the wake-up voice and a second preset audio waveform curve of the reply voice; and selecting, from the waveform of the test audio waveform curve, two waveform regions that match the first preset audio waveform curve and the second preset audio waveform curve, respectively, as the first waveform region and the second waveform region.

Wherein, before the step of receiving the audio test file, the method further comprises: sending a wake-up instruction to the calling terminal so that the calling terminal utters the wake-up voice.

Wherein, before the step of receiving the audio test file, the method further comprises: sending a collecting instruction to a sound pickup so that the sound pickup collects the wake-up voice and the reply voice; and the step of receiving an audio test file comprises: receiving the audio test file sent by the sound pickup.

In order to solve the above technical problem, another technical solution adopted by the present application is: there is provided a computer device comprising a processor and a memory, the memory storing computer instructions, the processor being coupled to the memory, the processor executing the computer instructions when in operation to implement a method as described above.

In order to solve the above technical problem, the present application adopts another technical solution: there is provided a computer readable storage medium having stored thereon a computer program for execution by a processor to implement a method as described above.

The beneficial effect of this application is as follows. Unlike the prior art, the method receives an audio test file that comprises the wake-up voice sent by the calling terminal and the reply voice sent by the called terminal in response to the wake-up voice. It analyzes the audio test file to generate a test audio waveform curve. It then calculates, according to the test audio waveform curve, the difference between the ending time of the wake-up voice and the starting time of the reply voice to obtain the response time of the called terminal. The response time is therefore measured from the recorded waveform instead of by manual stopwatch timing. This reduces testing errors and labor cost, and improves both the consistency of test results and testing efficiency.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings can be obtained by those skilled in the art without inventive efforts, wherein:

FIG. 1 is a schematic flowchart of a first embodiment of the voice testing method provided in the present application;

FIG. 2 is a schematic diagram illustrating an embodiment of step S11 in FIG. 1;

FIG. 3 is a schematic diagram of another embodiment of step S11 in FIG. 1;

FIG. 4 is a graph of the test audio waveform of step S12 of FIG. 1;

FIG. 5 is a schematic diagram of a detailed flowchart of an embodiment of step S13 in FIG. 1;

FIG. 6 is a flowchart illustrating a detailed process of an embodiment of the method for setting the threshold amplitude value in step S132 of FIG. 5;

FIG. 7 is a schematic diagram illustrating another embodiment of step S13 in FIG. 1;

FIG. 8 is a schematic diagram illustrating a detailed flow chart of an embodiment of step S13a in FIG. 7;

FIG. 9 is a flowchart illustrating a second embodiment of a voice testing method provided herein;

FIG. 10 is a schematic block diagram of an embodiment of a computer device provided herein;

FIG. 11 is a schematic block diagram of an embodiment of a computer-readable storage medium provided herein.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Referring to fig. 1, fig. 1 is a schematic flow chart of a first embodiment of a voice testing method provided in the present application, where the voice testing method in the present embodiment includes:

S11: receiving an audio test file;

Referring to fig. 2, fig. 2 is a schematic diagram of an embodiment of step S11 in fig. 1, wherein the audio test file includes a wake-up voice uttered by the calling terminal 110 and a reply voice uttered by the called terminal 120 in response to the wake-up voice.

Specifically, the calling terminal 110 is a device capable of producing speech, such as an artificial mouth. An artificial mouth, also called a mouth simulator, is a dedicated artificial sound source. It consists of a small loudspeaker mounted in a specially shaped baffle. The baffle is designed to simulate the average directivity and radiation pattern of the human mouth, and the artificial mouth must provide a constant sound pressure output. The called terminal 120 is the device under test in this embodiment, such as a mobile phone, a smart speaker, an in-vehicle head unit or a household appliance. During the test, the calling terminal 110 utters a preset wake-up voice, such as "Xiao Ai Tongxue". When the called terminal 120 receives the wake-up voice, it utters a reply voice, such as "I'm here". The computer device 130 can collect both voices through its own sound pickup module: the wake-up voice uttered by the calling terminal 110 and the reply voice uttered by the called terminal 120.

Optionally, in a specific application scenario, the computer device 130 issues a wake-up instruction to the calling terminal 110 before it receives the audio test file in step S11, so that the calling terminal 110 utters the wake-up voice. The computer device 130 is communicatively connected to the calling terminal 110 and controls it to utter the wake-up voice during the test. The calling terminal 110 therefore does not need to be operated manually.

Referring to fig. 3, fig. 3 is a schematic diagram of another embodiment of step S11 in fig. 1. In this embodiment, the calling terminal 110 utters the wake-up voice and the called terminal 120 utters the reply voice, and the sound pickup 140 collects both to generate an audio test file. The sound pickup 140 is communicatively connected to the computer device 130, which receives the audio test file sent by the sound pickup 140.

In a specific application scenario, the computer device 130 sends a collecting instruction to the sound pickup 140 before it receives the audio test file in step S11, so that the sound pickup 140 collects the wake-up voice and the reply voice. The sound pickup 140 is thus controlled by the computer device 130 and does not need to be operated manually.
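The remote-control flow in the two scenarios above might be organized as in the following Python sketch. The interfaces `CallingTerminal` and `SoundPickup` and their method names are hypothetical. They stand in for whatever control channel the artificial mouth and the sound pickup actually expose, which the application does not specify. Starting the recording before triggering the wake-up voice is also an assumption; the application does not fix the order of the two instructions.

```python
from typing import Protocol


class CallingTerminal(Protocol):
    """Hypothetical control interface of the calling terminal (artificial mouth)."""

    def send_wake_up_instruction(self) -> None: ...


class SoundPickup(Protocol):
    """Hypothetical control interface of the sound pickup."""

    def start_collecting(self) -> None: ...

    def fetch_audio_test_file(self) -> bytes: ...


def run_single_test(caller: CallingTerminal, pickup: SoundPickup) -> bytes:
    """Drive one test run without manual operation and return the recording."""
    # Assumed order: start recording first so the wake-up voice is captured in full.
    pickup.start_collecting()
    # Remotely trigger the preset wake-up voice on the calling terminal.
    caller.send_wake_up_instruction()
    # Assumed to block until the reply voice has been recorded.
    return pickup.fetch_audio_test_file()
```

Using structural `Protocol` types keeps the sketch independent of any particular device SDK. Any object with matching methods, including a test double, can be passed in.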

Referring further to fig. 1, the voice testing method in this embodiment further includes:

S12: analyzing the audio test file to generate a test audio waveform curve;

Referring to fig. 4, fig. 4 is a graph of the test audio waveform curve of step S12 in fig. 1. The graph includes a waveform, a time axis and an amplitude axis, and shows how the amplitude of the waveform varies over time.
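Step S12 does not prescribe a file format. As a minimal sketch, assuming the audio test file is a 16-bit mono PCM WAV file, Python's standard `wave` module can turn it into the sampled curve, with time in seconds and amplitude in raw sample units. The function name `parse_audio_test_file` is hypothetical.

```python
import io
import sys
import wave
from array import array


def parse_audio_test_file(data: bytes) -> tuple[list[float], list[int]]:
    """Return (times in seconds, sample amplitudes) for a 16-bit mono PCM WAV file."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())
    samples = array("h")  # signed 16-bit integers
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV PCM data is stored little-endian
    times = [i / rate for i in range(len(samples))]
    return times, samples.tolist()
```

Each returned pair `(times[i], amps[i])` is one point of the test audio waveform curve on the time and amplitude axes.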

Referring further to fig. 1, the voice testing method in this embodiment further includes:

S13: calculating, according to the test audio waveform curve, the difference between the ending time of the wake-up voice and the starting time of the reply voice to obtain the response time of the called terminal.

Referring to fig. 5, fig. 5 is a schematic flowchart illustrating an embodiment of step S13 in fig. 1, wherein step S13 may specifically include:

S131: obtaining the amplitude of the test audio waveform curve;

Specifically, in the test audio waveform graph shown in fig. 4, the amplitude of the test audio waveform curve is obtained from the data represented by the amplitude axis.

S132: sequentially selecting, along the time axis of the test audio waveform curve, a first moment and a second moment that are adjacent and at which the absolute value of the amplitude equals the threshold amplitude;

In this embodiment, the threshold amplitude is set according to the ambient noise of the test environment.

Specifically, a first moment is selected first along the time axis of the test audio waveform curve. At this moment the absolute value of the amplitude equals the threshold amplitude, and the absolute value of the amplitude at the immediately preceding moment is greater than the threshold amplitude. At the first moment the signal has therefore dropped from speech toward the noise level. This can happen in three cases: while the wake-up voice is still in progress (a momentary dip), when the wake-up voice pauses partway through, or when the wake-up voice ends. Next, a second moment is selected. At this moment the absolute value of the amplitude equals the threshold amplitude, and the absolute value of the amplitude at the immediately following moment is greater than the threshold amplitude. By the same reasoning, at the second moment the signal rises from the noise level back into speech. This can happen in three cases: while the wake-up voice is still in progress, when the wake-up voice resumes after a pause, or when the reply voice begins after the wake-up voice has ended.

S133: judging whether the difference between the second moment and the first moment is greater than or equal to a threshold time value.

Specifically, it is determined whether the difference between the second moment and the first moment is greater than or equal to the threshold time value. If so, step S134 is executed; otherwise, the method returns to step S132.

Optionally, the threshold time value of the present application may be set according to specific situations, and is not limited herein.

S134: taking the difference as the response time of the called terminal.

For example, in a specific application scenario of the present application, the threshold time value may be set to 800ms, but may also be other values, which is not specifically limited herein.

Specifically, suppose the difference between the second moment and the first moment is smaller than the threshold time value of 800 ms. Then the wake-up voice was still in progress or had paused briefly. The first moment is not the end time of the wake-up voice, and the second moment is not the start time of the reply voice. The method returns to step S132 and selects the next pair of moments with the same characteristics. Conversely, suppose the difference is greater than or equal to 800 ms. Then the interval between the two moments is the period from the end of the wake-up voice to the start of the reply voice. The first moment is the end time of the wake-up voice, the second moment is the start time of the reply voice, and their difference is the response time of the called terminal. For example, as shown in fig. 4, if the first moment is $t_{e1}$ and the second moment is $t_{r1}$, the response time of the called terminal is $T_1 = t_{r1} - t_{e1}$.

Referring to fig. 6, fig. 6 is a schematic flowchart of an embodiment of the method for setting the threshold amplitude used in step S132 in fig. 5. The threshold amplitude may be set as follows:
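The crossing search of steps S132 to S134 can be sketched in Python as follows. Sampled audio rarely hits the threshold exactly, so this sketch uses a discrete interpretation. The first moment is the first sample at or below the threshold after a louder sample. The second moment is the last quiet sample before the signal rises above the threshold again. That interpretation, the function name `response_time` and the 0.8 s default (taken from the example above) are assumptions for illustration.

```python
def response_time(amps, rate, threshold_amp, threshold_time=0.8):
    """Return the response time in seconds, or None if no qualifying gap is found.

    amps: sample amplitudes of the test audio waveform curve
    rate: sample rate in Hz
    threshold_amp: threshold amplitude (see steps S1321 to S1323)
    threshold_time: minimum gap in seconds (0.8 s is the example value in the text)
    """
    loud = [abs(a) > threshold_amp for a in amps]
    n = len(amps)
    i = 1
    while i < n:
        if loud[i - 1] and not loud[i]:
            first = i  # first moment: signal falls to the threshold
            j = i
            while j < n and not loud[j]:
                j += 1
            if j == n:
                return None  # signal never rises again: no reply voice found
            second = j - 1  # second moment: last quiet sample before the rise
            gap = (second - first) / rate
            if gap >= threshold_time:
                return gap  # end of wake-up voice to start of reply voice
            i = j  # short gap: dip or pause inside the wake-up voice, keep scanning
        else:
            i += 1
    return None
```

For example, at 10 Hz a 0.1 s pause inside the wake-up voice is skipped, and the following 0.9 s silence before the reply is returned as the response time.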

S1321: collecting environmental sounds to generate an environmental audio waveform curve;

Specifically, a sound pickup can collect the ambient sound in a given environment to generate an environmental audio waveform curve. This curve likewise includes a waveform, a time axis and an amplitude axis.

S1322: acquiring the amplitude of the environmental audio waveform curve, and calculating the average of the absolute values of the amplitudes as a background noise value.

In practice the ambient sound may fluctuate, so the amplitude of the environmental audio waveform curve is not a constant value. The average of the absolute values of the amplitudes is therefore used as the background noise value.

S1323: adding a preset judgment amplitude to the background noise value, and taking the sum as the threshold amplitude.

When a sound starts or fades out, its amplitude does not jump abruptly from or to the background noise level. A preset judgment amplitude is therefore added to the background noise value to mark the beginning or end of a sound. For example, if the measured background noise value (amplitude) of the environment is 80 dB, the preset judgment amplitude may be set to 20 dB. A point on the test audio waveform curve whose amplitude reaches 100 dB can then be used as a decision point for the start or end of speech. In other embodiments, the preset judgment amplitude may take any value and is not specifically limited here.
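Steps S1321 to S1323 amount to a noise-floor estimate plus a margin. Below is a minimal Python sketch. It assumes the ambient recording has already been parsed into raw sample amplitudes; it works directly on those values and does not convert to the dB figures used in the example above. The function name is hypothetical.

```python
def threshold_amplitude(ambient_amps, judgment_amp):
    """Noise floor (mean absolute ambient amplitude) plus a preset judgment margin."""
    if not ambient_amps:
        raise ValueError("need at least one ambient sample")
    # S1322: average of absolute amplitudes, robust to the sign of each sample.
    noise_floor = sum(abs(a) for a in ambient_amps) / len(ambient_amps)
    # S1323: the threshold is the noise floor raised by the judgment margin.
    return noise_floor + judgment_amp
```

For ambient samples `[10, -30, 20, -20]` the noise floor is 20, so a margin of 50 gives a threshold of 70.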

Alternatively, in embodiments of the present application, a point on the test audio waveform curve can be classified by its distance from the ambient background noise value. If the difference between its amplitude and the background noise value is within the preset judgment amplitude, the point is taken as a decision point for the beginning or end of speech.

Referring to fig. 7, fig. 7 is a schematic flowchart illustrating another embodiment of step S13 in fig. 1, wherein step S13 may specifically include:

S13a: selecting, from the waveform of the test audio waveform curve, a first waveform region and a second waveform region corresponding to the wake-up voice and the reply voice, respectively;

The test audio waveform curve contains a waveform segment corresponding to the wake-up voice and a waveform segment corresponding to the reply voice. Accordingly, its waveform includes a waveform region for each voice, namely the first waveform region a1 and the second waveform region a2 in fig. 4. The waveform may not contain both the first waveform region a1 and the second waveform region a2. In that case, voice collection, the sending of the wake-up voice or the sending of the reply voice has failed. The test is regarded as failed and its result is not counted.

Referring to fig. 8, fig. 8 is a schematic flowchart illustrating an embodiment of step S13a in fig. 7, in which step S13a may specifically include:

S131a: acquiring a first preset audio waveform curve of the wake-up voice and a second preset audio waveform curve of the reply voice;

Specifically, the wake-up voice and the reply voice may each be collected in advance in the same environment. The corresponding first preset audio waveform curve and second preset audio waveform curve are then generated and stored.

S131b: selecting, from the waveform of the test audio waveform curve, two waveform regions that match the first preset audio waveform curve and the second preset audio waveform curve, respectively, as the first waveform region and the second waveform region.

Specifically, the first preset audio waveform curve can be compared with the test audio waveform curve for similarity, for example amplitude similarity or similarity of amplitude variation. A waveform region whose similarity to the first preset audio waveform curve is greater than or equal to a preset similarity is selected as the first waveform region a1. Likewise, a waveform region whose similarity to the second preset audio waveform curve is greater than or equal to the preset similarity is selected as the second waveform region a2.
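The application leaves the similarity measure open. The sketch below explicitly assumes one possible choice. It compares absolute-amplitude sequences with the Pearson correlation coefficient over a sliding window. It accepts the best-scoring window only if that window reaches a preset minimum similarity. `find_region` and its parameters are hypothetical names.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def find_region(test_amps, preset_amps, min_similarity, start=0):
    """Return (first_index, last_index) of the best-matching window at or after
    `start`, or None if no window reaches `min_similarity`."""
    envelope = [abs(a) for a in test_amps]  # compare magnitudes, not signed samples
    template = [abs(a) for a in preset_amps]
    m = len(template)
    best_off, best_score = None, -math.inf
    for off in range(start, len(envelope) - m + 1):
        score = pearson(envelope[off:off + m], template)
        if score > best_score:
            best_off, best_score = off, score
    if best_off is None or best_score < min_similarity:
        return None  # region missing: the test run counts as failed
    return best_off, best_off + m - 1
```

The search for a2 starts just after the end of a1, because the reply voice follows the wake-up voice. Real recordings would likely need a smoothed envelope or a more robust measure; this sketch only shows the matching structure.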

S13b: determining, along the time axis of the test audio waveform curve, the time corresponding to the right boundary of the first waveform region as the end time and the time corresponding to the left boundary of the second waveform region as the start time;

Specifically, step S13a determines the first waveform region a1 corresponding to the wake-up voice and the second waveform region a2 corresponding to the reply voice. The wake-up voice occurs earlier and the reply voice later. As shown in fig. 4, along the time axis of the test audio waveform curve, the right boundary of a1 therefore marks the end time of the wake-up voice, and the left boundary of a2 marks the start time of the reply voice.

S13c: calculating the difference between the start time and the end time to obtain the response time of the called terminal.

Specifically, as shown in fig. 4, suppose the right boundary of the first waveform region a1 corresponds to the moment $t_{e1}$; then $t_{e1}$ is the end time of the wake-up voice. Suppose the left boundary of the second waveform region a2 corresponds to the moment $t_{r1}$; then $t_{r1}$ is the start time of the reply voice. The response time of the called terminal is $T_1 = t_{r1} - t_{e1}$.
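Once the two regions are known as sample-index ranges, steps S13b and S13c reduce to a subtraction. Below is a minimal sketch. It assumes the regions are `(first_index, last_index)` tuples, for example from a template search, and that the sample rate is known. The function name is hypothetical.

```python
def response_time_from_regions(a1, a2, rate):
    """Response time T1 = t_r1 - t_e1 from two (first_index, last_index) regions."""
    t_e1 = a1[1] / rate  # right boundary of a1: end time of the wake-up voice
    t_r1 = a2[0] / rate  # left boundary of a2: start time of the reply voice
    if t_r1 < t_e1:
        raise ValueError("the reply region must start after the wake-up region ends")
    return t_r1 - t_e1
```

For regions `(2, 4)` and `(8, 11)` sampled at 10 Hz, $t_{e1} = 0.4$ s and $t_{r1} = 0.8$ s, giving a response time of 0.4 s.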

In this embodiment, an audio test file is received that comprises the wake-up voice uttered by the calling terminal and the reply voice uttered by the called terminal in response to the wake-up voice. The audio test file is analyzed to generate a test audio waveform curve. The difference between the ending time of the wake-up voice and the starting time of the reply voice is then calculated from that curve to obtain the response time of the called terminal.

Referring to fig. 9, fig. 9 is a flowchart illustrating a voice testing method according to a second embodiment of the present application, wherein steps S21 to S23 in this embodiment are the same as steps S11 to S13 in the first embodiment, and are not repeated herein, and the voice testing method further includes:

S24: acquiring a plurality of response times obtained through multiple calculations, and calculating the average of the plurality of response times.

Specifically, repeating steps S11 to S13 of the above embodiment yields a plurality of response times: a first response time $T_1$, a second response time $T_2$, and so on up to an nth response time $T_n$. The average response time $T$ is then

$$T = \frac{T_1 + T_2 + \cdots + T_n}{n} = \frac{1}{n}\sum_{i=1}^{n} T_i$$
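The averaging of step S24 can be combined with the failure rule of step S13a, under which a run with a missing waveform region is not counted. It might then look like the following Python sketch. Representing a failed run as `None` is an assumption, and the function name is hypothetical.

```python
def average_response_time(times):
    """Mean of the successful runs; None marks a failed run that is not counted."""
    valid = [t for t in times if t is not None]
    if not valid:
        raise ValueError("no successful test runs to average")
    return sum(valid) / len(valid)
```

For runs `[0.9, None, 1.1, 1.0]` the failed run is dropped and the average of the remaining three is 1.0 s.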

In this embodiment, the accuracy of the test result is improved compared with a single calculation result by further obtaining a plurality of response times obtained by multiple calculations and calculating an average value of the plurality of response times.

Referring to fig. 10, fig. 10 is a schematic block diagram of an embodiment of a computer device provided in the present application, the computer device in the present embodiment includes a processor 31 and a memory 32, the processor 31 is coupled to the memory 32, the memory 32 stores computer instructions, and the processor 31 executes the computer instructions to implement the voice testing method in any of the above embodiments when operating.

The processor 31 may also be referred to as a CPU (Central Processing Unit). The processor 31 may be an integrated circuit chip with signal processing capability. The processor 31 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. A general-purpose processor may be a microprocessor or any conventional processor, but is not limited thereto.

Referring to fig. 11, fig. 11 is a schematic block diagram of an embodiment of a computer-readable storage medium provided in the present application, where the computer-readable storage medium stores a computer program 41, and the computer program 41 can be executed by a processor to implement the voice testing method in any of the above embodiments.

Optionally, the readable storage medium may be any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk. It may also be a terminal device such as a computer, a server, a mobile phone or a tablet.

The above description is only an example of the present application and is not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings, or which are directly or indirectly applied to other related technical fields, are intended to be included within the scope of the present application.
