Voice control method, voice control device and computer-executable nonvolatile storage medium

Document number: 991510    Publication date: 2020-10-20

Reader's note: this technology, "Voice control method, voice control device and computer-executable non-volatile storage medium", was designed and created by Li Yingjie on 2019-01-22. Its main content is as follows: A voice control method includes: acquiring voice input information; recognizing the voice input information to obtain a voice command; determining, based on the voice command and by using a test framework calling unit, a control corresponding to the voice command, wherein the test framework calling unit is not in the application program where the control is located; and executing the function corresponding to the control. The method controls third-party APPs without modifying the system source code and without adaptation for specific APPs, and is therefore more flexible, more convenient, and more universally applicable. A voice control apparatus and a computer-executable non-volatile storage medium are also provided.

1. A method of voice control, comprising:

acquiring voice input information;

recognizing the voice input information to obtain a voice command;

determining, based on the voice command and by using a test framework calling unit, a control corresponding to the voice command, wherein the test framework calling unit is not in the application program where the control is located;

and executing the function corresponding to the control.

2. The method of claim 1, wherein the determining, based on the voice command and by using a test framework calling unit, a control corresponding to the voice command comprises:

acquiring, by using the test framework calling unit, a control of an application program in a foreground running state on the current user interface;

acquiring a character string on the control or a description character string of the control;

and matching the voice command with the character string on the control or the description character string of the control to determine the control corresponding to the voice command.

3. The method of claim 2, wherein the voice command further comprises a command parameter,

wherein the determining, based on the voice command and by using a test framework calling unit, the control corresponding to the voice command further comprises:

acquiring, by using the test framework calling unit, the position of the control in the application program in a foreground running state on the current user interface;

determining, by using the test framework calling unit, whether an edit box is located in at least one position adjacent to the position of the control, and inputting the command parameter into one of the edit boxes when one or more edit boxes are found;

wherein executing the function corresponding to the control comprises:

executing the function corresponding to the control based on the command parameter.

4. The method of claim 3, wherein determining whether there is an edit box in at least one position adjacent to the position of the control comprises:

searching for all edit boxes on the current user interface;

identifying the boundary of each edit box;

and determining, based on the boundaries, the position of an edit box in at least one position adjacent to the position of the control.

5. The method of any of claims 1-4, wherein recognizing the voice input information to obtain a voice command comprises:

converting the voice input information into a character string;

matching the converted character string with a preset voice command;

and determining a voice command corresponding to the voice input information based on the matching result.

6. The method of claim 5, wherein matching the converted character string with a preset voice command comprises:

establishing a set of correspondences between character strings and preset voice commands;

determining, based on template matching or deep learning, the voice command in the set that the converted character string matches;

and matching the character string with the determined voice command.

7. The method of claim 1, wherein the determining, based on the voice command and by using a test framework calling unit, a control corresponding to the voice command comprises:

acquiring, based on the test framework called by the test framework calling unit, an image of an application program in a foreground running state on the current user interface;

recognizing the image to determine a control icon in the image;

and matching the voice command with the control icon to determine the control corresponding to the voice command.

8. The method of any of claims 2-6, wherein the determining, based on the voice command and by using a test framework calling unit, the control corresponding to the voice command further comprises:

when matching of the voice command with the character string on the control or the description character string of the control is unsuccessful, acquiring, based on the test framework called by the test framework calling unit, an image of the application program in a foreground running state on the current user interface;

recognizing the image to determine a control icon in the image;

and matching the voice command with the control icon to determine the control corresponding to the voice command.

9. The method of claim 7, wherein the determining, based on the voice command and by using a test framework calling unit, the control corresponding to the voice command further comprises:

when matching of the voice command with the control icon is unsuccessful, acquiring, by using the test framework calling unit, a control of an application program in a foreground running state on the current user interface;

acquiring a character string on the control or a description character string of the control;

and matching the voice command with the character string on the control or the description character string of the control to determine the control corresponding to the voice command.

10. The method of any of claims 7-9, wherein recognizing the image to determine the control icon in the image comprises:

performing contour extraction on the image to obtain at least one control area;

and performing image recognition on the at least one control area to determine a control icon in the control area.

11. The method of claim 10, wherein matching the voice command with the control icon to determine the control corresponding to the voice command comprises:

converting the control icon into a character string corresponding to the function of the control, and matching the corresponding character string with the voice command;

alternatively,

converting the voice command into an icon corresponding to the voice command, and matching the corresponding icon with the control icon.

12. The method of any of claims 1-11, wherein, prior to the step of acquiring voice input information, the method further comprises:

acquiring an application program starting command;

and starting the application program where the control is located based on the application program starting command.

13. A voice control apparatus comprising:

the voice recognition and semantic understanding unit is configured to acquire voice input information and recognize the voice input information to obtain a voice command;

the test framework calling unit is configured to determine, based on the voice command, a control corresponding to the voice command, wherein the test framework calling unit is not in an application program where the control is located;

and the execution unit is configured to execute the function corresponding to the control.

14. The apparatus of claim 13, further comprising:

the image recognition unit is configured to recognize an image of the application program where the control is located on the current user interface, so as to determine a control icon in the image;

the test framework calling unit is further configured to match the voice command with the control icon to determine the control corresponding to the voice command.

15. The apparatus of claim 14, wherein the voice recognition and semantic understanding unit is in a first module, the test framework calling unit and the image recognition unit are in a second module, and the first module and the second module communicate with each other by way of inter-process communication.

16. A voice control apparatus comprising a memory and a processor, the memory storing instructions which, when executed by the processor, cause the processor to perform the method of any of claims 1-12.

17. A computer-executable non-volatile storage medium storing computer program instructions which, when executed by a processor, perform the method of any of claims 1-12.

Technical Field

Embodiments of the present disclosure relate to a voice control method, a voice control apparatus corresponding to the voice control method, and a computer-executable non-volatile storage medium.

Background

With the rapid popularization of smartphones, the mobile Internet is also developing quickly. As intelligent operating systems and the mobile Internet develop together, the Android operating system has become the most widely used operating system on all kinds of smart devices because it is open source and can be deeply customized. Devices running Android are the most diverse, and voice interaction is becoming increasingly common on these devices.

However, although a large number of applications (APPs) can be freely installed and used in the Android ecosystem, most APPs rely on the user's touch input to the phone. If users are to control them through natural voice interaction, the APPs must be redeveloped. Besides the heavy workload, this may require cooperation with third-party APP companies, which costs both time and money.

In addition, control of a third-party APP can be achieved without modifying the APP's own source code by modifying the operating system source code to adapt instructions to the controls of designated third-party APPs. However, this approach still requires adaptation work, so the use of third-party APPs remains limited to some extent, and modifying operating system source code increases development difficulty.

Disclosure of Invention

An object of the embodiments of the present disclosure is to provide a voice control method, a voice control apparatus, and a non-volatile storage medium, so as to solve the above technical problems.

According to at least one embodiment of the present disclosure, there is provided a voice control method including: acquiring voice input information; recognizing the voice input information to obtain a voice command; determining, based on the voice command and by using a test framework calling unit, a control corresponding to the voice command, wherein the test framework calling unit is not in the application program where the control is located; and executing the function corresponding to the control.

For example, the determining, based on the voice command and by using the test framework calling unit, a control corresponding to the voice command includes: acquiring, by using the test framework calling unit, a control of an application program in a foreground running state on the current user interface; acquiring a character string on the control or a description character string of the control; and matching the voice command with the character string on the control or the description character string of the control to determine the control corresponding to the voice command.

For example, the voice command further includes a command parameter, and the determining, based on the voice command and by using the test framework calling unit, a control corresponding to the voice command further includes: acquiring, by using the test framework calling unit, the position of the control in the application program in a foreground running state on the current user interface; and determining, by using the test framework calling unit, whether an edit box is located in at least one position adjacent to the position of the control, and inputting the command parameter into one of the edit boxes when one or more edit boxes are found. Executing the function corresponding to the control then includes executing the function corresponding to the control based on the command parameter.

For example, determining whether there is an edit box in at least one position adjacent to the position of the control includes: searching for all edit boxes on the current user interface; identifying the boundary of each edit box; and determining, based on the boundaries, the position of an edit box in at least one position adjacent to the position of the control.

For example, recognizing the voice input information to obtain the voice command includes: converting the voice input information into a character string; matching the converted character string with a preset voice command; and determining the voice command corresponding to the voice input information based on the matching result.

For example, matching the converted character string with a preset voice command includes: establishing a set of correspondences between character strings and preset voice commands; determining, based on template matching or deep learning, the voice command in the set that the converted character string matches; and matching the character string with the determined voice command.

For example, the determining, based on the voice command and by using the test framework calling unit, a control corresponding to the voice command includes: acquiring, based on the test framework called by the test framework calling unit, an image of an application program in a foreground running state on the current user interface; recognizing the image to determine a control icon in the image; and matching the voice command with the control icon to determine the control corresponding to the voice command.

For example, the determining, based on the voice command and by using the test framework calling unit, a control corresponding to the voice command further includes: when matching of the voice command with the character string on the control or the description character string of the control is unsuccessful, acquiring, based on the test framework called by the test framework calling unit, an image of the application program in a foreground running state on the current user interface; recognizing the image to determine a control icon in the image; and matching the voice command with the control icon to determine the control corresponding to the voice command.

For example, the determining, based on the voice command and by using the test framework calling unit, a control corresponding to the voice command further includes: when matching of the voice command with the control icon is unsuccessful, acquiring, by using the test framework calling unit, a control of an application program in a foreground running state on the current user interface; acquiring a character string on the control or a description character string of the control; and matching the voice command with the character string on the control or the description character string of the control to determine the control corresponding to the voice command.

For example, recognizing the image to determine the control icon in the image includes: performing contour extraction on the image to obtain at least one control area; and performing image recognition on the at least one control area to determine a control icon in the control area.

For example, matching the voice command with the control icon to determine the control corresponding to the voice command includes: converting the control icon into a character string corresponding to the function of the control and matching that character string with the voice command; or converting the voice command into an icon corresponding to the voice command and matching that icon with the control icon.

For example, before the step of obtaining the voice input information, the method further comprises: acquiring an application program starting command; and starting the application program where the control is located based on the application program starting command.

According to at least one embodiment of the present disclosure, there is provided a voice control apparatus including: a voice recognition and semantic understanding unit configured to acquire voice input information and recognize the voice input information to obtain a voice command; a test framework calling unit configured to determine, based on the voice command, a control corresponding to the voice command, wherein the test framework calling unit is not in the application program where the control is located; and an execution unit configured to execute the function corresponding to the control.

For example, the apparatus further comprises an image recognition unit configured to recognize an image of the application program where the control is located on the current user interface, so as to determine a control icon in the image; the test framework calling unit is further configured to match the voice command with the control icon to determine the control corresponding to the voice command.

For example, the voice recognition and semantic understanding unit is in a first module, the test framework calling unit and the image recognition unit are in a second module, and the first module and the second module communicate with each other by way of inter-process communication.

According to at least one embodiment of the present disclosure, there is provided a voice control apparatus including a memory storing instructions and a processor that performs the foregoing method when executing the instructions.

According to at least one embodiment of the present disclosure, there is provided a computer-executable non-volatile storage medium storing computer program instructions which, when executed by a processor, perform the aforementioned method.

The method and apparatus of the embodiments control third-party APPs without modifying system source code and without adaptation for specific APPs, and are therefore more flexible, more convenient, and more universally applicable.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings used in the description of the embodiments will be briefly introduced below. The drawings in the following description merely illustrate exemplary embodiments of the disclosure.

FIG. 1 shows a flow diagram of a voice control method according to an embodiment of the present disclosure;

FIG. 2 is a schematic diagram of a voice control apparatus according to an embodiment of the present disclosure;

FIG. 3 illustrates a block diagram of one example of a voice control apparatus according to an embodiment of the present disclosure;

FIG. 4 illustrates an architecture diagram of a voice control apparatus according to an embodiment of the present disclosure;

FIG. 5 shows a schematic structural diagram of another voice control apparatus according to an embodiment of the present disclosure.

Detailed Description

Hereinafter, preferred embodiments of the present disclosure will be described in detail with reference to the accompanying drawings. Note that in the present specification and the drawings, steps and elements having substantially the same structure are denoted by the same reference numerals, and repeated explanation of the steps and elements will be omitted.

In the embodiments of the present disclosure, the voice control method, the voice control apparatus, and the non-volatile storage medium may be applied to an electronic device such as a mobile terminal, a personal computer, or a laptop computer. The electronic device can execute the voice control method of the embodiments of the present disclosure. For example, the voice control method is made into an application program that is installed on the electronic device, or a chip or processor including the voice control apparatus or the storage medium of the embodiments is installed in the device. In this way, once the voice control method, apparatus, or storage medium of the embodiments is running, voice control can be performed on any other application installed on the electronic device (referred to as a third-party application) to achieve voice interaction, even if the third-party application itself has no voice control function. When the third-party application does have a voice control function, the user can still choose not to use it and to use the voice control function of the embodiments of the present disclosure instead, which gives the user more choices. Thus the voice control method or apparatus of the embodiments of the present disclosure can achieve voice control and management of third-party applications without modifying either the third-party application code or the operating system code, which enhances the functions of the electronic device and is convenient for users.

FIG. 1 shows a flow diagram of a voice control method 100 according to an embodiment of the present disclosure. The voice control method of the embodiment will be described below with reference to FIG. 1. Referring to FIG. 1, the voice control method 100 may include steps S101-S104.

In step S101, voice input information is acquired. According to an example of the present disclosure, a user's voice input may be received through a microphone of the electronic device as voice input information; this information is used to control a control in a third-party application, so that the user can interact with the third-party application by voice. For example, a voice input such as "search" or "slide up" is acquired through the microphone as voice input information.

In one example, before acquiring the voice input information that the user uses to control a control in a third-party application, the third-party application may first be launched by voice. For example, an application-starting voice command input by the user is acquired through the voice input interface of the electronic device, and the third-party application is started based on that command by the program implementing the voice control method; for instance, the third-party application WeChat is started by the voice input "open WeChat". Of course, the third-party application can also be opened by touching or clicking its icon.

In step S102, the voice input information is recognized to obtain a voice command. According to an example of the present disclosure, the voice input information may be converted into a character string, the converted character string may be matched with preset voice commands, and the voice command corresponding to the voice input information may be determined based on the matching result. For example, the voice command may be a control command capable of controlling the electronic device. A set of correspondences between character strings and voice commands may be defined in advance. For example, the voice command "slide up" corresponds to a set of character strings such as { "slide up", "pull up" }, and the operation corresponding to the command is a pull-up operation on a slidable control; if the user's voice input information contains at least one of those strings, the voice command "slide up" is matched. Likewise, the voice command "search" corresponds to a set of character strings such as { "search", "find" }, and the corresponding operation is clicking a search control; if the user's voice input information contains "search" or "find", the voice command "search" is matched.
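
As an illustration, such a correspondence set can be held as a map from each preset voice command to its synonym character strings. The following Kotlin sketch is only a minimal illustration; the command names and synonym lists are examples, not an exhaustive set from the disclosure:

    // Minimal sketch: match a recognized utterance against preset voice commands
    // through per-command synonym sets. Command names and synonyms are illustrative.
    val commandSynonyms: Map<String, Set<String>> = mapOf(
        "slide up" to setOf("slide up", "pull up"),
        "search" to setOf("search", "find")
    )

    /** Returns the first preset command whose synonyms appear in the utterance, or null. */
    fun matchCommand(utterance: String): String? =
        commandSynonyms.entries
            .firstOrNull { (_, synonyms) -> synonyms.any { utterance.contains(it) } }
            ?.key

For instance, matchCommand("I want to find Liu De Hua") would return "search", because the utterance contains the synonym "find".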

In one example, when matching the converted character string to a voice command, the voice command in the set that the converted character string matches may be determined based on template matching or deep learning, and the character string is then mapped to the determined voice command. The supported voice commands and their corresponding character strings can be extended without limit; elements can be added to the character-string set corresponding to a voice command as needed.

According to an embodiment of the present disclosure, a voice command may consist of a command alone, or of a command together with command parameters. When the converted character string is matched with the preset voice commands, command parameters included in the input may be recognized at the same time as the command itself is matched. The matching result therefore falls into at least three categories. In the first category only a voice command is matched; for example, "slide up" parses to the slide-up command. The second category includes a voice command together with its parameters; for example, the character string "I want to search for Liu De Hua" parses to the search command with the parameter "Liu De Hua", and the character string "play forgetting water" parses to the play command with the parameter "forgetting water". In the third category, for a character string that matches no preset voice command, such as "forgetting water" on its own, the character string itself is taken as the voice command.
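
A minimal sketch of these three matching categories, reusing the commandSynonyms map from the previous sketch, might look as follows. The naive rule of treating the text after the matched synonym as the parameter is an assumption for illustration; the disclosure itself points to template matching or deep learning for robust parsing:

    // Hypothetical sketch of the three matching categories described above.
    sealed class MatchResult {
        data class CommandOnly(val command: String) : MatchResult()
        data class CommandWithParam(val command: String, val param: String) : MatchResult()
        data class Unmatched(val literal: String) : MatchResult() // string itself becomes the command
    }

    fun parse(utterance: String): MatchResult {
        for ((command, synonyms) in commandSynonyms) {
            val hit = synonyms.firstOrNull { utterance.contains(it) } ?: continue
            // Assumption: the text following the matched synonym is the command parameter.
            val param = utterance.substringAfter(hit).trim(' ', ',', '.')
            return if (param.isEmpty()) MatchResult.CommandOnly(command)
                   else MatchResult.CommandWithParam(command, param)
        }
        return MatchResult.Unmatched(utterance)
    }

For example, parse("play forgetting water") would yield CommandWithParam("play", "forgetting water"), provided "play" has been added to the synonym map.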

In step S103, a control corresponding to the voice command is determined based on the voice command by using a test framework calling unit, where the test framework calling unit is not in the application program where the control is located. That is, the application program where the control is located is different from the program where the test framework calling unit is located.

The test framework calling unit is a program that calls the functions of a test framework. A test framework is a software library for automated testing and is a capability of operating systems such as Android; one example is the user-interface automated test framework UiAutomator. The UiAutomator test framework can acquire the controls on the current user interface and their attribute information: for example, it can obtain the control hierarchy and attribute information of the current window and locate a target control, and for a click event it can calculate the coordinates of the control's center point. In addition, UiAutomator can inject user events (such as click and input operations) through a hidden interface, thereby achieving cross-process automation. Besides UiAutomator there are other test frameworks, such as Appium, and the present disclosure is not limited in this respect.
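
For instance, acquiring a control of the foreground window, computing its center point, and injecting a click with UiAutomator can be sketched as below. This is a sketch under the assumption that the code runs in an instrumentation process (androidx.test.uiautomator), outside the target application, as the test framework calling unit does; the control text "Search" is illustrative:

    import androidx.test.platform.app.InstrumentationRegistry
    import androidx.test.uiautomator.By
    import androidx.test.uiautomator.UiDevice

    // Sketch: locate a control in the foreground window and inject a click
    // from outside the application where the control lives.
    val device = UiDevice.getInstance(InstrumentationRegistry.getInstrumentation())

    // Find a control in the current window by its visible text.
    val control = device.findObject(By.text("Search"))

    if (control != null) {
        // The center-point coordinates come from the control's visible bounds;
        // UiObject2.click() would perform the same injection directly.
        val bounds = control.visibleBounds
        device.click(bounds.centerX(), bounds.centerY())
    }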

In the embodiments of the present disclosure, the control to be voice-controlled is in a third-party application on the electronic device, while the test framework calling unit that calls the test framework is in a program other than that third-party application; hence the application where the control is located and the program where the test framework calling unit is located are not the same application.

According to an example of the present disclosure, in determining the control corresponding to the voice command with the test framework calling unit, the control objects on the current user interface may first be acquired by using the test framework calling unit; for example, all control objects in the third-party application window are obtained through the UiAutomator test framework. A character string on each control object, or a description character string of the control, is then acquired; for example, it may be recognized by optical character recognition (OCR), yielding text strings such as "search", "copy", and "exit". The voice command is then matched against the character strings on the control objects, or the description character strings of the controls, to determine the control corresponding to the voice command; for example, the voice command "search" is matched against the strings "search", "copy", and "exit" to determine the matching "search" control.
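
Under the same UiAutomator assumptions, and reusing the commandSynonyms map sketched earlier, this matching step could look like the following sketch, which matches on the controls' own text attributes rather than OCR (one of the two options described here):

    // Sketch: enumerate the controls of the foreground application and pick the one
    // whose text or content description matches a synonym of the voice command.
    val foregroundPkg = device.currentPackageName
    val controls = device.findObjects(By.pkg(foregroundPkg))

    val synonyms = commandSynonyms["search"] ?: emptySet()
    val target = controls.firstOrNull { obj ->
        val label = obj.text ?: obj.contentDescription
        label != null && synonyms.any { label.contains(it, ignoreCase = true) }
    }
    target?.click()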

According to an example of the present disclosure, when the voice command further includes a command parameter, the parameter needs additional handling while the control corresponding to the voice command is being determined by the test framework calling unit. For example, after the control has been determined by calling the test framework through the test framework calling unit, the position of the control on the current user interface or in the third-party application window is further obtained by the test framework calling unit. The test framework calling unit then determines whether there is an edit box in at least one position adjacent to the position of the control; for example, it first checks whether there is an edit box in the region above the control, and if not, checks the region to the left of the control. When an edit box is found, the command parameter is input into the edit box, and the control is then operated based on the command parameter to execute the function corresponding to the control. For example, after "forgetting water" is entered in the edit box, the "search" control is clicked to search for "forgetting water".

In one example, when determining whether there is an edit box in at least one position adjacent to the position of the control, all edit boxes in the third-party application window can be found by using the findObjects function of UiAutomator. Then, for each edit box, its boundary is obtained, which gives the edit box's position coordinates, and the positional relationship between the edit box and the control is determined from those coordinates.
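
Continuing the previous sketches, the adjacency check might be written as follows; the search order (above first, then to the left) follows the example above, while the 10-pixel tolerance is an illustrative assumption:

    import android.graphics.Rect

    // Sketch: find an edit box adjacent to the matched control (above it, else to
    // its left) and type the command parameter into it before operating the control.
    val editBoxes = device.findObjects(By.clazz("android.widget.EditText"))
    val ctrl = target!! // assumes the control was found in the previous sketch
    val controlBounds: Rect = ctrl.visibleBounds

    fun isAbove(box: Rect) =
        box.bottom <= controlBounds.top + 10 &&
        box.right > controlBounds.left && box.left < controlBounds.right

    fun isLeftOf(box: Rect) =
        box.right <= controlBounds.left + 10 &&
        box.bottom > controlBounds.top && box.top < controlBounds.bottom

    val adjacent = editBoxes.firstOrNull { isAbove(it.visibleBounds) }
        ?: editBoxes.firstOrNull { isLeftOf(it.visibleBounds) }

    adjacent?.setText("forgetting water") // the command parameter
    ctrl.click()                          // then trigger the control, e.g. "search"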

Because some controls carry an icon rather than characters (for example, a "search" control often bears a magnifying-glass icon instead of the word "search"), according to an example of the present disclosure the control corresponding to the voice command can also be found by image recognition. For example, when the control cannot be found by character-string matching, or a control is found but no edit box can be found near it, the control can be sought by image recognition. Of course, those skilled in the art will appreciate that string matching and image-recognition matching can be used either alternatively or simultaneously to determine the control corresponding to the voice command; neither has priority over the other.

According to an example of the present disclosure, when controls are matched by image recognition, an image of the current user interface may be obtained based on the test framework called by the test framework calling unit; for example, an image of the application in the foreground running state on the current user interface is acquired. The image is then recognized to locate one or more control icons of the foreground third-party application. For example, contour extraction is performed on the image to obtain one or more control areas, and image recognition is then performed on those areas to determine a control description character string or a control icon in each area. Obtaining the control areas first narrows the recognition range for control icons, reduces the amount of computation, and improves recognition efficiency.
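
Contour extraction to obtain candidate control areas could be sketched with OpenCV's Java bindings as below; the screenshot path, the Canny thresholds, and the size filter for plausible controls are all illustrative assumptions, and the OpenCV native library must already be loaded:

    import org.opencv.core.Mat
    import org.opencv.core.MatOfPoint
    import org.opencv.imgcodecs.Imgcodecs
    import org.opencv.imgproc.Imgproc

    // Sketch: extract contours from a screenshot of the foreground application and
    // keep bounding boxes of plausible control size as candidate control areas.
    val screen = Imgcodecs.imread("screen.png") // hypothetical screenshot file
    val gray = Mat()
    Imgproc.cvtColor(screen, gray, Imgproc.COLOR_BGR2GRAY)
    val edges = Mat()
    Imgproc.Canny(gray, edges, 50.0, 150.0)     // edge thresholds are illustrative

    val contours = ArrayList<MatOfPoint>()
    Imgproc.findContours(edges, contours, Mat(), Imgproc.RETR_EXTERNAL, Imgproc.CHAIN_APPROX_SIMPLE)

    // Restricting later icon recognition to these areas reduces computation.
    val controlAreas = contours
        .map { Imgproc.boundingRect(it) }
        .filter { it.width in 40..400 && it.height in 40..400 } // assumed control-size range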

If a control character string is recognized, the character string corresponding to the voice command is matched with the control character string to determine the control corresponding to the voice command. If a control icon is obtained, the voice command is matched with the control icon. For example, the control icon is converted into a character string corresponding to the control's function, and the character string corresponding to the voice command is matched with that string to determine the control. Alternatively, the voice command may be converted into a corresponding icon, and that icon matched with the control icon to determine the control. For example, the character strings corresponding to the voice command "search" include "search" and "find", and the command also corresponds to several icons, such as a magnifying-glass icon. The magnifying-glass icon is matched against the control icons, and when a control icon is determined to be a magnifying-glass icon, that control is determined to be the search control.

In one example, the matching of the voice-command icon and the control icon may use image recognition methods such as image feature matching or deep learning. For image feature matching, image features, such as contour features, are extracted from the voice-command icon and the control icon respectively and matched against each other; when the matching rate is greater than a matching threshold, for example 80%, the two are considered the same icon.
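
As one concrete way to implement the feature matching described above, normalized template matching in OpenCV compares a cropped control area against the icon bound to the command. This is only a sketch: the 0.8 threshold mirrors the 80% figure in the text, and the file names are hypothetical:

    import org.opencv.core.Core
    import org.opencv.core.Mat
    import org.opencv.imgcodecs.Imgcodecs
    import org.opencv.imgproc.Imgproc

    // Sketch: decide whether a control icon matches the icon associated with the
    // voice command (e.g. a magnifying glass for "search") by template matching.
    val controlIcon = Imgcodecs.imread("control_area.png")       // cropped control area
    val commandIcon = Imgcodecs.imread("magnifier_template.png") // icon bound to the command

    val result = Mat()
    Imgproc.matchTemplate(controlIcon, commandIcon, result, Imgproc.TM_CCOEFF_NORMED)
    val score = Core.minMaxLoc(result).maxVal

    val isSameIcon = score > 0.8 // threshold follows the 80% example above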

In one example, after a control area has been located, image recognition is performed on it only if no characters are recognized in it; if characters are recognized, image recognition of that area is skipped, which avoids unnecessary computation and improves recognition efficiency. When the voice command matches the icon or the character string of a control area, that area is determined to be the control. For example, when matching the voice command against the character string on the control or the control's description character string is unsuccessful, an image of the foreground application on the current user interface may be acquired based on the test framework called by the test framework calling unit; the image is then recognized to determine the control icons in it, and the voice command is matched against those icons to determine the corresponding control.

In another example, the control corresponding to the voice command may be sought first by image recognition, with text recognition applied to the control areas only when image recognition fails; this likewise avoids unnecessary computation and saves system resources. For example, when matching the voice command with the control icons is unsuccessful, the controls of the foreground application on the current user interface can be acquired by using the test framework calling unit; the character strings on the controls, or their description character strings, are then acquired and matched against the voice command to determine the corresponding control.

In step S104, the function corresponding to the control is executed. According to an example of the present disclosure, operations such as single-click, double-click, or drag may be performed on the control according to its attributes and the voice command. When the voice command further includes command parameters, the control is operated according to those parameters. For example, when the command parameter "forgetting water" has been entered in the edit box, the "search" control is clicked to search for "forgetting water".

With the voice control method described above, by calling the operating system's test framework, voice control of a third-party application can be achieved without modifying the third-party application or the operating system code, which extends the functions of the electronic device and is convenient for users.

Having described the voice control method according to the embodiments of the present disclosure, the corresponding voice control apparatus will now be described. For brevity of the description, it is only outlined below; for details, see all the examples described above.

FIG. 2 shows a schematic structural diagram of a voice control apparatus according to an embodiment of the present disclosure. Referring to FIG. 2, the voice control apparatus 200 includes a voice recognition and semantic understanding unit 201, a test framework calling unit 202, and an execution unit 203. The voice recognition and semantic understanding unit 201 is configured to acquire voice input information and recognize the voice input information to obtain a voice command. The test framework calling unit 202 is configured to determine, based on the voice command, the control corresponding to the voice command, where the test framework calling unit is not in the application program where the control is located; that is, the application where the control is located is different from the program where the test framework calling unit is located. The execution unit 203 is configured to execute the function corresponding to the control. In embodiments of the present disclosure, the voice recognition and semantic understanding unit 201, the test framework calling unit 202, and the execution unit 203 may be implemented in software, hardware, or firmware, for example as a computer program, a programmable logic circuit, a chip, or a chipset.

FIG. 3 shows a block diagram of one example of a voice control apparatus according to an embodiment of the present disclosure. Referring to FIG. 3, in order to recognize images on the user interface, the voice control apparatus 200 may further include an image recognition unit 204. The image recognition unit 204 is configured to recognize an image on the user interface, for example an image of the application in the foreground running state on the current user interface, so as to determine the control icons in the image. The test framework calling unit 202 is further configured to match the voice command with a control icon to determine the control corresponding to the voice command. In embodiments of the present disclosure, the voice recognition and semantic understanding unit 201, the test framework calling unit 202, the execution unit 203, and the image recognition unit 204 may be implemented in software, hardware, or firmware, for example as a computer program, a programmable logic circuit, a chip, or a chipset.

FIG. 4 shows an architecture diagram of a voice control apparatus according to an embodiment of the present disclosure. Referring to FIG. 4, the voice recognition and semantic understanding unit 201 is in a first module, the test framework calling unit 202 and the image recognition unit 204 are in a second module, and the first and second modules communicate with each other by inter-process communication. Furthermore, the execution unit 203 may be in a third module; for example, the execution unit 203 may call controls of the operating system itself to execute the function.

FIG. 5 shows a schematic structural diagram of another voice control apparatus according to an embodiment of the present disclosure. Referring to FIG. 5, the voice control apparatus 500 includes a memory 501 and a processor 502. The memory 501 stores computer program instructions, and the processor 502 executes those instructions to perform the voice control method of the foregoing embodiments.

According to an embodiment of the present disclosure, there is also provided a computer-executable non-volatile storage medium that stores computer program instructions which, when executed by a processor in a computer, perform the voice control method of the foregoing embodiments.

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the software modules may be disposed in any form of computer storage medium. To clearly illustrate this interchangeability of hardware and software, various illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.

Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions are possible in the present disclosure depending on design requirements and other factors, provided they come within the scope of the appended claims and their equivalents.
