Query method, query device, terminal equipment and storage medium

Document No.: 193325. Published: 2021-11-02.

Reading note: This patent, 查询方法、装置、终端设备及存储介质 (Query method, query device, terminal equipment and storage medium), was designed and created by 袁项南, 陈轶博, 李鑫, 祖华龙 and 刘志伟 on 2020-04-30. Abstract: The application discloses a query method, a query apparatus, a terminal device and a storage medium, relating to the field of intelligent search. The specific implementation scheme is as follows: acquire a voice query instruction; acquire an image of a preset area according to the voice query instruction; determine the content to be queried according to the image of the preset area; and determine a query result according to the content to be queried and display the query result. With the scheme provided by the embodiments of the application, operations such as photographing the book and frame-selecting on it are not needed: the user starts the query simply by issuing a voice query instruction, and the query result is finally obtained and displayed. The operation is simple and convenient.

1. A method of querying, comprising:

acquiring a voice query instruction;

acquiring an image of a preset area according to the voice query instruction;

determining the content to be queried according to the image of the preset area;

and determining a query result according to the content to be queried, and displaying the query result.

2. The method according to claim 1, wherein the image of the preset area comprises a finger of a user; and the determining the content to be queried according to the image of the preset area comprises:

performing gesture recognition processing on the image of the preset area to acquire the position of the finger of the user on the image of the preset area;

and determining the content to be queried according to the position of the finger of the user on the image of the preset area.

3. The method of claim 2, wherein the voice query instruction comprises a first query requirement of the user; and the determining the content to be queried according to the position of the finger of the user on the image of the preset area comprises:

and determining the content to be queried according to the first query requirement and the position of the finger of the user on the image of the preset area.

4. The method of claim 3, wherein determining the content to be queried according to the first query requirement and the position of the finger of the user on the image of the preset area comprises:

determining coordinates of each character on the image of the preset area;

and determining the content to be queried according to the first query requirement, the position of the finger of the user on the image of the preset area, and the coordinates of each character.

5. The method of claim 4, wherein the first query requirement comprises at least one of a character, a term, and a paragraph; and the position of the finger of the user on the image of the preset area is the coordinates of the fingertip of the finger on the image of the preset area;

the determining the content to be queried according to the first query requirement, the position of the finger of the user on the image of the preset area, and the coordinates of each character comprises:

when the first query requirement comprises a character, determining a character to be queried according to the coordinates of the fingertip of the finger on the image of the preset area and the coordinates of each character;

when the first query requirement comprises a term, determining a term to be queried according to the coordinates of the fingertip of the finger on the image of the preset area and the coordinates of each character;

and when the first query requirement comprises a paragraph, determining a paragraph to be queried according to the coordinates of the fingertip of the finger on the image of the preset area and the coordinates of each character.

6. The method according to claim 5, wherein a distance between the fingertip of the finger and the character to be queried is less than or equal to a first preset distance;

a distance between the fingertip of the finger and any character in the term to be queried is less than or equal to a second preset distance;

and the coordinates of the fingertip of the finger on the image of the preset area are located in an area corresponding to the paragraph to be queried.

7. The method according to any one of claims 3-6, wherein the voice query instruction comprises a second query requirement of the user; and the determining a query result according to the content to be queried and displaying the query result comprises:

and determining the query result according to the second query requirement and the content to be queried, and displaying the query result, wherein the second query requirement comprises one or more of pronunciation, paraphrase, stroke structure, related words, and example sentences.

8. The method according to any one of claims 1-6, wherein before obtaining the image of the preset area according to the voice query instruction, the method further comprises:

and determining that the voice query instruction comprises a preset starting instruction.

9. A query apparatus, comprising:

the first acquisition module is used for acquiring a voice query instruction;

the second acquisition module is used for acquiring an image of a preset area according to the voice query instruction;

the processing module is used for determining the content to be queried according to the image of the preset area;

and the query module is used for determining a query result according to the content to be queried and displaying the query result.

10. A terminal device, comprising a reflector, a camera, a display, a microphone array and a speaker, wherein:

the reflector is used for reflecting light rays in a preset area to the camera;

the terminal device is configured to:

controlling the microphone array to acquire a voice query instruction;

controlling the camera to acquire an image of a preset area according to the voice query instruction;

determining the content to be queried according to the image of the preset area;

and determining a query result according to the content to be queried, and controlling the display and/or the speaker to present the query result.

11. The device according to claim 10, wherein the image of the preset area comprises a finger of a user; and the terminal device is specifically configured to:

performing gesture recognition processing on the image of the preset area to acquire the position of the finger of the user on the image of the preset area;

and determining the content to be queried according to the position of the finger of the user on the image of the preset area.

12. The device of claim 11, wherein the voice query instruction comprises a first query requirement of the user; and the terminal device is specifically configured to:

and determining the content to be queried according to the first query requirement and the position of the finger of the user on the image of the preset area.

13. The device according to claim 12, wherein the terminal device is specifically configured to:

determining coordinates of each character on the image of the preset area;

and determining the content to be queried according to the first query requirement, the position of the finger of the user on the image of the preset area, and the coordinates of each character.

14. The device of claim 13, wherein the first query requirement comprises at least one of a character, a term, and a paragraph; and the position of the finger of the user on the image of the preset area is the coordinates of the fingertip of the finger on the image of the preset area;

the terminal device is specifically configured to:

when the first query requirement comprises a character, determining a character to be queried according to the coordinates of the fingertip of the finger on the image of the preset area and the coordinates of each character;

when the first query requirement comprises a term, determining a term to be queried according to the coordinates of the fingertip of the finger on the image of the preset area and the coordinates of each character;

and when the first query requirement comprises a paragraph, determining a paragraph to be queried according to the coordinates of the fingertip of the finger on the image of the preset area and the coordinates of each character.

15. The device according to claim 14, wherein a distance between the fingertip of the finger and the character to be queried is less than or equal to a first preset distance;

a distance between the fingertip of the finger and any character in the term to be queried is less than or equal to a second preset distance;

and the coordinates of the fingertip of the finger on the image of the preset area are located in an area corresponding to the paragraph to be queried.

16. The device according to any one of claims 12-15, wherein the voice query instruction comprises a second query requirement of the user; and the terminal device is specifically configured to:

and determining the query result according to the second query requirement and the content to be queried, and displaying the query result, wherein the second query requirement comprises one or more of pronunciation, paraphrase, stroke structure, related words, and example sentences.

17. The device according to any one of claims 10-15, wherein before the acquiring of the image of the preset area according to the voice query instruction, the terminal device is further configured to:

and determining that the voice query instruction comprises a preset starting instruction.

18. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein:

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.

19. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-8.

Technical Field

Embodiments of the present application relate to the field of intelligent search in the field of data processing, and in particular to a query method, a query apparatus, a terminal device, and a storage medium.

Background

With the successive development of various educational products, the demand for looking up words has increased sharply.

At present, word lookup is mainly performed through word-lookup APPs on mobile phones, tablets, and other terminal devices. When a word in a book needs to be queried, the user has to open the word-lookup APP on the terminal device, photograph the book, and frame-select the word to be queried on the captured picture to perform the query.

This existing word-lookup approach is too cumbersome to operate.

Disclosure of Invention

Provided are a query method, a query apparatus, a terminal device, and a storage medium.

According to a first aspect, there is provided a query method comprising:

acquiring a voice query instruction;

acquiring an image of a preset area according to the voice query instruction;

determining the content to be queried according to the image of the preset area;

and determining a query result according to the content to be queried, and displaying the query result.

According to a second aspect, there is provided a query device comprising:

the first acquisition module is used for acquiring a voice query instruction;

the second acquisition module is used for acquiring an image of a preset area according to the voice query instruction;

the processing module is used for determining the content to be queried according to the image of the preset area;

and the query module is used for determining a query result according to the content to be queried and displaying the query result.

According to a third aspect, there is provided a terminal device comprising a reflector, a camera, a display, a microphone array and a speaker, wherein:

the reflector is used for reflecting light rays in a preset area to the camera;

the terminal device is configured to:

controlling the microphone array to acquire a voice query instruction;

controlling the camera to acquire an image of a preset area according to the voice query instruction;

determining the content to be queried according to the image of the preset area;

and determining a query result according to the content to be queried, and controlling the display and/or the speaker to present the query result.

According to a fourth aspect, there is provided an electronic device comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein:

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the first aspects.

According to a fifth aspect, there is provided a non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of the first aspects.

According to the query method, the query apparatus, the terminal device and the storage medium provided by the embodiments of the application, a voice query instruction is first acquired; an image of a preset area is then acquired according to the voice query instruction; the content to be queried is determined according to the image of the preset area; and finally a query result is determined according to the content to be queried and displayed. With this scheme, operations such as photographing the book and frame-selecting on it are not needed: the user starts the query simply by issuing a voice query instruction, and the query result is finally obtained and displayed. The operation is simple and convenient.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:

fig. 1 is a schematic diagram of application scenario I provided in an embodiment of the present application;

fig. 2 is a schematic diagram of application scenario II provided in an embodiment of the present application;

fig. 3 is a schematic flowchart of a query method according to an embodiment of the present application;

fig. 4 is a schematic diagram of a user gesture provided in an embodiment of the present application;

fig. 5 is a schematic flowchart of determining content to be queried according to an embodiment of the present application;

fig. 6 is a first schematic diagram for determining content to be queried according to an embodiment of the present application;

fig. 7 is a second schematic diagram of determining content to be queried provided in an embodiment of the present application;

fig. 8 is a third schematic diagram for determining content to be queried according to an embodiment of the present application;

fig. 9 is a schematic flowchart of obtaining a query result according to an embodiment of the present application;

fig. 10 is a schematic diagram of query content provided in an embodiment of the present application;

fig. 11 is a schematic structural diagram of a query apparatus provided in an embodiment of the present application;

fig. 12 is a schematic structural diagram of a terminal device according to an embodiment of the present application;

fig. 13 is a block diagram of an electronic device for implementing the query method of the embodiment of the present application.

Detailed Description

The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding; these details are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, descriptions of well-known functions and constructions are omitted from the following description for clarity and conciseness.

An application scenario to which the present application is applicable is first described with reference to fig. 1.

Fig. 1 is a schematic diagram of application scenario I provided in an embodiment of the present application and illustrates the current word-lookup approach. Fig. 1 includes a desk 11; a book 12 is placed on the desk 11, and the book 12 contains the content to be queried. At present, the user looks up a word by opening the word-lookup APP in the mobile phone 13 and photographing the book 12.

In fig. 1, area 14 is the shooting range of the mobile phone 13, and the book 12 is located within area 14. After the book 12 is photographed to obtain an image, the user frame-selects the word or paragraph to be looked up on the mobile phone 13 to perform the search.

In the word-lookup approach illustrated in fig. 1, the steps to be executed include opening the mobile phone, opening the word-lookup APP, capturing an image, and frame-selecting the content, before a query result can finally be obtained. The query procedure is very cumbersome. Moreover, the frame-selection operation is inconvenient: if the user wants to query both a word and a paragraph, the method illustrated in fig. 1 must be performed several times following the steps above, further complicating the process.

To solve this problem, the embodiments of the present application provide a query method that is simple and convenient to operate. It is described below in conjunction with fig. 2.

Fig. 2 is a schematic diagram of application scenario II provided in an embodiment of the present application. As shown in fig. 2, the application scenario includes a desk 21; a smart speaker 22 is placed on the desk 21, and the smart speaker 22 includes a reflector 23, a camera 24, a display 25, a microphone array 26, and a speaker 27.

The angle of the reflector 23 is adjusted so that it reflects light from a preset area to the camera 24. The camera 24 is mounted on the display 25, and query results can be shown on the display 25. For a more comfortable visual experience, the display 25 is not upright but is tilted slightly upward at a certain elevation angle. In this case, the camera 24 mounted on the display 25 faces upward and cannot capture the area on the desk 21. The reflector 23 is therefore provided so that the camera 24 can photograph the area on the desk 21. If the camera 24 is positioned so that it can photograph the desk 21 directly, the reflector 23 need not be provided on the smart speaker.

The microphone array 26 is used to obtain the user's voice query instruction, so the user can start a query simply by speaking. The smart speaker 22 then controls the camera 24 to capture an image of the preset area, and processes the voice query instruction together with the image of the preset area to obtain a query result.

The execution subject in the embodiments of the application is the smart speaker. The smart speaker may process the voice query instruction through a cloud server once networked, or by means of a local server.

The specific character or term the user wants to look up is found by comparing the position of the user's finger with the extent and position of each character; the text of the character or term is then recognized through Optical Character Recognition (OCR); finally, the server's dictionary is queried for the corresponding pronunciation, writing, paraphrase, and other results.
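As a concrete illustration of this flow, here is a minimal Python sketch under stated assumptions: the names (CharBox, lookup), the box representation, and the dictionary structure are all illustrative, since the patent specifies the steps rather than an implementation.

```python
# Minimal sketch: fingertip position -> nearest OCR character box -> text
# -> dictionary entry. All names are illustrative assumptions.
import math
from dataclasses import dataclass

@dataclass
class CharBox:
    text: str   # character text recognized by OCR
    x: float    # box-center x on the image of the preset area
    y: float    # box-center y on the image of the preset area

def lookup(fingertip, char_boxes, dictionary):
    """fingertip: (x, y) from gesture recognition.
    char_boxes: list of CharBox produced by the OCR step.
    dictionary: mapping text -> entry (pronunciation, paraphrase, ...)."""
    nearest = min(char_boxes,
                  key=lambda b: math.hypot(b.x - fingertip[0],
                                           b.y - fingertip[1]))
    return nearest.text, dictionary.get(nearest.text)
```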

For example, the microphone array on the smart speaker accurately picks up the user's speech; a wake-word recognition technology judges whether the user is speaking to the smart speaker; a speech recognition technology accurately transcribes and restores the user's words as text; a natural language understanding technology performs accurate semantic understanding of the recognized text; and so on.

The query result may be displayed on the display 25, or may be broadcast by voice through the speaker 27.

In the example of fig. 2, the user can start a query simply by placing the book in the preset area and then speaking; the operation flow is very simple.

It is to be understood that fig. 2 is only one applicable application scenario and does not constitute a limitation to the specific application scenario. In the following examples, the scheme of the present application will be explained in detail.

Fig. 3 is a schematic flowchart of a query method provided in an embodiment of the present application, and as shown in fig. 3, the method may include:

S31, acquiring a voice query instruction.

The scheme provided by the embodiments of the application can be applied to terminal devices such as the smart speaker shown in fig. 2. The voice query instruction is an instruction actively issued by the user: whenever the user has a query requirement, the query can be started directly by speaking.

S32, acquiring the image of the preset area according to the voice query instruction.

A microphone array is provided on the smart speaker in the example of fig. 2. After the user issues a voice query instruction, the microphone array acquires it, and the smart speaker then processes it accordingly.

In the embodiments of the application, the preset area may be an area where items such as books and picture books are placed. After receiving the voice query instruction, the smart speaker acquires an image of the preset area, so that the content of the book placed in the preset area is obtained.

S33, determining the content to be queried according to the image of the preset area.

After the image of the preset area is obtained, the content to be queried is determined according to it. For example, after a book is placed in the preset area, the content of the book can be captured by the camera. The user can point at a position on the book with a finger; by performing gesture recognition on the image of the preset area, the smart speaker determines the position the finger points to and takes the content near that position as the content to be queried. If the user does not indicate any position, all or part of the content in the image of the preset area may be taken as the content to be queried, and so on.

S34, determining a query result according to the content to be queried, and displaying the query result.

After the content to be queried is determined, a query result can be determined according to the content to be queried.

For example, for a term, its pronunciation, paraphrase, similar terms, example sentences, corresponding English translation, and so on can be queried. The user may or may not specify, through the instruction, which aspects of the content are to be queried. If the user specifies certain aspects, such as the pronunciation, the smart speaker may highlight those aspects. If the user does not specify which aspects to query, the related content of all aspects of the content to be queried can be displayed as the query result for the user to review. After the query result is determined, it can be presented to the user.

The presentation means may include voice presentation, picture presentation, and the like. For voice presentation, for example when the pronunciation is queried, the speaker on the smart speaker device can be controlled to read the pronunciation aloud. For picture presentation, for example when similar terms and example sentences are queried, the display on the smart speaker device can be controlled to show the queried content.

According to the query method provided by the embodiments of the application, a voice query instruction is first acquired; an image of a preset area is then acquired according to the voice query instruction; the content to be queried is determined according to the image of the preset area; and finally a query result is determined according to the content to be queried and displayed. With this scheme, operations such as photographing the book and frame-selecting on it are not needed: the user starts the query simply by issuing a voice query instruction, and the query result is finally obtained and displayed. The operation is simple and convenient.

The embodiments of the present application will be described in detail with reference to specific examples.

The scheme in the embodiments of the present application is applicable to terminal devices such as the smart speaker shown in fig. 2; the smart speaker device of fig. 2 is taken as an example below.

In fig. 2, before querying, the user first places the book or other object to be queried on the desktop in front of the smart speaker; the reflector is arranged on top of the smart speaker.

The book contains the content the user wants to query. The user can open the page to be queried and place the book in the preset area, i.e., the area from which the reflector can reflect light to the camera. Light from objects located in the preset area is reflected to the camera by the reflector, so when the camera is started, it indirectly captures the image of the preset area by photographing the reflector.

Once the book is placed in the preset area, the user can initiate a voice query instruction at any time to perform a query.

After the user initiates a voice query instruction and before the query starts, the smart speaker acquires the instruction through the microphone array and has to judge whether the user is speaking to the smart speaker.

The judgment is made by checking whether the voice query instruction includes a preset starting instruction. If it does, it is determined that the user is speaking to the smart speaker, and the subsequent query flow can start normally. If the preset starting instruction is not included, it is determined that the user is not speaking to the smart speaker, and the smart speaker may not respond to the user's voice.

The preset starting instruction may be, for example, certain preset words. After the microphone array obtains the voice query instruction, the smart speaker judges whether the instruction includes those preset words. The preset starting instruction can be set according to the actual smart speaker product; for example, it may be a fixed wake word such as "Xiaodu" (小度) or "speaker".

Taking "Xiaodu" as the preset starting instruction as an example, when the user needs to query, the voice query instruction has to begin with "Xiaodu"; the smart speaker then recognizes that the user is speaking to it and performs the query according to the voice query instruction. For example, the voice query instruction may be "Xiaodu, how is this character pronounced?" or "Xiaodu, what does this character mean?"; in these instructions, "Xiaodu" is the preset starting instruction. By setting a preset starting instruction, the user can activate the smart speaker when needed and choose not to activate it otherwise, which avoids the smart speaker responding to every sentence spoken and improves flexibility and convenience.
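As an illustration of this check, the following is a minimal sketch assuming the speech-recognition step described above has already transcribed the utterance to text; the wake-word list and function name are assumptions for illustration, not the product's actual implementation.

```python
# Minimal sketch of the preset-starting-instruction check. The wake word
# "Xiaodu" is taken from the example above; a real product would match the
# audio wake word before speech recognition, which this sketch simplifies.
WAKE_WORDS = ("Xiaodu",)  # preset starting instruction(s); product-specific

def includes_start_instruction(recognized_text: str) -> bool:
    """True if the utterance contains a preset starting instruction,
    i.e. the user is actually addressing the smart speaker."""
    return any(wake in recognized_text for wake in WAKE_WORDS)

print(includes_start_instruction("Xiaodu, how is this character pronounced?"))  # True
print(includes_start_instruction("How is this character pronounced?"))          # False
```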

After the smart speaker confirms that the user has issued an instruction to it, it can acquire the image of the preset area. When issuing the voice instruction, the user can point at the content to be queried with a finger; the smart speaker obtains the image of the preset area through the camera, and the image includes the user's finger.

Then, the smart speaker performs gesture recognition on the image of the preset area to obtain the position of the user's finger on the image, so that the content to be queried is determined according to that position. The content to be queried can thus be determined according to the user's needs, which makes the query more targeted.

Fig. 4 is a schematic diagram of a user gesture provided in an embodiment of the present application; as shown in fig. 4, it illustrates a paragraph on a page of a book.

When the user wants to query a piece of text, the user only needs to point at it with a finger. For example, in fig. 4, if the user wants to query "sky", the user may point a finger at "sky" in the paragraph of the book. After the camera captures the image of the preset area, gesture recognition is performed on the image to obtain the position of the user's finger, specifically the position of the fingertip, on the image. There are various ways to recognize the fingertip in an image, such as recognition based on a finger model or recognition by a trained neural network model, which are not described here.
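Since the fingertip-recognition method is left open above, the following sketch shows one possible (and assumed) choice: locating the index fingertip with the open-source MediaPipe Hands model.

```python
# Fingertip localization via MediaPipe Hands -- an assumed library choice,
# not the patent's implementation.
import cv2
import mediapipe as mp

def find_fingertip(image_bgr):
    """Return the index-fingertip pixel coordinates, or None if no hand."""
    with mp.solutions.hands.Hands(static_image_mode=True,
                                  max_num_hands=1) as hands:
        result = hands.process(cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB))
    if not result.multi_hand_landmarks:
        return None
    tip = result.multi_hand_landmarks[0].landmark[8]  # landmark 8 = index fingertip
    h, w = image_bgr.shape[:2]
    return (tip.x * w, tip.y * h)  # landmarks are normalized to [0, 1]
```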

Fig. 5 is a schematic flowchart of a process of determining content to be queried according to an embodiment of the present application, as shown in fig. 5, including:

S51, acquiring the first query requirement of the user from the voice query instruction.

In the embodiments of the present application, the first query requirement includes at least one of a character, a term, and a paragraph. That is, it is first determined from the voice query instruction whether the user wants to query a character, a term, or a paragraph.

S52, determining the content to be queried according to the first query requirement and the position of the finger of the user on the image of the preset area.

The determined content to be queried differs according to the first query requirement. In the embodiments of the application, the smart speaker can acquire the coordinates of each character on the image of the preset area and then determine the content to be queried according to the first query requirement, the position of the user's finger on the image of the preset area, and the coordinates of each character, where the position of the finger on the image is the coordinates of the fingertip on the image. Different query results can thus be displayed for different first query requirements, which makes the query more targeted.
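One possible way to obtain these per-character coordinates is an off-the-shelf OCR library. The sketch below uses pytesseract with a Chinese language pack; the library choice and the box-to-center conversion are assumptions of this illustration, not the patent's method.

```python
# Per-box coordinates from OCR -- pytesseract is an assumed choice here.
import pytesseract
from pytesseract import Output

def character_coordinates(image):
    """Return (text, center_x, center_y) for each box recognized by OCR."""
    data = pytesseract.image_to_data(image, lang="chi_sim",
                                     output_type=Output.DICT)
    boxes = []
    for text, x, y, w, h in zip(data["text"], data["left"], data["top"],
                                data["width"], data["height"]):
        if text.strip():  # skip empty detections
            boxes.append((text, x + w / 2.0, y + h / 2.0))
    return boxes
```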

Fig. 6 is a first schematic diagram of determining the content to be queried provided in an embodiment of the present application. As shown in fig. 6, dashed boxes illustrate the position of each character on the image of the preset area.

The image also includes the user's finger. After gesture recognition is performed on the image of the preset area, the position of the finger on the image, namely the coordinates of the fingertip on the image, can be obtained.

If the first query requirement obtained from the voice query instruction is a character, the content to be queried can be determined according to the position of the user's finger on the image of the preset area.

For example, in fig. 6, after the coordinates of each character and the coordinates of the user's fingertip on the image of the preset area are obtained, the character to be queried can be determined according to them.

The distance between the character to be queried and the fingertip is less than or equal to a first preset distance.

Alternatively, the distance between the fingertip and each character in the image may be computed, and the character with the smallest distance to the fingertip taken as the character to be queried.

For example, in fig. 6, the fingertip lies within the box of the character "空" ("sky/empty"), and the distance between the fingertip and "空" is the smallest; "空" is therefore determined as the character to be queried.
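A minimal sketch of this character case follows, assuming character boxes from the OCR step and an illustrative placeholder value for the first preset distance:

```python
# Nearest character within the first preset distance; names and the
# threshold value are illustrative assumptions.
import math

FIRST_PRESET_DISTANCE = 40.0  # pixels; placeholder, not from the patent

def character_to_query(fingertip, char_boxes):
    """char_boxes: iterable of (char, center_x, center_y) from OCR."""
    fx, fy = fingertip
    char, dist = min(((c, math.hypot(x - fx, y - fy))
                      for c, x, y in char_boxes), key=lambda t: t[1])
    return char if dist <= FIRST_PRESET_DISTANCE else None
```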

Fig. 7 is a second schematic diagram of determining the content to be queried provided in an embodiment of the present application. As shown in fig. 7, dashed boxes illustrate the position of each term on the image of the preset area.

The image also includes the user's finger. After gesture recognition is performed on the image of the preset area, the position of the finger on the image, and further the coordinates of the fingertip on the image, can be obtained.

If the first query requirement obtained from the voice query instruction is a term, the content to be queried is determined in units of terms. For example, in fig. 7 the terms may include "on the tree", "sky" (天空), "then", "a flock", "wild geese", and so on. The content to be queried can then be determined according to the position of the user's finger on the image of the preset area.

For example, in fig. 7, after the coordinates of each character and the coordinates of the user's fingertip on the image of the preset area are obtained, the term to be queried can be determined according to them.

The distance between any character in the term to be queried and the fingertip is less than or equal to a second preset distance. The second preset distance may be greater than, equal to, or smaller than the first preset distance; this is not limited in the embodiments of the present application.

Alternatively, the distance between the fingertip and each character in the image may be computed, and the term with the smallest combined distance to the fingertip taken as the term to be queried.

For example, in fig. 7, the fingertip lies within the box of the character "空"; the distance from the fingertip to "空" is the smallest, and the distance to the adjacent character "天" is the next smallest, so the term "天空" ("sky") is determined as the term to be queried.
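A minimal sketch of this term case follows, assuming the OCR result has already been segmented into candidate terms (an assumed preprocessing step) and an illustrative value for the second preset distance:

```python
# Term with the smallest combined fingertip distance, with every character
# required to lie within the second preset distance; names/values are
# illustrative assumptions.
import math

SECOND_PRESET_DISTANCE = 60.0  # pixels; placeholder, not from the patent

def term_to_query(fingertip, terms):
    """terms: mapping of term text -> list of character centers (x, y)."""
    fx, fy = fingertip
    best, best_sum = None, float("inf")
    for text, centers in terms.items():
        dists = [math.hypot(x - fx, y - fy) for x, y in centers]
        if max(dists) > SECOND_PRESET_DISTANCE:
            continue  # a character of this term is too far from the finger
        if sum(dists) < best_sum:  # smallest combined distance wins
            best, best_sum = text, sum(dists)
    return best
```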

Fig. 8 is a third schematic diagram of determining the content to be queried provided in an embodiment of the present application. As shown in fig. 8, dashed boxes illustrate each paragraph in fig. 8 and its position on the image of the preset area.

The image also includes the user's finger. After gesture recognition is performed on the image of the preset area, the position of the finger on the image, and further the coordinates of the fingertip on the image, can be obtained.

If the first query requirement obtained from the voice query instruction is a paragraph, the content to be queried is determined in units of paragraphs. For example, fig. 8 includes 3 paragraphs. For paragraph division, the boundary and corresponding coordinates of each paragraph can be acquired through paragraph-region recognition so as to distinguish the different paragraphs. The content to be queried can then be determined according to the position of the user's finger on the image of the preset area.

For example, in fig. 8, after the regions corresponding to the different paragraphs and the coordinates of the user's fingertip on the image of the preset area are obtained, the paragraph to be queried can be determined according to them: the coordinates of the fingertip on the image of the preset area lie within the region corresponding to the paragraph to be queried.

For example, in fig. 8, the fingertip is located near the character "空" and within the region of the second paragraph (the region corresponding to the solid-line box in fig. 8); the second paragraph is therefore determined as the paragraph to be queried.
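A minimal sketch of this paragraph case follows; axis-aligned bounding rectangles from paragraph-region recognition are an assumption of the sketch.

```python
# The paragraph whose region contains the fingertip is the one to query.
def paragraph_to_query(fingertip, paragraphs):
    """paragraphs: list of (paragraph_text, (x0, y0, x1, y1)) regions."""
    fx, fy = fingertip
    for text, (x0, y0, x1, y1) in paragraphs:
        if x0 <= fx <= x1 and y0 <= fy <= y1:
            return text
    return None
```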

The following describes a scheme for obtaining the query result with reference to fig. 9.

Fig. 9 is a schematic flowchart of a process of obtaining a query result according to an embodiment of the present application, as shown in fig. 9, including:

s91, acquiring a second query requirement of the user in the voice query instruction, wherein the second query requirement comprises one or more of pronunciation, paraphrase, stroke structure, related words and example sentences.

The first query requirement refers to whether the user wants to query a character, a term, or a paragraph, while the second query requirement refers to which aspect of the content to be queried the user wants, including pronunciation, paraphrase, stroke structure, related words, example sentences, and the like. The stroke structure may include, for example, stroke order, character structure, radicals, and strokes; the related words may include, for example, similar words and antonyms.

Both the first query requirement and the second query requirement may be obtained by a voice query instruction.

The first and second query requirements in voice query instructions are described below with reference to Table 1.

In Table 1 below, "Xiaodu" is the preset starting instruction; through the preset starting instruction, the device knows that the user is addressing it. Some of the voice query instructions in Table 1 include neither the first query requirement nor the second, for example "Xiaodu, what is this?": it is unclear whether the user wants to query a character, a term, or a paragraph, and unclear whether the pronunciation, the paraphrase, or example sentences should be displayed, so all possibly relevant content can be displayed. Some include the first query requirement but not the second, such as "Xiaodu, what is this character?", which indicates that the user wants to query a character but not which aspects of it; various aspects of the character, such as its pronunciation, paraphrase, similar words, and example sentences, may then be presented. Some include both the first and the second query requirement, for example "Xiaodu, what does this character mean?", which indicates that the user wants the paraphrase of the character; the paraphrase can then be displayed with emphasis, and other information, including the pronunciation, example sentences and the like, can also be displayed as a supplement.

Table 1

Voice query instruction                    First query requirement    Second query requirement
"Xiaodu, what is this?"                    none                       none
"Xiaodu, what is this character?"          character                  none
"Xiaodu, what does this character mean?"   character                  paraphrase

Table 1 is only an illustration of voice query instructions and does not limit the specific voice query instruction.
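As a rough illustration of how the two requirements in Table 1 might be extracted from the recognized text, here is a keyword-based sketch; the keyword lists are assumptions of this illustration, and the natural language understanding technology mentioned earlier would do this far more robustly.

```python
# Keyword-based extraction of the first/second query requirements; the
# keyword tables are illustrative assumptions, not the product's NLU.
FIRST = {"paragraph": "paragraph", "character": "character", "word": "term"}
SECOND = {"pronounced": "pronunciation", "read": "pronunciation",
          "mean": "paraphrase", "write": "stroke structure"}

def parse_requirements(text: str):
    text = text.lower()
    first = next((v for k, v in FIRST.items() if k in text), None)
    second = next((v for k, v in SECOND.items() if k in text), None)
    return first, second  # either may be None, as in Table 1

print(parse_requirements("Xiaodu, what does this character mean?"))
# -> ('character', 'paraphrase')
```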

S92, determining the query result according to the second query requirement and the content to be queried, and displaying the query result.

For the voice query instructions illustrated in Table 1, the result to be queried can be determined and displayed according to the second query requirement and the content to be queried. When the query result can be determined according to the first query requirement and/or the second query requirement, only the corresponding query result may be displayed, or other related results may be displayed as well. If the query result cannot be determined according to the first query requirement and/or the second query requirement, all related content can be broadly displayed to the user.

Fig. 10 is a schematic diagram of query content provided in an embodiment of the present application. As shown in fig. 10, the content to be queried obtained through the voice query instruction is the word "love"; the relevant query results for "love", including its pronunciation, paraphrase, example sentences and the like, are illustrated as an example.

According to the query method provided by the embodiments of the application, a voice query instruction is first acquired; an image of a preset area is then acquired according to the voice query instruction; the content to be queried is determined according to the image of the preset area; and finally a query result is determined according to the content to be queried and displayed. With this scheme, operations such as photographing the book and frame-selecting on it are not needed: the user starts the query simply by issuing a voice query instruction, and the query result is finally obtained and displayed. The operation is simple and convenient. Meanwhile, the content to be queried and the query result are determined according to the first and second query requirements in the user's voice query instruction, which makes the method more targeted and more widely applicable.

Fig. 11 is a schematic structural diagram of a query apparatus provided in an embodiment of the present application. As shown in fig. 11, the query apparatus includes a first obtaining module 111, a second obtaining module 112, a processing module 113, and a query module 114, where:

the first obtaining module 111 is configured to obtain a voice query instruction;

the second obtaining module 112 is configured to obtain an image of a preset area according to the voice query instruction;

the processing module 113 is configured to determine content to be queried according to the image of the preset area;

the query module 114 is configured to determine a query result according to the content to be queried and display the query result.

In a possible implementation, the image of the preset area includes a finger of a user; the processing module 113 is specifically configured to:

performing gesture recognition processing on the image of the preset area to acquire the position of the finger of the user on the image of the preset area;

and determining the content to be queried according to the position of the finger of the user on the image of the preset area.

In a possible implementation manner, the voice query instruction includes a first query requirement of the user; the processing module 113 is specifically configured to:

and determining the content to be queried according to the first query requirement and the position of the finger of the user on the image of the preset area.

In a possible implementation manner, the processing module 113 is specifically configured to:

determining coordinates of each character on the image of the preset area;

and determining the content to be queried according to the first query requirement, the position of the finger of the user on the image of the preset area, and the coordinates of each character.

In one possible implementation, the first query requirement includes at least one of a character, a term, and a paragraph; the position of the finger of the user on the image of the preset area is the coordinates of the fingertip of the finger on the image of the preset area; and the processing module 113 is specifically configured to:

when the first query requirement comprises a character, determining a character to be queried according to the coordinates of the fingertip of the finger on the image of the preset area and the coordinates of each character;

when the first query requirement comprises a term, determining a term to be queried according to the coordinates of the fingertip of the finger on the image of the preset area and the coordinates of each character;

and when the first query requirement comprises a paragraph, determining a paragraph to be queried according to the coordinates of the fingertip of the finger on the image of the preset area and the coordinates of each character.

In a possible implementation, the distance between the fingertip of the finger and the character to be queried is less than or equal to a first preset distance;

the distance between the fingertip of the finger and any character in the term to be queried is less than or equal to a second preset distance;

and the coordinates of the fingertip of the finger on the image of the preset area are located in the area corresponding to the paragraph to be queried.

In a possible implementation manner, the voice query instruction includes a second query requirement of the user; the query module 114 is specifically configured to:

and determining the query result according to the second query requirement and the content to be queried, and displaying the query result, where the second query requirement includes one or more of pronunciation, paraphrase, stroke structure, related words, and example sentences.

In a possible implementation, before the acquiring of the image of the preset area according to the voice query instruction, the second obtaining module 112 is further configured to:

and determining that the voice query instruction comprises a preset starting instruction.

The apparatus provided in the embodiment of the present application may be configured to implement the technical solution of the method embodiment, and the implementation principle and the technical effect are similar, which are not described herein again.

Fig. 12 is a schematic structural diagram of a terminal device according to an embodiment of the present application, and as shown in fig. 12, the terminal device includes a reflector 121, a camera 122, a display 123, a microphone array 124, and a speaker 125, where:

the reflector 121 is configured to reflect light rays in a preset area to the camera 122;

the terminal device is configured to:

controlling the microphone array 124 to obtain voice query instructions;

controlling the camera to acquire an image of a preset area according to the voice query instruction;

determining the content to be queried according to the image of the preset area;

and determining a query result according to the content to be queried, and controlling the display 123 and/or the speaker 125 to present the query result.

In a possible implementation, the image of the preset area includes a finger of a user; the terminal device is specifically configured to:

performing gesture recognition processing on the image of the preset area to acquire the position of the finger of the user on the image of the preset area;

and determining the content to be queried according to the position of the finger of the user on the image of the preset area.

In a possible implementation manner, the voice query instruction includes a first query requirement of the user; the terminal device is specifically configured to:

and determining the content to be queried according to the first query requirement and the position of the finger of the user on the image of the preset area.

In a possible implementation manner, the terminal device is specifically configured to:

determining coordinates of each character on the image of the preset area;

and determining the content to be queried according to the first query requirement, the position of the finger of the user on the image of the preset area, and the coordinates of each character.

In one possible implementation, the first query requirement includes at least one of a character, a term, and a paragraph; the position of the finger of the user on the image of the preset area is the coordinates of the fingertip of the finger on the image of the preset area;

the terminal device is specifically configured to:

when the first query requirement comprises a character, determining a character to be queried according to the coordinates of the fingertip of the finger on the image of the preset area and the coordinates of each character;

when the first query requirement comprises a term, determining a term to be queried according to the coordinates of the fingertip of the finger on the image of the preset area and the coordinates of each character;

and when the first query requirement comprises a paragraph, determining a paragraph to be queried according to the coordinates of the fingertip of the finger on the image of the preset area and the coordinates of each character.

In a possible implementation, the distance between the fingertip of the finger and the character to be queried is less than or equal to a first preset distance;

the distance between the fingertip of the finger and any character in the term to be queried is less than or equal to a second preset distance;

and the coordinates of the fingertip of the finger on the image of the preset area are located in the area corresponding to the paragraph to be queried.

In a possible implementation manner, the voice query instruction includes a second query requirement of the user; the terminal device is specifically configured to:

and determining the query result according to the second query requirement and the content to be queried, and displaying the query result, where the second query requirement includes one or more of pronunciation, paraphrase, stroke structure, related words, and example sentences.

In a possible implementation manner, before the acquiring the image of the preset area according to the voice query instruction, the terminal device is further configured to:

and determining that the voice query instruction comprises a preset starting instruction.

According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.

Fig. 13 is a block diagram of an electronic device for implementing the query method of the embodiments of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only and are not meant to limit implementations of the present application described and/or claimed herein.

As shown in fig. 13, the electronic device includes: one or more processors 131, a memory 132, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing part of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). One processor 131 is taken as an example in fig. 13.

The memory 132 is a non-transitory computer-readable storage medium provided herein. The memory stores instructions executable by at least one processor to cause the at least one processor to perform the query method provided herein. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to perform the query method provided herein.

The memory 132, as a non-transitory computer readable storage medium, may be used for storing non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the query method in the embodiment of the present application (for example, the first obtaining module 111, the second obtaining module 112, the processing module 113, and the query module 114 shown in fig. 11). The processor 131 executes various functional applications of the server and data processing, i.e., implements the query method in the above-described method embodiments, by executing non-transitory software programs, instructions, and modules stored in the memory 132.

The memory 132 may include a storage program area and a storage data area, wherein the storage program area may store an operating system and an application program required for at least one function, and the storage data area may store data created according to the use of the electronic device for querying, and the like. Further, the memory 132 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 132 optionally includes memory located remotely from the processor 131, which may be connected to the electronic device for querying via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The electronic device of the query method may further include: an input device 133 and an output device 134. The processor 131, memory 132, input device 133, and output device 134 may be connected by a bus 135 or otherwise, as exemplified by the connection by bus 135 in fig. 13.

The input device 133 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for querying, and may be, for example, a touch screen, keypad, mouse, track pad, touch pad, pointing stick, one or more mouse buttons, track ball, joystick, or other input device. The output device 134 may include a display device, an auxiliary lighting device (e.g., an LED), a haptic feedback device (e.g., a vibration motor), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light-emitting diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.

Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.

These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

It should be understood that the various forms of flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in a different order, and no limitation is imposed herein as long as the desired results of the technical solutions disclosed in the present application can be achieved.

The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.
