Voice interaction method, device, equipment and computer readable medium

Document No.: 1952062    Publication date: 2021-12-10

Reading note: This technology, "Voice interaction method, apparatus, device and computer-readable medium" (语音交互的方法、装置、设备和计算机可读介质), was designed and created by Sun Yupeng (孙宇鹏) on 2020-10-14. Its main content is as follows: The invention discloses a voice interaction method, apparatus, device and computer-readable medium, and relates to the field of computer technology. One embodiment of the method comprises: receiving voice information of a user; acquiring a prompt voice, page detail data and a page jump parameter based on the voice information of the user, wherein the prompt voice is obtained according to the voice information and/or historical behavior of the user, the page detail data is obtained according to keywords in the voice information, and the page jump parameter is used for indicating a target display page; and broadcasting the prompt voice, and jumping from the current display page to the target display page according to the page jump parameter so as to display the page detail data. This embodiment can broadcast a prompt voice in response to the user's speech and display information of related commodities, so that more information is output and the user's shopping experience is improved.

1. A method of voice interaction, comprising:

receiving voice information of a user;

acquiring a prompt voice, page detail data and a page jump parameter based on the voice information of the user, wherein the prompt voice is obtained according to the voice information and/or historical behavior of the user, the page detail data is obtained according to keywords in the voice information, and the page jump parameter is used for indicating a target display page;

and broadcasting the prompt voice, and jumping from a current display page to the target display page according to the page jump parameter so as to display the page detail data.

2. The method of voice interaction according to claim 1, wherein the voice information comprises at least one of: search voice information, view-details voice information, and settlement voice information.

3. The method of voice interaction according to claim 1, wherein the broadcasting the prompt voice and jumping from a current display page to the target display page according to the page jump parameter to display the page detail data comprises:

after broadcasting the prompt voice or before broadcasting the prompt voice, jumping from the current display page to the target display page according to the page jump parameter so as to display the page detail data.

4. The method of voice interaction according to claim 1 or 2, wherein the jumping from the current display page to the target display page according to the page jump parameter to display the page detail data comprises:

determining the target display page according to the page jump parameter and a preset page routing table, and jumping from the current display page to the target display page to display the page detail data on the target display page, wherein the preset page routing table correspondingly stores the page jump parameter and the target display page.

5. The method of claim 1, wherein the obtaining of the prompt voice, the page detail data and the page jump parameter based on the voice information of the user comprises:

uploading the voice information of the user to acquire the prompt voice, the page detail data and the page jump parameter;

or,

processing the voice information to obtain the prompt voice, extracting keywords from the voice information, generating an access request according to the keywords, and uploading the access request to obtain the page detail data and the page jump parameter.

6. An apparatus for voice interaction, characterized by comprising a voice acquisition unit, a voice broadcasting unit, a communication unit, a processor and a screen, wherein:

the voice acquisition unit is used for receiving voice information of a user;

the communication unit is used for acquiring prompt voice, page detail data and page jump parameters based on the voice information of the user, wherein the prompt voice is acquired according to the voice information and/or the historical behavior of the user, the page detail data is acquired according to keywords in the voice information, and the page jump parameters are used for indicating a target display page;

and the processor is used for indicating the voice broadcasting unit to broadcast the prompt voice and indicating the screen to jump from the current display page to the target display page according to the page jump parameter so as to display the page detail data.

7. The apparatus for voice interaction according to claim 6, wherein the apparatus for voice interaction is a mobile terminal or a smart speaker.

8. The voice interaction apparatus of claim 6, wherein the screen is remotely connected to the processor.

9. An electronic device for voice interaction, comprising:

one or more processors;

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-5.

10. A computer-readable medium, on which a computer program is stored, which, when executed by a processor, implements the method according to any one of claims 1-5.

Technical Field

The present invention relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a computer-readable medium for voice interaction.

Background

People's lives are becoming more and more intelligent, and many things can now be done by voice, such as listening to songs by voice, hailing a car by voice and ordering takeout by voice. As voice recognition technology becomes more and more mature, it can be applied to electronic shopping websites, so that users can shop by voice and further free their hands.

In the process of implementing the present invention, the inventor found that the prior art has at least the following problem: in the voice shopping process, only a single voice message can be broadcast, and little information is output.

Disclosure of Invention

In view of this, embodiments of the present invention provide a voice interaction method, apparatus, device and computer-readable medium, which can broadcast a prompt voice according to the user's voice, display information of related commodities, and output more information, thereby improving the user's shopping experience.

To achieve the above object, according to an aspect of an embodiment of the present invention, there is provided a method of voice interaction, including:

receiving voice information of a user;

acquiring a prompt voice, page detail data and a page jump parameter based on the voice information of the user, wherein the prompt voice is obtained according to the voice information and/or historical behavior of the user, the page detail data is obtained according to keywords in the voice information, and the page jump parameter is used for indicating a target display page;

and broadcasting the prompt voice, and jumping from a current display page to the target display page according to the page jump parameter so as to display the page detail data.

The voice information includes at least one of: search voice information, view-details voice information, and settlement voice information.

The broadcasting the prompt voice and jumping from the current display page to the target display page according to the page jump parameter so as to display the page detail data comprises:

after broadcasting the prompt voice or before broadcasting the prompt voice, jumping from the current display page to the target display page according to the page jump parameter so as to display the page detail data.

The jumping from the current display page to the target display page according to the page jump parameter to display the page detail data comprises:

and determining the target display page according to the page jump parameter and a preset page routing table, and jumping from the current display page to the target display page to display the page detail data on the target display page, wherein the preset page routing table correspondingly stores the page jump parameter and the target display page.

The obtaining of the prompt voice, the page detail data and the page jump parameter based on the voice information of the user comprises:

uploading the voice information of the user to acquire the prompt voice, the page detail data and the page jump parameter;

or,

processing the voice information to obtain the prompt voice, extracting keywords from the voice information, generating an access request according to the keywords, and uploading the access request to obtain the page detail data and the page jump parameter.

According to a second aspect of the embodiments of the present invention, there is provided an apparatus for voice interaction, including:

a voice acquisition unit, a voice broadcasting unit, a communication unit, a processor and a screen, wherein:

the voice acquisition unit is used for receiving voice information of a user;

the communication unit is used for acquiring prompt voice, page detail data and page jump parameters based on the voice information of the user, wherein the prompt voice is acquired according to the voice information and/or the historical behavior of the user, the page detail data is acquired according to keywords in the voice information, and the page jump parameters are used for indicating a target display page;

and the processor is used for indicating the voice broadcasting unit to broadcast the prompt voice and indicating the screen to jump from the current display page to the target display page according to the page jump parameter so as to display the page detail data.

The apparatus for voice interaction is a mobile terminal or a smart speaker.

The screen is remotely connected to the processor.

According to a third aspect of the embodiments of the present invention, there is provided an electronic device for voice interaction, including:

one or more processors;

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method as described above.

According to a fourth aspect of embodiments of the present invention, there is provided a computer readable medium, on which a computer program is stored, which when executed by a processor, implements the method as described above.

One embodiment of the above invention has the following advantages or benefits: receiving voice information of a user; acquiring a prompt voice, page detail data and a page jump parameter based on the voice information of the user, wherein the prompt voice is obtained according to the voice information and/or historical behavior of the user, the page detail data is obtained according to keywords in the voice information, and the page jump parameter is used for indicating a target display page; and broadcasting the prompt voice, and jumping from the current display page to the target display page according to the page jump parameter so as to display the page detail data. The prompt voice can be broadcast according to the voice information of the user, the page detail information of related commodities is displayed, the output information is increased, and the shopping experience of the user is improved.

Further effects of the above-mentioned non-conventional alternatives will be described below in connection with the embodiments.

Drawings

The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:

FIG. 1 is a schematic diagram of the main flow of a method of voice interaction according to an embodiment of the present invention;

FIG. 2 is a flow diagram illustrating waking up an APP according to an embodiment of the present invention;

FIG. 3 is a schematic flow chart of broadcasting a prompt voice and displaying page detail data according to an embodiment of the present invention;

FIG. 4 is a flowchart illustrating waking up an APP according to a wake-up voice command according to an embodiment of the present invention;

FIG. 5 is a schematic flow chart of searching according to search voice information according to an embodiment of the present invention;

FIG. 6 is a schematic flow chart of viewing details according to view-details voice information according to an embodiment of the present invention;

FIG. 7 is a schematic flow chart of settlement according to settlement voice information according to an embodiment of the present invention;

FIG. 8 is a schematic diagram of the main structure of an apparatus for voice interaction according to an embodiment of the present invention;

FIG. 9 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;

fig. 10 is a schematic block diagram of a computer system suitable for use in implementing a terminal device or server according to an embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

Currently, shopping by voice is available on some smart speakers. As an example, the user says: "I want to buy a mobile phone". The smart speaker broadcasts to the user the commodity information of the first mobile phone in a mobile phone list, and the commodity information is only some simple information, such as: color, hardware parameters and functions. Therefore, in the voice shopping process, only a single voice message can be broadcast, and little information is output.

In order to solve the technical problem that the broadcast voice outputs little information, the following technical solution in the embodiments of the present invention may be adopted.

Referring to FIG. 1, FIG. 1 is a schematic diagram of the main flow of a voice interaction method according to an embodiment of the present invention. Based on the voice information of a user, not only can a prompt voice be broadcast, but page detail data can also be displayed. As shown in FIG. 1, the method specifically comprises the following steps:

in the embodiment of the invention, the voice information of the user can be received, and the prompt voice and the page detail data can be displayed.

As an example, the execution subject of each step in fig. 1 may be a mobile terminal. Mobile terminals include, but are not limited to: mobile phones, tablet computers, notebook computers, and the like.

As another example, the execution subject of each step in FIG. 1 may be a smart speaker, which can not only receive and broadcast voice but also display content. A screen may be arranged on the smart speaker. The smart speaker may also be connected to another device having a display, so that content is displayed on the display screen of that device. For example: the smart speaker is connected to a mobile terminal, and content is displayed on the display screen of the mobile terminal.

As still another example, the execution subject of the steps in FIG. 1 may be an application (APP) installed in the mobile terminal. The APP can broadcast the prompt voice and display the page detail data by calling the microphone, speaker and display screen of the mobile terminal.

S101, receiving voice information of a user.

The user sends voice information by speaking. After the voice information of the user is received, in order to obtain the user intention of the voice information, the voice information of the user can be uploaded to a server, and the server is responsible for further processing the voice information. As an example, the server may be located in the mobile terminal or in the cloud.

The following describes the technical solution in the embodiment of the present invention by taking APP as an example of the execution subject of each step.

The APP can be pre-installed in the mobile terminal. The APP can be woken up by voice, that is, the APP is awakened through voice information so as to realize voice interaction.

Referring to fig. 2, fig. 2 is a schematic flow diagram of waking up an APP according to an embodiment of the present invention, which specifically includes:

s201, receiving voice information of a user, and determining that the voice information comprises a preset awakening word.

The user sends voice information to the APP. After the APP receives the user's voice information, rather than starting voice interaction immediately, it needs to judge whether the voice information includes the preset wake-up word, that is, the wake-up word is continuously detected in the recognized word stream in real time.

The preset wake-up word is a word preset for starting voice interaction. As an example, the preset wake-up word may be: "hello" or "mood".

For example, whether the voice information includes the preset wake-up word may be determined by a speech recognition model, which may be a neural network model.
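
The patent does not prescribe how this check is implemented. Below is a minimal Python sketch, assuming a simple keyword-spotting approach on text that an external speech recognizer has already produced; the `app.start()` call is a hypothetical stand-in for switching the APP out of its dormant state.

```python
# Minimal sketch of the wake-up word check, assuming the audio has already
# been transcribed to text by an external speech recognizer.
PRESET_WAKE_WORDS = {"hello", "mood"}  # example wake-up words from the description

def contains_wake_word(transcribed_text: str) -> bool:
    """Return True if any preset wake-up word appears in the transcribed text."""
    text = transcribed_text.lower()
    return any(word in text for word in PRESET_WAKE_WORDS)

def on_recognized_text(transcribed_text: str, app) -> None:
    """Called for each piece of recognized text while the APP is dormant."""
    if contains_wake_word(transcribed_text):
        app.start()  # hypothetical: switch the APP from the dormant to the working state
    # otherwise the APP stays dormant and keeps consuming few resources
```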

S202, uploading the voice information after starting based on the preset wake-up word.

After determining that the voice message includes the preset wake-up word, the APP may be started based on the preset wake-up word. Of course, in the case that it is determined that the voice message does not include the preset wake-up word, the APP does not need to be started.

The voice information including the preset wake-up word indicates that the user needs voice interaction. In order to clearly know the intention of the user, the APP can upload the voice information, that is, the server further processes the voice information. It can be understood that the server may be located in the mobile terminal or in the cloud.

In the embodiment of FIG. 2, before voice interaction with the APP, the APP needs to be started with a wake-up word: the APP is started only when the voice information includes the preset wake-up word. Thus, the APP does not need to stay in a voice interaction state in real time, but can remain in a dormant state to avoid consuming more resources.

In one embodiment of the invention, the voice information of the user comprises at least one of: search voice information, view-details voice information, and settlement voice information.

It can be understood that the technical solution in the embodiments of the present invention can be applied to searching by voice, viewing details by voice, and settlement by voice.

S102, obtaining prompt voice, page detail data and page jump parameters, wherein the prompt voice is obtained according to voice information and/or historical behaviors of a user, the page detail data is obtained according to keywords in the voice information, and the page jump parameters are used for indicating a target display page.

The server, after receiving the user's voice information, may process the voice information. It should be noted that the specific scheme by which the server obtains the prompt voice, the page detail data and the page jump parameter based on the voice information of the user belongs to the prior art.

Illustratively, the server may obtain the prompt voice based on the voice information and/or the historical behavior of the user. As an example, if the voice information involves paying for a mobile phone and the historical behavior of the user includes multiple browsing records of a mobile phone, it can be inferred that the user wants to pay for and purchase the mobile phone in the browsing records. Accordingly, the prompt voice may be: "Please confirm the purchase".

Then, the data processor can perform matching according to the keywords in the voice information to obtain the page detail data. The page detail data includes data of the commodity that the voice information relates to.

As one example, the page detail data includes text data and/or image data. It can be understood that, according to the keywords in the voice information, a database comprising a plurality of commodities is searched, and the data of the commodities that are highly relevant to the keywords in the voice information is used as the page detail data. The page detail data includes not only the textual description of the highly relevant commodities but also their image information, such as a schematic diagram of how the commodity is used and a schematic diagram of the effect of using the commodity. Of course, the image information may also be a video.

As an example, if the voice information includes a search, the page detail data includes the titles and images of a plurality of commodities; it may be determined that the user is browsing the plurality of commodities, and the prompt voice may be: "Please select a commodity".

As another example, if the voice information includes a purchase, the page detail data includes the commodity information of one commodity, and the prompt voice may be: "Please confirm whether to purchase the displayed commodity".

In the embodiment of the present invention, the page jump parameter is a parameter for jumping to the target display page. The page jump parameter may be obtained from the server. After the user intention is determined from the voice information, the server determines the page jump parameter based on the user intention. For example: if the user intention is a search, the page jump parameter includes a search page parameter.
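
Since the patent treats the server-side derivation as prior art, the following Python sketch is only an illustration of the idea: keywords are matched against an in-memory commodity list and a fixed intent-to-parameter table. The function name `build_feedback`, the commodity list and the packet layout are assumptions, not part of the original disclosure.

```python
# Illustrative server-side processing: derive the prompt voice text, the page
# detail data and the page jump parameter from recognized keywords and intent.
# All names and the packet layout are assumptions for illustration only.
COMMODITIES = [
    {"id": 1, "title": "Phone A", "image": "phone_a.jpg", "detail": "..."},
    {"id": 2, "title": "Phone B", "image": "phone_b.jpg", "detail": "..."},
]

INTENT_TO_JUMP_PARAM = {          # user intention -> page jump parameter
    "search": "search_page",
    "view_details": "detail_page",
    "settlement": "settlement_page",
}

def build_feedback(intent: str, keywords: list) -> dict:
    """Build the feedback data packet: prompt voice, page detail data, jump parameter."""
    matched = [c for c in COMMODITIES
               if any(k.lower() in c["title"].lower() for k in keywords)]
    if intent == "search":
        prompt = "Please select a commodity."
        detail = [{"title": c["title"], "image": c["image"]} for c in matched]
    elif intent == "view_details":
        prompt = "Please view the commodity."
        detail = matched[:1]
    else:  # settlement
        prompt = "Please confirm the purchase."
        detail = [{"title": c["title"], "detail": c["detail"]} for c in matched[:1]]
    return {
        "prompt_voice": prompt,
        "page_detail_data": detail,
        "page_jump_param": INTENT_TO_JUMP_PARAM[intent],
    }
```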

After the server completes processing the voice information, it may send a feedback data packet of the voice information. The feedback data packet of the voice information comprises the prompt voice, the page detail data and the page jump parameter.

The APP can receive and parse the feedback data packet of the voice information to obtain the prompt voice, the page detail data and the page jump parameter.
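
The transport between the APP and the server is not specified in the patent; the sketch below assumes a plain HTTP interface using the common requests library. The endpoint URL and the field names of the feedback data packet are hypothetical and chosen to match the illustrative server-side sketch above.

```python
# Client-side sketch: upload the user's voice information and parse the
# feedback data packet. Endpoint URL and field names are hypothetical.
import requests

def fetch_feedback(audio_bytes: bytes, server_url: str = "https://example.com/voice"):
    """Upload recorded voice and return (prompt_voice, page_detail_data, page_jump_param)."""
    response = requests.post(server_url, files={"voice": audio_bytes}, timeout=10)
    response.raise_for_status()
    packet = response.json()          # feedback data packet from the server
    return (packet["prompt_voice"],
            packet["page_detail_data"],
            packet["page_jump_param"])
```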

S103, broadcasting the prompt voice, and jumping from the current display page to the target display page according to the page jump parameter so as to display the page detail data.

After the prompt voice, the page detail data and the page jump parameter are obtained, on one hand the prompt voice can be broadcast, and on the other hand the page detail data can be displayed on the target display page.

The current display page is the page currently displayed on the display screen, and the target display page is a page related to the voice information. As one example, the current display page is a search page, the user's voice information includes view-details voice information, and the target display page is a commodity detail page. The page detail data is displayed in the commodity detail page.

In one embodiment of the invention, after the prompt voice is broadcast or before the prompt voice is broadcast, a jump is made from the current display page to the target display page according to the page jump parameter so as to display the page detail data. In this way, feedback on the voice interaction is given to the user both by the prompt voice and by the displayed page.

Referring to FIG. 3, FIG. 3 is a schematic flow chart of a process of broadcasting a prompt voice and displaying page detail data according to an embodiment of the present invention, which specifically includes:

s301, acquiring page jump parameters.

In the embodiment of the invention, the server can directly send the page jump parameter after determining the page jump parameter.

S302, determining a target display page according to the page jump parameter and a preset page routing table so as to display page detail data on the target display page, wherein the preset page routing table correspondingly stores the page jump parameter and the target display page.

It can be understood that the preset page routing table stores the page jump parameter and the target display page correspondingly. That is, the preset page routing table stores the corresponding relationship between the page jump parameter and the target display page.

The preset page routing table comprises the correspondence between the page jump parameter and the link of the target display page. In the preset page routing table, the link of the target display page is queried based on the page jump parameter. Then, a jump is made to the target display page according to the link, and the page detail data is displayed on the target display page.
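
To make the lookup concrete, here is a minimal Python sketch of a preset page routing table and the broadcast-and-jump step; the page links and the `play_audio`/`navigate_to` callbacks are hypothetical stand-ins for the device's speaker and screen interfaces.

```python
# Minimal sketch of the preset page routing table and the broadcast-and-jump
# step. `play_audio` and `navigate_to` are hypothetical device helpers.
PAGE_ROUTING_TABLE = {
    "search_page": "app://pages/search",            # page jump parameter -> page link
    "detail_page": "app://pages/commodity_detail",
    "settlement_page": "app://pages/settlement",
}

def broadcast_and_jump(prompt_voice, page_detail_data, page_jump_param,
                       play_audio, navigate_to):
    """Broadcast the prompt voice, then jump to the target display page."""
    play_audio(prompt_voice)                          # broadcast the prompt voice
    target_link = PAGE_ROUTING_TABLE[page_jump_param]
    navigate_to(target_link, data=page_detail_data)   # show the page detail data
```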

In the embodiment of FIG. 3, after the user hears the prompt voice, a jump may be made from the current display page to the target display page to display the page detail data. The voice interaction result is thus fed back to the user in two ways, the output information is increased, and the shopping experience of the user is improved.

In the above embodiment, voice information of a user is received; a prompt voice, page detail data and a page jump parameter are acquired based on the voice information of the user, wherein the prompt voice is obtained according to the voice information and/or historical behavior of the user, the page detail data is obtained according to keywords in the voice information, and the page jump parameter is used for indicating a target display page; and the prompt voice is broadcast, and a jump is made from the current display page to the target display page according to the page jump parameter so as to display the page detail data. The prompt voice is broadcast according to the user's voice, the page detail information of related commodities is displayed, and the output information is increased, thereby improving the user's shopping experience.

The technical solution in the embodiments of the present invention is described below by taking the APP as an example with reference to the accompanying drawings.

referring to fig. 4, fig. 4 is a schematic flowchart of a process of waking up an APP according to a wake-up voice instruction in the embodiment of the present invention, which specifically includes:

s401, voice information is input.

The APP is installed on the mobile terminal, and a user inputs voice information to a microphone of the mobile terminal.

S402, judging whether the voice information comprises the preset wake-up word.

The APP receives the voice information in a dormant state and needs to judge whether the voice information includes the preset wake-up word. If the voice information includes the preset wake-up word, the APP switches from the dormant state to the working state; if the voice information does not include the preset wake-up word, the APP remains in the dormant state.

And S403, displaying the APP home page.

The APP switches to the working state, and the APP home page is displayed on the display screen of the mobile terminal.

In the embodiment of fig. 4, the user wakes up the APP by voice information including a preset wake-up word.

Referring to FIG. 5, FIG. 5 is a schematic flow chart of searching according to search voice information according to an embodiment of the present invention, which specifically includes:

s501, voice information is input.

A user inputs voice information to a microphone of the mobile terminal.

And S502, uploading voice information.

The APP uploads the voice information of the user to the server. For example, the voice information includes: "I want to search for a mobile phone".

S503, processing the voice information to obtain prompt voice, page detail data and page jump parameters.

The server processes the voice information, which comprises search voice information, and obtains the prompt voice: "Please select a commodity". The page detail data includes the titles and pictures of a plurality of mobile phones.

S504, obtaining prompt voice, page detail data and page jump parameters.

The APP obtains prompt voice, page detail data and page jump parameters from the server.

And S505, displaying a search page.

The prompt voice is broadcast, and the APP jumps from the home page to the search page according to the page jump parameter to display the titles and pictures of the plurality of mobile phones.

In the embodiment of FIG. 5, in response to the user's voice information including search voice information, the prompt voice is broadcast and the search page is displayed.
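
Tying the earlier sketches together, the search scenario of FIG. 5 could be exercised roughly as follows; this builds on the illustrative `fetch_feedback` and `broadcast_and_jump` sketches above, and `record_voice`, `play_audio` and `navigate_to` remain hypothetical device helpers.

```python
# Illustrative end-to-end search flow corresponding to S501-S505, built on the
# sketches above; record_voice, play_audio and navigate_to are hypothetical.
audio = record_voice()                              # S501: "I want to search for a mobile phone"
prompt, detail, jump_param = fetch_feedback(audio)  # S502-S504: upload voice, get feedback packet
broadcast_and_jump(prompt, detail, jump_param,      # S505: broadcast prompt, jump to the search page
                   play_audio, navigate_to)
```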

Referring to FIG. 6, FIG. 6 is a schematic flow chart of viewing details according to view-details voice information according to an embodiment of the present invention, which specifically includes:

s601, inputting voice information.

A user inputs voice information to a microphone of the mobile terminal.

And S602, uploading voice information.

The APP uploads the voice information of the user to the server. For example, the voice information includes: "The second mobile phone".

And S603, processing the voice information to obtain prompt voice, page detail data and page jump parameters.

The server processes the voice information, which comprises view-details voice information, and obtains the prompt voice: "Please view the commodity". The page detail data includes a detailed introduction and appearance pictures of the second mobile phone.

S604, acquiring prompt voice, page detail data and page jump parameters.

The APP obtains prompt voice, page detail data and page jump parameters from the server.

And S605, displaying a commodity detail page.

The prompt voice is broadcast, and the APP jumps from the search page to the commodity detail page according to the page jump parameter to display the detailed introduction and appearance pictures of the second mobile phone.

In the embodiment of FIG. 6, in response to the user's voice information including view-details voice information, the prompt voice is broadcast and the commodity detail page is displayed.

Referring to FIG. 7, FIG. 7 is a schematic flow chart of settlement according to settlement voice information according to an embodiment of the present invention, which specifically includes:

and S701, inputting voice information.

A user inputs voice information to a microphone of the mobile terminal.

And S702, uploading voice information.

The APP uploads the voice information of the user to the server. For example, the voice information includes: "Purchase the second mobile phone".

And S703, processing the voice information to obtain prompt voice, page detail data and page jump parameters.

The server processes the voice information, which comprises settlement voice information, and obtains the prompt voice: "Please confirm the purchase". The page detail data includes the settlement information of the second mobile phone.

S704, acquiring prompt voice, page detail data and page jump parameters.

The APP obtains prompt voice, page detail data and page jump parameters from the server.

S705, displaying a settlement page.

The prompt voice is broadcast, and the APP jumps from the commodity detail page to the settlement page according to the page jump parameter to display the settlement information of the second mobile phone.

In the embodiment of FIG. 7, in response to the user's voice information including settlement voice information, the prompt voice is broadcast and the settlement page is displayed.

In one embodiment of the present invention, the method of voice interaction has another implementation. The difference from the embodiment in FIG. 1 is that the voice information is directly processed to obtain the prompt voice, keywords of the voice information are extracted to generate an access request, and the access request is sent to the server.

Namely: processing the voice information to obtain prompt voice, extracting keywords of the voice information, generating an access request according to the keywords, and uploading the access request to obtain page detail data and page jump parameters.

Illustratively, the prompt voice may be derived from the keywords in the voice information and/or the historical behavior of the user. Further, after the user intention is determined based on the keywords in the voice information and/or the historical behavior of the user, the server determines the page jump parameter based on the user intention.
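
A rough sketch of this alternative, where the device produces the prompt voice locally and only an access request is uploaded, might look as follows; the intent rules, keyword extraction and request layout are deliberately crude illustrative assumptions, and a real implementation would rely on an on-device speech recognition and intent model.

```python
# Sketch of the alternative embodiment: the prompt voice and keywords are
# derived locally, and only an access request is uploaded to the server.
# Intent rules, keyword extraction and the request layout are assumptions.
PROMPT_RULES = {
    "search": "Please select a commodity.",
    "view_details": "Please view the commodity.",
    "settlement": "Please confirm the purchase.",
}

def process_locally(recognized_text: str) -> tuple:
    """Return (prompt_voice, access_request) derived from the recognized text."""
    text = recognized_text.lower()
    if "purchase" in text or "buy" in text:
        intent = "settlement"
    elif "detail" in text or "view" in text:
        intent = "view_details"
    else:
        intent = "search"
    keywords = [word for word in text.split() if len(word) > 2]  # crude keyword extraction
    access_request = {"intent": intent, "keywords": keywords}
    return PROMPT_RULES[intent], access_request
```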

It should be noted that various schemes in the embodiment of fig. 1 are applicable to this embodiment.

Referring to FIG. 8, FIG. 8 is a schematic diagram of the main structure of an apparatus for voice interaction according to an embodiment of the present invention, which specifically includes: a voice acquisition unit 801, a communication unit 802, a processor 803, a voice broadcasting unit 804 and a screen 805.

The voice acquisition unit 801 is configured to receive the voice information of the user. As one example, the voice acquisition unit 801 may be a microphone or a microphone array.

The communication unit 802 is configured to upload voice information and receive prompt voice, page detail data, and page jump parameters. As one example, the communication unit 802 may be a WiFi communication unit and/or a mobile communication unit. Illustratively, the communication unit 802 may receive a prompt voice, page detail data, and page jump parameters from a server.

The processor 803 is configured to instruct the voice broadcasting unit 804 to broadcast the prompt voice and to instruct the screen 805 to jump from the current display page to the target display page according to the page jump parameter, so as to display the page detail data, wherein the prompt voice is obtained according to the voice information and/or historical behavior of the user, the page detail data is obtained according to keywords in the voice information, and the page jump parameter is used for indicating the target display page. The processor 803 can obtain the prompt voice, the page detail data, and the page jump parameter from the communication unit 802.

And the voice broadcasting unit 804 is used for broadcasting prompt voice. As an example, the voice broadcasting unit 804 may be a speaker.

The screen 805 is used for jumping from the current display page to the target display page to display the page detail data. As an example, the screen 805 may be a separate display, or the screen 805 may be integrated with the voice broadcasting unit 804 in one device.
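
As a rough illustration of how these units cooperate in FIG. 8 (the class and method names below are invented for the sketch and do not appear in the patent), the processor simply routes the feedback packet between the broadcasting unit and the screen.

```python
# Rough sketch of the apparatus of FIG. 8: the processor instructs the
# voice broadcasting unit and the screen. All class/method names are invented.
class VoiceInteractionApparatus:
    def __init__(self, acquisition, communication, broadcaster, screen):
        self.acquisition = acquisition      # voice acquisition unit 801
        self.communication = communication  # communication unit 802
        self.broadcaster = broadcaster      # voice broadcasting unit 804
        self.screen = screen                # screen 805

    def handle_voice(self):
        audio = self.acquisition.record()
        prompt, detail, jump_param = self.communication.fetch_feedback(audio)
        self.broadcaster.play(prompt)               # broadcast the prompt voice
        self.screen.jump_to(jump_param, detail)     # jump and show page detail data
```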

In one embodiment of the invention, the apparatus for voice interaction may be a mobile terminal. Mobile terminals include, but are not limited to: mobile phones, tablet computers, notebook computers, and the like. It can be understood that the microphone in the mobile terminal serves as the voice acquisition unit 801, the communication module in the mobile terminal serves as the communication unit 802, the processor in the mobile terminal serves as the processor 803, the speaker in the mobile terminal serves as the voice broadcasting unit 804, and the display screen in the mobile terminal serves as the screen 805.

In another embodiment of the present invention, the apparatus for voice interaction may be a smart speaker. It can be understood that the microphone in the smart speaker serves as the voice acquisition unit 801, the communication module in the smart speaker serves as the communication unit 802, the processor in the smart speaker serves as the processor 803, the speaker in the smart speaker serves as the voice broadcasting unit 804, and the display screen in the smart speaker serves as the screen 805.

In addition, the display screen of another device may also serve as the screen 805. As an example, if the smart speaker is remotely connected to a mobile terminal, the display screen of the mobile terminal may be used as the screen 805. Therefore, the solution in the embodiments of the present invention can also be implemented by a smart speaker without its own screen together with the display screen of another device.

That is, for the smart speaker, the screen 805 may be the display screen of the smart speaker itself or the display screen of another device connected to the smart speaker; in the latter case, the screen 805 is remotely connected to the processor 803.

In an embodiment of the present invention, the voice information of the user includes at least one of search voice information, view detail voice information, and settlement voice information.

Search voice information is an instruction for searching for information. As one example, when it is recognized that the user intention of the voice information is to search for information, the voice information includes search voice information.

View-details voice information is an instruction for viewing the details of a commodity. As one example, when it is recognized that the user intention of the voice information is to view details, the voice information includes view-details voice information.

The settlement voice information is an instruction for purchasing the commodity. As one example, upon recognizing that the user's intention of the voice information is to purchase a commodity, the voice information includes settlement voice information.

In one embodiment of the present invention, to avoid wasting resources, the apparatus for voice interaction may be in a dormant state when not in use. The processor 803 may then also determine whether the voice information of the user includes a preset wake-up word.

The preset wake-up word is a word preset for starting voice interaction. As an example, the preset wake-up word may be: "hello" or "mood".

For example, whether the voice information includes the preset wake-up word may be determined by a speech recognition model, which may be a neural network model.

The communication unit 802 uploads the voice information when the voice information of the user includes the preset wake-up word.

In one embodiment of the invention, the screen 805 may display page detail data in a variety of ways to meet user needs. As an example, the screen 805 jumps from the current display page to the target display page according to the page jump parameter after the voice broadcast unit 804 broadcasts the prompt voice or before the prompt voice is broadcast, to display the page detail data.

In one embodiment of the present invention, the processor 803 may determine the target display page according to the page jump parameter and a preset page routing table, which stores the page jump parameter and the target display page correspondingly, to instruct the screen 805 to display the page detail data on the target display page.

That is, the page routing table is preset in the processor 803 so that the target display page can be quickly determined. As one example, the target display page may be a search page, a commodity detail page, or a settlement page.

In one embodiment of the present invention, the apparatus for voice interaction has another implementation. The structure of the apparatus for voice interaction is the same as that in FIG. 8.

The difference from the embodiment in FIG. 8 is that the processor 803 may itself process the voice information of the user to obtain the prompt voice, extract keywords from the voice information, and generate an access request according to the keywords. The communication unit 802 sends the access request generated by the processor 803 to the server to acquire the page detail data and the page jump parameter.

That is, the prompt voice is obtained through processing by the processor 803, while the page detail data and the page jump parameter are obtained from the server.

Namely: the voice acquisition unit 801 is configured to receive the voice information of the user.

The communication unit 802 is configured to send the access request and to receive the page detail data and the page jump parameter based on the access request.

The processor 803 is configured to process the voice information to obtain the prompt voice, extract keywords from the voice information, and generate the access request according to the keywords; and to instruct the voice broadcasting unit 804 to broadcast the prompt voice and instruct the screen 805 to jump from the current display page to the target display page according to the page jump parameter so as to display the page detail data,

wherein the prompt voice is obtained according to the voice information and/or the historical behavior of the user, the page detail data is obtained according to the keywords in the voice information, and the page jump parameter is used for indicating the target display page.

It should be noted that various schemes in the embodiment of fig. 8 are applicable to this embodiment.

Fig. 9 shows an exemplary system architecture 900 of a voice interaction method or voice interaction apparatus to which embodiments of the present invention may be applied.

As shown in fig. 9, the system architecture 900 may include end devices 901, 902, 903, a network 904, and a server 905. Network 904 is the medium used to provide communication links between terminal devices 901, 902, 903 and server 905. Network 904 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.

A user may use the terminal devices 901, 902, 903 to interact with the server 905 over the network 904 to receive or send messages and the like. The terminal devices 901, 902, 903 may have various client applications installed thereon, such as (by way of example only) shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients, social platform software, and the like.

The terminal devices 901, 902, 903 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.

The server 905 may be a server providing various services, such as a background management server (for example only) providing support for shopping websites browsed by users using the terminal devices 901, 902, 903. The backend management server may analyze and perform other processing on the received data such as the product information query request, and feed back a processing result (for example, target push information, product information — just an example) to the terminal device.

It should be noted that the method for voice interaction provided by the embodiment of the present invention is generally executed by the server 905, and accordingly, the apparatus for voice interaction is generally disposed in the server 905.

It should be understood that the number of terminal devices, networks, and servers in fig. 9 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

Referring now to FIG. 10, a block diagram of a computer system 1000 suitable for use with a terminal device implementing an embodiment of the invention is shown. The terminal device shown in fig. 10 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.

As shown in fig. 10, the computer system 1000 includes a Central Processing Unit (CPU)1001 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)1002 or a program loaded from a storage section 1008 into a Random Access Memory (RAM) 1003. In the RAM 1003, various programs and data necessary for the operation of the system 1000 are also stored. The CPU 1001, ROM 1002, and RAM 1003 are connected to each other via a bus 1004. An input/output (I/O) interface 1005 is also connected to bus 1004.

The following components are connected to the I/O interface 1005: an input section 1006 including a keyboard, a mouse, and the like; an output section 1007 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage portion 1008 including a hard disk and the like; and a communication section 1009 including a network interface card such as a LAN card, a modem, or the like. The communication section 1009 performs communication processing via a network such as the internet. The driver 1010 is also connected to the I/O interface 1005 as necessary. A removable medium 1011 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 1010 as necessary, so that a computer program read out therefrom is mounted into the storage section 1008 as necessary.

In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication part 1009 and/or installed from the removable medium 1011. The computer program executes the above-described functions defined in the system of the present invention when executed by the Central Processing Unit (CPU) 1001.

It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor includes a transmitting unit, an obtaining unit, a determining unit, and a first processing unit. The names of these units do not in some cases constitute a limitation to the unit itself, and for example, the sending unit may also be described as a "unit sending a picture acquisition request to a connected server".

As another aspect, the present invention also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being incorporated into the apparatus. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to:

receiving voice information of a user;

acquiring a prompt voice, page detail data and a page jump parameter based on the voice information of the user, wherein the prompt voice is obtained according to the voice information and/or historical behavior of the user, the page detail data is obtained according to keywords in the voice information, and the page jump parameter is used for indicating a target display page;

and broadcasting the prompt voice, and jumping from the current display page to the target display page according to the page jump parameter so as to display the page detail data.

According to the technical solution of the embodiments of the present invention, voice information of a user is received; a prompt voice, page detail data and a page jump parameter are acquired based on the voice information of the user, wherein the prompt voice is obtained according to the voice information and/or historical behavior of the user, the page detail data is obtained according to keywords in the voice information, and the page jump parameter is used for indicating a target display page; and the prompt voice is broadcast, and a jump is made from the current display page to the target display page according to the page jump parameter so as to display the page detail data. The prompt voice can be broadcast according to the voice information of the user, the page detail information of related commodities is displayed, the output information is increased, and the shopping experience of the user is improved.

The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
