Computing delayed responses of an assistant

Document No.: 1590945 | Publication date: 2020-01-03

Description: This technology, "Computing delayed responses of an assistant," was created by Yariv Adan, Vladimir Vuskovic, and Behshad Behzadi on 2018-05-14. Abstract: An example method includes: receiving, by a computing assistant executing at one or more processors, a representation of an utterance spoken at a computing device; identifying, based on the utterance, a task to be performed by the computing assistant; in response to determining, by the computing assistant, that full execution of the task will take more than a threshold amount of time, outputting, for playback by one or more speakers operatively connected to the computing device, synthesized speech data that informs a user of the computing device that full execution of the task will not be performed immediately; and performing, by the computing assistant, the task.

1. A method, comprising:

receiving, by a computing assistant executing at one or more processors, a representation of an utterance spoken at a computing device;

identifying a task to be performed by the computing assistant based on the utterance;

in response to determining, by the computing assistant, that full execution of the task will take more than a threshold amount of time, outputting, for playback by one or more speakers operatively connected to the computing device, synthesized speech data that informs a user of the computing device that full execution of the task will not be performed immediately; and

performing, by the computing assistant, the task.

2. The method of claim 1, further comprising:

determining an estimated amount of time for full execution of the task, wherein determining that full execution of the task will take more than the threshold amount of time comprises determining that the estimated amount of time is greater than the threshold amount of time.

3. The method of claim 2, wherein determining the estimated amount of time comprises:

determining the estimated amount of time for full execution of the identified task based on a historical time for full execution of a task of the same type as the identified task.

4. The method of any of claims 1-3, wherein outputting the synthesized speech data that informs the user of the computing device that full execution of the task will not be performed immediately comprises:

outputting, for playback by the one or more speakers operatively connected to the computing device, synthesized speech data that includes the estimated amount of time for full execution of the task.

5. The method of any of claims 1 to 4, further comprising:

determining that full execution of the task involves the computing assistant performing one or more subtasks; and

in response to determining that at least one of the one or more subtasks is marked in a task data store as unsuitable for immediate execution, determining that full execution of the task will take more than the threshold amount of time.

6. The method of claim 5, wherein determining that full execution of the task involves the computing assistant performing one or more subtasks comprises:

determining that full execution of the task involves the computing assistant performing a subtask of interacting with a person other than the user of the computing device; and

determining, based on the task data store, that the subtask of interacting with the person other than the user of the computing device is not suitable for immediate execution.

7. The method of claim 6, wherein interacting with the person other than the user of the computing device comprises:

outputting, by the computing assistant and as part of a conversation with the person other than the user of the computing device, synthesized speech data for playback by one or more speakers operatively connected to a device associated with the person other than the user of the computing device.

8. The method of any of claims 5 to 7, wherein the subtasks marked in the task data store as unsuitable for immediate execution include one or more of:

a subtask in which the computing assistant interacts with a person other than the user of the computing device;

a subtask in which the computing assistant makes a reservation;

a subtask in which the computing assistant purchases tickets;

a subtask requiring a large amount of computation;

a subtask in which the computing assistant interacts with one or more computing systems that are predetermined to be slow; and

a subtask requiring a future event to occur.

9. The method of any of the preceding claims, wherein the utterance is a first utterance, wherein the synthesized speech data that informs the user of the computing device that full execution of the task will not be performed immediately is output at a first time, and wherein the method further comprises:

receiving, by the computing assistant at a second time later than the first time, a representation of a second utterance spoken at the computing device, the second utterance including a request for an execution state of the task; and

outputting, for playback by the one or more speakers operatively connected to the computing device, synthesized speech data that informs the user of the computing device of the execution state of the task.

10. The method of any of the preceding claims, wherein the utterance is a first utterance, the method further comprising:

receiving, by the computing assistant and prior to completion of execution of the task, a representation of a third utterance spoken at the computing device, the third utterance including a request to modify one or more parameters of the task; and

performing, by the computing assistant, the task with the modified one or more parameters.

11. The method of claim 10, wherein the request to modify one or more parameters of the task comprises one or more of:

a request to change a time of the reservation or ticket purchase; and

a request to change the number of people included in a reservation or ticket purchase.

12. The method of any of the preceding claims, wherein outputting the synthesized speech data that informs the user of the computing device that full execution of the task will not be performed immediately comprises:

outputting, for playback by the one or more speakers operatively connected to the computing device, synthesized speech data that indicates a partial or lower-confidence response to the utterance and indicates that the computing assistant will follow with a full or higher-confidence response.

13. The method of claim 12, wherein identifying the task comprises:

identifying a search query based on the utterance, wherein the synthesized speech data that indicates a partial or lower-confidence response to the utterance comprises synthesized speech data that indicates a partial or lower-confidence response to the search query.

14. The method of any of the preceding claims, wherein the computing assistant is a general purpose computing assistant capable of performing tasks other than the identified task.

15. The method of any of the preceding claims, wherein the computing assistant is personalized for the user.

16. The method of any of the preceding claims, further comprising:

displaying, on the computing device and prior to full execution of the task, a visual indicator that the assistant is performing the task.

17. A computing system, comprising:

at least one processor; and

a memory comprising instructions that, when executed, cause the at least one processor to execute a computing assistant configured to:

receive a representation of an utterance spoken at one or more microphones operatively connected to a computing device;

identify, based on the utterance, a task to be performed by the computing assistant;

in response to determining that full execution of the task will take more than a threshold amount of time, output, for playback by one or more speakers operatively connected to the computing device, synthesized speech data that informs a user of the computing device that full execution of the task will not be performed immediately; and

execute the task.

18. The computing system of claim 17, wherein the computing assistant is further configured to:

determine an estimated amount of time for full execution of the task, wherein to determine that full execution of the task will take more than the threshold amount of time, the computing assistant is configured to determine that the estimated amount of time is greater than the threshold amount of time.

19. The computing system of claim 18, wherein to determine the estimated amount of time, the computing assistant is configured to:

determine the estimated amount of time for full execution of the identified task based on a historical time for full execution of a task of the same type as the identified task.

20. The computing system of any of claims 17 to 19, wherein to output the synthesized speech data that informs the user of the computing device that full execution of the task will not be performed immediately, the computing assistant is further configured to:

output, for playback by the one or more speakers, synthesized speech data that includes the estimated amount of time for full execution of the task.

21. The computing system of any of claims 17 to 20, wherein the computing assistant is further configured to:

determine that full execution of the task involves the computing assistant performing one or more subtasks; and

in response to determining that at least one of the one or more subtasks is marked in a task data store as unsuitable for immediate execution, determine that full execution of the task will take more than the threshold amount of time.

22. The computing system of claim 21, wherein to determine that full execution of the task involves the computing assistant performing one or more subtasks, the computing assistant is configured to:

determine that full execution of the task involves the computing assistant performing a subtask of interacting with a person other than the user of the computing device; and

determine, based on the task data store, that the subtask of interacting with the person other than the user of the computing device is not suitable for immediate execution.

23. The computing system of claim 22, wherein to interact with a person other than the user of the computing device, the computing assistant is configured to:

output, as part of a conversation with the person other than the user of the computing device, synthesized speech data for playback by one or more speakers operatively connected to a device associated with the person other than the user of the computing device.

24. The computing system of any of claims 21 to 23, wherein the subtasks marked in the task data store as unsuitable for immediate execution include one or more of:

a subtask in which the computing assistant interacts with a person other than the user of the computing device;

a subtask in which the computing assistant makes a reservation;

a subtask in which the computing assistant purchases tickets;

a subtask requiring a large amount of computation;

a subtask in which the computing assistant interacts with one or more computing systems that are predetermined to be slow; and

a subtask requiring a future event to occur.

25. The computing system of any of claims 17 to 24, wherein the utterance is a first utterance, wherein the synthesized speech data that informs the user of the computing device that full execution of the task will not be performed immediately is output at a first time, and wherein the computing assistant is further configured to:

receive, at a second time later than the first time, a representation of a second utterance spoken at the computing device, the second utterance including a request for an execution state of the task; and

output, for playback by the one or more speakers operatively connected to the computing device, synthesized speech data that informs the user of the computing device of the execution state of the task.

26. A computer-readable storage medium storing instructions that, when executed, cause one or more processors of a computing system to execute a computing assistant configured to:

receive a representation of an utterance spoken at one or more microphones operatively connected to a computing device;

identify, based on the utterance, a task to be performed by the computing assistant;

in response to determining that full execution of the task will take more than a threshold amount of time, output, for playback by one or more speakers operatively connected to the computing device, synthesized speech data that informs a user of the computing device that full execution of the task will not be performed immediately; and

execute the task.

27. The computer-readable storage medium of claim 26, wherein the computing assistant is further configured to:

determine an estimated amount of time for full execution of the task, wherein to determine that full execution of the task will take more than the threshold amount of time, the computing assistant is configured to determine that the estimated amount of time is greater than the threshold amount of time.

28. The computer-readable storage medium of claim 26 or 27, wherein the computing assistant is further configured to:

determine that full execution of the task involves the computing assistant performing one or more subtasks; and

in response to determining that at least one of the one or more subtasks is marked in a task data store as unsuitable for immediate execution, determine that full execution of the task will take more than the threshold amount of time.

29. The computer-readable storage medium of claim 28, wherein to determine that full execution of the task involves the computing assistant performing one or more subtasks, the computing assistant is configured to:

determine that full execution of the task involves the computing assistant performing a subtask of interacting with a person other than the user of the computing device; and

determine, based on the task data store, that the subtask of interacting with the person other than the user of the computing device is not suitable for immediate execution.

30. The computer-readable storage medium of claim 29, wherein to interact with a person other than the user of the computing device, the computing assistant is configured to:

output, as part of a conversation with the person other than the user of the computing device, synthesized speech data for playback by one or more speakers operatively connected to a device associated with the person other than the user of the computing device.

31. A computing system, comprising:

at least one processor; and

a memory comprising instructions that, when executed, cause the at least one processor to execute a computing assistant configured to perform the method of any of claims 1-16.

32. A computer-readable storage medium storing instructions that, when executed, cause one or more processors of a computing system to execute a computing assistant configured to perform the method of any of claims 1-16.

Background

Some computing platforms may provide a user interface from which a user may chat, speak, or otherwise communicate with a virtual computing assistant (e.g., also referred to as a "smart assistant" or simply "assistant") to cause the assistant to output useful information, respond to the user's needs, or otherwise perform certain operations to assist the user in completing various real or virtual tasks. Unfortunately, certain operations performed by such assistants may not be performed immediately, leaving the requesting user in doubt as to whether the assistant is functioning properly or whether an error has occurred.

Disclosure of Invention

In general, techniques of this disclosure may enable a virtual computing assistant (e.g., also referred to as a "smart assistant" or simply "assistant") executing at one or more processors to notify a user that satisfaction of a spoken or verbal request may not be immediate. For example, the computing device may receive acoustic input (e.g., audio data) corresponding to a user utterance via a microphone. Based on the acoustic input, the computing assistant may identify a task to perform (e.g., using speech recognition). If the computing assistant determines that full execution of the task will take longer than a threshold amount of time, the computing device may use one or more speakers to output synthesized speech data that informs the requesting user that full execution of the task will not be immediate. In this way, as opposed to merely performing the task without notifying the user that the response will be delayed, the computing assistant may prevent the user from restating the utterance. By preventing the user from restating the utterance, the computing assistant may avoid performing repeated tasks and avoid having to determine whether a later utterance is a repeat of the original request or a new task to perform, which may improve the functionality of the computing assistant (e.g., by reducing processing requirements and/or power consumption).
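
At a high level, this flow can be sketched in a few lines of Python. Every name here (Task, identify_task, estimate_duration, say) is a hypothetical stand-in for the assistant's speech-recognition, time-estimation, and speech-synthesis components, not an API from this disclosure:

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Illustrative sketch only; the names below are assumptions, not the
# disclosure's actual implementation.

DELAY_THRESHOLD_SECONDS = 2.0  # configurable; the text cites examples from 0.5s to 30s


@dataclass
class Task:
    name: str
    parameters: Dict[str, str]
    run: Callable[[], None]


def say(text: str) -> None:
    """Stand-in for synthesizing speech and playing it through the device speakers."""
    print(f"[assistant speaks] {text}")


def identify_task(utterance: str) -> Task:
    """Stand-in for speech recognition plus intent parsing of the utterance."""
    return Task(
        name="restaurant reservation",
        parameters={"party_size": "4", "time": "7:30pm"},
        run=lambda: None,  # real task execution would happen here
    )


def estimate_duration(task: Task) -> float:
    """Stand-in for the time-estimation strategies discussed later in the text."""
    return 90.0  # e.g., phoning a restaurant takes minutes, not milliseconds


def handle_utterance(utterance: str) -> None:
    task = identify_task(utterance)
    # Warn the user up front so they do not restate the utterance.
    if estimate_duration(task) > DELAY_THRESHOLD_SECONDS:
        say("I'm working on it, but this won't be done immediately.")
    task.run()


handle_utterance("make a reservation for four people at La' French Spot")
```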

Thus, technical limitations related to response time may be mitigated by reducing their impact on user interaction with the device. In particular, repeated utterances may be avoided, which benefits technical aspects such as processing load, power consumption, and network usage. Furthermore, the approach may avoid other user utterances occurring after the task has been recognized; such utterances may contribute nothing useful to the assistant's task selection and may actually introduce ambiguity by using, for example, inconsistent terms. In this way, improvements can be realized in the assistant's interpretation of user requests.

In one example, a method includes:

receiving, by a computing assistant executing at one or more processors, a representation of an utterance spoken at a computing device;

based on the utterance, identifying a task to be performed by the computing assistant;

in response to the computing assistant determining that full execution of the task will take more than a threshold amount of time, outputting, for playback by one or more speakers operatively connected to the computing device, synthesized speech data that informs a user of the computing device that full execution of the task will not be performed immediately; and

performing, by the computing assistant, the task.

In another example, a computing system includes: at least one processor; and a memory comprising instructions that, when executed, cause the at least one processor to execute a computing assistant configured to: receive a representation of an utterance spoken at one or more microphones operatively connected to a computing device; identify, based on the utterance, a task to be performed by the computing assistant; in response to determining that full execution of the task will take more than a threshold amount of time, output, for playback by one or more speakers operatively connected to the computing device, synthesized speech data that informs a user of the computing device that full execution of the task will not be performed immediately; and execute the task.

In another example, a computer-readable storage medium stores instructions that, when executed, cause one or more processors of a computing system to execute a computing assistant configured to: receive a representation of an utterance spoken at one or more microphones operatively connected to a computing device; identify, based on the utterance, a task to be performed by the computing assistant; in response to determining that full execution of the task will take more than a threshold amount of time, output, for playback by one or more speakers operatively connected to the computing device, synthesized speech data that informs a user of the computing device that full execution of the task will not be performed immediately; and execute the task.

The details of one or more examples are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the disclosure will be apparent from the description and drawings, and from the claims.

Drawings

Fig. 1 is a conceptual diagram illustrating an example system executing an example virtual assistant in accordance with one or more aspects of the present disclosure.

Fig. 2 is a block diagram illustrating an example computing device configured to execute an example virtual assistant in accordance with one or more aspects of the present disclosure.

Fig. 3 is a flow diagram illustrating example operations performed by one or more processors executing an example virtual assistant in accordance with one or more aspects of the present disclosure.

Fig. 4 is a block diagram illustrating an example computing device configured to execute an example virtual assistant in accordance with one or more aspects of the present disclosure.

Detailed Description

Fig. 1 is a conceptual diagram illustrating an example system executing an example virtual assistant in accordance with one or more aspects of the present disclosure. System 100 of FIG. 1 includes digital assistant system 160 in communication with search server system 180 and computing device 110 via network 130. Although system 100 is shown as being distributed among digital assistant system 160, search server system 180, and computing device 110, in other examples, features and techniques attributed to system 100 may be performed internally by components local to computing device 110. Similarly, digital assistant system 160 may include certain components and perform various techniques that are otherwise attributed to search server system 180 and/or computing device 110 in the following description.

Network 130 represents any public or private communication network, such as a cellular, Wi-Fi, and/or other type of network, for transmitting data between computing systems, servers, and computing devices. Digital assistant system 160 can exchange data with computing device 110 via network 130 to provide virtual assistant services that are accessible to computing device 110 when computing device 110 is connected to network 130. The digital assistant system 160 may exchange data with the search server system 180 via the network 130 to access search services provided by the search server system 180. Computing device 110 may exchange data with search server system 180 via network 130 to access search services provided by search server system 180.

Network 130 may include one or more network hubs, network switches, network routers, or any other network devices operatively coupled to each other to provide for the exchange of information between systems 160 and 180 and computing device 110. Computing device 110, digital assistant system 160, and search server system 180 may send and receive data across network 130 using any suitable communication technology. Computing device 110, digital assistant system 160, and search server system 180 may each be operatively coupled to network 130 using respective network links. The links coupling computing device 110, digital assistant system 160, and search server system 180 to network 130 may be ethernet or other types of network connections, and such connections may be wireless and/or wired connections.

Digital assistant system 160 and search server system 180 represent any suitable remote computing system, such as one or more desktop computers, laptop computers, mainframes, servers, cloud computing systems, etc., capable of sending and receiving information to and from a network, such as network 130. Digital assistant system 160 hosts (or at least provides access to) a virtual assistant service. Search server system 180 hosts (or at least provides access to) a search service. In some examples, digital assistant system 160 and search server system 180 represent cloud computing systems that provide access to their respective services via the cloud.

Computing device 110 represents one or more separate mobile or non-mobile computing devices. Examples of computing device 110 include a mobile phone, a tablet, a laptop, a desktop, a server, a mainframe, a set-top box, a television, a wearable device (e.g., a computer watch, computer glasses, computer gloves, etc.), a home automation device or system (e.g., a smart thermostat or home assistant device), a Personal Digital Assistant (PDA), a gaming system, a media player, an e-book reader, a mobile television platform, a car navigation or infotainment system, or any other type of mobile, non-mobile, wearable, and non-wearable computing device configured to execute or access a virtual assistant and receive information via a network (e.g., network 130).

Digital assistant system 160 and/or search server system 180 may communicate with computing device 110 via network 130 to enable computing device 110 to access the virtual assistant service provided by digital assistant system 160 and/or to provide computing device 110 with access to the search service provided by search server system 180. In the course of providing the virtual assistant service, digital assistant system 160 may communicate with search server system 180 via network 130 to obtain search results for providing information to a user of the virtual assistant service to complete a task.

In the example of fig. 1, digital assistant system 160 includes a remote assistant module 122B and a user information data store 124B. Remote assistant module 122B may maintain user information data store 124B as part of a virtual assistant service that digital assistant system 160 provides (e.g., to computing device 110) via network 130. Computing device 110 includes User Interface Device (UID) 112, User Interface (UI) module 120, local assistant module 122A, and user information data store 124A. Local assistant module 122A may maintain user information data store 124A as part of a virtual assistant service executing locally at computing device 110. Remote assistant module 122B and local assistant module 122A may be collectively referred to as assistant modules 122A and 122B. Local data store 124A and remote data store 124B may be collectively referred to as data stores 124A and 124B.

Modules 120, 122A, 122B, and 182 may perform the described operations using software, hardware, firmware, or a mixture of hardware, software, and firmware residing and/or executing at one of computing device 110, digital assistant system 160, or search server system 180. Computing device 110, digital assistant system 160, and search server system 180 may execute modules 120, 122A, 122B, and 182 using multiple processors or multiple devices. Computing device 110, digital assistant system 160, and search server system 180 may execute modules 120, 122A, 122B, and 182 as virtual machines executing on the underlying hardware. Modules 120, 122A, 122B, and 182 may execute as one or more services of an operating system or computing platform. Modules 120, 122A, 122B, and 182 may execute as one or more executable programs at the application layer of the computing platform.

UID 112 of computing device 110 may serve as an input and/or output device for computing device 110. UID 112 may be implemented using various technologies. For example, UID 112 may function as an input device using a presence-sensitive input screen, such as a resistive touchscreen, a surface acoustic wave touchscreen, a capacitive touchscreen, a projected capacitive touchscreen, a pressure-sensitive screen, an acoustic pulse recognition touchscreen, or another presence-sensitive display technology.

UID 112 may also function as an input device using microphone technology, infrared sensor technology, or other input device technology for receiving user input. For example, UID 112 may use built-in microphone technology to detect voice input that UI module 120 and/or local assistant module 122A process to complete a task. As another example, UID 112 may include a presence-sensitive display that may receive tactile input from a user of computing device 110. UID 112 may receive indications of tactile input by detecting one or more gestures from a user (e.g., the user touching or pointing to one or more locations of UID 112 with a finger or a stylus).

UID 112 may function as an output (e.g., display) device and present output to a user. UID 112 may function as an output device using any one or more display devices, such as a liquid crystal display (LCD), a dot matrix display, a light emitting diode (LED) display, an organic light-emitting diode (OLED) display, electronic ink, or a similar monochrome or color display capable of outputting visible information to a user of computing device 110. UID 112 may also function as an output device using speaker technology, haptic feedback technology, or other output device technology for outputting information to a user. UID 112 may present a user interface related to a virtual assistant provided by local assistant module 122A and/or remote assistant module 122B. UID 112 may present a user interface related to a computing platform, operating system, applications, and/or other functionality of services (e.g., email, chat, online services, phone, games, etc.) executing at and/or accessible from computing device 110.

UI module 120 may manage user interactions with UID 112 and other components of computing device 110, including interacting with digital assistant system 160 to provide assistant services via UID 112. UI module 120 may cause UID 112 to output a user interface as a user of computing device 110 views output and/or provides input at UID 112. UI module 120 and UID 112 may receive one or more indications of input (e.g., voice input, gesture input, etc.) from the user as the user interacts with the user interface, at different times and when the user and computing device 110 are at different locations. UI module 120 and UID 112 may interpret inputs detected at UID 112 and may relay information about those inputs to, for example, local assistant module 122A and/or one or more other related platforms, operating systems, applications, and/or services executing at computing device 110 to cause computing device 110 to perform functions.

UI module 120 may receive information and instructions from one or more related platforms, operating systems, applications, and/or services executing at computing device 110 and/or one or more remote computing systems (e.g., systems 160 and 180). Additionally, UI module 120 may act as an intermediary between one or more associated platforms, operating systems, applications, and/or services executing at computing device 110 and various output devices of computing device 110 (e.g., speakers, LED indicators, audio or haptic output devices, etc.) to produce output (e.g., graphics, lights, sounds, haptic responses, etc.) with computing device 110.

Search module 182 may perform a search for information determined to be relevant to a search query that search module 182 automatically generates (e.g., based on contextual information associated with computing device 110) or that search module 182 receives from digital assistant system 160 or computing device 110 (e.g., as part of a task that a virtual assistant is completing on behalf of a user of computing device 110). The search module 182 may conduct an internet search based on the search query to identify information (e.g., weather or traffic conditions, news, stock prices, sports scores, user schedules, transportation schedules, retail prices, etc.) relevant to the search query from various information sources (e.g., stored locally or remotely to the search server system 180). After performing the search, the search module 182 may output information returned from the search (e.g., search results) to the digital assistant system 160 or the computing device 110.

Local assistant module 122A of computing device 110 and remote assistant module 122B of digital assistant system 160 may each perform similar functions described herein to automatically execute an assistant configured to perform various tasks for a user. Remote assistant module 122B and user information data store 124B represent a server-side or cloud implementation of the example virtual assistant, while local assistant module 122A and user information data store 124A represent a client-side or local implementation of the example virtual assistant.

Modules 122A and 122B (collectively, "assistant modules 122") may each include respective software agents configured to execute an intelligent personal assistant that can perform tasks or services for an individual, such as a user of computing device 110. Assistant modules 122 may perform these tasks or services based on user input (e.g., detected at UID 112), location awareness (e.g., based on context), and the ability to access other information (e.g., weather or traffic conditions, news, stock prices, sports scores, user schedules, transportation schedules, retail prices, etc.) from a variety of information sources (e.g., stored locally at computing device 110 or digital assistant system 160, or obtained via a search service provided by search server system 180). The assistant provided by assistant modules 122 may be considered a general-purpose assistant in that it is capable of performing a wide variety of tasks. Assistant modules 122 may employ artificial intelligence and/or machine learning techniques to automatically identify and complete one or more tasks on behalf of the user.

The various assistants provided by assistant module 122 may be configured to perform one or more tasks in the course of performing operations to satisfy spoken or verbal requests of a user of computing device 110. For example, an assistant provided by the assistant module 122 may receive, with one or more microphones of computing device 110, acoustic input (e.g., audio data) that corresponds to an utterance of a user of computing device 110 requesting performance of a particular task (e.g., "make a reservation for four people at La' French Spot at 7:30 tomorrow night").

An assistant provided by the assistant module 122 can analyze the audio data to identify a task corresponding to the spoken utterance. For example, the assistant provided by the assistant module 122 may utilize speech recognition to determine that the spoken utterance "make a reservation for four people at La' French Spot at 7:30 tomorrow night" corresponds to a reservation task with the following parameters: party size: 4; date: tomorrow; time: 7:30 PM; location: La' French Spot.

In some examples, completion of the identified task may require completion of one or more subtasks. Some example subtasks include, but are not limited to, interacting with a person other than the user of computing device 110 (e.g., by placing a call using synthesized speech), making a reservation, purchasing tickets, performing a computation, interacting with one or more computing systems, executing a search query, and creating or modifying a calendar event. For example, completing a reservation task may require a first subtask of executing a search query to identify the requested restaurant, and a second subtask of actually making the reservation at the identified restaurant. In some examples, the assistant may perform a subtask of notifying the requesting user that the task has been completed, for example, where completion of a particular task is not otherwise readily apparent to the requesting user (e.g., as opposed to certain home automation operations in which something physically moves or changes in the vicinity of the requesting user). For example, the reservation task may include a third subtask of notifying the requesting user that the reservation has been made.
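
The subtask decomposition described above might be modeled as in the following sketch; the structure and names are illustrative assumptions, not the disclosure's actual design:

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical decomposition of the reservation example into ordered subtasks.


@dataclass
class Subtask:
    description: str
    run: Callable[[], None]
    immediate: bool  # False for subtasks that cannot complete right away


def reservation_subtasks() -> List[Subtask]:
    return [
        Subtask("execute a search query to identify the restaurant",
                lambda: None, immediate=True),
        Subtask("actually make the reservation (call or e-reservation)",
                lambda: None, immediate=False),
        Subtask("notify the requesting user that the reservation was made",
                lambda: None, immediate=True),
    ]


# Any non-immediate subtask implies the whole task is not immediate.
print(any(not s.immediate for s in reservation_subtasks()))  # -> True
```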

The assistant provided by the assistant module 122 may perform the identified task. For example, continuing with the reservation example, the assistant provided by the assistant module 122 may perform the subtask of identifying the requested restaurant by outputting, to search server system 180, a search query for a restaurant called "La' French Spot" that is close to the user's current location (or close to where the assistant predicts the user will be at the time of the reservation). After identifying the restaurant, the assistant provided by the assistant module 122 may perform the subtask of actually making the reservation at the identified restaurant. As one example, if the identified restaurant uses an electronic reservation system accessible by the assistant (e.g., via network 130), the assistant may submit the reservation request electronically via the electronic reservation system. As another example, the assistant provided by the assistant module 122 may place a call to the identified restaurant (e.g., using contact information identified by search server system 180). Once the assistant has completed the subtask of actually making the reservation, the assistant may output an indication to the requesting user that the reservation has been made. For example, the assistant may cause one or more speakers of computing device 110 to output synthesized speech data stating "your reservation for four people at La' French Spot tomorrow at 7:30 PM is confirmed."

In some examples, the assistant provided by the assistant module 122 may not be able to immediately complete execution of the identified task. For example, the assistant provided by the assistant module 122 may not be able to complete execution of the identified task (or of all subtasks of the identified task) within a threshold amount of time (e.g., 500 milliseconds, 1 second, 2 seconds, 5 seconds, 10 seconds, 30 seconds, etc.). In other words, there may be a delay between when the user provides the spoken utterance and when the assistant can complete performance of the task identified based on the spoken utterance. During the delay, the user may be concerned that the assistant provided by the assistant module 122 is not running or did not receive the original request. As such, the user may restate the utterance, which may cause the assistant provided by the assistant module 122 to perform a repeated task and/or to have to determine whether the new utterance is a repeat of the original utterance requiring no further action or a request to perform a new task.

In accordance with one or more techniques of this disclosure, if full execution of the task cannot be performed immediately (e.g., within a configurable threshold amount of time), the assistant provided by the assistant module 122 may output an indication that full execution of the task will not be performed immediately. For example, the assistant provided by the assistant module 122 may output, for playback by one or more speakers operatively connected to computing device 110, synthesized speech data that informs the user that full execution of the task will not be completed immediately. In this way, as opposed to merely performing the task without notifying the user that the response will be delayed, the assistant provided by the assistant module 122 may prevent the user from restating the utterance. By preventing the user from restating the utterance, the assistant provided by the assistant module 122 may avoid performing repeated tasks and avoid determining whether an utterance is a repeat or a new task to perform, which may improve the functionality of the assistant provided by the assistant module 122 (e.g., by reducing processing requirements and/or power consumption).

The assistant provided by assistant module 122 may determine that full execution of the task will not be immediate, where full execution of the task will take longer than a threshold amount of time (e.g., 500 milliseconds, 1 second, 2 seconds, 5 seconds, 10 seconds, 30 seconds, etc.). The assistant may determine whether full performance of the task requires more than a threshold amount of time based on a variety of factors including, but not limited to, an estimated amount of time required to fully perform the identified task, the identified type of task, and the like.

As one example, an assistant provided by the assistant module 122 may determine an estimated amount of time required for full execution of a task. If the estimated amount of time is longer than the threshold amount of time, the assistant provided by the assistant module 122 may determine that full execution of the task will not complete immediately. In some examples, the assistant may determine the estimated amount of time based on a historical time to fully perform tasks of the same type as the identified task. For example, where the identified task is booking tickets from a popular ticket agency website, the assistant may determine the estimated amount of time required to fully perform the identified task based on the time it took the assistant to book tickets from the popular ticket agency website in the past. In some examples, the assistant may determine the estimated amount of time based on additional contextual information. For example, where the identified task is booking tickets from a popular ticket agency website, the assistant may determine the estimated amount of time based on one or more of the release date of the tickets, the popularity of the particular group/act/event, a wait time indicated by the ticket agency, a queue length, etc.
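
A minimal sketch of this estimation strategy, assuming a per-task-type history of execution times and an optional contextual signal such as a vendor-reported wait time (all names and values are hypothetical):

```python
from statistics import median
from typing import Dict, List, Optional

# Start from the median historical duration for the task type, then let
# contextual signals (e.g., a wait time reported by the ticket agency)
# extend the estimate.

HISTORY: Dict[str, List[float]] = {
    "ticket_purchase": [45.0, 120.0, 80.0],  # seconds per past full execution
    "search_query": [0.4, 0.6, 0.5],
}


def estimate_duration(task_type: str, vendor_wait: Optional[float] = None) -> float:
    historical = median(HISTORY.get(task_type, [0.0]))
    if vendor_wait is not None:
        return max(historical, vendor_wait)
    return historical


print(estimate_duration("ticket_purchase", vendor_wait=300.0))  # -> 300.0
```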

As another example, the assistant may determine that full execution of the task will take more than the threshold amount of time in response to determining that the task (or a constituent subtask) is not suitable for immediate execution. For example, one or more tasks may be predetermined to be unsuitable for immediate execution (e.g., because execution of such tasks cannot necessarily be completed immediately). A task data store accessible to the assistant provided by the assistant module 122 may indicate which tasks are not suitable for immediate execution. Some example tasks that may not be suitable for immediate execution include, but are not limited to, interacting with people other than the user of the computing device, making reservations, purchasing tickets, tasks requiring a large amount of computation (e.g., using a large number of machine learning models), interacting with one or more computing systems that are predetermined to be slow, and tasks requiring a future event to occur (e.g., having to wait until tickets actually go on sale, or providing a final score for a sporting event currently in progress). As described above, full execution of a task by the assistant may involve execution of multiple subtasks. As such, if one or more subtasks of a particular task are not suitable for immediate execution, the assistant may determine that full execution of the particular task will take more than the threshold amount of time.
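
The task-data-store check might look like the following sketch, where the set contents mirror the examples listed above and all names are hypothetical:

```python
from typing import List, Set

# Full execution is deemed to exceed the threshold if any subtask is marked
# as unsuitable for immediate execution.

UNSUITABLE_FOR_IMMEDIATE_EXECUTION: Set[str] = {
    "interact_with_third_party",
    "make_reservation",
    "purchase_tickets",
    "heavy_computation",
    "slow_external_system",
    "await_future_event",
}


def will_exceed_threshold(subtask_types: List[str]) -> bool:
    return any(s in UNSUITABLE_FOR_IMMEDIATE_EXECUTION for s in subtask_types)


print(will_exceed_threshold(["search_query", "make_reservation"]))  # -> True
```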

In some examples, the threshold amount of time (e.g., a threshold for determining whether full execution of the task will not be performed immediately) may not be user adjustable. In some examples, the threshold amount of time may be user adjustable. For example, the user may provide input specifying a threshold amount of time. In this way, assistants associated with different users may use different threshold amounts of time in determining whether to alert their respective users that the complete execution of the task will not be immediately performed.

In some examples, the threshold amount of time may be the same for every task or task type. For example, the assistant may use the same threshold when determining whether full execution of a ticket-booking task will be performed immediately as when determining whether full execution of a search query task will be performed immediately. In some examples, the threshold amount of time may depend on the task. For example, the assistant may use a first threshold when determining whether full execution of a complex task (e.g., a ticket-booking task) will be performed immediately, and a second threshold that is shorter than the first threshold when determining whether full execution of a simple task (e.g., a search query task) will be performed immediately. Because a user may expect some delay in executing a more complex task, the assistant may wait longer before outputting a delay notification (e.g., synthesized speech such as "working on it") for the complex task, while using the shorter threshold to trigger the delay notification for simpler tasks that are expected to take less time (e.g., where internet connectivity is poor or a website is down).
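
A task-dependent threshold can be sketched as a simple lookup; the values below are illustrative assumptions:

```python
from typing import Dict

# A complex task tolerates a longer wait before a delay notification than a
# simple one; thresholds here are arbitrary example values.

THRESHOLDS_SECONDS: Dict[str, float] = {
    "ticket_purchase": 10.0,  # users expect some delay for complex tasks
    "search_query": 1.0,      # a shorter threshold triggers the notice sooner
}
DEFAULT_THRESHOLD_SECONDS = 2.0


def threshold_for(task_type: str) -> float:
    return THRESHOLDS_SECONDS.get(task_type, DEFAULT_THRESHOLD_SECONDS)


print(threshold_for("search_query"))  # -> 1.0
```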

The assistant provided by the assistant module 122 may be capable of implementing modifications to the execution of tasks. For example, after outputting synthesized speech data that informs the user of computing device 110 that full execution of a task will not be performed immediately, but before full execution of the task has actually completed, the assistant may utilize one or more microphones of computing device 110 to receive acoustic input (e.g., audio data) corresponding to an utterance by the user of computing device 110 requesting a modification to the task currently being performed (e.g., "change the reservation to five people"). Some example modifications include, but are not limited to, changing the time and/or date of a reservation or ticket purchase, and changing the number of people included in the reservation or ticket purchase.

The assistant can then modify the execution of the task based on the utterance. For example, if the assistant is currently on a call with the restaurant to make the reservation, the assistant may output, for playback by one or more speakers operatively connected to a device associated with the restaurant, synthesized speech data as part of the conversation with an employee of the restaurant, requesting a reservation for five people (as opposed to the original four).
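
A sketch of such in-flight parameter modification, with hypothetical names and no real telephony or reservation logic:

```python
from dataclasses import dataclass
from typing import Dict

# Parameters of a pending task can be updated until full execution completes.


@dataclass
class PendingTask:
    task_type: str
    parameters: Dict[str, str]
    done: bool = False

    def modify(self, **updates: str) -> None:
        if self.done:
            raise ValueError("task already fully executed; cannot modify")
        self.parameters.update(updates)


reservation = PendingTask("restaurant_reservation",
                          {"party_size": "4", "time": "7:30pm"})
reservation.modify(party_size="5")  # applied while the task is in flight
print(reservation.parameters)       # {'party_size': '5', 'time': '7:30pm'}
```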

The assistant module 122 may provide an assistant with the ability to report the status of a task currently being performed. As one example, after outputting synthesized speech data that informs the user of computing device 110 that full execution of the task will not be performed immediately, but before full execution of the task has actually completed, the assistant may receive, with one or more microphones of computing device 110, acoustic input (e.g., audio data) corresponding to an utterance by the user of computing device 110 requesting the execution status of the task currently being performed (e.g., "have the tickets been booked?"). In response to receiving the status request, the assistant may output, for playback by one or more speakers operatively connected to computing device 110, synthesized speech data that informs the user of the status of performing the task.
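
Answering such a status request might be sketched as follows, assuming a small set of hypothetical task states:

```python
# States and phrasing are illustrative assumptions, not the disclosure's design.

TASK_STATES = ("identified", "in_progress", "completed", "cancelled")


def status_reply(task_type: str, state: str) -> str:
    if state not in TASK_STATES:
        raise ValueError(f"unknown state: {state}")
    if state == "completed":
        return f"Yes, the {task_type} is done."
    return f"Not yet; the {task_type} is still {state.replace('_', ' ')}."


print(status_reply("ticket purchase", "in_progress"))
```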

The assistant provided by the assistant module 122 may enable the user to cancel or exit a task currently being performed. As one example, if the user determines, after checking the status of a task currently being performed, that the task will not complete quickly enough, the user may verbally or otherwise provide input at computing device 110 to cause the assistant provided by the assistant module 122 to cancel or exit the task. As another example, if the assistant provided by the assistant module 122 determines that full execution of a task currently being performed will take too long (e.g., if full execution will not occur until the task is no longer relevant), the assistant may output synthesized speech data asking the user whether they wish the assistant to continue performing the task or to cancel or exit the task. As another example, if, before executing a task, the assistant provided by the assistant module 122 determines that the estimated amount of time for full execution of the task will be too long (e.g., longer than a threshold), the assistant may output synthesized speech data that indicates the estimated amount of time and asks the user whether they wish the assistant to perform or cancel the task.

In the course of performing operations to support a conversation with a user of computing device 110, the respective assistants provided by remote assistant module 122B and local assistant module 122A may automatically create, generate, or otherwise maintain personal records of information obtained during the conversation, storing the personal records as user-specific values, in a structured and semantic manner, in user information data store 124B and user information data store 124A, respectively. Data stores 124B and 124A may enable the respective assistants executed by remote assistant module 122B and local assistant module 122A to quickly access the personal information (e.g., the user-specific values) to complete real tasks, virtual tasks, or otherwise respond to immediate or future needs of the user of computing device 110. For ease of description, the techniques of this disclosure are described primarily from the perspective of local assistant module 122A.

Assistant modules 122A and 122B may maintain user information data stores 124A and 124B as part of a virtual assistant service that assistant modules 122A and 122B provide, together or separately, to computing device 110. In the course of performing operations to support a conversation with a user of computing device 110, the assistant provided by assistant modules 122 may maintain personal records of information automatically culled from the conversation and store the personal records, in a structured and semantic manner, in user information data stores 124A and 124B. Data stores 124A and 124B may enable the assistants executed by assistant modules 122A and 122B to quickly access the personal information to complete real-world tasks, virtual tasks, or otherwise respond to immediate and/or future needs of the user of computing device 110.

Assistant modules 122A and 122B may retain personal records associated with the user of computing device 110 only after first receiving explicit permission from the user to do so. Thus, the user may have complete control over how the assistant collects and uses information about the user, including permission settings and usage history. For example, before retaining information associated with the user of computing device 110, assistant modules 122A and 122B may cause UI module 120 to present, via UID 112, a user interface requesting that the user select a box, click a button, state a voice input, or otherwise provide a particular input to the user interface that is interpreted by assistant modules 122A and 122B as explicit, affirmative consent for assistant modules 122A and 122B to collect and utilize the user's personal information.

Assistant modules 122A and 122B may encrypt or otherwise treat the information being maintained as personal records so as to remove the actual identity of the user before the personal information is stored in data stores 124A and 124B. For example, the information may be processed by assistant modules 122A and 122B such that any personally identifiable information is removed from the personal records of the user when stored in data stores 124A and 124B.

Assistant modules 122A and 122B may cause UI module 120 to present, via UID 112, a user interface from which a user of computing device 110 may modify or delete information in the personal records stored in data stores 124A and 124B. For example, the user interface may provide an area in which the user of computing device 110 may provide input to communicate commands to assistant modules 122A and 122B to modify or remove particular portions of the personal information. In this manner, a user of computing device 110 may have complete control over the information retained by assistant modules 122A and 122B in data stores 124A and 124B.

Each entry in the personal records stored in data stores 124A and 124B may be associated with a predefined schema that can be quickly traversed or parsed by assistant modules 122A and 122B to find the information that assistant modules 122A and 122B currently need to understand the user's needs and to assist the user in completing a task. Once personal information has been recorded as one or more values specific to the user, assistant modules 122A and 122B can quickly use the information stored in data stores 124A and 124B to complete a task. If there is no ongoing task, assistant modules 122A and 122B may provide the user with examples of how the assistant may use this information in the future to assist the user. The user may then provide input at UID 112 to instruct assistant modules 122A and 122B to forget or modify the information.

The values stored by data stores 124A and 124B may be textual values (e.g., person name, place name, other textual descriptors of the entity), numerical values (e.g., age, height, weight, other physiological data, other numerical information associated with the entity), or pointers to user-specific values (e.g., to locations in memory of the entity in the user's knowledge graph, to locations in memory of contacts in an address book, etc.). In other words, the user-specific values may take a variety of forms, and are specific to fields of the personal record defined by the recording schema. The value may indicate the actual information that is user-specific or may be a reference to a location from which user-specific information may be retrieved.
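
These value forms might be modeled as in the following sketch; the field and type names are assumptions for illustration:

```python
from dataclasses import dataclass
from typing import Union

# A value is textual, numerical, or a pointer to a user-specific location
# (e.g., a knowledge-graph entity or an address-book contact).


@dataclass
class Pointer:
    store: str  # e.g., "knowledge_graph" or "address_book"
    key: str    # location from which the user-specific value is retrieved


Value = Union[str, float, Pointer]


@dataclass
class PersonalRecordField:
    field_name: str  # defined by the record's predefined schema
    value: Value


favorite = PersonalRecordField("favorite_restaurant",
                               Pointer("knowledge_graph", "entity/12345"))
print(favorite)
```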

By accessing data stores 124A and 124B, the assistant provided by assistant module 122 can be considered personalized to the user. For example, an assistant provided by the assistant module 122 may be able to perform tasks using information specific to the requesting user, which is not typically available to other users.

Fig. 2 is a block diagram illustrating an example computing device configured to execute an example virtual assistant in accordance with one or more aspects of the present disclosure. Computing device 210 of FIG. 2 is described below as an example of computing device 110 of FIG. 1. FIG. 2 illustrates only one particular example of computing device 210; many other examples of computing device 210 may be used in other instances and may include a subset of the components included in example computing device 210 or additional components not shown in FIG. 2.

As shown in the example of fig. 2, computing device 210 includes a User Interface Device (UID) 212, one or more processors 240, one or more communication units 242, one or more input components 244, one or more output components 246, and one or more storage components 248. UID 212 includes display component 202, presence-sensitive input component 204, microphone component 206, and speaker component 208. Storage components 248 of computing device 210 include UI module 220, assistant module 222, search module 282, one or more application modules 226, context module 230, user information data store 224, user identification module 232, action identification module 234, and authorization module 236.

Communication channel 250 may interconnect each of components 212, 240, 242, 244, 246, and 248 for inter-component communication (physically, communicatively, and/or operatively). In some examples, communication channel 250 may include a system bus, a network connection, an interprocess communication data structure, or any other method for communicating data.

One or more communication units 242 of computing device 210 may communicate with external devices (e.g., digital assistant system 160 and/or search server system 180 of system 100 of fig. 1) via one or more wired and/or wireless networks by transmitting and/or receiving network signals over one or more networks (e.g., network 130 of system 100 of fig. 1). Examples of communication unit 242 include a network interface card (e.g., such as an ethernet card), an optical transceiver, a radio frequency transceiver, a GPS receiver, or any other type of device that can send and/or receive information. Other examples of communication unit 242 may include short wave radios, cellular data radios, wireless network radios, and Universal Serial Bus (USB) controllers.

One or more input components 244 of computing device 210 may receive input. Examples of input are tactile, audio, and video input. In one example, input components 244 of computing device 210 include a presence-sensitive input device (e.g., touch screen, PSD), mouse, keyboard, voice response system, camera, microphone, or any other type of device for detecting input from a human or machine. In some examples, input components 244 may include one or more sensor components: one or more location sensors (GPS components, Wi-Fi components, cellular components), one or more temperature sensors, one or more motion sensors (e.g., accelerometers, gyroscopes), one or more pressure sensors (e.g., barometers), one or more ambient light sensors, and one or more other sensors (e.g., infrared proximity sensors, hygrometer sensors, etc.). Other sensors may include heart rate sensors, magnetometers, glucose sensors, olfactory sensors, compass sensors, and step counter sensors, to name a few other non-limiting examples.

One or more output components 246 of computing device 210 may generate output. Examples of output are tactile, audio, and video output. In one example, output components 246 of computing device 210 include a presence-sensitive display, a sound card, a video graphics adapter card, a speaker, a Cathode Ray Tube (CRT) monitor, a Liquid Crystal Display (LCD), or any other type of device for producing output to a human or machine.

UID 212 of computing device 210 may be similar to UID 112 of computing device 110 and include display component 202, presence-sensitive input component 204, microphone component 206, and speaker component 208. Display component 202 may be a screen where information is displayed by UID 212, while presence-sensitive input component 204 may detect objects at and/or near display component 202. Speaker component 208 may be a speaker from which UID 212 plays back audible information, while microphone component 206 may detect audible input provided at and/or near display component 202 and/or speaker component 208.

While shown as an internal component of computing device 210, UID 212 may also represent an external component that shares a data path with computing device 210 to send and/or receive input and output. For example, in one example, UID 212 represents a built-in component of computing device 210 that is located within an external packaging of computing device 210 and is physically connected to the external packaging of computing device 210 (e.g., a screen on a mobile phone). In another example, UID 212 represents an external component of computing device 210 (e.g., a monitor, projector, etc., which shares a wired and/or wireless data path with computing device 210) that is located outside of a packaging or housing of computing device 210 and is physically separate from the packaging or housing of computing device 210.

As one example range, presence-sensitive input component 204 may detect an object, such as a finger or stylus, within two inches or less of display component 202. Presence-sensitive input component 204 may determine a location (e.g., [x, y] coordinates) of display component 202 at which an object is detected. In another example range, presence-sensitive input component 204 may detect objects six inches or less from display component 202, although other ranges are possible. Presence-sensitive input component 204 may use capacitive, inductive, and/or optical recognition techniques to determine the location of display component 202 selected by the user's finger. In some examples, presence-sensitive input component 204 also provides output to the user using tactile, audio, or video stimuli, as described with respect to display component 202. In the example of fig. 2, UID 212 may present a user interface (such as a graphical user interface).

Speaker component 208 may include speakers built into a housing of computing device 210, and in some examples, may be speakers built into a set of wired or wireless headphones operatively coupled to computing device 210. Microphone component 206 may detect audible input occurring at or near UID 212. The microphone component 206 may perform various noise cancellation techniques to remove background noise and isolate user speech from the detected audio signal.

UID 212 of computing device 210 may detect a two-dimensional and/or three-dimensional gesture as input from a user of computing device 210. For example, a sensor of UID 212 may detect a motion of the user (e.g., moving a hand, arm, pen, stylus, etc.) within a threshold distance of the sensor of UID 212. UID 212 may determine a two-dimensional or three-dimensional vector representation of the motion and associate the vector representation with a gesture input having multiple dimensions (e.g., a hand wave, a pinch, a clap, a brush stroke, etc.). In other words, UID 212 may detect multi-dimensional gestures without requiring the user to make gestures at or near the screen or surface on which UID 212 outputs information for display. Rather, UID 212 may detect multi-dimensional gestures performed at or near a sensor, which may or may not be located near a screen or surface on which UID 212 outputs information for display.
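
One way such a vector-to-gesture association could be sketched is shown below; the thresholds and gesture labels are illustrative assumptions, not taken from this disclosure:

```python
import math

def classify_gesture(vector):
    """Associate a 3-D motion vector (dx, dy, dz) with a coarse gesture label."""
    dx, dy, dz = vector
    magnitude = math.sqrt(dx * dx + dy * dy + dz * dz)
    if magnitude < 0.1:  # below this threshold, treat the motion as noise
        return None
    if abs(dz) > max(abs(dx), abs(dy)):
        # Dominant motion is toward or away from the sensor.
        return "push" if dz < 0 else "pull"
    return "swipe-horizontal" if abs(dx) > abs(dy) else "swipe-vertical"

print(classify_gesture((0.8, 0.1, 0.05)))  # prints: swipe-horizontal
```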

The one or more processors 240 may implement functionality and/or execute instructions associated with computing device 210. Examples of processors 240 include an application processor, a display controller, an auxiliary processor, one or more sensor hubs, and any other hardware configured to function as a processor, a processing unit, or a processing device. Modules 220, 222, 226, 230, and 282 may be operable by processors 240 to perform various actions, operations, or functions of computing device 210. For example, processors 240 of computing device 210 may retrieve and execute instructions stored by storage components 248 that cause processors 240 to perform the operations of modules 220, 222, 226, 230, and 282. The instructions, when executed by processors 240, may cause computing device 210 to store information within storage components 248.

One or more storage components 248 within computing device 210 may store information for processing during operation of computing device 210 (e.g., computing device 210 may store data accessed by modules 220, 222, 226, 230, and 282 during execution at computing device 210). In some examples, storage component 248 is a temporary memory, meaning that the primary purpose of storage component 248 is not long-term storage. The storage component 248 on the computing device 210 may be configured to store information as volatile memory for short periods of time, and thus not retain stored content if power is removed. Examples of volatile memory include Random Access Memory (RAM), Dynamic Random Access Memory (DRAM), Static Random Access Memory (SRAM), and other forms of volatile memory known in the art.

In some examples, storage component 248 also includes one or more computer-readable storage media. In some examples, storage component 248 includes one or more non-transitory computer-readable storage media. Storage component 248 may be configured to store a greater amount of information than is typically stored by volatile memory. Storage component 248 may be further configured to store information as non-volatile storage space for long periods of time, and to retain information after power on/off cycles. Examples of non-volatile memory include magnetic hard disks, optical disks, floppy disks, flash memory, or forms of electrically programmable memory (EPROM) or electrically erasable and programmable memory (EEPROM). Storage component 248 may store program instructions and/or information (e.g., data) associated with modules 220, 222, 226, 230, and 282 and data store 224, and may include memory configured to store data or other information associated with those modules and data store 224.

UI module 220 may include all of the functionality of UI module 120 of computing device 110 of fig. 1 and may perform operations similar to those of UI module 120 for managing a user interface that computing device 210 provides, e.g., at UID 212, to facilitate interaction between a user of computing device 210 and assistant module 222. For example, UI module 220 of computing device 210 may receive information from assistant module 222 that includes instructions for outputting (e.g., displaying or playing back audio) an assistant user interface. UI module 220 may receive the information from assistant module 222 over communication channel 250 and use the data to generate a user interface. UI module 220 may send display or audible output commands and associated data over communication channel 250 to cause UID 212 to present the user interface at UID 212.

In some examples, UI module 220 may receive an indication of one or more user inputs detected at UID 212 and may output information about the user inputs to assistant module 222. For example, UID 212 may detect a voice input from a user and send data regarding the voice input to UI module 220. UI module 220 may send an indication of the voice input to assistant module 222 for further interpretation. Assistant module 222 may determine, based on the voice input, that the detected voice input represents a user request for assistant module 222 to perform one or more tasks.

Application module 226 represents all of the various individual applications and services executing on and accessible from computing device 210, which may be accessed by an assistant, such as assistant module 222, to provide information to a user and/or to perform tasks. A user of computing device 210 may interact with a user interface associated with one or more application modules 226 to cause computing device 210 to perform functions. Many examples of application modules 226 may exist and include a fitness application, a calendar application, a search application, a mapping or navigation application, a transportation service application (e.g., a bus or train tracking application), a social media application, a gaming application, an email application, a chat or messaging application, an internet browser application, or any other application that may execute on computing device 210.

Search module 282 of computing device 210 may perform integrated search functions on behalf of computing device 210. Search module 282 may be invoked by UI module 220, one or more application modules 226, and/or assistant module 222 to perform search operations on their behalf. When invoked, search module 282 may perform search functions, such as generating search queries and executing searches based on the generated search queries across various local and remote information sources. Search module 282 may provide the results of an executed search to the invoking component or module. That is, in response to an invoking command, search module 282 may output search results to UI module 220, assistant module 222, and/or application modules 226.
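
A minimal sketch of this invocation pattern, assuming each information source exposes a search() method (the ListSource class and sample data below are illustrative stand-ins, not part of this disclosure):

```python
class ListSource:
    """A trivial information source backed by an in-memory list (illustrative)."""
    def __init__(self, items):
        self.items = items

    def search(self, query):
        return [item for item in self.items if query.lower() in item.lower()]

def perform_search(query, sources):
    """Run the query against each local or remote source and merge the results."""
    results = []
    for source in sources:
        results.extend(source.search(query))
    return results

# Example usage with one stand-in "local" and one stand-in "remote" source.
local = ListSource(["calendar: anniversary on June 9"])
remote = ListSource(["Les Joyeux showtimes: 7:00pm, 9:30pm"])
print(perform_search("les joyeux", [local, remote]))
```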

Context module 230 may collect context information associated with computing device 210 to define a context of computing device 210. In particular, context module 230 is primarily used by assistant module 222 to define a context for computing device 210 that specifies characteristics of computing device 210 and the physical and/or virtual environment of the user of computing device 210 at a particular time.

As used throughout this disclosure, the term "context information" is used to describe any information that context module 230 may use to define virtual and/or physical environmental features that a computing device and a user of the computing device may experience at a particular time. Examples of contextual information are numerous and may include: sensor information obtained by sensors of computing device 210 (e.g., location sensors, accelerometers, gyroscopes, barometers, ambient light sensors, proximity sensors, microphones, and any other sensors), communication information sent and received by a communication module of computing device 210 (e.g., text-based communications, audio communications, video communications, etc.), and application usage information associated with applications executing at computing device 210 (e.g., application data related to the applications, internet search history, text communications, voice and video communications, calendar information, social media posts and related information, etc.). Other examples of contextual information include signals and information obtained from a sending device external to computing device 210. For example, context module 230 may receive, via a radio or communication unit of computing device 210, beacon information transmitted from an external beacon located at or near the physical location of the merchant.
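
A context snapshot of this kind might be modeled as a simple record; every field name below is an assumption chosen to mirror the categories listed above:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class DeviceContext:
    """Snapshot of contextual information for a computing device at one time."""
    timestamp: float                                # when the snapshot was taken
    location: Optional[Tuple[float, float]] = None  # (lat, lon) from a location sensor
    ambient_light: Optional[float] = None           # lux, from an ambient light sensor
    recent_apps: List[str] = field(default_factory=list)     # application usage information
    nearby_beacons: List[str] = field(default_factory=list)  # beacon IDs heard over radio
```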

The assistant module 222 can include all of the functionality of the local assistant module 122A of the computing device 110 of fig. 1 and can perform operations similar to those of the local assistant module 122A for providing an assistant. In some examples, the assistant module 222 may execute locally (e.g., at the processor 240) to provide assistant functionality. In some examples, assistant module 222 may act as an interface to a remote assistant service accessible to computing device 210. For example, assistant module 222 may be an interface or application programming interface (API) of remote assistant module 122B of digital assistant system 160 of fig. 1. Assistant module 222 may rely on information stored in data store 224 to perform assistant tasks in addition to any information provided by context module 230 and/or search module 282.

The assistant provided by assistant module 222 may be configured to perform one or more tasks in performing operations to satisfy spoken or verbal requests by a user of computing device 210. For example, an assistant provided by assistant module 222 may receive acoustic input (e.g., audio data) corresponding to an utterance of a user of computing device 210 requesting to perform a particular task using one or more microphones of computing device 210. The assistant provided by the assistant module 222 can analyze the audio data to identify tasks corresponding to the spoken utterance.

In accordance with one or more techniques of this disclosure, if full execution of a task cannot be performed immediately (e.g., within a configurable threshold amount of time), the assistant provided by the assistant module 222 may output an indication that full execution of the task will not be performed immediately. For example, the assistant provided by assistant module 222 may output, for playback by one or more speakers operatively connected to computing device 210, synthesized speech data that informs the user that full execution of the task will not be performed immediately. In this way, as opposed to merely performing the task without notifying the user that the response will be delayed, the assistant provided by the assistant module 222 can prevent the user from restating the utterance. By preventing the user from restating the utterance, the assistant provided by the assistant module 222 can avoid performing duplicate tasks and avoid having to determine whether a subsequent utterance is a repetition or a new task to be performed, which can improve the functionality of the assistant (e.g., by reducing processing requirements and/or power consumption).
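
The notify-then-execute pattern described above might be sketched as follows; the threshold value, phrasing, and callable names are assumptions, with speak standing in for synthesized-speech playback:

```python
THRESHOLD_SECONDS = 5.0  # configurable threshold; this value is an assumption

def handle_task(task_name, estimated_seconds, speak, run_task):
    """Notify the user up front when full execution will not be immediate."""
    delayed = estimated_seconds > THRESHOLD_SECONDS
    if delayed:
        # Tell the user now, so they do not restate the utterance while waiting.
        speak(f"I'm working on {task_name} and will let you know when it's done.")
    result = run_task()
    if delayed:
        speak(f"Finished {task_name}.")  # follow up once execution completes
    return result

# Example usage with stand-in callables.
handle_task("booking the tickets", 90.0, speak=print, run_task=lambda: "booked")
```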

Fig. 3 is a flow diagram illustrating example operations performed by one or more processors executing an example virtual assistant in accordance with one or more aspects of the present disclosure. Fig. 3 is described below in the context of the system 100 of fig. 1. For example, in accordance with one or more aspects of the present disclosure, the local assistant module 122A, when executed at one or more processors of the computing device 110, may perform one or more of operations 302-312. And in some examples, the remote assistant module 122B, when executed at one or more processors of the digital assistant system 160, may perform operations 302-312 in accordance with one or more aspects of the present disclosure. For purposes of illustration only, fig. 3 is described below in the context of computing device 110 of fig. 1.

In operation, the computing device 110 may receive audio data generated by one or more microphones of the computing device 110, the audio data representing a spoken utterance (302). For example, in response to recognizing a spoken trigger phrase, the computing device 110 may receive audio data representing a spoken utterance provided by the user of computing device 110.

The computing device 110 may identify a task to perform based on the audio data (304). For example, if the utterance is the user saying "book tickets for me and my wife to the later performance of Les Joyeux on our one-year anniversary," the computing device 110 may identify a reservation task with the following subtasks: identifying the anniversary, identifying the times of future performances of Les Joyeux on the identified anniversary, booking the tickets, and confirming the booking to the user.
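
The identified task might be decomposed into an ordered list of subtasks along these lines (the dictionary layout and subtask names are illustrative assumptions):

```python
# Illustrative decomposition of the reservation task identified from the utterance.
booking_task = {
    "type": "reservation",
    "subtasks": [
        "identify_anniversary",        # consult the user-information data store
        "find_performance_times",      # search for Les Joyeux on that date
        "book_tickets",                # may involve talking to a theater employee
        "confirm_booking_with_user",
    ],
}
```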

The computing device 110 may determine whether full execution of the task will take more than a threshold amount of time (306). As one example, the computing device 110 may determine an estimated amount of time for full execution of the task (e.g., full execution of all subtasks). If the estimated amount of time is greater than the threshold amount of time, the computing device may determine that execution of the task will take more than the threshold amount of time. As another example, the computing device 110 may determine whether the task or any subtask is not suitable for immediate execution. In the Les Joyeux ticket-booking example above, the computing device 110 may determine that the subtask of actually booking the tickets involves the computing assistant interacting with an employee of the theater, and that tasks or subtasks involving interaction with people other than the user of the computing device 110 are not suitable for immediate execution.
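
Both checks could be combined in a single test; the hard-coded set below is an assumed stand-in for a task data store that marks subtasks as unsuitable for immediate execution:

```python
# Stand-in for a task data store: subtasks marked unsuitable for immediate
# execution, e.g., because they involve a person other than the user.
UNSUITABLE_FOR_IMMEDIATE = {"book_tickets"}

def exceeds_threshold(task, estimated_seconds, threshold_seconds=5.0):
    """Treat execution as delayed if the time estimate is too high or if any
    subtask is marked as unsuitable for immediate execution."""
    if estimated_seconds > threshold_seconds:
        return True
    return any(s in UNSUITABLE_FOR_IMMEDIATE for s in task["subtasks"])

# With the booking_task sketch above, the subtask check alone triggers a delay:
# exceeds_threshold(booking_task, estimated_seconds=2.0) -> True
```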

In response to determining that full execution of the task will take more than a threshold amount of time ("Yes" branch of 308), the computing device 110 may output, for playback by one or more speakers operatively connected to the computing device 110, synthesized speech data that informs the user that full execution of the task will not be performed immediately. For example, the synthesized speech data may indicate "I am working on booking the tickets and will let you know when the booking is complete."

In some examples, the synthesized speech data that informs the user that full execution of the task will not be performed immediately may be synthesized speech data that indicates a partial or lower-confidence response to the utterance and that the computing assistant will follow up in the future with a full or higher-confidence response. For example, where the task is a search query, the synthesized speech data may indicate a partial or low-confidence response to the search query.

Computing device 110 may perform the task (310). For example, computing device 110 may consult user information data store 124A to determine that the user's anniversary is June 9, output a request to search server system 180 to determine when Les Joyeux is playing on the determined anniversary, reserve tickets for the later performance on the determined anniversary, and confirm the reservation with the user.
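
The subtask sequence for this example might read as follows; all four injected callables (lookup, search, reserve, confirm) are hypothetical stand-ins for the data store, search system, booking interaction, and user notification:

```python
def perform_booking(lookup, search, reserve, confirm):
    """Sketch of the subtask sequence for the Les Joyeux example."""
    anniversary = lookup("anniversary")                    # e.g., "June 9"
    showtimes = search(f"Les Joyeux showtimes on {anniversary}")
    ticket = reserve(showtimes[-1], party_size=2)          # the later performance
    confirm(ticket)                                        # report back to the user
    return ticket
```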

Fig. 4 is a block diagram illustrating an example computing system configured to execute an example virtual assistant in accordance with one or more aspects of the present disclosure. The assistant server system 460 of fig. 4 is described below as an example of the digital assistant system 160 of fig. 1. Fig. 4 shows only one particular example of the assistant server system 460, and many other examples of the assistant server system 460 may be used in other instances and may include a subset of the components included in the example assistant server system 460, or may include additional components not shown in fig. 4.

As shown in the example of fig. 4, the assistant server system 460 includes one or more processors 440, one or more communication units 442, and one or more storage components 448. Storage component 448 includes assistant module 422, search module 482, context module 430, and user information data store 424.

The processors 440 are similar to the processors 240 of the computing device 210 of fig. 2. The communication units 442 are similar to the communication units 242 of the computing device 210 of fig. 2. Storage components 448 are similar to storage components 248 of the computing device 210 of fig. 2. The communication channel 450 is similar to the communication channel 250 of the computing device 210 of fig. 2, and thus each of the components 440, 442, and 448 may be interconnected for inter-component communication. In some examples, communication channel 450 may include a system bus, a network connection, an interprocess communication data structure, or any other method for communicating data.

Search module 482 of assistant server system 460 is similar to search module 282 of computing device 210 and may perform integrated search functions on behalf of assistant server system 460. That is, search module 482 may perform a search operation on behalf of assistant module 422. In some examples, search module 482 may interface with an external search system, such as search server system 180, to perform search operations on behalf of assistant module 422. When invoked, the search module 482 may perform search functions, such as generating search queries and performing searches in accordance with search queries generated across various local and remote information sources. The search module 482 may provide results of the performed search to the calling component or module. That is, search module 482 may output the search results to assistant module 422.

The context module 430 of the assistant server system 460 is similar to the context module 230 of the computing device 210. The context module 430 may collect context information associated with computing devices, such as the computing device 110 of fig. 1 and the computing device 210 of fig. 2, to define a context of the computing device. Context module 430 may be used primarily by assistant module 422 and/or search module 482 to define the context of computing devices that interface with and access services provided by digital assistant system 160. The context may specify characteristics of the computing device and the physical and/or virtual environment of the user of the computing device at a particular time.

Assistant module 422 may include all of the functionality of local assistant module 122A and remote assistant module 122B of fig. 1 and assistant module 222 of computing device 210 of fig. 2. Assistant module 422 may perform similar operations as remote assistant module 122B to provide assistant services accessible via assistant server system 460. That is, the assistant module 422 can act as an interface to a remote assistant service accessible by computing devices that can communicate with the assistant server system 460 over a network. For example, assistant module 422 can be an interface or API of remote assistant module 122B of digital assistant system 160 of fig. 1. Assistant module 422 can rely on information stored in data store 424 to perform assistant tasks in addition to any information provided by context module 430 and/or search module 482.

The assistant provided by assistant module 422 may be configured to perform one or more tasks in performing operations to satisfy spoken or verbal requests of a user of a computing device (e.g., computing device 110 of fig. 1). For example, an assistant provided by the assistant module 422 can receive acoustic input (e.g., audio data) corresponding to an utterance of a user of the computing device requesting performance of a particular task using one or more microphones of the computing device. The assistant provided by the assistant module 422 can analyze the audio data to identify tasks corresponding to the spoken utterance.

In accordance with one or more techniques of this disclosure, if full execution of a task cannot be performed immediately (e.g., within a configurable threshold amount of time), the assistant provided by the assistant module 422 may output an indication that full execution of the task will not be performed immediately. For example, the assistant provided by assistant module 422 may output, for playback by one or more speakers operatively connected to the computing device, synthesized speech data that informs the user that full execution of the task will not be performed immediately. In this way, as opposed to merely performing the task without notifying the user that the response will be delayed, the assistant provided by the assistant module 422 can prevent the user from restating the utterance. By preventing the user from restating the utterance, the assistant provided by the assistant module 422 may avoid performing duplicate tasks and avoid having to determine whether a subsequent utterance is a repetition or a new task to be performed, which may improve the functionality of the assistant (e.g., by reducing processing requirements and/or power consumption).

The following examples may illustrate one or more aspects of the present disclosure:

Example 1. A method, comprising: receiving, by a computing assistant executing at one or more processors, a representation of an utterance spoken at a computing device; based on the utterance, identifying a task to be performed by the computing assistant; in response to determining, by the computing assistant, that full execution of the task will take more than a threshold amount of time, outputting, for playback by one or more speakers operatively connected to the computing device, synthesized speech data that informs a user of the computing device that full execution of the task will not be performed immediately; and performing, by the computing assistant, the task.

Example 2. The method of example 1, further comprising: determining an estimated amount of time for full execution of the task, wherein determining that full execution of the task will take more than the threshold amount of time comprises determining that the estimated amount of time is greater than the threshold amount of time.

Example 3. The method of example 2, wherein determining the estimated amount of time comprises: determining the estimated amount of time for full execution of the identified task based on a historical time for full execution of a task of the same type as the identified task.

Example 4. The method of any of examples 1-3, wherein outputting the synthesized speech data that informs the user of the computing device that complete execution of the task will not be performed immediately comprises: outputting, for playback by the one or more speakers operatively connected to the computing device, synthesized speech data that includes the estimated amount of time for full execution of the task.

Example 5. The method of any of examples 1-4, further comprising: determining that full execution of the task involves the computing assistant performing one or more subtasks; and in response to determining that at least one of the one or more subtasks is marked in the task data store as unsuitable for immediate execution, determining that full execution of the task will take more than the threshold amount of time.

Example 6. The method of example 5, wherein determining that full performance of the task involves the computing assistant performing one or more subtasks includes: determining that full performance of the task involves the computing assistant performing a subtask that interacts with a person other than the user of the computing device; and determining, based on the task data store, that the subtask interacting with the person other than the user of the computing device is not suitable for immediate execution.

Example 7. The method of example 6, wherein interacting with the person other than the user of the computing device comprises: outputting, by the computing assistant and as part of a conversation with the person other than the user of the computing device, synthesized speech data for playback by one or more speakers operatively connected to a device associated with the person other than the user of the computing device.

Example 8. The method of example 5, wherein the subtasks marked in the task data store as unsuitable for immediate execution include one or more of: a subtask of the computing assistant interacting with a person other than a user of the computing device; a subtask of the computing assistant making a reservation; a subtask of the computing assistant purchasing tickets; a subtask requiring a large amount of computation; a subtask of the computing assistant interacting with one or more computing systems that are predetermined to be slow; and a subtask requiring a future event to occur.

Example 9. The method of any combination of examples 1-8, wherein the utterance is a first utterance, wherein the synthesized speech data that informs the user of the computing device that complete execution of the task will not be performed immediately is output at a first time, and wherein the method further comprises: receiving, by the computing assistant at a second time later than the first time, a representation of a second utterance spoken at the computing device, the second utterance including a request for an execution state of the task; and outputting, for playback by the one or more speakers operatively connected to the computing device, synthesized speech data that informs the user of the computing device of the execution state of the task.

Example 10. The method of any combination of examples 1-9, wherein the utterance is a first utterance, the method further comprising: receiving, by the computing assistant and prior to completion of execution of the task, a representation of a third utterance spoken at the computing device, the third utterance including a request to modify one or more parameters of the task; and performing, by the computing assistant, the task with the modified one or more parameters.

Example 11. The method of example 10, wherein the request to modify one or more parameters of the task includes one or more of: a request to change a time of the reservation or ticket purchase; and a request to change a number of people included in the reservation or ticket purchase.

Example 12. The method of any combination of examples 1-11, wherein outputting the synthesized speech data that informs the user of the computing device that complete execution of the task will not be performed immediately comprises: outputting, for playback by the one or more speakers operatively connected to the computing device, synthesized speech data that indicates a partial or lower confidence response to the utterance and that the computing assistant is to follow up with a full or higher confidence response.

Example 13. The method of example 12, wherein identifying the task comprises: identifying a search query based on the utterance, and wherein the synthesized speech data that indicates a partial or lower confidence response to the utterance comprises synthesized speech data that indicates a partial or lower confidence response to the search query.

Example 14. The method of any combination of examples 1-13, wherein the computing assistant is a general purpose computing assistant capable of performing tasks other than the identified task.

Example 15. The method of any combination of examples 1-14, wherein the computing assistant is personalized for the user.

Example 16. The method of any combination of examples 1-15, further comprising: displaying, on the computing device and prior to full execution of the task, a visual indicator that the assistant is performing the task.

Example 17. A computing system, comprising: a communication module; at least one processor; and at least one memory including instructions that, when executed, cause the at least one processor to provide an assistant configured to perform the method of any combination of examples 1-16.

Example 18. A computing system, comprising: a communication module; and means for performing the method of any combination of examples 1-16.

Example 19. A computer-readable storage medium comprising instructions that, when executed, cause at least one processor of a computing system to perform the method of any combination of examples 1-16.

In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over, as one or more instructions or code, a computer-readable medium and executed by a hardware-based processing unit. Computer-readable media may include one or more computer-readable storage media, which correspond to a tangible medium such as a data storage medium, or communication media, which include any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol. In this manner, a computer-readable medium may generally correspond to (1) a tangible computer-readable storage medium that is non-transitory, or (2) a communication medium such as a signal or carrier wave. A data storage medium may be any available medium that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures to implement the techniques described in this disclosure. A computer program product may include a computer-readable medium.

By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other storage medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory, tangible storage media. Disk and disc, as used herein, include Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

The instructions may be executed by one or more processors, such as one or more Digital Signal Processors (DSPs), general purpose microprocessors, Application Specific Integrated Circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Thus, the term "processor" as used herein may refer to any of the foregoing structure or any other structure suitable for implementing the techniques described herein. In addition, in some aspects, the functions described herein may be provided within dedicated hardware and/or software modules. Also, the techniques may be fully implemented in one or more circuits or logic elements.

The techniques of this disclosure may be implemented in a variety of devices or apparatuses including a wireless handset, an Integrated Circuit (IC), or a set of ICs (e.g., a chipset). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require implementation by different hardware units. Rather, as noted above, the various units may be combined in hardware units or provided by a collection of interoperative hardware units in combination with suitable software and/or firmware, the collection including one or more processors as described above.

Various embodiments have been described. These and other embodiments are within the scope of the following claims.
