System, method and apparatus for providing image shortcuts for assistant applications
Note: This technology, "System, method and apparatus for providing image shortcuts for assistant applications," was created by Marcin Nowak-Przygodzki and Gökhan Bakir on 2018-09-07. Its main content comprises: An image shortcut is generated and/or utilized that causes one or more corresponding computer actions to be performed in response to determining that one or more features are present in an image from a camera of a computing device of a user (e.g., present in a real-time image feed from the camera). The image shortcut may be generated in response to user interface input, such as a voice command. For example, the user interface input may instruct the automated assistant to perform one or more actions in response to the presence of an object having certain features in the field of view of the camera. Subsequently, when the user points their camera at an object having these features, the assistant application may cause the actions to be performed automatically. For example, the assistant application may cause data to be presented and/or may control a remote device according to the image shortcut.
1. A method implemented by one or more processors, the method comprising:
determining, by an assistant application, that a real-time image feed from a camera of a computing device includes a graphical representation of an object, the determining comprising: processing an image from the camera using one or more image processing techniques;
identifying an image shortcut setting associated with the object, the image shortcut setting corresponding to a pre-configuration procedure by which the assistant application responds to image content provided in the real-time image feed;
generating a query associated with the image shortcut setting, the query including a data identifier of a type of data provided by the assistant application according to the image shortcut setting;
receiving data based on the query, the data corresponding to a type of the data associated with the image shortcut setting; and
in response to determining that the real-time image feed includes the graphical representation of the object, and based on the image shortcut setting stored in association with the object:
causing the data to be rendered at the computing device with the real-time image feed.
2. The method of claim 1, wherein the query further comprises a contextual identifier for a context of the real-time image feed from the camera.
3. The method of claim 2, wherein the contextual identifier identifies a location at which the camera provides the real-time image feed.
4. The method of claim 1, wherein the image shortcut setting is preconfigured by a user through a spoken command, the spoken command processed at least in part via the assistant application.
5. The method of claim 1, further comprising:
transmitting the query to a separate application at the computing device, wherein the data is received from the separate application.
6. The method of claim 1, wherein the type of data corresponds to dynamic data that changes independently of the assistant application.
7. The method of claim 6, wherein the data is received from a remote device that is responsive to a query from the assistant application.
8. A system, comprising:
a camera;
a display device;
a speaker;
one or more processors in communication with the camera, the display device, and the speaker; and
a memory configured to store instructions that, when executed by the one or more processors, cause the one or more processors to perform steps comprising:
generating an object identifier based on an image from a real-time image feed provided by the camera, wherein generating the object identifier comprises: processing the image using one or more image processing techniques;
determining that the object identifier corresponds to an image shortcut setting, wherein the image shortcut setting causes data to be provided in response to an object appearing in the real-time image feed;
sending a query to a remote device configured to retrieve data in response to receiving the query;
receiving data associated with the image shortcut setting from the remote device; and
causing the data to be presented via at least one of: the display device and the speaker.
9. The system of claim 8, wherein the data is presented simultaneously with the real-time image feed displayed on the display device.
10. The system of claim 8, wherein the steps further comprise:
determining a contextual identifier for the image from the real-time image feed, wherein the query includes the contextual identifier.
11. The system of claim 10, wherein the contextual identifier specifies a location at which the image was generated by the camera.
12. The system of claim 8, further comprising a microphone, wherein the steps further comprise:
receiving audio data from the microphone, the audio data corresponding to a request from a user to cause generation of the image shortcut setting.
13. The system of claim 12, wherein the audio data is received when the camera provides different images.
14. The system of claim 13, wherein the steps further comprise:
identifying an object description from the audio data;
determining a correspondence between the object description and the different images; and
generating the image shortcut setting based at least on the object description.
15. At least one non-transitory computer-readable medium configured to store instructions that, when executed by one or more processors, cause the one or more processors to perform steps comprising:
receiving audio data corresponding to a request to create an image shortcut setting;
receiving image data from a real-time image feed generated by a camera of a computing device;
identifying one or more computer actions to perform from the audio data;
identifying, from the image data, an object identifier corresponding to an object at which the camera of the computing device is pointed, wherein identifying the object identifier comprises: processing the image data using one or more image processing techniques;
generating the image shortcut setting based on the request and the object identifier, wherein the image shortcut setting is configured to cause the one or more computer actions to be performed in response to identifying the object identifier from subsequent image data of a subsequent real-time image feed from the camera; and
in response to identifying the object identifier from the subsequent image data, causing the one or more computer actions to be performed in accordance with the image shortcut setting.
16. The non-transitory computer-readable medium of claim 15, wherein the one or more computer actions comprise sending a command to at least one peripheral device, wherein the command causes a state of the at least one peripheral device to be changed.
17. The non-transitory computer-readable medium of claim 15, wherein identifying the object identifier corresponding to the object comprises: identifying a plurality of object identifiers corresponding to a plurality of different objects at which the camera of the computing device is pointed.
18. The non-transitory computer-readable medium of claim 17, wherein the image shortcut setting is based on the plurality of object identifiers.
19. The non-transitory computer-readable medium of claim 15, wherein the steps further comprise:
identifying a contextual identifier of the request from the audio data or the image data, wherein the image shortcut setting is generated further based on the contextual identifier.
20. The non-transitory computer-readable medium of claim 19, wherein the contextual identifier identifies at least one time or at least one location, and wherein causing performance of the one or more computer actions in accordance with the image shortcut setting is further in response to subsequent image data provided at a time matching the at least one time or at a location matching the at least one location.
Background
Humans may engage in human-computer dialogs using interactive software applications referred to herein as "automated assistants" (also referred to as "digital assistants," "chatbots," "interactive personal assistants," "intelligent personal assistants," "conversational agents," and the like). For example, a human (who may be referred to as a "user" when interacting with an automated assistant) may provide commands and/or requests using spoken natural language input (i.e., utterances), which may in some cases be converted to text and then processed, and/or by providing textual (e.g., typed) natural language input. While using an automated assistant may allow easier access to information and more convenient ways to control peripheral devices, in some cases providing spoken input and/or typed commands may be difficult. For example, a user may need to provide verbal commands to an automated assistant application in the morning, when others in the home may be sleeping. These and other problems can arise from the assistant application's dependence on spoken commands. Moreover, it may be desirable to provide more complex commands, to provide commands using less laborious input, to provide commands that protect the privacy of the respective user, and/or to provide commands with other or alternative benefits.
Disclosure of Invention
Embodiments disclosed herein relate to generating and utilizing an image shortcut that causes one or more corresponding computer actions to be performed in response to determining that one or more features are present in an image from a camera of a computing device of a user (e.g., present in a real-time image feed from the camera). In various embodiments, an image shortcut is generated and stored in association with the user in response to voice and/or typed user interface input provided by the user. For example, the user may provide the spoken input "When I direct the camera at a train platform, show me a train schedule" to the automated assistant application. In response to the spoken input, an image shortcut may be generated that causes graphical and/or audible presentation of train schedule information at a computing device of the user in response to determining that a real-time image feed from a camera of the computing device captures an image having features indicative of a "train platform." For example, future images captured via the computing device may be processed (locally at the computing device and/or remotely) to identify features indicative of a "train platform," such as: a classification of the image as a "train platform" image; classifications of portions of the image such as "train," "person," "crowd," "train track," and/or other classifications indicative of a "train platform"; and so forth. That a future image captures a "train platform" may be determined based on the presence of these features, and train schedule information presented in response. The train schedule information may be presented audibly and/or graphically on the same computing device that captured the image and/or at another computing device linked to the computing device that captured the image (e.g., based on the two computing devices being connected to the same network or through a user account used at both computing devices). In some implementations, the current location of the computing device and/or other contextual data may also be utilized to determine that the image captured a "train platform." For example, determining that an image captures a train platform may be based on two determinations: determining that features of the image indicate "train platform"; and determining that the current location of the computing device has a "train stop" classification. Various image processing techniques may be utilized to determine the classification and/or other features of an image. For example, some techniques may utilize a deep neural network model that accepts an image as input and utilizes learned parameters to generate, as output, metrics that indicate which of a plurality of corresponding features are present in the image.
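As a concrete illustration of the feature-based determination above, the following is a minimal sketch, assuming a classifier that returns per-label confidence scores for a camera frame; the label names, the threshold, and the two-supporting-features rule are illustrative assumptions rather than the patent's actual logic:

```python
from typing import Dict

# Labels whose presence (individually or together) suggests a train platform.
PLATFORM_FEATURES = {"train platform", "train", "train track", "crowd", "person"}

def indicates_train_platform(label_scores: Dict[str, float],
                             threshold: float = 0.6) -> bool:
    """Return True if classifier output suggests the image shows a train platform."""
    # A confident whole-image "train platform" classification is sufficient.
    if label_scores.get("train platform", 0.0) >= threshold:
        return True
    # Otherwise require at least two supporting part-of-image classifications.
    supporting = [label for label in PLATFORM_FEATURES
                  if label_scores.get(label, 0.0) >= threshold]
    return len(supporting) >= 2

# Example scores, as a deep neural network classifier might produce for a frame.
print(indicates_train_platform({"train": 0.9, "crowd": 0.7, "dog": 0.1}))  # True
```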
In some implementations, the computer actions performed according to the image shortcut include generating a query, sending the generated query, receiving response data in response to sending the query, and/or presenting all or part of the response data. In some of those embodiments, the image shortcut is associated with a query or query template that indicates the type of data to be provided according to the image shortcut. For example, continuing with the "train platform" example above, the generated image shortcut may define a query such as "train schedule," a query such as "train schedule for current location," or a query template such as "train schedule from [current location] to [destination location]." In the query template, the placeholder "[current location]" may be populated with the current location of the computing device. The current location may be specific coordinates, a geographic area, or text or other information indicating the train station at which the computing device is currently located. The placeholder "[destination location]" may be populated with a contextually relevant destination, such as a "work" destination if it is the morning of a weekday, a "home" destination if it is the evening of a weekend, or an "appointment" location of the user corresponding to a temporally proximate appointment stored in the user's electronic calendar. In some of those embodiments, the query or "populated" query template may be used to determine the train schedule information to provide in response to determining that an image captures a train platform. For example, the query or populated query template can be sent to a search engine, an application, and/or another resource, a responsive train schedule received in response, and the responsive train schedule presented audibly or graphically.
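The following is a minimal sketch of filling such a query template from context, assuming the bracketed placeholder syntax above and simple time-of-day rules; the helper name and the fallback rules are illustrative assumptions:

```python
import datetime
from typing import Optional

def fill_query_template(template: str,
                        current_location: str,
                        now: datetime.datetime,
                        calendar_destination: Optional[str] = None) -> str:
    """Fill the placeholders of an image-shortcut query template from context."""
    # Prefer a temporally proximate appointment from the user's calendar.
    if calendar_destination is not None:
        destination = calendar_destination
    elif now.weekday() < 5 and now.hour < 12:  # weekday morning -> "work"
        destination = "work"
    else:                                      # otherwise -> "home"
        destination = "home"
    return (template
            .replace("[current location]", current_location)
            .replace("[destination location]", destination))

query = fill_query_template(
    "train schedule from [current location] to [destination location]",
    current_location="Union Station",
    now=datetime.datetime(2018, 9, 7, 8, 30))  # a Friday morning
print(query)  # train schedule from Union Station to work
```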
In some implementations, the computer actions to be performed according to the image shortcut additionally and/or alternatively include transmitting one or more commands that cause a state change of one or more peripheral devices (e.g., Internet of Things (IoT) devices). For example, the user may provide the spoken input "When I direct the camera at my alarm clock in the morning, turn on my bedroom lights and turn on my coffee maker plug" to the automated assistant application. In response to the spoken input, an image shortcut may be generated that, in response to determining that a real-time image feed from a camera of a computing device of the user captures an image having a feature indicating any "alarm clock" (or a particular alarm clock of the user), causes a "bedroom light" and a "coffee maker plug" of the user to be turned on. For example, in response to making this determination, the image shortcut may cause a command to be sent that causes the networked light labeled "bedroom light" to be turned "on," and also cause a command to be sent that causes the networked plug labeled "coffee maker plug" to be turned on. One or more Application Programming Interfaces (APIs) and/or other communication protocols may be utilized in generating and/or transmitting the commands that cause the state changes of the devices. In some implementations, the image shortcut causes the "bedroom light" and "coffee maker plug" to be turned on based on determining that the image has a feature indicating an alarm clock, and based on the image being captured in the "morning" (e.g., based on the spoken input including "in the morning") and/or being captured at the user's "home" location (e.g., based on the spoken input including "my alarm clock"). Further, in some implementations, the automated assistant, when generating the image shortcut, can prompt the user to capture an image of the user's particular alarm clock, after which the image shortcut is triggered only in response to a real-time image feed capturing images having features that match the features of the user's particular alarm clock (as derived from the image captured in response to the prompt).
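A minimal sketch of this device-command flow follows, assuming detection labels have already been produced, and using a stand-in function in place of a real IoT API; the device labels and helper names are illustrative assumptions:

```python
from typing import List

def send_device_command(device_label: str, new_state: str) -> None:
    # Stand-in for a real API call (e.g., an HTTP request to an IoT bridge).
    print(f"command -> {device_label}: {new_state}")

def run_alarm_clock_shortcut(detected_labels: List[str],
                             is_morning: bool,
                             at_home: bool) -> None:
    """Turn on the bedroom light and coffee maker plug when an alarm clock
    is detected in the morning at the user's home."""
    if "alarm clock" in detected_labels and is_morning and at_home:
        send_device_command("bedroom light", "on")
        send_device_command("coffee maker plug", "on")

run_alarm_clock_shortcut(["alarm clock", "nightstand"],
                         is_morning=True, at_home=True)
```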
In some implementations, the computer actions performed according to the image shortcut additionally and/or alternatively include sending one or more electronic communications to other users. For example, the user may provide the spoken input "When I direct the camera at my car keys at work, give me a traffic update and text the traffic update to my wife" to the automated assistant application. In response to the spoken input, an image shortcut may be generated that, in response to determining that the user is at work and that a real-time image feed from a camera of a computing device of the user captures an image having a feature indicating "car keys," causes a traffic update to be presented at the computing device (and/or another computing device of the user), and causes a text message including the traffic update to be automatically generated and automatically sent to the user's "wife" contact.
As another example of an embodiment disclosed herein, a user may wish to see their schedule, stored on their portable computing device, while performing their morning routine. While the user could use a spoken command to invoke the automated assistant to view the schedule (e.g., "Assistant, can you display my schedule?"), the user may instead configure the automated assistant to automatically provide the schedule when the camera of the portable computing device is pointed at an object having one or more particular features. The user may configure this setting using a spoken command such as "Assistant, when I direct the camera at a mirror in the morning, please show my schedule." In response, the automated assistant can cause the spoken command to be parsed to identify text related to the new image shortcut configuration. A new image shortcut configuration may then be generated and stored for use at a later time. For example, the new image shortcut configuration may, in response to determining that it is currently "morning" and that an image captured by the camera includes a mirror, cause the user's current schedule to be audibly and/or graphically provided to the user via the portable computing device. For example, the next morning, the user may open a camera application on their portable computing device and point the camera at their mirror. In response to it being "morning" and the camera being pointed at the mirror, the automated assistant can cause the user's schedule for the day to be presented on the portable computing device.
In various implementations, the above-described and other techniques described herein enable a user to interact with, and obtain relevant output from, an automated assistant without the user having to provide laborious typed input and/or without the user having to provide spoken input that could raise privacy concerns (e.g., if someone else is nearby). Moreover, various embodiments may reduce the amount of input required to obtain relevant output relative to other techniques, which may conserve computing resources of the client device and/or assist users with speech and/or dexterity impairments. In addition, various embodiments disclosed herein perform processing of the image locally at the client device to determine the features of objects contained in the image. In some of those various implementations, the client device further determines locally whether to instantiate an image shortcut setting based on the determined features and, optionally, also based on locally determined context data (e.g., current time, current day of week, current location of the client device). Further, in response to determining that an image shortcut setting is instantiated, the client device itself may locally perform the computer actions of the image shortcut setting, or may send a query and/or other data to one or more remote devices to cause one or more of the computer actions to be performed (without communicating the image and/or the context data). In this manner, the image and/or context data can be maintained on the client device, without images needing to be sent from the client device to cause the computer actions of the image shortcut setting to be performed, thereby enhancing the security of such image and/or context data.
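The following is a minimal sketch of that on-device decision, assuming features and context have already been determined locally; the dataclass fields and the "morning" hour window are illustrative assumptions:

```python
import datetime
from dataclasses import dataclass
from typing import Optional, Set

@dataclass
class ImageShortcutSetting:
    required_features: Set[str]              # e.g., {"alarm clock"}
    required_period: Optional[str] = None    # e.g., "morning"
    required_location: Optional[str] = None  # e.g., "home"

def should_instantiate(setting: ImageShortcutSetting,
                       detected_features: Set[str],
                       now: datetime.datetime,
                       current_location: str) -> bool:
    """Decide locally whether a shortcut fires; no image leaves the device."""
    if not setting.required_features <= detected_features:
        return False
    if setting.required_period == "morning" and not 5 <= now.hour < 12:
        return False
    if setting.required_location and setting.required_location != current_location:
        return False
    return True

setting = ImageShortcutSetting({"alarm clock"}, "morning", "home")
print(should_instantiate(setting, {"alarm clock", "bed"},
                         datetime.datetime(2018, 9, 7, 6, 45), "home"))  # True
```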
In some embodiments, a method implemented by one or more processors is set forth as including steps such as: determining, by an assistant application, that a real-time image feed from a camera of a computing device includes a graphical representation of an object. The determining may include processing an image from the camera using one or more image processing techniques. The steps may also include identifying an image shortcut setting associated with the object. The image shortcut setting may correspond to a pre-configuration procedure by which the assistant application responds to image content provided in the real-time image feed. The steps may further include generating a query associated with the image shortcut setting. The query may include a data identifier of the type of data provided by the assistant application according to the image shortcut setting. Additionally, the steps may include receiving data based on the query, the data corresponding to the type of data associated with the image shortcut setting. In response to determining that the real-time image feed includes the graphical representation of the object, and based on the image shortcut setting stored in association with the object, the one or more processors may perform the step of causing the data to be rendered at the computing device with the real-time image feed.
The query may further include a contextual identifier for a context of the real-time image feed from the camera. The contextual identifier may identify a location at which the camera provides the real-time image feed. The image shortcut setting may be pre-configured by the user through a spoken command that is processed, at least in part, via the assistant application. The steps may also include transmitting the query to a separate application at the computing device, wherein the data is received from the separate application. The type of data may correspond to dynamic data that changes independently of the assistant application. The data may be received from a remote device that is responsive to a query from the assistant application.
In other embodiments, a system may be described as comprising a camera; a display device; a speaker; one or more processors in communication with the camera, the display device, and the speaker; and a memory configured to store instructions that, when executed by the one or more processors, cause the one or more processors to perform steps comprising: generating an object identifier based on an image from a real-time image feed provided by the camera. Generating the object identifier may include processing the image using one or more image processing techniques. The steps may also include determining that the object identifier corresponds to an image shortcut setting. The image shortcut setting may cause data to be provided in response to an object appearing in the real-time image feed. The steps may further include sending a query to a remote device configured to retrieve data in response to receiving the query; receiving data associated with the image shortcut setting from the remote device; and causing the data to be presented via at least one of the display device and the speaker.
The data may be presented simultaneously with the real-time image feed displayed on the display device. The steps may further include determining, from the real-time image feed, a contextual identifier for the image, wherein the query includes the contextual identifier. The contextual identifier may specify a location at which the image was generated by the camera. The system may further include a microphone, and the steps may further include receiving audio data from the microphone corresponding to a request from the user to cause generation of the image shortcut setting. The audio data may be received while the camera provides different images. The steps may further include identifying an object description from the audio data; determining a correspondence between the object description and the different images; and generating the image shortcut setting based at least on the object description.
In other embodiments, a non-transitory computer-readable medium is set forth as storing instructions that, when executed by one or more processors, cause the one or more processors to perform steps comprising: receiving audio data corresponding to a request for the assistant application to create an image shortcut setting; and receiving image data from a real-time image feed generated by a camera of a computing device, wherein the assistant application is accessible to the computing device. The steps may also include identifying, from the audio data, a request for data from the assistant application, and identifying, from the image data, an object identifier corresponding to an object at which the camera of the computing device is pointed. Identifying the object identifier may include processing the image data using one or more image processing techniques. The steps may further include generating the image shortcut setting based on the request and the object identifier. The image shortcut setting may be configured to cause the assistant application to respond to a real-time image feed generated by the camera. Additionally, the steps may include, in response to the camera providing different image data associated with the object identifier, causing the assistant application to provide the data according to the image shortcut setting.
In some embodiments, the steps may include identifying, from the different image data, an object at which the camera was previously pointed, and accessing a remote service that provides data corresponding to the request. Identifying the object identifier corresponding to the object may include identifying a plurality of object identifiers corresponding to a plurality of different objects at which the camera of the computing device is pointed. The image shortcut setting may be further based on the plurality of object identifiers. In some implementations, the steps can also include identifying a contextual identifier of the request from the audio data or the image data. The image shortcut setting may be generated further based on the contextual identifier. The contextual identifier may identify a time of the request, and the assistant application may further provide the data in response to the camera providing different image data at the identified time.
In other embodiments, a method implemented by one or more processors is presented that includes processing an image from a camera of a computing device using one or more image processing techniques, and determining that the image contains one or more features based on the processing. The method further includes identifying an image shortcut setting associated with the one or more features. The image shortcut setting defines one or more computer actions to be performed in response to determining that the image includes one or more features. The method further includes performing one or more computer actions in response to determining that the image includes one or more features and based on image shortcut settings stored in association with the one or more features.
The one or more computer actions may include causing a command to be sent to the at least one peripheral device, wherein the command causes a state of the at least one peripheral device to be changed. One or more computer actions may additionally or alternatively include sending a query, receiving data in response to the query, and causing the data to be presented at the computing device and/or at another computing device linked to the computing device. The query may optionally be generated based on one or more features and/or based on contextual data associated with the capture of the image. The one or more computer actions may additionally or alternatively include causing an electronic communication (e.g., email, text message) to be sent to a further computing device of a further user. The images may be from a real-time image feed of the camera.
In other embodiments, a method performed by one or more processors is presented, the method comprising: receiving audio data corresponding to a request to create an image shortcut setting; and receiving image data from a real-time image feed generated by a camera of the computing device. The method further comprises the following steps: identifying one or more computer actions to perform from the audio data; and identifying, from the image data, an object identifier corresponding to an object at which a camera of the computing device is pointed. Identifying the object identifier includes processing the image data using one or more image processing techniques. The method further includes generating an image shortcut setting based on the request and the object identifier. The image shortcut setting is configured to cause one or more computer actions to be performed in response to identifying an object identifier from subsequent image data of a subsequent real-time image feed from the camera. The method further includes causing one or more computer actions to be performed according to the image shortcut setting in response to identifying the object identifier from the subsequent image data.
The one or more computer actions may include sending a command to at least one peripheral device, wherein the command causes a state of the at least one peripheral device to be changed. The method may further include identifying a contextual identifier of the request from the audio data or the image data, and generating the image shortcut setting further based on the contextual identifier. The contextual identifier may identify at least one time and/or at least one location, and causing performance of the one or more computer actions according to the image shortcut setting may be further responsive to subsequent image data provided at a time matching the at least one time and/or at a location matching the at least one location.
Furthermore, some embodiments include one or more processors of one or more computing devices, wherein the one or more processors are operable to execute instructions stored in an associated memory, and wherein the instructions are configured to cause performance of one or more methods described herein. The processor may include one or more Graphics Processing Units (GPUs), Central Processing Units (CPUs), and/or Tensor Processing Units (TPUs). Some embodiments include one or more non-transitory computer-readable storage media storing computer instructions executable by one or more processors to implement one or more methods described herein.
It should be understood that all combinations of the foregoing concepts and additional concepts described in greater detail herein are considered a part of the subject matter disclosed herein. For example, all combinations of claimed subject matter appearing at the end of this disclosure are considered part of the subject matter disclosed herein.
Drawings
FIG. 1 illustrates a system for providing an automated assistant capable of responding to camera images according to image shortcut settings created by a user.
FIG. 2A shows a view of a computing device operating an assistant application capable of generating image shortcut settings according to instructions from a user.
FIG. 2B shows a view of a computing device providing a response to a user initiating an image shortcut setting by pointing a camera of the computing device at an object associated with the image shortcut setting.
FIG. 3A illustrates a view of a computing device for configuring image shortcut settings for an assistant application.
FIG. 3B illustrates a view of a computing device operating an assistant application according to an image shortcut setting.
FIG. 4A illustrates a view of a computing device for configuring image shortcut settings that may cause an assistant application to provide data or perform actions based at least on contextual data received by the assistant application.
FIG. 4B shows a view of a user pointing the camera of a computing device at a train station so that the assistant application performs an action according to a previously generated image shortcut setting.
FIG. 5A illustrates a view in which an assistant application operating on a computing device creates an image shortcut setting in response to a user pointing a camera of the computing device at a mirror.
FIG. 5B shows a view of a user invoking the assistant application to perform a function according to an image shortcut setting.
FIG. 6 illustrates a method for causing an assistant application to provide data to a user according to an image shortcut setting of the assistant application.
FIG. 7 illustrates a method for generating an image shortcut setting based at least on a command from a user.
FIG. 8 is a block diagram of an example computer system.
Detailed Description
Embodiments disclosed herein relate to image shortcut settings that may cause an assistant application to perform one or more functions when a camera of a computing device is pointed at one or more objects. For example, a user may wish to see their schedule, stored on their portable computing device, while performing their morning routine. While the user could use a spoken command to invoke the automated assistant to view the schedule (e.g., "Assistant, can you show my agenda?"), the user may instead configure the automated assistant to automatically provide the schedule when the camera of the portable computing device is pointed at an object having one or more particular features. The user may configure this setting using a spoken command such as "Assistant, when I direct the camera at a mirror in the morning, please show my agenda." In response, the automated assistant can cause the spoken command to be parsed to identify text related to the new image shortcut configuration. A new image shortcut configuration may then be generated and stored for use at a later time. For example, the new image shortcut configuration may, in response to determining that it is currently "morning" and that an image captured by the camera includes a mirror, cause the user's current schedule to be audibly and/or graphically provided to the user via the portable computing device. For example, the next morning, the user may open a camera application on their portable computing device and point the camera at their mirror. In response to it being "morning" and the camera being pointed at the mirror, the automated assistant can cause the user's schedule for the day to be presented on the portable computing device and/or on another computing device of the user. For example, the user may point the camera of a smartwatch at their mirror and, in response, the user's schedule may be audibly and/or graphically presented on the user's smartphone, the user's smart television, or the user's standalone voice-activated speaker.
In some implementations, a user can configure an automated assistant to provide information related to an image generated by a camera of a portable computing device. For example, the user may verbally instruct the automated assistant to provide weather information whenever the user points the camera at the sky (e.g., "Assistant, when I point the camera at the sky, please provide weather information"). Thereafter, when the user points the camera at the sky, the automated assistant can query a weather application or website for weather data and present the weather data on the display of the portable computing device and/or on the display of another computing device. In some implementations, the automated assistant can use the geographic location of the portable computing device in conjunction with the image of the sky captured by the camera to provide the weather information. For example, the automated assistant may generate a query that includes the location and/or object information derived from the image. The query may be provided to a weather application, a weather website, and/or any other source of weather information. The automated assistant can then receive weather information specific to the location and/or the image captured by the camera. The location-specific weather information may include forecasts corresponding to temperature, humidity, precipitation, cloud cover, and/or any other location-specific weather information.
In some implementations, the image captured by the camera may be processed at the computing device, or at a remote device that provides an image-processing service, to identify objects within the image, so that the information provided by the automated assistant may be based on the identified objects. For example, when a user configures the automated assistant to provide weather information when the user points the camera at the sky, objects in the sky may be identified and used as a basis for providing the weather information. Such objects may include clouds, or the absence of clouds. If no clouds are present, the automated assistant can provide weather information without detailed information about cloud coverage, based at least on an assumption that the user can infer the state of the cloud coverage themselves.
In some implementations, the user can configure the automated assistant to respond to images from the camera while also considering the time and/or location at which the image was captured and/or any other contextual data. For example, the user may indicate to the automated assistant that they want traffic information when they point the camera at a train station (e.g., "Assistant, can you provide traffic information when I point the camera at the train station?"). In response, the automated assistant may provide traffic information when the user subsequently points the camera at a train or train station. The traffic information may be based on the time of day, day of week, and/or particular date at which the camera captured the image of the train or train station, the current location of the portable computing device, stored personal information of the user (e.g., the user's calendar, the user's home or work address), and/or any other contextual data. For example, if the image is captured on the morning of a weekday, the automated assistant may determine traffic information from the current location of the portable computing device to the user's work address. For instance, the automated assistant may generate and submit a query seeking a public transportation route from the current location to the work location. Traffic information may be received in response to the query and provided to the user at the portable computing device for presentation. On the other hand, if the user captures the image at night, the automated assistant may retrieve and provide traffic information related to traveling to the user's home. As yet another example, if the user's calendar indicates an upcoming appointment at a particular location, the automated assistant may retrieve and provide traffic information related to traveling to that particular location. In other implementations, the automated assistant may be configured by the user to provide media to read, view, or listen to (e.g., articles, podcasts, etc.) when the user points their camera at the train or train station at night (e.g., "Assistant, could you provide a podcast when I point the camera at the train at night after work?"). In these and other ways, the user does not have to provide verbal or textual commands to invoke the automated assistant to provide information.
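A minimal sketch of this context-dependent behavior follows, assuming a single function chooses the shortcut's action from the capture time; the address parameters and the podcast option are illustrative assumptions:

```python
import datetime

def train_station_action(now: datetime.datetime,
                         work_address: str,
                         home_address: str,
                         evening_podcasts: bool = False) -> str:
    """Choose what the shortcut does based on when the image was captured."""
    if now.weekday() < 5 and now.hour < 12:   # weekday morning commute
        return f"traffic update: current location -> {work_address}"
    if evening_podcasts and now.hour >= 18:   # configured evening media
        return "play podcast"
    return f"traffic update: current location -> {home_address}"

print(train_station_action(datetime.datetime(2018, 9, 7, 8, 0),
                           work_address="work", home_address="home"))
# traffic update: current location -> work
```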
In some implementations, a user can configure an automated assistant to provide information stored on, or accessible through, their device in response to the user pointing a camera at a particular object. For example, a user may store a bicycle lock password in a note on their portable computing device. When the user points the camera of the portable computing device at the bicycle lock, the user may instruct the automated assistant to create an image shortcut for the bicycle lock password. In other words, the user may invoke the automated assistant with a command such as "Assistant, when I point the camera at my bike lock, please provide me with the bike lock code from my notes." Thereafter, when the user points the camera at the bicycle lock, the automated assistant can cause the bicycle lock password to be presented, or can cause the note application that includes the bicycle lock password to be opened in a state in which the bicycle lock password is presented to the user. The bicycle lock password may optionally be presented simultaneously with the camera application providing a real-time image feed of the bicycle lock at which the camera is pointed.
In other implementations, the automated assistant may be configured to provide information from a remote device when the camera of the portable computing device is pointed at a particular object. For example, the user may configure the automated assistant to provide a security code for a vacation home when the user points the camera of the portable computing device at the door of the vacation home. The automated assistant may be configured through a command such as "Assistant, please provide the security code for the door when I point the camera at the door." The information (e.g., the security code) provided by the automated assistant may be based on images captured by the camera, the location of the portable computing device, and/or data from a remote device. For example, the security code may be extracted from an email sent to the user and stored at an email server accessible to the portable computing device. The automated assistant can provide a query to the email server (or a related server) to retrieve the security code. The query may optionally include an identifier of the location where the image was taken, to identify the security code from a plurality of candidate security codes extracted from other emails of the user (e.g., security codes that may correspond to other locations). When the automated assistant retrieves the security code, the security code may be presented on the display of the portable computing device when the user points the camera at the door of the vacation home. Alternatively, the automated assistant may provide the security code through a different medium (e.g., through a text message, an audio announcement, etc.). For example, the automated assistant may convert the security code to audio and then play it over a speaker of the portable computing device (e.g., in response to a command such as "Assistant, when I point the camera at the door, please read out the security code for the door"). Thereafter, the automated assistant can audibly provide the security code (e.g., "The security code from your email is 2, 7, 1, 8, 2") when the user points the camera at the door.
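The location-keyed disambiguation described above might look like the following minimal sketch, assuming candidate codes have already been extracted from the user's emails; the data shape and the exact-location matching are illustrative assumptions:

```python
from typing import Dict, Optional

def select_security_code(codes_by_location: Dict[str, str],
                         capture_location: str) -> Optional[str]:
    """Pick the candidate code whose stored location matches where the
    image was captured (disambiguating among codes from several emails)."""
    return codes_by_location.get(capture_location)

codes = {"vacation home": "27182", "office": "31415"}
code = select_security_code(codes, "vacation home")
if code is not None:
    # Read the code digit by digit, as in the audible announcement above.
    print(f"The security code from your email is {', '.join(code)}")
# -> The security code from your email is 2, 7, 1, 8, 2
```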
Turning now to the drawings, FIG. 1 illustrates a
Although a
The
Server device 112 may include other applications and/or scripts for processing data provided by
In some implementations, the
The image shortcut setting 120 may be preset with the
The speech-to-
For example, the phrase "Assistant, please find my shopping list when I point the camera at the refrigerator" may be processed as text by speech-to-
In some implementations, the server device 112 can include one or more machine learning models that are trained using images previously captured by the camera 106 to expedite the process of identifying objects in the images. Further, the
In other implementations, contextual data combined with image data from the camera and text data from a user's spoken command may be used to generate image shortcut settings. For example, when the user provides the command "Assistant, please provide my shopping list when I point my camera at the refrigerator," the
In other implementations, the image shortcut setting 120 may be set to cause the
FIG. 2A shows a view 200 of a
An assistant application accessible to the
In some implementations, generating the image shortcut setting may be based on sensor data received from one or more sensors of the
FIG. 2B shows a view 200 of a
In some implementations, images from a real-time image feed provided at the
In some implementations, context data or contextual identifiers can be inferred from the sampled images and used to determine whether the conditions set by the image shortcut are satisfied. For example, the user may instruct the automated assistant to create an image shortcut setting based on conditions inferred from the camera image (e.g., "Assistant, when I point the camera at the sky in the morning, send my wife a text saying 'Good morning!'"). Subsequently, the user may point the camera at the morning sky (i.e., as the sun crosses the horizon), which may be processed by the assistant application,
FIG. 3A shows a
In some implementations, the image that is the initial topic of the image shortcut setting can be processed to identify a number of objects within the image that can be used to trigger actions of the assistant application. For example, although the user has suggested that the assistant application provide a bicycle lock password when the camera is pointed at the bicycle lock, the image at the
In some implementations, the user can tap, or draw a line around, the portion of the image containing the object intended as the condition for the image shortcut. For example, a user may point their camera at an object such as the
In some implementations, the
FIG. 3B illustrates a
In some implementations, the process of the action associated with the image shortcut setting (e.g., providing data) can be performed without the
In other implementations, the action associated with the image shortcut setting may be performed when the
FIG. 4A shows a
The data provided by the assistant application may change depending on when the user invokes the assistant application via the image shortcut setting. For example, the assistant application may infer a destination from a calendar application, historical travel data, and/or any other data source that may include location data. The inferred destination may depend on the time of day that the user pointed the camera to train
FIG. 4B shows a
If the assistant application is able to collect data related to the user's location, the travel schedule of the train through the
FIG. 5A shows a
FIG. 5B shows a view 516 of the
In some implementations, the user can cause the automated assistant to perform an action associated with the image shortcut setting by opening the image for display at the
FIG. 6 illustrates a
The
The
The
The
FIG. 7 illustrates a method 700 for generating an image shortcut setting based at least on a command from a user. Method 700 may be performed by a computing device, a server device, and/or any other device capable of interpreting commands from a user. The method 700 may include block 702: audio data corresponding to a request to the assistant application to create an image shortcut setting is received. The image shortcut settings may correspond to a process by which the assistant application responds to one or more objects that are present within a field of view of a camera of the computing device. The object may be specified by the user and identified by the assistant application or a separate application capable of recognizing the object using camera data and computer vision algorithms.
The method 700 may include block 704: image data is received from a real-time image feed generated by a camera of a computing device. The real-time image feed may be image data or sensor data generated by the camera in real-time as the camera is pointed at the object. The real-time image feed may be graphically represented at a graphical user interface (e.g., a touch display interface) of the computing device, allowing a user to confirm that the object is within the field of view of the camera. This also allows the user to provide commands for creating image shortcut settings while displaying the object at the computing device.
The method 700 may include block 706: a request for data from an assistant application is identified from the audio data. The request for data may be recognized by converting audio data to text data through a speech recognition algorithm executed at a computing device or a remote device (e.g., a server device). In some implementations, instead of receiving audio data at block 702, text data can be received as a query or a request for an automated assistant to create an image shortcut setting. The text data may be received at an assistant interface, such as a graphical user interface that includes one or more fields for receiving manually entered text data. The text data may then be processed by the assistant application to determine that the request is included in the data received at block 702 and to identify the type of data that the user has requested. For example, a request to receive weather data in response to a user pointing their camera to the sky may be embedded in text data extracted from audio data or manually entered text data.
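A minimal sketch of this request identification follows, assuming the audio has already been transcribed to text; the keyword patterns stand in for a trained natural-language parser and are illustrative assumptions:

```python
import re
from typing import Optional

# Keyword patterns standing in for a trained natural-language parser.
REQUEST_PATTERNS = {
    "weather": re.compile(r"\bweather\b", re.IGNORECASE),
    "schedule": re.compile(r"\b(schedule|agenda)\b", re.IGNORECASE),
    "traffic": re.compile(r"\btraffic\b", re.IGNORECASE),
}

def identify_requested_data(command_text: str) -> Optional[str]:
    """Identify which type of data the transcribed command requests."""
    for data_type, pattern in REQUEST_PATTERNS.items():
        if pattern.search(command_text):
            return data_type
    return None

print(identify_requested_data(
    "Assistant, when I point the camera at the sky, show me the weather"))
# -> weather
```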
The method 700 may also include block 708: an object identifier corresponding to an object at which a camera of the computing device is pointed is identified from the image data. The image data may be processed by an assistant application, a computing device, a separate computing device (e.g., a server device), and/or any other device capable of processing image data. The image data may be provided to one or more machine learning models to identify objects within the image data, or otherwise provided as input to a computer vision algorithm, to generate an object identifier and a location of the object from the image data. Thereafter, the assistant application may use the object identifier and/or the location of the object when performing the function associated with the image shortcut setting.
The method 700 may further include block 710: an image shortcut setting is generated based on the request for data and the object identifier. The image shortcut settings may be generated by the assistant application to provide a process by which the user may instruct the assistant application to perform an action (e.g., retrieve weather data) when the user points the camera at an object (e.g., the sky). In this way, the user does not have to provide text or verbal input to the computing device to retrieve the data, but merely points the camera of the computing device at the object.
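A minimal sketch of the resulting setting follows, combining the parsed request (block 706) and the object identifier (block 708) into a persistable record (block 710); the field names and the TypedDict representation are illustrative assumptions:

```python
from typing import List, Optional, TypedDict

class ImageShortcut(TypedDict):
    object_identifiers: List[str]    # e.g., ["sky"], from image processing
    requested_data: str              # e.g., "weather", from the audio data
    context_time: Optional[str]      # e.g., "morning", if one was given
    context_location: Optional[str]

def generate_image_shortcut_setting(requested_data: str,
                                    object_identifiers: List[str],
                                    context_time: Optional[str] = None,
                                    context_location: Optional[str] = None
                                    ) -> ImageShortcut:
    """Combine the parsed request and object identifier(s) into an
    image shortcut setting that can be stored for later matching."""
    return ImageShortcut(object_identifiers=object_identifiers,
                         requested_data=requested_data,
                         context_time=context_time,
                         context_location=context_location)

setting = generate_image_shortcut_setting("weather", ["sky"])
print(setting["requested_data"], setting["object_identifiers"])  # weather ['sky']
```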
Fig. 8 is a block diagram of an
The user interface input devices 822 may include a keyboard, a pointing device such as a mouse, trackball, touchpad, or tablet, a scanner, a touch screen incorporated into the display, an audio input device such as a voice recognition system, microphone, and/or other types of input devices. In general, use of the term "input device" is intended to include all possible types of devices and methods for inputting information into
User
These software modules are typically executed by
Where the systems described herein collect personal information about users (or "participants" as referenced herein), or may make use of personal information, the users may be provided with an opportunity to control whether programs or features collect user information (e.g., information about a user's social network, social actions or activities, profession, preferences, or current geographic location), or to control whether and/or how to receive content from a content server that may be more relevant to the user. Moreover, certain data may be processed in one or more ways before it is stored or used, so that personally identifiable information is removed. For example, a user's identity may be processed so that no personally identifiable information can be determined for the user, or a user's geographic location may be generalized where geographic location information is obtained (e.g., to a city, ZIP code, or state level) so that a particular geographic location of the user cannot be determined. Thus, the user may have control over how information about the user is collected and/or used.
While several embodiments have been described and illustrated herein, various other means and/or structures for performing the functions and/or obtaining the results and/or one or more of the advantages described herein may be utilized, and each such variation and/or modification is deemed to be within the scope of the embodiments described herein. More generally, all parameters, dimensions, materials, and configurations described herein are meant to be exemplary, and the actual parameters, dimensions, materials, and/or configurations will depend upon the specific application for which the present teachings are used. Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments described herein. It is, therefore, to be understood that the foregoing embodiments are presented by way of example only and that, within the scope of the appended claims and equivalents thereto, embodiments may be practiced otherwise than as specifically described and claimed. Embodiments of the present disclosure are directed to each individual feature, system, article, material, kit, and/or method described herein. In addition, any combination of two or more such features, systems, articles, materials, kits, and/or methods, if such features, systems, articles, materials, kits, and/or methods are not mutually inconsistent, is included within the scope of the present disclosure.