Artificial intelligence system for inferring realistic intent
Reading note: This technology, "Artificial intelligence system for inferring realistic intent", was designed by P. N. Bennett, M. M. Hasegawa, N. Ghotbi, R. W. White, and A. Jha on 2019-02-05. Its main content covers techniques that enable an artificial intelligence system to infer realistic intent from user input and to automatically suggest and/or perform actions associated with the predicted intent. In one aspect, a core task description is extracted from an actionable statement identified as containing realistic intent. A machine classifier receives the core task description, the actionable statement, and the user input, and predicts an intent category for the user input. The machine classifier can be trained using unsupervised learning techniques based on weakly labeled clusters of core task descriptions extracted from a training corpus. The core task description may include a verb-object pair.
1. A method for a computing device to digitally perform an action in response to a user input, the method comprising:
identifying an actionable statement from the user input;
extracting a core task description from the actionable statement, the core task description including a verb entity and an object entity;
assigning an intent category to the actionable statement by providing features to a machine classifier, the features including the actionable statement and the core task description; and
performing at least one action associated with the assigned intent category on the computing device.
2. The method of claim 1, further comprising:
displaying the at least one action associated with the assigned intent category to the user; and
receiving a user approval prior to performing the at least one action.
3. The method of claim 1, wherein the verb entity includes at least one symbol from the actionable statement that represents a task action, and the object entity includes at least one symbol from the actionable statement that represents an object to which the task action is applied.
4. The method of claim 1, wherein the identifying the actionable statement comprises: applying a commitment classifier or a request classifier to the user input.
5. The method of claim 1, wherein the at least one action comprises: launching an agent application on the computing device.
6. The method of claim 1, wherein the features further comprise contextual features independent of the user input, the contextual features derived from previous use of the device by a user or from parameters associated with a user profile or a group model.
7. The method of claim 1, further comprising: training the machine classifier using weak supervision, the training comprising:
identifying a training sentence from each of a plurality of corpus items;
extracting a training description from each of the training sentences;
grouping the training descriptions into a plurality of clusters by textual similarity;
receiving an annotation of an intent associated with each of the plurality of clusters; and
training the machine classifier to map each identified training sentence to a corresponding annotated intent.
8. The method of claim 7, wherein the verb entities comprise symbols from respective training sentences representing task actions and the object entities comprise symbols from respective actionable sentences representing objects to which the task actions apply, the grouping of the training descriptions comprising:
grouping the training descriptions into a first set of clusters based on textual similarity of respective object entities; and
refining the first set of clusters into a second set of clusters based on textual similarity of the corresponding verb entities.
9. An apparatus for digitally performing an action in response to a user input, the apparatus comprising:
a recognizer module configured to recognize an actionable statement from the user input;
an extraction module configured to extract a core task description from the actionable statement, the core task description including a verb entity and an object entity; and
a machine classifier configured to assign an intent category to the actionable statement based on features including the actionable statement and the core task description;
wherein the apparatus is configured to perform at least one action associated with the assigned intent category.
10. An apparatus comprising a processor and a memory, the memory storing instructions executable by the processor to cause the processor to:
identify an actionable statement from a user input;
extract a core task description from the actionable statement, the core task description including a verb entity and an object entity;
assign an intent category to the actionable statement by providing features to a machine classifier, the features including the actionable statement and the core task description; and
perform, using the processor, at least one action associated with the assigned intent category.
Background
Modern personal computing devices, such as smartphones and personal computers, increasingly have the ability to support complex computing systems, such as Artificial Intelligence (AI) systems that interact with human users in novel ways. One application of AI is intent inference, where a device may infer certain types of user intent (referred to as "real-world intent") by analyzing the content of a user communication, and further take relevant and timely action in response to the inferred intent without requiring the user to issue any explicit command.
The design of AI systems for intent inference requires novel and efficient processing techniques for training and implementing machine classifiers, as well as techniques for interfacing AI systems with agent applications to perform external actions in response to inferred intents.
Drawings
FIG. 1 illustrates an exemplary embodiment of the present disclosure in which User A and User B engage in a messaging session using a chat application.
FIG. 2 illustrates an alternative exemplary embodiment of the present disclosure in which a user composes a new email message using an email client on a device.
FIG. 3 illustrates an alternative exemplary embodiment of the present disclosure in which a user has a voice conversation with a digital assistant running on a device.
FIG. 4 illustrates exemplary actions that a digital assistant may take in response to the scenario of FIG. 1, in accordance with the present disclosure.
FIG. 5 illustrates an exemplary embodiment of a method for processing user input to identify intent-to-perform task statements, predict intent, and/or suggest and execute actionable tasks in accordance with the present disclosure.
FIG. 6 illustrates an exemplary embodiment of an Artificial Intelligence (AI) module for implementing the method of FIG. 5.
FIG. 7 illustrates an exemplary embodiment of a method for training a machine classifier to predict intent classes of actionable statements given various input features.
FIGS. 8A, 8B, and 8C collectively show an illustrative example of training in accordance with the method of FIG. 7, illustrating certain aspects of the present disclosure.
FIG. 9 schematically shows other clusters and labeled intents that can be derived by processing corpus items in the manner described.
FIG. 10 illustrates an exemplary embodiment of a method according to the present disclosure.
FIG. 11 illustrates an exemplary embodiment of an apparatus according to the present disclosure.
FIG. 12 illustrates an alternative exemplary embodiment of an apparatus according to the present disclosure.
Detailed Description
Various aspects of the technology described herein are generally directed to techniques for inferring realistic intent from user input to a digital device. In the present specification and claims, a realistic intent is a user intent that gives rise to a task with which the device is able to assist the user (referred to herein as an "actionable task"). An actionable statement is a statement of an actionable task.
In one aspect, actionable statements are identified from user input, and core task descriptions are extracted from the actionable statements. A machine classifier predicts the intent category of each actionable statement based on the core task description, the user input, and other contextual features. The machine classifier can be trained using supervised or unsupervised learning techniques, e.g., based on weakly labeled clusters of core task descriptions extracted from a training corpus. In one aspect, clustering may be based on textual and semantic similarity of verb-object pairs in the core task descriptions.
The detailed description set forth below in connection with the appended drawings is intended as a description of exemplary aspects, where "exemplary" means "serving as an example, instance, or illustration" and is not necessarily to be construed as preferred or advantageous over other exemplary aspects. The detailed description includes specific details for the purpose of providing a thorough understanding of the exemplary aspects of the invention. It will be apparent to one of ordinary skill in the art that the exemplary aspects of the present invention may be practiced without these specific details. In some instances, well-known structures and devices are shown in block diagram form in order to avoid obscuring the novelty of the exemplary aspects presented herein.
FIGS. 1, 2, and 3 illustrate exemplary embodiments of the present disclosure. It should be noted that the illustrated embodiments are for illustrative purposes only and are not meant to limit the scope of the present disclosure to any particular application, scenario, context, or platform to which the disclosed techniques may be applied.
FIG. 1 illustrates an exemplary embodiment of the present disclosure in which User A and User B participate in a
At this point, in order to follow through on the intent of obtaining tickets, User A would typically leave the chat session temporarily and manually perform some other task, such as opening a web browser to find movie showtimes, opening another application to purchase movie tickets, or making a telephone call to a movie theater. User A may also configure his device to remind him later of the task of purchasing tickets, or reserve time on his calendar to watch the movie.
In the foregoing scenario, it is desirable to provide the device (User A's or User B's device) with the ability to automatically identify actionable tasks, for example, for retrieving movie ticket information from the content of the
FIG. 2 illustrates an alternative exemplary embodiment of the present disclosure in which a user composes an email message and prepares it for transmission using an email client on a device (not explicitly shown in FIG. 2). Referring to the contents of
In such a scenario, it may be desirable to provide Dana's device with the ability to identify the presence of an actionable task in
FIG. 3 illustrates an alternative exemplary embodiment of the present disclosure in which a user 302 has a voice conversation 300 with a digital assistant (referred to herein as "DA") executing on a device 304. In an exemplary embodiment, the DA may correspond to, for example, the Cortana digital assistant from Microsoft Corporation. It should be noted that in FIG. 3, the text shown may correspond to the content of the speech exchanged between the user 302 and the DA. It is further noted that while an explicit request is made to the DA in dialog 300, it should be understood that the techniques of this disclosure may also be applied to identify actionable statements from user input that is not explicitly directed to the DA or intent inference system, for example, as shown by
Referring to dialog 300, user 302 may explicitly request at block 310 that the DA schedule a tennis lesson with a tennis coach next week. Based on the user input at block 310, the DA 304 identifies the actionable task of scheduling a tennis lesson and confirms details of the task to be performed at block 320.
To perform the appointment task, the DA 304 can further retrieve the specific action required and perform it. For example, DA 304 may automatically launch an appointment scheduling application on a device (not shown), schedule the lesson with the tennis coach John, and confirm the appointment. The performance of the task may be further informed by specific contextual parameters available to the DA 304 (e.g., the identity of the tennis coach obtained from previous appointments, appropriate lesson times based on the user's previous appointments and/or the user's digital calendar, etc.).
From dialog 300, it should be appreciated that the intent inference system can desirably supplement and customize any identified actionable task with implicit contextual details, for example, parameters available from a user's cumulative interactions with a device, parameters of the user's digital profile, parameters of the digital profile of another user with whom the user is currently communicating, and/or parameters of one or more group (cohort) models, as described further herein below. For example, based on a history of previous events scheduled by the user through the device, some additional details of the user's current intent may be inferred (e.g., preferred times for tennis lessons to be scheduled, preferred tennis coaches, preferred movie theaters, preferred applications for creating expense reports, etc.).
In an exemplary aspect, theater suggestions can be further based on the location of the device (obtained from, for example, a device geolocation system or a user profile) and/or on preferred theaters that the user visits frequently, as learned from previous tasks performed by a scheduling application or by the device. Further, the contextual features can include the identity of the device through which the user communicates with the AI system. For example, an appointment scheduled from a smartphone is more likely to be a personal appointment, while an appointment scheduled from a personal computer used for work is more likely to be a work appointment.
In an exemplary embodiment, a group (cohort) model may also be used to inform the intent inference system. In particular, a group model corresponds to one or more profiles created for users who are similar to the current user along one or more dimensions. Such a group model may be particularly useful when information about the current user is sparse, for example, because the current user was newly added.
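By way of illustration only, one way a group model could compensate for sparse user history is to blend per-user counts with group-level proportions, so that the group dominates for a new user and fades as the user's own history grows. The disclosure does not specify how the two are combined; the additive-smoothing scheme, the function name `blended_preference`, and the `prior_strength` parameter below are all assumptions.

```python
from typing import Mapping


def blended_preference(
    user_counts: Mapping[str, int],
    cohort_counts: Mapping[str, int],
    prior_strength: float = 5.0,
) -> dict[str, float]:
    """Score each option by mixing user history with a cohort prior.

    Assumption: the cohort acts like `prior_strength` pseudo-observations
    distributed according to the cohort's proportions.
    """
    user_total = sum(user_counts.values())
    cohort_total = sum(cohort_counts.values()) or 1  # avoid division by zero
    options = set(user_counts) | set(cohort_counts)
    scores = {}
    for option in options:
        cohort_share = cohort_counts.get(option, 0) / cohort_total
        scores[option] = (
            user_counts.get(option, 0) + prior_strength * cohort_share
        ) / (user_total + prior_strength)
    return scores
```

With an empty user history the scores equal the cohort proportions; with a long user history the user's own counts dominate.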
In accordance with the foregoing examples, it is desirable to provide a device running an AI system with the ability to identify the presence of an actionable statement in user input, classify the intent behind the actionable statement, and further automatically perform a particular operation associated with the actionable statement. It is further desirable that the identification and execution of tasks be informed by contextual features that may be available to the device, and that user feedback on the classified intent be accepted to improve the relevance and accuracy of intent inference and task execution.
Fig. 4 illustrates exemplary actions that may be performed by the AI system in response to the
In particular, after the
FIG. 5 illustrates an exemplary embodiment of a method 500 for processing user input to identify intent-to-perform task statements, predict intent, and/or suggest and execute actionable tasks in accordance with the present disclosure. It should be understood that method 500 may be performed in an AI system running on the same device or devices used to support the features described above with reference to FIGS. 1-4, or in a combination of these devices with other online or offline computing facilities.
In FIG. 5, at block 510, user input (or "input") is received. In an exemplary embodiment, the user input may include any data or stream of data received by the computing device through a User Interface (UI). Such input may include, for example, text, speech, static or dynamic images containing gestures (e.g., sign language), facial expressions, and so forth. In some exemplary embodiments, the device may receive and process the input in real-time, for example, as the user generates and inputs data to the device. Alternatively, the data may be stored and centrally processed after being received through the UI.
At block 520, the method 500 identifies the presence of one or more actionable statements in the user input. In particular, block 520 may mark one or more segments of user input as containing actionable statements. It should be noted that in the present specification and claims, the terms "identify" or "identifying" as used in the context of block 520 may refer to the identification of actionable statements in user input, but does not include predicting the actual intent behind such statements or associating predicted intent with operations that may be performed at a subsequent stage of method 500.
For example, referring to the
In an exemplary embodiment, such identification may be performed using any of a variety of techniques. For example, a commitment classifier for identifying commitments (i.e., one type of actionable statement) can be applied as described in U.S. patent application No. 14/714,109, entitled "Management of Commitments and Requests from Communications and Content," filed on May 15, 2015, and U.S. patent application No. 14/714,137, entitled "Automatic Extraction of Commitments and Requests from Communications and Content," filed on May 15, 2015. In alternative exemplary embodiments, identification may apply Conditional Random Fields (CRFs) or other (e.g., neural) extraction models to the user input, and is not limited to classifiers. In alternative exemplary embodiments, sentence splitting may be used to process user input such as text, and classification models may be trained to identify the presence of actionable task statements using supervised or unsupervised labels. In alternative exemplary embodiments, a request classifier or other type of classifier may be applied to extract alternative types of actionable statements. It is contemplated that such alternative exemplary embodiments also fall within the scope of the present disclosure.
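As a minimal sketch of the sentence-splitting variant described above, the following splits text into sentences and flags each one as a commitment or a request. The cue-phrase lists and the function names are hypothetical stand-ins for a trained commitment or request classifier, which the disclosure contemplates but whose internals are not given here.

```python
import re

# Hypothetical cue phrases for illustration only; a trained commitment or
# request classifier would take the place of these fixed lists.
COMMITMENT_CUES = ("i will", "i'll", "let me", "i can")
REQUEST_CUES = ("can you", "could you", "please", "would you")


def split_sentences(text: str) -> list[str]:
    """Naively split text after terminal punctuation followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def identify_actionable(text: str) -> list[tuple[str, str]]:
    """Return (sentence, kind) pairs for sentences flagged as actionable."""
    flagged = []
    for sentence in split_sentences(text):
        lowered = sentence.lower()
        if any(cue in lowered for cue in COMMITMENT_CUES):
            flagged.append((sentence, "commitment"))
        elif any(cue in lowered for cue in REQUEST_CUES):
            flagged.append((sentence, "request"))
    return flagged
```

Note that this stage only marks segments as actionable; it does not predict the intent behind them, which is left to a later stage.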
At block 530, a core task description is extracted from the identified actionable statement. In an exemplary embodiment, the core task descriptions may correspond to a subset of symbols (e.g., words or phrases) extracted from the actionable statement, where the extracted subset is selected to help predict the intent behind the actionable statement.
In an exemplary embodiment, the core task description may include verb entities and object entities, also referred to herein as "verb-object pairs," extracted from actionable statements. The verb entity includes one or more symbols (e.g., words) that capture an action (referred to herein as a "task action"), and the object entity includes one or more symbols that represent objects to which the task action applies. It should be noted that a verb entity may generally include one or more verbs, but need not include all verbs in a sentence. The object entity may include a noun or noun phrase.
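For illustration, a core task description can be represented as a small record holding the verb entity and the object entity. The heuristic extractor below is only a sketch under stated assumptions: the lexicons `TASK_VERBS`, `STOP_WORDS`, and `BOUNDARY_WORDS` are hypothetical, and a practical system would more likely rely on a syntactic parser or a learned extraction model.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical lexicons for illustration; none of these come from the source.
TASK_VERBS = {"send", "email", "buy", "schedule", "book", "write", "forward"}
STOP_WORDS = {"a", "an", "the", "to", "you", "me", "my", "our", "some"}
BOUNDARY_WORDS = {"for", "on", "at", "by", "with", "in", "today", "tomorrow", "next"}


@dataclass(frozen=True)
class CoreTaskDescription:
    verb: str  # verb entity: word(s) capturing the task action
    obj: str   # object entity: word(s) naming what the action applies to


def extract_core_task(statement: str) -> Optional[CoreTaskDescription]:
    """Pick the first known task verb and the noun-like words that follow it."""
    tokens = [t.strip(".,!?;:").lower() for t in statement.split()]
    tokens = [t for t in tokens if t]
    for i, token in enumerate(tokens):
        if token not in TASK_VERBS:
            continue
        obj_tokens = []
        for following in tokens[i + 1:]:
            if following in BOUNDARY_WORDS:
                break  # stop at prepositions and time expressions
            if following in STOP_WORDS:
                continue  # skip determiners and pronouns
            obj_tokens.append(following)
        if obj_tokens:
            return CoreTaskDescription(verb=token, obj=" ".join(obj_tokens))
    return None
```

The object entity may span several words, which is consistent with the observation that a verb-object pair need not be exactly two words.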
Verb-object pairs are not limited to combinations of only two words. For example, "emailing an expense report" may be a verb-object pair extracted from
In alternative exemplary embodiments, blocks 520 and 530 may be performed as a single functional block, and such alternative exemplary embodiments are contemplated to fall within the scope of the present disclosure. For example, block 520 may be considered a classification operation, while block 530 may be considered a sub-classification operation, where the intent is considered to be part of an activity taxonomy. In particular, if the user commits to taking an action, the sentence may be classified as a "commitment" at block 520, and block 530 may sub-classify the commitment as, for example, "intent to send email" (if the verb-object pair corresponds to "send email" or "send daily update email").
At block 540, a machine classifier is used to predict the intent carried by the identified actionable statement by assigning the statement an intent category. In particular, the machine classifier may receive features such as the actionable statement, other segments of the user input in addition to and/or including the actionable statement, the core task description extracted at block 530, and so forth. The machine classifier may further utilize other features to make predictions, such as contextual features, which include features that are independent of the user input (e.g., features derived from a user's previous use of the device or from parameters associated with a user profile or group model).
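The feature groups listed above can be sketched as a single sparse feature mapping. The key prefixes (`stmt:`, `verb:`, `obj:`, `input:`, `context:`) and the function name `build_features` are illustrative assumptions; the disclosure does not prescribe any particular feature encoding.

```python
from typing import Mapping, Optional


def _words(text: str) -> list[str]:
    """Lowercase, whitespace-split, and strip surrounding punctuation."""
    words = (w.strip(".,!?;:") for w in text.lower().split())
    return [w for w in words if w]


def build_features(
    statement: str,
    verb: str,
    obj: str,
    surrounding_text: str = "",
    context: Optional[Mapping[str, str]] = None,
) -> dict[str, float]:
    """Assemble binary features from the four sources named in the text."""
    features: dict[str, float] = {}
    for word in _words(statement):          # the actionable statement itself
        features[f"stmt:{word}"] = 1.0
    features[f"verb:{verb}"] = 1.0           # core task description
    features[f"obj:{obj}"] = 1.0
    for word in _words(surrounding_text):   # other segments of the user input
        features[f"input:{word}"] = 1.0
    for key, value in (context or {}).items():  # input-independent context
        features[f"context:{key}={value}"] = 1.0
    return features
```

Keeping the sources in separate key namespaces lets a classifier weigh, say, a word in the core task description differently from the same word elsewhere in the input.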
Based on these features, the machine classifier can assign an actionable statement to one of a plurality of intent categories, i.e., it can "label" the actionable statement using the intent categories. For example, for the
At block 550, the method 500 proposes and/or performs an action associated with the intent predicted at block 540. For example, the associated action may be displayed on the UI of the device and the user may be asked to confirm the suggested action for execution. The device may then perform the approved action.
In an exemplary embodiment, the particular actions associated with any intent may be pre-configured by the user, or they may be derived from a database of intent-to-action mappings available to the AI system. In an exemplary embodiment, the method 500 may be enabled to launch and/or configure one or more agent applications on the computing device to perform associated actions, thereby extending the scope of actions that the AI system may accommodate. For example, in the
In an exemplary embodiment, once an associated task is identified, the task may be enriched by adding an action link that connects to an application, service, or skill that may be used to complete the action. The recommended actions may be presented in various ways through the UI (e.g., inline or in card form), and the user may be invited to select one or more actions per task. The AI system may support performing the selected actions and may provide links, or links containing preprogrammed parameters, to other applications along with the task payload. In an exemplary embodiment, the responsibility for performing the details of certain actions may be delegated to an agent application based on agent capabilities and/or user preferences.
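The suggest-then-perform flow can be sketched as a lookup in an intent-to-action table followed by a user approval step. The table contents, intent names, and action names below are hypothetical, and the `approve` and `execute` callables stand in for the UI prompt and the agent application, respectively.

```python
from typing import Callable

# Hypothetical intent-to-action mapping; in practice this could be configured
# by the user or drawn from a database available to the AI system.
INTENT_TO_ACTIONS: dict[str, list[str]] = {
    "purchase_tickets": ["open_ticketing_app", "add_calendar_reminder"],
    "send_email": ["open_email_draft"],
}


def suggest_actions(intent: str) -> list[str]:
    """Return the actions associated with an intent (empty if unknown)."""
    return list(INTENT_TO_ACTIONS.get(intent, []))


def perform_approved(
    intent: str,
    approve: Callable[[str], bool],
    execute: Callable[[str], None],
) -> list[str]:
    """Offer each suggested action and execute only those the user approves."""
    performed = []
    for action in suggest_actions(intent):
        if approve(action):
            execute(action)  # e.g., delegate to an agent application
            performed.append(action)
    return performed
```

Separating approval from execution keeps the user in control of which suggested actions are actually carried out.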
At block 560, user feedback regarding the relevance and/or accuracy of the predicted intent and/or associated action is received. In an exemplary embodiment, such feedback may include, for example, explicit user confirmation of a suggested task (direct positive feedback), user rejection of an action suggested by the AI system (direct negative feedback), or user selection of an action or task other than the one suggested by the AI system (indirect negative feedback).
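The three kinds of feedback named above can be captured in a small enumeration. The function signature and the convention of returning `None` when the user gives no signal are assumptions made for this sketch.

```python
from enum import Enum
from typing import Optional


class FeedbackKind(Enum):
    DIRECT_POSITIVE = "direct_positive"      # user confirmed the suggestion
    DIRECT_NEGATIVE = "direct_negative"      # user rejected the suggestion
    INDIRECT_NEGATIVE = "indirect_negative"  # user chose something else


def classify_feedback(
    suggested: str,
    chosen: Optional[str] = None,
    rejected: bool = False,
) -> Optional[FeedbackKind]:
    """Map a user's response to a suggested action onto a feedback kind."""
    if rejected:
        return FeedbackKind.DIRECT_NEGATIVE
    if chosen is None:
        return None  # no signal either way
    if chosen == suggested:
        return FeedbackKind.DIRECT_POSITIVE
    return FeedbackKind.INDIRECT_NEGATIVE
```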
At block 570, the user feedback obtained at block 560 may be used to refine the machine classifier. In an exemplary embodiment, refinement of the machine classifier may be performed as described herein below with reference to FIG. 7.
Fig. 6 illustrates an exemplary embodiment of an Artificial Intelligence (AI)
In fig. 6, the
The
Actionable statement 620a is coupled to
Actionable statements 620a, core task descriptions 622a, and other portions of user input 610a may be coupled as input features to a
In an exemplary embodiment, the
Intent category 624a is provided to task suggestion/
The
FIG. 7 illustrates an exemplary embodiment of a
At
At
At
At
In an exemplary embodiment where the training description includes verb-object pairs, clustering may be performed in two or more stages, where in an initial stage, the pairs sharing similar object entities are combined together. For example, for a single object "email," a person may "write," "send," "delete," "forward," "draft," "pass," "work on," and so forth. Thus, in a first phase, all such verb-object pairs (e.g., "write email," "send email," etc.) that share the object "email" may be grouped into the same cluster.
Thus, in a first stage of clustering, training descriptions may first be grouped into a first set of clusters based on textual similarity of the corresponding objects. Subsequently, in a second stage, the first set of clusters can be refined into a second set of clusters based on textual similarity of the respective verbs. Refinement at the second stage may include: for example, the training descriptions are reassigned from the first set of clusters to different clusters, the training descriptions are removed from the first set of clusters, new clusters are created, and so on.
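The two-stage grouping can be sketched with a greedy clustering routine applied first to object entities and then, within each resulting cluster, to verb entities. The use of `difflib.SequenceMatcher` as the textual similarity measure, the greedy assignment rule, and the 0.6 threshold are assumptions; the disclosure does not name a specific similarity measure or clustering algorithm.

```python
from difflib import SequenceMatcher
from typing import Callable

Pair = tuple[str, str]  # (verb entity, object entity)


def similarity(a: str, b: str) -> float:
    """Character-level textual similarity in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def greedy_cluster(
    items: list[Pair], key: Callable[[Pair], str], threshold: float
) -> list[list[Pair]]:
    """Add each item to the first cluster whose first member is similar enough."""
    clusters: list[list[Pair]] = []
    for item in items:
        for cluster in clusters:
            if similarity(key(cluster[0]), key(item)) >= threshold:
                cluster.append(item)
                break
        else:
            clusters.append([item])  # no match: start a new cluster
    return clusters


def two_stage_cluster(pairs: list[Pair], threshold: float = 0.6) -> list[list[Pair]]:
    """Stage 1 groups by object entity; stage 2 refines each group by verb."""
    first_set = greedy_cluster(pairs, key=lambda p: p[1], threshold=threshold)
    second_set: list[list[Pair]] = []
    for cluster in first_set:
        second_set.extend(greedy_cluster(cluster, key=lambda p: p[0], threshold=threshold))
    return second_set
```

Character-level similarity groups inflected forms such as "write" and "writing" but not synonyms such as "write" and "draft"; grouping those would require a semantic similarity measure.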
After
Each of the plurality of clusters may be further manually labeled or annotated by a human operator at
At
At
In an exemplary embodiment, the features of the machine classifier may include derived features, such as the identified actionable statements, and/or other text taken from the context of the actionable statements. The features may further include training descriptions, related context from the overall corpus item, information from metadata of the communication corpus item, or information from similar task descriptions.
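As a sketch of weakly supervised training, each training sentence can inherit the intent annotated for the cluster that contains its training description, after which any standard classifier can be fit on the resulting pairs. The multinomial Naive Bayes model below is a stand-in chosen for brevity, not the classifier of the disclosure, and the data layout (`sentence`, `description`, `cluster_id`) is an assumption.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesIntentClassifier:
    """Multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayesIntentClassifier":
        self.label_counts = Counter(labels)
        self.token_counts: dict[str, Counter] = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.token_counts[label].update(tokenize(text))
        self.vocab = {t for counts in self.token_counts.values() for t in counts}
        return self

    def predict(self, text: str) -> str:
        total = sum(self.label_counts.values())
        scores = {}
        for label, count in self.label_counts.items():
            denom = sum(self.token_counts[label].values()) + len(self.vocab)
            score = math.log(count / total)  # log prior
            for token in tokenize(text):
                score += math.log((self.token_counts[label][token] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


def train_from_weak_labels(
    examples: list[tuple[str, str, int]],  # (sentence, description, cluster_id)
    cluster_intents: dict[int, str],       # one annotation per cluster
) -> NaiveBayesIntentClassifier:
    """Propagate each cluster's annotated intent to its sentences, then fit."""
    texts = [f"{sentence} {description}" for sentence, description, _ in examples]
    labels = [cluster_intents[cluster_id] for _, _, cluster_id in examples]
    return NaiveBayesIntentClassifier().fit(texts, labels)
```

The point of the weak supervision is that annotators label a handful of clusters rather than every sentence, so the labeling effort scales with the number of clusters, not the size of the corpus.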
Fig. 8A, 8B, and 8C collectively illustrate an illustrative example of training in accordance with
In fig. 8A, the plurality (N) of example corpus items received at
At block 820, the existence of actionable statements is identified from item 1 in text 810, according to
At block 830, a training description is extracted from the actionable statement in accordance with
At block 832, the training descriptions are clustered according to
As shown in FIG. 7, training blocks 710-732 are repeated over many corpus items. Cluster 1 (834) illustratively shows a final sample cluster containing four training descriptions according to the execution of the
The clusters 834a, 835 of fig. 8B illustrate how the clusters are manually refined according to the
At block 836, each labeled cluster may be associated with one or more actions, per
Fig. 8C illustrates training 824 the
In an exemplary embodiment, the user feedback may be used to further refine the performance of the methods and AI systems described herein. Referring back to FIG. 7,
In particular, block 760 relates to a type of user feedback in which the user indicates that one or more actionable statements identified by the AI system are not actually actionable statements, i.e., they contain no real-world intent. For example, when presenting a set of actions that the AI system performs in response to user input, the user may select an option that states that the identified sentence does not actually constitute an actionable sentence. In this case, such user feedback may be incorporated to adjust one or more parameters of
In an exemplary embodiment,
In an exemplary embodiment, during training of the machine classifier (e.g., at
FIG. 10 illustrates an exemplary embodiment of a method 1000 for causing a computing device to digitally perform an action in response to a user input. It should be noted that FIG. 10 is shown for illustrative purposes only and is not meant to limit the scope of the present disclosure.
In FIG. 10, at block 1010, actionable statements are identified from user input.
At block 1020, a core task description is extracted from the actionable statement. The core task description may include a verb entity and an object entity.
At block 1030, intent categories are assigned to actionable statements by providing features to the machine classifier, the features including actionable statements and core task descriptions.
At block 1040, at least one action associated with the assigned intent category is performed on the computing device.
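The data flow of blocks 1010 through 1040 can be sketched as a small driver that chains four pluggable stages. The function name and the callable signatures are assumptions for illustration; each stage could be backed by any of the techniques described earlier.

```python
from typing import Callable, Optional, TypeVar

D = TypeVar("D")  # type of a core task description
R = TypeVar("R")  # type of an action result


def respond_to_user_input(
    user_input: str,
    identify: Callable[[str], list[str]],
    extract: Callable[[str], Optional[D]],
    classify: Callable[[str, D], str],
    perform: Callable[[str], R],
) -> list[R]:
    """Run identification, extraction, classification, and action in order."""
    results = []
    for statement in identify(user_input):          # block 1010
        description = extract(statement)            # block 1020
        if description is None:
            continue  # no verb-object pair found; nothing to classify
        intent = classify(statement, description)   # block 1030
        results.append(perform(intent))             # block 1040
    return results
```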
Fig. 11 shows an exemplary embodiment of a
Fig. 12 shows an
In the present specification and claims, it will be understood that when an element is referred to as being "connected to" or "coupled to" another element, it can be directly connected or coupled to the other element or intervening elements may be present. In contrast, when an element is referred to as being "directly connected to" or "directly coupled to" another element, there are no intervening elements present. Further, when an element is referred to as being "electrically coupled" to another element, it is meant that a low resistance path exists between the elements, whereas when an element is referred to as simply being "coupled" to another element, there may or may not be a low resistance path between the elements.
The functions described herein may be performed, at least in part, by one or more hardware and/or software logic components. By way of example, and not limitation, exemplary types of hardware logic components that may be used include Field-Programmable Gate Arrays (FPGAs), Application-Specific Integrated Circuits (ASICs), Application-Specific Standard Products (ASSPs), System-on-a-Chip systems (SOCs), Complex Programmable Logic Devices (CPLDs), and so forth.
While certain illustrated embodiments have been shown in the drawings and have been described above in detail, the invention is susceptible to various modifications and alternative constructions. It should be understood, however, that there is no intention to limit the invention to the specific forms disclosed, but on the contrary, the intention is to cover all modifications, alternative constructions, and equivalents falling within the spirit and scope of the invention.