Artificial intelligence system for inferring realistic intent

Document No.: 1102612    Publication date: 2020-09-25

This technology, "Artificial intelligence system for inferring realistic intent," was created by P. N. Bennett, M. M. Hasegawa, N. Ghotbi, R. W. White, and A. Jha on 2019-02-05. Abstract: Techniques that enable an artificial intelligence system to infer realistic intent from user input and to automatically suggest and/or perform actions associated with the predicted intent. In one aspect, a core task description is extracted from an actionable statement identified as containing realistic intent. A machine classifier receives the core task description, the actionable statement, and the user input to predict an intent category for the user input. The machine classifier may be trained using unsupervised learning techniques based on weakly labeled clusters of core task descriptions extracted from a training corpus. The core task description may include a verb-object pair.

1. A method for a computing device to digitally perform an action in response to a user input, the method comprising:

identifying an actionable statement from the user input;

extracting a core task description from the actionable statement, the core task description including a verb entity and an object entity;

assigning an intent category to the actionable statement by providing features to a machine classifier, the features including the actionable statement and the core task description; and

performing at least one action associated with the assigned intent category on the computing device.

2. The method of claim 1, further comprising:

displaying the at least one action associated with the assigned intent category to the user; and

receiving a user approval prior to performing the at least one action.

3. The method of claim 1, wherein the verb entity includes at least one token from the actionable statement that represents a task action, and the object entity includes at least one token from the actionable statement that represents an object to which the task action is applied.

4. The method of claim 1, wherein identifying the actionable statement comprises applying a commitment classifier or a request classifier to the user input.

5. The method of claim 1, wherein the at least one action comprises launching an agent application on the computing device.

6. The method of claim 1, wherein the features further comprise contextual features independent of the user input, the contextual features being derived from the user's previous use of the computing device or from parameters associated with a user profile or a cohort model.

7. The method of claim 1, further comprising: training the machine classifier using weak supervision, the training comprising:

identifying a training sentence from each of a plurality of corpus items;

extracting a training description from each of the training sentences;

grouping the training descriptions into a plurality of clusters by textual similarity;

receiving an annotation of an intent associated with each of the plurality of clusters; and

training the machine classifier to map each identified training sentence to the corresponding annotated intent.

8. The method of claim 7, wherein each training description comprises a verb entity and an object entity, the verb entity comprising tokens from the respective training sentence that represent a task action and the object entity comprising tokens from the respective training sentence that represent an object to which the task action is applied, and wherein grouping the training descriptions comprises:

grouping the training descriptions into a first set of clusters based on textual similarity of respective object entities; and

refining the first set of clusters into a second set of clusters based on textual similarity of the corresponding verb entities.

9. An apparatus for digitally performing an action in response to a user input, the apparatus comprising:

a recognizer module configured to identify an actionable statement from the user input;

an extraction module configured to extract a core task description from the actionable statement, the core task description including a verb entity and an object entity; and

a machine classifier configured to assign an intent category to the actionable statement based on features including the actionable statement and the core task description;

wherein the apparatus is configured to perform at least one action associated with the assigned intent category.

10. An apparatus comprising a processor and a memory, the memory storing instructions executable by the processor to cause the processor to:

identify an actionable statement from a user input;

extract a core task description from the actionable statement, the core task description including a verb entity and an object entity;

assign an intent category to the actionable statement by providing features to a machine classifier, the features including the actionable statement and the core task description; and

perform at least one action associated with the assigned intent category.

Background

Modern personal computing devices, such as smartphones and personal computers, are increasingly able to support complex computing systems, such as artificial intelligence (AI) systems that interact with human users in novel ways. One application of AI is intent inference, in which a device infers certain types of user intent (referred to herein as "realistic intent") by analyzing the content of a user communication, and then takes relevant and timely action in response to the inferred intent without requiring the user to issue any explicit command.

The design of AI systems for intent inference requires novel and efficient processing techniques for training and implementing machine classifiers, as well as techniques for interfacing AI systems with agent applications to perform external actions in response to inferred intents.

Drawings

FIG. 1 illustrates an exemplary embodiment of the present disclosure in which user A and user B engage in a messaging session using a chat application.

FIG. 2 illustrates an alternative exemplary embodiment of the present disclosure in which a user composes a new email message using an email client on a device.

FIG. 3 illustrates an alternative exemplary embodiment of the present disclosure in which a user has a voice conversation with a digital assistant running on a device.

FIG. 4 illustrates exemplary actions that a digital assistant may take in response to the scenario of FIG. 1, in accordance with the present disclosure.

FIG. 5 illustrates an exemplary embodiment of a method for processing user input to identify actionable task statements, predict intent, and/or suggest and perform actionable tasks in accordance with the present disclosure.

FIG. 6 illustrates an exemplary embodiment of an Artificial Intelligence (AI) module for implementing the method of FIG. 5.

FIG. 7 illustrates an exemplary embodiment of a method for training a machine classifier to predict intent classes of actionable statements given various input features.

FIGS. 8A, 8B, and 8C collectively show an illustrative example of training according to the method of FIG. 7, illustrating certain aspects of the present disclosure.

FIG. 9 schematically shows additional clusters, and the intents associated with them, that may be derived by processing corpus items in the manner described.

FIG. 10 illustrates an exemplary embodiment of a method according to the present disclosure.

FIG. 11 illustrates an exemplary embodiment of an apparatus according to the present disclosure.

FIG. 12 illustrates an alternative exemplary embodiment of an apparatus according to the present disclosure.

Detailed Description

Various aspects of the technology described herein are generally directed to techniques for inferring realistic intent from user input to a digital device. In the present specification and claims, a realistic intent is a user intent that gives rise to a task with which the device is able to assist the user (referred to herein as an "actionable task"). An actionable statement is a statement of an actionable task.

In one aspect, actionable statements are identified from user input, and core task descriptions are extracted from the actionable statements. A machine classifier predicts the intent category of each actionable statement based on the core task description, the user input, and other contextual features. The machine classifier may be trained using supervised or unsupervised learning techniques, e.g., based on weakly labeled clusters of core task descriptions extracted from a training corpus. In one aspect, clustering may be based on the textual and semantic similarity of the verb-object pairs in the core task descriptions.

The detailed description set forth below in connection with the appended drawings is intended as a description of exemplary aspects of the invention. The term "exemplary" as used herein means "serving as an example, instance, or illustration," and any aspect described as exemplary is not necessarily to be construed as preferred or advantageous over other exemplary aspects. The detailed description includes specific details for the purpose of providing a thorough understanding of the exemplary aspects of the invention. It will be apparent to one of ordinary skill in the art that the exemplary aspects of the invention may be practiced without these specific details. In some instances, well-known structures and devices are shown in block diagram form in order to avoid obscuring the novelty of the exemplary aspects presented herein.

Fig. 1, 2, and 3 illustrate exemplary embodiments of the present disclosure. It should be noted that the illustrated embodiments are for illustrative purposes only and are not meant to limit the scope of the present disclosure to any particular application, scenario, context, or platform to which the disclosed techniques may be applied.

FIG. 1 illustrates an exemplary embodiment of the present disclosure in which user A and user B participate in a digital messaging session 100 using personal computing devices (herein "devices," not explicitly shown in FIG. 1), e.g., smartphones, laptops, or desktop computers. Referring to the content of message session 100, user A and user B are discussing an upcoming movie. At 110, user B suggests seeing the movie "Super Hero III." At 120, user A offers to look for tickets to the Saturday showing of the movie.

At this point, to follow through on the intent to obtain tickets, user A would typically leave the chat session temporarily and manually perform other tasks, such as opening a web browser to look up movie showtimes, opening another application to purchase movie tickets, or calling the movie theater. User A might also configure his device to remind him later to purchase the tickets, or reserve time on his calendar to watch the movie.

In this scenario, it is desirable to give the device (user A's or user B's) the ability to automatically identify actionable tasks from the content of message session 100, e.g., obtaining movie tickets, and/or to automatically perform associated tasks (e.g., purchasing movie tickets, setting reminders, etc.).

FIG. 2 illustrates an alternative exemplary embodiment of the present disclosure in which a user composes an email message and prepares it for transmission using an email client on a device (not explicitly shown in FIG. 2). Referring to the contents of email 200, the sender (Dana Smith) confirms to the recipient (John Brown) in statement 210 that she will email him a 3-month expense report before the end of the week. After sending the email, Dana may, for example, open a word processing and/or spreadsheet application to edit the 3-month expense report. Alternatively or additionally, Dana may set a reminder on her device to prepare the expense report at a later time.

In such a scenario, it may be desirable to give Dana's device the ability to identify the presence of an actionable task in email 200 and/or to automatically launch an appropriate application to handle the task. Where possible, it may also be desirable to launch the application with appropriate template settings (e.g., an expense report prepopulated with data fields specific to the 3-month period), to send the recipient an email based on a previously prepared report, and so on.

FIG. 3 illustrates an alternative exemplary embodiment of the present disclosure in which a user 302 holds a voice conversation 300 with a digital assistant (herein "DA") executing on a device 304. In an exemplary embodiment, the DA may correspond to, for example, the Cortana digital assistant from Microsoft Corporation. Note that in FIG. 3, the text shown may correspond to the content of the speech exchanged between user 302 and the DA. Further, although the user in dialog 300 makes an explicit request to the DA, the techniques of this disclosure may also be applied to identify actionable statements in user input that is not explicitly directed to the DA or to an intent inference system, e.g., message session 100 and email 200 described above.

Referring to dialog 300, at block 310 user 302 explicitly asks the DA to schedule a tennis lesson with a tennis coach for next week. Based on the user input at block 310, DA 304 identifies the actionable task of scheduling a tennis lesson and, at block 320, confirms the details of the task to be performed.

To perform the scheduling task, DA 304 may further determine the specific actions required and perform them. For example, DA 304 may automatically launch an appointment scheduling application on a device (not shown), schedule the lesson with the tennis coach, John, and confirm the appointment. Performance of the task may be further informed by specific contextual parameters available to DA 304 (e.g., the identity of the tennis coach, obtained from previous appointments; suitable lesson times, based on the user's previous appointments and/or digital calendar; etc.).

As dialog 300 illustrates, an intent inference system can usefully supplement and customize any identified actionable task with implicit contextual details, e.g., parameters available from the user's cumulative interactions with the device, parameters of the user's digital profile, parameters of the digital profile of another user with whom the user is currently communicating, and/or parameters of one or more cohort models, as described further below. For example, based on a history of events previously scheduled by the user through the device, additional details of the user's current intent may be inferred (e.g., preferred times for tennis lessons, preferred tennis coaches, preferred movie theaters, preferred applications for creating expense reports, etc.).

In an exemplary aspect, theater suggestions may further be based on the location of the device, e.g., preferring theaters the user visits frequently, as determined from a device geolocation system, from a user profile, and/or from previous tasks performed by a scheduling application or the device. Further, the contextual features may include the identity of the device through which the user communicates with the AI system. For example, an appointment scheduled from a smartphone is more likely to be a personal appointment, while an appointment scheduled from a work computer is more likely to be a work appointment.

In an exemplary embodiment, cohort models may also be used to inform the intent inference system. In particular, a cohort model corresponds to one or more profiles constructed for users who are similar to the current user along one or more dimensions. Such cohort models may be particularly useful when information about the current user is sparse, e.g., because the user was only recently added.

In view of the foregoing examples, it is desirable to provide a device running an AI system with the ability to identify the presence of an actionable statement in user input, classify the intent behind the actionable statement, and automatically perform specific operations associated with the actionable statement. It is further desirable that task identification and execution be informed by contextual features available to the device, and that user feedback on the classified intent be accepted to improve the relevance and accuracy of intent inference and task execution.

FIG. 4 illustrates exemplary actions that may be performed by the AI system in response to the scenario of message session 100, in accordance with the present disclosure. Note that FIG. 4 is shown for illustrative purposes only and is not meant to limit the scope of the present disclosure to any particular type of application, scenario, display format, or action that may be performed.

In particular, after input 120 from user A, user A's device may display a dialog box 405 to user A, as shown in FIG. 4. In an exemplary embodiment, the dialog box may be displayed privately on user A's device or, alternatively, to all participants in the session. As the contents 410 of dialog box 405 show, the device infers from block 120 various parameters of user A's intent to purchase movie tickets, such as the identity of the movie, likely desired showtimes, preferred movie theaters, and so on. Based on the inferred intent, the device may already have queried local movie showings on the Internet, e.g., using a dedicated movie ticket booking application or a web search engine such as Bing. The device may further offer to purchase the movie tickets automatically, pending confirmation from user A, and then proceed with the purchase, as shown in blocks 420 and 430.

FIG. 5 illustrates an exemplary embodiment of a method 500 for processing user input to identify actionable task statements, predict intent, and/or suggest and perform actionable tasks in accordance with the present disclosure. Method 500 may be performed by an AI system running on the same device or devices that support the features described above with reference to FIGS. 1-4, or on a combination of such devices with other online or offline computing facilities.

At block 510 of FIG. 5, user input (or "input") is received. In an exemplary embodiment, the user input may include any data or data stream received by the computing device through a user interface (UI). Such input may include, for example, text, speech, static or dynamic images containing gestures (e.g., sign language), facial expressions, and so forth. In some exemplary embodiments, the device may receive and process the input in real time, e.g., as the user generates and enters the data. Alternatively, the data may be stored after being received through the UI and processed later as a batch.

At block 520, method 500 identifies the presence of one or more actionable statements in the user input. In particular, block 520 may flag one or more segments of the user input as containing actionable statements. Note that in the present specification and claims, the terms "identify" and "identifying," as used in the context of block 520, refer to identifying actionable statements in the user input; they do not encompass predicting the actual intent behind such statements or associating a predicted intent with actions, which may occur at later stages of method 500.

For example, referring to session 100 in FIG. 1, method 500 may identify an actionable statement in the underlined portion of block 120. This identification may be performed in real time, e.g., while user A and user B are actively engaged in their session. Note that session 100 contains both non-actionable statements (e.g., block 105) and actionable statements (e.g., block 120); block 520 is designed to flag statements such as block 120 but not statements such as block 105.

In an exemplary embodiment, such identification may be performed using any of a variety of techniques. For example, a commitment classifier for identifying commitments (i.e., one type of actionable statement) may be applied as described in U.S. Patent Application No. 14/714,109, entitled "Management of Commitments and Requests from Communications and Content," filed May 15, 2015, and U.S. Patent Application No. 14/714,137, entitled "Automatic Extraction of Commitments and Requests from Communications and Content," filed May 15, 2015. In alternative exemplary embodiments, identification may apply conditional random fields (CRFs) or other (e.g., neural) extraction models to the user input and is not limited to classifiers. In other alternative exemplary embodiments, user input such as text may be split into sentences, and a classification model may be trained, using supervised or unsupervised labels, to identify the presence of actionable task statements. In yet other alternative exemplary embodiments, a request classifier or other type of classifier may be applied to extract other types of actionable statements. Such alternative exemplary embodiments are contemplated to fall within the scope of the present disclosure.
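As a minimal illustration of block 520, the sketch below substitutes a hand-written cue-phrase matcher for a trained commitment or request classifier. The cue patterns, the naive sentence splitter, and the function name `find_actionable_statements` are all hypothetical; a deployed system would rely on a learned model such as those referenced above.

```python
from __future__ import annotations

import re

# Hypothetical cue patterns standing in for trained commitment/request
# classifiers; a real system would learn these decisions from labeled data.
COMMITMENT_CUES = re.compile(r"\b(I will|I'll|I can|let me)\b", re.IGNORECASE)
REQUEST_CUES = re.compile(r"\b(can you|could you|please)\b", re.IGNORECASE)


def split_sentences(text: str) -> list[str]:
    """Naive sentence segmentation on terminal punctuation."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def find_actionable_statements(text: str) -> list[tuple[str, str]]:
    """Return (label, sentence) pairs for sentences flagged as actionable."""
    flagged = []
    for sentence in split_sentences(text):
        if COMMITMENT_CUES.search(sentence):
            flagged.append(("commitment", sentence))
        elif REQUEST_CUES.search(sentence):
            flagged.append(("request", sentence))
    return flagged


chat = "That movie looks great. I'll find tickets for the Saturday showing!"
print(find_actionable_statements(chat))
```

Only the second sentence is flagged, mirroring how block 120, but not a purely conversational line, would be marked in session 100.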

At block 530, a core task description is extracted from the identified actionable statement. In an exemplary embodiment, the core task description may correspond to a subset of tokens (e.g., words or phrases) extracted from the actionable statement, where the subset is selected to help predict the intent behind the actionable statement.

In an exemplary embodiment, the core task description may include a verb entity and an object entity extracted from the actionable statement, also referred to herein as a "verb-object pair." The verb entity includes one or more tokens (e.g., words) that capture an action (herein a "task action"), and the object entity includes one or more tokens that represent the object to which the task action applies. Note that a verb entity generally includes one or more verbs but need not include every verb in the sentence. The object entity may include a noun or noun phrase.

Verb-object pairs are not limited to two-word combinations. For example, "email expense report" may be a verb-object pair extracted from statement 210 in FIG. 2, in which "email" is the verb entity and "expense report" is the object entity. Extraction of core task descriptions may employ any of a variety of natural language processing (NLP) tools, e.g., a dependency parser, or a parse tree combined with a finite state machine.
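To make the extraction step concrete, the sketch below pulls a verb-object pair out of a sentence that has already been dependency-parsed. The parse is written by hand in a simplified Universal Dependencies style; the `Token` fields and the function name are assumptions for illustration, not the output format of any particular parser. Attaching `compound` modifiers to the object head is one simple heuristic for keeping multi-word objects such as "expense report" intact.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Token:
    index: int
    text: str
    pos: str   # coarse part of speech, e.g. "VERB", "NOUN"
    dep: str   # dependency label relative to the head
    head: int  # index of the head token (the root points to itself)


OBJECT_LABELS = {"obj", "dobj"}  # accept both UD and older label names


def extract_verb_object(tokens: list[Token]) -> tuple[str, str] | None:
    """Return (verb entity, object entity) for the first verb with a direct object."""
    for tok in tokens:
        if tok.dep in OBJECT_LABELS and tokens[tok.head].pos == "VERB":
            verb = tokens[tok.head].text
            # Keep compound modifiers so "expense report" stays one object entity.
            modifiers = [t.text for t in tokens
                         if t.head == tok.index and t.dep == "compound"]
            return verb, " ".join(modifiers + [tok.text])
    return None


# Hand-written parse of "I will email you the expense report" (illustrative only).
sentence = [
    Token(0, "I", "PRON", "nsubj", 2),
    Token(1, "will", "AUX", "aux", 2),
    Token(2, "email", "VERB", "root", 2),
    Token(3, "you", "PRON", "iobj", 2),
    Token(4, "the", "DET", "det", 6),
    Token(5, "expense", "NOUN", "compound", 6),
    Token(6, "report", "NOUN", "obj", 2),
]
print(extract_verb_object(sentence))  # ('email', 'expense report')
```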

In alternative exemplary embodiments, blocks 520 and 530 may be performed as a single functional block, and such alternative exemplary embodiments are contemplated to fall within the scope of the present disclosure. For example, block 520 may be viewed as a classification operation and block 530 as a sub-classification operation, with the intent treated as part of an activity taxonomy. In particular, if the user commits to taking an action, the sentence may be classified at block 520 as a "commitment," and block 530 may further refine the commitment into, e.g., "intent to send email" (if the verb-object pair corresponds to "send email" or "send daily update email").

At block 540, a machine classifier predicts the intent carried by the identified actionable statement by assigning the statement an intent category. In particular, the machine classifier may receive features such as the actionable statement, other segments of the user input in addition to or including the actionable statement, the core task description extracted at block 530, and so forth. The machine classifier may further use other features to make its prediction, such as contextual features, which include features independent of the user input (e.g., features derived from the user's previous use of the device or from parameters associated with a user profile or cohort model).

Based on these features, the machine classifier may assign the actionable statement to one of a plurality of intent categories, i.e., it may "label" the actionable statement with an intent category. For example, for message session 100, the machine classifier at block 540 may label user A's statement at block 120 with the intent category "buy movie tickets," which is one of many possible intent categories. In an exemplary embodiment, the input-output mapping of the machine classifier may be trained according to the techniques described below with reference to FIG. 7.
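The sketch below illustrates block 540 with a deliberately small multinomial naive Bayes model written from scratch. It assumes that the actionable statement, the core task description, and any contextual features are flattened into one bag of string features, with prefixes such as `core:` and `ctx:` (a hypothetical convention) keeping the sources distinct. The patent does not specify the classifier family; naive Bayes is used here only because it is short, and the training examples and intent names are made up.

```python
from __future__ import annotations

import math
from collections import Counter, defaultdict


def featurize(statement: str, core: tuple[str, str], context: list[str]) -> list[str]:
    """Flatten statement words, the verb-object pair, and context into string features."""
    verb, obj = core
    words = [f"word:{w}" for w in statement.lower().split()]
    core_feats = [f"core:{verb}|{obj}", f"verb:{verb}", f"obj:{obj}"]
    ctx_feats = [f"ctx:{c}" for c in context]
    return words + core_feats + ctx_feats


class NaiveBayesIntentClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.class_counts: Counter[str] = Counter()
        self.feature_counts: defaultdict[str, Counter[str]] = defaultdict(Counter)
        self.vocab: set[str] = set()

    def fit(self, examples: list[tuple[list[str], str]]) -> None:
        for features, label in examples:
            self.class_counts[label] += 1
            self.feature_counts[label].update(features)
            self.vocab.update(features)

    def predict(self, features: list[str]) -> str | None:
        """Return the intent category with the highest log-posterior score."""
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(count / total)
            for f in features:
                score += math.log((counts[f] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


train = [
    (featurize("I'll find tickets for Saturday", ("find", "tickets"), ["device:phone"]),
     "buy_movie_tickets"),
    (featurize("I will email you the expense report", ("email", "expense report"), ["device:pc"]),
     "prepare_expense_report"),
    (featurize("Schedule a tennis lesson next week", ("schedule", "lesson"), ["device:phone"]),
     "schedule_appointment"),
]
clf = NaiveBayesIntentClassifier()
clf.fit(train)
query = featurize("I can look for tickets for the Saturday showing",
                  ("look for", "tickets"), ["device:phone"])
print(clf.predict(query))  # buy_movie_tickets
```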

At block 550, the method 500 proposes and/or performs an action associated with the intent predicted at block 540. For example, the associated action may be displayed on the UI of the device and the user may be asked to confirm the suggested action for execution. The device may then perform the approved action.

In an exemplary embodiment, the specific actions associated with any intent may be preconfigured by the user, or they may be derived from a database of intent-to-action mappings available to the AI system. In an exemplary embodiment, method 500 may launch and/or configure one or more agent applications on the computing device to perform the associated actions, thereby extending the range of actions the AI system can support. For example, for email 200, a spreadsheet application may be launched in response to predicting that the intent of actionable statement 210 is to prepare an expense report.
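One simple way to realize the lookup described above is a default intent-to-action table that user-configured entries can override. The intent names, action identifiers, and the `resolve_actions` helper below are hypothetical placeholders rather than part of the disclosed system.

```python
from __future__ import annotations

# Hypothetical default mapping from intent categories to action identifiers;
# a deployed system might load this from an intent-to-action database instead.
DEFAULT_ACTIONS: dict[str, list[str]] = {
    "buy_movie_tickets": ["search_showtimes", "purchase_tickets", "add_calendar_event"],
    "prepare_expense_report": ["launch_spreadsheet", "set_reminder"],
    "schedule_appointment": ["launch_scheduler"],
}


def resolve_actions(intent: str, user_overrides: dict[str, list[str]] | None = None) -> list[str]:
    """Return the actions for an intent, preferring a user-configured mapping if present."""
    if user_overrides and intent in user_overrides:
        return list(user_overrides[intent])
    return list(DEFAULT_ACTIONS.get(intent, []))


print(resolve_actions("prepare_expense_report"))
# ['launch_spreadsheet', 'set_reminder']
print(resolve_actions("prepare_expense_report",
                      {"prepare_expense_report": ["launch_word_processor"]}))
# ['launch_word_processor']
```

Returning copies keeps callers from mutating the shared default table.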

In an exemplary embodiment, once an associated task is identified, the task may be enriched with action links that connect to applications, services, or skills that can be used to complete the action. The recommended actions may be presented through the UI in various ways (e.g., inline or in card form), and the user may be invited to select one or more actions for each task. The AI system may support performing the selected actions and may pass links, including links with preprogrammed parameters, to other applications along with the task payload. In an exemplary embodiment, responsibility for carrying out the details of certain actions may be delegated to an agent application based on the agent's capabilities and/or user preferences.

At block 560, user feedback regarding the relevance and/or accuracy of the predicted intent and/or the associated action is received. In an exemplary embodiment, such feedback may include, for example, explicit user confirmation of a suggested task (direct positive feedback), user rejection of an action suggested by the AI system (direct negative feedback), or user selection of an action or task other than the one suggested by the AI system (indirect negative feedback).
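The three kinds of feedback listed above can be turned into training signal for the refinement at block 570. The sketch below assumes a hypothetical `FeedbackEvent` record and one possible policy: a confirmation becomes a positive example for the predicted intent, a rejection becomes a negative example, and an alternative selection becomes a negative example for the predicted intent plus a positive example for the intent the user chose. The patent does not prescribe this policy.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class FeedbackKind(Enum):
    CONFIRMED = "confirmed"      # direct positive feedback
    REJECTED = "rejected"        # direct negative feedback
    ALTERNATIVE = "alternative"  # indirect negative feedback


@dataclass
class FeedbackEvent:
    statement: str
    predicted_intent: str
    kind: FeedbackKind
    chosen_intent: str | None = None  # set only for ALTERNATIVE feedback


def to_training_examples(event: FeedbackEvent) -> list[tuple[str, str, bool]]:
    """Convert one feedback event into (statement, intent, is_positive) triples."""
    if event.kind is FeedbackKind.CONFIRMED:
        return [(event.statement, event.predicted_intent, True)]
    if event.kind is FeedbackKind.REJECTED:
        return [(event.statement, event.predicted_intent, False)]
    # ALTERNATIVE: the suggestion was implicitly wrong; the user's choice is a hint.
    examples = [(event.statement, event.predicted_intent, False)]
    if event.chosen_intent is not None:
        examples.append((event.statement, event.chosen_intent, True))
    return examples


event = FeedbackEvent("I'll find tickets for Saturday", "buy_movie_tickets",
                      FeedbackKind.ALTERNATIVE, chosen_intent="set_reminder")
print(to_training_examples(event))
```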

At block 570, the user feedback obtained at block 560 may be used to refine the machine classifier. In an exemplary embodiment, refinement of the machine classifier may be performed as described herein below with reference to fig. 7.

Fig. 6 illustrates an exemplary embodiment of an Artificial Intelligence (AI) module 600 for implementing the method 500. It should be noted that fig. 6 is shown for illustrative purposes only and is not meant to limit the scope of the present disclosure.

In FIG. 6, AI module 600 interacts with a user interface (UI) 610 to receive user input and to output data processed by module 600 to the user. In an exemplary embodiment, AI module 600 and UI 610 may be provided on a single device (e.g., any device supporting the functionality described above with reference to FIGS. 1-4).

The AI module 600 includes an actionable statement identifier 620 coupled to the UI 610. The recognizer 620 may perform the functions described with reference to block 520, for example, it may receive user input and recognize the presence of actionable statements. As an output, the recognizer 620 generates an actionable statement 620a, e.g., corresponding to a portion of the user input marked as containing actionable statements.

Actionable statement 620a is coupled to core extractor 622. Extractor 622 may perform the functions described with reference to block 530, e.g., it may extract a core task description 622a from the actionable statement. In an exemplary embodiment, core task description 622a may include a verb-object pair.

Actionable statement 620a, core task description 622a, and other portions of user input 610a may be coupled as input features to a machine classifier 624. Classifier 624 may perform the functions described with reference to block 540, e.g., it may predict the intent carried by the identified actionable statement 620a and output the predicted intent as an assigned intent category (or "label") 624a.

In an exemplary embodiment, machine classifier 624 may further receive contextual features 630a generated by a user profile/context data block 630. In particular, block 630 may store contextual features associated with the use of the device, or profile parameters. Contextual features may be obtained from the user explicitly through UI 610 (e.g., entered by the user to set up a user profile or cohort model), or implicitly from the user's interactions with the device through UI 610. Contextual features may also be derived from sources other than UI 610 (e.g., an Internet profile associated with the user).

Intent category 624a is provided to task suggestion/execution block 626. Block 626 may perform the functions described with reference to block 550, e.g., it may suggest and/or perform actions associated with intent label 624a. Block 626 may include a sub-module 628 configured to launch an external application or agent (not explicitly shown in FIG. 6) to perform the associated action.
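The data flow of FIG. 6 can be summarized as a pipeline of pluggable stages. In the sketch below, each stage is an ordinary callable, so the toy lambdas shown could be swapped for trained models; the class name `AIModule` and the stage signatures are assumptions made for illustration, with comments mapping each stage to the reference numerals of FIG. 6.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class AIModule:
    """Hypothetical wiring of the FIG. 6 components as pluggable callables."""

    recognizer: Callable[[str], list[str]]       # 620: user input -> actionable statements 620a
    extractor: Callable[[str], tuple[str, str]]  # 622: statement -> core task description 622a
    classifier: Callable[[str, tuple[str, str], str, list[str]], str]  # 624: -> intent 624a
    executor: Callable[[str, str], str]          # 626/628: (intent, statement) -> action result

    def process(self, user_input: str, context: list[str]) -> list[str]:
        """Run every actionable statement in user_input through the pipeline."""
        results = []
        for statement in self.recognizer(user_input):
            core = self.extractor(statement)
            # The classifier sees the statement, its core task description,
            # the surrounding user input (610a), and contextual features (630a).
            intent = self.classifier(statement, core, user_input, context)
            results.append(self.executor(intent, statement))
        return results


# Toy stand-ins for trained components, for demonstration only.
module = AIModule(
    recognizer=lambda text: [s for s in text.split(". ") if "I'll" in s],
    extractor=lambda s: (s.split()[1], s.split()[-1]),
    classifier=lambda s, core, text, ctx: "buy_movie_tickets" if core[1] == "tickets" else "unknown",
    executor=lambda intent, s: f"suggest:{intent}",
)
print(module.process("Sounds fun. I'll find tickets", ["device:phone"]))
# ['suggest:buy_movie_tickets']
```

Keeping the stages as injected callables mirrors the block diagram: the recognizer, extractor, and classifier can be retrained or replaced independently.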

AI module 600 also includes a feedback module 640 to solicit and receive user feedback 640a through UI 610. Module 640 may perform the functions described with reference to block 560, e.g., it may receive user feedback regarding the relevance and/or accuracy of the predicted intent and/or the associated action. User feedback 640a may be used to refine machine classifier 624, as described below with reference to FIG. 7.

FIG. 7 illustrates an exemplary embodiment of a method 700 for training a machine classifier 624 to predict the intent of an actionable statement based on various features. It should be noted that fig. 7 is shown for illustrative purposes only, and is not meant to limit the scope of the present disclosure to any particular technique for training a machine classifier.

At block 710, corpus items are received for training the machine classifier. In an exemplary embodiment, the corpus items may correspond to historical or reference user inputs containing content that can be used to train the machine classifier to predict task intent. For example, any of items 100, 200, and 300 described above may be used as corpus items. Corpus items may include items generated by the current user, by other users communicating with the current user, or by other users who share communications with the current user, and so forth.

At block 720, actionable statements (referred to herein as "training statements") are identified from the received corpus items. In an exemplary embodiment, identifying the training statement may be performed in the same or similar manner as described with reference to block 520 for identifying the actionable statement.

At block 730, a core task description (referred to herein as a "training description") is extracted from each identified actionable statement. In an exemplary embodiment, extracting the training description may be performed in the same or similar manner as described with reference to block 530 for extracting the core task description (e.g., verb-object pair based extraction).
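As a non-authoritative illustration of verb-object pair extraction, the following Python sketch assumes a small hand-picked verb lexicon (`TASK_VERBS`) and skip list (`SKIP_WORDS`), both hypothetical. It is not the extraction method of block 530, which this section does not detail; a practical system would more likely rely on a syntactic parser.

```python
from typing import Optional, Tuple

# Hypothetical, hand-picked lexicons for illustration only; the disclosure does not
# specify how verbs and objects are recognized (a dependency parser is typical).
TASK_VERBS = {"buy", "purchase", "send", "write", "draft", "forward", "book", "schedule"}
SKIP_WORDS = {"a", "an", "the", "some", "me", "us", "you", "please", "to"}


def extract_verb_object(statement: str) -> Optional[Tuple[str, str]]:
    """Return (verb, object) for the first known task verb that has an object after it."""
    tokens = [t.strip(".,!?;:").lower() for t in statement.split()]
    for i, tok in enumerate(tokens):
        if tok in TASK_VERBS:
            # Treat the first non-skip word after the verb as the object head.
            for nxt in tokens[i + 1:]:
                if nxt and nxt not in SKIP_WORDS:
                    return tok, nxt
    return None


print(extract_verb_object("Could you buy the tickets for Friday?"))  # ('buy', 'tickets')
```

Statements with no known task verb yield `None`, so they contribute no training description.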

At block 732, the training descriptions are grouped into "clusters," where each cluster includes one or more training descriptions that are judged to have similar intents. In an exemplary embodiment, a bag-of-words model may be used to represent the text-based training descriptions, which may then be clustered using techniques such as K-means. In alternative exemplary embodiments, any representation achieving similar functionality may be used.
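A minimal sketch of bag-of-words representation followed by K-means, written in plain Python so it needs no third-party packages. The farthest-point seeding and the fixed iteration count are assumptions made to keep the result deterministic; the disclosure names K-means only as one possible technique.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Vector = List[float]


def bag_of_words(texts: Sequence[str]) -> Tuple[List[str], List[Vector]]:
    """Represent each text as a vector of word counts over a shared, sorted vocabulary."""
    vocab = sorted({w for t in texts for w in t.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for t in texts:
        v = [0.0] * len(vocab)
        for w, c in Counter(t.lower().split()).items():
            v[index[w]] = float(c)
        vectors.append(v)
    return vocab, vectors


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors: List[Vector], k: int, iters: int = 20) -> List[int]:
    """Lloyd's K-means with deterministic farthest-point initialization."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from every centroid chosen so far.
        centroids.append(list(max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))))
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid for each vector.
        labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


descriptions = ["buy tickets", "purchase tickets", "book tickets",
                "send email", "write email", "forward email"]
_, vecs = bag_of_words(descriptions)
print(kmeans(vecs, k=2))  # [0, 0, 0, 1, 1, 1]
```

Because bag-of-words vectors only capture shared words, descriptions are grouped here by their common object; synonyms with no word in common would need a richer representation.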

In an exemplary embodiment where the training descriptions include verb-object pairs, clustering may be performed in two or more stages, where in an initial stage, pairs sharing similar object entities are grouped together. For example, for the single object "email," a person may "write," "send," "delete," "forward," "draft," "pass," "work on," and so forth. Thus, in the first stage, all such verb-object pairs that share the object "email" (e.g., "write email," "send email," etc.) may be grouped into the same cluster.

Thus, in the first stage of clustering, training descriptions may be grouped into a first set of clusters based on the textual similarity of the corresponding objects. In a second stage, the first set of clusters may then be refined into a second set of clusters based on the textual similarity of the respective verbs. Refinement in the second stage may include, for example, reassigning training descriptions from the first set of clusters to different clusters, removing training descriptions from the first set of clusters, creating new clusters, and so on.
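The two stages can be sketched as follows, assuming Python's `difflib.SequenceMatcher.ratio()` as the measure of textual similarity and a greedy "leader" clustering pass. The thresholds (0.8 for objects, 0.7 for verbs) are hypothetical. This sketch only splits first-stage clusters; the reassignment and removal refinements mentioned above are not shown.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]  # (verb, object)


def leader_cluster(items: Sequence[Pair], key: Callable[[Pair], str],
                   threshold: float) -> List[List[Pair]]:
    """Greedy single pass: join the first cluster whose leader is textually similar."""
    clusters: List[List[Pair]] = []
    for item in items:
        for cluster in clusters:
            if SequenceMatcher(None, key(item), key(cluster[0])).ratio() >= threshold:
                cluster.append(item)
                break
        else:
            clusters.append([item])
    return clusters


def two_stage_cluster(pairs: Sequence[Pair], obj_threshold: float = 0.8,
                      verb_threshold: float = 0.7):
    # Stage 1: group pairs by the textual similarity of their objects.
    stage1 = leader_cluster(pairs, key=lambda p: p[1], threshold=obj_threshold)
    # Stage 2: refine each object cluster by the textual similarity of its verbs.
    stage2: List[List[Pair]] = []
    for cluster in stage1:
        stage2.extend(leader_cluster(cluster, key=lambda p: p[0], threshold=verb_threshold))
    return stage1, stage2


pairs = [("buy", "tickets"), ("buy", "ticket"), ("cancel", "tickets"),
         ("send", "email"), ("resend", "email"), ("delete", "email")]
stage1, stage2 = two_stage_cluster(pairs)
print(len(stage1), len(stage2))  # 2 4
```

In this example, "ticket" and "tickets" fall into one object cluster in stage 1, and stage 2 then separates "cancel tickets" from "buy tickets" and "delete email" from "send email"/"resend email".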

After block 732, it is determined whether there are more corpus items to process before training continues. If so, the method 700 returns to block 710 and processes further corpus items; otherwise, the method proceeds to block 734. It should be appreciated that performing blocks 710-732 on multiple corpus items results in the training descriptions being grouped into different clusters, each associated with a different intent.

At block 734, each of the clusters may further be manually labeled or annotated by a human operator. In particular, a human operator may examine the training descriptions associated with each cluster and manually annotate the cluster with an intent category. Further, at block 734, the content of each cluster may be manually refined. For example, if a human operator believes that one or more training descriptions in a cluster do not properly belong to that cluster, such training descriptions may be deleted and/or reassigned to another cluster. In some exemplary embodiments of method 700, the manual evaluation at block 734 is optional.
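The bookkeeping of this manual step might look like the following minimal sketch. The `Cluster` record and the helper names `annotate` and `move_description` are hypothetical, and the cluster contents are illustrative data only. The example mirrors the kind of reassignment described for fig. 8B, where "ticketing" is moved to the cluster for acquiring pre-purchased tickets.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Cluster:
    descriptions: List[str] = field(default_factory=list)
    intent: Optional[str] = None  # set by the human operator at the review step


def annotate(clusters: Dict[int, Cluster], cluster_id: int, intent: str) -> None:
    """Record the intent label a human operator assigns to a cluster."""
    clusters[cluster_id].intent = intent


def move_description(clusters: Dict[int, Cluster], description: str,
                     src: int, dst: Optional[int] = None) -> None:
    """Remove a misplaced description from src; reassign it to dst, or drop it if dst is None."""
    clusters[src].descriptions.remove(description)
    if dst is not None:
        clusters[dst].descriptions.append(description)


# Illustrative cluster contents (hypothetical).
clusters = {1: Cluster(["buy tickets", "purchase tickets", "ticketing"]),
            2: Cluster(["pick up tickets"])}
annotate(clusters, 1, "buy tickets")
annotate(clusters, 2, "acquire pre-purchased tickets")
move_description(clusters, "ticketing", src=1, dst=2)
print(clusters[2].descriptions)  # ['pick up tickets', 'ticketing']
```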

At block 736, each cluster may optionally be associated with a set of actions related to its labeled intent. In an exemplary embodiment, block 736 may be performed manually by a human operator, by crowdsourcing, etc. In an exemplary embodiment, actions may be associated with an intent based on the preferences of a group to which the user belongs or of the general population.
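One hedged way to derive actions from group or population preferences is to count which actions people chose for each labeled intent and keep the most frequent ones. In the sketch below, the `associate_actions` helper, the `top_n` cutoff, and the action names are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Iterable, List, Tuple


def associate_actions(observations: Iterable[Tuple[str, str]],
                      top_n: int = 2) -> Dict[str, List[str]]:
    """Map each intent label to the actions a population most often chose for it."""
    counts: DefaultDict[str, Counter] = defaultdict(Counter)
    for intent, action in observations:
        counts[intent][action] += 1
    # most_common orders equal counts by first occurrence, so the output is deterministic.
    return {intent: [a for a, _ in c.most_common(top_n)] for intent, c in counts.items()}


# Hypothetical (intent, chosen action) observations.
observations = [
    ("buy tickets", "open ticket vendor"),
    ("buy tickets", "open ticket vendor"),
    ("buy tickets", "add calendar hold"),
    ("buy tickets", "notify attendees"),
    ("prepare expense report", "launch spreadsheet"),
]
print(associate_actions(observations))
```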

At block 740, a weakly supervised machine learning model is applied to train the machine classifier using the features and the corresponding labeled intent clusters. Specifically, after blocks 710-736, each training statement is associated with a set of features and with the intent label of the cluster containing its training description. The labeled intent categories are used to train the machine classifier to accurately map each set of features to the corresponding intent category. It should be noted that in this context, "weak supervision" refers to the training descriptions of the actionable statements being clustered automatically by computational techniques, rather than requiring explicit manual labeling of each core task description. In this way, weak supervision may advantageously enable the machine classifier to be trained on large corpus datasets.
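The weak-labeling idea can be illustrated with the following minimal sketch: each training statement inherits the intent of the cluster its verb-object pair fell into, and a classifier is trained on those inherited labels. The disclosure does not name a classifier model, so a multinomial naive Bayes with add-one smoothing is swapped in here purely for illustration. The `cluster_of` and `intent_of_cluster` tables and the training statements are hypothetical data.

```python
import math
from collections import Counter, defaultdict
from typing import List, Tuple


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (a stand-in model)."""

    def fit(self, feature_lists: List[List[str]], labels: List[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for feats, y in zip(feature_lists, labels):
            self.feature_counts[y].update(feats)
        self.vocab = {f for feats in feature_lists for f in feats}
        return self

    def predict(self, feats: List[str]) -> str:
        total = sum(self.class_counts.values())
        best, best_score = "", -math.inf
        for y, n_y in self.class_counts.items():
            denom = sum(self.feature_counts[y].values()) + len(self.vocab)
            score = math.log(n_y / total)
            for f in feats:
                score += math.log((self.feature_counts[y][f] + 1) / denom)
            if score > best_score:
                best, best_score = y, score
        return best


def featurize(statement: str, description: Tuple[str, str]) -> List[str]:
    # Features: the actionable statement's words plus the core task description.
    verb, obj = description
    return statement.lower().split() + [f"verb={verb}", f"obj={obj}"]


# Weak labels: a statement's label is the intent of its description's cluster.
cluster_of = {("buy", "tickets"): 1, ("purchase", "tickets"): 1,
              ("send", "email"): 2, ("forward", "email"): 2}
intent_of_cluster = {1: "buy tickets", 2: "send email"}
training = [
    ("can you buy tickets for the show", ("buy", "tickets")),
    ("please purchase tickets tonight", ("purchase", "tickets")),
    ("send the email to the team", ("send", "email")),
    ("forward that email to alice", ("forward", "email")),
]
X = [featurize(s, d) for s, d in training]
y = [intent_of_cluster[cluster_of[d]] for _, d in training]
clf = NaiveBayes().fit(X, y)
print(clf.predict(featurize("we should buy tickets soon", ("buy", "tickets"))))  # buy tickets
```

No statement in `training` was labeled by hand. Only the two clusters received labels, which is what allows this approach to scale to large corpora.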

In an exemplary embodiment, the features of the machine classifier may include derived features, such as the identified actionable statement and/or other text taken from its context. The features may further include the training description, relevant context from the overall corpus item, information from the metadata of a communication corpus item, or information from similar task descriptions.
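A minimal sketch of how such heterogeneous features might be combined into one sparse representation. It uses a source prefix on each feature so that the same word from the statement and from its context stays distinct. The prefixes and the metadata keys (`channel`, `sender_role`) are hypothetical.

```python
from collections import Counter
from typing import Iterable, Mapping, Tuple


def build_features(statement: str, description: Tuple[str, str],
                   context: Iterable[str], metadata: Mapping[str, str]) -> Counter:
    """Sparse feature counts; prefixes keep features from different sources apart."""
    feats: Counter = Counter()
    feats.update(f"stmt:{t}" for t in statement.lower().split())        # actionable statement
    verb, obj = description
    feats.update([f"verb:{verb}", f"obj:{obj}", f"pair:{verb}_{obj}"])  # core task description
    for sentence in context:                                            # surrounding text
        feats.update(f"ctx:{t}" for t in sentence.lower().split())
    feats.update(f"meta:{k}={v}" for k, v in metadata.items())          # item metadata
    return feats


f = build_features("Buy tickets for us", ("buy", "tickets"),
                   ["The show is on Friday"], {"channel": "email", "sender_role": "manager"})
print(f["pair:buy_tickets"], f["ctx:friday"], f["meta:channel=email"])  # 1 1 1
```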

Figs. 8A, 8B, and 8C collectively show an illustrative example of training according to method 700, illustrating certain aspects of its performance. It should be noted that figs. 8A, 8B, and 8C are shown for exemplary purposes only and are not meant to limit the scope of the present disclosure to any particular instance of performing method 700.

In fig. 8A, the plurality (N) of example corpus items received at block 710 are shown schematically as "item 1" to "item N," and only the text 810 of the first corpus item (item 1) is shown explicitly. In particular, text 810 corresponds to block 120 of the message conversation 100 described above, treated here as a corpus item for training.

At block 820, actionable statements are identified in text 810 of item 1, in accordance with training block 720. In this example, the actionable statements correspond to the underlined statements of text 810.

At block 830, a training description is extracted from the actionable statement in accordance with training block 730. In the exemplary embodiment shown, the training description is the verb-object pair "purchase tickets" 830a. FIG. 8A further illustrates other examples 830b, 830c of verb-object pairs having an intent similar to that of the identified actionable statement, which may be extracted from, for example, other corpus items (not shown in FIG. 8A).

At block 832, the training descriptions are clustered according to training block 732. In fig. 8A, the clustering techniques described above automatically identify the extracted descriptions 830a, 830b, 830c as belonging to the same cluster (cluster 1).

As shown in fig. 7, training blocks 710-732 are repeated over many corpus items. Cluster 1 (834) illustratively shows a resulting sample cluster containing four training descriptions, as processed by training block 734. In particular, cluster 1 is manually labeled with its corresponding intent. For example, after examining the training descriptions in cluster 1, a human operator may annotate cluster 1 with the label "intent to buy tickets" (which corresponds to the intent category "buy tickets"). Fig. 9 schematically shows other clusters 910, 920, 930 and labeled intents 912, 922, 932 that may be derived by processing corpus items in the manner described.

Clusters 834a and 835 of fig. 8B illustrate how clusters may be manually refined according to training block 734. For example, the training description "ticketing" 830d, originally clustered into cluster 1 (834), may be manually removed from cluster 1 (834a) and reassigned to cluster 2 (835), where cluster 2 corresponds to the "intent to acquire pre-purchased tickets."

At block 836, each labeled cluster may be associated with one or more actions, per training block 736. For example, actions 836a, 836b, 836c may be associated with the "intent to buy tickets" (i.e., the label of cluster 1).

Fig. 8C illustrates training 824 the machine classifier 624 using a plurality (X) of actionable statements (i.e., actionable statement 1 through actionable statement X) and corresponding labels (i.e., label 1 through label X), according to training block 740.

In an exemplary embodiment, the user feedback may be used to further refine the performance of the methods and AI systems described herein. Referring back to FIG. 7, column 750 shows illustrative types of feedback that may be accommodated by the method 700 to train the machine classifier 624. It should be noted that these feedback types are shown for illustrative purposes only, and are not meant to limit the types of feedback that may be accommodated in accordance with the present disclosure.

In particular, block 760 relates to a type of user feedback in which the user indicates that one or more actionable statements identified by the AI system are not actually actionable statements, i.e., they contain no real-world intent. For example, when presented with a set of actions that the AI system is to perform in response to a user input, the user may select an option stating that the identified statement does not actually constitute an actionable statement. In this case, such user feedback may be incorporated by adjusting one or more parameters of block 720 during the training phase.
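As one hypothetical example of "adjusting one or more parameters of block 720," suppose the statement detector outputs a score and flags statements above a threshold. The sketch below re-tunes that threshold to agree as closely as possible with users' verdicts. The score-and-threshold detector is an assumption; this section does not describe how block 720 scores statements.

```python
from typing import List, Tuple


def tune_threshold(feedback: List[Tuple[float, bool]]) -> float:
    """Pick the detection threshold that best agrees with users' actionable/not-actionable verdicts.

    feedback holds (detector_score, user_says_actionable) pairs.
    """
    # Candidate thresholds: every observed score, plus "flag nothing".
    candidates = sorted({score for score, _ in feedback}) + [float("inf")]

    def agreement(t: float) -> float:
        return sum((score >= t) == ok for score, ok in feedback) / len(feedback)

    # max() returns the first best candidate, i.e. the lowest threshold among ties.
    return max(candidates, key=agreement)


feedback = [(0.9, True), (0.8, True), (0.7, True), (0.6, False), (0.55, False)]
print(tune_threshold(feedback))  # 0.7
```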

Block 762 relates to user feedback indicating that one or more actions suggested by the AI system for an intent category are not the best actions associated with that intent category. Alternatively, the user feedback may indicate that a suggested action is not appropriate for the intent category. For example, in response to predicting a user's intent to prepare an expense report, the associated action may be launching a preconfigured spreadsheet application. Based on user feedback, an alternative action may instead be associated with the intent to prepare an expense report. For example, the user may explicitly choose to launch another preferred application, or may implicitly reject the associated action by not interacting further with the suggested application.

In an exemplary embodiment, user feedback 762 may be accommodated during the training phase by modifying block 736 of method 700 to associate predicted intent categories with other actions.
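One hedged way to re-associate actions at block 736 from such feedback is to keep a score per (intent, action) pair and let explicit and implicit signals move it. The weights `EXPLICIT_CHOICE` and `IMPLICIT_REJECTION`, the `ActionAssociations` class, and the action names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical feedback weights; the disclosure does not specify any.
EXPLICIT_CHOICE = 2.0       # user explicitly launched a different application
IMPLICIT_REJECTION = -1.0   # user never interacted with the suggested application


class ActionAssociations:
    """Per-intent action scores that user feedback can re-rank."""

    def __init__(self, initial: Dict[str, List[str]]) -> None:
        self.scores: Dict[str, Dict[str, float]] = defaultdict(dict)
        for intent, actions in initial.items():
            for rank, action in enumerate(actions):
                self.scores[intent][action] = float(len(actions) - rank)

    def record(self, intent: str, action: str, delta: float) -> None:
        self.scores[intent][action] = self.scores[intent].get(action, 0.0) + delta

    def actions_for(self, intent: str, top_n: int = 1) -> List[str]:
        ranked = sorted(self.scores[intent].items(), key=lambda kv: kv[1], reverse=True)
        return [action for action, _ in ranked[:top_n]]


assoc = ActionAssociations({"prepare expense report": ["launch spreadsheet app"]})
assoc.record("prepare expense report", "launch spreadsheet app", IMPLICIT_REJECTION)
assoc.record("prepare expense report", "launch expense app", EXPLICIT_CHOICE)
print(assoc.actions_for("prepare expense report"))  # ['launch expense app']
```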

Block 764 relates to a type of user feedback in which the user indicates that the predicted intent category is erroneous. In an exemplary embodiment, the user may explicitly or implicitly indicate the alternative intent actually carried by the actionable statement. For example, assume that the AI system predicts the intent category "schedule a meeting" for a user input consisting of the statement "let us discuss next." In response to the AI system suggesting an action associated with the intent category "schedule a meeting," the user may provide feedback that the preferred intent category would be "set reminder."

In an exemplary embodiment, user feedback 764 may be accommodated during training of the machine classifier (e.g., at block 732 of method 700). For example, the original verb-object pair extracted from the identified actionable statement may be reassigned to another cluster corresponding to the preferred intent category indicated by the user feedback.
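The reassignment can be sketched as a small update to a pair-to-cluster table. This sketch assumes clusters are identified by integer ids and that a new cluster is created when no existing cluster carries the preferred label. The function name and the pair ("discuss", "next") are hypothetical.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]


def apply_intent_correction(assignment: Dict[Pair, int], labels: Dict[int, str],
                            pair: Pair, preferred_intent: str) -> int:
    """Move pair into the cluster labeled with the user's preferred intent.

    assignment maps verb-object pairs to cluster ids; labels maps cluster ids to
    intent labels. A new cluster is created if no cluster carries that label yet.
    """
    target = next((cid for cid, label in labels.items() if label == preferred_intent), None)
    if target is None:
        target = max(labels, default=0) + 1
        labels[target] = preferred_intent
    assignment[pair] = target
    return target


labels = {1: "schedule a meeting", 2: "set reminder"}
assignment = {("discuss", "next"): 1}  # hypothetical pair for "let us discuss next"
apply_intent_correction(assignment, labels, ("discuss", "next"), "set reminder")
print(assignment)  # {('discuss', 'next'): 2}
```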

FIG. 10 illustrates an exemplary embodiment of a method 1000 for causing a computing device to digitally perform an action in response to a user input. It should be noted that FIG. 10 is shown for illustrative purposes only and is not meant to limit the scope of the present disclosure.

In FIG. 10, at block 1010, actionable statements are identified from user input.

At block 1020, a core task description is extracted from the actionable statement. The core task description may include a verb entity and an object entity.

At block 1030, an intent category is assigned to the actionable statement by providing features to the machine classifier, the features including the actionable statement and the core task description.

At block 1040, at least one action associated with the assigned intent category is performed on the computing device.
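Blocks 1010-1040 can be strung together as in the following end-to-end sketch. Every component is a hypothetical stand-in: sentences containing a verb from the `TASK_VERBS` lexicon are treated as actionable, a lookup on the object (`INTENT_BY_OBJECT`) replaces the trained classifier, and `ACTIONS` maps intents to callables that merely return a description of what they would do.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stand-ins for the trained components; illustrative only.
TASK_VERBS = {"buy", "send", "schedule", "book"}
INTENT_BY_OBJECT = {"tickets": "buy tickets", "email": "send email", "meeting": "schedule a meeting"}
ACTIONS: Dict[str, Callable[[str], str]] = {
    "buy tickets": lambda stmt: f"opened ticket purchase flow for: {stmt}",
}


def identify_actionable(user_input: str) -> List[str]:                  # block 1010
    sentences = [s.strip() for s in re.split(r"[.!?]", user_input) if s.strip()]
    return [s for s in sentences if TASK_VERBS & set(s.lower().split())]


def extract_core_task(statement: str) -> Optional[Tuple[str, str]]:     # block 1020
    tokens = statement.lower().split()
    for i, tok in enumerate(tokens):
        if tok in TASK_VERBS:
            rest = [t for t in tokens[i + 1:] if t not in {"a", "an", "the"}]
            if rest:
                return tok, rest[0]
    return None


def classify(statement: str, task: Tuple[str, str]) -> str:             # block 1030
    # A trained classifier would use both features; this lookup uses only the object.
    return INTENT_BY_OBJECT.get(task[1], "unknown")


def run(user_input: str) -> List[Tuple[str, str]]:
    performed = []
    for stmt in identify_actionable(user_input):
        task = extract_core_task(stmt)
        if task is None:
            continue
        intent = classify(stmt, task)
        action = ACTIONS.get(intent)
        if action is not None:                                            # block 1040
            performed.append((intent, action(stmt)))
    return performed


print(run("Great talk today. Can you buy the tickets for Friday?"))
```

Inputs with no recognized task verb produce no action, which corresponds to block 1010 finding no actionable statement.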

Fig. 11 shows an exemplary embodiment of an apparatus 1100 for digitally performing an action in response to a user input. The apparatus includes: a recognizer module 1110 configured to identify actionable statements from the user input; an extraction module 1120 configured to extract a core task description from the actionable statement, the core task description including a verb entity and an object entity; and a machine classifier 1130 configured to assign an intent category to the actionable statement based on features including the actionable statement and the core task description. The apparatus 1100 is configured to perform at least one action associated with the assigned intent category.

Fig. 12 shows an apparatus 1200 comprising a processor 1210 and a memory 1220, wherein the memory 1220 stores instructions executable by the processor to cause the processor to: identify actionable statements from user input; extract a core task description from the actionable statement, the core task description including a verb entity and an object entity; assign an intent category to the actionable statement by providing features to a machine classifier, the features including the actionable statement and the core task description; and perform, using the processor, at least one action associated with the assigned intent category.

In the present specification and claims, it will be understood that when an element is referred to as being "connected to" or "coupled to" another element, it can be directly connected or coupled to the other element or intervening elements may be present. In contrast, when an element is referred to as being "directly connected to" or "directly coupled to" another element, there are no intervening elements present. Further, when an element is referred to as being "electrically coupled" to another element, it is meant that a low resistance path exists between the elements, whereas when an element is referred to as simply being "coupled" to another element, there may or may not be a low resistance path between the elements.

The functions described herein may be performed, at least in part, by one or more hardware and/or software logic components. By way of example, and not limitation, exemplary types of hardware logic components that may be used include Field Programmable Gate Arrays (FPGAs), Application-Specific Integrated Circuits (ASICs), Application-Specific Standard Products (ASSPs), Systems-on-a-Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and so forth.

While certain illustrated embodiments have been shown in the drawings and have been described above in detail, the invention is susceptible to various modifications and alternative constructions. It should be understood, however, that there is no intention to limit the invention to the specific forms disclosed, but on the contrary, the intention is to cover all modifications, alternative constructions, and equivalents falling within the spirit and scope of the invention.
