Virtual assistant for generating personalized responses within a communication session

Document No.: 1409749  Publication date: 2020-03-06

Reading note: This technology, "Virtual assistant for generating personalized responses within a communication session," was designed and created by A·米勒, S·魏因伯格, H·索姆奇 and H·费托西 on 2018-06-27. Its main content is as follows:

An Intelligent Agent (IA) for automatically generating responses to content within a Communication Session (CS) is disclosed. The IA is trained to target its responses to the user and to the user's context within the CS. The IA receives CS content comprising natural language expressions that encode a dialog of a user, and determines content features based on a natural language model. The content features indicate the expected semantics of the expressions. The IA identifies content that may be relevant to the target user in order to generate a response to it. Identifying such content includes determining the relevance of the content based on the content features, the context of the CS, a user-interest model, and a content-relevance model. Identifying potentially relevant content to respond to is based on the determined relevance of the content and a relevance threshold. Various responses to the identified portion of content are automatically generated and provided based on a natural language response-generation model targeted to the user.

1. A computerized system comprising:

one or more processors; and

computer storage memory having computer-executable instructions stored thereon that, when executed by the one or more processors, implement a method comprising:

receiving content exchanged within a Communication Session (CS), wherein the content includes one or more natural language expressions encoding a portion of a conversation conducted by a plurality of users participating in the CS;

determining a relevance of the content based on a user-interest model of a first user of the plurality of users and a content-relevance model of the first user;

identifying one or more potentially relevant portions of the content based on the relevance of the content and one or more relevance thresholds, wherein the identified one or more potentially relevant portions of the content are potentially relevant to the first user; and

generating a response to the one or more potentially relevant portions of the content based on a response-generating model of the first user.

2. The system of claim 1, wherein the method further comprises:

determining one or more content features based on the content and one or more natural language models, wherein the one or more content features indicate expected semantics of the one or more natural language expressions;

determining the relevance of the content further based on the content features; and

generating the response to the one or more potentially relevant portions of the content based also on the one or more content features.

3. The system of claim 1 or 2, wherein the method further comprises:

receiving metadata associated with the CS;

determining one or more contextual features of the CS based on the received metadata and a CS context model, wherein the one or more contextual features indicate a context of the conversation of the first user; and

generating the response to the one or more potentially relevant portions of the content based also on the one or more contextual features of the CS.

4. The system of claim 1, 2 or 3, wherein the method further comprises:

receiving user feedback based on the response to the one or more potentially relevant portions of the content; and

updating the response-generation model based on the user feedback.

5. The system of claim 4, wherein the method further comprises:

determining one or more content-substantive features based on the content and a content-substantive model included in the one or more natural language models, wherein the one or more content-substantive features indicate one or more topics discussed in the conversation;

determining one or more content-style features based on the content and a content-style model included in the one or more natural language models, wherein the one or more content-style features indicate an emotion of at least one of the plurality of users; and

generating the response to the one or more potentially relevant portions of the content based also on the one or more content-substantive features and the one or more content-style features of the content.

6. The system of claim 1, 2 or 3, wherein the method further comprises:

determining one or more content-substantive features to encode in the response based on other content-substantive features encoded in the potentially relevant portions of the content;

determining one or more content-style features to encode in the response based on other content-style features encoded in the potentially relevant portions of the content; and

generating the response to the one or more potentially relevant portions of the content such that the response encodes the one or more content-substantive features and the one or more content-style features.

7. The system according to any one of claims 1-6, wherein the method further comprises:

providing the response to the one or more potentially relevant portions of the content to the first user when the system is operating in a semi-automatic mode; and

providing the response to the one or more potentially relevant portions of the content to the CS when the system is operating in an automatic mode.

8. A method, comprising:

determining a relevance of the content based on a user-interest model of a first user of the plurality of users and a content-relevance model of the first user;

generating a response to the one or more potentially relevant portions of the content based on a response-generating model of the first user.

9. The method of claim 8, further comprising:

determining the relevance of the content further based on the content features; and

generating the response to the one or more potentially relevant portions of the content based also on the one or more content features.

10. The method of claim 8 or 9, further comprising:

receiving metadata associated with the CS;

generating the response to the one or more potentially relevant portions of the content based also on the one or more contextual features of the CS.

11. The method of claim 8, 9 or 10, further comprising:

identifying sub-portions of the one or more potentially relevant portions of the content based on the relevance of the content and a further relevance threshold, wherein the identified sub-portions of the one or more portions of the content are highly relevant to the first user;

generating a response to highly relevant content based on the response-generation model of the first user; and

providing a real-time notification of the identified highly relevant content, together with the response to the highly relevant content, to the first user.

12. The method according to any of claims 8-11, further comprising:

13. The method according to any of claims 8-12, further comprising:

receiving user feedback based on the response to the one or more potentially relevant portions of the content; and

updating, based on the user feedback, at least one of: the response-generation model, the content-substantive model, or the content-style model.

14. The method according to any of claims 8-13, further comprising:

providing the response to the one or more potentially relevant portions of the content to the first user while operating in a semi-automatic mode; and

providing the response to the one or more potentially relevant portions of the content to the CS when operating in an automatic mode.

15. The method according to any of claims 8-11, further comprising:

determining one or more content-substantive features to encode in the response based on other content-substantive features encoded in the potentially relevant portions of the content;

determining one or more content-style features to encode in the response based on other content-style features encoded in the potentially relevant portions of the content; and

Background

People increasingly find themselves juggling multiple tasks distributed across multiple computing interfaces, applications, platforms, displays, and devices. For example, due to the increasing popularity of networked computing, users may now engage in one or more communication sessions, such as chat or Instant Messaging (IM) sessions, while simultaneously shopping online, generating work-related documents, scheduling vehicle maintenance, or performing other tasks.

Active participation in a communication session often requires reading (or otherwise consuming), processing, and responding to the conversation within the communication session. Thus, even if a computing device enables the user to participate in separate and distinct tasks simultaneously, the user may still have difficulty performing, in real time, the various functions associated with each task. For example, it is challenging for a user to actively monitor, track, and respond to a conversation while comparing products on an e-commerce website, drafting notes for work, or reviewing a calendar to find a time to take their automobile to a service station.

Because active participation in a communication session is challenging when the user's attention is shifted to other tasks, the user risks missing the opportunity to respond at important moments in the conversation. Thus, a user attempting to multitask may appear unresponsive and/or disengaged to other users. In addition, the user may lose the opportunity to contribute their unique perspective to the dialog. More succinctly, the utility and user experience of a real-time communication session may be reduced when a user attempts to distribute his or her attention across multiple tasks.

Disclosure of Invention

This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. It should be understood that this summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid, in isolation, in determining the scope of the claimed subject matter.

Embodiments described in this disclosure are directed to providing enhanced services to users participating in one or more Communication Sessions (CSs). In some embodiments, at least a portion of such CS services may be provided via an enhanced Virtual Assistant (VA), an enhanced Intelligent Agent (IA), an enhanced "chat bot," or as part of a computer program or service for facilitating communications. Enhanced services may include analyzing the CS, identifying potentially relevant content of the CS, generating a response to the potentially relevant content, and providing that response. Various enhanced services may be enabled by one or more methods. One such method may be implemented by receiving content exchanged in the CS. The content includes natural language expressions or utterances that encode a conversation conducted by a plurality of users participating in the CS. The method may be implemented by determining content features of the content based on the content and a natural language model. The content features indicate the expected semantics of the natural language expressions. The method may be further implemented by determining the relevance of the content and identifying portions of the content that are likely to be relevant to the user. Determining the relevance of the content may be based on the content features, a user-interest model of the user, a context of the user, and a content-relevance model of the user. Identifying potentially relevant content is based on the relevance of the content and various relevance thresholds. Responses to the identified potentially relevant portions of the content may be generated based on the response-generation model. The response may be provided to the user and/or to the CS.
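The relevance-and-threshold step can be illustrated with a deliberately small sketch. Purely for illustration, it assumes that content features are lowercase word counts, that the user-interest model is a hypothetical dictionary of per-term weights in [0, 1], and that relevance is the interest-weighted share of tokens. The disclosure specifies none of these choices, and a real system would use trained NLP and relevance models in their place.

```python
import re
from collections import Counter


def content_features(expression: str) -> Counter:
    """Toy stand-in for a natural language model: lowercase word counts."""
    return Counter(re.findall(r"[a-z']+", expression.lower()))


def relevance(features: Counter, interest_weights: dict) -> float:
    """Hypothetical relevance measure: interest-weighted share of tokens.

    With interest weights in [0, 1], the score also lies in [0, 1].
    """
    total = sum(features.values())
    if total == 0:
        return 0.0
    weighted = sum(count * interest_weights.get(term, 0.0)
                   for term, count in features.items())
    return weighted / total


def identify_potentially_relevant(messages, interest_weights, threshold):
    """Keep the messages whose relevance meets the relevance threshold."""
    return [m for m in messages
            if relevance(content_features(m), interest_weights) >= threshold]


if __name__ == "__main__":
    interests = {"budget": 1.0, "deadline": 1.0, "alice": 0.5}  # hypothetical
    chat = ["Alice, is the budget final?", "Lunch anyone?",
            "The deadline moved to Friday"]
    print(identify_potentially_relevant(chat, interests, threshold=0.15))
```

Here the first message scores 0.3 and the third 0.2, so both pass a threshold of 0.15 while "Lunch anyone?" (score 0) is dropped.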

In various embodiments, the method may further include monitoring the user's activity and identifying and/or inferring a user-activity pattern based on the monitored activity. Additionally, user-activity patterns may be inferred from other data sources (e.g., user history, including user profile information). User-interest and/or content-relevance models may be generated, updated, and/or trained based on the inferred user-activity patterns. The method may also include monitoring the various CSs in which the user is engaged, and identifying and/or inferring patterns in the content of the monitored CSs. For example, various patterns of content-substantive and content-style features within user-provided content may be inferred. Content-substantive models, content-style models, and response-generation models may be generated, updated, and/or trained based on the inferred content-substantive and content-style patterns. The method may also use various features of the user's current context outside the CS, for example when the user is out of their usual routine, as additional input for calculating the relevance ranking.
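One simple way to picture a user-interest model derived from monitored activity is to weight terms by how often the user writes them. The sketch below rests on several assumptions: `build_interest_model` is a hypothetical helper, the user's past messages stand in for "user activity," and scaled frequency counts stand in for the trained models the disclosure refers to. Its output is a term-to-weight dictionary with values in (0, 1].

```python
import re
from collections import Counter


def build_interest_model(user_messages, stopwords=frozenset()):
    """Hypothetical interest model: term frequency scaled so the top term is 1.0."""
    counts = Counter(
        token
        for message in user_messages
        for token in re.findall(r"[a-z']+", message.lower())
        if token not in stopwords
    )
    if not counts:
        return {}
    top = max(counts.values())
    return {term: count / top for term, count in counts.items()}


if __name__ == "__main__":
    history = ["Budget review today", "Sending the budget numbers"]
    print(build_interest_model(history, stopwords={"today", "the"}))
```

Frequency is only a crude proxy for interest; a trained model could also account for recency, the user's role, or which topics the user actually responds to.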

The method may also include receiving additional data associated with the CS, such as, but not limited to, metadata associated with the CS. Contextual features of the CS may be determined based on the received additional data and a CS context model. The contextual features may indicate the context of the conversation. In some embodiments, the contextual features indicate the context of the CS for a particular user. The relevance of the content may also be determined based on the contextual features of the CS. Generating responses to potentially relevant portions of the content may also be based on the contextual features of the CS.
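To illustrate how contextual features of the CS could feed into the relevance determination, the sketch below blends a content-based score with two hypothetical contextual signals: whether the user is mentioned by name and whether the message replies to the user. The feature names and the linear weights are assumptions chosen for readability, not values from the disclosure; a trained CS context model would learn such a weighting.

```python
def contextual_relevance(content_score, context, weights=None):
    """Blend content relevance with hypothetical contextual features.

    `context` maps assumed feature names to booleans; the result is clipped to [0, 1].
    """
    weights = weights or {"content": 0.6, "mentions_user": 0.3,
                          "is_reply_to_user": 0.1}
    score = weights["content"] * content_score
    for feature in ("mentions_user", "is_reply_to_user"):
        if context.get(feature, False):
            score += weights[feature]
    return max(0.0, min(1.0, score))


if __name__ == "__main__":
    # A weakly relevant message (0.2) that mentions the user by name.
    print(contextual_relevance(0.2, {"mentions_user": True}))
```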

The method may be further implemented by identifying sub-portions of the potentially relevant portions of the content that are highly relevant to the user. Identifying highly relevant portions of the content may be based on the relevance of the content and an additional relevance threshold. The additional relevance threshold may be greater than the relevance threshold used to identify the potentially relevant portions of the content. For example, the additional threshold may be a high-relevance threshold or an urgency threshold. A response to the highly relevant content may be generated based on the response-generation model. A real-time or near-real-time notification of the highly relevant content, together with the response to it, may be provided to the user.
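The two-tier thresholding described above can be sketched as follows, assuming each portion of content already carries a relevance score. The `Triage` container and the `notify` hook are hypothetical names; `notify` simply prints where a real system would raise a UI notification.

```python
from dataclasses import dataclass, field


@dataclass
class Triage:
    potentially_relevant: list = field(default_factory=list)
    highly_relevant: list = field(default_factory=list)


def triage(scored_portions, relevance_threshold, high_threshold):
    """Split (text, score) pairs using a relevance threshold and a higher one."""
    if high_threshold < relevance_threshold:
        raise ValueError("high_threshold must not be below relevance_threshold")
    result = Triage()
    for text, score in scored_portions:
        if score >= relevance_threshold:
            result.potentially_relevant.append(text)
            if score >= high_threshold:  # highly relevant is a subset
                result.highly_relevant.append(text)
    return result


def notify(text):
    """Hypothetical real-time notification hook."""
    print(f"[urgent] {text}")


if __name__ == "__main__":
    scored = [("Lunch?", 0.1), ("Budget draft attached", 0.5),
              ("Alice, can you approve by noon?", 0.9)]
    outcome = triage(scored, relevance_threshold=0.4, high_threshold=0.8)
    for text in outcome.highly_relevant:
        notify(text)
```

Because the additional threshold is checked only for portions that already passed the first one, the highly relevant content is always a sub-portion of the potentially relevant content, matching the description.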

In one embodiment, the method is implemented by determining content-substantive features and content-style features. The content-substantive features and the content-style features are based on a content-substantive model and a content-style model, respectively, included in the natural language model. The content-substantive features may indicate a subject matter discussed in the conversation. The content-style features may include features of the content other than its substantive features. For example, the content-style features may indicate the emotion, tone, volume, tempo, or other style-related characteristics of one or more of the users participating in the CS, or variations or changes in one or more of those style-related characteristics. The determined relevance of the content may also be based on the content-substantive features and the content-style features. Generating a response to the potentially relevant content may also be based on the content-substantive features and content-style features of the potentially relevant content.
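As a rough illustration of content-style features for text, the sketch below computes three crude proxies: exclamation marks for emphasis, the share of uppercase letters as a stand-in for "volume," and a tiny hand-written lexicon score as a stand-in for emotion. The lexicons and feature names are assumptions made for this example only; the disclosure contemplates a trained content-style model.

```python
import re

# Hypothetical mini-lexicons; a trained content-style model would replace these.
POSITIVE = {"great", "thanks", "love", "awesome"}
NEGATIVE = {"bad", "angry", "late", "problem"}


def style_features(expression):
    """Crude content-style proxies for a single natural language expression."""
    words = [w.lower() for w in re.findall(r"[A-Za-z']+", expression)]
    letters = [c for c in expression if c.isalpha()]
    upper_ratio = (sum(c.isupper() for c in letters) / len(letters)
                   if letters else 0.0)
    sentiment = (sum(w in POSITIVE for w in words)
                 - sum(w in NEGATIVE for w in words))
    return {
        "exclamations": expression.count("!"),  # emphasis
        "upper_ratio": upper_ratio,             # stand-in for "volume"
        "sentiment": sentiment,                 # stand-in for emotion
    }


if __name__ == "__main__":
    print(style_features("GREAT news!!"))
```

Tracking how these values change from message to message is one simple way to represent the "variations or changes" in style mentioned above.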

The method may also be implemented by generating a summarized version of at least some of the potentially relevant portions of the content. The summarized version may be generated using one or more natural language models. Responses to potentially relevant content may also be generated based on the summarized versions of the potentially relevant portions of the content. The method may also provide the summarized version of the potentially relevant content to the user.
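The disclosure does not say how summaries are produced. The sketch below swaps in the simplest technique, extractive summarization: it keeps the k highest-scoring sentences in their original order. The `score` callable is an assumption; in practice it could be a relevance function such as the ones sketched earlier, and a natural language model could instead produce an abstractive summary.

```python
def summarize(sentences, score, k=2):
    """Extractive summary: keep the k highest-scoring sentences, in original order."""
    if k <= 0:
        return []
    top = sorted(range(len(sentences)),
                 key=lambda i: score(sentences[i]), reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


def keyword_score(sentence, keywords=frozenset({"monday", "budget"})):
    """Hypothetical scorer: number of sentence words found in a keyword set."""
    return sum(w.strip(".,?!").lower() in keywords for w in sentence.split())


if __name__ == "__main__":
    thread = ["We meet Monday.", "Coffee is ready.", "Budget is due Friday."]
    print(summarize(thread, keyword_score, k=2))
```

Re-sorting the selected indices keeps the summary in conversational order, which usually reads better than ordering by score.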

The method may also include determining content-substantive features to encode in the response. Determining the content-substantive features to encode in the response may be based on other content-substantive features and/or content-style features encoded in the potentially relevant portions of the content. In some embodiments, a response-generation model and/or a content-substantive model is employed to determine the content-substantive features to be encoded in the response. Content-style features to be encoded in the response may also be determined. Determining the content-style features to encode in the response may be based on other content-substantive features and/or content-style features encoded in the potentially relevant portions of the content. In some embodiments, a response-generation model and/or a content-style model is employed to determine the content-style features to be encoded in the response. When the method operates in a semi-automatic mode, the response to a potentially relevant portion of the content is first provided to the user. When the method operates in an automatic mode, the response to a potentially relevant portion of the content is provided to the CS.
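The difference between the semi-automatic and automatic modes reduces to where a generated response is routed. The sketch below is a minimal illustration. `Mode`, `deliver`, and the two callbacks are hypothetical names, and the callbacks stand in for whatever UI and CS-posting mechanisms an implementation would use.

```python
from enum import Enum


class Mode(Enum):
    SEMI_AUTOMATIC = "semi-automatic"
    AUTOMATIC = "automatic"


def deliver(response, mode, send_to_user, post_to_cs):
    """Route a generated response according to the operating mode."""
    if mode is Mode.SEMI_AUTOMATIC:
        send_to_user(response)  # the user reviews it before anything is posted
        return "user"
    if mode is Mode.AUTOMATIC:
        post_to_cs(response)    # posted directly into the conversation
        return "cs"
    raise ValueError(f"unsupported mode: {mode!r}")


if __name__ == "__main__":
    outbox_user, outbox_cs = [], []
    deliver("Sounds good, see you then.", Mode.SEMI_AUTOMATIC,
            outbox_user.append, outbox_cs.append)
    print(outbox_user, outbox_cs)
```

Passing the destinations in as callables keeps the routing decision separate from the transport, which also makes the two modes easy to test.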

In various embodiments, the user may provide feedback to the system regarding the accuracy and/or utility of the response. Various embodiments may employ this feedback to further train and/or update various models, such as, but not limited to, a content-relevance model, a user-interest model, a user context model, a content-substantive model, a content-style model, or a response-generation model. For example, the user may be provided with a User Interface (UI) that enables the user to provide feedback regarding the automatically generated response. Such a UI may include one or more buttons, sliders, levers, scales, etc. that enable the user to score and/or annotate the response. For example, the user may score a response as "4 out of 5" stars or on some other scale. The scoring scale may be a binary scale, such as a "thumbs up" or "thumbs down" scale. The user may also annotate and/or correct the response. In one embodiment, a prompt may be provided to the user for each portion of the response, where the user may indicate the utility of each portion. The prompt may individually highlight each portion of the response as the user browses and evaluates the various portions. The various models may then be refined and/or updated by training on such feedback.
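Before star ratings or thumbs votes can train a model, they have to be mapped onto a common label. The sketch below assumes, for illustration only, that both kinds of feedback are normalized to [0, 1] and that a single hypothetical model weight is nudged toward that label with an exponential moving average. The disclosure does not prescribe either the normalization or the update rule.

```python
def feedback_to_label(stars=None, thumbs_up=None, max_stars=5):
    """Map star or thumbs feedback onto a training label in [0, 1]."""
    if stars is not None:
        if max_stars < 2:
            raise ValueError("max_stars must be at least 2")
        if not 1 <= stars <= max_stars:
            raise ValueError(f"stars must be between 1 and {max_stars}")
        return (stars - 1) / (max_stars - 1)
    if thumbs_up is not None:
        return 1.0 if thumbs_up else 0.0
    raise ValueError("no feedback supplied")


def update_weight(old_weight, label, learning_rate=0.2):
    """Exponential moving average: nudge a (hypothetical) model weight toward the label."""
    return (1 - learning_rate) * old_weight + learning_rate * label


if __name__ == "__main__":
    label = feedback_to_label(stars=4)  # "4 out of 5" stars
    print(label, update_weight(0.5, label))
```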

In other embodiments, user feedback may be provided via other mechanisms. For example, the method may include receiving at least one other response to the potentially relevant portion. The other response may be a manually curated or edited version of the automatically generated response. In another embodiment, the other response may be generated manually by the user. This other response may be used as user feedback for training purposes. A comparison of the automatically generated response with the other response may be generated. Models, including the content-relevance model and the response-generation model, may be updated based on this comparison. That is, other responses in the CS may be used as "ground-truth" or "baseline" responses for training the content-relevance model and the response-generation model. In this manner, embodiments of the present disclosure enhance the utility and user experience of participating in one or more CSs.
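The comparison between an automatically generated response and a user-curated one could be scored in many ways. The sketch below uses token-level Jaccard overlap as a simple stand-in and turns it into a hypothetical training loss (1 minus the similarity). Both the metric and the loss are assumptions for illustration; a production system might instead compare embeddings or use a sequence-level loss.

```python
import re


def token_set(text):
    """Lowercase word tokens of a response."""
    return set(re.findall(r"[a-z']+", text.lower()))


def response_similarity(generated, reference):
    """Jaccard overlap between the generated and the user-curated response."""
    a, b = token_set(generated), token_set(reference)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def training_loss(generated, reference):
    """Hypothetical loss for updating the models: 1 - similarity."""
    return 1.0 - response_similarity(generated, reference)


if __name__ == "__main__":
    print(response_similarity("Sure, I can join.", "Sure, I will join!"))
```

In this example three of the five distinct tokens are shared, giving a similarity of 0.6.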

Drawings

Aspects of the disclosure are described in detail below with reference to the attached drawing figures, wherein:

FIG. 1 is a block diagram of an example operating environment suitable for implementations of the present disclosure;

FIG. 2 is a diagram depicting an example computing architecture suitable for implementing aspects of the present disclosure;

FIG. 3 depicts a flow diagram of a method for providing enhanced communication session services in accordance with an embodiment of the present disclosure;

FIG. 4 depicts a flow diagram of a method for identifying content in the communication session of FIG. 3 that may be relevant to providing a response thereto in accordance with an embodiment of the present disclosure;

FIG. 5 depicts a flow diagram of a method for generating and providing responses to the communication session content of FIG. 3 in accordance with an embodiment of the present disclosure; and

FIG. 6 is a block diagram of an exemplary computing environment suitable for use in implementing embodiments of the present disclosure.

Detailed Description

The subject matter of aspects of the present disclosure is described with specificity herein to meet statutory requirements. However, the description itself is not intended to limit the scope of this patent. Rather, the inventors have contemplated that the claimed subject matter might also be embodied in other ways, to include different steps or combinations of steps than those described in this document, in conjunction with other present or future technologies. Moreover, although the terms "step" and/or "block" may be used herein to connote different elements of methods employed, the terms should not be interpreted as implying any particular order among or between various steps herein disclosed unless and except when the order of individual steps is explicitly described.

Aspects of the present disclosure relate to providing various enhanced services to users participating in one or more Communication Sessions (CSs). As discussed throughout this document, the provided services may enhance the utility and user experience of users participating in one or more CSs. In particular, embodiments herein actively monitor and analyze the content and context of one or more CSs via various Natural Language Processing (NLP) and other Machine Learning (ML) methods. Based on the analysis of the content and context of the CS, embodiments automatically generate responses that the user might otherwise provide manually in an ongoing conversation in the CS. NLP and other ML methods are used to generate the user's likely responses. That is, embodiments employ an ML approach to "learn," from the content and context of a conversation, the responses that a user would otherwise provide manually in that conversation. Accordingly, embodiments provide an Intelligent Agent (IA) customized for a user, wherein the IA automatically provides input content (i.e., responses) that may facilitate conversations conducted within one or more CSs. Automatically generating likely responses may free the user's attention to focus on other activities. Thus, the user may appear engaged in an ongoing conversation while simultaneously focusing on other tasks.

Various data models, such as, but not limited to, ML data models, are employed to monitor and analyze data related to the CS, including the CS's content, metadata, and other associated data, in real time or near real time. That is, a data model is employed to determine the context of the CS and to determine the relevance of various portions of the content based on the determined context and the features encoded in the content. Portions of the content that are likely to be relevant to the user, based on the user's activities and interests, are identified. The context of the identified relevant portions of the content may also be determined. Based on the determined context and the substance and style of the relevant portions, various embodiments generate additional content (e.g., a response) for the CS, where the substance and style of the response are targeted to the user of the CS and to the context surrounding the user. That is, embodiments automatically generate a response for the CS that is targeted to the user and the user's context within the CS.

More specifically, embodiments generate a logical response based on the analyzed content and a content-substantive data model. The logical response may include the semantics or substance of the response. For example, if the potentially relevant content poses a question to the user, the logical response may include an answer to the question. The logical response is then updated, based on a content-style data model, to generate a stylized response that targets the user's conversational style. The stylized response is a personalized response that includes both the substance of the response and stylistic features that simulate the user's conversational style. In the above example, the stylized response may include the answer to the question, but phrased in a manner that simulates or mimics the user's dialog style when answering questions.
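The two-stage idea (first a logical response carrying the substance, then a stylized rewrite) can be illustrated with a toy sketch. Everything here is an assumption for illustration: `StyleProfile` is a hypothetical, hand-filled summary of a user's style, `logical_response` uses a plain lookup table instead of a content-substantive model, and `stylize` applies a few fixed surface rules instead of a learned content-style model.

```python
from dataclasses import dataclass


@dataclass
class StyleProfile:
    """Hypothetical summary of a user's conversational style."""
    lowercase: bool = False
    opener: str = ""
    emoji: str = ""


def logical_response(question, known_answers):
    """Substance only: look up an answer (stand-in for a content-substantive model)."""
    return known_answers.get(question, "I'm not sure yet.")


def stylize(text, style):
    """Re-express the logical response using the user's (assumed) style profile."""
    out = text.lower() if style.lowercase else text
    if style.opener:
        out = f"{style.opener} {out}"
    if style.emoji:
        out = f"{out} {style.emoji}"
    return out


if __name__ == "__main__":
    answers = {"Can you make Friday?": "Yes, Friday works."}
    casual = StyleProfile(lowercase=True, opener="hey!", emoji=":)")
    print(stylize(logical_response("Can you make Friday?", answers), casual))
```

Keeping the two stages separate means the same logical answer can be rendered in different users' styles, which mirrors the split between the content-substantive and content-style models described above.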
