Automatically generating conversational services from computing applications

Document No.: 789606 Publication date: 2021-04-09

Abstract: The technology "Automatically generating conversational services from computing applications" was designed and created by O·利瓦, J·A·卡克, D·C·伯格, and 李佳君 on 2019-05-21. Automated generation of one or more task-oriented conversational robot programs is disclosed. Illustratively, systems and methods are provided that allow interactions with one or more computing applications to be tracked in order to collect various state data. These interactions include interactions with one or more programmatic elements of the computing applications, interactions with the graphical user interface(s) of the computing applications, and/or the operation of the one or more computing environments on which the computing applications are executing. The state data may illustratively be represented as a graph showing the overall execution path of one or more functions/operations of the computing applications. This graph is used to generate one or more instructions representing the desired task-oriented conversational robot program, and these instructions may be executed through one or more application programming interfaces of the computing applications.

1. A system, comprising:

at least one processor; and

at least one memory having computer-readable instructions stored thereon that, when executed by the at least one processor, cause the at least one processor to:

receive first input data comprising data representing one or more user interactions with one or more programmatic elements, data representing the one or more programmatic elements, and data representing a state of an application in which one or more of the programmatic elements are being executed;

process the first input data to generate intermediate processed input data by:

segmenting the first input data according to one or more selected application states;

identifying one or more dependencies between the one or more selected application states; and

extracting one or more service Application Programming Interfaces (APIs) from the application using the identified one or more states and one or more dependencies; and

generate a user interface using the intermediate processed input data by:

generating one or more questions for the identified one or more dependencies;

generating a sample of user responses to the generated one or more questions; and

generating a sample of a user utterance for triggering execution of the extracted one or more service APIs by the application.

2. The system of claim 1, wherein the one or more user interactions comprise one or more trajectories of behavior of the application.

3. The system of claim 1, wherein the data representative of the one or more programmatic elements comprises: data representing a user interface of the application.

4. The system of claim 1, wherein the instructions further cause the at least one processor to segment the first input data according to one or more selected application states, the one or more selected application states comprising: one or more states of a user interface of an application executable on the computing environment, and one or more states representing one or more internal processing states of the application executable on the computing environment.

5. The system of claim 1, wherein the data representative of the state of the application comprises: a dynamic application state, a static application state, and a state of an underlying operating system of a computer environment on which the application, and thus the programmatic element, is being executed.

6. A computer-implemented method, comprising:

receiving, by a computing environment, first input data comprising data representing one or more user interactions with one or more programmatic elements, data representing the one or more programmatic elements, and data representing a state of an application in which the one or more programmatic elements are being executed;

processing the first input data to generate intermediate processed input data by:

segmenting the first input data according to one or more selected application states;

ordering the one or more selected application states as one or more elements of a state diagram; and

extracting one or more service Application Programming Interfaces (APIs) as one or more paths of the state diagram, the one or more paths comprising: data representing one or more parameters for the extracted one or more service APIs; and

generating a natural user interface using the intermediate processed input data by:

generating one or more questions for the identified parameters;

generating a sample of one or more user responses to the generated one or more questions; and

generating a sample of one or more user utterances for execution by the application of the extracted one or more service APIs.

7. The computer-implemented method of claim 6, further comprising: performing tracking of the application operating on the computing environment, the tracking generating data representing data processed by one or more functions of the application.

8. The computer-implemented method of claim 6, wherein segmenting the first input data according to one or more selected application states comprises: segmenting the first input data according to one or more states of a user interface of the application and one or more internal processing states of the application.

9. The computer-implemented method of claim 6, wherein the state diagram further comprises data representing one or more dependencies between the one or more selected application states.

10. The computer-implemented method of claim 9, wherein the one or more paths include data representing one or more states of the application.

11. A computer-readable storage medium having stored thereon computer-executable instructions that, when executed by one or more processors of a computing device, cause the one or more processors of the computing device to:

receive first input data, the first input data comprising: data representing one or more user interactions with a programmatic element, data representing said programmatic element, and data representing a state of an application in which said programmatic element is being executed;

process the first input data to generate intermediate processed input data by:

segmenting the first input data according to one or more selected application states;

identifying one or more dependencies between the one or more selected application states; and

extracting one or more service Application Programming Interfaces (APIs) from the application using the identified one or more dependencies; and

generate a natural user interface using the intermediate processed input data by:

generating one or more questions for the identified one or more dependencies;

generating a sample of user responses to the generated one or more questions; and

generating a sample of one or more utterances for execution by the application of the extracted one or more service APIs.

12. The computer-readable storage medium of claim 11, wherein the instructions further cause the one or more processors of the computing device to:

perform tracking of the application operating on the computing device, the tracking generating data representing data processed by one or more functions of the application.

13. The computer-readable storage medium of claim 12, wherein the instructions further cause the one or more processors of the computing device to:

order the one or more selected application states as one or more elements of a state diagram, the state diagram comprising: data representing one or more dependencies between the one or more selected application states.

14. The computer-readable storage medium of claim 11, wherein the instructions further cause the one or more processors of the computing device to:

extract the one or more service APIs as one or more paths on the state diagram, the one or more paths including: data representing one or more states of the application.

15. The computer-readable storage medium of claim 11, wherein the instructions further cause the one or more processors of the computing device to:

define one or more service APIs from the received input data.

Background

In recent years, interest in conversational services has grown rapidly. Beyond the popularity of smart assistants, the use of specialized robot programs (bots) has risen. These are called "skills" in the ALEXA virtual assistant from Amazon.com, Inc. and the CORTANA virtual assistant from MICROSOFT Corp., and "actions" in the GOOGLE ASSISTANT virtual assistant from GOOGLE LLC. Particularly useful are task-oriented chatbots, which act as agents on behalf of users. They interact with external services through natural-language dialogues and execute one or more operations of a corresponding computer application to accomplish a particular task (e.g., booking a taxi, reserving a restaurant, or finding a menu).

Currently, most conversational services are built using a slot-filling approach. In this approach, the user's phrase (e.g., "I want a cup of coffee") indicates an intent, i.e., an action supported by the system such as "order coffee". The input parameters required to execute the intent, such as coffee type and size, are described as slots. With slot filling, a control structure defines the operations of a multi-turn dialogue between the system and the user that collects all the slots needed to fulfill the intent.
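As a minimal sketch of slot filling, an intent can be modelled as a list of required slots plus one prompt question per slot. The dialogue then asks for the first missing slot until none remain. This is an illustration in Python, not the implementation of any product named above. The `Intent` class, the `next_prompt` helper, and the prompt wording are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Intent:
    """An action the system supports, plus the slots it needs before it can run."""
    name: str
    required_slots: List[str]
    prompts: Dict[str, str]                               # slot name -> question to ask
    filled: Dict[str, str] = field(default_factory=dict)  # slot name -> collected value

    def missing_slots(self) -> List[str]:
        return [s for s in self.required_slots if s not in self.filled]


def next_prompt(intent: Intent) -> Optional[str]:
    """Question for the first unfilled slot, or None once the intent can be executed."""
    missing = intent.missing_slots()
    return intent.prompts[missing[0]] if missing else None


order_coffee = Intent(
    name="OrderCoffee",
    required_slots=["type", "size"],
    prompts={"type": "What kind of coffee would you like?", "size": "What size?"},
)
order_coffee.filled["type"] = "latte"   # e.g. recognized in "I want a latte"
print(next_prompt(order_coffee))        # What size?
order_coffee.filled["size"] = "large"   # e.g. the user's answer to the prompt
print(next_prompt(order_coffee))        # None
```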

Slot filling has proven reliable, but it requires significant developer effort. First, the control structure, usually in the form of a finite-state automaton, must be designed manually for each task. Such control structures can be complex, because they must account for many possible execution paths. Second, to support user interaction in natural language, a model for understanding user questions and answers must be trained. Training typically requires many utterances, i.e., example phrases that users may say during the interaction. For example, even a question as simple as "What is your party size?" can be answered in many different ways, such as "3", "all of us", "me and my wife", or "not sure, I'll tell you later". To train a robust model, the developer must anticipate all such phrase variants and provide many utterances for each slot and each intent. Finally, the developer must enumerate the possible values of each slot to facilitate slot identification during language understanding. As a result, the entire process requires significant manual coding, which hinders scaling to new tasks and domains.

An alternative to slot filling is a corpus-based approach, in which the bot is trained automatically from a dataset of past conversations. This approach has shown promise for "chatty", non-task-oriented bots, but it is unclear whether it can model task-oriented bots on its own. Purely machine-learned systems cannot guarantee that critical in-task constraints are met (e.g., a user cannot reserve a restaurant without specifying a time), and they lack a model that ensures the actual goal of the task is completed. These systems are also difficult to train because domain-specific dialog logs are scarce.

Hybrid approaches also exist. For example, Hybrid Code Networks (HCNs) make machine-learned systems practical by combining rules hand-coded by the developer with a recurrent neural network. HCNs reduce the amount of training data needed at the expense of developer effort. Another hybrid approach is a "knowledge-grounded" dialogue model, which augments a model derived from conversational data with knowledge derived from textual data in order to generate informative responses. However, these models only work for single-turn responses and rely on the availability of domain knowledge sources.

With respect to these considerations and others, the disclosures made herein are presented.

Disclosure of Invention

The techniques described herein allow task-oriented conversational bots to be created automatically for execution in collaboration with one or more computing applications. Operationally, the systems and methods described herein track (trace) interactions with a computing application, including interactions with its GUI, while one or more application tasks/operations are performed to achieve a result for a particular task. The tracing retrieves state data about the computing application, the computing environment on which it executes, and the GUI. This state data is used to automatically generate a task-oriented bot that performs one or more desired tasks. In this way, technical benefits can be achieved, such as improved overall processing efficiency in generating and executing task-oriented conversational robot programs.

To achieve the technical benefits briefly mentioned above, illustratively, the systems and methods described herein provide support for automatic generation of conversational services, which results in enhanced processing efficiency and scalability of the computational process that generates the conversational services. In illustrative operations, data may be collected representing one or more interactions with one or more computing applications executable on one or more computing environments, and/or a trajectory (trace) of one or more operations/features/commands of one or more computing environments and/or GUIs of one or more computing environments (if available). The collected data may include, but is not limited to, data representing the state of one or more computing applications, the state of one or more computing environments, and the state of one or more GUIs (if available).

Additionally, the collected data may be used as input to an exemplary question-answer interface generation system that includes a neural network model trained on open-domain dialog logs. In an illustrative implementation, one or more generated conversational services are executable by one or more mobile applications, such that the created service API calls may be executed in response to each user's interaction. Illustratively, the collected data, which is specific to interactions with one or more GUIs of one or more computing applications, may be used to create task-oriented "skills" and/or "actions" for one or more personal digital assistants (such as the CORTANA virtual assistant from MICROSOFT CORP).

In one illustrative implementation, an authoring tool for a task-oriented conversational bot may be provided, which may include a graphical user interface to allow a participating user (e.g., developer/author) to interact with data collected for display in an illustrative authoring tool GUI as part of the automatic creation of the task-oriented conversational bot. In this illustrative implementation, the collected data may include a state diagram having one or more execution paths that illustrate the execution of various operations/features/commands of one or more computing applications, one or more GUIs, and/or one or more computing environments. Illustratively, a task-oriented conversational robot program may be generated synchronously or asynchronously with the operation of a selected one or more execution paths to achieve a desired series of functions/operations to be performed by one or more computing applications.

It should be appreciated that although described with respect to a system, the above-described aspects may also be implemented as a computer-controlled apparatus, a computer process, a computing system, or as an article of manufacture such as a computer-readable medium. These and various other features will become apparent from a reading of the following detailed description and a review of the associated drawings. This summary introduces some concepts in a simplified form that are further described below in the detailed description.

This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

Drawings

Specific embodiments are described with reference to the accompanying drawings. In the drawings, the left-most digit of a reference number identifies the drawing in which the reference number first appears. The same reference numbers in different figures indicate similar or identical items. References to individual items of a plurality of items may use a reference number followed by a letter from a sequence of letters to refer to each individual item. Generic references to the items may use the specific reference number without the sequence of letters.

FIG. 1 illustrates an exemplary slot-based conversational robot program building process.

FIG. 2 illustrates an exemplary user interface for use in a robot program build process according to the systems and methods described herein.

FIG. 3 illustrates a block diagram of an exemplary architecture for use in robot program building in accordance with the systems and aspects described herein.

FIG. 4 illustrates an exemplary interaction trajectory of a user with an illustrative computing application for creating an automated task-oriented conversational robot program in accordance with the systems and methods described herein.

FIG. 5 illustrates an exemplary user interface trajectory for an illustrative computing application for creating an automated task-oriented conversational robot program in accordance with the systems and methods described herein.

FIG. 6 illustrates an exemplary bot visualization user interface operable to allow a participating user to graphically illustrate operations used in the creation of a task-oriented conversational bot.

FIG. 7 is a flow diagram of an illustrative process for automatically creating a task-oriented conversational robot program in accordance with the systems and methods described herein.

FIG. 8 is a computer architecture diagram showing an illustrative computer hardware and software architecture for a computing device according to one embodiment.

FIG. 9 is a network diagram illustrating a distributed computing environment in which aspects of the disclosed technology may be implemented in accordance with various embodiments presented herein.

Detailed Description

The following detailed description describes techniques that enable the automatic creation of task-oriented conversational robot programs for execution on one or more computing applications. In one illustrative implementation, the one or more computing applications may support a graphical user interface having one or more programmatic elements that support the performance of one or more operations of the computing applications. These programmatic elements may include, but are not limited to, buttons, menus, and dialog boxes that, when interacted with, initiate execution of one or more instructions in the computing applications.

Operationally, the systems and methods described herein support tracking interactions with a computing application, including interactions with its GUI, while one or more application tasks/operations are performed to achieve a result for a particular task. The tracking retrieves state data about the computing application, the computing environment on which it executes, and the GUI, and this data is used to automatically generate a task-oriented bot operable to perform one or more desired tasks. Automatically bootstrapping a conversational service from one or more computing applications reduces the number of processing steps required to generate the instructions for performing operations on those applications. This increases the processing efficiency of the computing environment on which the applications execute.

To achieve the technical advantages briefly mentioned above, the systems and methods described herein illustratively support the automatic generation of conversational services. In illustrative operations, data may be collected that represents one or more interactions with one or more computing applications executable on one or more computing environments. Data may also be collected that tracks one or more operations/features/commands of the computing environments and/or of their GUIs, if available. The collected data may include, but is not limited to, data representing the state of the one or more computing applications, the state of the one or more computing environments, and the state of the one or more GUIs (if available).

In one illustrative implementation, the collected data may be used to define one or more service Application Programming Interfaces (APIs) that may cooperate with one or more computing applications, one or more computing environments, and/or one or more GUIs (if available) of the computing applications. The collected data may also be used to generate one or more task models that an exemplary robot program or personal assistant system can use to create and execute one or more tasks. Additionally, the collected data may be used as input by an exemplary question-answer interface generation system that uses one or more neural network models trained on open-domain conversational logs. In one illustrative implementation, one or more generated conversational services are executable by one or more mobile applications, such that the created service API calls may be executed in response to each user's interaction. Illustratively, the collected data, specific to interactions with one or more GUIs of one or more computing applications, may be used to create task-oriented "skills" and/or "actions" for one or more personal digital assistants (such as the CORTANA virtual assistant from MICROSOFT CORP).

In one illustrative implementation, an authoring tool for a task-oriented conversational bot may be provided, which may include a graphical user interface to allow a participating user (e.g., developer/author) to interact with data collected for display in an illustrative authoring tool GUI as part of the automatic creation of the task-oriented conversational bot. In this illustrative implementation, the collected data may include a state diagram having one or more execution paths illustrating the execution of various operations/features/commands of one or more computing applications, one or more GUIs, and/or one or more computing environments. Illustratively, a task-oriented conversational robot program may be generated along selected operation(s) of one or more execution paths, synchronously or asynchronously, to achieve a desired sequence of functions/operations to be performed by one or more computing applications.
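One way to picture the state diagram and its execution paths is the sketch below. It assumes each trace has already been reduced to an ordered list of state (screen) names. Observed transitions are merged into a directed graph, and the cycle-free paths between a start state and a goal state are enumerated as candidate service APIs. The state names and helper functions are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set


def build_state_diagram(traces: List[List[str]]) -> Dict[str, Set[str]]:
    """Merge traces (ordered lists of state names) into a graph of observed transitions."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for trace in traces:
        for src, dst in zip(trace, trace[1:]):
            graph[src].add(dst)
    return graph


def execution_paths(graph: Dict[str, Set[str]], start: str, goal: str,
                    path: Optional[List[str]] = None) -> List[List[str]]:
    """Depth-first enumeration of cycle-free paths from start to goal."""
    path = (path or []) + [start]
    if start == goal:
        return [path]
    paths = []
    for nxt in sorted(graph.get(start, ())):
        if nxt not in path:   # skip loops, e.g. back-navigation to an earlier screen
            paths.extend(execution_paths(graph, nxt, goal, path))
    return paths


traces = [
    ["Search", "RestaurantList", "RestaurantInfo", "Reserve"],
    ["Search", "RestaurantList", "RestaurantInfo", "Menu"],
    ["Search", "Favorites", "RestaurantInfo", "Reserve"],
]
graph = build_state_diagram(traces)
paths = execution_paths(graph, "Search", "Reserve")
for p in paths:
    print(" -> ".join(p))
# Search -> Favorites -> RestaurantInfo -> Reserve
# Search -> RestaurantList -> RestaurantInfo -> Reserve
```

Enumerating only simple paths keeps back-navigation loops out of the candidates. Attaching the slots collected along each path would be a further step that this sketch omits.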

In one illustrative implementation, the systems and methods described herein may use an exemplary user interface (UI) tree model to represent the interaction data on each UI screen of a computing application. This data includes, but is not limited to, the UI elements, the relationships between them, and the interactions that can be performed on them. In an illustrative implementation, a task model may be derived from one or more traces of interaction with an exemplary computing application by abstracting UI screens and events into intents and slots. In an illustrative operation, the trace data may be further segmented to identify one or more operational states of the computing application presenting the user interface. Such states may include dynamic and static application states. The application state data may be used when generating the robot program to indicate how the computing application behaves while its operations and/or functions execute.
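The segmentation step can be sketched as follows, under the simplifying assumption that each recorded event carries the screen on which it occurred. Consecutive events on the same screen are grouped into one state segment. The `TraceEvent` fields and screen names are hypothetical, and real traces would also carry the UI tree and internal application state.

```python
from itertools import groupby
from typing import NamedTuple


class TraceEvent(NamedTuple):
    screen: str       # hypothetical id of the screen (page/activity) shown when the event fired
    element: str      # hypothetical id of the UI element acted on
    action: str       # e.g. "click" or "type"
    value: str = ""   # text typed, or the element's label for clicks


def segment_by_screen(trace):
    """Split a trace into consecutive runs of events that happened on the same screen."""
    return [(screen, list(events))
            for screen, events in groupby(trace, key=lambda e: e.screen)]


trace = [
    TraceEvent("Search", "location", "type", "Seattle"),
    TraceEvent("Search", "search_button", "click", "Find a table"),
    TraceEvent("RestaurantList", "result_0", "click", "Example Bistro"),
    TraceEvent("RestaurantInfo", "time_slot", "click", "7:00 PM"),
    TraceEvent("Reserve", "complete_button", "click", "Complete reservation"),
]
for screen, events in segment_by_screen(trace):
    print(screen, [e.element for e in events])
```

Because `itertools.groupby` only merges adjacent items, returning to a screen later in the trace starts a new segment, which matches treating each visit as a distinct state in time.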

Illustratively, the task model may represent the logical backbone of the robot program. In one illustrative implementation, a question-answer interface for the dialog may be established automatically using a hybrid approach based on rules and neural networks. In illustrative operation, the nature and semantics of the slots can be inferred from the GUI and on-screen content of an exemplary computing application, so that prompt questions can be generated using semantic rules. To generate questions for slots that cannot be semantically classified, and to generate utterances for user answers, one or more neural network transduction models may be trained (e.g., using millions of conversation pairs from one or more large social media networks, such as the TWITTER communication service from TWITTER Inc.).
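A minimal sketch of the rule-based half of that hybrid approach follows. It assumes slot semantics are inferred from the slot's on-screen label using keyword patterns. Labels that no rule matches return `None`, which marks where a trained neural question generator would take over. The rules and question wording are hypothetical examples.

```python
import re
from typing import Optional

# Hypothetical semantic rules: keyword pattern over a slot's on-screen label -> prompt question.
SEMANTIC_RULES = [
    (re.compile(r"\b(date|day)\b", re.IGNORECASE), "What date would you like?"),
    (re.compile(r"\btime\b", re.IGNORECASE), "What time would you like?"),
    (re.compile(r"\b(party|people|guests)\b", re.IGNORECASE), "How many people are in your party?"),
    (re.compile(r"\b(city|location|near)\b", re.IGNORECASE), "Where should I search?"),
]


def prompt_for_slot(label: str) -> Optional[str]:
    """Rule-based prompt for a slot label, or None when no rule classifies the slot."""
    for pattern, question in SEMANTIC_RULES:
        if pattern.search(label):
            return question
    return None  # unclassified: left to a trained (e.g. neural) question generator


for label in ["Party size", "Reservation time", "Special requests"]:
    print(label, "->", prompt_for_slot(label))
```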

In one illustrative implementation, the task-oriented bot may understand user requests and perform related tasks. As shown in FIG. 1, the exemplary task-oriented bot creation process 100 may include, but is not limited to, three operations: 110, 130, and 150. As shown, operation 110 may include defining a control structure. Operationally, a software developer can define a control structure that represents the kinds of actions (referred to as "intents" 115) that an exemplary computing environment can support. Illustratively, an intent 115 may be associated with one or more slots 120 and may have one or more constraints or dependencies 125 on other intents. Constraints may be viewed as one or more conditions that must hold for the intent to execute.

For example, in a restaurant reservation robot program, an intent named "reserve restaurant" may be defined with slots that may include: 1) the restaurant name; 2) the party size; and 3) the time. A corresponding intent named "confirm reservation" may depend on the "reserve restaurant" intent. In an exemplary operation, based on such a structure, the robot program may perform slot filling, i.e., collect the slots (e.g., the required data elements) for each intent and then perform the associated action(s).
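The restaurant example can be written down as a small task model. This sketch assumes a simple rule: an intent may execute once every intent it depends on has completed and all of its own slots are filled. The class, field, and intent names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class IntentSpec:
    name: str
    slots: List[str] = field(default_factory=list)
    depends_on: List[str] = field(default_factory=list)  # intents that must complete first


TASK_MODEL = {
    "ReserveRestaurant": IntentSpec("ReserveRestaurant",
                                    slots=["restaurant_name", "party_size", "time"]),
    "ConfirmReservation": IntentSpec("ConfirmReservation",
                                     depends_on=["ReserveRestaurant"]),
}


def can_execute(intent_name: str, completed: Set[str], filled: Dict[str, str]) -> bool:
    """True once every dependency has completed and every slot of the intent is filled."""
    spec = TASK_MODEL[intent_name]
    return (all(dep in completed for dep in spec.depends_on)
            and all(slot in filled for slot in spec.slots))


print(can_execute("ConfirmReservation", completed=set(), filled={}))  # False
filled = {"restaurant_name": "Example Bistro", "party_size": "2", "time": "19:00"}
print(can_execute("ReserveRestaurant", completed=set(), filled=filled))  # True
print(can_execute("ConfirmReservation", completed={"ReserveRestaurant"}, filled={}))  # True
```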

At operation 130, the developer may define a conversational interface for the control structure (e.g., for the set of intents 135 that an exemplary computing environment supports), for example by training a machine learning model capable of language processing. The conversational interface performs one or more functions, including but not limited to: 1) identifying the user intent 135 and slots 120; 2) asking the user 140 for required slots 120; and 3) understanding user answers 145. For example, from the phrase "I want a two-person table", the exemplary computing environment executing the creation process 100 may identify an intent called reserve restaurant with the slot party size equal to two, and then ask the user for the missing slot, time. At operation 150, the resulting intent may be passed to a collaborative service backend of the exemplary computing environment. This backend can invoke one or more service APIs and return the results 160 of those API calls to participating users (not shown), e.g., confirming the reservation.
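The first and third functions above can be illustrated with a deliberately naive recognizer based on keywords and regular expressions, standing in for the trained language model. It identifies the intent and the party-size slot in "I want a two-person table" and reports which required slot is still missing. The patterns, number words, and slot names are assumptions for illustration only.

```python
import re
from typing import Dict, Optional, Tuple, Union

NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6}
REQUIRED_SLOTS = ["party_size", "time"]
PARTY_RE = re.compile(r"\b(\d+|" + "|".join(NUMBER_WORDS) + r")[- ](?:person|people)\b")
TIME_RE = re.compile(r"\bat (\d{1,2}(?::\d{2})?\s*(?:am|pm)?)")


def understand(utterance: str) -> Tuple[Optional[str], Dict[str, Union[int, str]]]:
    """Toy recognizer: (intent, slots) for the restaurant example, via keywords and regexes."""
    text = utterance.lower()
    intent = "ReserveRestaurant" if re.search(r"\b(table|reserve|book)\b", text) else None
    slots: Dict[str, Union[int, str]] = {}
    m = PARTY_RE.search(text)
    if m:
        token = m.group(1)
        slots["party_size"] = int(token) if token.isdigit() else NUMBER_WORDS[token]
    m = TIME_RE.search(text)
    if m:
        slots["time"] = m.group(1).strip()
    return intent, slots


intent, slots = understand("I want a two-person table")
missing = [s for s in REQUIRED_SLOTS if s not in slots]
print(intent, slots, missing)  # ReserveRestaurant {'party_size': 2} ['time']
```

A trained model replaces these hand-written patterns precisely because, as the background notes, users phrase the same request in many ways that fixed rules cannot anticipate.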

As shown in FIG. 2, an exemplary bot creation tool 200 with various user interface controls and dialog boxes is depicted. Operationally, when using the exemplary bot creation tool 200, a developer may supply one or more example user utterances 210, which may be mapped to intents 220 in a task. In this illustrative operation, the developer may define one or more slots 230 for an intent 220 and mark one or more slot values 230 that appear in the provided utterances 210. The marked slot values illustratively provide the data elements required to perform the intent. The developer may also provide a prompt question for each slot and specify the order of the prompts 240.

Operationally, the developer may repeat these operations to handle user answers, i.e., specify phrases for possible user answers and tag the slot values in the example answers. To facilitate identifying slots in user phrases, the possible values of each slot 240 may also be provided. In one illustrative implementation, after many utterance samples have been specified for the intents 220 and slots 240, a machine learning model may be trained that provides reasonable accuracy on inputs similar to the examples.
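Slot values marked in example utterances are commonly converted into per-token labels before a slot tagger is trained. The sketch below uses the widely used BIO scheme (B marks the beginning of a slot value, I marks a token inside it, and O marks a token outside any slot) with naive whitespace tokenization. The choice of BIO tagging, the sample utterance, and the slot names are assumptions, not details taken from the tool described above.

```python
from typing import Dict, List, Tuple


def bio_tags(utterance: str, slot_values: Dict[str, str]) -> List[Tuple[str, str]]:
    """Turn a developer-labelled utterance into (token, tag) pairs.

    slot_values maps slot name -> the exact substring the developer marked in the utterance.
    """
    tokens = utterance.split()               # naive whitespace tokenization
    tags = ["O"] * len(tokens)
    for slot, value in slot_values.items():
        value_tokens = value.split()
        n = len(value_tokens)
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == value_tokens:   # first occurrence of the marked value
                tags[i] = f"B-{slot}"
                tags[i + 1:i + n] = [f"I-{slot}"] * (n - 1)
                break
    return list(zip(tokens, tags))


example = bio_tags("book a table for 4 at Lotus Garden",
                   {"party_size": "4", "restaurant_name": "Lotus Garden"})
print(example)
```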

FIG. 3 illustrates various collaborating components and functions of an exemplary task-oriented conversational bot generation environment 300. As illustrated, the environment 300 may include a neural sequence transduction module 310, a task model extraction module 330, and a question/answer generation module 370. Further, as shown in FIG. 3, the neural sequence transduction module 310 may perform various functions including, but not limited to, data collection 315 (e.g., data set 317), data annotation 320, and model training 325. Operationally, data collection can be viewed as the process of gathering the data required to generate a conversational service. Data annotation can be viewed as the process of associating one or more selected characteristics with the collected data. Model training can be viewed as the process of inferring one or more instructions from the collected data.

The task model extraction module 330 may be operative to perform various functions including, but not limited to, trajectory collection 335 (e.g., collecting data representing user interactions with an exemplary computing application), trajectory aggregation 340 (e.g., integrating collected trajectories according to one or more selected rules), and intent/slot extraction 345 (e.g., defining intents and slots associated with the intents based on the collected trajectory data).
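Trace aggregation and a simple form of slot extraction can be sketched as follows. The sketch assumes each trace is a list of `(screen, element_id, value)` tuples and that element ids are stable across runs of the app. It also uses the heuristic that an element whose value differs across traces is a slot candidate while an element with a constant value is not. This heuristic is an illustrative assumption, not the rule used by module 330.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Event = Tuple[str, str, str]   # (screen, element_id, value), a hypothetical trace format


def aggregate_traces(traces: List[List[Event]]) -> Dict[Tuple[str, str], Set[str]]:
    """Merge traces, collecting every non-empty value seen for each (screen, element) pair."""
    observed: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for trace in traces:
        for screen, element, value in trace:
            if value:
                observed[(screen, element)].add(value)
    return dict(observed)


def candidate_slots(observed: Dict[Tuple[str, str], Set[str]]) -> Dict[Tuple[str, str], List[str]]:
    """Heuristic: an element whose value varies across traces is treated as a slot candidate."""
    return {key: sorted(vals) for key, vals in observed.items() if len(vals) > 1}


t1 = [("Search", "location", "Seattle"), ("Search", "search_button", "Find a table"),
      ("Reserve", "party_size", "2")]
t2 = [("Search", "location", "Redmond"), ("Search", "search_button", "Find a table"),
      ("Reserve", "party_size", "4")]
print(candidate_slots(aggregate_traces([t1, t2])))
# {('Search', 'location'): ['Redmond', 'Seattle'], ('Reserve', 'party_size'): ['2', '4']}
```

The values collected per slot also give a starting point for the set of possible slot values that the task model associates with each slot.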

The question/answer generation module 370 may generate, but is not limited to generating, slot answers 375. These may be produced by a question-to-answer transformation 355 (e.g., associating an answer with a question, and a question with an answer, according to one or more selected rules) that operates on data provided by the neural sequence transduction module 310 and on exemplary data templates from the robot program templates 365. Likewise, the question/answer generation module 370 may generate slot questions 380 using data provided by the entity extraction module 350, which may receive data from the data annotation function 320 of the neural sequence transduction module 310. In addition, the slot questions 380 may also be generated by the answer-to-question transformation 360, which operates on data provided by the neural sequence transduction module 310.

FIG. 4 shows three exemplary screens (a, b, and c) from a user's interaction with an exemplary mobile computing application 400 (e.g., an application for the OPENTABLE restaurant reservation service from OPENTABLE). In this interaction, the user searches for restaurants (screen a-410), selects a restaurant from a list (not shown), views information about the selected restaurant (screen b-420), and makes a reservation (screen c-430). Operationally, various intents and slots can be inferred:

- start a restaurant search (screen a-410);
- view restaurant information, view the restaurant menu, and view restaurant reviews (screen b-420), each of which may take the restaurant name (selected in the restaurant list screen) as a slot;
- reserve a restaurant (screen c-430), which takes the reservation time (selected in screen b-420) as a slot.

In one illustrative operation, the "application language" may be translated into the "robot program language" in order to programmatically extract intents and slots. In the application language, a user performs tasks through interactions with UI elements and transitions from one page to another. In the robot program language, a user performs tasks by navigating a graph of intents and filling their slots. Operationally, information may be extracted from a mobile application using (1) static analysis, which examines the source code of the mobile application without executing it, and/or (2) dynamic analysis, which analyzes the application while executing it.

In one illustrative implementation, an interaction trajectory may be represented as a sequence of UI events. Each UI event may be associated with the UI element on which the action is performed (e.g., clicking a button or typing text into a text field) and with a UI tree that captures the relationships, types, and content of the UI elements on the screen, and their hierarchy, at the time of the interaction. In this illustrative implementation, one or more application trajectories provided by a developer may be processed to extract a task model. The task model may consist of a set of intents, each of which may have a set of slots and dependencies on other intents. A slot may have a name and a set of possible values. In one illustrative implementation, task model extraction may be achieved using trajectory aggregation, intent extraction, and slot extraction.
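The data model described above (UI events carrying a target element and a UI tree; a task model made of intents with slots and dependencies) can be sketched with Python dataclasses. This is a minimal illustration under assumed field names; the class names, fields, and the example intents `view_restaurant` and `reserve_restaurant` are hypothetical and are not the disclosure's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class UIElement:
    element_id: str
    element_type: str                      # e.g. "Button" or "EditText"
    text: str = ""
    children: List["UIElement"] = field(default_factory=list)


@dataclass
class UIEvent:
    action: str                            # e.g. "click" or "type_text"
    target: UIElement                      # element the action was performed on
    ui_tree: UIElement                     # root of the screen's UI tree at that moment


@dataclass
class Slot:
    name: str
    possible_values: List[str] = field(default_factory=list)


@dataclass
class Intent:
    name: str
    slots: List[Slot] = field(default_factory=list)
    depends_on: List[str] = field(default_factory=list)  # prerequisite intents


@dataclass
class TaskModel:
    intents: Dict[str, Intent] = field(default_factory=dict)

    def add(self, intent: Intent) -> None:
        self.intents[intent.name] = intent


# One recorded UI event: selecting a time in a picker nested under the screen root.
picker = UIElement("time_picker", "Spinner", "7:00 PM")
screen = UIElement("root", "LinearLayout", children=[picker])
event = UIEvent("select", picker, screen)

# A tiny task model: reserving depends on first viewing a restaurant.
model = TaskModel()
model.add(Intent("view_restaurant", slots=[Slot("restaurant_name")]))
model.add(Intent("reserve_restaurant",
                 slots=[Slot("reservation_time", ["7:00 PM", "7:30 PM"])],
                 depends_on=["view_restaurant"]))
```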

FIG. 5 illustrates all UI events reported during an exemplary user interaction with an exemplary search criteria activity (associated with the search criteria intent) of the OPENTABLE application described briefly above with respect to FIG. 4. Operationally, interactions 510, 525, and 530 may be used in the extraction process described with respect to FIG. 3. In one illustrative operation, interactions 505, 515, 550, and 555 can be excluded from processing because they are associated with immutable content (e.g., static tabs or buttons of an exemplary mobile application that, when interacted with, cause a transition to a new page or dialog box) that does not provide information specific to the conversational bot being generated. Further, interactions 535, 540, and 545 may be excluded from extraction because they are associated with non-visible UI elements, which may occur when an application overlays multiple layouts on one another. Likewise, interaction 520 may be excluded from processing because it is associated with empty content. Operationally, slots can be extracted from the remaining UI elements, and appropriate intents can be assigned to those slots.
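The three exclusion rules above (immutable content, non-visible elements, empty content) amount to a filter over UI events. The following Python sketch assumes hypothetical boolean `visible` and `static` flags on each event and uses made-up event IDs and values rather than the reference numerals of FIG. 5; how such flags would actually be derived from the UI tree is not shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UIEvent:
    event_id: int
    element_type: str   # e.g. "Button", "EditText", "Spinner"
    text: str           # element content at interaction time
    visible: bool       # False for elements hidden behind overlapping layouts
    static: bool        # True for immutable content such as fixed tab labels


def slot_candidates(events: List[UIEvent]) -> List[UIEvent]:
    """Drop events that cannot carry slot values, mirroring the three exclusion rules."""
    kept = []
    for event in events:
        if event.static:            # immutable content: navigation only
            continue
        if not event.visible:       # non-visible UI element
            continue
        if not event.text.strip():  # empty content
            continue
        kept.append(event)
    return kept


events = [
    UIEvent(1, "Button", "Search", visible=True, static=True),
    UIEvent(2, "EditText", "Seattle", visible=True, static=False),
    UIEvent(3, "EditText", "", visible=True, static=False),
    UIEvent(4, "Spinner", "2 people", visible=False, static=False),
    UIEvent(5, "Spinner", "7:00 PM", visible=True, static=False),
]
print([e.event_id for e in slot_candidates(events)])  # [2, 5]
```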

FIG. 6 illustrates an exemplary restaurant reservation modeling tool 600 and an exemplary model extracted from the OPENTABLE mobile application described briefly above with respect to FIGS. 4 and 5. As shown, the nodes in graph 610 may represent intents supported by the system, each with parameters specific to that intent. Illustratively, the input parameters may be the data elements required to execute the intent. The flow starts with an intent named StartOfConversation and ends with an intent named EndOfConversation.
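Because the extracted model is a graph of intents with a start node and an end node, each complete conversation corresponds to a path between them. The Python sketch below enumerates acyclic paths with a depth-first search. The adjacency list is a hypothetical, simplified version of the reservation flow, and the intent names (including StartOfConversation and EndOfConversation as node labels) are assumptions for illustration.

```python
from typing import Dict, List


def conversation_paths(graph: Dict[str, List[str]],
                       start: str = "StartOfConversation",
                       end: str = "EndOfConversation") -> List[List[str]]:
    """Enumerate all acyclic intent paths from start to end (depth-first)."""
    paths: List[List[str]] = []

    def walk(node: str, path: List[str]) -> None:
        if node == end:
            paths.append(path)
            return
        for nxt in graph.get(node, []):
            if nxt not in path:  # skip cycles
                walk(nxt, path + [nxt])

    walk(start, [start])
    return paths


# Hypothetical, simplified intent graph for the reservation flow.
flow = {
    "StartOfConversation": ["SearchResults", "SearchCriteria"],
    "SearchCriteria": ["SearchResults_2"],
    "SearchResults": ["RestaurantProfile"],
    "SearchResults_2": ["RestaurantProfile"],
    "RestaurantProfile": ["ConfirmReservation"],
    "ConfirmReservation": ["ConfirmReservation_Complete"],
    "ConfirmReservation_Complete": ["EndOfConversation"],
}
paths = conversation_paths(flow)
print(len(paths))  # 2
```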

In one illustrative operation, restaurants may be searched either by viewing restaurant suggestions such as "nearby restaurants" or "outdoor seating" (via the search results intent), or by conducting a customized search (via the search criteria and search results_2 intents). For each restaurant identified in the results, the user can retrieve the restaurant's profile, reservation options, menu, and ratings (via the corresponding intents), and then proceed to make a reservation (confirm reservation) and submit it (confirm reservation_complete). Slots may be represented as shown in the center of FIG. 6. The customized search (search results_2) may require several slots: search query, city, party size, and time. Each slot may maintain a number of fields, such as a name, an identifier, a value, a prompt question, and utterances for possible answers; additional fields may be generated by additional operations.
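A slot record holding the fields listed above, together with a rule for choosing which prompt to ask next, might look like the following Python sketch. The field names, the example prompts, and the "ask for the first unfilled slot" policy are assumptions made for illustration; the disclosure does not specify the bot's slot-filling order.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Slot:
    name: str
    identifier: str
    prompt: str                               # prompt question for this slot
    value: Optional[str] = None               # filled in during the conversation
    answer_utterances: List[str] = field(default_factory=list)


def next_prompt(slots: List[Slot]) -> Optional[str]:
    """Return the prompt of the first unfilled slot, or None once all are filled."""
    for slot in slots:
        if slot.value is None:
            return slot.prompt
    return None


# Hypothetical slots for the customized search (search results_2).
custom_search = [
    Slot("search query", "query", "What would you like to search for?"),
    Slot("city", "city", "Which city?"),
    Slot("party size", "party_size", "How many people?", answer_utterances=["2", "for four"]),
    Slot("time", "time", "What time?"),
]
custom_search[0].value = "sushi"
custom_search[1].value = "Seattle"
print(next_prompt(custom_search))  # How many people?
```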

In this illustrative implementation, the preceding operations may yield the logic flow of the robot program. Operationally, to navigate this logic flow using natural language, the bot needs to be able to ask questions and understand answers. Accordingly, question generation may be performed (i.e., identifying an appropriate prompt question for each slot), and answer generation may be performed (i.e., generating a large set of possible answers to each prompt so that an answer understanding model can be trained).
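One simple way to picture answer generation is to expand a few answer templates with each candidate slot value, producing labeled `(utterance, value)` pairs that an answer understanding model could be trained on. The templates and values in this Python sketch are hypothetical, and template expansion is a stand-in for whatever generator an actual implementation would use.

```python
from itertools import product
from typing import List, Tuple


def generate_answers(templates: List[str], values: List[str]) -> List[Tuple[str, str]]:
    """Expand every answer template with every slot value.

    Returns (utterance, slot_value) pairs usable as labeled training examples
    for an answer understanding model.
    """
    return [(template.format(value=value), value)
            for template, value in product(templates, values)]


# Hypothetical templates and values for a "party size" prompt.
templates = ["{value}", "for {value}", "a table for {value} please"]
examples = generate_answers(templates, ["2", "four"])
print(len(examples))  # 6
print(examples[2])    # ('for 2', '2')
```

Keeping the slot value alongside each utterance means every generated sample already carries its label, so no separate annotation step is needed for these synthetic answers.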

FIG. 7 is a flow diagram of an illustrative process 700 for the automatic creation of a task-oriented conversational bot by an exemplary computing environment, which results in enhanced processing efficiency of the exemplary computing environment. As shown, the process begins at block 705, where data representing one or more interactions with a computing environment may be received. Processing then proceeds to block 710, where the received data is partitioned according to one or more states of the computing environment.

At block 715, one or more dependencies between the states may be identified; as described above, these dependencies may be used to identify one or more service APIs used by the exemplary computing application. Processing then proceeds to block 720, where one or more questions and one or more answers may be generated from the identified dependencies. The dependency data, together with the generated question/answer data and the state data, may be used to generate input data for the service APIs.

Processing then proceeds to block 730, where a check is performed to determine whether additional data requires processing. If there is no additional interaction data, processing terminates at block 735. If additional interaction data requires processing, processing loops back to block 710 and continues from there.
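The control flow of process 700 can be sketched as a loop over batches of interaction data. The Python sketch below is a heavily simplified illustration under stated assumptions: each interaction is a hypothetical `(state_id, input_label)` pair, a dependency is assumed to be any observed change from one state to the next, block 720 generates questions only (answers and service API input generation are omitted), and the inputs collected in the earlier state are treated as the parameters a question should ask for. None of these choices is taken from the disclosure.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical interaction record: (state_id, input_label) observed in that state.
Interaction = Tuple[str, str]


def partition_by_state(batch: List[Interaction]) -> Dict[str, List[str]]:
    """Block 710: group the inputs of a batch by the state they occurred in."""
    partitions: Dict[str, List[str]] = {}
    for state, label in batch:
        partitions.setdefault(state, []).append(label)
    return partitions


def state_dependencies(batch: List[Interaction]) -> Set[Tuple[str, str]]:
    """Block 715: treat each observed state change (prev -> next) as a dependency."""
    states = [state for state, _ in batch]
    return {(a, b) for a, b in zip(states, states[1:]) if a != b}


def process_700(batches: List[List[Interaction]]) -> List[dict]:
    """Blocks 705 to 735, simplified: process batches until none remain."""
    results = []
    pending = list(batches)                    # block 705: receive interaction data
    while pending:                             # block 730: more data to process?
        batch = pending.pop(0)                 # next unit of data for block 710
        partitions = partition_by_state(batch)
        deps = state_dependencies(batch)
        # Block 720 (questions only): inputs gathered in the earlier state are
        # treated as parameters required to reach the later state.
        questions = {dep: [f"Please provide the {label}." for label in partitions[dep[0]]]
                     for dep in sorted(deps)}
        results.append({"states": partitions, "dependencies": deps, "questions": questions})
    return results                             # block 735: terminate


batch = [("SearchCriteria", "city"), ("SearchCriteria", "party size"),
         ("SearchResults", "restaurant"), ("Reservation", "time")]
result = process_700([batch])[0]
print(result["questions"][("SearchCriteria", "SearchResults")])
# ['Please provide the city.', 'Please provide the party size.']
```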

FIG. 8 is a computer architecture diagram showing an illustrative computer hardware and software architecture for a computing device capable of executing the methods and systems described herein. In particular, the architecture illustrated in FIG. 8 may be utilized to implement a server computer, mobile phone, e-reader, smart phone, desktop computer, augmented or virtual reality (AR/VR) device, tablet computer, laptop computer, or other type of computing device.

The computer 800 illustrated in FIG. 8 includes a central processing unit 802 ("CPU"), a system memory 804, including a random access memory 806 ("RAM") and a read-only memory 808 ("ROM"), and a system bus 810 that couples the memory 804 to the CPU 802. A basic input/output system ("BIOS" or "firmware") containing the basic routines that help to transfer information between elements within the computer 800, such as during start-up, may be stored in the ROM 808. The computer 800 also includes a mass storage device 812 for storing an operating system 822, application programs, and other types of programs. The mass storage device 812 may also be configured to store other types of programs and data.

The mass storage device 812 is connected to the CPU 802 through a mass storage controller (not shown) connected to the bus 810. The mass storage device 812 and its associated computer-readable media provide non-volatile storage for the computer 800. Although the description of computer-readable media contained herein refers to a mass storage device, such as a hard disk drive, CD-ROM drive, or USB storage key, it should be appreciated by those skilled in the art that computer-readable media can be any available computer storage media or communication media that can be accessed by the computer 800.

Communication media includes computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and includes any transmission media. The term "modulated data signal" means a signal that has one or more of its characteristics changed or set in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared, and other wireless media. Combinations of any of the above should also be included within the scope of computer-readable media.

By way of example, and not limitation, computer storage media may include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer-readable instructions, data structures, program modules, and other data. For example, computer storage media includes, but is not limited to, RAM, ROM, EPROM, EEPROM, flash memory or other solid state memory technology, CD-ROM, digital versatile disks ("DVD"), HD-DVD, BLU-RAY, or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by the computer 800. For the purposes of the claims, the phrase "computer storage medium" and variations thereof do not include waves or signals per se or communication media.

According to various configurations, the computer 800 may operate in a networked environment using logical connections to remote computers through a network, such as the network 820. The computer 800 may connect to the network 820 through a network interface unit 816 connected to the bus 810. It should be appreciated that the network interface unit 816 may also be utilized to connect to other types of networks and remote computer systems. The computer 800 may also include an input/output controller 818 for receiving and processing input from a number of other devices 827, including a keyboard, mouse, touch input, and electronic stylus (not shown in FIG. 8), or physical sensors, such as a video camera. Similarly, the input/output controller 818 may provide output to a display screen or other type of output device 825.

It should be appreciated that the software components described herein, when loaded into the CPU 802 and executed, may transform the CPU 802 and the overall computer 800 from a general-purpose computing device into a special-purpose computing device customized to facilitate the functionality presented herein. The CPU 802 may be constructed from any number of transistors or other discrete circuit elements that may individually or collectively assume any number of states. More particularly, the CPU 802 may operate as a finite state machine in response to executable instructions contained within the software modules disclosed herein. These computer-executable instructions may transform the CPU 802 by specifying how the CPU 802 transitions between states, thereby transforming the transistors or other discrete hardware elements that make up the CPU 802.

Encoding the software modules presented herein may also transform the physical structure of the computer-readable media presented herein. The specific physical transformation may depend on various factors in different implementations of this description. Examples of such factors include, but are not limited to, the technology used to implement the computer-readable media, whether the computer-readable media are characterized as primary or secondary storage, and the like. For example, if the computer-readable media are implemented as semiconductor-based memory, the software disclosed herein may be encoded on the computer-readable media by transforming the physical state of the semiconductor memory. For instance, the software may transform the state of transistors, capacitors, or other discrete circuit elements that make up the semiconductor memory. The software may also transform the physical state of such components in order to store data thereon.

As another example, the computer-readable media disclosed herein may be implemented using magnetic or optical technology. In such implementations, the software presented herein may transform the physical state of magnetic or optical media when the software is encoded therein. These transformations may also include altering the physical features or characteristics of particular locations within a given optical medium to change the optical characteristics of those locations. Other transformations of physical media are possible without departing from the scope and spirit of the present description, with the foregoing examples provided only to facilitate this discussion.

In view of the above, it should be appreciated that many types of physical transformations take place in the computer 800 in order to store and execute the software components presented herein. It should also be understood that the architecture illustrated in fig. 8 for computer 800, or a similar architecture, may be utilized to implement other types of computing devices, including handheld computers, video game devices, embedded computer systems (such as smart phones, tablets, and AR/VR devices), and other types of computing devices known to those skilled in the art. It is also contemplated that computer 800 may not include all of the components shown in fig. 8, may include other components not explicitly shown in fig. 8, or may utilize an architecture that is completely different from that shown in fig. 8.

FIG. 9 is a network diagram illustrating a distributed network computing environment 900 in which aspects of the disclosed technology may be implemented, in accordance with various embodiments presented herein. As shown in FIG. 9, one or more server computers 900A may be interconnected via a communication network 820 (which may be a fixed-line or wireless LAN, WAN, intranet, extranet, peer-to-peer network, virtual private network, the Internet, a Bluetooth communication network, a proprietary low-voltage communication network, or another communication network, or a combination thereof) with a number of client computing devices, including, but not limited to, tablet computer 900B, game console 900C, smart watch 900D, phone 900E (such as a smart phone), personal computer 900F, and AR/VR device 900G.

In a network environment in which the communications network is the Internet, for example, the server 900A can be a dedicated server computer operable to process and communicate data to and from the client computing devices 900B-900G via any number of known protocols, such as the Hypertext Transfer Protocol ("HTTP"), the File Transfer Protocol ("FTP"), or the Simple Object Access Protocol ("SOAP"). Additionally, the networked computing environment 900 may utilize various data security protocols, such as Secure Sockets Layer ("SSL") or Pretty Good Privacy ("PGP"). Each of the client computing devices 900B-900G may be equipped with an operating system operable to support one or more computing applications or terminal sessions, such as web browsers (not shown in FIG. 9), other graphical user interfaces (not shown in FIG. 9), or mobile desktop environments (not shown in FIG. 9), to gain access to the server computer 900A.

The server computer 900A may be communicatively coupled to other computing environments (not shown in FIG. 9) and may receive data regarding a participating user's interactions/resource network. In one illustrative operation, a user (not shown in FIG. 9) may interact with computing applications running on the client computing devices 900B-900G to obtain desired data and/or execute other computing applications.

Data and/or computing applications may be stored on one or more servers 900A and communicated to participating users through client computing devices over the exemplary communications network 820. Participating users (not shown in FIG. 9) may request access to particular data and applications that are hosted in whole or in part on the server computer 900A. Such data may be communicated between the client computing devices 900B-900G and the server computer 900A for processing and storage.

Server computer 900A may host computing applications, processes, and applets for the generation, authentication, encryption, and communication of data and applications, and may implement application/data transactions with other server computing environments (not shown in fig. 9), third party service providers (not shown in fig. 9), network attached storage ("NAS"), and storage area networks ("SANs").

It should be understood that the computing architecture shown in fig. 8 and the distributed network computing environment shown in fig. 9 have been simplified for ease of discussion. It should also be understood that the computing architecture and distributed computing network may include and utilize many more computing components, devices, software programs, networked devices, and other components not specifically described herein.

Example clauses

The disclosure presented herein also encompasses the subject matter set forth in the following clauses.

Example clause A, a system comprising at least one processor and at least one memory having computer-readable instructions stored thereon that, when executed by the at least one processor, cause the at least one processor to: receive first input data, the first input data including data representing one or more user interactions with one or more programmatic elements, data representing the one or more programmatic elements, and data representing a state of an application in which the one or more programmatic elements are being executed; process the first input data to generate intermediate processed input data by: segmenting the first input data according to one or more selected states of the application; identifying one or more dependencies between the one or more selected application states; and extracting one or more service APIs from the application using the identified one or more dependencies; and generate a natural user interface using the intermediate processed input data by: generating one or more questions for the identified one or more dependencies; generating a sample of user answers to the generated one or more questions; and generating a sample of user utterances for triggering execution of the extracted one or more service APIs by the application.

Example clause B, the system of example clause A, wherein the one or more user interactions comprise one or more trajectories of behavior of the application.

Example clause C, the system of example clause A or B, wherein the data representing the one or more programmatic elements comprises data representing a user interface of an application executable on a computing environment.

Example clause D, the system of any of example clauses A through C, wherein the computer-readable instructions further cause the at least one processor to segment the first input data according to one or more selected application states, the one or more selected application states comprising one or more states of a user interface of an application executable on the computing environment and one or more internal processing states of the application executable on the computing environment.

Example clause E, the system of any of example clauses A through D, wherein the data representing the state of the application in which the programmatic elements are being executed comprises: a dynamic application state, a static application state, and a state of an underlying operating system of the computing environment on which the application is executed.

Example clause F, the system of any of example clauses A through E, wherein the computer-readable instructions further cause the at least one processor to order the one or more selected application states as one or more elements of a state diagram, the state diagram including information indicative of one or more dependencies between the one or more selected application states.

Example clause G, the system of any of example clauses A through F, wherein the computer-readable instructions further cause the at least one processor to extract the one or more service APIs as one or more paths on the state diagram, the one or more paths comprising one or more selected states of the application.

Example clause H, a computer-implemented method, comprising: receiving, by a computing environment, first input data comprising data representing one or more user interactions with one or more programmatic elements, data representing the one or more programmatic elements, and data representing a state of an application in which the one or more programmatic elements are being executed; processing the first input data to generate intermediate processed input data by: segmenting the first input data according to one or more selected application states; ordering the one or more selected application states as one or more elements of a state diagram; and extracting one or more service Application Programming Interfaces (APIs) as one or more paths on the state diagram, the one or more paths including data representing one or more parameters for the extracted one or more service APIs; and generating a natural user interface using the intermediate processed input data by: generating one or more questions for the identified parameters; generating a sample of one or more user answers to the generated one or more questions; and generating one or more user utterances for triggering execution of the extracted one or more service APIs by the application.

Example clause I, the computer-implemented method of example clause H, further comprising performing a trace of an application operating on the computing environment, the trace generating data representing data processed by one or more functions of the application.

Example clause J, the computer-implemented method of example clause H or I, wherein segmenting the first input data according to the one or more selected application states comprises segmenting the first input data according to one or more states of a user interface of the application and one or more internal processing states of the application.

Example clause K, the computer-implemented method of any of example clauses H through J, wherein the state diagram further includes data representing one or more dependencies between the one or more selected application states.

Example clause L, the computer-implemented method of any of example clauses H through K, further comprising generating skills from the generated input data and providing the skills to the one or more service APIs for execution.

Example clause M, the computer-implemented method of any of example clauses H through K, further comprising generating skills and providing the skills for execution by one or more user channels that include a collaborative computing application.

Example clause N, the computer-implemented method of any of example clauses H through M, further comprising identifying one or more service APIs from the received input data.

Example clause O, the computer-implemented method of any of example clauses H through N, further comprising training the computing environment to generate one or more questions for the identified parameters and to generate a sample of one or more user answers to the generated one or more questions.

Example clause P, a computer-readable storage medium having stored thereon computer-executable instructions that, when executed by one or more processors of a computing device, cause the one or more processors of the computing device to: receive first input data, the first input data including data representing one or more user interactions with a programmatic element, data representing the programmatic element, and data representing a state of an application in which the programmatic element is being executed; process the first input data to generate intermediate processed input data by: segmenting the first input data according to one or more selected application states; identifying one or more dependencies between the one or more selected application states; and extracting one or more service APIs from the application using the identified one or more dependencies; and generate a natural user interface using the intermediate processed input data by: generating one or more questions for the identified one or more dependencies; generating a sample of user answers to the generated one or more questions; and generating a sample of one or more user utterances for triggering execution of the extracted one or more service APIs by the application.

Example clause Q, the computer-readable storage medium of example clause P, wherein the instructions further cause the one or more processors of the computing device to perform a trace of the application, the trace generating data representing data processed by one or more functions of the application.

Example clause R, the computer-readable storage medium of example clause P or Q, wherein the instructions further cause the one or more processors of the computing device to order the one or more selected application states as one or more elements of a state diagram that includes data representing one or more dependencies between the one or more selected application states.

Example clause S, the computer-readable storage medium of any of example clauses P through R, wherein the instructions further cause the one or more processors of the computing device to extract the one or more service APIs as one or more paths on a state diagram, the one or more paths including data representing one or more states of the application.

Example clause T, the computer-readable storage medium of any of example clauses P through S, wherein the instructions further cause the one or more processors of the computing device to define the one or more service APIs from the received input data.

Conclusion

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.
