Global to local memory pointer network for task oriented dialog


Abstract

A system and corresponding method are provided for generating a response for a dialog between a user and a computer. The system includes a memory that stores information of the conversation history and a knowledge base. An encoder may receive a new utterance from the user and generate a global memory pointer for filtering the knowledge base information in the memory. A decoder may generate at least one local memory pointer and a draft response for the new utterance. The draft response includes at least one draft label to be replaced by knowledge base information from the memory. Using the local memory pointer, the system selects words from the filtered knowledge base information to replace the at least one draft label in the draft response, thereby generating the dialog computer response.

1. A method for generating a response for a dialog between a user and a computer, the method comprising:

storing in a memory a conversation history and a knowledge base, wherein the conversation history comprises information of a sequence of user utterances and computer responses exchanged during the conversation, and wherein the knowledge base comprises information that can be used in generating computer responses;

receiving, at the computer, a new utterance from the user;

generating a global memory pointer based on the new utterance;

generating a draft response for the new utterance, the draft response including at least one draft label to be replaced by knowledge base information from the memory;

filtering the knowledge base information in the memory using the global memory pointer;

generating at least one local memory pointer; and

selecting a word from the filtered knowledge base information using the at least one local memory pointer to replace the at least one draft label in the draft response, thereby generating the dialog computer response.

2. The method of claim 1, wherein the conversation history includes a set of embedding matrices for the conversation history information.

3. The method of claim 1 or 2, wherein the knowledge base comprises a set of embedding matrices for the knowledge base information.

4. The method of any of claims 1-3, wherein generating the global memory pointer comprises:

encoding the new utterance to generate one or more hidden states; and

querying the knowledge base information in the memory using the one or more hidden states.

5. The method of any of claims 1-4, wherein the global memory pointer comprises a vector having a plurality of elements, each element associated with an independent probability.

6. The method of any of claims 1-5, wherein the at least one local memory pointer comprises a sequence of pointers, each pointer for selecting a respective word from the filtered knowledge base information to replace a respective draft label in the draft response.

7. A non-transitory machine-readable medium comprising executable code that, when executed by one or more processors associated with a computer, is adapted to cause the one or more processors to perform a method comprising:

storing in a memory a conversation history and a knowledge base, wherein the conversation history comprises information of a sequence of user utterances and computer responses exchanged during the conversation, and wherein the knowledge base comprises information that can be used in generating computer responses;

receiving, at the computer, a new utterance from a user;

generating a global memory pointer based on the new utterance;

generating a draft response for the new utterance, the draft response including at least one draft label to be replaced by knowledge base information from the memory;

filtering the knowledge base information in the memory using the global memory pointer;

generating at least one local memory pointer; and

selecting a word from the filtered knowledge base information using the at least one local memory pointer to replace the at least one draft label in the draft response, thereby generating the dialog computer response.

8. The non-transitory machine-readable medium of claim 7, wherein the conversation history includes a set of embedding matrices for the conversation history information.

9. The non-transitory machine-readable medium of claim 7 or 8, wherein the knowledge base comprises a set of embedding matrices for the knowledge base information.

10. The non-transitory machine-readable medium of any of claims 7-9, wherein generating the global memory pointer comprises:

encoding the new utterance to generate one or more hidden states; and

querying the knowledge base information in the memory using the one or more hidden states.

11. The non-transitory machine-readable medium of any of claims 7-10, wherein the global memory pointer comprises a vector having a plurality of elements, each element associated with an independent probability.

12. The non-transitory machine-readable medium of any of claims 7-11, wherein the at least one local memory pointer comprises a sequence of pointers, each pointer to select a respective word from the filtered knowledge base information to replace a respective draft label in the draft response.

13. A system for generating a response for a dialog between a user and a computer, the system comprising:

a memory storing a conversation history and a knowledge base, wherein the conversation history comprises information of a sequence of user utterances and computer responses exchanged during the conversation, and wherein the knowledge base comprises information that can be used in generating computer responses;

an encoder capable of receiving a new utterance from the user and generating a global memory pointer based on the new utterance, wherein the global memory pointer is used to filter the knowledge base information in the memory; and

a decoder capable of generating at least one local memory pointer and a draft response for the new utterance, the draft response including at least one draft label to be replaced by knowledge base information from the memory;

wherein the system uses the at least one local memory pointer to select a word from the filtered knowledge base information to replace the at least one draft label in the draft response to generate the dialog computer response.

14. The system of claim 13, wherein the conversation history includes a set of embedding matrices for the conversation history information.

15. The system of claim 13 or 14, wherein the knowledge base comprises a set of embedding matrices for the knowledge base information.

16. The system of any of claims 13-15, wherein the global memory pointer comprises a vector having a plurality of elements, each element associated with an independent probability.

17. The system of any of claims 13-16, wherein the at least one local memory pointer comprises a sequence of pointers, each pointer for selecting a respective word from the filtered knowledge base information to replace a respective draft label in the draft response.

18. The system of any of claims 13-17, wherein the memory comprises an end-to-end memory network.

19. The system of any one of claims 13-18, wherein the encoder comprises a context recurrent neural network.

20. The system of any of claims 13-19, wherein the decoder comprises a draft recurrent neural network.

Technical Field

The present application relates generally to dialog systems, and more particularly to using a global-to-local memory pointer network for task-oriented dialog.

Background

Task-oriented dialog systems have been developed to achieve specific user goals, such as reserving restaurants, finding places of interest, or assisting with navigation or driving directions. Typically, user queries to these dialog systems are limited to a relatively small set of dialog words or utterances that are input or provided in natural language. Traditional task-oriented dialog solutions are implemented using techniques for natural language understanding, dialog management, and natural language generation, where each module is customized individually, at some cost, for a particular purpose or task.

Drawings

FIG. 1 is a simplified diagram of a computing device according to some embodiments.

FIG. 2 is a simplified diagram of a global local memory pointer model or network according to some embodiments.

FIG. 3 is a simplified diagram of an external knowledge store according to some embodiments.

Fig. 4 is a simplified diagram of an encoder for encoding dialog histories and knowledge bases according to some embodiments.

Fig. 5 is a simplified diagram of a decoder according to some embodiments.

FIG. 6 illustrates an example table of a knowledge base and conversation history in accordance with some embodiments.

FIG. 7 is a simplified diagram of a method for a global local memory pointer model or network, according to some embodiments.

Fig. 8 and 9 show example tables comparing a global local memory pointer model or network to baselines.

FIG. 10 illustrates an example table showing the contributions of components of a global local memory pointer model or network to performance.

Fig. 11 is an example of memory attention visualization in the SMD navigation domain.

Fig. 12 shows an example table of the hyper-parameters selected for different numbers of hops.

FIG. 13 illustrates an example graph showing the human-likeness and appropriateness scores of a global local memory pointer model or network compared to other baselines.

In the drawings, elements having the same reference number have the same or similar function.

Detailed Description

The description and drawings illustrating aspects, examples, embodiments or applications should not be taken as limiting. The claims define the invention to be protected. Various mechanical, compositional, structural, electrical, and operational changes may be made without departing from the spirit and scope of the description and claims. In some instances, well-known circuits, structures or techniques have not been shown or described in detail as they would be known to one skilled in the art. The same numbers in two or more drawings identify the same or similar elements.

In the description, specific details are set forth describing some embodiments according to the application. Numerous specific details are set forth in order to provide a thorough understanding of the embodiments. It will be apparent, however, to one skilled in the art, that some embodiments may be practiced without some or all of these specific details. The specific embodiments disclosed herein are intended to be illustrative rather than restrictive. Those skilled in the art will recognize that, although not specifically described herein, other elements are also within the scope and spirit of the present application. Furthermore, to avoid unnecessary repetition, one or more features illustrated and described in connection with one embodiment may be incorporated into other embodiments unless specifically described otherwise or such one or more features would render the embodiments inoperative.

To reduce the manpower required to develop a dialog system and to allow scaling across domains or applications, end-to-end dialog systems, which input plain text and directly output system responses, have been developed. However, a common disadvantage of these end-to-end dialog systems is that they do not efficiently incorporate an external knowledge base (KB) into system response generation. One reason for this is that a large, dynamic knowledge base can be a bulky and noisy input that will destabilize the generation or output of a response. Unlike in a chit-chat scenario, this problem can be particularly challenging or detrimental in a task-oriented dialog system, because the information in the knowledge base is usually expected to supply the correct or appropriate entity in the response. For example, for a dialog system implementing a car driving assistant, the knowledge base may include information as shown in the example table 610 of FIG. 6. In an example dialog of a user interacting with the system, the user/driver may make a query for gasoline (e.g., "I need gas"), as shown in table 620. The system, accessing the knowledge base of table 610, may identify "Valero" as a "gas station". But in response to a subsequent query from the driver, "What is the address?", the system may identify a number of possibilities: "580 Van Ness Ave", "394 Van Ness Ave", "842 Arrowhead Way", "200 Alester Ave", and so on. The driver would expect the system to provide the address of the gas station (e.g., Valero), rather than the address of a friend's house (e.g., Tom's house), a coffee shop (e.g., Coupa), or some other random place.

To address this problem, according to some embodiments, the present application provides a global-to-local memory pointer (GLMP) network or model for generating responses in a task-oriented dialog system. The GLMP network or model includes a global memory encoder, a local memory decoder, and an external knowledge memory. GLMP shares the external knowledge between the encoder and decoder, and uses the encoder and the external knowledge to learn a global memory pointer. The global memory pointer is then propagated to the decoder and modifies the external knowledge, filtering out words that are not necessary for copying into the response. Thereafter, rather than generating the system response directly, the local memory decoder first uses a recurrent neural network (RNN) to obtain a draft response containing draft (sketch) labels. The labeled draft response operates as, or can be considered as learning, a latent dialog management to generate a template for the dialog action. The decoder then generates local memory pointers to copy words from the external knowledge memory to replace the draft labels.

Computing device

FIG. 1 is a simplified diagram of a computing device 100 according to some embodiments. As shown in fig. 1, computing device 100 includes a processor 110 coupled to a memory 120. The operation of computing device 100 is controlled by processor 110. Although the computing device 100 is shown with only one processor 110, it should be understood that the processor 110 may represent one or more central processing units, multi-core processors, microprocessors, microcontrollers, digital signal processors, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Graphics Processing Units (GPUs), Tensor Processing Units (TPUs), and/or the like in the computing device 100. Computing device 100 may be implemented as a stand-alone subsystem, a board added to a computing device, and/or a virtual machine.

Memory 120 may be used to store software executed by computing device 100 and/or one or more data structures used during operation of computing device 100. Memory 120 may include one or more types of machine-readable media. Some common forms of machine-readable media may comprise a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other memory chip or cartridge, and/or any other medium from which a processor or computer is adapted to read.

The processor 110 and/or memory 120 may be arranged in any suitable physical arrangement. In some embodiments, the processor 110 and/or the memory 120 may be implemented on the same board, in the same package (e.g., a system in package), on the same chip (e.g., a system on a chip), and/or the like. In some embodiments, processor 110 and/or memory 120 may include distributed, virtualized, and/or containerized computing resources. According to these embodiments, processor 110 and/or memory 120 may be located in one or more data centers and/or cloud computing facilities.

As shown, the memory 120 includes a global local memory pointer module 130. Global local memory pointer module 130 can be used to implement and/or generate global and local memory pointers for response generation in task-oriented dialogs, as in the systems, methods, and models described further herein. In some examples, the global local memory pointer module 130 may be used or incorporated in a dialog system through which one or more users may interact with a machine (e.g., a computer). Each dialog may include exchanges of information, questions, queries, and responses between the user and the machine. This series of exchanges constitutes the history of the conversation. For a given conversation, the global local memory pointer module 130 receives a user utterance or speech 150 and generates an appropriate response 160 for it. To accomplish this, as described in more detail below, the global local memory pointer module 130 generates global and local pointers for information or data in a knowledge base from which a response may be generated or created. The global local memory pointer module 130 may also receive one or more knowledge bases 155.

In some examples, global local memory pointer module 130 may include a single-layer or multi-layer neural network with appropriate preprocessing, encoding, decoding, and output layers. Neural networks have demonstrated great promise as a technique for automatically analyzing real-world information with human-like accuracy. In general, a neural network model receives input information and makes predictions based on the input information. For example, a neural network classifier may predict the class of the input information among a predetermined set of classes. Whereas other approaches to analyzing real-world information may involve hard-coded processes, statistical analysis, and/or the like, neural networks learn to make predictions gradually, by a process of trial and error, using a machine learning process. A given neural network model may be trained using a large number of training examples, proceeding iteratively until the neural network model begins to consistently make inferences similar to those that a human might make from the training examples. In some examples, global local memory pointer module 130 may include, among other things, a memory network for storing a knowledge base and the history of the current conversation. Although depicted as a software module, global local memory pointer module 130 may be implemented using hardware, software, and/or a combination of hardware and software.

Although FIG. 1 is a high-level diagram, FIGS. 2-5 illustrate more details of a global local memory pointer model or network, according to some embodiments. Fig. 7 illustrates a corresponding method 700 for a global local memory pointer model or network.

Global local memory pointer model

FIG. 2 is a simplified diagram of a global local memory pointer model or network 200 for a task-oriented dialog system according to some embodiments. In some embodiments, the global local memory pointer model or network 200 may implement the global local memory pointer module 130 of FIG. 1.

In some embodiments, as shown, the model may include a global memory encoder 210, a local memory decoder 220, and a shared external knowledge memory 230. In some embodiments, one or both of the encoder 210 and the decoder 220 include one or more recurrent neural networks (RNNs).

The global local memory pointer model 200 receives as input one or more knowledge bases (KB) and information for a current dialog, such as the exchanges between a user and the system. The knowledge base includes information or data that may be relevant to generating a response to a query or utterance of the user associated with the dialog. The information may include, for example, names of people, places, or points of interest (POIs), the type of each POI, the address or contact information of each POI, and so on. An example of knowledge base information is shown in table 610 of FIG. 6. The dialog information may include a history of the utterances and responses exchanged between the user and the system for the current dialog. An example of this dialog history information is shown in table 620 of FIG. 6. The input words or utterances from the dialog history and the knowledge base may be treated, respectively, as sequences of elements X = (x_1, ..., x_n) and B = (b_1, ..., b_l). In some embodiments, as shown in FIG. 2, external knowledge memory 230 receives and stores the one or more knowledge bases (KB) in knowledge base memory 232 and the information for the current dialog (e.g., the exchanges between the user and the system) in dialog memory 234. The output of the model 200 is Y = (y_1, ..., y_m), which is the desired system response to the current user utterance in the dialog.

Global memory encoder 210 may receive one or more utterances spoken by a user during a dialog with a computing device (process 720 of FIG. 7). According to some embodiments, global memory encoder 210 uses a context RNN to encode the dialog history and writes its hidden states into the external knowledge memory 230. The last hidden state is then used to read the external knowledge and generate a global memory pointer G. During the decoding phase, the local memory decoder 220 first generates a draft response using a draft RNN. The draft response does not copy information from the external knowledge base; rather, it operates or serves as a template for the system response, with draft labels for the items to be copied or obtained from the external memory in further processing. An example of a draft response may be: "@poi is @distance away", where @poi and @distance are draft labels to be filled in with a point of interest (POI) and a distance, respectively. The global memory pointer G and the draft RNN hidden states are passed to the external knowledge memory 230 as a filter and queries. Based on these inputs, local memory pointers L are returned from the external knowledge memory 230. The local memory pointers L are used to copy plain words (e.g., "Valero", "4 miles") from the external knowledge to replace the draft labels (e.g., @poi, @distance) and obtain the final system response, e.g., "Valero is 4 miles away". The computing device may output the response to the user in response to the user's utterance.

External knowledge storage

Fig. 3 is a simplified diagram of an external knowledge store 300 according to some embodiments. The external knowledge store 300 stores the conversation history and the knowledge base (process 710 of fig. 7). In some embodiments, the external knowledge store 300 may implement the external knowledge store 230 of the neural model 200 of fig. 2. The external knowledge memory 300 includes a KB memory module 332 and a dialog memory module 334, which in some embodiments may implement the memories 232 and 234, respectively, of the neural model 200 of fig. 2.

In some embodiments, the external knowledge in memory 300 includes a global context representation that is shared with the encoder (e.g., 210) and the decoder (e.g., 220) of the global local memory pointer model (e.g., 200). To incorporate external knowledge into the learning framework, in some embodiments, the external knowledge memory 300 may be implemented using an end-to-end memory network (MN) that stores the structured KB and the word-level information of the temporally dependent dialog history. As shown, this may include the KB memory and the dialog memory. In addition, end-to-end memory networks (MNs) provide, support, or enable multi-hop reasoning ability, which may strengthen the copy mechanism.

Global context representation

In some embodiments, in KB memory module 332, each element b_i ∈ B is represented in triple format as a (subject, relation, object) structure, which is a common format used to represent KB nodes. For example, the knowledge base B in table 610 of FIG. 6 would be denoted as {(Tom's house, distance, 3 miles), ..., (Starbucks, address, 792 Bedoin St)}. On the other hand, the dialog context X is stored in dialog memory module 334, where the speaker and temporal encoding are included in a triple-like format, as described in further detail in Bordes et al., "Learning end-to-end goal-oriented dialog", International Conference on Learning Representations, abs/1605.07683, 2017, which is incorporated herein by reference. For example, the first utterance from the driver in table 620 of FIG. 6 would be denoted as {($user, turn1, I), ($user, turn1, need), ($user, turn1, gas)}. For both memory modules, a bag-of-words representation is used as the memory embedding. During inference, once a memory position is pointed to, the object word is copied; for example, 3 miles will be copied if the triple (Tom's house, distance, 3 miles) is selected. The Object(.) function is denoted as getting the object word from a triple.
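As an illustration of the memory layout just described, the following Python (PyTorch) sketch flattens KB triples and speaker/turn-encoded dialog words into a single memory M = [B; X], embeds each triple with a bag-of-words sum over one hop's embedding matrix, and defines an Object(.) helper. The vocabulary, sizes, and helper names here are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

# Illustrative vocabulary; indices are arbitrary for this sketch.
vocab = {"<pad>": 0, "toms_house": 1, "distance": 2, "3_miles": 3,
         "$user": 4, "turn1": 5, "i": 6, "need": 7, "gas": 8}

# KB triples (subject, relation, object) and dialog words with
# speaker/turn encoding, flattened into one memory M = [B; X].
kb = [("toms_house", "distance", "3_miles")]
dialog = [("$user", "turn1", "i"), ("$user", "turn1", "need"),
          ("$user", "turn1", "gas")]
memory = kb + dialog  # each m_i is a word triple

d_emb = 8
C = nn.Embedding(len(vocab), d_emb, padding_idx=0)  # one hop's matrix C^k

def bow_embed(triple):
    """Bag-of-words embedding B(.): sum the word embeddings of a triple."""
    ids = torch.tensor([vocab[w] for w in triple])
    return C(ids).sum(dim=0)  # -> (d_emb,)

M = torch.stack([bow_embed(m) for m in memory])  # (n+l, d_emb)

def object_word(i):
    """Object(.): the word copied when memory position i is pointed to."""
    return memory[i][2]

print(M.shape, object_word(0))  # torch.Size([4, 8]) 3_miles
```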

Knowledge reading and writing

In some embodiments, the external knowledge includes a set of trainable embedding matrices C = (C^1, ..., C^{K+1}), where C^k ∈ R^{|V| × d_emb}, K is the maximum number of memory hops in the end-to-end memory network (MN), |V| is the vocabulary size, and d_emb is the embedding dimension. The memory in the external knowledge is denoted as M = [B; X] = (m_1, ..., m_{n+l}), where m_i is one of the triple components. To read the memory, the external knowledge uses an initial query vector q^1. It can then loop over K hops, computing the attention weights at each hop k using

p_i^k = Softmax((q^k)^T c_i^k), where c_i^k = B(C^k(m_i)) ∈ R^{d_emb}   (1)

where c_i^k is the embedding at the i-th memory position using the embedding matrix C^k, q^k is the query vector for hop k, and B(.) is the bag-of-words function. Note that p^k ∈ R^{n+l} is a soft memory attention that decides the memory relevance with respect to the query vector. The model then reads out the memory o^k by a weighted sum over c^{k+1} and updates the query vector q^{k+1}. Formally,

o^k = Σ_i p_i^k c_i^{k+1},  q^{k+1} = q^k + o^k   (2)
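A minimal sketch of this K-hop memory read is shown below; the memory embeddings are random stand-ins for the bag-of-words embeddings c^k, and all sizes are toy values assumed for illustration.

```python
import torch

n_mem, d_emb, K = 10, 8, 3  # memory size, embedding dim, number of hops

# K+1 embedding matrices C^1..C^{K+1} would each embed the memory; here we
# precompute the per-hop memory embeddings c^k as (n_mem, d_emb) tensors.
c = [torch.randn(n_mem, d_emb) for _ in range(K + 1)]

def memory_read(q1):
    """Multi-hop read: attention per eq. (1), readout/update per eq. (2)."""
    q = q1
    for k in range(K):
        p = torch.softmax(c[k] @ q, dim=0)      # p^k_i = Softmax(q^k . c^k_i)
        o = (p.unsqueeze(1) * c[k + 1]).sum(0)  # o^k = sum_i p^k_i c^{k+1}_i
        q = q + o                               # q^{k+1} = q^k + o^k
    return q, p  # final query and last-hop attention

q_final, last_attention = memory_read(torch.randn(d_emb))
```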

global memory encoder

Fig. 4 is a simplified diagram of an encoder 400 for encoding dialog history X and knowledge base B according to some embodiments. Encoder 400 may receive one or more utterances uttered by a user during a conversation with a computing device (process 720 of fig. 7). In some embodiments, the encoder 400 may implement the encoder 210 of the neural model 200 of fig. 2.

In some embodiments, the encoder 400 may be implemented as a context recurrent neural network (RNN). The context RNN is used to model sequence dependencies and to encode the context or dialog history X. The hidden states H are then written into the external knowledge or memory (e.g., 230 or 300 shown in FIGS. 2 and 3). The last encoder hidden state is then used as a query to read the external knowledge and generate or obtain two outputs: the global memory pointer G and the memory readout.

Intuitively, writing the hidden states into the external knowledge can provide sequential and contextualized information, and can also mitigate the common out-of-vocabulary (OOV) challenge, since end-to-end memory network (MN) architectures have difficulty modeling the dependencies between memories, which can be a drawback especially in dialog-related tasks. In addition, using the encoded dialog context as a query can encourage the external knowledge memory (e.g., 230 or 300) to read out information related to the hidden dialog states or user intention. Further, the global memory pointer, which learns a global memory distribution, is passed to the decoder together with the encoded dialog history and the encoded knowledge base (KB) information.

Context RNN

In some embodiments, the context RNN of the encoder 400 may include or be implemented with a plurality of encoding elements 402, which encoding elements 402, separately or together, may include one or more bidirectional gated recurrent units (GRUs) (e.g., as described in Chung et al., 2014, which is incorporated herein by reference). The encoding elements 402 operate on the words or text of the context or dialog history X to generate hidden states H = (h_1^e, ..., h_n^e). The last hidden state h_n^e is used to query the external knowledge memory as the encoded dialog history. Further, the hidden states H are written into the dialog memory module 334 in the external knowledge 300 by adding the original memory representation to the corresponding hidden states. Formally,

c_i^k = c_i^k + h_{m_i}^e,  if m_i ∈ X and ∀k ∈ [1, K+1]   (3)
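The write operation of equation (3) might be sketched as follows: a bidirectional GRU produces the hidden states H, and each dialog-memory slot is incremented by its corresponding hidden state. The sizes and the split between KB and dialog positions are toy assumptions.

```python
import torch
import torch.nn as nn

vocab_size, d_emb, n, l = 100, 8, 6, 4  # toy sizes; n dialog tokens, l triples
embed = nn.Embedding(vocab_size, d_emb)
gru = nn.GRU(d_emb, d_emb // 2, bidirectional=True, batch_first=True)

X = torch.randint(0, vocab_size, (1, n))      # dialog-history token ids
H, _ = gru(embed(X))                          # H: (1, n, d_emb)
h_last = H[0, -1]                             # query vector for the memory

# Writing (eq. 3): add each hidden state to the corresponding dialog-memory
# slot for one hop k. The dialog occupies the last n of the n+l positions.
c_k = torch.randn(l + n, d_emb)               # one hop's memory embeddings
c_k = torch.cat([c_k[:l], c_k[l:] + H[0]], dim=0)  # c^k_i += h^e_{m_i}
```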

global memory pointer

Encoder 400 generates the global memory pointer G (process 730 of FIG. 7). In some embodiments, the global memory pointer G = (g_1, ..., g_{n+l}) includes a vector containing real values between 0 and 1. Unlike conventional attention mechanisms, in which all the weights sum to 1, each element in the global memory pointer G can be an independent probability. The model 200 first queries the external knowledge 300 using h_n^e up to the last hop and, instead of applying the Softmax function as in (1), performs an inner product followed by the Sigmoid function:

g_i = Sigmoid((q^K)^T c_i^K)   (4)

The memory distribution obtained is the global memory pointer G, which is passed to the decoder. To further strengthen the global pointing ability, an auxiliary loss is added to train the global memory pointer as a multi-label classification task. Adding this additional supervision can improve performance, as shown in the ablation study. Lastly, the memory readout q^{K+1} is used as the encoded KB information.

In the auxiliary task, the label G^label = (g_1^label, ..., g_{n+l}^label) is defined by checking whether the object words in the memory exist in the expected system response Y. The global memory pointer is then trained using the binary cross-entropy loss Loss_g between G and G^label. Formally,

g_i^label = 1 if Object(m_i) ∈ Y, and 0 otherwise   (5)

Loss_g = -Σ_{i=1}^{n+l} [g_i^label × log g_i + (1 - g_i^label) × log(1 - g_i)]   (6)
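A sketch of the global memory pointer and its auxiliary loss under equations (4)-(6); the memory embeddings, query vector, and label vector below are toy stand-ins.

```python
import torch
import torch.nn.functional as F

n_plus_l, d_emb = 10, 8
c_K = torch.randn(n_plus_l, d_emb)   # last-hop memory embeddings c^K_i
q_K = torch.randn(d_emb)             # query vector after the last hop

# Eq. (4): independent per-position probabilities (Sigmoid, not Softmax).
G = torch.sigmoid(c_K @ q_K)         # g_i in (0, 1), shape (n+l,)

# Eq. (5): g_i^label = 1 iff Object(m_i) appears in the gold response Y.
G_label = torch.tensor([1., 0., 0., 1., 0., 0., 0., 0., 0., 0.])

# Eq. (6): binary cross-entropy over the multi-label targets.
loss_g = F.binary_cross_entropy(G, G_label)
```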

in some embodiments, as explained in more detail below, the global memory pointer is used to filter information from the knowledge base module (232 or 332) of the memory for use in generating an appropriate dialog response to the user utterance.

Local memory decoder

Fig. 5 is a simplified diagram of a decoder 500 according to some embodiments. In some embodiments, decoder 500 may implement decoder 220 of neural model 200 of fig. 2. In some embodiments, the decoder 500 is implemented as a Recurrent Neural Network (RNN).

In some embodiments, the RNN of the decoder 500 generates a template or draft for the computer response to the user utterance. The draft response may include groups of elements. Some of these elements of the draft response will appear in the actual dialog response output from the computing device 100. Others of these elements, which may be referred to as draft labels, will be replaced in the actual dialog response by words from the knowledge base. An example of a draft response is "@poi is @distance away", where @poi and @distance are each draft labels. In the computer dialog response, these draft labels may be replaced with the words "Starbucks" and "1 mile" from the knowledge memory (e.g., 232 or 332), respectively, so that the response actually output is "Starbucks is 1 mile away".

Local memory decoder 500 uses the encoded dialog history h_n^e, the encoded KB information q^{K+1}, and the global memory pointer G. It first initializes its draft RNN using the encoded dialog history h_n^e and the encoded KB information q^{K+1}, and generates a draft response that excludes slot values but includes draft labels. At each decoding time step, the hidden state of the draft RNN is used for two purposes: (1) predicting the next token in the vocabulary, which is the same as in standard sequence-to-sequence (S2S) learning; and (2) serving as the vector to query the external knowledge. If a draft label is generated, the global memory pointer G is passed to the external knowledge 300, and the expected output word will be picked up from the local memory pointer L. Otherwise, the output word is the word generated by the draft RNN. For example, in FIG. 5, the POI label (@poi) is generated at the first time step, so the word "Starbucks" is picked up from the local memory pointer L as the system output word.

Draft RNN

Decoder 500 generates a draft response (process 740 in FIG. 7). In some embodiments, the draft RNN of decoder 500 may include or be implemented with multiple elements 502, which elements 502 may individually or together include one or more bidirectional gated recurrent units (GRUs). In some embodiments, the draft RNN is used to generate a draft response without real slot values. The draft RNN learns to generate a dynamic dialog action template based on the encoded dialog history (h_n^e) and the encoded KB information (q^{K+1}). At each decoding time step t, the draft RNN hidden state h_t^d and its output distribution P_t^vocab are defined as

h_t^d = GRU(C^1(ŷ_{t-1}), h_{t-1}^d),  P_t^vocab = Softmax(W h_t^d)   (7)

where W is a trainable projection over the vocabulary. The draft RNN is trained using the standard cross-entropy loss, with Loss_v defined as

Loss_v = Σ_{t=1}^{m} -log P_t^vocab(y_t^s)   (8)

Based on the provided entity table, the slot values in Y are replaced with draft labels to obtain the draft targets y^s. The draft labels ST are all the possible slot types, each starting with a special token; for example, @address stands for all addresses and @distance stands for all distance information.
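One way to realize the draft (sketch) RNN step and its cross-entropy training under equations (7)-(8) is sketched below; the projection layer, token ids, and zero initialization are illustrative assumptions (in the model, the hidden state is initialized from the encoded dialog history and KB readout).

```python
import torch
import torch.nn as nn

vocab_size, d_emb = 100, 8
embed = nn.Embedding(vocab_size, d_emb)      # plays the role of C^1
gru_cell = nn.GRUCell(d_emb, d_emb)
proj = nn.Linear(d_emb, vocab_size)          # vocabulary projection W (assumed)

def draft_step(prev_token, h_prev):
    """One sketch-decoding step: update h^d_t and score the next token."""
    h_t = gru_cell(embed(prev_token), h_prev)
    logp = torch.log_softmax(proj(h_t), dim=-1)
    return h_t, logp

# Teacher-forced training with standard cross-entropy (Loss_v): the targets
# are the sketch response, slot values replaced by tags like @distance.
h = torch.zeros(1, d_emb)                    # stand-in for [h^e_n; q^{K+1}]
targets = torch.tensor([[5], [9], [2]])      # toy sketch-token ids
prev = torch.tensor([1])                     # toy start-of-sentence id
loss_v = 0.0
for y_t in targets:
    h, logp = draft_step(prev, h)
    loss_v = loss_v - logp[0, y_t.item()]    # -log P_t^vocab(y^s_t)
    prev = y_t
```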

Local memory pointer

Decoder 500 generates one or more local memory pointers L (process 760 in FIG. 7). In some embodiments, the local memory pointer L = (L_1, ..., L_m) includes a sequence of pointers. The global memory pointer G filters the knowledge base information of the external knowledge memory 300 (process 750 of FIG. 7). At each time step t, the global memory pointer G first modifies the global context representation using its attention weights,

c_i^k = c_i^k × g_i,  ∀i ∈ [1, n+l] and ∀k ∈ [1, K+1]

and then the draft RNN hidden state h_t^d queries the external knowledge 300. The memory attention at the last hop is the corresponding local memory pointer L_t, which is represented as the memory distribution at time step t. To train the local memory pointers, supervision of the memory attention at the last hop is added to the external knowledge. The position label of the local memory pointer L^label at decoding time step t is defined as

L_t^label = max(z) if ∃z such that y_t = Object(m_z), and n + l + 1 otherwise   (9)

The position n + l + 1 is a null token in the memory that allows the model to calculate the loss function even if y_t does not exist in the external knowledge. Then, the loss between L and L^label is defined as

Loss_l = Σ_{t=1}^{m} -log L_t(L_t^label)   (10)

Further, a record R ∈ R^{n+l} is used to prevent copying the same entity multiple times. All the elements in R are initialized to 1 at the beginning. The global local memory pointer model or network generates the dialog computer response Y for the current user utterance (process 770 of FIG. 7). During the decoding phase, if a memory position has been pointed to, its corresponding position in R will be decayed by a learned scalar r. That is, the global context representation is soft-masked if the corresponding token has been copied. During inference time, ŷ_t is defined as

ŷ_t = argmax(P_t^vocab) if argmax(P_t^vocab) ∉ ST, and Object(m_{argmax(L_t ⊙ R)}) otherwise

where ⊙ is element-wise multiplication. Finally, all the parameters are jointly trained by minimizing the sum of the three losses:

Loss = Loss_g + Loss_v + Loss_l   (11)
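A toy sketch of a single copy step at inference time, combining the global-pointer filtering, the last-hop attention as the local pointer L_t, and the record R that soft-masks already-copied positions; the fixed decay value below is a stand-in for the learned scalar r.

```python
import torch

n_plus_l, d_emb = 10, 8
c_last = torch.randn(n_plus_l, d_emb)      # last-hop memory embeddings
G = torch.sigmoid(torch.randn(n_plus_l))   # global memory pointer
R = torch.ones(n_plus_l)                   # record vector, all ones initially

def copy_word(h_t, r_decay=0.1):
    """One copy step: G scales the memory, the last-hop attention gives the
    local pointer L_t, and R soft-masks positions that were already copied."""
    filtered = c_last * G.unsqueeze(1)         # c_i <- c_i * g_i
    L_t = torch.softmax(filtered @ h_t, dim=0) # local memory pointer
    idx = int((L_t * R).argmax())              # argmax(L_t ⊙ R)
    R[idx] = R[idx] * r_decay                  # decay: avoid copying twice
    return idx                                 # memory position to copy from

print(copy_word(torch.randn(d_emb)))
```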

data set

In some embodiments, two public multi-turn task-oriented dialog datasets may be used to evaluate the model: bAbI dialog (as described in detail in Bordes et al., "Learning end-to-end goal-oriented dialog", International Conference on Learning Representations, abs/1605.07683, 2017, which is incorporated herein by reference) and Stanford multi-domain dialog (SMD) (as described in Eric et al., "A copy-augmented sequence-to-sequence architecture gives good performance on task-oriented dialogue", In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pp. 468-473, Valencia, Spain, April 2017, which is incorporated herein by reference). The bAbI dialog includes five simulated tasks in the restaurant domain. Tasks 1 through 4 relate to calling API calls, modifying API calls, recommending options, and providing additional information, respectively. Task 5 is the union of tasks 1-4. There are two test sets for each task: one follows the same distribution as the training set, and the other has OOV entity values. SMD, on the other hand, is a human-human, multi-domain dialog dataset. It has three distinct domains: calendar scheduling, weather information retrieval, and point-of-interest navigation. The key difference between the two datasets is that the former has longer dialog turns but regular user and system behaviors, while the latter has fewer dialog turns but diverse responses, and its KB information is much more complicated.

Results

bAbI dialog. The table of FIG. 8 is an example of an evaluation on the bAbI dialog. Based on the per-response accuracy and task-completion rate (in parentheses) over the various tasks (e.g., T1, T2, T3, T4, T5) of the bAbI dialog, the table compares the performance of the global local memory pointer (GLMP) model or network to the following baselines: QRN (see Seo et al., "Query-reduction networks for question answering", International Conference on Learning Representations, 2017, which is incorporated herein by reference), MN (see Bordes et al., "Learning end-to-end goal-oriented dialog", International Conference on Learning Representations, abs/1605.07683, 2017, which is incorporated herein by reference), GMN (see Liu et al., "Gated end-to-end memory networks", In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pp. 1-10, 2017, which is incorporated herein by reference), Ptr-Unk (see Gulcehre et al., "Pointing the unknown words", In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1, Long Papers), pp. 140-149, Berlin, Germany, August 2016, Association for Computational Linguistics, http://www.aclweb.org/anthology/P16-1014, which is incorporated herein by reference), and Mem2Seq (see Madotto et al., "Mem2Seq: Effectively incorporating knowledge bases into end-to-end task-oriented dialog systems", In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1, Long Papers), pp. 1468-1478, 2018, which is incorporated herein by reference). Note that the utterance-retrieval methods (e.g., QRN, MN, and GMN) cannot correctly recommend options (T3) or provide additional information (T4), and poor generalization ability is observed in the OOV setting, with a performance difference of around 30% in T5. Although previous generation-based approaches have mitigated the gap by incorporating copy mechanisms, the simplest cases, such as generating and modifying API calls (T1, T2), still face 6-17% OOV performance drops. On the other hand, the GLMP model or network of the present application achieves the highest 90.5% task-completion rate in the full-dialog task and, in particular, surpasses the other baselines by a substantial margin in the OOV setting.

Stanford multi-domain dialog (SMD). The table of FIG. 9 is an example of an evaluation on SMD, a human-human dialog scenario. Following previous dialog works, the GLMP model is evaluated on two automatic evaluation metrics, BLEU and entity F1 score. As shown in the first table of FIG. 9, GLMP achieves the highest 14.12 BLEU and 55.38% entity F1 score, which is a slight improvement in BLEU but a huge gain in entity F1. In fact, for unsupervised evaluation metrics in task-oriented dialog, the entity F1 may be a more comprehensive evaluation metric than per-response accuracy or BLEU, since humans are able to choose the right entities but produce very diverse responses. Note that the results of the rule-based system and KVR are not directly comparable because they simplified the task by mapping the expression of entities to a canonical form using named entity recognition and linking.

Further, as shown in the second table of FIG. 9, a human evaluation of the generated responses is reported. The GLMP model is compared to the previous state-of-the-art model Mem2Seq and to the original dataset responses. 200 different dialog scenarios were randomly selected from the test set to evaluate three different responses. Amazon Mechanical Turk was used to evaluate system appropriateness and human-likeness on a scale of 1 to 5. As shown in the second table of FIG. 9, the GLMP model outperforms Mem2Seq in both measures, which is consistent with the previous observation. As expected, humans set the upper bound on the evaluated performance scores.

Thus, on the SMD dataset, the GLMP model achieves the highest BLEU score and entity F1 score over the baselines, including the previous state-of-the-art results.

Ablation study. The contributions of the global memory pointer G and of the memory writing of the dialog history H are shown in the table of FIG. 10, which reports the results of an ablation study using single-hop models. GLMP with K = 1 is evaluated on the bAbI OOV setting and on SMD. GLMP without H means that the context RNN in the global memory encoder does not write the hidden states into the external knowledge. As shown in the table, the GLMP model without H loses 5.5% more in the full-dialog task. On the other hand, the GLMP model without G (meaning that the global memory pointer is not used to modify the external knowledge) results in an 8.29% drop in entity F1 on the SMD dataset. Note that a 1.8% increase can be observed in task 5, suggesting that the use of the global memory pointer G may impose an imperfect prior before decoding in the OOV setting. In most cases, however, the global memory pointer still improves performance.

Visualization and qualitative evaluation. Analyzing attention weights has frequently been used to interpret deep learning models. FIG. 11 is an example of memory attention visualization in the SMD navigation domain. FIG. 11 shows the attention vector at the last hop for each generation time step. The Y-axis is the external knowledge that can be copied, including the KB information and the dialog history. Based on the question "What is the address?", the gold answer and the generated answer are shown at the top, and the global memory pointer G is shown in the left column. It can be observed that, in the right column, the final memory pointer successfully copies the entity Chevron at step 0 and its address 783 Arcadia Pl at step 3 to fill in the draft utterance. On the other hand, the memory attention without global weighting is reported in the middle column. It can be seen that, even though the attention weights spread over several points of interest and addresses at steps 0 and 3, the global memory pointer can mitigate this issue as expected.

Details of training

According to some embodiments, the model of the present application is trained end-to-end using the Adam optimizer (Kingma et al., "Adam: A method for stochastic optimization", International Conference on Learning Representations, 2015, which is incorporated herein by reference), with the learning rate annealed from 1e-3 to 1e-4. The number of hops K is set to 1, 3, and 6 to compare the performance differences. All the embeddings are initialized randomly, and a simple greedy strategy is used in the decoding stage, without beam search. The hyper-parameters, such as hidden size and dropout rate, are tuned by grid search over the development set (per-response accuracy for the bAbI dialog and BLEU score for SMD). In addition, to increase the model's generalization ability and to simulate the OOV setting, a small number of input source tokens are randomly masked as unknown tokens. The model is implemented in PyTorch, and the hyper-parameters used for each task T1, T2, T3, T4, T5 are listed in the table of FIG. 12. The table shows the hyper-parameters selected for different numbers of hops in each dataset. The values are the embedding dimension and the GRU hidden size, and the values in parentheses are the respective dropout rates. For all the models, a learning rate of 0.001 and a decay rate of 0.5 are used.
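As a sketch of the training setup described above (Adam, learning-rate decay from 1e-3 toward 1e-4 with factor 0.5, and random masking of input tokens to simulate OOV), one might write the following; the model stand-in, masking rate, and scheduler choice are assumptions for illustration.

```python
import torch
import torch.nn as nn

model = nn.GRU(8, 8)  # stand-in for the full GLMP model's parameters
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# Halve the learning rate when the dev metric plateaus, one simple way to
# decay from 1e-3 toward 1e-4 with the 0.5 decay rate noted above.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="max", factor=0.5)
# After each epoch: scheduler.step(dev_metric)

UNK_ID = 1
def mask_tokens(token_ids, p=0.05):
    """Randomly replace a small fraction of input tokens with <unk> to
    simulate the OOV setting (the masking rate p is illustrative)."""
    mask = torch.rand(token_ids.shape) < p
    return token_ids.masked_fill(mask, UNK_ID)
```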

Human assessment

The outputs of the GLMP model and Mem2Seq were compared with human evaluation in terms of appropriateness and human-likeness (naturalness). Appropriateness is rated from 1 to 5 as follows:

5: Correct grammar, correct logic, correct dialog flow, and correct entities are provided

4: Correct dialog flow, logic, and grammar, but with slight mistakes in the provided entities

3: Noticeable mistakes in grammar, logic, or the provided entities, but acceptable

2: Poor grammar, logic, and entities are provided

1: Wrong grammar, wrong logic, wrong dialog flow, and wrong entities are provided

Human-likeness (naturalness) is rated from 1 to 5 as follows:

5: 100% of the time, the utterance is what a person would say

4: 75% of the time, the utterance is what a person would say

3: 50% of the time, the utterance is what a person would say

2: 25% of the time, the utterance is what a person would say

1: 0% of the time, the utterance is what a person would say

The chart in fig. 13 shows the appropriateness and human-likeness scores based on the 200 dialog scenarios.

Accordingly, disclosed herein is an end-to-end trainable model for task-oriented dialog using a global-to-local memory pointer network. The global memory encoder and the local memory decoder are designed to incorporate the shared external knowledge into the learning framework. It is empirically shown that the global and local memory pointers can effectively generate system responses, even in the out-of-vocabulary (OOV) situation, and visualizations show how the global memory pointers help. As a result, the model achieves state-of-the-art results on both the simulated and the human-human dialog datasets, and has the potential to be extended to other tasks such as question answering and text summarization.


While exemplary embodiments have been shown and described, a wide range of modifications, changes, and substitutions are contemplated in the foregoing disclosure and in some instances, some features of the embodiments may be employed without a corresponding use of the other features. One of ordinary skill in the art would recognize many variations, alternatives, and modifications. Accordingly, the scope of the invention should be limited only by the attached claims, and, as appropriate, the claims should be construed broadly and in a manner consistent with the scope of the embodiments disclosed herein.
