Method for hierarchically modeling contribution-aware context for long-distance dialog state tracking

Document No.: 1905178 · Published: 2021-11-30

Note: this technology, "Method for hierarchically modeling contribution-aware context for long-distance dialog state tracking", was designed and created by 党建武, 齐剑书, 王龙标 and 司宇珂 on 2021-09-08. Its main content is as follows: the invention discloses a method for hierarchically modeling contribution-aware context for long-distance dialog state tracking, comprising the following steps. Constructing a dialogue turn modeling module: a hierarchical encoder comprising a dialogue turn encoder and a context encoder encodes the dialogue history context, segmented by turn, to obtain dialogue turn representations containing relatively complete intra-turn information and contextual information. Constructing a contribution-aware context modeling module: the contribution each dialogue turn can make to the current slot value prediction is computed, and a slot-specific contribution-aware context representation is modeled accordingly. Constructing a slot value generation module based on the contribution-aware context: this module takes the current slot vector and the slot-specific contribution-aware context representation as the initial input and initial hidden state, and generates the correct slot value sequence word by word by computing a vocabulary distribution and a dialogue history distribution from the hidden state at each time step. The invention better accomplishes the long-distance dialogue state tracking task.

1. A method for hierarchically modeling contribution-aware context for long-distance dialog state tracking, comprising the steps of:

(1) constructing a dialogue turn modeling module:

the dialogue turn modeling module utilizes a hierarchical structure comprising a dialogue turn encoder and a context encoder;

it encodes the dialogue contexts in the training corpus to obtain dialogue turn representations containing contextual information; this module encodes the dialogue context X_t = {T_1, T_2, ..., T_k, ..., T_t}, where t denotes the number of dialogue turns in the dialogue context and T_k = {S_k, U_k} denotes that dialogue turn T_k comprises a system utterance S_k and a user utterance U_k; the system utterance of dialogue turn T_k contains N_{s_k} words and the user utterance of dialogue turn T_k contains N_{u_k} words;

(2) constructing a contribution-aware context modeling module:

the contribution-aware context modeling module uses an attention mechanism to score each dialogue turn T_i with respect to each slot s_j, obtaining score_{i,j}, which measures the contribution each dialogue turn can make to the current slot value prediction; a slot-specific contribution-aware context representation sc_j is then modeled according to these contributions;

(3) constructing a slot value generation module based on the contribution-aware context:

the slot value generation module based on the contribution-aware context utilizes a copy-augmented decoder;

this module takes the embedding of each slot s_j as the initial input and the slot-specific contribution-aware context representation sc_j as the initial hidden state; at every decoding step it selects a word either from the dialogue history or from the vocabulary as the output of that step, thereby generating the slot value v_j word by word.

2. The method for hierarchically modeling contribution-aware context for long-distance dialog state tracking according to claim 1, wherein obtaining the corpus in step (1) comprises the following steps:

(101) for each dialogue turn, taking all utterances from the start of the dialogue up to the current dialogue turn as the dialogue history;

(102) segmenting the dialogue history obtained in step (101) by dialogue turn, where one system utterance and one user utterance together constitute a dialogue turn, and the system utterance in the first turn may be empty;

(103) counting all domain-slot pairs appearing in the training corpus and constructing a slot set, where each element takes the form 'domain-slot';

(104) normalizing the dialogue state annotations, and taking the slot values in the normalized dialogue state as the training labels of the corresponding slots.

3. The method for hierarchically modeling contribution-aware context for long-distance dialog state tracking according to claim 1, wherein in step (1) the hierarchical encoder comprises a lower-level dialogue turn encoder and a higher-level context encoder; the dialogue turn encoder consists of a bidirectional GRU that encodes each dialogue turn T_k in the context to obtain its dialogue turn vector representation th_k, and additionally computes the vector representation H_k of all words in the dialogue history using a residual connection; the specific calculation is as follows:

wh_{k,i} = w_{k,i} + h_{k,i}    (6)

where the forward and backward hidden states denote, respectively, the hidden states of the i-th word in dialogue turn T_k obtained by the forward and backward GRU encodings; |T_k| denotes the number of words in dialogue turn T_k; the forward GRU and backward GRU are those of the lower-level dialogue turn encoder; h_{k,i} denotes the hidden state vector of the i-th word in dialogue turn T_k obtained by the bidirectional GRU encoding; th_k denotes the vector representation of dialogue turn T_k produced by the lower-level dialogue turn encoder; wh_{k,i} denotes the vector representation, produced by the dialogue turn encoder, of the i-th word in the dialogue history up to dialogue turn T_k; |H_k| denotes the number of words in the dialogue history up to dialogue turn T_k; w_{k,i} denotes the word embedding of the i-th word in the dialogue history up to dialogue turn T_k;

the higher-level context encoder is composed of another bidirectional GRU; it takes as input the dialogue history C_k composed of the sequentially ordered dialogue turn vector representations, and then computes the contextualized dialogue turn vector representation Th_i using a residual connection; the specific calculation is as follows:

Th_i = th_i + th'_i    (10)

where C_k = {th_1, th_2, ..., th_k} contains all the dialogue turn representations, arranged in order, produced by the lower-level dialogue turn encoder, and k denotes the number of dialogue turns in the dialogue context; as above, the forward GRU and backward GRU are those of the higher-level context encoder; the forward and backward hidden states denote, respectively, the hidden states of the i-th dialogue turn in the dialogue context obtained by the forward and backward GRU encodings; th'_i denotes the hidden state vector of the i-th dialogue turn in the dialogue context obtained by the bidirectional GRU encoding; Th_i denotes the vector representation of the i-th dialogue turn in the dialogue context, containing contextual information, obtained from the context encoder.

4. The method for hierarchically modeling contribution-aware context for long-distance dialog state tracking according to claim 1, wherein in step (2) the contribution-aware context modeling module uses an attention mechanism to score each dialogue turn T_i in the dialogue context with respect to each slot s_j, obtaining score_{i,j}, which measures the contribution each dialogue turn can make to the current slot value prediction; the slot-specific contribution-aware dialogue context representation is then computed as a weighted sum of all the dialogue turn representations according to these contributions;

the specific calculation formula is as follows:

score_{i,j} = s_j · Th_i    (11)

where score_{i,j} denotes the attention score computed between slot s_j and dialogue turn T_i, measuring the contribution dialogue turn T_i can make to the generation of the value of slot s_j; s_j denotes the word embedding vector of the j-th slot; Th_i denotes the vector representation of dialogue turn T_i obtained in step (1); w_{i,j} denotes the result of normalizing the attention score score_{i,j} with a softmax operation; sc_j denotes the slot-specific contribution-aware dialogue context vector representation for slot s_j.

5. The method for hierarchically modeling contribution-aware context for long-distance dialog state tracking according to claim 1, wherein in step (3) the slot value generation module based on the contribution-aware context takes the slot embedding s_j as the initial input and the slot-specific contribution-aware context representation sc_j obtained in step (2) as the initial hidden state, and at each time step i performs the decoding process as follows:

first, the decoder takes the embedding dw_{i-1,j} of the word predicted at the previous step as the input of the current time step and obtains the decoder state dh_{i,j}:

dh_{i,j} = GRU(dw_{i-1,j}, dh_{i-1,j})    (14)

where dh_{i,j} denotes the decoder hidden state obtained at decoding step i for slot s_j; dw_{i-1,j} denotes the embedding of the word predicted at decoding step i-1 for slot s_j;

then, using the decoder state dh_{i,j}, the probability distributions over the open vocabulary and over the dialogue history are computed separately:

where P^{vocab}_{i,j} denotes the probability distribution over the vocabulary computed at decoding step i for slot s_j; P^{history}_{i,j} denotes the probability distribution over the dialogue history computed at decoding step i for slot s_j; softmax denotes the normalized exponential function; E denotes the word embedding matrix of the vocabulary; H_t denotes the word vector matrix of the dialogue history obtained by the lower-level dialogue turn encoder in step (1); |V| denotes the number of words in the vocabulary; |H_t| denotes the number of words in the dialogue history up to dialogue turn T_t;

finally, the two probability distributions are combined by a weighted sum to obtain the final probability distribution P^{final}_{i,j} over the whole vocabulary, and the word with the highest probability is selected as the result of the current decoding step:

where P^{final}_{i,j} denotes the final probability distribution obtained by the weighted combination of the vocabulary distribution and the dialogue history distribution; sigmoid denotes the activation function; W is a learnable parameter; wd_{i,j} denotes the dialogue context vector obtained at decoding step i for slot s_j;

the loss function in the above decoding process is represented as follows:

where J denotes the number of domain-slot pairs in the dataset; |Y_j| denotes the number of words in the annotated slot value corresponding to slot j for the current dialogue turn; y_{i,j} denotes the one-hot encoding of the i-th word in the annotated slot value corresponding to slot j.

Technical Field

The invention relates to the technical field of natural language processing and task-oriented dialogue systems, and in particular to a method for hierarchically modeling contribution-aware context for long-distance dialog state tracking.

Background

In recent years, task-oriented dialogue systems have attracted much attention in both industry and academia and have been widely used to help users complete tasks such as restaurant reservation and attraction querying through spoken interaction. A traditional task-oriented dialogue system has a pipeline structure composed of four components: natural language understanding, dialogue state tracking, dialogue policy learning, and natural language generation [1].

The goal of dialogue state tracking is to track the user's goals and intentions at each turn of the dialogue and represent them as a dialogue state, i.e., a set of slots and their corresponding values [2]. Because the dialogue policy learning and natural language generation modules rely on the dialogue state tracking result to select the next system action and generate the next system reply, accurate dialogue state prediction is key to improving the overall performance of the dialogue system [3].

To address the challenges in the dialogue state tracking task, many methods have been proposed in recent years; they can be broadly divided into two categories: approaches based on a predefined ontology and approaches based on an open vocabulary.

(1) Approaches based on a predefined ontology assume that all values each slot may take are defined in advance in a candidate value set, so dialogue state prediction is in effect, for each slot, a multi-class classification over all elements of that candidate set. In practice, however, the number of candidate values may be large or even change dynamically, so such a candidate set is often difficult to predefine.

(2) Open-vocabulary (generative) approaches drop the predefined-ontology assumption: given only the target slot, the slot value is generated directly from the dialogue context.

However, as the dialogue progresses the dialogue context accumulates: important information far from the current dialogue turn is easily lost when modeling the dialogue context, causing dialogue state prediction to fail, and an excess of contextual information can make it difficult for the dialogue state tracker to focus on the critical information.

Disclosure of Invention

The invention aims to overcome the above shortcomings of the prior art and provides a method for hierarchically modeling contribution-aware context for long-distance dialog state tracking.

The purpose of the invention is achieved by the following technical solution: a method for hierarchically modeling contribution-aware context for long-distance dialog state tracking, comprising the following steps:

(1) constructing a dialogue turn modeling module:

the dialogue turn modeling module encodes the dialogue contexts in the training corpus using a hierarchical structure (comprising a dialogue turn encoder and a context encoder) to obtain dialogue turn representations containing contextual information; this module encodes the dialogue context X_t = {T_1, T_2, ..., T_k, ..., T_t}, where t denotes the number of dialogue turns in the dialogue context and T_k = {S_k, U_k} denotes that dialogue turn T_k comprises a system utterance S_k and a user utterance U_k; the system utterance of dialogue turn T_k contains N_{s_k} words and the user utterance of dialogue turn T_k contains N_{u_k} words;

(2) constructing a contribution-aware context modeling module:

the contribution-aware context modeling module uses an attention mechanism to score each dialogue turn T_i with respect to each slot s_j, obtaining score_{i,j}, which measures the contribution each dialogue turn can make to the current slot value prediction; a slot-specific contribution-aware context representation sc_j is then modeled according to these contributions;

(3) constructing a slot value generation module based on the contribution-aware context:

the slot value generation module based on the contribution-aware context utilizes a copy-augmented decoder; this module takes the embedding of each slot s_j as the initial input and the slot-specific contribution-aware context representation sc_j as the initial hidden state; at every decoding step it selects a word either from the dialogue history or from the vocabulary as the output of that step, thereby generating the slot value v_j word by word.

Further, obtaining the corpus in step (1) comprises the following steps:

(101) for each dialogue turn, taking all utterances from the start of the dialogue up to the current dialogue turn as the dialogue history;

(102) segmenting the dialogue history obtained in step (101) by dialogue turn, where one system utterance and one user utterance together constitute a dialogue turn (note: the system utterance in the first turn may be empty);

(103) counting all domain-slot pairs appearing in the training corpus and constructing a slot set, where each element takes the form 'domain-slot';

(104) normalizing the dialogue state annotations, e.g., correcting annotation and spelling errors and unifying annotations that have the same meaning but different wording, and taking the slot values in the normalized dialogue state as the training labels of the corresponding slots.

Further, in step (1), the hierarchical encoder comprises a lower-level dialogue turn encoder and a higher-level context encoder. The dialogue turn encoder consists of a bidirectional GRU that encodes each dialogue turn T_k in the context to obtain its dialogue turn vector representation th_k, and additionally computes the vector representation H_k of all words in the dialogue history using a residual connection; the specific calculation is as follows:

wh_{k,i} = w_{k,i} + h_{k,i}    (6)

where the forward and backward hidden states denote, respectively, the hidden states of the i-th word in dialogue turn T_k obtained by the forward and backward GRU encodings; |T_k| denotes the number of words in dialogue turn T_k; the forward GRU and backward GRU are those of the lower-level dialogue turn encoder; h_{k,i} denotes the hidden state vector of the i-th word in dialogue turn T_k obtained by the bidirectional GRU encoding; th_k denotes the vector representation of dialogue turn T_k produced by the lower-level dialogue turn encoder; wh_{k,i} denotes the vector representation, produced by the dialogue turn encoder, of the i-th word in the dialogue history up to dialogue turn T_k; |H_k| denotes the number of words in the dialogue history up to dialogue turn T_k; w_{k,i} denotes the word embedding of the i-th word in the dialogue history up to dialogue turn T_k;

the higher-level context encoder is composed of another bidirectional GRU; it takes as input the dialogue history C_k composed of the sequentially ordered dialogue turn vector representations, and then computes the contextualized dialogue turn vector representation Th_i using a residual connection; the specific calculation is as follows:

Th_i = th_i + th'_i    (10)

where C_k = {th_1, th_2, ..., th_k} contains all the dialogue turn representations, arranged in order, produced by the lower-level dialogue turn encoder, and k denotes the number of dialogue turns in the dialogue context; as above, the forward GRU and backward GRU are those of the higher-level context encoder; the forward and backward hidden states denote, respectively, the hidden states of the i-th dialogue turn in the dialogue context obtained by the forward and backward GRU encodings; th'_i denotes the hidden state vector of the i-th dialogue turn in the dialogue context obtained by the bidirectional GRU encoding; Th_i denotes the vector representation of the i-th dialogue turn in the dialogue context, containing contextual information, obtained from the context encoder.
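
For illustration only, the two-level encoder described above can be sketched in PyTorch as follows. The class and variable names, the single-dialogue (unbatched) handling, and the choice of summing the forward and backward GRU outputs (and of taking th_k as the sum of the two final states) are assumptions of this sketch and are not fixed by the text; only the overall structure, a turn-level bidirectional GRU, a context-level bidirectional GRU, and the residual connections wh_{k,i} = w_{k,i} + h_{k,i} and Th_i = th_i + th'_i, follows the description above.

```python
import torch
import torch.nn as nn

class HierarchicalTurnEncoder(nn.Module):
    """Sketch of the two-level (turn + context) encoder described above.

    Assumptions: the outputs of the two GRU directions are summed to keep the
    size equal to `hidden`, and a turn vector th_k is the sum of the final
    forward and backward states. These details are illustrative only.
    """
    def __init__(self, embed_dim: int = 400, hidden: int = 400):
        super().__init__()
        assert embed_dim == hidden, "residual w + h requires equal dimensions"
        self.turn_gru = nn.GRU(embed_dim, hidden, bidirectional=True, batch_first=True)
        self.ctx_gru = nn.GRU(hidden, hidden, bidirectional=True, batch_first=True)

    def forward(self, turns):
        """turns: list of tensors, each (n_words_k, embed_dim) for one dialogue turn."""
        word_vecs, turn_vecs = [], []
        for w_k in turns:                                   # encode each turn separately
            out, last = self.turn_gru(w_k.unsqueeze(0))     # out: (1, n_k, 2*hidden)
            h_k = out[0, :, :out.size(-1) // 2] + out[0, :, out.size(-1) // 2:]
            word_vecs.append(w_k + h_k)                     # wh_{k,i} = w_{k,i} + h_{k,i}
            turn_vecs.append(last[0, 0] + last[1, 0])       # th_k (assumed: sum of directions)
        th = torch.stack(turn_vecs).unsqueeze(0)            # C_k: (1, k, hidden)
        ctx_out, _ = self.ctx_gru(th)
        th_prime = ctx_out[0, :, :ctx_out.size(-1) // 2] + ctx_out[0, :, ctx_out.size(-1) // 2:]
        Th = th[0] + th_prime                               # Th_i = th_i + th'_i
        H = torch.cat(word_vecs, dim=0)                     # word-level matrix over the history
        return Th, H
```

In this sketch the returned Th plays the role of the contextualized turn representations Th_i used by the attention module below, and H plays the role of the word-level matrix H_t used later by the copy mechanism.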

Further, in step (2), the contribution-aware context modeling module uses an attention mechanism to score each dialogue turn T_i in the dialogue context with respect to each slot s_j, obtaining score_{i,j}, which measures the contribution each dialogue turn can make to the current slot value prediction; the slot-specific contribution-aware dialogue context representation is then computed as a weighted sum of all the dialogue turn representations according to these contributions; the specific calculation is as follows:

score_{i,j} = s_j · Th_i    (11)

where score_{i,j} denotes the attention score computed between slot s_j and dialogue turn T_i, measuring the contribution dialogue turn T_i can make to the generation of the value of slot s_j; s_j denotes the word embedding vector of the j-th slot; Th_i denotes the vector representation of dialogue turn T_i obtained in step (1); w_{i,j} denotes the result of normalizing the attention score score_{i,j} with a softmax operation; sc_j denotes the slot-specific contribution-aware dialogue context vector representation for slot s_j.
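
The scoring and pooling just described amount to a dot-product attention from the slot embedding over the turn representations. A minimal sketch follows, assuming equation (11) is a plain dot product and the weights w_{i,j} come from a softmax over the turns; the function name is illustrative.

```python
import torch

def contribution_aware_context(slot_emb: torch.Tensor, Th: torch.Tensor) -> torch.Tensor:
    """slot_emb: (d,) embedding s_j of one slot; Th: (k, d) contextualized turn vectors.

    Returns sc_j, the slot-specific contribution-aware context vector of shape (d,).
    """
    scores = Th @ slot_emb                         # score_{i,j} = s_j · Th_i, one per turn
    weights = torch.softmax(scores, dim=0)         # w_{i,j}: normalized contribution weights
    sc_j = (weights.unsqueeze(1) * Th).sum(dim=0)  # weighted sum of turn representations
    return sc_j
```

The resulting sc_j can then serve as the decoder's initial hidden state, as described in step (3).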

Further, in step (3), the slot value generation module based on the contribution-aware context takes the slot embedding s_j as the initial input and the slot-specific contribution-aware context representation sc_j obtained in step (2) as the initial hidden state, and at each time step i performs the decoding process as follows:

first, the decoder takes the embedding dw_{i-1,j} of the word predicted at the previous step as the input of the current time step and obtains the decoder state dh_{i,j}:

dh_{i,j} = GRU(dw_{i-1,j}, dh_{i-1,j})    (14)

where dh_{i,j} denotes the decoder hidden state obtained at decoding step i for slot s_j; dw_{i-1,j} denotes the embedding of the word predicted at decoding step i-1 for slot s_j.

Then, using the decoder state dh_{i,j}, the probability distributions over the open vocabulary and over the dialogue history are computed separately:

where P^{vocab}_{i,j} denotes the probability distribution over the vocabulary computed at decoding step i for slot s_j; P^{history}_{i,j} denotes the probability distribution over the dialogue history computed at decoding step i for slot s_j; softmax denotes the normalized exponential function; E denotes the word embedding matrix of the vocabulary; H_t denotes the word vector matrix of the dialogue history obtained by the lower-level dialogue turn encoder in step (1); |V| denotes the number of words in the vocabulary; |H_t| denotes the number of words in the dialogue history up to dialogue turn T_t.

Finally, the two probability distributions are combined by a weighted sum to obtain the final probability distribution P^{final}_{i,j} over the whole vocabulary, and the word with the highest probability is selected as the result of the current decoding step:

where P^{final}_{i,j} denotes the final probability distribution obtained by the weighted combination of the vocabulary distribution and the dialogue history distribution; sigmoid denotes the activation function; W is a learnable parameter; wd_{i,j} denotes the dialogue context vector obtained at decoding step i for slot s_j.
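
One decoding step of the copy-augmented generator might look like the sketch below. The source fixes the GRU recurrence of equation (14), a vocabulary distribution, a dialogue history distribution, and a sigmoid-gated combination; the way the history attention is scattered back onto vocabulary ids, the exact inputs fed to the gate, and all names are assumptions of this sketch, not the patented implementation.

```python
import torch
import torch.nn as nn

class CopyAugmentedStep(nn.Module):
    """One decoding step: vocabulary distribution + history (copy) distribution."""
    def __init__(self, embed_dim: int = 400, hidden: int = 400, vocab_size: int = 30000):
        super().__init__()
        assert embed_dim == hidden, "this sketch ties embedding and hidden sizes (both 400 in the text)"
        self.gru_cell = nn.GRUCell(embed_dim, hidden)
        self.embedding = nn.Embedding(vocab_size, embed_dim)       # rows play the role of E
        self.gate = nn.Linear(hidden + embed_dim + embed_dim, 1)   # assumed gate inputs

    def forward(self, dw_prev, dh_prev, H, history_ids):
        """dw_prev: (B, embed); dh_prev: (B, hidden); H: (B, L, embed) word matrix of the
        dialogue history; history_ids: (B, L) vocabulary ids of the history words."""
        dh = self.gru_cell(dw_prev, dh_prev)                                 # eq. (14)
        p_vocab = torch.softmax(dh @ self.embedding.weight.t(), dim=-1)      # over |V|
        attn = torch.softmax(torch.bmm(H, dh.unsqueeze(2)).squeeze(2), dim=-1)  # over |H_t|
        # scatter the history attention back onto vocabulary positions (copy distribution)
        p_copy = torch.zeros_like(p_vocab).scatter_add(1, history_ids, attn)
        wd = torch.bmm(attn.unsqueeze(1), H).squeeze(1)                      # context vector wd_{i,j}
        g = torch.sigmoid(self.gate(torch.cat([dh, dw_prev, wd], dim=-1)))   # mixing weight
        p_final = g * p_vocab + (1 - g) * p_copy                             # final distribution
        return p_final, dh
```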

The loss function in the above decoding process is represented as follows:

where J denotes the number of domain-slot pairs in the dataset; |Y_j| denotes the number of words in the annotated slot value corresponding to slot j for the current dialogue turn; y_{i,j} denotes the one-hot encoding of the i-th word in the annotated slot value corresponding to slot j.
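
The objective above sums, over all J domain-slot pairs and over the |Y_j| words of each annotated slot value, the cross-entropy between the final distribution and the one-hot target. A minimal sketch, assuming the per-step final distributions have already been collected:

```python
import torch

def slot_value_loss(p_finals, target_ids):
    """p_finals: list over slots j of tensors (|Y_j|, |V|), the final distribution at each
    decoding step; target_ids: list over slots j of tensors (|Y_j|,) with the gold word ids
    (equivalent to the one-hot targets). Returns the summed negative log-likelihood."""
    loss = torch.tensor(0.0)
    for p_j, y_j in zip(p_finals, target_ids):
        # -sum_i log P_final(y_{i,j}); cross-entropy with one-hot targets
        loss = loss - torch.log(p_j.gather(1, y_j.unsqueeze(1)).squeeze(1) + 1e-12).sum()
    return loss
```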

Advantageous effects:

1. The present invention addresses the loss of critical information appearing early in the dialogue when a single sequence model is used to model a long context. In the context modeling of step (1), a hierarchically structured encoder is introduced in place of the single sequence model used in previous models. When a single sequence model is used as the encoder, the whole dialogue context is concatenated and fed directly into the sequence model for encoding, and when the dialogue context is long, key information that appears early may be forgotten during encoding.

Therefore, the present invention uses a hierarchical encoder that splits the entire dialogue context into multiple sequences by dialogue turn, encodes the utterances of each dialogue turn in a lower-level encoder to obtain a vector representation for each turn, and encodes all the turn representations contained in the dialogue context in a higher-level encoder so that each turn representation also contains contextual information. By reducing the length of the sequence fed to each encoder, the information contained in the dialogue context is preserved as completely as possible during encoding.

2. The invention avoids interference from irrelevant dialogue turns in the model's predictions over a multi-turn long dialogue context. In step (2), an attention mechanism scores each dialogue turn in the dialogue context with respect to the current slot, measuring the contribution different dialogue turns can make to the current slot value prediction; the dialogue turns are then weighted and combined according to their contribution scores to obtain a slot-specific contribution-aware context representation, which helps the model focus on the dialogue turn information relevant to the current slot in a multi-turn long dialogue context and ignore irrelevant turn information.

Experiments show that these two improvements effectively improve the model's dialogue state prediction performance under long dialogue contexts.

Drawings

FIG. 1 is a general block diagram of the method of the present invention for modeling contribution-aware context hierarchically for long-distance dialog state tracking;

FIG. 2 is a block diagram of the lower-level dialogue turn encoder of the hierarchical encoder in the method for modeling contribution-aware context for long-distance dialog state tracking provided by the present invention;

FIG. 3 is a block diagram of the higher-level context encoder of the hierarchical encoder in the method for modeling contribution-aware context for long-distance dialog state tracking according to the present invention.

Detailed Description

The invention is described in further detail below with reference to the figures and specific examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

The implementation of the present invention is described below using the multi-domain dataset MultiWOZ 2.0 as an example. The overall framework of the method is shown in Figure 1. The whole system pipeline comprises three steps: dialogue turn modeling, contribution-aware context modeling, and slot value generation.

The method comprises the following specific steps:

(1) Dialogue turn modeling:

the present invention primarily utilizes the multiwoz2.0 dataset. The data set is a multi-domain data set containing 10438 sessions involving 7 domains of sight, hospital, police, hotel, restaurant, taxi and train, and since the hospital and police domains are only present in the training set, we only used the data of the remaining 5 domains during the experiment. The invention takes the dialogue data set as the original corpus and carries out the following processing:

(1) for each dialogue turn, taking all utterances from the start of the dialogue up to the current dialogue turn as the dialogue history;

(2) segmenting the dialogue history obtained in step (1) by dialogue turn, where one system utterance and one user utterance together constitute a dialogue turn (note: the system utterance in the first turn may be empty);

(3) counting all domain-slot pairs appearing in the original corpus and constructing a slot set, where each element takes the form 'domain-slot';

(4) normalizing the dialogue state annotations, e.g., correcting annotation and spelling errors and unifying annotations that have the same meaning but different wording, and taking the slot values in the normalized dialogue state as the training labels of the corresponding slots (a toy sketch of this preprocessing follows the list).
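
For illustration, steps (1)-(4) can be sketched as below. The field names of the raw dialogue records ('system', 'user', 'state') and the normalization rules are assumptions of this sketch; the actual MultiWOZ 2.0 JSON layout and the full normalization used by the invention differ.

```python
def build_examples(dialogue, slot_set):
    """dialogue: list of turns, each a dict with (possibly empty) 'system' and 'user'
    utterances and a 'state' dict mapping 'domain-slot' -> value (assumed layout)."""
    examples = []
    history = []                                    # turns accumulated so far
    for turn in dialogue:
        # steps (1)-(2): one system utterance + one user utterance form a turn;
        # the system utterance of the first turn may be empty
        history.append((turn.get("system", ""), turn["user"]))
        state = normalize_state(turn["state"])      # step (4): normalize annotations
        # step (3): every 'domain-slot' in slot_set gets a label; missing slots stay "none"
        labels = {slot: state.get(slot, "none") for slot in slot_set}
        examples.append({"history": list(history), "labels": labels})
    return examples

def normalize_state(state):
    """Toy normalization: lower-case values and merge a few spelling variants."""
    synonyms = {"centre": "center", "dont care": "dontcare"}
    return {slot: synonyms.get(v.strip().lower(), v.strip().lower())
            for slot, v in state.items()}
```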

Table 1 shows detailed statistics of the dataset. The dataset contains 71410 dialogue turns in total: 56668 turns for training, 7374 turns for validation, and 7368 turns for testing.

The longest dialogue context in the training set contains 879 words, the longest in the validation set contains 659 words, and the longest in the test set contains 615 words. The dialogue data in the dataset mainly fall in the context length range of 0-300 words.

TABLE 1 dialog data set statistics

Corpus        All      Training   Dev     Test
Total         71410    56668      7374    7368
Max Length    879      879        659     615
0-99          30797    24954      2936    2907
100-199       22956    18005      2469    2482
200-299       13330    10293      1535    1502
300-399       3616     2834       382     400
400+          711      582        52      77

Based on the above dataset, the present invention uses a dialogue turn modeling module (see Figs. 2 and 3) composed of two bidirectional GRUs, serving as the dialogue turn encoder and the context encoder respectively, to encode the dialogue turn representations Th_i containing dialogue context information and the word vector matrix H_k corresponding to the dialogue history:

wh_{k,i} = w_{k,i} + h_{k,i}

Th_i = th_i + th'_i

where the forward and backward hidden states denote, respectively, the hidden states of the i-th word in dialogue turn T_k obtained by the forward and backward GRU encodings in the lower-level dialogue turn encoder; |T_k| denotes the number of words in dialogue turn T_k; h_{k,i} denotes the hidden state vector of the i-th word in dialogue turn T_k obtained by the bidirectional GRU encoding; th_k denotes the vector representation of dialogue turn T_k produced by the lower-level dialogue turn encoder; wh_{k,i} denotes the vector representation, produced by the dialogue turn encoder, of the i-th word in the dialogue history up to dialogue turn T_k; |H_k| denotes the number of words in the dialogue history up to dialogue turn T_k; w_{k,i} denotes the word embedding of the i-th word in the dialogue history up to dialogue turn T_k; C_k = {th_1, th_2, ..., th_k} contains all the dialogue turn representations, arranged in order, produced by the lower-level dialogue turn encoder, and k denotes the number of dialogue turns in the dialogue context;

as above, the forward GRU and backward GRU are those of the higher-level context encoder; the forward and backward hidden states denote, respectively, the hidden states of the i-th dialogue turn in the dialogue context obtained by the forward and backward GRU encodings in the higher-level context encoder; th'_i denotes the hidden state vector of the i-th dialogue turn in the dialogue context obtained by the bidirectional GRU encoding; Th_i denotes the vector representation of the i-th dialogue turn in the dialogue context, containing contextual information, obtained from the context encoder.

(2) Contribution-aware context modeling process:

and calculating a contribution degree score according to the current slot vector and all the dialogue wheel representations obtained in the previous step by using an attention mechanism, and taking the contribution degree score as a weight to perform weighted summation on the dialogue wheel vector representations in the context to obtain a contribution perception context vector representation:

score_{i,j} = s_j · Th_i

where score_{i,j} denotes the attention score computed between slot s_j and dialogue turn T_i, measuring the contribution dialogue turn T_i can make to the generation of the value of slot s_j; w_{i,j} denotes the result of normalizing the attention score score_{i,j} with a softmax operation; sc_j denotes the slot-specific contribution-aware dialogue context vector representation for slot s_j.

(3) Slot value generation:

The current slot vector and the slot-specific contribution-aware context vector representation obtained in the previous step are taken as the initial input and the initial hidden state, respectively, of the decoder used for slot value generation. At each time step, the probability distributions over the vocabulary and over the dialogue history are computed from the decoder hidden state, and the final vocabulary distribution is obtained:

dh_{i,j} = GRU(dw_{i-1,j}, dh_{i-1,j})

where dh_{i,j} denotes the decoder hidden state obtained at decoding step i for slot s_j; dw_{i-1,j} denotes the embedding of the word predicted at decoding step i-1 for slot s_j; P^{vocab}_{i,j} denotes the probability distribution over the vocabulary computed at decoding step i for slot s_j; P^{history}_{i,j} denotes the probability distribution over the dialogue history computed at decoding step i for slot s_j; softmax denotes the normalized exponential function; E denotes the word embedding matrix of the vocabulary; H_t denotes the word vector matrix of the dialogue history obtained by the lower-level dialogue turn encoder in step (1); |V| denotes the number of words in the vocabulary; |H_t| denotes the number of words in the dialogue history of dialogue turn t; P^{final}_{i,j} denotes the final probability distribution obtained by the weighted combination of the two distributions;

sigmoid denotes the activation function; W is a learnable parameter; wd_{i,j} denotes the dialogue context vector obtained at decoding step i for slot s_j.

Training of the model was performed using the following objective function:

where J denotes the number of domain-slot pairs in the dataset; |Y_j| denotes the number of words in the annotated slot value corresponding to slot j for the current dialogue turn; y_{i,j} denotes the one-hot encoding of the i-th word in the annotated slot value corresponding to slot j.

In a specific implementation, the method is implemented in PyTorch and trained on an Nvidia GPU. The parameters are set in advance: GloVe embeddings [4] and character-level embeddings [5] are concatenated into word embedding vectors of dimension 400, and the hidden size of the GRUs in the encoder and decoder is also set to 400.

The Adam [6] algorithm updates the parameters with an initial learning rate of 0.001, and an early-stopping strategy [7] with patience 6 is adopted during training: training ends after the model's joint accuracy has not improved for 6 consecutive epochs.
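
A sketch of this training configuration (Adam at learning rate 0.001 and early stopping with patience 6 on validation joint accuracy) is given below. The model, data loaders, and the joint-accuracy function are placeholders, and the loading of GloVe and character embeddings is omitted.

```python
import torch

def train(model, train_loader, dev_loader, compute_joint_accuracy, max_epochs=100):
    """Adam at lr=0.001; stop after 6 epochs without joint-accuracy improvement."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    best_acc, patience, bad_epochs = 0.0, 6, 0
    for epoch in range(max_epochs):
        model.train()
        for batch in train_loader:
            optimizer.zero_grad()
            loss = model(batch)            # assumed: the forward pass returns the loss
            loss.backward()
            optimizer.step()
        model.eval()
        with torch.no_grad():
            acc = compute_joint_accuracy(model, dev_loader)
        if acc > best_acc:                 # keep the best validation joint accuracy
            best_acc, bad_epochs = acc, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:     # early stopping with patience 6
                break
    return best_acc
```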

Table 2-1 shows the results of the proposed model (CACHE), the variant obtained by replacing the encoder-decoder framework in MLCSG with the inventive structure (CACHE+LM), and the other baseline models (TRADE, COMER, MLCSG) on the MultiWOZ 2.0 dataset with respect to two evaluation metrics (Slot Accuracy and Joint Accuracy).

Table 2-2 shows, for the proposed model (CACHE), its baseline model (TRADE), a recent encoder-decoder-based model (MLCSG), and the variant obtained by replacing the encoder-decoder in MLCSG with the inventive structure (CACHE+LM), the Joint Accuracy results and data statistics on subsets of the MultiWOZ 2.0 test set with different dialogue context length ranges.

TABLE 2-1 Overall results for the MultiWOZ2.0 test set

Model Slot Accuracy Joint Accuracy
TRADE 96.94% 48.53%
COMER - 48.79%
CACHE 96.99% 49.54%
MLCSG 97.18% 50.72%
CACHE+LM 97.15% 50.96%

TABLE 2-2 MultiWOZ 2.0 test set data statistics and model results for different dialogue context length ranges

The comparative experimental algorithms in the table are described below:

TRADE: a method for generating slot values from a dialog history or vocabulary using a copy-augmented encoder-decoder framework;

COMER: a method that sequentially generates the domains, slots, and values of the dialogue state using a hierarchical decoder;

MLCSG: a method that builds on the TRADE model with a multi-task learning framework using a language model as an auxiliary task;

CACHE+LM: the variant obtained by replacing the encoder-decoder structure in the MLCSG model with the structure of the present invention;

Note: the invention mainly improves the encoder part of the TRADE model, so the experiments focus on comparisons with the TRADE model. Many current methods improve model performance by adding different functional modules on top of an encoder-decoder framework, and the method provided by the invention can be directly ported into such methods by replacing their encoder-decoder structure (for example, porting it into the MLCSG model yields CACHE+LM); therefore extensive comparisons are not performed in the experiments, and CACHE+LM is taken only as an example for a brief comparison and explanation.

As can be seen from the experimental results in Table 2-1, all models achieve very good and very similar results in Slot Accuracy, since in each dialogue turn most slot values are empty and easy for the models to predict; we are therefore more interested in the models' Joint Accuracy. Compared with the TRADE model, CACHE obtains an absolute improvement of 1.01% in joint accuracy, which shows that the method provided by the invention, using a hierarchical encoder for dialogue turn modeling and constructing a contribution-aware context representation through an attention mechanism, is very helpful for the dialogue state tracking task. In addition, compared with MLCSG, CACHE+LM obtains an absolute improvement of 0.24% in joint accuracy, which shows that the proposed method remains effective when ported into other dialogue state tracking models based on an encoder-decoder structure.

As can be seen from the experimental results in Table 2-2, when the dialogue context length exceeds 100 words, both CACHE and CACHE+LM are greatly improved compared to TRADE and MLCSG, especially in the ranges of 200 words and above. This shows that the method provided by the present invention, using the hierarchically structured encoder, can effectively alleviate the loss of information appearing early in a long dialogue context by reducing the length of the sequence fed to each encoder; moreover, building a contribution-aware context representation helps the model focus on useful information in a lengthy dialogue context.

The present invention is not limited to the above-described embodiments. The foregoing description of the specific embodiments is intended to describe and illustrate the technical solutions of the present invention, and the above specific embodiments are merely illustrative and not restrictive. Those skilled in the art can make many changes and modifications to the invention without departing from the spirit and scope of the invention as defined in the appended claims.

Reference documents:

[1] Shan Y, Li Z, Zhang J, et al. A contextual hierarchical attention network with adaptive objective for dialogue state tracking[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020: 6322-6333.

[2] Zhu S, Li J, Chen L, et al. Efficient context and schema fusion networks for multi-domain dialogue state tracking[J]. arXiv preprint arXiv:2004.03386, 2020.

[3] Ye F, Manotumruksa J, Zhang Q, et al. Slot self-attentive dialogue state tracking[C]//Proceedings of the Web Conference 2021. 2021: 1598-1608.

[4] Pennington J, Socher R, Manning C D. GloVe: Global vectors for word representation[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2014: 1532-1543.

[5] Hashimoto K, Xiong C, Tsuruoka Y, et al. A joint many-task model: Growing a neural network for multiple NLP tasks[J]. arXiv preprint arXiv:1611.01587, 2016.

[6] Kingma D P, Ba J. Adam: A method for stochastic optimization[J]. arXiv preprint arXiv:1412.6980, 2014.

[7] Caruana R, Lawrence S, Giles L. Overfitting in neural nets: Backpropagation, conjugate gradient, and early stopping[J]. Advances in Neural Information Processing Systems, 2001: 402-408.
