Method for hierarchically modeling contribution-aware context for long-distance dialog state tracking

Document No.: 1905178 · Published: 2021-11-30

Note: this technology, "Method for hierarchically modeling contribution-aware context for long-distance dialog state tracking", was designed and created by 党建武, 齐剑书, 王龙标 and 司宇珂 on 2021-09-08. Its main content is as follows: the invention discloses a method for hierarchically modeling contribution-aware context for long-distance dialog state tracking, comprising the following steps. Constructing a dialogue turn modeling module: a hierarchical encoder comprising a dialogue turn encoder and a context encoder encodes the dialogue history context, segmented by turn, to obtain dialogue turn representations containing relatively complete intra-turn information and contextual information. Constructing a contribution-aware context modeling module: the contribution each dialogue turn can make to the current slot value prediction is computed, and a slot-specific contribution-aware context representation is modeled accordingly. Constructing a slot value generation module based on the contribution-aware context: this module takes the current slot vector and the slot-specific contribution-aware context representation as the initial input and initial hidden state, and generates the correct slot value sequence word by word by computing a vocabulary distribution and a dialogue history distribution from the hidden state at each time step. The invention better accomplishes the long-distance dialogue state tracking task.

1. A method for hierarchically modeling contribution-aware context for long-distance dialog state tracking, comprising the steps of:

(1) constructing a dialogue turn modeling module:

the dialogue turn modeling module utilizes a hierarchical structure comprising a dialogue turn encoder and a context encoder;

it encodes the dialogue contexts in the training corpus to obtain dialogue turn representations containing contextual information; this module encodes the dialogue context X_t = {T_1, T_2, ..., T_k, ..., T_t}, where t denotes the number of dialogue turns in the dialogue context and T_k = {S_k, U_k} denotes that dialogue turn T_k comprises a system utterance S_k and a user utterance U_k; the system utterance of dialogue turn T_k contains N_{s_k} words and the user utterance of dialogue turn T_k contains N_{u_k} words;

(2) constructing a contribution-aware context modeling module:

the contribution-aware context modeling module uses an attention mechanism to score each dialogue turn T_i with respect to each slot s_j, obtaining score_{i,j}, which measures the contribution each dialogue turn can make to the current slot value prediction; a slot-specific contribution-aware context representation sc_j is then modeled according to these contributions;

(3) constructing a slot value generation module based on the contribution-aware context:

the slot value generation module based on the contribution-aware context utilizes a copy-augmented decoder;

this module takes the embedding of each slot s_j as the initial input and the slot-specific contribution-aware context representation sc_j as the initial hidden state; at every decoding step it selects a word either from the dialogue history or from the vocabulary as the output of that step, thereby generating the slot value v_j word by word.

2. The method for hierarchically modeling contribution-aware context for long-distance dialog state tracking according to claim 1, wherein obtaining the corpus in step (1) comprises the following steps:

(101) for each dialogue turn, taking all utterances from the start of the dialogue up to the current dialogue turn as the dialogue history;

(102) segmenting the dialogue history obtained in step (101) by dialogue turn, where one system utterance and one user utterance together constitute a dialogue turn, and the system utterance in the first turn may be empty;

(103) counting all domain-slot pairs appearing in the training corpus and constructing a slot set, where each element takes the form 'domain-slot';

(104) normalizing the dialogue state annotations, and taking the slot values in the normalized dialogue state as the training labels of the corresponding slots.

3. The method for hierarchically modeling contribution-aware context for long-distance dialog state tracking according to claim 1, wherein in step (1) the hierarchical encoder comprises a lower-level dialogue turn encoder and a higher-level context encoder; the dialogue turn encoder consists of a bidirectional GRU that encodes each dialogue turn T_k in the context to obtain its dialogue turn vector representation th_k, and additionally computes the vector representation H_k of all words in the dialogue history using a residual connection; the specific calculation is as follows:

wh_{k,i} = w_{k,i} + h_{k,i}    (6)

where the forward and backward hidden states denote, respectively, the hidden states of the i-th word in dialogue turn T_k obtained by the forward and backward GRU encodings; |T_k| denotes the number of words in dialogue turn T_k; the forward GRU and backward GRU are those of the lower-level dialogue turn encoder; h_{k,i} denotes the hidden state vector of the i-th word in dialogue turn T_k obtained by the bidirectional GRU encoding; th_k denotes the vector representation of dialogue turn T_k produced by the lower-level dialogue turn encoder; wh_{k,i} denotes the vector representation, produced by the dialogue turn encoder, of the i-th word in the dialogue history up to dialogue turn T_k; |H_k| denotes the number of words in the dialogue history up to dialogue turn T_k; w_{k,i} denotes the word embedding of the i-th word in the dialogue history up to dialogue turn T_k;

the higher-level context encoder is composed of another bidirectional GRU; it takes as input the dialogue history C_k composed of the sequentially ordered dialogue turn vector representations, and then computes the contextualized dialogue turn vector representation Th_i using a residual connection; the specific calculation is as follows:

Th_i = th_i + th'_i    (10)

where C_k = {th_1, th_2, ..., th_k} contains all the dialogue turn representations, arranged in order, produced by the lower-level dialogue turn encoder, and k denotes the number of dialogue turns in the dialogue context; as above, the forward GRU and backward GRU are those of the higher-level context encoder; the forward and backward hidden states denote, respectively, the hidden states of the i-th dialogue turn in the dialogue context obtained by the forward and backward GRU encodings; th'_i denotes the hidden state vector of the i-th dialogue turn in the dialogue context obtained by the bidirectional GRU encoding; Th_i denotes the vector representation of the i-th dialogue turn in the dialogue context, containing contextual information, obtained from the context encoder.

4. The method for hierarchically modeling contribution-aware context for long-distance dialog state tracking according to claim 1, wherein in step (2) the contribution-aware context modeling module uses an attention mechanism to score each dialogue turn T_i in the dialogue context with respect to each slot s_j, obtaining score_{i,j}, which measures the contribution each dialogue turn can make to the current slot value prediction; the slot-specific contribution-aware dialogue context representation is then computed as a weighted sum of all the dialogue turn representations according to these contributions;

the specific calculation formula is as follows:

score_{i,j} = s_j · Th_i    (11)

where score_{i,j} denotes the attention score computed between slot s_j and dialogue turn T_i, measuring the contribution dialogue turn T_i can make to the generation of the value of slot s_j; s_j denotes the word embedding vector of the j-th slot; Th_i denotes the vector representation of dialogue turn T_i obtained in step (1); w_{i,j} denotes the result of normalizing the attention score score_{i,j} with a softmax operation; sc_j denotes the slot-specific contribution-aware dialogue context vector representation for slot s_j.

5. The method for hierarchically modeling contribution-aware context for long-distance dialog state tracking according to claim 1, wherein in step (3) the slot value generation module based on the contribution-aware context takes the slot embedding s_j as the initial input and the slot-specific contribution-aware context representation sc_j obtained in step (2) as the initial hidden state, and at each time step i performs the decoding process as follows:

first, the decoder takes the embedding dw_{i-1,j} of the word predicted at the previous step as the input of the current time step and obtains the decoder state dh_{i,j}:

dh_{i,j} = GRU(dw_{i-1,j}, dh_{i-1,j})    (14)

where dh_{i,j} denotes the decoder hidden state obtained at decoding step i for slot s_j; dw_{i-1,j} denotes the embedding of the word predicted at decoding step i-1 for slot s_j;

then, using the decoder state dh_{i,j}, the probability distributions over the open vocabulary and over the dialogue history are computed separately:

where P^{vocab}_{i,j} denotes the probability distribution over the vocabulary computed at decoding step i for slot s_j; P^{history}_{i,j} denotes the probability distribution over the dialogue history computed at decoding step i for slot s_j; softmax denotes the normalized exponential function; E denotes the word embedding matrix of the vocabulary; H_t denotes the word vector matrix of the dialogue history obtained by the lower-level dialogue turn encoder in step (1); |V| denotes the number of words in the vocabulary; |H_t| denotes the number of words in the dialogue history up to dialogue turn T_t;

finally, the two probability distributions are combined by a weighted sum to obtain the final probability distribution P^{final}_{i,j} over the whole vocabulary, and the word with the highest probability is selected as the result of the current decoding step:

where P^{final}_{i,j} denotes the final probability distribution obtained by the weighted combination of the vocabulary distribution and the dialogue history distribution; sigmoid denotes the activation function; W is a learnable parameter; wd_{i,j} denotes the dialogue context vector obtained at decoding step i for slot s_j;

the loss function in the above decoding process is represented as follows:

where J denotes the number of domain-slot pairs in the dataset; |Y_j| denotes the number of words in the annotated slot value corresponding to slot j for the current dialogue turn; y_{i,j} denotes the one-hot encoding of the i-th word in the annotated slot value corresponding to slot j.

Technical Field

The invention relates to the technical field of natural language processing and task-oriented dialogue systems, and in particular to a method for hierarchically modeling contribution-aware context for long-distance dialog state tracking.

Background

In recent years, task-oriented dialogue systems have attracted much attention in both industry and academia and have been widely used to help users complete tasks such as restaurant reservation and attraction querying through spoken interaction. A traditional task-oriented dialogue system has a pipeline structure composed of four components: natural language understanding, dialogue state tracking, dialogue policy learning, and natural language generation [1].

The goal of dialogue state tracking is to track the user's goals and intentions at each turn of the dialogue and represent them as a dialogue state, i.e., a set of slots and their corresponding values [2]. Because the dialogue policy learning and natural language generation modules rely on the dialogue state tracking result to select the next system action and generate the next system reply, accurate dialogue state prediction is key to improving the overall performance of the dialogue system [3].

To address the challenges in the dialogue state tracking task, many methods have been proposed in recent years; they can be broadly divided into two categories: approaches based on a predefined ontology and approaches based on an open vocabulary.

(1) Approaches based on a predefined ontology assume that all values each slot may take are defined in advance in a candidate value set, so dialogue state prediction is in effect, for each slot, a multi-class classification over all elements of that candidate set. In practice, however, the number of candidate values may be large or even change dynamically, so such a candidate set is often difficult to predefine.

(2) Open-vocabulary (generative) approaches drop the predefined-ontology assumption: given only the target slot, the slot value is generated directly from the dialogue context.

However, as the dialogue progresses the dialogue context accumulates: important information far from the current dialogue turn is easily lost when modeling the dialogue context, causing dialogue state prediction to fail, and an excess of contextual information can make it difficult for the dialogue state tracker to focus on the critical information.

Disclosure of Invention

The invention aims to overcome the above shortcomings of the prior art and provides a method for hierarchically modeling contribution-aware context for long-distance dialog state tracking.

The purpose of the invention is achieved by the following technical solution: a method for hierarchically modeling contribution-aware context for long-distance dialog state tracking, comprising the following steps:

(1) constructing a dialogue turn modeling module:

the dialogue turn modeling module encodes the dialogue contexts in the training corpus using a hierarchical structure (comprising a dialogue turn encoder and a context encoder) to obtain dialogue turn representations containing contextual information; this module encodes the dialogue context X_t = {T_1, T_2, ..., T_k, ..., T_t}, where t denotes the number of dialogue turns in the dialogue context and T_k = {S_k, U_k} denotes that dialogue turn T_k comprises a system utterance S_k and a user utterance U_k; the system utterance of dialogue turn T_k contains N_{s_k} words and the user utterance of dialogue turn T_k contains N_{u_k} words;

(2) constructing a contribution-aware context modeling module:

the contribution-aware context modeling module uses an attention mechanism to score each dialogue turn T_i with respect to each slot s_j, obtaining score_{i,j}, which measures the contribution each dialogue turn can make to the current slot value prediction; a slot-specific contribution-aware context representation sc_j is then modeled according to these contributions;

(3) constructing a slot value generation module based on the contribution-aware context:

the slot value generation module based on the contribution-aware context utilizes a copy-augmented decoder; this module takes the embedding of each slot s_j as the initial input and the slot-specific contribution-aware context representation sc_j as the initial hidden state; at every decoding step it selects a word either from the dialogue history or from the vocabulary as the output of that step, thereby generating the slot value v_j word by word.

Further, obtaining the corpus in step (1) comprises the following steps:

(101) for each dialogue turn, taking all utterances from the start of the dialogue up to the current dialogue turn as the dialogue history;

(102) segmenting the dialogue history obtained in step (101) by dialogue turn, where one system utterance and one user utterance together constitute a dialogue turn (note: the system utterance in the first turn may be empty);

(103) counting all domain-slot pairs appearing in the training corpus and constructing a slot set, where each element takes the form 'domain-slot';

(104) normalizing the dialogue state annotations, e.g., correcting annotation and spelling errors and unifying annotations that have the same meaning but different wording, and taking the slot values in the normalized dialogue state as the training labels of the corresponding slots.

Further, in step (1), the hierarchical encoder comprises a lower-level dialogue turn encoder and a higher-level context encoder. The dialogue turn encoder consists of a bidirectional GRU that encodes each dialogue turn T_k in the context to obtain its dialogue turn vector representation th_k, and additionally computes the vector representation H_k of all words in the dialogue history using a residual connection; the specific calculation is as follows:

wh_{k,i} = w_{k,i} + h_{k,i}    (6)

where the forward and backward hidden states denote, respectively, the hidden states of the i-th word in dialogue turn T_k obtained by the forward and backward GRU encodings; |T_k| denotes the number of words in dialogue turn T_k; the forward GRU and backward GRU are those of the lower-level dialogue turn encoder; h_{k,i} denotes the hidden state vector of the i-th word in dialogue turn T_k obtained by the bidirectional GRU encoding; th_k denotes the vector representation of dialogue turn T_k produced by the lower-level dialogue turn encoder; wh_{k,i} denotes the vector representation, produced by the dialogue turn encoder, of the i-th word in the dialogue history up to dialogue turn T_k; |H_k| denotes the number of words in the dialogue history up to dialogue turn T_k; w_{k,i} denotes the word embedding of the i-th word in the dialogue history up to dialogue turn T_k;

the higher-level context encoder is composed of another bidirectional GRU; it takes as input the dialogue history C_k composed of the sequentially ordered dialogue turn vector representations, and then computes the contextualized dialogue turn vector representation Th_i using a residual connection; the specific calculation is as follows:

Th_i = th_i + th'_i    (10)

where C_k = {th_1, th_2, ..., th_k} contains all the dialogue turn representations, arranged in order, produced by the lower-level dialogue turn encoder, and k denotes the number of dialogue turns in the dialogue context; as above, the forward GRU and backward GRU are those of the higher-level context encoder; the forward and backward hidden states denote, respectively, the hidden states of the i-th dialogue turn in the dialogue context obtained by the forward and backward GRU encodings; th'_i denotes the hidden state vector of the i-th dialogue turn in the dialogue context obtained by the bidirectional GRU encoding; Th_i denotes the vector representation of the i-th dialogue turn in the dialogue context, containing contextual information, obtained from the context encoder.
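
For illustration only, the two-level encoder described above can be sketched in PyTorch as follows. The class and variable names, the single-dialogue (unbatched) handling, and the choice of summing the forward and backward GRU outputs (and of taking th_k as the sum of the two final states) are assumptions of this sketch and are not fixed by the text; only the overall structure, a turn-level bidirectional GRU, a context-level bidirectional GRU, and the residual connections wh_{k,i} = w_{k,i} + h_{k,i} and Th_i = th_i + th'_i, follows the description above.

```python
import torch
import torch.nn as nn

class HierarchicalTurnEncoder(nn.Module):
    """Sketch of the two-level (turn + context) encoder described above.

    Assumptions: the outputs of the two GRU directions are summed to keep the
    size equal to `hidden`, and a turn vector th_k is the sum of the final
    forward and backward states. These details are illustrative only.
    """
    def __init__(self, embed_dim: int = 400, hidden: int = 400):
        super().__init__()
        assert embed_dim == hidden, "residual w + h requires equal dimensions"
        self.turn_gru = nn.GRU(embed_dim, hidden, bidirectional=True, batch_first=True)
        self.ctx_gru = nn.GRU(hidden, hidden, bidirectional=True, batch_first=True)

    def forward(self, turns):
        """turns: list of tensors, each (n_words_k, embed_dim) for one dialogue turn."""
        word_vecs, turn_vecs = [], []
        for w_k in turns:                                   # encode each turn separately
            out, last = self.turn_gru(w_k.unsqueeze(0))     # out: (1, n_k, 2*hidden)
            h_k = out[0, :, :out.size(-1) // 2] + out[0, :, out.size(-1) // 2:]
            word_vecs.append(w_k + h_k)                     # wh_{k,i} = w_{k,i} + h_{k,i}
            turn_vecs.append(last[0, 0] + last[1, 0])       # th_k (assumed: sum of directions)
        th = torch.stack(turn_vecs).unsqueeze(0)            # C_k: (1, k, hidden)
        ctx_out, _ = self.ctx_gru(th)
        th_prime = ctx_out[0, :, :ctx_out.size(-1) // 2] + ctx_out[0, :, ctx_out.size(-1) // 2:]
        Th = th[0] + th_prime                               # Th_i = th_i + th'_i
        H = torch.cat(word_vecs, dim=0)                     # word-level matrix over the history
        return Th, H
```

In this sketch the returned Th plays the role of the contextualized turn representations Th_i used by the attention module below, and H plays the role of the word-level matrix H_t used later by the copy mechanism.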

Further, in step (2), the contribution-aware context modeling module uses an attention mechanism to score each dialogue turn T_i in the dialogue context with respect to each slot s_j, obtaining score_{i,j}, which measures the contribution each dialogue turn can make to the current slot value prediction; the slot-specific contribution-aware dialogue context representation is then computed as a weighted sum of all the dialogue turn representations according to these contributions; the specific calculation is as follows:

score_{i,j} = s_j · Th_i    (11)

where score_{i,j} denotes the attention score computed between slot s_j and dialogue turn T_i, measuring the contribution dialogue turn T_i can make to the generation of the value of slot s_j; s_j denotes the word embedding vector of the j-th slot; Th_i denotes the vector representation of dialogue turn T_i obtained in step (1); w_{i,j} denotes the result of normalizing the attention score score_{i,j} with a softmax operation; sc_j denotes the slot-specific contribution-aware dialogue context vector representation for slot s_j.
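
The scoring and pooling just described amount to a dot-product attention from the slot embedding over the turn representations. A minimal sketch follows, assuming equation (11) is a plain dot product and the weights w_{i,j} come from a softmax over the turns; the function name is illustrative.

```python
import torch

def contribution_aware_context(slot_emb: torch.Tensor, Th: torch.Tensor) -> torch.Tensor:
    """slot_emb: (d,) embedding s_j of one slot; Th: (k, d) contextualized turn vectors.

    Returns sc_j, the slot-specific contribution-aware context vector of shape (d,).
    """
    scores = Th @ slot_emb                         # score_{i,j} = s_j · Th_i, one per turn
    weights = torch.softmax(scores, dim=0)         # w_{i,j}: normalized contribution weights
    sc_j = (weights.unsqueeze(1) * Th).sum(dim=0)  # weighted sum of turn representations
    return sc_j
```

The resulting sc_j can then serve as the decoder's initial hidden state, as described in step (3).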

Further, in step (3), the slot value generation module based on the contribution-aware context takes the slot embedding s_j as the initial input and the slot-specific contribution-aware context representation sc_j obtained in step (2) as the initial hidden state, and at each time step i performs the decoding process as follows:

first, the decoder takes the embedding dw_{i-1,j} of the word predicted at the previous step as the input of the current time step and obtains the decoder state dh_{i,j}:

dh_{i,j} = GRU(dw_{i-1,j}, dh_{i-1,j})    (14)

where dh_{i,j} denotes the decoder hidden state obtained at decoding step i for slot s_j; dw_{i-1,j} denotes the embedding of the word predicted at decoding step i-1 for slot s_j.

Then, using the decoder state dh_{i,j}, the probability distributions over the open vocabulary and over the dialogue history are computed separately:

where P^{vocab}_{i,j} denotes the probability distribution over the vocabulary computed at decoding step i for slot s_j; P^{history}_{i,j} denotes the probability distribution over the dialogue history computed at decoding step i for slot s_j; softmax denotes the normalized exponential function; E denotes the word embedding matrix of the vocabulary; H_t denotes the word vector matrix of the dialogue history obtained by the lower-level dialogue turn encoder in step (1); |V| denotes the number of words in the vocabulary; |H_t| denotes the number of words in the dialogue history up to dialogue turn T_t.

Finally, the two probability distributions are combined by a weighted sum to obtain the final probability distribution P^{final}_{i,j} over the whole vocabulary, and the word with the highest probability is selected as the result of the current decoding step:

where P^{final}_{i,j} denotes the final probability distribution obtained by the weighted combination of the vocabulary distribution and the dialogue history distribution; sigmoid denotes the activation function; W is a learnable parameter; wd_{i,j} denotes the dialogue context vector obtained at decoding step i for slot s_j.
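
One decoding step of the copy-augmented generator might look like the sketch below. The source fixes the GRU recurrence of equation (14), a vocabulary distribution, a dialogue history distribution, and a sigmoid-gated combination; the way the history attention is scattered back onto vocabulary ids, the exact inputs fed to the gate, and all names are assumptions of this sketch, not the patented implementation.

```python
import torch
import torch.nn as nn

class CopyAugmentedStep(nn.Module):
    """One decoding step: vocabulary distribution + history (copy) distribution."""
    def __init__(self, embed_dim: int = 400, hidden: int = 400, vocab_size: int = 30000):
        super().__init__()
        assert embed_dim == hidden, "this sketch ties embedding and hidden sizes (both 400 in the text)"
        self.gru_cell = nn.GRUCell(embed_dim, hidden)
        self.embedding = nn.Embedding(vocab_size, embed_dim)       # rows play the role of E
        self.gate = nn.Linear(hidden + embed_dim + embed_dim, 1)   # assumed gate inputs

    def forward(self, dw_prev, dh_prev, H, history_ids):
        """dw_prev: (B, embed); dh_prev: (B, hidden); H: (B, L, embed) word matrix of the
        dialogue history; history_ids: (B, L) vocabulary ids of the history words."""
        dh = self.gru_cell(dw_prev, dh_prev)                                 # eq. (14)
        p_vocab = torch.softmax(dh @ self.embedding.weight.t(), dim=-1)      # over |V|
        attn = torch.softmax(torch.bmm(H, dh.unsqueeze(2)).squeeze(2), dim=-1)  # over |H_t|
        # scatter the history attention back onto vocabulary positions (copy distribution)
        p_copy = torch.zeros_like(p_vocab).scatter_add(1, history_ids, attn)
        wd = torch.bmm(attn.unsqueeze(1), H).squeeze(1)                      # context vector wd_{i,j}
        g = torch.sigmoid(self.gate(torch.cat([dh, dw_prev, wd], dim=-1)))   # mixing weight
        p_final = g * p_vocab + (1 - g) * p_copy                             # final distribution
        return p_final, dh
```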

The loss function in the above decoding process is represented as follows:

where J denotes the number of domain-slot pairs in the dataset; |Y_j| denotes the number of words in the annotated slot value corresponding to slot j for the current dialogue turn; y_{i,j} denotes the one-hot encoding of the i-th word in the annotated slot value corresponding to slot j.
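
The objective above sums, over all J domain-slot pairs and over the |Y_j| words of each annotated slot value, the cross-entropy between the final distribution and the one-hot target. A minimal sketch, assuming the per-step final distributions have already been collected:

```python
import torch

def slot_value_loss(p_finals, target_ids):
    """p_finals: list over slots j of tensors (|Y_j|, |V|), the final distribution at each
    decoding step; target_ids: list over slots j of tensors (|Y_j|,) with the gold word ids
    (equivalent to the one-hot targets). Returns the summed negative log-likelihood."""
    loss = torch.tensor(0.0)
    for p_j, y_j in zip(p_finals, target_ids):
        # -sum_i log P_final(y_{i,j}); cross-entropy with one-hot targets
        loss = loss - torch.log(p_j.gather(1, y_j.unsqueeze(1)).squeeze(1) + 1e-12).sum()
    return loss
```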

Advantageous effects:

1. The present invention addresses the loss of critical information appearing early in the dialogue when a single sequence model is used to model a long context. In the context modeling of step (1), a hierarchically structured encoder is introduced in place of the single sequence model used in previous models. When a single sequence model is used as the encoder, the whole dialogue context is concatenated and fed directly into the sequence model for encoding, and when the dialogue context is long, key information that appears early may be forgotten during encoding.

Therefore, the present invention uses a hierarchical encoder that splits the entire dialogue context into multiple sequences by dialogue turn, encodes the utterances of each dialogue turn in a lower-level encoder to obtain a vector representation for each turn, and encodes all the turn representations contained in the dialogue context in a higher-level encoder so that each turn representation also contains contextual information. By reducing the length of the sequence fed to each encoder, the information contained in the dialogue context is preserved as completely as possible during encoding.

2. The invention avoids interference from irrelevant dialogue turns in the model's predictions over a multi-turn long dialogue context. In step (2), an attention mechanism scores each dialogue turn in the dialogue context with respect to the current slot, measuring the contribution different dialogue turns can make to the current slot value prediction; the dialogue turns are then weighted and combined according to their contribution scores to obtain a slot-specific contribution-aware context representation, which helps the model focus on the dialogue turn information relevant to the current slot in a multi-turn long dialogue context and ignore irrelevant turn information.

Experiments show that these two improvements effectively improve the model's dialogue state prediction performance under long dialogue contexts.

Drawings

FIG. 1 is a general block diagram of the method of the present invention for modeling contribution-aware context hierarchically for long-distance dialog state tracking;

FIG. 2 is a block diagram of the lower-level dialogue turn encoder of the hierarchical encoder in the method for modeling contribution-aware context for long-distance dialog state tracking provided by the present invention;

FIG. 3 is a block diagram of the higher-level context encoder of the hierarchical encoder in the method for modeling contribution-aware context for long-distance dialog state tracking according to the present invention.

Detailed Description

The invention is described in further detail below with reference to the figures and specific examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

The implementation of the present invention is described below using the multi-domain dataset MultiWOZ 2.0 as an example. The overall framework of the method is shown in Figure 1. The whole system pipeline comprises three steps: dialogue turn modeling, contribution-aware context modeling, and slot value generation.

The method comprises the following specific steps:

(1) Dialogue turn modeling:

the present invention primarily utilizes the multiwoz2.0 dataset. The data set is a multi-domain data set containing 10438 sessions involving 7 domains of sight, hospital, police, hotel, restaurant, taxi and train, and since the hospital and police domains are only present in the training set, we only used the data of the remaining 5 domains during the experiment. The invention takes the dialogue data set as the original corpus and carries out the following processing:

(1) for each dialogue turn, taking all utterances from the start of the dialogue up to the current dialogue turn as the dialogue history;

(2) segmenting the dialogue history obtained in step (1) by dialogue turn, where one system utterance and one user utterance together constitute a dialogue turn (note: the system utterance in the first turn may be empty);

(3) counting all domain-slot pairs appearing in the original corpus and constructing a slot set, where each element takes the form 'domain-slot';

(4) normalizing the dialogue state annotations, e.g., correcting annotation and spelling errors and unifying annotations that have the same meaning but different wording, and taking the slot values in the normalized dialogue state as the training labels of the corresponding slots (a toy sketch of this preprocessing follows the list).
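
For illustration, steps (1)-(4) can be sketched as below. The field names of the raw dialogue records ('system', 'user', 'state') and the normalization rules are assumptions of this sketch; the actual MultiWOZ 2.0 JSON layout and the full normalization used by the invention differ.

```python
def build_examples(dialogue, slot_set):
    """dialogue: list of turns, each a dict with (possibly empty) 'system' and 'user'
    utterances and a 'state' dict mapping 'domain-slot' -> value (assumed layout)."""
    examples = []
    history = []                                    # turns accumulated so far
    for turn in dialogue:
        # steps (1)-(2): one system utterance + one user utterance form a turn;
        # the system utterance of the first turn may be empty
        history.append((turn.get("system", ""), turn["user"]))
        state = normalize_state(turn["state"])      # step (4): normalize annotations
        # step (3): every 'domain-slot' in slot_set gets a label; missing slots stay "none"
        labels = {slot: state.get(slot, "none") for slot in slot_set}
        examples.append({"history": list(history), "labels": labels})
    return examples

def normalize_state(state):
    """Toy normalization: lower-case values and merge a few spelling variants."""
    synonyms = {"centre": "center", "dont care": "dontcare"}
    return {slot: synonyms.get(v.strip().lower(), v.strip().lower())
            for slot, v in state.items()}
```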

Table 1 shows detailed statistics of the dataset. The dataset contains 71410 dialogue turns in total: 56668 turns for training, 7374 turns for validation, and 7368 turns for testing.

The longest dialogue context in the training set contains 879 words, the longest in the validation set contains 659 words, and the longest in the test set contains 615 words. The dialogue data in the dataset mainly fall in the context length range of 0-300 words.

TABLE 1 dialog data set statistics

Corpus        All      Training   Dev     Test
Total         71410    56668      7374    7368
Max Length    879      879        659     615
0-99          30797    24954      2936    2907
100-199       22956    18005      2469    2482
200-299       13330    10293      1535    1502
300-399       3616     2834       382     400
400+          711      582        52      77

Based on the above dataset, the present invention uses a dialogue turn modeling module (see Figs. 2 and 3) composed of two bidirectional GRUs, serving as the dialogue turn encoder and the context encoder respectively, to encode the dialogue turn representations Th_i containing dialogue context information and the word vector matrix H_k corresponding to the dialogue history:

wh_{k,i} = w_{k,i} + h_{k,i}

Th_i = th_i + th'_i

where the forward and backward hidden states denote, respectively, the hidden states of the i-th word in dialogue turn T_k obtained by the forward and backward GRU encodings in the lower-level dialogue turn encoder; |T_k| denotes the number of words in dialogue turn T_k; h_{k,i} denotes the hidden state vector of the i-th word in dialogue turn T_k obtained by the bidirectional GRU encoding; th_k denotes the vector representation of dialogue turn T_k produced by the lower-level dialogue turn encoder; wh_{k,i} denotes the vector representation, produced by the dialogue turn encoder, of the i-th word in the dialogue history up to dialogue turn T_k; |H_k| denotes the number of words in the dialogue history up to dialogue turn T_k; w_{k,i} denotes the word embedding of the i-th word in the dialogue history up to dialogue turn T_k; C_k = {th_1, th_2, ..., th_k} contains all the dialogue turn representations, arranged in order, produced by the lower-level dialogue turn encoder, and k denotes the number of dialogue turns in the dialogue context;

as above, the forward GRU and backward GRU are those of the higher-level context encoder; the forward and backward hidden states denote, respectively, the hidden states of the i-th dialogue turn in the dialogue context obtained by the forward and backward GRU encodings in the higher-level context encoder; th'_i denotes the hidden state vector of the i-th dialogue turn in the dialogue context obtained by the bidirectional GRU encoding; Th_i denotes the vector representation of the i-th dialogue turn in the dialogue context, containing contextual information, obtained from the context encoder.

(2) Contribution-aware context modeling process:

and calculating a contribution degree score according to the current slot vector and all the dialogue wheel representations obtained in the previous step by using an attention mechanism, and taking the contribution degree score as a weight to perform weighted summation on the dialogue wheel vector representations in the context to obtain a contribution perception context vector representation:

score_{i,j} = s_j · Th_i

where score_{i,j} denotes the attention score computed between slot s_j and dialogue turn T_i, measuring the contribution dialogue turn T_i can make to the generation of the value of slot s_j; w_{i,j} denotes the result of normalizing the attention score score_{i,j} with a softmax operation; sc_j denotes the slot-specific contribution-aware dialogue context vector representation for slot s_j.

(3) Slot value generation:

The current slot vector and the slot-specific contribution-aware context vector representation obtained in the previous step are taken as the initial input and the initial hidden state, respectively, of the decoder used for slot value generation. At each time step, the probability distributions over the vocabulary and over the dialogue history are computed from the decoder hidden state, and the final vocabulary distribution is obtained:

dh_{i,j} = GRU(dw_{i-1,j}, dh_{i-1,j})

where dh_{i,j} denotes the decoder hidden state obtained at decoding step i for slot s_j; dw_{i-1,j} denotes the embedding of the word predicted at decoding step i-1 for slot s_j; P^{vocab}_{i,j} denotes the probability distribution over the vocabulary computed at decoding step i for slot s_j; P^{history}_{i,j} denotes the probability distribution over the dialogue history computed at decoding step i for slot s_j; softmax denotes the normalized exponential function; E denotes the word embedding matrix of the vocabulary; H_t denotes the word vector matrix of the dialogue history obtained by the lower-level dialogue turn encoder in step (1); |V| denotes the number of words in the vocabulary; |H_t| denotes the number of words in the dialogue history of dialogue turn t; P^{final}_{i,j} denotes the final probability distribution obtained by the weighted combination of the two distributions;

sigmoid denotes the activation function; W is a learnable parameter; wd_{i,j} denotes the dialogue context vector obtained at decoding step i for slot s_j.

Training of the model was performed using the following objective function:

where J denotes the number of domain-slot pairs in the dataset; |Y_j| denotes the number of words in the annotated slot value corresponding to slot j for the current dialogue turn; y_{i,j} denotes the one-hot encoding of the i-th word in the annotated slot value corresponding to slot j.

In a specific implementation, the method is implemented in PyTorch and trained on an Nvidia GPU. The parameters are set in advance: GloVe embeddings [4] and character-level embeddings [5] are concatenated into word embedding vectors of dimension 400, and the hidden size of the GRUs in the encoder and decoder is also set to 400.

The Adam [6] algorithm updates the parameters with an initial learning rate of 0.001, and an early-stopping strategy [7] with patience 6 is adopted during training: training ends after the model's joint accuracy has not improved for 6 consecutive epochs.
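
A sketch of this training configuration (Adam at learning rate 0.001 and early stopping with patience 6 on validation joint accuracy) is given below. The model, data loaders, and the joint-accuracy function are placeholders, and the loading of GloVe and character embeddings is omitted.

```python
import torch

def train(model, train_loader, dev_loader, compute_joint_accuracy, max_epochs=100):
    """Adam at lr=0.001; stop after 6 epochs without joint-accuracy improvement."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    best_acc, patience, bad_epochs = 0.0, 6, 0
    for epoch in range(max_epochs):
        model.train()
        for batch in train_loader:
            optimizer.zero_grad()
            loss = model(batch)            # assumed: the forward pass returns the loss
            loss.backward()
            optimizer.step()
        model.eval()
        with torch.no_grad():
            acc = compute_joint_accuracy(model, dev_loader)
        if acc > best_acc:                 # keep the best validation joint accuracy
            best_acc, bad_epochs = acc, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:     # early stopping with patience 6
                break
    return best_acc
```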

Table 2-1 shows the results of the proposed model (CACHE), the variant obtained by replacing the encoder-decoder framework in MLCSG with the inventive structure (CACHE+LM), and the other baseline models (TRADE, COMER, MLCSG) on the MultiWOZ 2.0 dataset with respect to two evaluation metrics (Slot Accuracy and Joint Accuracy).

Table 2-2 shows, for the proposed model (CACHE), its baseline model (TRADE), a recent encoder-decoder-based model (MLCSG), and the variant obtained by replacing the encoder-decoder in MLCSG with the inventive structure (CACHE+LM), the Joint Accuracy results and data statistics on subsets of the MultiWOZ 2.0 test set with different dialogue context length ranges.

TABLE 2-1 Overall results for the MultiWOZ2.0 test set

Model Slot Accuracy Joint Accuracy
TRADE 96.94% 48.53%
COMER - 48.79%
CACHE 96.99% 49.54%
MLCSG 97.18% 50.72%
CACHE+LM 97.15% 50.96%

TABLE 2-2 MultiWOZ 2.0 test set data statistics and model results for different dialogue context length ranges

The comparative experimental algorithms in the table are described below:

TRADE: a method for generating slot values from a dialog history or vocabulary using a copy-augmented encoder-decoder framework;

COMER: a method that sequentially generates the domains, slots, and values of the dialogue state using a hierarchical decoder;

MLCSG: a method that builds on the TRADE model with a multi-task learning framework using a language model as an auxiliary task;

CACHE+LM: the variant obtained by replacing the encoder-decoder structure in the MLCSG model with the structure of the present invention;

Note: the invention mainly improves the encoder part of the TRADE model, so the experiments focus on comparisons with the TRADE model. Many current methods improve model performance by adding different functional modules on top of an encoder-decoder framework, and the method provided by the invention can be directly ported into such methods by replacing their encoder-decoder structure (for example, porting it into the MLCSG model yields CACHE+LM); therefore extensive comparisons are not performed in the experiments, and CACHE+LM is taken only as an example for a brief comparison and explanation.

As can be seen from the experimental results in Table 2-1, all models achieve very good and very similar results in Slot Accuracy, since in each dialogue turn most slot values are empty and easy for the models to predict; we are therefore more interested in the models' Joint Accuracy. Compared with the TRADE model, CACHE obtains an absolute improvement of 1.01% in joint accuracy, which shows that the method provided by the invention, using a hierarchical encoder for dialogue turn modeling and constructing a contribution-aware context representation through an attention mechanism, is very helpful for the dialogue state tracking task. In addition, compared with MLCSG, CACHE+LM obtains an absolute improvement of 0.24% in joint accuracy, which shows that the proposed method remains effective when ported into other dialogue state tracking models based on an encoder-decoder structure.

As can be seen from the experimental results in Table 2-2, when the dialogue context length exceeds 100 words, both CACHE and CACHE+LM are greatly improved compared to TRADE and MLCSG, especially in the ranges of 200 words and above. This shows that the method provided by the present invention, using the hierarchically structured encoder, can effectively alleviate the loss of information appearing early in a long dialogue context by reducing the length of the sequence fed to each encoder; moreover, building a contribution-aware context representation helps the model focus on useful information in a lengthy dialogue context.

The present invention is not limited to the above-described embodiments. The foregoing description of the specific embodiments is intended to describe and illustrate the technical solutions of the present invention, and the above specific embodiments are merely illustrative and not restrictive. Those skilled in the art can make many changes and modifications to the invention without departing from the spirit and scope of the invention as defined in the appended claims.

Reference documents:

[1] Shan Y, Li Z, Zhang J, et al. A contextual hierarchical attention network with adaptive objective for dialogue state tracking[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020: 6322-6333.

[2] Zhu S, Li J, Chen L, et al. Efficient context and schema fusion networks for multi-domain dialogue state tracking[J]. arXiv preprint arXiv:2004.03386, 2020.

[3] Ye F, Manotumruksa J, Zhang Q, et al. Slot self-attentive dialogue state tracking[C]//Proceedings of the Web Conference 2021. 2021: 1598-1608.

[4] Pennington J, Socher R, Manning C D. GloVe: Global vectors for word representation[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2014: 1532-1543.

[5] Hashimoto K, Xiong C, Tsuruoka Y, et al. A joint many-task model: Growing a neural network for multiple NLP tasks[J]. arXiv preprint arXiv:1611.01587, 2016.

[6] Kingma D P, Ba J. Adam: A method for stochastic optimization[J]. arXiv preprint arXiv:1412.6980, 2014.

[7] Caruana R, Lawrence S, Giles L. Overfitting in neural nets: Backpropagation, conjugate gradient, and early stopping[J]. Advances in Neural Information Processing Systems, 2001: 402-408.
