Distance parameter alignment translation method based on transformer

Document No.: 1556962 · Publication date: 2020-01-21

Reading note: this technology, a distance parameter alignment translation method based on the transformer, was designed and created by 闫明明, 陈绪浩, 李迅波, 罗华成, 赵宇 and 段世豪 on 2019-09-27. Its main content is as follows: the invention discloses a distance parameter alignment translation method based on the transformer, applied to the attention-based transformer framework model. The method comprises: during training, computing the word vectors of the two languages fed into the attention mechanism to obtain a tensor of relative distance parameters, and normalizing this distance tensor to obtain a new, standardized distance tensor. This tensor participates in computing the output alignment tensor of the attention function. In translation between a source language and a target language, the distance between word vectors of aligned sentences reflects their degree of difference, so introducing a distance parameter into the alignment computation effectively widens the alignment-probability gap between different words and makes alignment more effective. This neural translation method with a distance weighting mechanism effectively improves the alignment produced by the attention function and raises translation quality and scores. The algorithm can be applied to any model containing an attention mechanism, without modifying the model framework.

1. A distance parameter alignment translation method based on a transformer, applied to an attention-based transformer model, characterized in that the method comprises the following steps:

calculating the word-vector distance between the source-language and target-language input sentences during translation to obtain a distance tensor, and substituting this distance parameter into the subsequent computation;

introducing the distance tensor into the attention computation and subtracting a fraction of it from the attention's output alignment tensor to obtain a more effective output alignment tensor; this effectively improves alignment and raises the translation score.

2. The transformer-based distance parameter alignment translation method of claim 1, characterized in that the method is implemented in the following specific manner:

The first step: generating the temporal semantic vector

c_t = Σ_i α_ti · h_i

α_ti = exp(e_ti) / Σ_k exp(e_tk)

s_t = tanh(W[s_{t-1}, y_{t-1}])

e_ti = s_t · W_a · h_i

The second step: passing the hidden-layer information and making the prediction

s̃_t = tanh(W_c [c_t ; s_t])

y_t = softmax(W_s · s̃_t)

Taking the word-vector tensors Q and K of the source and target languages as the initial quantities, calculate the Euclidean distance between Q and K to obtain a distance tensor; normalize this distance tensor with a normalization function to obtain a new, standardized distance tensor, and substitute it into the output computation. The process is as follows:

Step 1: let the hidden-layer output vector be K_i, and perform the dot product QK^T to obtain S_i;

Step 2: perform softmax normalization to obtain the alignment weight A_i, computed as A_i = exp(S_i) / Σ_j exp(S_j);

Step 3: compute the difference between the target-language word vector z_j and the source-language word vector v_i, and normalize the resulting vector with a softmax function to obtain the distance tensor h_i;

Step 4: introduce the distance tensor to obtain the improved alignment weight A_i′, computed as A_i′ = A_i − 0.5·h_i;

Step 5: multiply A_i′ and V_i to obtain Attention(Q, K, V), computed as Attention(Q, K, V) = Σ_i A_i′ · V_i;

Step 6: repeat Steps 1-5 six times to obtain the final output matrix;

Step 7: finally, the output matrix participates in subsequent operations.

Technical Field

The invention relates to neural machine translation, in particular to a neural machine translation method with a distance weighting mechanism.

Background

Neural network machine translation is a machine translation method proposed in recent years. Compared with traditional statistical machine translation, it trains a neural network that maps one sequence to another, and its output can be a sequence of variable length, which yields better performance in translation, dialogue, and text summarization. Neural network machine translation is essentially an encoder-decoder system: the encoder encodes the source-language sequence and extracts its information, and the decoder converts that information into the target language, thereby completing the translation.

Since the first neural machine translation systems were proposed in 2013, and with the rapid growth of computing power, neural machine translation has developed quickly; the seq2seq model, the Transformer model, and others have been proposed in succession. In 2013, Nal Kalchbrenner and Phil Blunsom proposed a novel end-to-end encoder-decoder structure for machine translation. Their model uses a convolutional neural network (CNN) to encode a given piece of source text into a continuous vector, and then uses a recurrent neural network (RNN) as the decoder to convert that state vector into the target language. In 2017, Google released a new machine learning model, the Transformer, which performed far better than existing algorithms in machine translation and other language-understanding tasks. The framework is intended to handle a wide range of tasks, of which neural machine translation is only one.

The traditional technology has the following technical problems:

In the alignment process of the attention function, the existing framework first computes the similarity of the two input sentences' word vectors and then performs a series of calculations to obtain the alignment function. No relative distance enters this computation; the word-vector distance plays no part in the alignment function. For example, when aligning the source word meaning 'eat' with the target word 'eat', the word-vector distance is essentially 0, whereas the distance between 'eat' and 'distance' is very large. Introducing the word-vector distance widens this alignment gap, so that similar words correspond more strongly and dissimilar words align more weakly, making translation more effective.
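A toy illustration of this point (the three-dimensional vectors are invented for illustration; real word embeddings would come from the trained model):

```python
import numpy as np

# hypothetical embeddings: near-synonyms sit close together,
# unrelated words sit far apart
eat_src = np.array([0.9, 0.1, 0.0])        # source-language word for "eat"
eat_tgt = np.array([0.85, 0.15, 0.05])     # target word "eat"
distance_tgt = np.array([0.0, 0.2, 0.95])  # target word "distance"

print(np.linalg.norm(eat_src - eat_tgt))       # ~0.09: nearly aligned
print(np.linalg.norm(eat_src - distance_tgt))  # ~1.31: far apart
```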

Disclosure of Invention

Therefore, in order to remedy the above-mentioned shortcomings, the present invention provides a transformer-based distance parameter alignment translation method.

The invention is realized as follows: a distance parameter alignment translation method based on the transformer is constructed and applied to the attention-based transformer model, characterized in that the method comprises the following steps: calculating the word-vector distance between the source-language and target-language input sentences during translation to obtain a distance tensor, and substituting this distance parameter into the computation;

introducing the distance tensor into the attention computation and subtracting a fraction of it from the attention's output alignment tensor to obtain a more effective output alignment tensor; this effectively improves alignment and raises the translation score.

The transformer-based distance parameter alignment translation method is implemented in the following specific manner:

The first step: generating the temporal semantic vector

c_t = Σ_i α_ti · h_i

α_ti = exp(e_ti) / Σ_k exp(e_tk)

s_t = tanh(W[s_{t-1}, y_{t-1}])

e_ti = s_t · W_a · h_i

The second step: passing the hidden-layer information and making the prediction

s̃_t = tanh(W_c [c_t ; s_t])

y_t = softmax(W_s · s̃_t)

Taking the word-vector tensors Q and K of the source and target languages as the initial quantities, calculate the Euclidean distance between Q and K to obtain a distance tensor; normalize this distance tensor with a normalization function to obtain a new, standardized distance tensor, and substitute it into the output computation. The process is as follows:

Step 1: let the hidden-layer output vector be K_i, and perform the dot product QK^T to obtain S_i;

Step 2: perform softmax normalization to obtain the alignment weight A_i, computed as A_i = exp(S_i) / Σ_j exp(S_j);

Step 3: compute the difference between the target-language word vector z_j and the source-language word vector v_i, and normalize the resulting vector with a softmax function to obtain the distance tensor h_i;

Step 4: introduce the distance tensor to obtain the improved alignment weight A_i′, computed as A_i′ = A_i − 0.5·h_i;

Step 5: multiply A_i′ and V_i to obtain Attention(Q, K, V), computed as Attention(Q, K, V) = Σ_i A_i′ · V_i;

Step 6: repeat Steps 1-5 six times to obtain the final output matrix;

Step 7: finally, the output matrix participates in subsequent operations.

The invention has the following advantages. The distance parameter alignment translation method based on the transformer is applied to the attention-based transformer framework model and comprises: computing, during training, the word vectors of the two languages fed into the attention mechanism (different calculation modes yield different relative word-vector distances) to obtain a tensor of relative distance parameters; and normalizing this distance tensor to obtain a new, standardized distance tensor. This tensor participates in computing the output alignment tensor of the attention function. In translation between a source language and a target language, the distance between word vectors of aligned sentences reflects their degree of difference, so introducing a distance parameter into the alignment computation effectively widens the alignment-probability gap between different words and makes alignment more effective. This neural translation method with a distance weighting mechanism effectively improves the alignment produced by the attention function and raises translation quality and scores. The algorithm can be applied to any model containing an attention mechanism, without modifying the model framework.

Detailed Description

The present invention is described in detail below. The technical solutions in the embodiments of the present invention are described clearly and completely; obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.

Through improvement, the invention provides a distance parameter alignment translation method based on the transformer, applied to the attention-based transformer model and comprising the following steps:

calculating the word-vector distance between the source-language and target-language input sentences during translation to obtain a distance tensor, and substituting this distance parameter into the computation;

introducing the distance tensor into the attention computation and subtracting a fraction of it from the attention's output alignment tensor to obtain a more effective output alignment tensor; this effectively improves alignment and raises the translation score.

Transformer framework introduction:

The Encoder consists of 6 identical layers, each containing two sub-layers: the first is a multi-head attention layer, followed by a simple fully connected layer. Each sub-layer is wrapped with a residual connection and layer normalization.

The Decoder likewise consists of 6 identical layers, but each differs from an Encoder layer in containing three sub-layers: a self-attention layer, then an encoder-decoder attention layer, and finally a fully connected layer. The first two sub-layers are based on multi-head attention. One particular point is masking, which prevents future output words from being used during training.
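For reference, the baseline stack described above matches PyTorch's built-in module; a sketch of the unmodified framework, with the usual Transformer-base hyperparameters (an assumption, since the document does not state them):

```python
import torch
import torch.nn as nn

# 6 encoder layers and 6 decoder layers; each layer uses multi-head
# attention plus a fully connected sub-layer, with residual connections
# and layer normalization, as described above
model = nn.Transformer(
    d_model=512,            # embedding width
    nhead=8,                # attention heads
    num_encoder_layers=6,
    num_decoder_layers=6,
    dim_feedforward=2048,   # inner width of the fully connected sub-layer
)

src = torch.rand(10, 32, 512)  # (source length, batch, d_model)
tgt = torch.rand(9, 32, 512)   # (target length, batch, d_model)
# the mask keeps position t from attending to future output words
tgt_mask = model.generate_square_subsequent_mask(tgt.size(0))
out = model(src, tgt, tgt_mask=tgt_mask)
print(out.shape)  # torch.Size([9, 32, 512])
```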

Attention model:

The original encoder-decoder model is classical but has a serious limitation: the only link between encoding and decoding is a fixed-length semantic vector C. That is, the encoder compresses the information of the entire sequence into a single fixed-length vector. This has two disadvantages: the semantic vector cannot fully represent the information of the whole sequence, and the information carried by early inputs is diluted by later inputs. The longer the input sequence, the more severe this becomes, so decoding starts with insufficient information about the input sequence, which compromises accuracy.

To solve these problems, the attention model was proposed. When generating each output, the model produces an attention range indicating which parts of the input sequence to focus on while producing the next output, generates that output from the attended region, and repeats the process. Attention resembles certain human reading behavior: a person usually concentrates only on the informative words rather than on all words, i.e. the attention weight given to each word differs. The attention model increases training difficulty but improves the quality of the generated text.

The first step: generating the temporal semantic vector

c_t = Σ_i α_ti · h_i

α_ti = exp(e_ti) / Σ_k exp(e_tk)

s_t = tanh(W[s_{t-1}, y_{t-1}])

e_ti = s_t · W_a · h_i

The second step: passing the hidden-layer information and making the prediction

s̃_t = tanh(W_c [c_t ; s_t])

y_t = softmax(W_s · s̃_t)

The improvement here is a modification in the attention function.

Taking the word-vector tensors Q and K of the source and target languages as the initial quantities, calculate the Euclidean distance between Q and K to obtain a distance tensor; normalize this distance tensor with a normalization function to obtain a new, standardized distance tensor, and substitute it into the output computation. The process is as follows:

Step 1: let the hidden-layer output vector be K_i, and perform the dot product QK^T to obtain S_i;

Step 2: perform softmax normalization to obtain the alignment weight A_i, computed as A_i = exp(S_i) / Σ_j exp(S_j);

Step 3: compute the difference between the target-language word vector z_j and the source-language word vector v_i, and normalize the resulting vector with a softmax function to obtain the distance tensor h_i;

Step 4: introduce the distance tensor to obtain the improved alignment weight A_i′, computed as A_i′ = A_i − 0.5·h_i;

Step 5: multiply A_i′ and V_i to obtain Attention(Q, K, V), computed as Attention(Q, K, V) = Σ_i A_i′ · V_i;

Step 6: repeat Steps 1-5 six times to obtain the final output matrix;

Step 7: finally, the output matrix participates in subsequent operations.


The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
