Translation realignment recurrent neural network cross-language machine translation method

Document No.: 1556965  Publication date: 2020-01-21

Note: this technique, a recurrent neural network cross-language machine translation method with translation realignment, was designed and created by 苏依拉, 范婷婷 and 仁庆道尔吉 on 2019-10-15. Abstract: A cross-language machine translation method of a recurrent neural network with translation realignment is based on an encoder-decoder architecture. When the recurrent neural network of the encoder and the LSTM of the decoder are modeled, a variable context vector generated by a local attention method and a sequence guide vector generated by a sequence guide network are combined with a realignment method to give the translation sequence that best matches the semantics of the original text. The translation process of the invention takes the surrounding context into account and, together with the realignment method, yields target language text closer to the effect of human translation.

1. A recurrent neural network cross-language machine translation method for realigning translated text adopts an encoder-decoder framework based on a local attention mechanism, and is characterized in that an additional sequence guide network using the local attention mechanism is added to the framework; the encoder encodes a source language sentence and represents it as a context vector of fixed length, and the decoder produces a target language sentence from the context vector and the sequence guide vector given by the sequence guide network.

2. The method of claim 1, wherein the encoder comprises a Recurrent Neural Network (RNN) based on a local attention mechanism, the RNN comprising a hidden layer h and an output layer, the hidden layer encoding the input source language sequence into hidden states, where the source-side hidden feature h_j at each time j is computed as:

h_j = σ(W^(hh) h_{j-1} + W^(hx) x_j)

where x_j is the input word vector at time j, an element of the sequence x = {x_1, …, x_{j-1}, x_j, x_{j+1}, …, x_T}, the input source language sequence of length T; W^(hx) is the weight matrix constraining the input x_j; W^(hh) is the weight matrix constraining the hidden-layer output h_{j-1} of the previous time; h_{j-1} is the output of the nonlinear activation function at time j-1; and σ is the nonlinear activation function;

i.e., the hidden-layer output feature h_j at each time j is obtained from the output feature h_{j-1} of the previous time's hidden layer and the currently input word vector x_j.

3. The translation realignment recurrent neural network cross-language machine translation method according to claim 2, wherein the nonlinear activation function uses a sigmoid function.

4. The method of claim 2, wherein the local attention mechanism means that only one window of the source language sentence is attended to when generating each target word; the window is centered at the alignment position p_t with radius D, i.e., the window is [p_t - D, p_t + D], and D is selected empirically. The alignment position p_t is computed as:

p_t = S · sigmoid(v_p^T tanh(W_p h_t))

where S is the source language sentence length, v_p and W_p are model parameters, T denotes transposition, and h_t is the target-side hidden state; the computed p_t lies in the range [0, S];

A context vector required by the current target word is then generated from this window as a weighted average of all the source hidden states inside the window:

c_t = Σ_{s ∈ [p_t−D, p_t+D]} a_t(s) · h̄_s

where h̄_s denotes the source-side hidden states and a_t is a local alignment vector that places a normal distribution centered at p_t over the window, so that alignment points around p_t are included:

a_t(s) = align(h_t, h̄_s) · exp(−(s − p_t)² / (2σ²))

where p_t is a real number, s is an integer within the window centered at p_t, and σ is set to D/2.

Here score(h_t, h̄_s) is the content-based scoring function that measures the correspondence between the target-side hidden state h_t and the source-side hidden states h̄_s, normalized by a softmax to give align(h_t, h̄_s).

5. The translation realignment recurrent neural network cross-language machine translation method according to claim 2, wherein the sequence guide network is disposed at the decoder side.

6. The method of claim 2 or 5, wherein the sequence guide network is an LSTM which, at each time step t, generates the current guide vector g_t from the previous guide vector g_{t-1} and the current guide input signal z_t:

g_t = f(z_t; g_{t-1})

The guide input signal z_t at each time is computed by combining the previous attention vector with the attribute feature A:

z_t = W_zt [ã_{t-1}; A]

Finally, the decoder input d_t is generated from the current guide vector g_t:

d_t = W_ct c_t + W_gt g_t

where W_zt, W_ct and W_gt are all weight matrices and f is the recurrence function within the decoder LSTM unit.

7. The method of claim 6, wherein the decoder is an LSTM using local attention with an input-feeding approach: the alignment decision at each time step is combined with the alignment decision at the previous time, i.e., the attention vector ã_t at time t is combined with the input of the next time step t+1 and they enter the decoder together. The attention vector at each time t is computed as:

ã_t = σ(W_c [c_t; h_t])

where σ is the activation function and c_t is the context vector output by the encoder; the decoder updates its target hidden state h_t at every time step.

The decoder performs its computation by the following equations:

(i_t, f_t, o_t, c_t) = (σ, σ, σ, tanh)(W_z [x_t; h_{t-1}] + b_z)

i.e., the pre-activations of the three gates and the candidate memory cell share the parameter matrix W_z and bias b_z, with the sigmoid applied to the gates and tanh to the candidate memory cell;

m_t = f_t ⊙ m_{t-1} + i_t ⊙ c_t

h_t = o_t ⊙ tanh(m_t)

where x_t is the input at time t, m_t and h_t are respectively the memory cell and the hidden state, i_t, f_t, o_t and c_t are respectively the input gate, forget gate, output gate and candidate memory cell, and W_z and b_z are respectively the parameter matrix and the bias;

The attention vector ã_t is input into a softmax layer to give the prediction distribution:

p(y_t | y_{<t}, x) = softmax(W^(S) ã_t)

where y_t is a target language word and W^(S) is the weight matrix.

Technical Field

The invention belongs to the technical field of machine translation, and particularly relates to a recurrent neural network cross-language machine translation method for realigning translated text.

Background

With computers being used in ever more diverse ways in human life, researchers have turned their attention to natural language, where machine translation is an area of great research and practical value. Machine Translation (MT) studies how to use a computer to translate text or speech in one language into text or speech in another language, a task that is very natural for humans but not nearly as easy to compute numerically. With the steady progress of internationalization, research on machine translation has become imperative.

Early machine translation, i.e., phrase-based systems, could only translate phrases and words, yet fields with richer meaning are very important. Language models were subsequently built to address this.

A language model is used to calculate the likelihood that a series of words occurs in a particular order. Conventional language models are based on the Markov assumption that a word depends only on a limited number of words that occur before it, and therefore have an N-gram structure; in a trigram structure, for example, a word depends only on the two preceding words, with the probability expressed as:

P(w_i | w_1, …, w_{i-1}) ≈ P(w_i | w_{i-2}, w_{i-1})

On this basis, a translation system can propose several alternative word sequences, which it must evaluate: all choices are scored by the probability function to obtain a "score" (i.e., a probability) for each, and the highest-scoring one is the most likely translation sequence. An input-method engine is one familiar application of such a language model. In the field of machine translation, however, a limited number of preceding words or phrases is often not sufficient to describe the context, which depends on the entire sentence or paragraph. A method that can translate based on the complete context is therefore needed.
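For illustration only (not part of the claimed method), the Python sketch below shows how a trigram language model of this kind could score candidate word sequences; the toy corpus, the add-alpha smoothing and the vocabulary size are hypothetical choices, not values from this document.

import math
from collections import defaultdict

# Hypothetical toy corpus; a real system would train on large monolingual data.
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
]

bigram_counts = defaultdict(int)
trigram_counts = defaultdict(int)
for sent in corpus:
    padded = ["<s>", "<s>"] + sent + ["</s>"]
    for i in range(2, len(padded)):
        bigram_counts[(padded[i - 2], padded[i - 1])] += 1
        trigram_counts[(padded[i - 2], padded[i - 1], padded[i])] += 1

def trigram_logprob(sentence, alpha=1.0, vocab_size=1000):
    """log P(w_1..w_n) under the Markov assumption P(w_i | w_{i-2}, w_{i-1}),
    with add-alpha smoothing so unseen trigrams do not get zero probability."""
    padded = ["<s>", "<s>"] + sentence + ["</s>"]
    logp = 0.0
    for i in range(2, len(padded)):
        tri = trigram_counts[(padded[i - 2], padded[i - 1], padded[i])]
        bi = bigram_counts[(padded[i - 2], padded[i - 1])]
        logp += math.log((tri + alpha) / (bi + alpha * vocab_size))
    return logp

# Score two candidate word sequences and keep the higher-scoring one.
candidates = ["the cat sat on the mat".split(), "cat the mat sat on".split()]
best = max(candidates, key=trigram_logprob)
print(best)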

Disclosure of Invention

In order to overcome the defects of the prior art, the invention aims to provide a recurrent neural network cross-language machine translation method for realigning translated text.

To achieve this purpose, the invention adopts the following technical scheme:

A recurrent neural network cross-language machine translation method for realigning translated text adopts an encoder-decoder framework based on a local attention mechanism, characterized in that an additional sequence guide network using the local attention mechanism is added to the framework; the encoder encodes a source language sentence and represents it as a context vector of fixed length, and the decoder produces a target language sentence from the context vector and the sequence guide vector given by the sequence guide network.

The encoder consists of a Recurrent Neural Network (RNN) based on a local attention mechanism; the RNN comprises a hidden layer h and an output layer, the hidden layer encodes the input source language sequence into hidden states, and the source-side hidden feature h_j at each time j is computed as:

h_j = σ(W^(hh) h_{j-1} + W^(hx) x_j)

where x_j is the input word vector at time j, an element of the sequence x = {x_1, …, x_{j-1}, x_j, x_{j+1}, …, x_T}, the input source language sequence of length T; W^(hx) is the weight matrix constraining the input x_j; W^(hh) is the weight matrix constraining the hidden-layer output h_{j-1} of the previous time; h_{j-1} is the output of the nonlinear activation function at time j-1; and σ is the nonlinear activation function;

i.e., the hidden-layer output feature h_j at each time j is obtained from the output feature h_{j-1} of the previous time's hidden layer and the currently input word vector x_j.

The nonlinear activation function uses a sigmoid function.
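For illustration only (not part of the claims), the following minimal NumPy sketch shows this encoder recurrence; the dimensions, random initialization and toy input are assumptions made for the example.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

hidden_dim, embed_dim, T = 8, 4, 5           # illustrative sizes only
rng = np.random.default_rng(0)
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))  # constrains h_{j-1}
W_hx = rng.normal(scale=0.1, size=(hidden_dim, embed_dim))   # constrains x_j
x = rng.normal(size=(T, embed_dim))          # input word-vector sequence x_1..x_T

h = np.zeros(hidden_dim)                     # initial hidden state h_0
source_states = []                           # source-side hidden states h_1..h_T
for j in range(T):
    # h_j = sigma(W^(hh) h_{j-1} + W^(hx) x_j)
    h = sigmoid(W_hh @ h + W_hx @ x[j])
    source_states.append(h)
source_states = np.stack(source_states)      # used later by the local attention window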

The local attention mechanism means that only one window of the source language sentence is attended to when each target word is generated; the window is centered at the alignment position p_t with radius D, i.e., the window is [p_t - D, p_t + D], and D is selected empirically. The alignment position p_t is computed as:

p_t = S · sigmoid(v_p^T tanh(W_p h_t))

where S is the source language sentence length, v_p and W_p are model parameters, T denotes transposition, and h_t is the target-side hidden state; the computed p_t lies in the range [0, S];

A context vector required by the current target word is then generated from this window as a weighted average of all the source hidden states inside the window:

c_t = Σ_{s ∈ [p_t−D, p_t+D]} a_t(s) · h̄_s

where h̄_s represents the source-side hidden states and a_t is a local alignment vector that places a normal distribution centered at p_t over the window, so that alignment points around p_t can be included; the local alignment vector a_t is computed as:

a_t(s) = align(h_t, h̄_s) · exp(−(s − p_t)² / (2σ²))

where p_t is a real number, s is an integer within the window centered at p_t, and σ is set to D/2.

Computing a_t requires p_t, after which backward gradient computation can be performed to learn v_p and W_p.

Here score(h_t, h̄_s) is the content-based scoring function that measures the correspondence between the target-side hidden state h_t and the source-side hidden states h̄_s, normalized by a softmax to give align(h_t, h̄_s).
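For illustration, a minimal NumPy sketch of this local attention step follows; it assumes a plain dot-product score function (the exact score formula of the invention is not reproduced here) and illustrative parameter shapes.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def local_attention_context(h_t, source_states, v_p, W_p, D=2):
    """Local attention: predict p_t, weight a 2D+1 window of source states
    with a Gaussian centred at p_t, and return the context vector c_t."""
    S = source_states.shape[0]                       # source sentence length
    p_t = S * sigmoid(v_p @ np.tanh(W_p @ h_t))      # alignment position in [0, S]
    sigma = D / 2.0
    lo, hi = max(0, int(p_t) - D), min(S, int(p_t) + D + 1)
    window = source_states[lo:hi]
    # score: plain dot product, chosen only for this illustration.
    scores = window @ h_t
    align = np.exp(scores - scores.max())
    align /= align.sum()                             # softmax over the window
    positions = np.arange(lo, hi)
    a_t = align * np.exp(-((positions - p_t) ** 2) / (2 * sigma ** 2))
    c_t = (a_t[:, None] * window).sum(axis=0)        # weighted average of window states
    return c_t, a_t, p_t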

the sequence guide network is arranged at the decoder end and is an LSTM based on a first-in-first-out basis at each time step tFront guide vector gt-1And the current pilot input signal ztGenerating a current guide vector gt

g_t = f(z_t; g_{t-1})

The guide input signal z_t at each time is computed by combining the previous attention vector with the attribute feature A:

z_t = W_zt [ã_{t-1}; A]

Finally, the decoder input d_t is generated from the current guide vector g_t:

d_t = W_ct c_t + W_gt g_t

where W_zt, W_ct and W_gt are all weight matrices and f is a recurrence function within the decoder LSTM unit.
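For illustration, the sketch below implements one step of such a sequence guide network; the concatenation used to form z_t from the previous attention vector and the attribute feature A, and the packed-gate LSTM layout, are assumptions of this example.

import numpy as np

def lstm_step(x, h_prev, m_prev, W, b):
    """One LSTM step; W packs the four gate matrices, b the biases."""
    z = W @ np.concatenate([x, h_prev]) + b
    n = h_prev.shape[0]
    i = 1.0 / (1.0 + np.exp(-z[:n]))          # input gate
    f = 1.0 / (1.0 + np.exp(-z[n:2 * n]))     # forget gate
    o = 1.0 / (1.0 + np.exp(-z[2 * n:3 * n])) # output gate
    c = np.tanh(z[3 * n:])                    # candidate memory cell
    m = f * m_prev + i * c
    return o * np.tanh(m), m

def guide_step(attn_prev, A, g_prev, m_prev, W_z, W_lstm, b_lstm):
    """One step of the sequence guide network: z_t from the previous attention
    vector and attribute feature A, then g_t = f(z_t; g_{t-1}) via an LSTM cell."""
    z_t = W_z @ np.concatenate([attn_prev, A])   # assumed way of combining the inputs
    g_t, m_t = lstm_step(z_t, g_prev, m_prev, W_lstm, b_lstm)
    return g_t, m_t

def decoder_input(c_t, g_t, W_ct, W_gt):
    """d_t = W_ct c_t + W_gt g_t: decoder input built from context and guide vectors."""
    return W_ct @ c_t + W_gt @ g_t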

The decoder is an LSTM using local attention with an input-feeding approach: the alignment decision at each time step is combined with the alignment decision at the previous time, i.e., the attention vector ã_t at time t is combined with the input of the next time step t+1 and they enter the decoder together. The attention vector at each time t is computed as:

ã_t = σ(W_c [c_t; h_t])

where σ is the activation function and c_t is the context vector output by the encoder; the decoder updates its target hidden state h_t at every time step.

The decoder performs its computation by the following equations:

(i_t, f_t, o_t, c_t) = (σ, σ, σ, tanh)(W_z [x_t; h_{t-1}] + b_z)

i.e., the pre-activations of the three gates and the candidate memory cell share the parameter matrix W_z and bias b_z, with the sigmoid applied to the gates and tanh to the candidate memory cell;

m_t = f_t ⊙ m_{t-1} + i_t ⊙ c_t

h_t = o_t ⊙ tanh(m_t)

where x_t is the input at time t, m_t and h_t are respectively the memory cell and the hidden state, i_t, f_t, o_t and c_t are respectively the input gate, forget gate, output gate and candidate memory cell, and W_z and b_z are respectively the parameter matrix and the bias;

The attention vector ã_t is then input into a softmax layer, which outputs the prediction distribution; the calculation formula is:

p(y_t | y_{<t}, x) = softmax(W^(S) ã_t)

where y_t is a target language word and W^(S) is the weight matrix.
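For illustration, one decoder step with input feeding and the softmax prediction can be sketched as follows; tanh is used here for the attention-vector activation (the description above simply writes σ), and all parameter shapes and dictionary keys are assumptions of the example.

import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def decoder_step(d_t, attn_prev, h_prev, m_prev, params):
    """One decoder LSTM step with input feeding: the previous attention vector is
    concatenated with the current decoder input d_t before the gate computation."""
    x_t = np.concatenate([d_t, attn_prev])                  # input feeding
    z = params["W"] @ np.concatenate([x_t, h_prev]) + params["b"]
    n = h_prev.shape[0]
    i_t = 1.0 / (1.0 + np.exp(-z[:n]))          # input gate
    f_t = 1.0 / (1.0 + np.exp(-z[n:2 * n]))     # forget gate
    o_t = 1.0 / (1.0 + np.exp(-z[2 * n:3 * n])) # output gate
    cand = np.tanh(z[3 * n:])                   # candidate memory cell
    m_t = f_t * m_prev + i_t * cand             # m_t = f ⊙ m_{t-1} + i ⊙ c
    h_t = o_t * np.tanh(m_t)                    # h_t = o ⊙ tanh(m_t)
    return h_t, m_t

def predict_word(c_t, h_t, W_c, W_s):
    """Attention vector from [c_t; h_t], then a softmax over the target vocabulary."""
    attn_t = np.tanh(W_c @ np.concatenate([c_t, h_t]))   # attention vector ã_t
    return softmax(W_s @ attn_t), attn_t                  # p(y_t | ...), ã_t for input feeding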

Compared with the prior art, the invention has the following beneficial effects: the RNN is well suited to processing one-dimensional sequence data; applied to the field of machine translation, it gives translation results based on the complete context, so an RNN-based machine translation system can choose words that better fit the current context than a traditional translation model and obtain smoother, more accurate translation results.

Drawings

FIG. 1 is an overall architecture diagram of the present invention.

Fig. 2 is a diagram of the sequence-directed network architecture of the present invention.

Detailed Description

The following detailed description of embodiments of the invention refers to the accompanying drawings and accompanying examples.

The invention relates to a translation realignment recurrent neural network cross-language machine translation method based on an encoder-decoder framework with a local attention sequence guide network. During translation, a source language sentence is input into the encoder, which encodes it and converts it into a word vector sequence; the hidden layer of the recurrent neural network then computes the source-side hidden state sequence, and a context vector is calculated from the alignment vector, giving a context vector sequence. The decoder generates an attention vector from the dynamically generated context vector and, together with the sequence guide vector given by the sequence guide network, produces the target language sequence.

In the invention, the encoder can consist of a recurrent neural network (RNN for short) based on a local attention mechanism and is provided with a sequence guide network. Local attention means that when each target word is generated, attention is focused on only one window of the source language sentence. The invention first generates an alignment position p_t for each target word; the context vector is then generated as a weighted average over the window [p_t - D, p_t + D] of the set of source hidden states, where D is selected empirically, together with a local alignment vector a_t ∈ R^(2D+1). The alignment position p_t is computed as:

p_t = S · sigmoid(v_p^T tanh(W_p h_t))

where S is the source sentence length and v_p and W_p are model parameters.

The local alignment vector a_t is computed as:

a_t(s) = align(h_t, h̄_s) · exp(−(s − p_t)² / (2σ²))

Here score(h_t, h̄_s) is the content-based scoring function between the target-side hidden state h_t and the source-side hidden states h̄_s, normalized by a softmax to give align(h_t, h̄_s).

Each word x in the sequence is input into the encoder, and the hidden layer of the encoder's recurrent neural network is computed recurrently according to the formula above; after the last word of the sequence is read, the hidden layer yields a context vector c_t related to the whole sequence, and the encoder's work is complete.

The decoder of the invention is an LSTM using local attention with an input-feeding approach: the alignment decision at each time step is combined with the alignment decision at the previous time, i.e., the attention vector ã_t at time t is combined with the input of the next time step t+1 and they enter the decoder together. The attention vector at each time t is computed as:

ã_t = σ(W_c [c_t; h_t])

where σ is the activation function and c_t is the context vector output by the encoder; the decoder updates its target hidden state h_t at every time step.

The decoder performs its computation by the following equations:

(i_t, f_t, o_t, c_t) = (σ, σ, σ, tanh)(W_z [x_t; h_{t-1}] + b_z)

m_t = f_t ⊙ m_{t-1} + i_t ⊙ c_t

h_t = o_t ⊙ tanh(m_t)

the sequence guide network of the invention is based on the LSTM, and at each time step t, the current guide vector is generated based on the previous guide vector, the current attention vector and the attribute characteristics. The attribute characteristics are set as the order-adjusting rules.

The method first performs word segmentation and part-of-speech tagging on the input source language sentence and parses its syntax tree; the source language sentence is then reordered according to the target language word order so that its word order is as close as possible to that of the target language.
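For illustration only, the toy sketch below shows how one hand-written reordering rule over POS-tagged tokens could move a Chinese sentence toward Mongolian verb-final order; the rule, the tag set and the example sentence are hypothetical and do not reproduce the invention's actual reordering rules.

# A hypothetical reordering rule: Mongolian is verb-final, so move the verb
# after its object phrase in a POS-tagged Chinese sentence (toy example only).
def reorder_svo_to_sov(tagged):
    """tagged: list of (token, pos) pairs; returns tokens reordered so that
    the first verb is placed after the noun phrase that follows it."""
    out = list(tagged)
    for i, (_, pos) in enumerate(out):
        if pos == "V":
            j = i + 1
            while j < len(out) and out[j][1] in ("N", "ADJ"):
                j += 1
            out = out[:i] + out[i + 1:j] + [out[i]] + out[j:]
            break
    return [tok for tok, _ in out]

# "我 读 书" (I read book) -> "我 书 读", closer to Mongolian word order.
print(reorder_svo_to_sov([("我", "PRON"), ("读", "V"), ("书", "N")]))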

Taking Chinese-to-Mongolian translation as an example, the source language is Chinese and the target language is Mongolian; the input source language text vector is X = (x_1, x_2, …, x_n) and the target text vector is Y = (y_1, y_2, …, y_m). For convenience of processing, Mongolian is handled in Latin transcription. The specific steps are as follows:

1. The source language sequence X = (x_1, x_2, …, x_n) is input into the model, which processes it.

2. The encoder and the decoder respectively generate the hidden states and the first target word y_1.

3. The sequence guide network, following the pre-entered reordering rules (i.e., attribute feature A) and the input sequence x_t, generates the current sequence guide vector g_t (as shown in FIG. 1) and feeds it into the decoder.

4. The model generates the alignment position p_t between the target word and the input sequence, and based on this alignment position the encoder generates the context vector c_t.

5. The decoder generates the current corresponding target word y_t from the current input in the sequence, the previously generated word, the sequence guide vector and the context vector.

6. The decoder and encoder then repeat this process until the sequence is complete (a schematic decoding loop is sketched below).
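For illustration, the schematic decoding loop below ties steps 1-6 together using the sketch functions defined in the earlier examples (local_attention_context, guide_step, decoder_input, decoder_step, predict_word); the parameter dictionary keys, vector sizes and end-of-sequence handling are assumptions of this sketch, not details of the patent.

import numpy as np

def translate(source_states, params, A, max_len=50):
    """Greedy decoding loop over the sketched components (schematic only)."""
    h_t = np.zeros(params["hidden_dim"])
    m_t = np.zeros(params["hidden_dim"])
    g_t = np.zeros(params["guide_dim"])
    gm_t = np.zeros(params["guide_dim"])
    attn_t = np.zeros(params["attn_dim"])
    output = []
    for _ in range(max_len):
        # Step 3: guide vector from reordering rules (attribute feature A).
        g_t, gm_t = guide_step(attn_t, A, g_t, gm_t,
                               params["W_z"], params["W_guide"], params["b_guide"])
        # Step 4: alignment position and context vector from local attention.
        c_t, _, _ = local_attention_context(h_t, source_states,
                                            params["v_p"], params["W_p"])
        d_t = decoder_input(c_t, g_t, params["W_ct"], params["W_gt"])
        # Step 5: decoder updates its state and predicts the next target word.
        h_t, m_t = decoder_step(d_t, attn_t, h_t, m_t, params["decoder"])
        probs, attn_t = predict_word(c_t, h_t, params["W_c"], params["W_s"])
        y_t = int(np.argmax(probs))
        output.append(y_t)
        if y_t == params["eos_id"]:           # step 6: stop at end-of-sequence
            break
    return output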
