Neural Machine Translation Using Latent Tree Attention

Document No.: 1745753    Publication date: 2019-11-26

Note: This technology, neural machine translation using latent tree attention, was created by J. Bradbury on 2018-04-11. Its main content is as follows: We introduce an attentional neural machine translation model for the task of machine translation that addresses a long-standing goal of natural language processing: exploiting the hierarchical structure of language without prior annotation. The model comprises a recurrent neural network grammar (RNNG) encoder with a novel attentional RNNG decoder, and applies policy gradient reinforcement learning to induce unsupervised tree structures over both the source and target sequences. When trained on character-level datasets with no explicit segmentation or parse annotations, the model learns plausible segmentations and shallow parses, achieving performance close to an attentional baseline.

1. An attentional neural machine translation system for translating a source sequence in a first language into a target sequence in a second language, the system comprising:

an encoder for encoding tokens of the source sequence and phrase tree structures of the source sequence, wherein at least one of the phrase tree structures of the source sequence comprises:

an encoder tree node that represents an encoder state when a phrase type of the phrase tree structure of the source sequence was predicted, and

an encoder compositional embedding that represents a constituent of the phrase tree structure of the source sequence; and

an attention-based decoder for outputting tokens of the target sequence and phrase tree structures of the target sequence, wherein the decoder embedding for each predicted phrase type of the phrase tree structures of the target sequence is a convex combination of the encoder compositional embeddings scaled by attention weights.
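The convex combination in claim 1 is ordinary attention applied over encoder compositional embeddings rather than over token states. The following is a minimal sketch, assuming PyTorch and using the inner-product comparison that claim 10 later enumerates as one option; the function and tensor names are illustrative, not prescribed by the claims.

    import torch
    import torch.nn.functional as F

    def phrase_type_embedding(decoder_state, encoder_states, encoder_embeddings):
        # decoder_state:      (d,)    decoder tree node for the phrase type being predicted
        # encoder_states:     (n, d)  encoder tree nodes recorded while encoding the source
        # encoder_embeddings: (n, d)  encoder compositional embeddings of source constituents
        scores = encoder_states @ decoder_state   # compare decoder state with each encoder state
        weights = F.softmax(scores, dim=0)        # attention weights: non-negative, sum to one
        return weights @ encoder_embeddings       # convex combination of compositional embeddings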

2. The system of claim 1, wherein the encoder and the attention-based decoder are long short-term memory (LSTM) networks.

3. The system of claim 1, wherein the encoder and the attention-based decoder each comprise a bidirectional LSTM (Bi-LSTM) that computes the encoder compositional embeddings and decoder compositional embeddings.

4. The system of claim 1, wherein the encoder and the attention-based decoder are stack-only recurrent neural network grammar (s-RNNG) networks.

5. The system of any one of claims 1-4, further configured to use a final encoder compositional embedding of the source sequence as the decoder embedding for the phrase type initially predicted for the target sequence.

6. The system of any one of claims 1-4, further configured to use policy gradient reinforcement learning to induce unsupervised phrase tree structures of both the source sequence and the target sequence.

7. The system of any one of claims 1-6, wherein the phrase tree structures are constituency parse tree structures.

8. The system of any one of claims 1-6, wherein the phrase tree structures are dependency parse tree structures.

9. The system of any one of claims 1-8, further comprising:

a comparator for comparing a decoder tree node representing a current decoder state with encoder tree nodes representing different encoder states during encoding;

a normalizer for exponentially normalizing results of the comparison; and

a combiner for computing, using the exponentially normalized results as the attention weights, a weighted sum of the encoder compositional embeddings corresponding to the encoder tree nodes.

10. The system of claim 9, wherein the comparison is performed using at least one of an inner product, a bilinear function, and a single-layer neural network.
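Claims 9 and 10 describe a comparator, an exponential normalizer (a softmax), and a combiner. A hedged sketch of the three comparison options named in claim 10 follows; the hidden size, parameter shapes, and function names are assumptions made only for illustration.

    import torch
    import torch.nn as nn

    d = 256                                       # hidden size; an assumption, not from the claims
    bilinear_W = nn.Parameter(torch.randn(d, d) * 0.01)
    proj = nn.Linear(2 * d, d)
    v = nn.Parameter(torch.randn(d))

    def inner_product_score(dec, enc):
        return dec @ enc                          # inner product comparison

    def bilinear_score(dec, enc):
        return dec @ bilinear_W @ enc             # bilinear function comparison

    def single_layer_score(dec, enc):
        # single-layer neural network comparison over the concatenated states
        return v @ torch.tanh(proj(torch.cat([dec, enc], dim=-1)))

    def attention_weights(decoder_node, encoder_nodes, score=inner_product_score):
        scores = torch.stack([score(decoder_node, e) for e in encoder_nodes])
        return torch.softmax(scores, dim=0)       # exponential normalization of the comparisons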

11. The system of claim 10, wherein the comparison measures syntactic structural similarity between the phrase tree structures of the source sequence and the phrase tree structures of the target sequence.

12. The system of claim 11, wherein a comparison between the most syntactically similar phrase tree structures of the source sequence and the target sequence produces the highest attention weight.

13. The system of any one of claims 1-8, wherein the tokens are character-based tokens.

14. The system of claim 13, wherein the character-based tokens are densely encoded using real-valued vectors.

15. The system of claim 13, wherein the character-based tokens are sparsely encoded using one-hot vectors.
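Claims 14 and 15 cover the two usual ways of representing character-based tokens: learned dense vectors versus one-hot vectors. A small sketch, with a byte-level vocabulary size chosen only for illustration:

    import torch
    import torch.nn as nn

    vocab_size, embed_dim = 256, 64               # illustrative vocabulary size and embedding width

    dense_embedding = nn.Embedding(vocab_size, embed_dim)   # dense, real-valued encoding (claim 14)

    def one_hot(token_id):                        # sparse, one-hot encoding (claim 15)
        vec = torch.zeros(vocab_size)
        vec[token_id] = 1.0
        return vec

    ids = torch.tensor([ord(c) for c in "tree"])  # character-based tokens as code points
    dense = dense_embedding(ids)                  # shape (4, 64), real-valued
    sparse = torch.stack([one_hot(i) for i in ids])   # shape (4, 256), one-hot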

16. The system of any one of claims 1-15, wherein the phrase tree structures of both the source sequence and the target sequence comprise one or more character-based token constituents and phrase type constituents.

17. The system of claim 16, further configured to use a fixed vector as a common embedding for the different phrase type constituents of the phrase tree structures of the source sequence.

18. The system of claim 16, wherein the encoder compositional embeddings encode the one or more character-based token constituents but not the phrase type constituents.

19. A recurrent neural network grammar (RNNG)-based machine translation system for translating a source sequence in a first language into a target sequence in a second language, the system comprising:

an RNNG encoder for encoding tokens of the source sequence and phrase tree structures of the source sequence by embedding the character-based token constituents of each phrase tree structure in an encoder compositional vector; and

an attention-based RNNG decoder for outputting tokens of the target sequence and phrase tree structures of the target sequence categorized by phrase type, wherein a vector representing a phrase type is computed by attending over the encoder compositional vectors, the attention depending on a comparison between a current RNNG decoder state and RNNG encoder states during encoding.

20. The system of claim 19, wherein the comparison is performed using at least one of an inner product, a bilinear function, and a single-layer neural network.

21. The system of any one of claims 19-20, wherein the comparison measures syntactic structural similarity between the phrase tree structures of the source sequence and the phrase tree structures of the target sequence.

22. The system of any one of claims 19-20, wherein a comparison between the most syntactically similar phrase tree structures of the source sequence and the target sequence produces the highest attention weight.

23. The system of any one of claims 19-22, further configured to parameterize the parsing strategies of the RNNG encoder and the attention-based RNNG decoder with a stochastic policy trained using a weighted sum of multiple objective functions.

24. The system of claim 23, wherein an objective function is a language model loss term that rewards predicting the next character-based token with high likelihood.

25. The system of claim 23, wherein an objective function is a tree attention term that rewards one-to-one attentional correspondence between constituents of the RNNG encoder and the attention-based RNNG decoder.
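Claims 23-25 train the parsing policy with a weighted sum of objective terms: a language model loss (claim 24) and a tree attention reward (claim 25). The weights and the REINFORCE-style update in the sketch below are assumptions for illustration; the claims do not fix these details.

    import torch

    def combined_objective(lm_loss, tree_attention_reward, action_log_probs,
                           w_lm=1.0, w_tree=0.1):
        # lm_loss: negative log-likelihood of the next character-based token (claim 24)
        # tree_attention_reward: scalar rewarding one-to-one attention between encoder
        #   and decoder constituents (claim 25)
        # action_log_probs: log-probabilities of the sampled parsing actions (the stochastic policy)
        reward = -w_lm * lm_loss.detach() + w_tree * tree_attention_reward
        policy_loss = -reward * action_log_probs.sum()   # REINFORCE-style policy gradient term
        return w_lm * lm_loss + policy_loss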

26. An attentional neural machine translation method for translating a source sequence in a first language into a target sequence in a second language, the method comprising:

encoding tokens of the source sequence and phrase tree structures of the source sequence using a recurrent neural network grammar (RNNG) encoder, wherein at least one of the phrase tree structures of the source sequence comprises:

an encoder tree node that represents an encoder state when a phrase type of the phrase tree structure of the source sequence was predicted, and

an encoder compositional embedding that represents a constituent of the phrase tree structure of the source sequence; and

outputting tokens of the target sequence and phrase tree structures of the target sequence using an RNNG decoder, wherein the decoder embedding for each predicted phrase type of the phrase tree structures of the target sequence is a convex combination of the encoder compositional embeddings scaled by attention weights.

27. The method of claim 26, further comprising: using a final encoder compositional embedding of the source sequence as the decoder embedding for the phrase type initially predicted for the target sequence.

28. The method of claim 26, further comprising: using policy gradient reinforcement learning to induce unsupervised phrase tree structures of both the source sequence and the target sequence.

29. The method of any one of claims 26-28, wherein the phrase tree structures are constituency parse tree structures.

30. The method of any one of claims 26-28, wherein the phrase tree structures are dependency parse tree structures.

31. The method of any one of claims 26-30, further comprising:

comparing a decoder tree node representing a current decoder state with encoder tree nodes representing different encoder states during encoding;

exponentially normalizing results of the comparison; and

computing, using the exponentially normalized results as the attention weights, a weighted sum of the encoder compositional embeddings corresponding to the encoder tree nodes.

32. The method of claim 31, wherein the comparison is performed using at least one of an inner product, a bilinear function, and a single-layer neural network.

33. The method of claim 32, wherein the comparison measures syntactic structural similarity between the phrase tree structures of the source sequence and the phrase tree structures of the target sequence.

34. The method of claim 33, wherein a comparison between the most syntactically similar phrase tree structures of the source sequence and the target sequence produces the highest attention weight.

35. The method of any one of claims 26-30, wherein the tokens are character-based tokens.

36. The method of claim 35, wherein the character-based tokens are densely encoded using real-valued vectors, or sparsely encoded using one-hot vectors.

37. The method of any one of claims 26-36, wherein the phrase tree structures of both the source sequence and the target sequence comprise one or more character-based token constituents and phrase type constituents.

38. The method of claim 37, further comprising using a fixed vector as a common embedding for the different phrase type constituents of the phrase tree structures of the source sequence.

39. The method of claim 37, wherein the encoder compositional embeddings encode the one or more character-based token constituents but not the phrase type constituents.

40. A recurrent neural network grammar (RNNG)-based machine translation method for translating a source sequence in a first language into a target sequence in a second language, comprising:

encoding tokens of the source sequence and phrase tree structures of the source sequence using an RNNG encoder, by embedding the character-based token constituents of each phrase tree structure in an encoder compositional vector; and

outputting, using an attention-based RNNG decoder, tokens of the target sequence and phrase tree structures of the target sequence categorized by phrase type, wherein a vector representing a phrase type is computed by attending over the encoder compositional vectors, and wherein the attention depends on a comparison between a current RNNG decoder state and RNNG encoder states during encoding.

41. The method of claim 40, further comprising performing the comparison using at least one of an inner product, a bilinear function, and a single-layer neural network.

42. The method of any one of claims 40-41, wherein the comparison measures syntactic structural similarity between the phrase tree structures of the source sequence and the phrase tree structures of the target sequence.

43. The method of any one of claims 40-41, wherein a comparison between the most syntactically similar phrase tree structures of the source sequence and the target sequence produces the highest attention weight.

44. The method of any one of claims 40-43, further comprising: parameterizing the parsing strategies of the RNNG encoder and the attention-based RNNG decoder with a stochastic policy trained using a weighted sum of multiple objective functions.

45. The method of claim 44, wherein an objective function is a language model loss term that rewards predicting the next character-based token with high likelihood.

46. The method of claim 44, wherein an objective function is a tree attention term that rewards one-to-one attentional correspondence between constituents of the RNNG encoder and the attention-based RNNG decoder.

47. A non-transitory computer-readable medium having computer-executable instructions that implement the translation system of any one of claims 1-25.

48. A non-transitory computer-readable medium having computer-executable instructions that implement the method of any one of claims 26-46.

Background

The subject matter discussed in this section should not be assumed to be prior art merely as a result of its mention in this section. Similarly, a problem mentioned in this section or associated with the subject matter provided as background should not be assumed to have been previously recognized in the prior art. The subject matter in this section merely represents different approaches, which in and of themselves may also correspond to implementations of the claimed technology.

Many efforts to exploit linguistic hierarchy in natural language processing tasks such as machine translation make use of the output of self-contained parser systems trained on treebanks of human annotations. A second approach aims to jointly learn the task at hand and the relevant aspects of linguistic hierarchy, inducing parse trees from unannotated training data that may or may not correspond to treebank annotation practice.

Most deep learning models that aim to exploit linguistic hierarchy in natural language processing integrate an external parser, either to prescribe the recursive structure of the neural network or to provide a supervision signal or training data for a network that predicts its own structure. Some deep learning models instead take the second approach, treating hierarchical structure as a latent variable and applying inference over graph-based conditional random fields, straight-through estimators, or policy gradient reinforcement learning to work around the incompatibility of gradient-based learning with problems that involve discrete latent states.

For the task of machine translation, syntactically informed models have shown promise both inside and outside the deep learning context: hierarchical phrase-based models frequently outperform conventional ones, and neural machine translation models augmented with morphosyntactic input features, tree-structured encoders, and jointly trained parsers all outperform purely sequential baselines.

An opportunity arises to achieve a long-standing goal of natural language processing: exploiting the hierarchical structure of language without prior annotation. Improved natural language processing may result.
