Training method of machine translation model, machine translation method and related equipment

Document No.: 115964    Publication date: 2021-10-19

Note: This technology, "Training method of machine translation model, machine translation method and related equipment" (机器翻译模型的训练方法、机器翻译方法及相关设备), was created by 吴丽鑫, 黄瑾 and 段亦涛 on 2021-06-23. Abstract: The present disclosure provides a training method of a machine translation model, a machine translation method and related devices. The training method includes: acquiring an original training corpus and a training term dictionary; performing matching retrieval on the original training corpus according to the training term dictionary to obtain a plurality of training term matching items, each training term matching item including a training source end term and its corresponding training target end term; generating an auxiliary training corpus according to the training term matching items, and combining the original training corpus and the auxiliary training corpus to obtain a combined training corpus; adding a training term position label to each training target end term to obtain a plurality of training term constraint items, and obtaining a training term constraint according to the training term constraint items; and training the machine translation model according to the combined training corpus and the training term constraint. The disclosure also provides a machine translation method and related devices.

1. A training method of a machine translation model comprises the following steps:

acquiring an original training corpus and a training term dictionary;

according to the training term dictionary, carrying out matching retrieval on the original training corpus to obtain a plurality of training term matching items; each training term matching item includes: a training source end term and a training target end term corresponding to the training source end term;

generating an auxiliary training corpus according to a plurality of training term matching items, and combining the original training corpus and the auxiliary training corpus to obtain a combined training corpus;

adding a training term position label to each training target end term to obtain a plurality of training term constraint items, and obtaining a training term constraint according to the plurality of training term constraint items;

and training the machine translation model according to the combined training corpus and the training term constraint.

2. The method according to claim 1, wherein the training the machine translation model according to the combined training corpus and the training term constraints specifically comprises:

constructing a loss function according to the combined training corpus and the training term constraint; and training the machine translation model by taking minimization of the loss function as the training objective.

3. The method according to claim 1, wherein the training the machine translation model according to the combined training corpus and the training term constraints specifically comprises:

for any training target end sentence in the combined training corpus, performing right side filling and right side offset on the training target end sentence; and performing left filling and right offset on the training term constraint corresponding to the training target end sentence.

4. The method according to claim 3, wherein the training the machine translation model according to the combined training corpus and the training term constraints specifically comprises:

splicing the training term constraint and the training target end sentence; and performing improved multi-head self-attention processing on the spliced training term constraint and the training target end sentence.

5. The method according to claim 4, wherein the improved multi-head self-attention processing specifically comprises:

acquiring a training self-attention weight matrix of the spliced training term constraint and the training target end sentence, and extracting values of all positions on a diagonal line of the training self-attention weight matrix to be used as training standby values;

assigning the positions corresponding to all filling processing in the training self-attention weight matrix to negative infinity;

according to the training standby values, reassigning each position on the diagonal of the training self-attention weight matrix to obtain a corrected training self-attention weight matrix; and obtaining updated representations of the spliced training term constraint and the training target end sentence according to the corrected training self-attention weight matrix.

6. The method according to claim 1, wherein the generating an auxiliary training corpus according to a plurality of training term matching items, and combining the original training corpus and the auxiliary training corpus to obtain a combined training corpus specifically comprises:

extracting all bilingual sentence pairs that include the training term matching items from the original training corpus to obtain a first sub-auxiliary training corpus;

for each training term matching item in the first sub-auxiliary training corpus, adding the training term position label to at least one of the training source end term and the training target end term included in the training term matching item to obtain a second sub-auxiliary training corpus;

combining the first sub-auxiliary training corpus and the second sub-auxiliary training corpus as the auxiliary training corpus with the original training corpus to obtain the combined training corpus; or, the second sub-auxiliary corpus is used as the auxiliary corpus and is combined with the original corpus to obtain the combined corpus.

7. The method according to claim 1, wherein the obtaining a training term constraint according to a plurality of training term constraint items specifically comprises:

sequentially splicing the plurality of training term constraint items, and adding a separator symbol after the last training term constraint item to obtain the training term constraint.

8. The method of any of claims 1 to 7, wherein the training term position label comprises: a start position symbol and an end position symbol; the start position symbol and the end position symbol each further include index information for expressing the order in which the training term matching item corresponding to the training term position label appears among all the training term matching items included in the training sentence to which the training term position label belongs.

9. A machine translation method, comprising:

acquiring a sentence to be translated and a term dictionary;

according to the term dictionary, performing matching retrieval on the sentence to be translated to obtain a plurality of term matching items; each term matching item includes: a source end term and its corresponding target end term;

adding a term position label to each target end term to obtain a plurality of term constraint items, and obtaining a term constraint according to the plurality of term constraint items;

inputting the sentence to be translated and the term constraint into a pre-trained machine translation model to obtain a translation result corresponding to the sentence to be translated; the machine translation model is trained based on the method of any one of claims 1 to 8.

10. The method of claim 9, wherein the machine translation model comprises an encoder and a decoder; the decoder includes: a splicing module, an improved multi-head self-attention module, a multi-head attention module, a feed-forward network, and a separation module.

11. The method according to claim 10, wherein the inputting the sentence to be translated and the term constraint into a pre-trained machine translation model to obtain the translation result corresponding to the sentence to be translated specifically comprises:

inputting the sentence to be translated into the encoder to obtain an intermediate representation corresponding to the sentence to be translated;

inputting the intermediate representation and the term constraint into the decoder to obtain a translation result corresponding to the sentence to be translated; the translation result includes a number of output words that are output step by step by the machine translation model.

12. The method of claim 11, wherein the machine translation model outputs the output word at each step, specifically comprising:

at the splicing module, splicing the term constraint with the output word;

performing improved multi-head self-attention processing on the spliced term constraint and the output word at the improved multi-head self-attention module to obtain a first attention feature;

performing, at the multi-head attention module, multi-head attention processing based on the intermediate representation and the first attention feature to obtain a second attention feature;

performing, at the feed-forward network, feature extraction on the second attention feature to obtain a splicing feature;

at the separation module, separating the splicing feature to obtain a term constraint feature representation and a target end feature representation; the target end feature representation is used to determine the output word of the current step.

13. The method according to claim 12, wherein the performing, at the improved multi-head self-attention module, improved multi-head self-attention processing on the spliced term constraint and the output word to obtain a first attention feature specifically comprises:

acquiring a self-attention weight matrix of the spliced term constraint and the output word;

extracting values of positions on a diagonal line of the self-attention weight matrix as standby values;

assigning the positions corresponding to all filling processing in the self-attention weight matrix to negative infinity;

according to the standby value, reassigning each position on the diagonal line of the self-attention weight matrix to obtain a corrected self-attention weight matrix;

and obtaining the first attention feature according to the corrected self-attention weight matrix.

14. A training apparatus for a machine translation model, comprising:

an acquisition module configured to acquire an original training corpus and a training term dictionary;

the matching module is configured to perform matching retrieval on the original training corpus according to the training term dictionary to obtain a plurality of training term matching items; each training term matching item includes: a training source end term and a training target end term corresponding to the training source end term;

the corpus generating module is configured to generate an auxiliary training corpus according to a plurality of training term matching items, and combine the original training corpus and the auxiliary training corpus to obtain a combined training corpus;

a constraint generation module configured to add a training term position label to each training target end term to obtain a plurality of training term constraint items, and obtain a training term constraint according to the plurality of training term constraint items;

a training module configured to train the machine translation model according to the combined training corpus and the training term constraints.

15. A machine translation device, comprising:

the obtaining module is configured to obtain a sentence to be translated and a term dictionary;

the matching module is configured to perform matching retrieval on the sentence to be translated according to the term dictionary to obtain a plurality of term matching items; each term matching item includes: a source end term and its corresponding target end term;

a constraint generation module configured to add a term position label to each target end term to obtain a plurality of term constraint items, and obtain a term constraint according to the plurality of term constraint items;

the translation module is configured to input the sentence to be translated and the term constraint into a pre-trained machine translation model to obtain a translation result corresponding to the sentence to be translated; the machine translation model is trained based on the method of any one of claims 1 to 8.

16. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of any one of claims 1 to 13 when executing the program.

17. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1 to 13.

Technical Field

The present disclosure relates to the field of machine translation technologies, and in particular, to a training method for a machine translation model, a machine translation method, and a related device.

Background

This section is intended to provide a background or context to the embodiments of the application that are recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.

In recent years, with the rapid development of machine learning technology, machine translation has made great progress through several generations of model structures and is now widely used in a variety of translation scenarios. At the same time, however, more and more problems remain to be solved, one of which is the mistranslation of professional terms.

Disclosure of Invention

In view of the above, there is a need for an improved method that can effectively alleviate the problem of term mistranslation in the machine translation process.

The exemplary embodiment of the present disclosure provides a training method of a machine translation model, including:

acquiring an original training corpus and a training term dictionary;

according to the training term dictionary, carrying out matching retrieval on the original training corpus to obtain a plurality of training term matching items; each training term matching item includes: a training source end term and a training target end term corresponding to the training source end term;

generating an auxiliary training corpus according to a plurality of training term matching items, and combining the original training corpus and the auxiliary training corpus to obtain a combined training corpus;

adding a training term position label to each training target end term to obtain a plurality of training term constraint items, and obtaining a training term constraint according to the plurality of training term constraint items;

and training the machine translation model according to the combined training corpus and the training term constraint.

In some exemplary embodiments, the training the machine translation model according to the combined training corpus and the training term constraint specifically includes: constructing a loss function according to the combined training corpus and the training term constraint; and training the machine translation model by taking minimization of the loss function as the training objective.

In some exemplary embodiments, the training the machine translation model according to the combined training corpus and the training term constraint specifically includes: for any training target end sentence in the combined training corpus, performing right side filling and right side offset on the training target end sentence; and performing left filling and right offset on the training term constraint corresponding to the training target end sentence.

In some exemplary embodiments, the training the machine translation model according to the combined training corpus and the training term constraint specifically includes: splicing the training term constraint and the training target end sentence; and performing improved multi-head self-attention processing on the spliced training term constraint and the training target end sentence.

In some exemplary embodiments, the improved multi-head self-attention processing specifically includes: acquiring a training self-attention weight matrix of the spliced training term constraint and training target end sentence, and extracting the values of all positions on the diagonal of the training self-attention weight matrix as training standby values; assigning the positions corresponding to all filling processing in the training self-attention weight matrix to negative infinity; reassigning, according to the training standby values, each position on the diagonal of the training self-attention weight matrix to obtain a corrected training self-attention weight matrix; and obtaining updated representations of the spliced training term constraint and training target end sentence according to the corrected training self-attention weight matrix.

In some exemplary embodiments, the generating an auxiliary training corpus according to a plurality of training term matching items, and combining the original training corpus and the auxiliary training corpus to obtain a combined training corpus specifically includes: extracting all bilingual sentence pairs that include the training term matching items from the original training corpus to obtain a first sub-auxiliary training corpus; for each training term matching item in the first sub-auxiliary training corpus, adding the training term position label to at least one of the training source end term and the training target end term included in the training term matching item to obtain a second sub-auxiliary training corpus; and combining the first sub-auxiliary training corpus and the second sub-auxiliary training corpus, as the auxiliary training corpus, with the original training corpus to obtain the combined training corpus; or combining the second sub-auxiliary training corpus, as the auxiliary training corpus, with the original training corpus to obtain the combined training corpus.

In some exemplary embodiments, the obtaining a training term constraint according to a plurality of training term constraint items specifically includes: sequentially splicing the plurality of training term constraint items, and adding a separator symbol after the last training term constraint item to obtain the training term constraint.

In some exemplary embodiments, the training term position label includes: a start position symbol and an end position symbol; the start position symbol and the end position symbol each further include index information for expressing the order in which the training term matching item corresponding to the training term position label appears among all the training term matching items included in the training sentence to which the training term position label belongs.

Based on the same inventive concept, the exemplary embodiments of the present disclosure also provide a machine translation method, including:

acquiring a sentence to be translated and a term dictionary;

according to the term dictionary, performing matching retrieval on the sentence to be translated to obtain a plurality of term matching items; each term matching item includes: a source end term and its corresponding target end term;

adding a term position label to each target end term to obtain a plurality of term constraint items, and obtaining a term constraint according to the plurality of term constraint items;

inputting the sentence to be translated and the term constraint into a pre-trained machine translation model to obtain a translation result corresponding to the sentence to be translated; the machine translation model is trained based on any one of the training methods described above.

In some exemplary embodiments, the machine translation model comprises an encoder and a decoder; the decoder includes: a splicing module, an improved multi-head self-attention module, a multi-head attention module, a feed-forward network, and a separation module.

In some exemplary embodiments, the inputting the sentence to be translated and the term constraint into a pre-trained machine translation model to obtain a translation result corresponding to the sentence to be translated specifically includes: inputting the sentence to be translated into the encoder to obtain an intermediate representation corresponding to the sentence to be translated; and inputting the intermediate representation and the term constraint into the decoder to obtain a translation result corresponding to the sentence to be translated; the translation result includes a number of output words that are output step by step by the machine translation model.

In some exemplary embodiments, the outputting of the output word by the machine translation model at each step specifically includes: at the splicing module, splicing the term constraint with the output word; at the improved multi-head self-attention module, performing improved multi-head self-attention processing on the spliced term constraint and the output word to obtain a first attention feature; at the multi-head attention module, performing multi-head attention processing based on the intermediate representation and the first attention feature to obtain a second attention feature; at the feed-forward network, performing feature extraction on the second attention feature to obtain a splicing feature; and at the separation module, separating the splicing feature to obtain a term constraint feature representation and a target end feature representation; the target end feature representation is used to determine the output word of the current step.

In some exemplary embodiments, the performing, at the improved multi-head self-attention module, improved multi-head self-attention processing on the spliced term constraint and the output word to obtain a first attention feature specifically includes: acquiring a self-attention weight matrix of the spliced term constraint and the output word; extracting the values of the positions on the diagonal of the self-attention weight matrix as standby values; assigning the positions corresponding to all filling processing in the self-attention weight matrix to negative infinity; reassigning, according to the standby values, each position on the diagonal of the self-attention weight matrix to obtain a corrected self-attention weight matrix; and obtaining the first attention feature according to the corrected self-attention weight matrix.

Based on the same inventive concept, the exemplary embodiments of the present disclosure also provide a training apparatus for a machine translation model, including:

an acquisition module configured to acquire an original corpus and a training term dictionary;

the matching module is configured to perform matching retrieval on the original training corpus according to the training term dictionary to obtain a plurality of training term matching items; each training term matching item includes: a training source end term and a training target end term corresponding to the training source end term;

the corpus generating module is configured to generate an auxiliary training corpus according to a plurality of training term matching items, and combine the original training corpus and the auxiliary training corpus to obtain a combined training corpus;

a constraint generation module configured to add a training term position label to each training target end term to obtain a plurality of training term constraint items, and obtain a training term constraint according to the plurality of training term constraint items;

a training module configured to train the machine translation model according to the combined training corpus and the training term constraints.

Based on the same inventive concept, the exemplary embodiments of the present disclosure also provide a machine translation apparatus, including:

the obtaining module is configured to obtain a sentence to be translated and a term dictionary;

the matching module is configured to perform matching retrieval on the sentence to be translated according to the term dictionary to obtain a plurality of term matching items; each term matching item includes: a source end term and its corresponding target end term;

a constraint generation module configured to add a term position label to each target end term to obtain a plurality of term constraint items, and obtain a term constraint according to the plurality of term constraint items;

the translation module is configured to input the sentence to be translated and the term constraint into a pre-trained machine translation model to obtain a translation result corresponding to the sentence to be translated; the machine translation model is trained based on any one of the training methods described above.

Based on the same inventive concept, the exemplary embodiments of the present disclosure further provide an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the program, the processor implements the training method and/or the machine translation method of the machine translation model described in any one of the above items.

Based on the same inventive concept, the exemplary embodiments of the present disclosure also provide a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the training method of a machine translation model and/or the machine translation method as described in any one of the above.

From the above description, it can be seen that the training method of a machine translation model, the machine translation method and the related devices provided by the present disclosure achieve a more reasonable and fluent term-intervened translation by incorporating term constraints at the decoder side and providing mapping relationship information between term constraint items through term position labels. The term intervention scheme of the embodiments of the disclosure does not enlarge the decoding search space, so the actual decoding speed is not affected. In addition, the term constraint and the target sentence share the target-end vocabulary, so there is no out-of-vocabulary (OOV) problem even when the source language and the target language belong to different language families; the term mistranslation problem can thus be effectively alleviated and translation accuracy improved.

Drawings

In order to more clearly illustrate the technical solutions in the present disclosure or related technologies, the drawings needed to be used in the description of the embodiments or related technologies are briefly introduced below, and it is obvious that the drawings in the following description are only embodiments of the present disclosure, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.

FIG. 1 is a schematic diagram of an application scenario of an exemplary embodiment of the present disclosure;

FIG. 2 is a flowchart illustrating a method for training a translation model according to an exemplary embodiment of the present disclosure;

FIG. 3 is a block diagram of a machine translation model according to an exemplary embodiment of the present disclosure;

FIG. 4 is a flowchart illustrating a machine translation method according to an exemplary embodiment of the present disclosure;

FIG. 5 is a schematic structural diagram of a device for training a translation model according to an exemplary embodiment of the present disclosure;

FIG. 6 is a schematic diagram of a machine translation device according to an exemplary embodiment of the present disclosure;

fig. 7 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present disclosure.

Detailed Description

The principles and spirit of the present application will be described with reference to a number of exemplary embodiments. It should be understood that these embodiments are presented only to enable those skilled in the art to better understand and to implement the present disclosure, and are not intended to limit the scope of the present application in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

According to the embodiment of the disclosure, a training method of a machine translation model, a machine translation method and related equipment are provided.

In this document, it is to be understood that any number of elements in the figures are provided by way of illustration and not limitation, and any nomenclature is used for differentiation only and not in any limiting sense.

For convenience of understanding, terms referred to in the embodiments of the present disclosure are explained below:

Machine translation: a technique that performs translation automatically using a machine learning model;

Term: a word or expression used to represent a concept in a particular subject area;

Term constraint: a set of predefined terms;

Term intervention: a class of techniques that make the translation result satisfy all term constraints, given that the term constraints are specified in advance.

The principles and spirit of the present application are explained in detail below with reference to several representative embodiments of the present application.

Summary of the Invention

In the prior art, there are some schemes for mitigating the term mistranslation problem. For example, the Tagging method implements term intervention through post-processing replacement: term labels are added to the training data so that the model additionally learns the mapping relationship of the term labels; during translation, term labels are added on both sides of the terms in the text to be translated, an original translation result carrying the term labels is obtained first, and the term labels together with the original translations of the terms between them are then replaced by the target terms, thereby obtaining the final term-intervened translation result. For another example, the word-constrained decoding method directly modifies the beam search algorithm used in decoding: it expands the search space, includes the target term in the expanded search space, and forces the target term to appear in the translation result, thereby implementing term intervention.

In the course of implementing the present disclosure, the inventors found that the above prior arts all have significant disadvantages. The Tagging method uses post-processing to directly replace a segment of the translation result with the target term without considering the relevance between the replaced content and the rest of the translation, so in actual use it is prone to repeated and incoherent translations, which affects translation accuracy and fluency. The word-constrained decoding method uses only the target term and ignores the information of the source term, so the term is easily generated at an unreasonable position; at the same time, the method increases the complexity of the decoding algorithm, and the decoding time grows severalfold. In addition, the Code-Switch method in the prior art suffers from an obvious OOV (out of vocabulary) problem in term-aware translation tasks when the language families of the source language and the target language are far apart, which limits its application scenarios.

To solve the above problems, the present disclosure provides a training method of a machine translation model, a machine translation method and related devices, which achieve a more reasonable and fluent term-intervened translation by incorporating term constraints at the decoder side and providing mapping relationship information between term constraint items through term position labels. The term intervention scheme of the disclosed embodiments does not enlarge the decoding search space, so it does not affect the actual decoding speed. In addition, the term constraint and the target sentence share the target-end vocabulary, so there is no OOV problem caused by the source and target languages belonging to different language families; the term mistranslation problem can thus be effectively alleviated and translation accuracy improved.

Having described the basic principles of the present application, various non-limiting embodiments of the present application are described in detail below.

Application scene overview

Reference is made to fig. 1, which is a schematic view of an application scenario of a training method and a translation method of a machine translation model according to an embodiment of the present application. The application scenario includes a terminal device 101, a server 102, and a data storage system 103. The terminal device 101, the server 102, and the data storage system 103 may be connected through a wired or wireless communication network. The terminal device 101 includes, but is not limited to, a desktop computer, a mobile phone, a mobile computer, a tablet computer, a media player, a smart wearable device, a Personal Digital Assistant (PDA), or other electronic devices capable of implementing the above functions. The server 102 and the data storage system 103 may be independent physical servers, may also be a server cluster or distributed system formed by a plurality of physical servers, and may also be cloud servers providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs, and big data and artificial intelligence platforms.

The server 102 is configured to provide a translation service to the user of the terminal device 101. A client that communicates with the server 102 is installed in the terminal device 101, and the user can input a sentence to be translated through the client. After the user issues a translation instruction through the client, the client sends the sentence to be translated to the server 102, and the server 102 obtains a term dictionary corresponding to the sentence to be translated; the term dictionary may be provided by the user and stored in the server 102 in advance. When submitting the sentence to be translated, the user can select the term dictionary to be used through the client, and the server 102 loads the corresponding term dictionary according to the user's selection. After matching retrieval, a plurality of term matching items, each including a source end term and its corresponding target end term, are obtained, and a term constraint is obtained by adding term position labels to the target end terms. The server 102 inputs the sentence to be translated and the term constraint into a pre-trained machine translation model to obtain a translation result output by the machine translation model, removes the term position labels from the translation result to obtain the final translation result, and then sends the final translation result to the client; the client displays the final translation result to the user, completing the translation.

The data storage system 103 stores a large amount of training corpora, including a plurality of bilingual sentence pairs each composed of a source end sentence and a corresponding target end sentence, as well as a training term dictionary. The server 102 may obtain a combined training corpus based on the large amount of training corpora and the training term dictionary, and then train the machine translation model with it, so that the machine translation model can translate sentences containing terms. Sources of the training data and the training term dictionary include, but are not limited to, existing databases, data crawled from the Internet, and data uploaded while users are using the client. When the output of the machine translation model meets the predetermined requirement, the server 102 may provide the translation service to users based on the machine translation model; meanwhile, the server 102 may continuously optimize the machine translation model based on newly added training corpora and training term dictionaries.

The following describes a training method and a machine translation method of a machine translation model according to an exemplary embodiment of the present disclosure, with reference to an application scenario of fig. 1. It should be noted that the above application scenarios are merely illustrated for the convenience of understanding the spirit and principles of the present disclosure, and the embodiments of the present disclosure are not limited in this respect. Rather, embodiments of the present disclosure may be applied to any scenario where applicable.

Exemplary method

Referring to fig. 2, an embodiment of the present disclosure provides a training method of a machine translation model, including the following steps:

step S201, obtaining an original training corpus and a training term dictionary.

In specific implementation, any corpus including a large number of bilingual sentence pairs can be used as the original training corpus. A training term dictionary is obtained correspondingly for the original training corpus that is used, in particular according to the technical field to which the original training corpus belongs.

In specific implementation, the training term dictionary may be an existing term dictionary selected according to the technical field to which the original training corpus belongs, or may be obtained by collecting terms of that field from the Internet, databases, and the like. The term dictionary includes a plurality of source end terms and the target end terms respectively corresponding to them, and the correspondence between source end terms and target end terms may be many-to-many. For example, the term "immunoglobulin M" may have two source end representations, "immunoglobulin M" and "IgM", and two corresponding target end representations, "immunoglobulin M" and "macroglobulin".

In specific implementation, the training term dictionary is also preprocessed. The preprocessing refers to the conventional preprocessing of training data in machine translation, such as normalization, word segmentation, and lowercase conversion, and is performed in units of a single term representation. The method used for the preprocessing is not limited in the embodiments of the present disclosure.

It should be noted that the prefix "training" in this step and the subsequent steps is only for indicating that the corresponding technical features are used in the training process of the machine translation model, and does not constitute an additional limitation to the corresponding technical features.

Step S202, performing matching retrieval on the original training corpus according to the training term dictionary to obtain a plurality of training term matching items; the training term matches include: a training source term and its corresponding training target term.

In specific implementation, the original training corpus is preprocessed, that is, each bilingual sentence pair included in the original training corpus is processed, for example normalized, word-segmented, or converted to lowercase, so that each bilingual sentence pair meets the processing requirements of the subsequent machine translation model. The method used for the preprocessing is not limited in the embodiments of the present disclosure.

In specific implementation, based on the preprocessed original training corpus, matching retrieval is performed on the original training corpus with the training term dictionary obtained in the previous step; matching retrieval specifically refers to searching the original training corpus to determine whether each term included in the training term dictionary appears in it. Through matching retrieval, a plurality of training term matching items can be obtained, and each training term matching item specifically includes a training source end term and its corresponding training target end term. For example, for a bilingual sentence pair in the original training corpus, "Queer_Eye_star_Jonathan_Van_Ness" - "pink-male_rescue_star_Johnson_Van_Ness", assuming that the term pair "Queer_Eye" and "pink-male_rescue" exists in the training term dictionary, matching retrieval will produce the training term matching item: "Queer_Eye" - "pink-male_rescue". It should be noted that, in the description of the embodiments of the present disclosure, the space between words is represented by "_" in order to delimit words more clearly.

As an alternative embodiment, when the original training corpus is matched and retrieved with the training term dictionary, the matching retrieval of training term matching items can be performed by constructing prefix trees and using a key-value scheme. Specifically, a bilingual prefix tree may first be constructed from the preprocessed training term dictionary; the bilingual prefix tree includes a source end prefix tree and a target end prefix tree, where the source end prefix tree uses the source end term as the key and the corresponding term item id as the value, and the target end prefix tree uses the target end term as the key and the corresponding term item id as the value. Then the source end prefix tree is used to perform longest-term matching in the source sentence to obtain a matched source end term mapping table, and the target end prefix tree is used to perform longest-term matching in the target sentence to obtain a matched target end term mapping table, where each term mapping table uses the term item id as the key and the set of positions at which the term appears in the text as the value.

The prefix tree is an efficient string index structure that can quickly return the longest term matched from the starting position of an input string. For example, suppose the target end prefix tree contains the mappings "pink-male" -> 1, "pink-male_rescue" -> 2 and "Johnson_Van_Ness" -> 3. If the input string is "pink-male_rescue_star_Johnson_Van_Ness", it will first match "pink-male_rescue" and return 2; matching then continues on the remaining string "star_Johnson_Van_Ness", and whenever the current string does not match any term, the first word is removed and matching continues until the current string is empty. In this way the two longest terms, "pink-male_rescue" and "Johnson_Van_Ness", are matched on the target side, and a target end term mapping table {2: [(0, 2)], 3: [(3, 6)]} is obtained, where the positions in the term mapping table are word-level indices. The term items are then filtered according to the source end term mapping table and the target end term mapping table, and only the term items whose numbers of occurrences on the source end and the target end are consistent are kept as training term matching items.
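
For illustration only, the following Python sketch shows the prefix-tree (trie) longest-match retrieval and the occurrence-count filtering described above; the class and function names and the data layout are assumptions, not the original implementation.

# --- illustrative sketch (Python) ---
class TermTrie:
    def __init__(self):
        self.root = {}

    def insert(self, term_words, term_id):
        node = self.root
        for w in term_words:
            node = node.setdefault(w, {})
        node["_id"] = term_id          # key: term words, value: term item id

    def longest_match_at(self, words, start):
        """Return (term_id, end) for the longest term starting at `start`, or None."""
        node, best = self.root, None
        for i in range(start, len(words)):
            if words[i] not in node:
                break
            node = node[words[i]]
            if "_id" in node:
                best = (node["_id"], i + 1)
        return best


def build_mapping_table(words, trie):
    """Scan the sentence left to right, always keeping the longest matched term."""
    table = {}
    i = 0
    while i < len(words):
        hit = trie.longest_match_at(words, i)
        if hit is None:
            i += 1                      # drop the first word and keep matching
        else:
            term_id, end = hit
            table.setdefault(term_id, []).append((i, end))
            i = end
    return table


def filter_matches(src_table, tgt_table):
    """Keep only term items whose source- and target-side occurrence counts agree."""
    return {tid: (src_table[tid], tgt_table[tid])
            for tid in src_table
            if tid in tgt_table and len(src_table[tid]) == len(tgt_table[tid])}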

In specific implementation, in addition to performing case-insensitive matching with the prefix tree after lowercase conversion, case-sensitive matching can be performed by executing the same operations before lowercase conversion. Other string-level matching methods may also be used, or semantic-level matching retrieval may be performed with deep learning techniques to obtain the training term matching items.

Step S203, generating an auxiliary training corpus according to a plurality of training term matching items, and combining the original training corpus and the auxiliary training corpus to obtain a combined training corpus.

In specific implementation, all bilingual sentence pairs that include training term matching items are extracted from the original training corpus to obtain a first sub-auxiliary training corpus. That is, the first sub-auxiliary training corpus consists only of bilingual sentence pairs that include training term matching items.

In specific implementation, for each training term matching item included in each bilingual sentence pair of the first sub-auxiliary training corpus, a training term position label is added to at least one of the training source end term and the training target end term included in that training term matching item, so as to obtain a second sub-auxiliary training corpus.

The training term position label is obtained by modifying the term label used in the Tagging method. Specifically, the training term position label includes a start position symbol, denoted <s:term_idx>, and an end position symbol, denoted <e:term_idx>. The training term position label is added as follows: the start position symbol <s:term_idx> and the end position symbol <e:term_idx> are added before and after the training source end term or the training target end term, respectively.

Compared with the term labels <start> and <end> in the Tagging method, the start position symbol <s:term_idx> and the end position symbol <e:term_idx> in the embodiments of the present disclosure each further include index information term_idx, which expresses the order in which the training term matching item corresponding to the training term position label appears among all the training term matching items included in the training sentence to which the label belongs. The index information is obtained by numbering the training term matching items within each training sentence of the first sub-auxiliary training corpus, so as to express the order in which all training term matching items appear in one training sentence. For example, the bilingual sentence pair "Queer_Eye_star_Jonathan_Van_Ness" - "pink-male_rescue_star_Johnson_Van_Ness" in the first sub-auxiliary training corpus includes two training term matching items: "Queer_Eye" - "pink-male_rescue" and "Jonathan_Van_Ness" - "Johnson_Van_Ness". Adding training term position labels to both the training source end terms and the training target end terms yields the bilingual sentence pair: "<s:1>_Queer_Eye_<e:1>_star_<s:2>_Jonathan_Van_Ness_<e:2>" - "<s:1>_pink-male_rescue_<e:1>_star_<s:2>_Johnson_Van_Ness_<e:2>". In the embodiments of the present disclosure, adding training term position labels accurately distinguishes different training term matching items within a sentence and associates corresponding terms through the same term position label, so that in subsequent learning and translation the machine translation model can more accurately locate the positions of the different term matching items in the translation result, improving translation accuracy.

In specific implementation, for a training term matching item, training term position labels may be added to both the training source end term and the training target end term included in it; the second sub-auxiliary training corpus obtained in this way consists of bilingual sentence pairs carrying training term position labels on both the source end and the target end. Alternatively, a training term position label may be added to only one of the training source end term and the training target end term; the second sub-auxiliary training corpus obtained in this way consists of bilingual sentence pairs carrying training term position labels on only one of the source end and the target end.
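
As an illustration of this labelling step, the following Python sketch wraps each matched term span with the position labels <s:idx> and <e:idx>; the helper name and the sample call are hypothetical.

# --- illustrative sketch (Python) ---
def add_position_labels(words, spans):
    """
    words: list of tokens of one training sentence.
    spans: list of (start, end) spans of matched terms, in sentence order.
    Returns a new token list with <s:idx> ... <e:idx> inserted around each span.
    """
    out, pos, idx = [], 0, 0
    for start, end in sorted(spans):
        idx += 1
        out.extend(words[pos:start])
        out.append(f"<s:{idx}>")
        out.extend(words[start:end])
        out.append(f"<e:{idx}>")
        pos = end
    out.extend(words[pos:])
    return out

# e.g. add_position_labels("Queer Eye star Jonathan Van Ness".split(), [(0, 2), (3, 6)])
# -> ['<s:1>', 'Queer', 'Eye', '<e:1>', 'star', '<s:2>', 'Jonathan', 'Van', 'Ness', '<e:2>']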

In specific implementation, the first sub-auxiliary training corpus and the second sub-auxiliary training corpus are used as auxiliary training corpora, and the auxiliary training corpora and the original training corpora are mixed according to a certain proportion, so that the combined training corpus is obtained. Or, the combined training corpus may be obtained by only using the second sub-auxiliary training corpus as the auxiliary training corpus and mixing the auxiliary training corpus with the original training corpus according to a certain ratio.

In addition, as an optional implementation, the obtained combined training corpus may be further processed with the BPE (Byte Pair Encoding) algorithm, so that it is converted into sub-word representations according to a BPE vocabulary, thereby compressing the vocabulary. Note that when generating the sub-word representations, the tokens corresponding to the training term position labels must be configured not to undergo sub-word segmentation, that is, the training term position labels must be kept intact.
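
One possible way to keep the position labels intact during sub-word segmentation is sketched below; `segment_word` stands for whatever BPE segmenter is actually used and is an assumption of this sketch.

# --- illustrative sketch (Python) ---
import re

LABEL = re.compile(r"^(<s:\d+>|<e:\d+>|<sep>|<pad>)$")

def bpe_with_protected_labels(words, segment_word):
    out = []
    for w in words:
        if LABEL.match(w):
            out.append(w)                # never split position labels / special tokens
        else:
            out.extend(segment_word(w))  # ordinary words go through BPE as usual
    return out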

Step S204, adding a training term position label to each training target end term to obtain a plurality of training term constraint items, and obtaining a training term constraint according to the plurality of training term constraint items.

In specific implementation, a training term constraint item is obtained by adding the training term position label to the training target end term of a training term matching item. For the bilingual sentence pair of the previous example, "<s:1>_Queer_Eye_<e:1>_star_<s:2>_Jonathan_Van_Ness_<e:2>" - "<s:1>_pink-male_rescue_<e:1>_star_<s:2>_Johnson_Van_Ness_<e:2>", there are two training term constraint items: "<s:1>_pink-male_rescue_<e:1>" and "<s:2>_Johnson_Van_Ness_<e:2>". A training term constraint item is thus the start position symbol, the end position symbol and the training target end term enclosed between them. Training term constraint items are also obtained in units of training sentences, that is, the training term constraint items corresponding to each training sentence are obtained for that training sentence.

In specific implementation, for a training sentence, all training term constraint items included in the training sentence are spliced in order, and a separator symbol is added after the last training term constraint item to obtain the training term constraint. For example, for the training target end sentence "<s:1>_pink-male_rescue_<e:1>_star_<s:2>_Johnson_Van_Ness_<e:2>", the corresponding training term constraint is: "<s:1>_pink-male_rescue_<e:1>_<s:2>_Johnson_Van_Ness_<e:2>_<sep>". In this way, the corresponding training term constraint is obtained for each training sentence.
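
The construction of the training term constraint from the ordered target end terms could look roughly as follows; the function name and the sample call are illustrative only.

# --- illustrative sketch (Python) ---
def build_term_constraint(target_terms):
    """
    target_terms: training target end terms in the order they appear in the sentence,
                  each given as a list of tokens.
    Returns the training term constraint as a token list ending with the separator.
    """
    constraint = []
    for idx, term in enumerate(target_terms, start=1):
        constraint.append(f"<s:{idx}>")
        constraint.extend(term)
        constraint.append(f"<e:{idx}>")
    constraint.append("<sep>")
    return constraint

# e.g. build_term_constraint([["pink-male", "rescue"], ["Johnson", "Van", "Ness"]])
# -> ['<s:1>', 'pink-male', 'rescue', '<e:1>', '<s:2>', 'Johnson', 'Van', 'Ness', '<e:2>', '<sep>']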

Step S205, training the machine translation model according to the combined training corpus and the training term constraint.

In specific implementation, referring to fig. 3, the machine translation model may adopt a network architecture including an encoder (Encoder) and a decoder (Decoder). The Transformer model in the prior art, which also includes an encoder and a decoder, only supports a bilingual sentence pair as the input and output of the model, respectively. In the machine translation model of the embodiments of the disclosure, the source end sentence and the training term constraint are used as a joint input, and the target end sentence is used as the output. After being trained with the combined training corpus and the training term constraints, the machine translation model of the embodiments of the disclosure can learn the training term constraint items in the input training term constraint and generate them at the appropriate places in the translation result according to the contexts in which the corresponding training source end terms appear.

In specific implementation, referring to fig. 3, the input training source end sentence is transformed by the encoder through a series of neural network transformations into a high-dimensional vector representation, i.e., the intermediate representation, and the decoder is responsible for decoding this semantic information into the translation result. The encoder contains a word vector matrix, through which each word of the input training source end sentence is converted into a corresponding word vector, giving the word vector sequence of the training source end sentence. The decoder of the embodiments of the present disclosure is similar to the decoder of the existing Transformer model and is formed by stacking N identical decoding layers. It differs from the prior art in that its input includes, in addition to the intermediate representation of the training source end sentence, the corresponding training term constraint. At the input of each decoding layer, the two input streams are spliced by a newly added splicing module; at the output, they are separated by a newly added separation module to obtain a term constraint feature representation and a target end feature representation.

In specific implementation, the loss function is calculated using the target end feature representation. The loss function L(θ) is calculated as

L(θ) = - Σ_{(x, y, c) ∈ D} log P(y | x, c; θ)

where D is the combined training corpus (bilingual sentence pairs plus the corresponding training term constraints), and (x, y, c) is one piece of training data, consisting of a training source end sentence x, a training target end sentence y and a training term constraint c. Correspondingly, the output of the encoder, i.e. the intermediate representation x^L, is calculated as x^L = Encoder(x).

In some other embodiments, the term constraint feature representation and the target end feature representation may be used jointly to calculate the loss function: specifically, the two representations may be spliced before calculating the loss value, or loss values may be calculated from the two representations separately and then accumulated with weights.
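
A hedged sketch of the loss computation described above is given below; the <pad> id, the weighting factor alpha and the use of PyTorch's cross_entropy are assumptions of this sketch, not the original implementation.

# --- illustrative sketch (Python) ---
import torch.nn.functional as F

PAD_ID = 0  # assumed vocabulary id of the <pad> token

def training_loss(target_logits, target_ids,
                  constraint_logits=None, constraint_ids=None, alpha=0.5):
    # base loss: cross entropy computed from the target end feature representation
    # logits: (batch, length, vocab) -> cross_entropy expects (batch, vocab, length)
    loss = F.cross_entropy(target_logits.transpose(1, 2), target_ids, ignore_index=PAD_ID)
    # optional joint loss over the term constraint feature representation, accumulated by weight
    if constraint_logits is not None:
        loss = loss + alpha * F.cross_entropy(
            constraint_logits.transpose(1, 2), constraint_ids, ignore_index=PAD_ID)
    return loss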

In specific implementation, during training the training target end sentence corresponding to the training source end sentence is known, so when the decoder input is processed, the training target end sentence is shifted to the right, so that at each position the decoder only sees the part of the training target end sentence to the left of the current position. The training term constraint corresponding to the training target end sentence is likewise shifted to the right. Word vector representations of the training target end sentence and the training term constraint are then obtained through the shared target end word vector matrix. The machine translation model of the embodiments of the disclosure adopts relative position encoding, and in order to prevent the relative positions from changing dynamically during training, the training target end sentence and the training term constraint are padded differently: the training target end sentence is padded on the right side, and the training term constraint is padded on the left side. Specifically, for a training target end sentence, <pad> labels are added so that its length equals that of the other training target end sentences; for a training term constraint, <pad> labels are added so that its length equals that of the other training term constraints; right-side padding and left-side padding differ only in the direction in which the <pad> labels are added. For example, assume there are two training term constraints: "<s:1>_pink-male_rescue_<e:1>_<s:2>_Joe_<e:2>_<sep>" and "<s:1>_sunny_day_<e:1>_<sep>". The latter is padded on the left side when input to the decoder, and the result after left-side padding is "<pad>_<pad>_<pad>_<s:1>_sunny_day_<e:1>_<sep>", whose length is the same as that of the former.
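
A token-level sketch of the padding and right-offset handling described above might look as follows; the <bos> start token used for the right shift is an assumption of this sketch.

# --- illustrative sketch (Python) ---
def pad_right(tokens, max_len, pad="<pad>"):
    # training target end sentences are padded on the right
    return tokens + [pad] * (max_len - len(tokens))

def pad_left(tokens, max_len, pad="<pad>"):
    # training term constraints are padded on the left
    return [pad] * (max_len - len(tokens)) + tokens

def shift_right(tokens, bos="<bos>"):
    # teacher forcing: at step t the decoder only sees tokens to the left of t
    return [bos] + tokens[:-1]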

In specific implementation, referring to fig. 3, after the padding and the offset are applied, the training term constraint and the training target end sentence are spliced in the splicing module. During splicing, the training term constraint is placed in front and the training target end sentence is appended after it. It should be understood that, in some other embodiments, when the splicing module splices the training term constraint and the training target end sentence, the training target end sentence may instead be placed in front and the training term constraint appended after it.

In specific implementation, referring to fig. 3, the spliced training term constraint and training target-end sentence undergo improved multi-head self-attention processing in the improved multi-head self-attention module. Specifically, the improved multi-head self-attention processing first performs conventional multi-head self-attention processing: a self-attention weight matrix is obtained by inner-product calculation over the hidden states corresponding to each word, and attention masking is then applied (so that only the words up to the current step are considered when an output word is produced), yielding the training self-attention weight matrix after attention masking. After this matrix is obtained, the value at each position on its diagonal is extracted as a training standby value. Next, the positions in the matrix corresponding to all filling processing are assigned negative infinity; that is, the attention weight values at the positions corresponding to the <pad> tags added in the preceding filling step are set to negative infinity. Finally, according to the extracted training standby values, each position on the diagonal of the matrix is reassigned, giving a corrected training self-attention weight matrix. The output of the improved multi-head self-attention module, i.e. the updated vector representations of the training term constraint and the training target-end sentence, is obtained from this corrected matrix. Reassigning the diagonal positions in this way effectively prevents NaN (not-a-number) values, caused by the <pad> tags used in the filling processing, from occurring when the attention weight values are calculated, ensuring that the decoder works normally.
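A minimal single-head sketch of this masking correction, assuming PyTorch tensors; whether "positions corresponding to filling processing" means rows, columns, or both is an assumption here (both are masked), and a real implementation would be batched and multi-headed.

```python
import torch

def corrected_self_attention_weights(scores, pad_mask, causal_mask):
    """Sketch of the corrected self-attention weight matrix.

    scores      : (L, L) raw inner-product attention scores for the spliced
                  term constraint + target-end sentence (single head shown).
    pad_mask    : (L,) boolean, True where a <pad> tag was filled in.
    causal_mask : (L, L) boolean, True for positions that must be hidden so
                  that only the words up to the current step are visible.
    """
    scores = scores.masked_fill(causal_mask, float("-inf"))            # attention masking
    spare = scores.diagonal().clone()                                  # standby values
    scores = scores.masked_fill(pad_mask.unsqueeze(0), float("-inf"))  # pad columns -> -inf
    scores = scores.masked_fill(pad_mask.unsqueeze(1), float("-inf"))  # pad rows -> -inf
    idx = torch.arange(scores.size(0))
    scores[idx, idx] = spare                                           # reassign the diagonal
    return torch.softmax(scores, dim=-1)                               # no all-inf rows, so no NaN
```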

Referring to fig. 3, in addition to the aforementioned splicing module, improved multi-head self-attention module, and separation module, the machine translation model of the embodiment of the present disclosure includes: a multi-head attention module that performs multi-head attention processing based on the first attention feature output by the improved multi-head self-attention module and the intermediate representation output by the encoder; a feed-forward network that performs feature extraction based on the second attention feature output by the multi-head attention module to obtain the splicing feature; and a Softmax layer that performs a regression operation based on the target-end feature separated by the separation module to obtain the output probability. In addition, normalization processing and residual connection processing for preventing model degradation are applied to the outputs of the respective layers. These layers work in a manner similar to the corresponding layers in the prior art and are not described in detail in the embodiments of the present disclosure.

Based on the foregoing, for any decoding layer l of the N decoding layers included in the decoder of the machine translation model according to the embodiment of the present disclosure, the related calculation formulas are:

q^l = k^l = v^l = concat(c^(l-1), ŷ^(l-1))   (1)

s^l = ImprovedMultiHeadSelfAttn(q^l, k^l, v^l)   (2)

a^l = MultiHeadAttn(s^l, x^L, x^L)   (3)

o^l = FFN(a^l)   (4)

c^l = o^l[:len(c^(l-1))]   (5)

ŷ^l = o^l[len(c^(l-1)):]   (6)

wherein formula (1) is the calculation formula of the splicing processing performed by the splicing module, l-1 denotes the previous decoding layer, ŷ^(l-1) is the training target-end sentence containing only the current position and the part to its left, c^(l-1) is the training term constraint, and q^l, k^l, v^l are respectively the query, key, and value used for the multi-head self-attention calculation; formula (2) is the calculation formula of the improved multi-head self-attention processing performed by the improved multi-head self-attention module, and s^l is the first attention feature; formula (3) is the calculation formula of the multi-head attention processing performed by the multi-head attention module, and a^l is the second attention feature; formula (4) is the calculation formula of the splicing feature output by the feed-forward network, and o^l is the splicing feature; formulas (5) and (6) are respectively the calculation formulas of the separation processing performed by the separation module, where c^l is the term constraint feature representation and ŷ^l is the target-end feature representation.
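A PyTorch sketch of one such decoding layer following the reconstructed formulas (1)-(6); residual connections, normalization, and the diagonal/pad correction of the improved self-attention are omitted, and the module and parameter names are illustrative rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class ConstraintDecoderLayer(nn.Module):
    """Sketch of one decoding layer that splices the term constraint with the
    target-end sentence, attends over the encoder output, and separates the
    result into constraint and target-end feature representations."""

    def __init__(self, d_model, n_heads, d_ff):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                 nn.Linear(d_ff, d_model))

    def forward(self, c_prev, y_prev, x_enc, self_attn_mask=None):
        # (1) splice: term constraint in front, target-end sentence behind
        qkv = torch.cat([c_prev, y_prev], dim=1)
        # (2) improved multi-head self-attention (correction step omitted here)
        s, _ = self.self_attn(qkv, qkv, qkv, attn_mask=self_attn_mask)
        # (3) multi-head attention over the encoder's intermediate representation x^L
        a, _ = self.cross_attn(s, x_enc, x_enc)
        # (4) feed-forward network produces the splicing feature o^l
        o = self.ffn(a)
        # (5)-(6) separation: the first len(c) positions are the term constraint
        # feature representation, the rest the target-end feature representation
        c_len = c_prev.size(1)
        return o[:, :c_len], o[:, c_len:]
```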

Based on the same inventive concept, the embodiment of the disclosure also provides a machine translation method. Referring to fig. 4, the machine translation method includes the following steps:

step S401, obtaining a sentence to be translated and a term dictionary;

step S402, according to the term dictionary, carrying out matching retrieval on the sentence to be translated to obtain a plurality of term matching items; the term match includes: a source end term and its corresponding target end term;

step S403, adding a term position label to each target terminal term to obtain a plurality of term constraint items, and obtaining term constraint according to the term constraint items;

s404, inputting the sentence to be translated and the term constraint into a pre-trained machine translation model to obtain a translation result corresponding to the sentence to be translated; the machine translation model is obtained by training based on the training method of the machine translation model in any one of the above embodiments.

In specific implementation, the implementation of steps S401 to S403 may refer to step S201, step S202, and step S204 in the aforementioned embodiment of the training method of the machine translation model. Specifically, during translation, a sentence to be translated submitted by a user and a term dictionary of the corresponding technical field are acquired. The specific way in which the term matching items and the term constraints are obtained by matching retrieval may likewise refer to the foregoing embodiment of the training method of the machine translation model.
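As an illustration of steps S401 to S403, the following sketch builds a term constraint from a term dictionary by simple substring matching; the tokenization, longest-match policy, and tag format are assumptions based on the example given earlier, not the disclosed retrieval procedure.

```python
def build_term_constraint(sentence, term_dict, sep="<sep>"):
    """Match source-end terms from the dictionary in the sentence to be
    translated and build the term constraint with position labels
    <s:i> ... <e:i>, ordered by first appearance in the sentence."""
    matches = []
    for src_term, tgt_term in term_dict.items():
        pos = sentence.find(src_term)
        if pos != -1:
            matches.append((pos, src_term, tgt_term))
    matches.sort()  # index order follows first appearance in the source sentence
    items = [f"<s:{i}> {tgt_term} <e:{i}>"
             for i, (_, _, tgt_term) in enumerate(matches, start=1)]
    return " ".join(items) + f" {sep}" if items else ""
```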

In step S404, referring to fig. 3, the machine translation model includes an encoder and a decoder; the decoder includes: a splicing module, an improved multi-head self-attention module, a multi-head attention module, a feed-forward network, and a separation module. The difference from the aforementioned embodiment of the training method of the machine translation model is that the input of the encoder is the sentence to be translated, and the input of the decoder is the intermediate representation of the sentence to be translated output by the encoder together with the corresponding term constraint. The decoder outputs a plurality of output words step by step to obtain the translation result corresponding to the sentence to be translated. After the decoder has output all the output words, the term position labels in the translation result are further removed; once all the term position labels are removed, the final translation result output to the user is obtained.
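A small sketch of the final label-removal step, assuming the <s:i>/<e:i> tag format shown earlier; the regular expression and function name are illustrative.

```python
import re

def strip_term_position_labels(translation: str) -> str:
    """Remove <s:i> / <e:i> term position labels from the decoder output to
    produce the final translation returned to the user."""
    without_tags = re.sub(r"<[se]:\d+>", " ", translation)
    return " ".join(without_tags.split())  # normalize whitespace
```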

In specific implementation, the machine translation model outputs one output word at each step; the specific sub-steps are as follows (a decoding-loop sketch is given after these steps):

at the splicing module, splicing the term constraint with the output word; wherein the term constraint is subjected to a filling process so that the filled term constraint has the same length as the other term constraints.

Performing improved multi-head self-attention processing on the spliced term constraint and the output word at the improved multi-head self-attention module to obtain a first attention feature;

performing multi-head attention processing on the multi-head attention module based on the intermediate representation and the first attention feature to obtain a second attention feature;

performing feature extraction on the second attention feature in the feedforward network to obtain a splicing feature;

at the separation module, separating the splicing feature to obtain the term constraint feature representation and the target-end feature representation; the target-end feature representation is used to determine the output word of the current step.
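A hypothetical greedy decoding loop tying these sub-steps together; the encoder/decoder call signatures, the output projection, and the id conventions are assumptions for the sketch, not the disclosed interface.

```python
import torch

def greedy_translate(encoder, decoder, output_proj, src_ids, constraint_ids,
                     bos_id, eos_id, max_len=128):
    """Illustrative greedy decoding: at each step the term constraint is
    spliced with the words produced so far, and the last target-end feature
    is projected through the Softmax layer to pick the next output word."""
    x_enc = encoder(src_ids)                     # intermediate representation
    out_ids = [bos_id]
    for _ in range(max_len):
        _, target_feat = decoder(constraint_ids, torch.tensor([out_ids]), x_enc)
        next_id = output_proj(target_feat[:, -1]).softmax(-1).argmax(-1).item()
        out_ids.append(next_id)
        if next_id == eos_id:
            break
    return out_ids[1:]                           # term position labels stripped afterwards
```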

In a specific implementation, in the improved multi-head self-attention module, the spliced term constraint and the output word are subjected to improved multi-head self-attention processing to obtain a first attention feature, which specifically includes:

acquiring a self-attention weight matrix of the spliced term constraint and the output word;

extracting the values at the positions on the diagonal of the self-attention weight matrix as standby values;

assigning positions corresponding to all filling processing in the self-attention weight matrix as negative infinity;

according to the standby value, reassigning each position on the diagonal line of the self-attention weight matrix to obtain a corrected self-attention weight matrix;

a first attention feature is obtained based on the modified self-attention weight matrix.

Exemplary device

Referring to fig. 5, based on the same inventive concept as the above-mentioned embodiment of the training method of any machine translation model, the disclosed embodiment further provides a training apparatus of a machine translation model, including:

an obtaining module 501 configured to obtain an original training corpus and a training term dictionary;

a matching module 502 configured to perform matching retrieval on the original training corpus according to the training term dictionary to obtain a plurality of training term matching items; the training term matches include: a training source end term and a training target end term corresponding to the training source end term;

a corpus generating module 503 configured to generate an auxiliary training corpus according to a plurality of training term matching items, and combine the original training corpus and the auxiliary training corpus to obtain a combined training corpus;

a constraint generating module 504 configured to add a training term position label to each of the training target terms to obtain a plurality of training term constraint items, and obtain a training term constraint according to the plurality of training term constraint items;

a training module 505 configured to train the machine translation model according to the combined training corpus and the training term constraints.

In some optional embodiments, the training module 505 is specifically configured to construct a loss function according to the combined training corpus and the training term constraint; and training the machine translation model by taking the minimum loss function as a training target.

In some optional embodiments, the training module 505 is specifically configured to, for any training target end sentence in the combined training corpus, perform right side filling and right side shifting on the training target end sentence; and performing left filling and right offset on the training term constraint corresponding to the training target end sentence.

In some optional embodiments, the training module 505 is specifically configured to splice the training term constraints and the training target-side sentences; and performing improved multi-head self-attention processing on the spliced training term constraint and the training target end sentence. The improved multi-head self-attention processing method specifically comprises the following steps: acquiring a training self-attention weight matrix of the spliced training term constraint and the training target end sentence, and extracting values of all positions on a diagonal line of the training self-attention weight matrix to be used as training standby values; assigning positions corresponding to all filling processing in a self-attention weight matrix for training as negative infinity; according to the training standby value, reassigning each position on a diagonal line of the training self-attention weight matrix to obtain a corrected training self-attention weight matrix; and obtaining updated spliced training term constraints and the representation of the training target end sentence according to the corrected training self-attention weight matrix.

In some optional embodiments, the corpus generating module 503 is specifically configured to extract all bilingual sentence pairs including the term matching term for training from the original corpus to obtain a first sub-auxiliary corpus; for each training term matching item in the first sub-auxiliary training corpus, adding the training term position label to at least one of the training source end term and the training target end term included in the training term matching item to obtain a second sub-auxiliary training corpus; combining the first sub-auxiliary training corpus and the second sub-auxiliary training corpus as the auxiliary training corpus with the original training corpus to obtain the combined training corpus; or, the second sub-auxiliary corpus is used as the auxiliary corpus and is combined with the original corpus to obtain the combined corpus.
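A sketch of how the corpus generating module might build the two sub-auxiliary training corpora, under the assumption that the training term matching items are given per bilingual sentence pair and that labels follow the <s:i>/<e:i> format; the data layout and the choice to tag both source and target terms are illustrative.

```python
def build_auxiliary_corpus(original_corpus, matches_per_pair, tag_source=True):
    """Keep bilingual sentence pairs containing training term matching items
    (first sub-auxiliary corpus) and add training term position labels to the
    matched terms (second sub-auxiliary corpus)."""
    first_sub, second_sub = [], []
    for (src, tgt), matches in zip(original_corpus, matches_per_pair):
        if not matches:
            continue
        first_sub.append((src, tgt))
        tagged_src, tagged_tgt = src, tgt
        for i, (src_term, tgt_term) in enumerate(matches, start=1):
            if tag_source:
                tagged_src = tagged_src.replace(src_term, f"<s:{i}> {src_term} <e:{i}>")
            tagged_tgt = tagged_tgt.replace(tgt_term, f"<s:{i}> {tgt_term} <e:{i}>")
        second_sub.append((tagged_src, tagged_tgt))
    return first_sub, second_sub
```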

In some optional embodiments, the constraint generating module 504 is specifically configured to sequentially splice several training term constraint items and add a separator symbol (such as <sep>) after the last training term constraint item to obtain the training term constraint.

In some alternative embodiments, the training term position label comprises: a start position symbol and an end position symbol; both the start position symbol and the end position symbol further include index information, which expresses the order in which the training term matching item corresponding to the training term position label first appears among all the training term matching items contained in the training sentence to which the training term position label belongs.

The apparatus in the foregoing embodiment is used to implement the corresponding method for training a machine translation model in any one of the foregoing exemplary method for training a machine translation model, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.

Referring to fig. 6, based on the same inventive concept as any of the above embodiments of the machine translation method, the embodiments of the present disclosure further provide a machine translation apparatus, including:

an obtaining module 601 configured to obtain a sentence to be translated and a term dictionary;

a matching module 602, configured to perform matching retrieval on the sentence to be translated according to the term dictionary to obtain a plurality of term matching items; the term match includes: a source end term and its corresponding target end term;

a constraint generating module 603 configured to add a term location tag to each of the target-end terms to obtain a plurality of term constraint terms, and obtain a term constraint according to the plurality of term constraint terms;

a translation module 604, configured to input the sentence to be translated and the term constraint into a pre-trained machine translation model, so as to obtain a translation result corresponding to the sentence to be translated; the machine translation model is obtained by training based on the training method of the machine translation model in any one of the above embodiments.

In some optional embodiments, the machine translation model comprises an encoder and a decoder; the decoder includes: a splicing module, an improved multi-head self-attention module, a multi-head attention module, a feed-forward network, and a separation module.

In some optional embodiments, the translation module 604 is specifically configured to input the sentence to be translated into the encoder, so as to obtain an intermediate representation corresponding to the sentence to be translated; inputting the intermediate representation and the term constraint into the decoder to obtain a translation result corresponding to the statement to be translated; the translation result includes a number of output words that are output step-by-step by the machine translation model.

In some optional embodiments, the outputting, by the machine translation model, of one output word at each step specifically includes: at the splicing module, splicing the term constraint with the output word; at the improved multi-head self-attention module, performing improved multi-head self-attention processing on the spliced term constraint and the output word to obtain a first attention feature; at the multi-head attention module, performing multi-head attention processing based on the intermediate representation and the first attention feature to obtain a second attention feature; in the feed-forward network, performing feature extraction on the second attention feature to obtain a splicing feature; at the separation module, separating the splicing feature to obtain the term constraint feature representation and the target-end feature representation; the target-end feature representation is used to determine the output word of the current step.

In some optional embodiments, the performing, at the improved multi-head self-attention module, improved multi-head self-attention processing on the term constraint after the concatenation and the output word to obtain a first attention feature specifically includes: acquiring a self-attention weight matrix of the spliced term constraint and the output word; extracting values of positions on a diagonal line of the self-attention weight matrix as standby values; assigning positions corresponding to all filling processing in the self-attention weight matrix as negative infinity; according to the standby value, reassigning each position on the diagonal line of the self-attention weight matrix to obtain a corrected self-attention weight matrix; and obtaining the first attention feature according to the corrected self-attention weight matrix.

The apparatus of the foregoing embodiment is used to implement the corresponding machine translation method in any embodiment of the foregoing exemplary machine translation method portion, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.

Based on the same inventive concept as any of the above embodiments of the training method and the machine translation method for the machine translation model, an embodiment of the present disclosure further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the training method and/or the machine translation method for the machine translation model according to any of the above embodiments are implemented.

Fig. 7 is a schematic diagram illustrating a more specific hardware structure of an electronic device according to this embodiment, where the electronic device may include: a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050. Wherein the processor 1010, memory 1020, input/output interface 1030, and communication interface 1040 are communicatively coupled to each other within the device via bus 1050.

The processor 1010 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits, and is configured to execute related programs to implement the technical solutions provided in the embodiments of the present disclosure.

The Memory 1020 may be implemented in the form of a ROM (Read Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1020 may store an operating system and other application programs, and when the technical solution provided by the embodiments of the present specification is implemented by software or firmware, the relevant program codes are stored in the memory 1020 and called to be executed by the processor 1010.

The input/output interface 1030 is used for connecting an input/output module to input and output information. The i/o module may be configured as a component in a device (not shown) or may be external to the device to provide a corresponding function. The input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc., and the output devices may include a display, a speaker, a vibrator, an indicator light, etc.

The communication interface 1040 is used for connecting a communication module (not shown in the drawings) to implement communication interaction between the present apparatus and other apparatuses. The communication module can realize communication in a wired mode (such as USB, network cable and the like) and also can realize communication in a wireless mode (such as mobile network, WIFI, Bluetooth and the like).

Bus 1050 includes a path that transfers information between various components of the device, such as processor 1010, memory 1020, input/output interface 1030, and communication interface 1040.

It should be noted that although the above-mentioned device only shows the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040 and the bus 1050, in a specific implementation, the device may also include other components necessary for normal operation. In addition, those skilled in the art will appreciate that the above-described apparatus may also include only those components necessary to implement the embodiments of the present description, and not necessarily all of the components shown in the figures.

The electronic device of the foregoing embodiment is used to implement the training method and/or the machine translation method of the corresponding machine translation model in any embodiment of the foregoing exemplary method portions, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.

Exemplary program product

Based on the same inventive concept as any of the above-described machine translation model training method embodiments or machine translation method embodiments, the disclosed embodiments also provide a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the machine translation model training method and/or the machine translation method according to any of the above-described exemplary method portions.

The non-transitory computer readable storage medium may be any available medium or data storage device that can be accessed by a computer, including but not limited to magnetic memory (e.g., floppy disks, hard disks, magnetic tape, magneto-optical disks (MOs), etc.), optical memory (e.g., CDs, DVDs, BDs, HVDs, etc.), and semiconductor memory (e.g., ROMs, EPROMs, EEPROMs, non-volatile memory (NAND FLASH), Solid State Disks (SSDs)), etc.

The computer instructions stored in the storage medium of the above embodiment are used to enable the computer to execute the training method and/or the machine translation method of the machine translation model according to any embodiment of the above exemplary method section, and have the beneficial effects of the corresponding method embodiment, which are not described herein again.

As will be appreciated by one skilled in the art, embodiments of the present invention may be embodied as a system, method, or computer program product. Accordingly, the present disclosure may be embodied in the following forms: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software, which are generally referred to herein as a "circuit," "module," or "system." Furthermore, in some embodiments, the invention may also be embodied in the form of a computer program product in one or more computer-readable media having computer-readable program code embodied therein.

Any combination of one or more computer-readable media may be employed. The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk, or C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter case, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instruction means which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

Moreover, while the operations of the method of the invention are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in that particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Rather, the order of execution of the steps depicted in the flowcharts may be changed. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be broken down into multiple steps.

Use of the verbs "comprise," "include," and their conjugations in this application does not exclude the presence of elements or steps other than those stated in this application. The article "a" or "an" preceding an element does not exclude the presence of a plurality of such elements.

While the spirit and principles of the invention have been described with reference to several particular embodiments, it is to be understood that the invention is not limited to the disclosed embodiments, nor does the division into aspects imply that features in these aspects cannot be combined to advantage; that division is for convenience of description only. The invention is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims. The scope of the following claims is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures and functions.
