Translation model training method and device and translation model training device

Document serial number: 1905210  Publication date: 2021-11-30  Views: 12  Language: Chinese

Reading note: This technology, "Translation model training method and device and translation model training device", was designed and created by Zhang Pei, Zhang Jie, Zhang Xu, Yu Jian, and Chen Wei on 2021-07-29. Its main content is as follows: An embodiment of the invention provides a translation model training method and device, and a device for translation model training. The method comprises the following steps: acquiring a language training sample, wherein the language training sample comprises training sentences and target sentences corresponding to the training sentences; jointly processing the training sentences based on a trained language model and a translation model to be trained to obtain translation results corresponding to the training sentences; and calculating a loss value of the translation model according to the translation results and the target sentences, and adjusting model parameters of the translation model according to the loss value. The embodiment of the invention can quickly and effectively improve the fluency of the translation model's output and improve the translation performance of the translation model.

1. A method for training a translation model, the method comprising:

acquiring a language training sample, wherein the language training sample comprises training sentences and target sentences corresponding to the training sentences;

jointly processing the training sentences based on the trained language model and the translation model to be trained to obtain translation results corresponding to the training sentences;

and calculating a loss value of the translation model according to the translation result and the target sentence, and adjusting model parameters of the translation model according to the loss value.

2. The method according to claim 1, wherein jointly processing the training sentences based on the trained language model and the translation model to be trained to obtain the translation results corresponding to the training sentences comprises:

inputting the training sentences into an encoder of the translation model for encoding processing to obtain an encoding information matrix, wherein the encoding information matrix comprises word vectors of all words in the training sentences;

determining a reference result matrix according to the current word to be translated in the training sentence, wherein the reference result matrix comprises reference word vectors corresponding to words before the current word to be translated in the training sentence;

and jointly processing the encoding information matrix and the reference result matrix based on the trained language model and the translation model to be trained to obtain a translation result corresponding to the training sentence.

3. The method according to claim 2, wherein jointly processing the encoding information matrix and the reference result matrix based on the trained language model and the translation model to be trained to obtain the translation result corresponding to the training sentence comprises:

fusing the decoder of the language model with the decoder of the translation model to obtain a target decoder;

inputting the encoding information matrix and the reference result matrix into the target decoder for decoding processing to obtain a translation result corresponding to the training sentence;

calculating a loss value of the translation model according to the translation result and the target sentence, wherein the calculating comprises:

and calculating a loss value of the translation model according to the translation result corresponding to the training sentence and the target sentence.

4. The method according to claim 3, wherein the inputting the encoding information matrix and the reference result matrix into the target decoder for decoding to obtain the translation result corresponding to the training sentence comprises:

inputting the reference result matrix into a first network layer and a second network layer of the target decoder respectively to obtain a first output matrix and a second output matrix, wherein the first network layer belongs to the decoder of the language model, and the second network layer belongs to the decoder of the translation model;

carrying out weighted summation on the first output matrix and the second output matrix to obtain a fusion matrix;

and inputting the encoding information matrix and the fusion matrix into a third network layer of the target decoder to obtain a translation result corresponding to the training sentence, wherein the third network layer belongs to the decoder of the translation model.

5. The method of claim 4, wherein the weighted summation of the first output matrix and the second output matrix to obtain a fusion matrix comprises:

adjusting the weight values of the first output matrix and/or the second output matrix according to the fluency of the translation results corresponding to the sentences in the previous round of training;

and performing a weighted summation of the first output matrix and the second output matrix according to the adjusted weight values to obtain a fusion matrix.

6. The method according to claim 2, wherein jointly processing the encoding information matrix and the reference result matrix based on the trained language model and the translation model to be trained to obtain the translation result corresponding to the training sentence comprises:

inputting the reference result matrix into a decoder of the language model for decoding processing to obtain a language processing result corresponding to the training sentence;

inputting the encoding information matrix and the reference result matrix into a decoder of the translation model for decoding processing to obtain a translation result corresponding to the training sentence;

the calculating a loss value of the translation model according to the translation result and the target sentence, and adjusting a model parameter of the translation model according to the loss value includes:

determining a first loss value according to the reference result matrix and a language processing result corresponding to the training sentence;

determining a second loss value according to the translation result corresponding to the training sentence and the target sentence;

and carrying out weighted summation on the first loss value and the second loss value to obtain a joint loss value of the translation model and the language model, and adjusting model parameters of the translation model according to the joint loss value.

7. The method of claim 6, wherein the decoder of the language model and the decoder of the translation model share a classification network layer.

8. An apparatus for training a translation model, the apparatus comprising:

the training sample acquisition module is used for acquiring a language training sample, wherein the language training sample comprises training sentences and target sentences corresponding to the training sentences;

the joint processing module is used for carrying out joint processing on the training sentences based on the trained language model and the translation model to be trained to obtain translation results corresponding to the training sentences;

and the parameter adjusting module is used for calculating a loss value of the translation model according to the translation result and the target sentence and adjusting the model parameters of the translation model according to the loss value.

9. The apparatus of claim 8, wherein the joint processing module comprises:

the first matrix determination submodule is used for inputting the training sentences into an encoder of the translation model for encoding processing to obtain an encoding information matrix, and the encoding information matrix comprises word vectors of all words in the training sentences;

a second matrix determining submodule, configured to determine a reference result matrix according to a current word to be translated in the training sentence, where the reference result matrix includes reference word vectors corresponding to words before the current word to be translated in the training sentence;

and the joint processing submodule is used for jointly processing the encoding information matrix and the reference result matrix based on the trained language model and the translation model to be trained so as to obtain a translation result corresponding to the training sentence.

10. The apparatus of claim 9, wherein the joint processing submodule comprises:

the decoder fusion unit is used for fusing the decoder of the language model with the decoder of the translation model to obtain a target decoder;

the decoding processing unit is used for inputting the encoding information matrix and the reference result matrix into the target decoder for decoding processing to obtain a translation result corresponding to the training sentence;

the parameter adjusting module comprises:

and the loss value calculation submodule is used for calculating the loss value of the translation model according to the translation result corresponding to the training sentence and the target sentence.

11. The apparatus of claim 10, wherein the decoding processing unit comprises:

the first processing subunit is configured to input the reference result matrix into a first network layer and a second network layer of the target decoder, respectively, to obtain a first output matrix and a second output matrix, where the first network layer belongs to the decoder of the language model and the second network layer belongs to the decoder of the translation model;

the second processing subunit is used for carrying out weighted summation on the first output matrix and the second output matrix to obtain a fusion matrix;

and the third processing subunit is configured to input the encoding information matrix and the fusion matrix into a third network layer of the target decoder, so as to obtain a translation result corresponding to the training sentence, where the third network layer belongs to the decoder of the translation model.

12. The apparatus of claim 11, wherein the second processing subunit is further configured to:

adjusting the weight values of the first output matrix and/or the second output matrix according to the fluency of the translation results corresponding to the sentences in the previous round of training;

and performing a weighted summation of the first output matrix and the second output matrix according to the adjusted weight values to obtain a fusion matrix.

13. The apparatus of claim 9, wherein the joint processing submodule comprises:

the language model decoding unit is used for inputting the reference result matrix into a decoder of the language model for decoding processing to obtain a language processing result corresponding to the training sentence;

the translation model decoding unit is used for inputting the encoding information matrix and the reference result matrix into a decoder of the translation model for decoding processing to obtain a translation result corresponding to the training sentence;

the parameter adjusting module comprises:

the first loss value determining submodule is used for determining a first loss value according to the reference result matrix and the language processing result corresponding to the training sentence;

a second loss value determining submodule, configured to determine a second loss value according to the translation result corresponding to the training sentence and the target sentence;

and the joint loss value determining submodule is used for weighting and summing the first loss value and the second loss value to obtain a joint loss value of the translation model and the language model, and adjusting the model parameters of the translation model according to the joint loss value.

14. An apparatus for translation model training, the apparatus comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs comprising instructions for performing the method of training a translation model according to any one of claims 1 to 7.

15. A machine-readable medium having stored thereon instructions which, when executed by one or more processors, cause an apparatus to perform a method of training a translation model according to any of claims 1 to 7.

Technical Field

The invention relates to the field of artificial intelligence, and in particular to a translation model training method and device, and a device for translation model training.

Background

As computing power has improved, neural networks have found increasingly wide application; for example, translation models are built to convert a sentence to be translated into a target sentence.

However, because the translation model lacks temporal (word-order) information, its translated sentences may be unsmooth and incoherent. Moreover, a translation model generally has an end-to-end network structure comprising an encoder with multiple encoding layers and a decoder with multiple decoding layers; this structural complexity makes improving the translation model's performance more difficult. Alternatively, the translation model can be trained and optimized by additionally introducing a large amount of training corpora, but the training period is long and the computing resources consumed are high.

Therefore, how to quickly and effectively improve the fluency of a translation model's output, and thereby its translation performance, has become an urgent problem.

Disclosure of Invention

The embodiment of the invention provides a translation model training method and device and a translation model training device, which can quickly and effectively improve the fluency of translation results of a translation model and improve the translation performance of the translation model.

In order to solve the above problem, an embodiment of the present invention discloses a method for training a translation model, where the method includes:

acquiring a language training sample, wherein the language training sample comprises training sentences and target sentences corresponding to the training sentences;

jointly processing the training sentences based on the trained language model and the translation model to be trained to obtain translation results corresponding to the training sentences;

and calculating a loss value of the translation model according to the translation result and the target sentence, and adjusting model parameters of the translation model according to the loss value.

Optionally, jointly processing the training sentences based on the trained language model and the translation model to be trained to obtain a translation result corresponding to the training sentences includes:

inputting the training sentences into an encoder of the translation model for encoding processing to obtain an encoding information matrix, wherein the encoding information matrix comprises word vectors of all words in the training sentences;

determining a reference result matrix according to the current word to be translated in the training sentence, wherein the reference result matrix comprises reference word vectors corresponding to words before the current word to be translated in the training sentence;

and jointly processing the encoding information matrix and the reference result matrix based on the trained language model and the translation model to be trained to obtain a translation result corresponding to the training sentence.
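To make the three steps above concrete, here is a minimal, self-contained sketch in plain Python. It is not part of the patented embodiment; the vocabulary, embedding table, dimension, and sentence pair are assumptions invented purely for illustration.

```python
import random

random.seed(0)
D = 4  # embedding dimension (assumed for illustration)

# Hypothetical embedding table covering a toy source/target vocabulary.
VOCAB = ["ich", "liebe", "dich", "<s>", "I", "love", "you"]
EMBED = {w: [random.gauss(0, 1) for _ in range(D)] for w in VOCAB}

def encode(training_sentence):
    """Toy stand-in for the translation model's encoder: the encoding
    information matrix holds one word vector per word in the sentence."""
    return [EMBED[w] for w in training_sentence]

def reference_result_matrix(target_sentence, current_index):
    """Reference word vectors for the words *before* the word currently
    being translated, i.e. the known target prefix (teacher forcing)."""
    return [EMBED[w] for w in target_sentence[:current_index]]

src = ["ich", "liebe", "dich"]
tgt = ["<s>", "I", "love", "you"]

enc_matrix = encode(src)                      # 3 word vectors, one per source word
ref_matrix = reference_result_matrix(tgt, 2)  # prefix "<s> I": 2 reference vectors
```

At training time the reference prefix comes from the known target sentence, so the matrix for position i contains exactly the i reference word vectors preceding the word being translated.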

Optionally, jointly processing the encoding information matrix and the reference result matrix based on the trained language model and the translation model to be trained to obtain a translation result corresponding to the training sentence includes:

fusing the decoder of the language model with the decoder of the translation model to obtain a target decoder;

inputting the encoding information matrix and the reference result matrix into the target decoder for decoding processing to obtain a translation result corresponding to the training sentence;

calculating a loss value of the translation model according to the translation result and the target sentence, wherein the calculating comprises:

and calculating a loss value of the translation model according to the translation result corresponding to the training sentence and the target sentence.

Optionally, inputting the encoding information matrix and the reference result matrix into the target decoder for decoding to obtain a translation result corresponding to the training sentence includes:

inputting the reference result matrix into a first network layer and a second network layer of the target decoder respectively to obtain a first output matrix and a second output matrix, wherein the first network layer belongs to the decoder of the language model, and the second network layer belongs to the decoder of the translation model;

carrying out weighted summation on the first output matrix and the second output matrix to obtain a fusion matrix;

and inputting the encoding information matrix and the fusion matrix into a third network layer of the target decoder to obtain a translation result corresponding to the training sentence, wherein the third network layer belongs to the decoder of the translation model.
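A rough sketch of this fused decoding step follows, assuming simple linear layers and dot-product attention as stand-ins for the actual network layers; the weight matrices, the feature dimension, and the fusion weight ALPHA are all invented for illustration and are not specified by the source.

```python
import math
import random

random.seed(1)
D = 4  # feature dimension (assumed)

def matmul(A, B):
    """Multiply matrix A (m x k) by matrix B (k x n), both lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

# Hypothetical layer weights: W_LM stands for the first network layer (from the
# trained language model's decoder), W_TM for the second network layer (from the
# translation model's decoder), and W_OUT for the third network layer.
W_LM = [[random.gauss(0, 1) for _ in range(D)] for _ in range(D)]
W_TM = [[random.gauss(0, 1) for _ in range(D)] for _ in range(D)]
W_OUT = [[random.gauss(0, 1) for _ in range(D)] for _ in range(D)]
ALPHA = 0.3  # weight of the language-model branch (assumed value)

def fused_decoder_step(enc_matrix, ref_matrix):
    first_out = matmul(ref_matrix, W_LM)   # first network layer output
    second_out = matmul(ref_matrix, W_TM)  # second network layer output
    # Weighted summation of the two outputs gives the fusion matrix.
    fusion = [[ALPHA * a + (1 - ALPHA) * b for a, b in zip(r1, r2)]
              for r1, r2 in zip(first_out, second_out)]
    # The third network layer consumes the fusion matrix together with the
    # encoding information matrix; simple dot-product attention over the
    # encoder outputs stands in for its internal structure here.
    scores = matmul(fusion, [list(c) for c in zip(*enc_matrix)])
    attn = [softmax(row) for row in scores]
    context = matmul(attn, enc_matrix)
    return matmul(context, W_OUT)

enc = [[random.gauss(0, 1) for _ in range(D)] for _ in range(3)]  # encoding information matrix
ref = [[random.gauss(0, 1) for _ in range(D)] for _ in range(2)]  # reference result matrix
out = fused_decoder_step(enc, ref)  # one output row per reference-prefix word
```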

Optionally, the performing weighted summation on the first output matrix and the second output matrix to obtain a fusion matrix includes:

adjusting the weight values of the first output matrix and/or the second output matrix according to the fluency of the translation results corresponding to the sentences in the previous round of training;

and performing a weighted summation of the first output matrix and the second output matrix according to the adjusted weight values to obtain a fusion matrix.
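The source does not specify how the weight values are updated from the previous round's fluency, so the sketch below assumes a simple threshold rule: raise the language-model branch's weight when the previous round's translations were less fluent than a target, and lower it otherwise.

```python
def adjust_fusion_weight(alpha, prev_fluency, target=0.8, step=0.05):
    """Toy adjustment rule (an assumption; the source does not specify one):
    if the previous training round's translations were less fluent than the
    target, increase the language-model branch's weight alpha, otherwise
    decrease it. The weight is clamped to [0, 1]."""
    if prev_fluency < target:
        return min(1.0, alpha + step)
    return max(0.0, alpha - step)

# The translation-model branch keeps the complementary weight 1 - alpha,
# so raising alpha shifts the fusion toward the language model.
alpha = adjust_fusion_weight(0.3, prev_fluency=0.6)  # fluency too low: alpha grows
```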

Optionally, jointly processing the encoding information matrix and the reference result matrix based on the trained language model and the translation model to be trained to obtain a translation result corresponding to the training sentence includes:

inputting the reference result matrix into a decoder of the language model for decoding processing to obtain a language processing result corresponding to the training sentence;

inputting the encoding information matrix and the reference result matrix into a decoder of the translation model for decoding processing to obtain a translation result corresponding to the training sentence;

the calculating a loss value of the translation model according to the translation result and the target sentence, and adjusting a model parameter of the translation model according to the loss value includes:

determining a first loss value according to the reference result matrix and a language processing result corresponding to the training sentence;

determining a second loss value according to the translation result corresponding to the training sentence and the target sentence;

and carrying out weighted summation on the first loss value and the second loss value to obtain a joint loss value of the translation model and the language model, and adjusting model parameters of the translation model according to the joint loss value.
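Assuming both losses are cross-entropy (the source does not name the loss function) and that the weight beta is a hyperparameter, the joint loss of this variant might be sketched as:

```python
import math

def cross_entropy(prob_rows, target_ids):
    """Mean negative log-probability of each target id under the model's
    predicted distribution for that position."""
    nll = [-math.log(row[t] + 1e-9) for row, t in zip(prob_rows, target_ids)]
    return sum(nll) / len(nll)

def joint_loss(lm_probs, lm_targets, tm_probs, tm_targets, beta=0.2):
    """Weighted summation of the two losses. The first loss compares the
    language processing result with the reference; the second compares the
    translation result with the target sentence. beta is an assumed weight."""
    first_loss = cross_entropy(lm_probs, lm_targets)
    second_loss = cross_entropy(tm_probs, tm_targets)
    return beta * first_loss + (1 - beta) * second_loss

# Toy two-word example over a three-word vocabulary.
lm_probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
tm_probs = [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1]]
loss = joint_loss(lm_probs, [0, 1], tm_probs, [0, 1])
```

Note that, per the method, only the translation model's parameters are adjusted with this joint loss; the trained language model remains fixed.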

Optionally, the decoder of the language model and the decoder of the translation model share a classification network layer.
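A sharing arrangement of this kind can be sketched as a single output projection used by both decoders, so that their output distributions live over the same target vocabulary; the sizes and weights below are invented for illustration.

```python
import random

random.seed(2)
D, V = 4, 7  # hidden size and vocabulary size (assumed)

# One output projection (the "classification network layer") shared by both
# the language model's decoder and the translation model's decoder.
W_SHARED = [[random.gauss(0, 1) for _ in range(V)] for _ in range(D)]

def classify(hidden_rows):
    """Project decoder hidden states to vocabulary logits via the shared weights."""
    return [[sum(h * w for h, w in zip(row, col)) for col in zip(*W_SHARED)]
            for row in hidden_rows]

lm_logits = classify([[0.1, -0.2, 0.3, 0.0]])  # from the language model decoder
tm_logits = classify([[0.5, 0.1, -0.1, 0.2]])  # from the translation model decoder
```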

On the other hand, the embodiment of the invention discloses a device for training a translation model, which comprises:

the training sample acquisition module is used for acquiring a language training sample, wherein the language training sample comprises training sentences and target sentences corresponding to the training sentences;

the joint processing module is used for carrying out joint processing on the training sentences based on the trained language model and the translation model to be trained to obtain translation results corresponding to the training sentences;

and the parameter adjusting module is used for calculating a loss value of the translation model according to the translation result and the target sentence and adjusting the model parameters of the translation model according to the loss value.

Optionally, the joint processing module includes:

the first matrix determination submodule is used for inputting the training sentences into an encoder of the translation model for encoding processing to obtain an encoding information matrix, and the encoding information matrix comprises word vectors of all words in the training sentences;

a second matrix determining submodule, configured to determine a reference result matrix according to a current word to be translated in the training sentence, where the reference result matrix includes reference word vectors corresponding to words before the current word to be translated in the training sentence;

and the joint processing submodule is used for jointly processing the encoding information matrix and the reference result matrix based on the trained language model and the translation model to be trained so as to obtain a translation result corresponding to the training sentence.

Optionally, the joint processing submodule includes:

the decoder fusion unit is used for fusing the decoder of the language model with the decoder of the translation model to obtain a target decoder;

the decoding processing unit is used for inputting the encoding information matrix and the reference result matrix into the target decoder for decoding processing to obtain a translation result corresponding to the training sentence;

the parameter adjusting module comprises:

and the loss value calculation submodule is used for calculating the loss value of the translation model according to the translation result corresponding to the training sentence and the target sentence.

Optionally, the decoding processing unit includes:

the first processing subunit is configured to input the reference result matrix into a first network layer and a second network layer of the target decoder, respectively, to obtain a first output matrix and a second output matrix, where the first network layer belongs to the decoder of the language model and the second network layer belongs to the decoder of the translation model;

the second processing subunit is used for carrying out weighted summation on the first output matrix and the second output matrix to obtain a fusion matrix;

and the third processing subunit is configured to input the encoding information matrix and the fusion matrix into a third network layer of the target decoder, so as to obtain a translation result corresponding to the training sentence, where the third network layer belongs to the decoder of the translation model.

Optionally, the second processing subunit is further configured to:

adjusting the weight values of the first output matrix and/or the second output matrix according to the fluency of the translation results corresponding to the sentences in the previous round of training;

and performing a weighted summation of the first output matrix and the second output matrix according to the adjusted weight values to obtain a fusion matrix.

Optionally, the joint processing submodule includes:

the language model decoding unit is used for inputting the reference result matrix into a decoder of the language model for decoding processing to obtain a language processing result corresponding to the training sentence;

the translation model decoding unit is used for inputting the encoding information matrix and the reference result matrix into a decoder of the translation model for decoding processing to obtain a translation result corresponding to the training sentence;

the parameter adjusting module comprises:

the first loss value determining submodule is used for determining a first loss value according to the reference result matrix and the language processing result corresponding to the training sentence;

a second loss value determining submodule, configured to determine a second loss value according to the translation result corresponding to the training sentence and the target sentence;

and the joint loss value determining submodule is used for weighting and summing the first loss value and the second loss value to obtain a joint loss value of the translation model and the language model, and adjusting the model parameters of the translation model according to the joint loss value.

Optionally, the decoder of the language model and the decoder of the translation model share a classification network layer.

In yet another aspect, an embodiment of the present invention discloses an apparatus for translation model training, the apparatus including a memory, and one or more programs, wherein the one or more programs are stored in the memory, and configured to be executed by one or more processors, the one or more programs including instructions for performing a method for training a translation model according to one or more of the foregoing descriptions.

In yet another aspect, an embodiment of the present invention discloses a machine-readable medium having stored thereon instructions, which, when executed by one or more processors, cause an apparatus to perform a method for training a translation model as described in one or more of the preceding.

The embodiment of the invention has the following advantages:

In an embodiment of the invention, training sentences in a training sample are jointly processed based on a trained language model and a translation model to be trained to obtain translation results corresponding to the training sentences; a loss value of the translation model is then calculated according to the translation results and the target sentences corresponding to the training sentences, and the model parameters of the translation model are adjusted according to the loss value. By introducing the language model into the training process of the translation model and using the trained language model together with the translation model to be trained to jointly process the training sentences, no large amount of additional training corpora needs to be introduced, which reduces the translation model's training time and computing-resource consumption, and the network structure of the translation model does not need to be changed. The fluency of the translation model's output can therefore be improved quickly and effectively, improving the translation performance of the translation model.

Drawings

To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive labor.

FIG. 1 is a schematic diagram of a computer system according to the present invention;

FIG. 2 is a flowchart illustrating the steps of one embodiment of a method for training a translation model of the present invention;

FIG. 3 is a diagram of a translation model architecture of the present invention;

FIG. 4 is a diagram illustrating a fused structure of a translation model and a language model according to the present invention;

FIG. 5 is a block diagram of a target decoder according to the present invention;

FIG. 6 is a schematic diagram of a fused structure of another translation model and language model of the present invention;

FIG. 7 is a block diagram of an embodiment of a device for training a translation model according to the present invention;

FIG. 8 is a block diagram of an apparatus 800 for translation model training of the present invention;

fig. 9 is a schematic diagram of a server in some embodiments of the invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art from the given embodiments without creative effort shall fall within the protection scope of the present invention.

Method embodiment

The embodiment of the invention provides a training method of a translation model, which can be applied to scenes such as machine translation, dialogue question answering and the like.

In a machine translation scenario, a translation model trained by the method provided by the embodiment of the invention can be applied to application programs supporting a translation function, such as electronic dictionary applications, e-book applications, web-browsing applications, social applications, and image-text recognition applications. When the application receives content to be translated, the trained translation model outputs a translation result according to that input. Illustratively, the content to be translated includes at least one of text, picture, audio, and video content; picture content includes pictures taken by a camera assembly of the terminal or pictures containing the content to be translated, which is not specifically limited in the embodiment of the present invention.

In a dialogue question-answering scenario, a translation model trained by the method provided by the embodiment of the invention can be applied to intelligent devices such as smart terminals or smart home appliances. Taking a virtual assistant installed on a smart terminal as an example, the virtual assistant's automatic question-answering function is realized through the trained translation model: the user poses a translation-related question to the virtual assistant, and when the virtual assistant receives the question, the translation model outputs a translation result according to the input. Further, the translation result may be converted into speech or text and fed back to the user through the virtual assistant. The question input by the user may be entered by voice or by text, which is not specifically limited in the embodiment of the present invention.

The above two application scenarios are only exemplary illustrations and do not constitute a limitation to the application scenarios of the embodiments of the present invention. The training method of the translation model provided by the embodiment of the invention can also be applied to any other scenes needing machine translation.

It should be noted that the method for training a translation model provided in the embodiment of the present invention may be applied to a computer device with data processing capability. In an alternative embodiment, the method for training the translation model provided by the embodiment of the present invention may be applied to a personal computer, a workstation, or a server, that is, machine translation and training of the translation model may be implemented by the personal computer, the workstation, or the server.

For the trained translation model, the trained translation model can become a part of an application program and is installed in the terminal, so that the terminal outputs a translation result when receiving the content to be translated; or the trained translation model is arranged in a background server of the application program, so that the terminal provided with the application program realizes the translation function by means of the background server.

Referring to fig. 1, a schematic structural diagram of a computer system provided by an embodiment of the present invention is shown, where the computer system includes a terminal 110 and a server 120. The terminal 110 and the server 120 perform data communication via a communication network. Optionally, the communication network may be a wired network or a wireless network, and the communication network may be at least one of a local area network, a metropolitan area network, and a wide area network.

The terminal 110 is installed with an application program supporting a translation function, and the application program may be an electronic book reading application program, an electronic dictionary application program, a web browsing application program, a game application program, a social contact application program, and the like, which is not particularly limited in this embodiment of the present invention.

Optionally, the terminal 110 may be a mobile terminal such as a smart phone, a smart watch, a tablet computer, a laptop computer, or an intelligent robot, or a terminal such as a desktop computer or a projection computer, and the type of the terminal is not limited in the embodiment of the present invention.

The server 120 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server that provides basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, cloud communication, a network service, a middleware service, a Content Delivery Network (CDN), and a big data and artificial intelligence platform. In an alternative embodiment, server 120 may be a backend server for applications in terminal 110.

In some embodiments, terminal 110 includes a microphone; the terminal 110 collects voice content through the microphone and transmits the collected voice content to the server 120. The server 120 includes a voice translation module, and the voice translation module includes a trained translation model. The server 120 receives the voice content sent by the terminal 110, translates the voice content through the translation model to obtain a translation result, and sends the translation result to the terminal 110 for display.

Or, the terminal 110 includes a speech translation module, and the speech translation module includes a trained translation model. After acquiring the voice content, the terminal 110 translates the voice content through the translation model to obtain a translation result, and displays the translation result.

In some embodiments, a text input control is included in the terminal 110, and the terminal 110 acquires the text content through the text input control and transmits the acquired text content to the server 120. The server 120 includes a text translation module, which includes a trained translation model; the server 120 receives the text content sent by the terminal 110, translates the text content through the translation model to obtain a translation result, and sends the translation result to the terminal 110 for displaying.

Alternatively, the terminal 110 includes a text translation module, and the text translation module includes a trained translation model. After the terminal 110 obtains the text content, the text content is translated through the translation model to obtain a translation result, and the translation result is displayed.

In some embodiments, terminal 110 includes a camera assembly; the terminal 110 acquires picture content including the content to be translated through the camera assembly, and transmits the picture to the server 120. The server 120 includes a picture translation module, which includes a trained translation model; the server 120 receives the picture sent by the terminal 110, recognizes and translates the content in the picture through the translation model, and sends the translation result to the terminal 110 for display.

Or, the terminal 110 includes a picture translation module, and the picture translation module includes a trained translation model. After the terminal acquires the picture, the terminal translates the picture content through the translation model to obtain a translation result, and displays the translation result.

It should be noted that, in the above embodiments, the terminal may present the translation result in voice form or text form.

For convenience of description, the following embodiments are described as examples in which the translation model training method is performed by a server.

Referring to fig. 2, a flowchart illustrating steps of an embodiment of a method for training a translation model according to the present invention is shown, where the method specifically includes the following steps:

step 201, obtaining a language training sample, where the language training sample includes a training sentence and a target sentence corresponding to the training sentence.

Step 202, performing combined processing on the training sentences based on the trained language model and the translation model to be trained to obtain translation results corresponding to the training sentences.

Step 203, calculating a loss value of the translation model according to the translation result and the target statement, and adjusting a model parameter of the translation model according to the loss value.

And adjusting the model parameters of the translation model according to the loss value until a convergence condition is met to obtain the trained translation model.

The training sentence is a sentence to be translated used for training; the target sentence is the sentence obtained after the sentence to be translated is translated, and is used for verifying the accuracy of the output result of the translation model in the training process. The languages of the training sentence and the target sentence can be any language such as Chinese, English, French, Italian, or German, and can certainly be any dialect. Similarly, the length of the training sample is not limited by the embodiments of the present invention; for example, the training sentence may be one sentence or multiple sentences. There are many ways to obtain the training sentences and their corresponding target sentences; for example, parallel sentence pairs available on the network, parallel translated articles stored on a computer device, and the like can be used as language training samples in the embodiment of the present invention.

The language model is used to determine the association between words contained in the input text data. The language model may be constructed based on a preset algorithm, and the embodiment of the present invention is not particularly limited.

The translation model comprises an encoder and a decoder, and is used for encoding the sentence to be translated into an encoding information matrix through the encoder, and then decoding the encoding information matrix with the decoder to obtain a translation result corresponding to the sentence to be translated. The encoding and decoding processes can be realized by a Convolutional Neural Network (CNN) or a Recurrent Neural Network (RNN) model. An NMT (Neural Machine Translation) model can be regarded as a complex neural network; iterative training is performed on it until a convergence condition is satisfied, and a trained translation model is obtained. The trained translation model may then be used to perform translation tasks.

In the embodiment of the invention, before the translation model is trained, the language model is trained on the basis of the monolingual corpus to obtain a language model with stable performance, and then the trained language model is introduced to train the translation model so as to ensure the fluency of the translation result output by the translation model.

In step 202, performing joint processing on the training sentences based on the trained language model and the translation model to be trained includes connecting the language model and the translation model in series, and jointly processing the training sentences based on the connected language model and translation model to obtain the translation results of the training sentences; or fusing the decoder of the language model and the decoder of the translation model, and jointly processing the encoding information matrix output by the encoder of the translation model based on the fused decoder to obtain the translation result of the training sentence. No matter which joint processing mode is adopted, the language model is introduced before the translation model outputs the translation result, corresponding time sequence information is added to the finally output translation result, and the association among the words in the output translation result is determined, so that the fluency of the translation result is improved.

Finally, after each round of training is finished, a loss value of the translation model is calculated according to the translation result corresponding to the training sentence and the target sentence, and the model parameters of the translation model are adjusted according to the loss value until a convergence condition is met, so that the trained translation model is obtained. The loss value of the translation model can be determined according to the cross entropy between the translation result and the target sentence. The convergence condition may be that the loss value of the translation model remains smaller than a preset threshold over multiple rounds of training, or that the difference between loss values of successive rounds is smaller than a preset value.
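The cross-entropy loss and the two convergence conditions described above can be illustrated with a minimal numpy sketch. The function names and the numeric thresholds below are hypothetical choices for illustration, not values prescribed by the embodiment:

```python
import numpy as np

def cross_entropy_loss(pred_probs, target_ids):
    """Mean cross entropy between predicted token distributions and target token ids.
    pred_probs: (seq_len, vocab) rows summing to 1; target_ids: (seq_len,) integers."""
    eps = 1e-12  # avoid log(0)
    picked = pred_probs[np.arange(len(target_ids)), target_ids]
    return float(-np.mean(np.log(picked + eps)))

def converged(loss_history, threshold=0.01, delta=1e-3, window=3):
    """Convergence check: recent losses all below a preset threshold, or
    successive losses differing by less than a preset value."""
    recent = loss_history[-window:]
    if len(recent) < window:
        return False
    below = all(l < threshold for l in recent)
    stable = max(recent) - min(recent) < delta
    return below or stable
```

In a full training loop the loss would be back-propagated to adjust only the translation model's parameters, with the trained language model held fixed.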

Referring to fig. 3, a mainstream translation model architecture diagram is shown: a Transformer framework. The workflow of the translation model in the embodiment of the present invention is described by taking a Transformer framework as an example. The translation model constructed based on the Transformer framework mainly comprises an encoder and a decoder. The word embedding layer (source embedding) is used for receiving an input sentence to be translated and carrying out word embedding processing on the sentence to be translated so as to obtain a word vector corresponding to each word contained in the sentence to be translated. The encoder and decoder include a plurality of layers, and each layer of the encoder/decoder is composed of a number of coding units/decoding units. The word vector corresponding to the statement to be translated is converted into a high-dimensional vector through a series of neural networks by each layer of the encoder. Each layer of the decoder is responsible for re-decoding (translating) the high-dimensional vector output by the encoder into the target language.

It should be noted that each layer of the encoder may include a self-attention layer (self-attention) and a feed-forward network layer (feed forward). The self-attention layer of the encoder takes the weight of the word vector of each word in the sentence to be translated into account when encoding each word vector. The feed-forward network layer of the encoder performs a nonlinear transformation on the output vector of the self-attention layer. Each layer of the decoder may include a self-attention layer (self-attention), an encoder-decoder attention layer (encoder-decoder attention), and a feed-forward network layer (feed forward). In the decoding process, the self-attention layer of the decoder considers the influence of the already translated new words on the currently decoded word vector, that is, the influence of the reference result matrix in the embodiment of the present invention on the currently decoded word vector, where the reference result matrix includes word vectors corresponding to each translated word of the training sentence. The encoder-decoder attention layer of the decoder takes into account the effect of the output of the encoder on the currently decoded word vector. The feed-forward network layer of the decoder performs a nonlinear transformation on the output vector of the encoder-decoder attention layer. The classification network layer is used for receiving the decoding vector output by the last network layer of the decoder and converting the decoding vector into a translation result, such as generating a new word.
After a generated new word is processed by the word embedding layer, its word vector is obtained and used as the input of the first network layer of the decoder. This process loops until an end symbol is generated or another preset stop condition is met, and all words generated in the decoding stage form the translation result.
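The autoregressive decoding loop just described — feed each generated word back into the decoder until an end symbol appears — can be sketched abstractly as follows. Here `step_fn` is a hypothetical stand-in for the entire embedding-decoder-classification stack, which maps the history of generated tokens to the next token id:

```python
def greedy_decode(step_fn, bos_id, eos_id, max_len=50):
    """Autoregressive loop: feed each generated token back into the decoder
    until an end symbol is produced or a preset maximum length is reached."""
    output = [bos_id]
    for _ in range(max_len):
        next_id = step_fn(output)  # decoder stack predicts the next token from the history
        if next_id == eos_id:
            break
        output.append(next_id)
    return output[1:]  # all generated words form the translation result
```

The `max_len` cutoff corresponds to "other preset stop conditions" in the text; beam search would replace the single history with several candidate histories but follows the same loop.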

A specific processing procedure for performing joint processing on a training sentence based on a trained language model and a translation model to be trained in the embodiment of the present invention will be described below with reference to the translation model architecture shown in fig. 3.

In an optional embodiment of the present invention, the step 202 of performing a joint processing on the training sentence based on the trained language model and the translation model to be trained to obtain a translation result corresponding to the training sentence includes:

step S11, inputting the training sentences into an encoder of the translation model for encoding processing to obtain an encoding information matrix, wherein the encoding information matrix comprises word vectors of all words in the training sentences;

step S12, determining a reference result matrix according to the current word to be translated in the training sentence, wherein the reference result matrix comprises reference word vectors corresponding to words before the current word to be translated in the training sentence;

and step S13, performing combined processing on the coding information matrix and the reference result matrix based on the trained language model and the translation model to be trained to obtain a translation result corresponding to the training sentence.

It should be noted that the "words" in the embodiments of the present invention are lexical units, including words (both single words and compound words) and phrases, which are the minimal structural units that form sentences. If the language of the sentence is Chinese, a word can be a character, a word, a phrase, and the like; if the language of the sentence is English, a word may be an English word, and the like.

When the training samples are jointly processed based on the trained language model and the translation model to be trained, the training sentences can be input into the encoder of the translation model for encoding processing, and an encoding information matrix is obtained. As shown in fig. 3, the training sentences may be input into the word embedding layer to obtain word vectors corresponding to each word in the training sentences to be translated, each word vector forms an input matrix corresponding to the training sentence, and the input matrix is input into the encoder to be encoded, so that the encoding results corresponding to each word vector can be obtained.

In addition, a reference result matrix can be determined according to the current word to be translated in the training sentence, and the reference result matrix includes reference vectors corresponding to words before the current word to be translated in the training sentence, that is, word vectors corresponding to new words already translated in the training sentence. For the initial first word to be translated, the reference result matrix is a set initial value, and may be 0, for example; and for other words to be translated except the first word to be translated, the reference result matrix is a matrix formed by word vectors corresponding to words before the current word to be translated in the training sentence.
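The construction of the reference result matrix — a set initial value (for example 0) for the first word, and a stack of already-translated word vectors thereafter — can be sketched as below. The function name and the zero-row convention for the initial step are illustrative assumptions:

```python
import numpy as np

def reference_matrix(translated_vecs, d_model):
    """Reference result matrix for the current decoding step: the word vectors of
    words already translated before the current word. For the first word to be
    translated it is a set initial value (a zero row is used here)."""
    if not translated_vecs:  # first word: no translated words yet
        return np.zeros((1, d_model))
    return np.stack(translated_vecs)  # one row per already-translated word
```

At each decoding step the matrix grows by one row, so the self-attention layer of the decoder always sees exactly the translated prefix.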

After the coding information matrix and the reference result matrix are obtained, the coding information matrix and the reference result matrix are subjected to combined processing based on the trained language model and the translation model to be trained, and then the translation result corresponding to the training sentence can be obtained.

In an optional embodiment of the present invention, the step S13, performing joint processing on the coding information matrix and the reference result matrix based on the trained language model and the translation model to be trained to obtain the translation result corresponding to the training sentence, includes:

s21, fusing the decoder of the language model and the decoder of the translation model to obtain a target decoder;

step S22, inputting the coding information matrix and the reference result matrix into the target decoder for decoding processing to obtain a translation result corresponding to the training sentence;

step 203, calculating a loss value of the translation model according to the translation result and the target sentence, includes:

and step S23, calculating a loss value of the translation model according to the translation result corresponding to the training sentence and the target sentence.

Referring to fig. 4, a schematic diagram of a fusion structure of a translation model and a language model according to an embodiment of the present invention is shown. Wherein, the translation model to be trained comprises an encoder A1 and a decoder A2, and the language model after being trained comprises an encoder B1 and a decoder B2. In an alternative embodiment of the present invention, the decoder a2 of the translation model and the decoder B2 of the language model may be fused to obtain a target decoder, and the target decoder decodes the coding information matrix and the reference result matrix obtained in steps S11 and S12 to obtain the translation result corresponding to the training sentence.

When the decoder a2 of the translation model and the decoder B2 of the language model are fused, the decoder of the language model can be used as an intermediate layer of the decoder of the translation model and embedded into the decoder of the translation model; it is also possible to connect the decoder of the language model in parallel with the decoder of the translation model, for example, the decoder B2 of the language model in parallel with the self-attention layer of the decoder a2 of the translation model, or the decoder B2 of the language model in parallel with the encoding-decoding attention layer of the decoder a2 of the translation model. No matter which combined processing mode is adopted, the language model is introduced before the translation model outputs the translation result, corresponding time sequence information is added to the finally output translation result, and the incidence relation among all words in the output translation result is determined, so that the fluency of the translation result is improved.

And after the target decoder obtains the translation result of the training sentence, calculating the loss value of the translation model according to the translation result corresponding to the training sentence and the target sentence, and adjusting the model parameters of the translation model according to the loss value until the convergence condition is met to obtain the trained translation model.

It should be noted that, in the embodiment of the present invention, the decoder in the trained translation model is a decoder of the translation model itself, and is not a fused target decoder. The embodiment of the invention only introduces the language model in the training process to improve the deep learning process of the translation model, does not change the model structure of the translation model, reduces the difficulty of improving the performance of the translation model, can quickly and effectively improve the fluency of the translation result of the translation model, and improves the translation performance of the translation model.

In an optional embodiment of the present invention, the step S22 of inputting the encoded information matrix and the reference result matrix into the target decoder for decoding processing to obtain a translation result corresponding to the training sentence includes:

substep S221, inputting the reference result matrix into a first network layer and a second network layer of the target decoder respectively to obtain a first output matrix and a second output matrix, where the first network layer belongs to the decoder of the language model and the second network layer belongs to the decoder of the translation model;

substep S222, performing weighted summation on the first output matrix and the second output matrix to obtain a fusion matrix;

and a substep S223 of inputting the coding information matrix and the fusion matrix into a third network layer of the target decoder to obtain a translation result corresponding to the training sentence, wherein the third network layer belongs to the decoder of the translation model.

Referring to fig. 5, a schematic structural diagram of a target decoder according to an embodiment of the present invention is shown. Wherein the first network layer of the decoder B2 of the language model is connected in parallel with the second network layer of the decoder a2 of the translation model. And inputting the reference result matrix into the first network layer and the second network layer respectively to obtain a first output matrix and a second output matrix.

And then, carrying out weighted summation on the first output matrix and the second output matrix to obtain a fusion matrix. And inputting the fusion matrix and the coding information matrix output by the coder into a third network layer of a decoder A2 of the translation model to continue decoding, so that a translation result corresponding to the training sentence can be obtained.

Among them, the first network layer may include a decoding layer of the decoder B2 of the language model, the second network layer may include a self-attention layer of the decoder a2 of the translation model shown in fig. 3, and the third network layer may include an encoding-decoding attention layer and a feedforward network layer of the decoder a2 of the translation model shown in fig. 3. That is, in the target decoder shown in fig. 5, the decoder of the language model is connected in parallel with the self-attention layer of the translation model decoder.
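Sub-steps S221 to S223 — run the reference result matrix through the first and second network layers in parallel, weight-sum the two output matrices, and feed the fusion matrix with the encoder output into the third network layer — can be sketched as follows. The layers are passed in as plain callables standing in for the real sub-networks, and the weight values are hypothetical defaults:

```python
import numpy as np

def fused_decoder_step(ref_matrix, enc_matrix,
                       lm_layer, self_attn_layer, third_layer,
                       w_lm=0.3, w_tm=0.7):
    """S221: first/second output matrices from the LM decoder layer and the
    translation model's self-attention layer, run in parallel on the same input.
    S222: weighted summation into a fusion matrix.
    S223: fusion matrix plus encoding information matrix into the remaining layers."""
    first_out = lm_layer(ref_matrix)        # first network layer (language model)
    second_out = self_attn_layer(ref_matrix)  # second network layer (translation model)
    fusion = w_lm * first_out + w_tm * second_out
    return third_layer(enc_matrix, fusion)  # encoder-decoder attention + feed-forward
```

Because both branches receive the same reference result matrix, the fusion matrix keeps the shape expected by the third network layer, so the translation model's structure is unchanged.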

Of course, the decoder B2 of the language model may also be connected in parallel with the encoder-decoder attention layer of the decoder A2 of the translation model. However, the feed-forward network layer only performs nonlinear processing on its input. If the decoder B2 of the language model is connected in parallel with the encoder-decoder attention layer of the decoder A2, the main processing of the translation model's decoder has already been completed at that point, and only the feed-forward network layer then performs nonlinear processing on the outputs of the encoder-decoder attention layer of the decoder A2 and of the decoder B2. The influence of the language model's output on the finally generated translation result is therefore limited, and the improvement in the fluency of the translation result is small. Therefore, in the embodiment of the present invention, in order to effectively improve the fluency of the translation result and quickly improve the translation performance of the translation model, the decoder of the translation model and the decoder of the language model are generally fused in the manner shown in fig. 4.

In an optional embodiment of the present invention, the step of performing weighted summation on the first output matrix and the second output matrix to obtain a fusion matrix in sub-step S222 includes:

a11, adjusting the weight value of the first output matrix and/or the second output matrix according to the fluency of the translation result corresponding to the previous round of training sentences;

and A12, carrying out weighted summation on the first output matrix and the second output matrix according to the weight value of the adjusted first output matrix and/or second output matrix to obtain a fusion matrix.

When the first output matrix of the language model and the second output matrix of the translation model are weighted and summed, the weight value of the first output matrix and/or the second output matrix can be adjusted according to the fluency of the translation result corresponding to the sentences in the previous round of training. Specifically, if the fluency of the translation result corresponding to the previous round's sentences is good, the weight value of the first output matrix of the language model can be appropriately reduced, and/or the weight value of the second output matrix of the translation model can be appropriately increased; if the fluency of the translation result corresponding to the previous round's sentences is poor, the weight value of the first output matrix of the language model can be appropriately increased, and/or the weight value of the second output matrix of the translation model can be appropriately reduced.

In the embodiment of the invention, the weighted values of the first output matrix of the language model and the second output matrix of the translation model are dynamically adjusted according to the fluency of the translation result corresponding to the sentence in the previous round of training, so that the requirements of various training scenes can be met, the translation performance of the translation model is effectively improved, and the fluency of the translation result is improved while the accuracy of the translation result is ensured.
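The dynamic weight adjustment described above can be sketched as a simple feedback rule. The fluency score, target value, and step size below are hypothetical stand-ins — the embodiment does not prescribe how fluency is measured or by how much the weights change:

```python
def adjust_weights(w_lm, w_tm, fluency, target=0.8, step=0.05):
    """If the previous round's translation was fluent, reduce the language model
    weight and increase the translation model weight; otherwise do the opposite.
    Weights are renormalised so the fusion stays a proper weighted sum."""
    if fluency >= target:   # fluency good: rely more on the translation model
        w_lm = max(0.0, w_lm - step)
        w_tm = w_tm + step
    else:                   # fluency poor: rely more on the language model
        w_lm = w_lm + step
        w_tm = max(0.0, w_tm - step)
    total = w_lm + w_tm
    return w_lm / total, w_tm / total
```

In practice the fluency score might come from the language model's own perplexity on the previous round's output; any scalar signal works with the same rule.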

In an optional embodiment of the present invention, the step S13, performing joint processing on the coding information matrix and the reference result matrix based on the trained language model and the translation model to be trained to obtain the translation result corresponding to the training sentence, includes:

step S31, inputting the reference result matrix into a decoder of the language model for decoding processing to obtain a language processing result corresponding to the training sentence;

step S32, inputting the coding information matrix and the reference result matrix into a decoder of the translation model for decoding processing to obtain a translation result corresponding to the training sentence;

step 203, calculating a loss value of the translation model according to the translation result and the target sentence, and adjusting a model parameter of the translation model according to the loss value includes:

step S33, determining a first loss value according to the reference result matrix and the language processing result corresponding to the training sentence;

step S34, determining a second loss value according to the translation result corresponding to the training sentence and the target sentence;

step S35, carrying out weighted summation on the first loss value and the second loss value to obtain a joint loss value of the translation model and the language model, and adjusting model parameters of the translation model according to the joint loss value.

In the embodiment of the present invention, the joint processing can be realized by fusing the decoder of the language model and the decoder of the translation model into a target decoder and jointly processing the training sentences based on the target decoder; alternatively, the translation model can be jointly trained based on a language model and a translation model that remain independent of each other.

Referring to fig. 6, a schematic diagram of another fusion structure of a translation model and a language model provided by the embodiment of the present invention is shown. As shown in fig. 6, the reference result matrix is decoded by the decoder B2 of the language model to obtain a language processing result, and the encoding information matrix and the reference result matrix are decoded by the decoder A2 of the translation model to obtain the translation result corresponding to the training sentence. The same reference result matrix is input into the decoder B2 of the language model and the decoder A2 of the translation model.

Then, loss values of the language model and the translation model are calculated, respectively. Specifically, a first loss value of the language model is calculated according to the reference result matrix and the language processing result, and a second loss value of the translation model is calculated according to the translation result corresponding to the training sentence and the target sentence. Both the first loss value and the second loss value may be calculated by cross entropy, which is not limited in this embodiment of the present invention.

And finally, carrying out weighted summation on the first loss value and the second loss value to obtain a joint loss value of the language model and the translation model. And adjusting the model parameters of the translation model according to the joint loss value until a convergence condition is met to obtain the trained translation model.

When the first loss value and the second loss value are weighted and summed, the weight values of the first loss value and the second loss value can be determined according to the fluency of the translation result of the training sentence in the previous round, so that the model parameters of the translation model can be dynamically adjusted according to the output translation result.
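Step S35 — the weighted summation of the first loss value (language model) and the second loss value (translation model), with the weights chosen from the previous round's fluency — can be sketched as below. The base weight, step size, and fluency threshold are hypothetical illustrations:

```python
def joint_loss(first_loss, second_loss, fluency, base=0.5, step=0.1):
    """Joint loss of the language model and the translation model (step S35):
    the weight on the language model's loss is lowered when the previous round's
    translation was already fluent, and raised otherwise."""
    w1 = base - step if fluency >= 0.8 else base + step  # weight on first loss value
    w2 = 1.0 - w1                                        # weight on second loss value
    return w1 * first_loss + w2 * second_loss
```

The joint loss value is then back-propagated to adjust the translation model's parameters, while the trained language model serves only as a fluency signal.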

In an alternative embodiment of the invention, the decoder of the language model and the decoder of the translation model share a classification network layer.

As shown in fig. 6, in the embodiment of the present invention, when the translation model is trained based on the joint loss value of the language model and the translation model, the language model and the translation model may be independent from each other, or may share a classification network layer. By sharing the classification network layer, corresponding model parameters of the language model and the translation model in the classification network layer can be kept the same, so that the consistency of the processing processes of the translation model and the language model in the classification network layer is ensured.

In summary, the language model is introduced in the training process of the translation model, the trained language model and the translation model to be trained are used for performing combined processing on the training sentences, a large amount of training corpora do not need to be additionally introduced, the training time of the translation model and the consumption of computing resources are reduced, the network structure of the translation model does not need to be changed, the fluency of the translation result of the translation model can be rapidly and effectively improved, and the translation performance of the translation model is improved.

It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combinations of acts, but those skilled in the art will recognize that the present invention is not limited by the illustrated order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are preferred embodiments, and that the acts involved are not necessarily required by the present invention.

Device embodiment

Referring to fig. 7, a block diagram of an embodiment of a translation model training apparatus of the present invention is shown, and the apparatus may include:

a training sample obtaining module 701, configured to obtain a language training sample, where the language training sample includes a training sentence and a target sentence corresponding to the training sentence;

a joint processing module 702, configured to perform joint processing on the training sentence based on the trained language model and the translation model to be trained, to obtain a translation result corresponding to the training sentence;

a parameter adjusting module 703, configured to calculate a loss value of the translation model according to the translation result and the target sentence, and adjust a model parameter of the translation model according to the loss value.

Optionally, the joint processing module 702 includes:

the first matrix determination submodule is used for inputting the training sentences into an encoder of the translation model for encoding processing to obtain an encoding information matrix, and the encoding information matrix comprises word vectors of all words in the training sentences;

a second matrix determining submodule, configured to determine a reference result matrix according to a current word to be translated in the training sentence, where the reference result matrix includes reference word vectors corresponding to words before the current word to be translated in the training sentence;

and the joint processing submodule is used for carrying out joint processing on the coding information matrix and the reference result matrix based on the trained language model and the translation model to be trained so as to obtain a translation result corresponding to the training sentence.

Optionally, the joint processing submodule includes:

the decoder fusion unit is used for fusing the decoder of the language model with the decoder of the translation model to obtain a target decoder;

the decoding processing unit is used for inputting the coding information matrix and the reference result matrix into the target decoder for decoding processing to obtain a translation result corresponding to the training sentence;

the parameter adjusting module comprises:

and the loss value calculation submodule is used for calculating the loss value of the translation model according to the translation result corresponding to the training sentence and the target sentence.

Optionally, the decoding processing unit includes:

the first processing subunit is configured to input the reference result matrix into a first network layer and a second network layer of the target decoder, respectively, to obtain a first output matrix and a second output matrix, where the first network layer belongs to the decoder of the language model and the second network layer belongs to the decoder of the translation model;

the second processing subunit is used for carrying out weighted summation on the first output matrix and the second output matrix to obtain a fusion matrix;

and the third processing subunit is configured to input the coding information matrix and the fusion matrix into a third network layer of the target decoder, so as to obtain a translation result corresponding to the training sentence, where the third network layer belongs to the decoder of the translation model.
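The data flow through the fused target decoder described by these subunits might be sketched as follows. The lambda "layers" and the concrete numbers are placeholders invented for the example to trace the flow, not real network layers.

```python
def fuse_outputs(first_out, second_out, alpha=0.5):
    """Weighted sum of the two layer outputs, i.e. the fusion matrix."""
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(first_out, second_out)]


def target_decoder(ref_matrix, enc_matrix, first_layer, second_layer,
                   third_layer, alpha=0.5):
    first_out = first_layer(ref_matrix)    # layer from the language model's decoder
    second_out = second_layer(ref_matrix)  # layer from the translation model's decoder
    fusion = fuse_outputs(first_out, second_out, alpha)
    # The third layer (translation model) consumes the encoding information
    # matrix together with the fusion matrix.
    return third_layer(enc_matrix, fusion)


# Toy stand-in layers, just to trace the data flow:
out = target_decoder(
    ref_matrix=[1.0, 2.0],
    enc_matrix=[10.0, 20.0],
    first_layer=lambda m: [x + 1.0 for x in m],
    second_layer=lambda m: [2.0 * x for x in m],
    third_layer=lambda enc, fus: [e + f for e, f in zip(enc, fus)],
)
```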

Optionally, the second processing subunit is further configured to:

according to the fluency of the translation result corresponding to the training sentence in the previous round, adjusting the weight values of the first output matrix and/or the second output matrix;

and according to the weight value of the adjusted first output matrix and/or second output matrix, carrying out weighted summation on the first output matrix and the second output matrix to obtain a fusion matrix.

Optionally, the joint processing submodule includes:

the language model decoding unit is used for inputting the reference result matrix into a decoder of the language model for decoding processing to obtain a language processing result corresponding to the training sentence;

the translation model decoding unit is used for inputting the coding information matrix and the reference result matrix into a decoder of the translation model for decoding processing to obtain a translation result corresponding to the training sentence;

the parameter adjusting module comprises:

the first loss value determining submodule is used for determining a first loss value according to the reference result matrix and the language processing result corresponding to the training statement;

a second loss value determining submodule, configured to determine a second loss value according to the translation result corresponding to the training sentence and the target sentence;

and the joint loss value determining submodule is used for weighting and summing the first loss value and the second loss value to obtain a joint loss value of the translation model and the language model, and adjusting the model parameters of the translation model according to the joint loss value.

Optionally, the decoder of the language model and the decoder of the translation model share a classification network layer.

In summary, a language model is introduced into the training process of the translation model, and the trained language model and the translation model to be trained jointly process the training sentences. Because no large amount of additional training corpora needs to be introduced, the training time of the translation model and the consumption of computing resources are reduced, and the network structure of the translation model does not need to be changed. The fluency of the translation result of the translation model can therefore be rapidly and effectively improved, and the translation performance of the translation model is improved.

For the device embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, refer to the partial description of the method embodiment.

The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.

With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

An embodiment of the present invention provides an apparatus for translation model training, the apparatus comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs comprising instructions for:

B11, obtaining a language training sample, wherein the language training sample comprises training sentences and target sentences corresponding to the training sentences;

b12, performing combined processing on the training sentences based on the trained language model and the translation model to be trained to obtain translation results corresponding to the training sentences;

and B13, calculating a loss value of the translation model according to the translation result and the target sentence, and adjusting the model parameters of the translation model according to the loss value.

Optionally, the performing joint processing on the training sentence based on the trained language model and the translation model to be trained to obtain the translation result corresponding to the training sentence includes:

inputting the training sentences into an encoder of the translation model for encoding processing to obtain an encoding information matrix, wherein the encoding information matrix comprises word vectors of all words in the training sentences;

determining a reference result matrix according to the current word to be translated in the training sentence, wherein the reference result matrix comprises reference word vectors corresponding to words before the current word to be translated in the training sentence;

and performing combined processing on the coding information matrix and the reference result matrix based on the trained language model and the translation model to be trained to obtain a translation result corresponding to the training sentence.
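The two matrices described above can be illustrated with a minimal Python sketch. The one-element "vectors", the toy vocabulary, and the use of a single vocabulary for both source and target side are simplifications invented for the example; a real model would use learned embeddings of much higher dimension.

```python
# Toy word vectors invented for the example.
TOY_VECTORS = {"I": [0.1], "love": [0.2], "you": [0.3]}


def encode(sentence):
    """Encoding information matrix: one word vector per word of the sentence."""
    return [TOY_VECTORS[w] for w in sentence.split()]


def reference_matrix(target_words, current_index):
    """Reference word vectors for the words before the current word to be translated."""
    return [TOY_VECTORS[w] for w in target_words[:current_index]]


enc = encode("I love you")
# While translating the third target word, the reference result matrix
# holds only the vectors of the first two words.
ref = reference_matrix(["I", "love", "you"], current_index=2)
```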

Optionally, the performing, based on the trained language model and the to-be-trained translation model, joint processing on the coding information matrix and the reference result matrix to obtain a translation result corresponding to the training sentence includes:

fusing the decoder of the language model with the decoder of the translation model to obtain a target decoder;

inputting the coding information matrix and the reference result matrix into the target decoder for decoding processing to obtain a translation result corresponding to the training sentence;

the calculating the loss value of the translation model according to the translation result and the target sentence includes:

and calculating a loss value of the translation model according to the translation result corresponding to the training sentence and the target sentence.

Optionally, the inputting the coding information matrix and the reference result matrix into the target decoder for decoding to obtain a translation result corresponding to the training sentence includes:

inputting the reference result matrix into a first network layer and a second network layer of the target decoder respectively to obtain a first output matrix and a second output matrix, wherein the first network layer belongs to the decoder of the language model, and the second network layer belongs to the decoder of the translation model;

carrying out weighted summation on the first output matrix and the second output matrix to obtain a fusion matrix;

and inputting the coding information matrix and the fusion matrix into a third network layer of the target decoder to obtain a translation result corresponding to the training sentence, wherein the third network layer belongs to the decoder of the translation model.

Optionally, the performing weighted summation on the first output matrix and the second output matrix to obtain a fusion matrix includes:

according to the fluency of the translation result corresponding to the training sentence in the previous round, adjusting the weight values of the first output matrix and/or the second output matrix;

and according to the weight value of the adjusted first output matrix and/or second output matrix, carrying out weighted summation on the first output matrix and the second output matrix to obtain a fusion matrix.

Optionally, the performing, based on the trained language model and the to-be-trained translation model, joint processing on the coding information matrix and the reference result matrix to obtain a translation result corresponding to the training sentence includes:

inputting the reference result matrix into a decoder of the language model for decoding processing to obtain a language processing result corresponding to the training sentence;

inputting the coding information matrix and the reference result matrix into a decoder of the translation model for decoding processing to obtain a translation result corresponding to the training sentence;

the calculating a loss value of the translation model according to the translation result and the target sentence, and adjusting a model parameter of the translation model according to the loss value includes:

determining a first loss value according to the reference result matrix and a language processing result corresponding to the training sentence;

determining a second loss value according to the translation result corresponding to the training sentence and the target sentence;

and carrying out weighted summation on the first loss value and the second loss value to obtain a joint loss value of the translation model and the language model, and adjusting model parameters of the translation model according to the joint loss value.

Optionally, the decoder of the language model and the decoder of the translation model share a classification network layer.

FIG. 8 is a block diagram illustrating an apparatus 800 for translation model training in accordance with an exemplary embodiment. For example, the apparatus 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.

Referring to fig. 8, the apparatus 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.

The processing component 802 generally controls overall operation of the device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing elements 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.

The memory 804 is configured to store various types of data to support operation at the device 800. Examples of such data include instructions for any application or method operating on device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.

Power components 806 provide power to the various components of device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 800.

The multimedia component 808 includes a screen that provides an output interface between the device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 800 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.

The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the apparatus 800 is in an operational mode, such as a call mode, a recording mode, and a voice information processing mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.

The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.

The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the device 800. For example, the sensor assembly 814 may detect the open/closed status of the device 800, the relative positioning of components, such as a display and keypad of the apparatus 800, the change in position of the device 800 or a component of the device 800, the presence or absence of user contact with the device 800, the orientation or acceleration/deceleration of the device 800, and the change in temperature of the device 800. Sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 816 is configured to facilitate communications between the apparatus 800 and other devices in a wired or wireless manner. The device 800 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the apparatus 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.

In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 804 comprising instructions, executable by the processor 820 of the device 800 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.

Fig. 9 is a schematic diagram of a server in some embodiments of the invention. The server 1900 may vary widely by configuration or performance and may include one or more Central Processing Units (CPUs) 1922 (e.g., one or more processors) and memory 1932, one or more storage media 1930 (e.g., one or more mass storage devices) storing applications 1942 or data 1944. Memory 1932 and storage medium 1930 can be, among other things, transient or persistent storage. The program stored in the storage medium 1930 may include one or more modules (not shown), each of which may include a series of instructions operating on a server. Still further, a central processor 1922 may be provided in communication with the storage medium 1930 to execute a series of instruction operations in the storage medium 1930 on the server 1900.

The server 1900 may also include one or more power supplies 1926, one or more wired or wireless network interfaces 1950, one or more input-output interfaces 1958, one or more keyboards 1956, and/or one or more operating systems 1941, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.

A non-transitory computer readable storage medium in which instructions, when executed by a processor of an apparatus (server or terminal), enable the apparatus to perform a training method of a translation model shown in fig. 1.

A non-transitory computer readable storage medium in which instructions, when executed by a processor of an apparatus (server or terminal), enable the apparatus to perform a method of training a translation model, the method comprising: acquiring a language training sample, wherein the language training sample comprises training sentences and target sentences corresponding to the training sentences; performing combined processing on the training sentences based on the trained language model and the translation model to be trained to obtain translation results corresponding to the training sentences; and calculating a loss value of the translation model according to the translation result and the target sentence, and adjusting model parameters of the translation model according to the loss value.

Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.

It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

The method for training a translation model, the apparatus for training a translation model, and the apparatus for translation model training provided by the present invention are described in detail above. Specific examples are applied herein to illustrate the principles and embodiments of the present invention, and the above description of the embodiments is only used to help understand the method and core idea of the present invention. Meanwhile, for those skilled in the art, there may be variations in the specific embodiments and the application scope according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.
