Chinese-Korean machine translation method based on reinforcement learning and machine translation quality estimation

Document No.: 615775    Publication date: 2021-05-07

Reading note: This technology, "Chinese-Korean machine translation method based on reinforcement learning and machine translation quality estimation", was created by 赵亚慧, 李飞雨, 崔荣一, 杨飞扬, 王琪, 金晶, 金城, 李丹阳, 李路军, 姜克鑫, 高君龙 on 2021-01-19. The invention discloses a Chinese-Korean machine translation method based on reinforcement learning and machine translation quality estimation, which mainly comprises the following steps: a sentence-level evaluation mechanism is introduced into the translation model to guide its training, where the evaluation mechanism adopts machine translation quality estimation and the guidance strategy adopts a reinforcement learning method; in the machine translation process, the NMT system acts as the reinforcement learning agent, continuously interacts with the environment to obtain the current environment state information, decides the next word to select according to the current state, obtains the reward value for performing the word-selection action in that state, and enters the next state; a machine translation quality estimation model generates the feedback signal, and its output is taken as part of the reward score, the model comprehensively scoring the generated translation through its network structure.

1. A Chinese-Korean machine translation method based on reinforcement learning and machine translation quality estimation, characterized by comprising the following steps:

introducing a sentence-level evaluation mechanism into the translation model to guide the training of the model, wherein the evaluation mechanism adopts machine translation quality estimation and the guidance strategy adopts a reinforcement learning method; in the machine translation process, the NMT system serves as the reinforcement learning agent, continuously interacts with the environment to obtain the current environment state information, decides the next word to select according to the current environment state, obtains the reward value for performing the word-selection action in the current state, and enters the next state;

generating a feedback signal through a machine translation quality estimation model and taking the output of the machine translation quality estimation model as part of the reward score QE, wherein the machine translation quality estimation model comprehensively scores the generated translation through its network structure;

and adopting a beam-search-based action sampling strategy that selects candidate words from the vocabulary as actions, and, after the decoder generates the target sentence, learning the translation corresponding to the highest score through the reward given by the environment.

2. The Chinese-Korean machine translation method based on reinforcement learning and machine translation quality estimation according to claim 1, wherein the model guided by the evaluation mechanism comprises a machine translation module and a machine translation quality estimation module; the machine translation module adopts an encoder-decoder architecture consistent with the Transformer, and the machine translation quality estimation module adopts the sentence-level quality estimation model Bilingual Expert for quality estimation.

3. The Chinese-Korean machine translation method based on reinforcement learning and machine translation quality estimation according to claim 2, wherein the machine translation quality estimation model comprises a bidirectional-Transformer-based word prediction module and a Bi-LSTM-based regression prediction model; the bidirectional-Transformer-based word prediction module comprises a self-attention encoder for the source sentence, a bidirectional self-attention encoder for the target sentence, and a reconstructor for the target sentence, and the hidden state feature h is obtained by pre-training on a large-scale parallel corpus.

4. The Chinese-Korean machine translation method based on reinforcement learning and machine translation quality estimation according to claim 1, wherein, in the training process, after the decoder generates the target sentence, the model learns the translation corresponding to the highest score by referring to the reward given by the environment.

5. The Chinese-Korean machine translation method based on reinforcement learning and machine translation quality estimation according to claim 4, wherein the QE value obtained by passing the generated sentence through the machine translation quality estimation module is used as the training target of the machine translation quality estimation model.

6. The Chinese-Korean machine translation method based on reinforcement learning and machine translation quality estimation according to claim 5, wherein the feedback function based on the QE value and the BLEU value is:

R = α × BLEU_norm + (1 − α) × QE_norm

wherein BLEU_norm is the normalized BLEU value between the generated translation and the reference translation, QE_norm is the normalized QE evaluation score of the generated translation, and the hyperparameter α is used to balance the weight between the BLEU value and the QE score.

7. The method of claim 6, wherein, during training, an action sample represents the conditional probability of selecting a word given the source sentence and the previously generated words, and the goal is to maximize the expected reward; when a complete target sentence has been generated, the quality estimation score of the generated sentence is used as label information to calculate the feedback value, and the maximum expected return is obtained by combining the Policy Gradient method from reinforcement learning.

8. The method of claim 7, wherein reward shaping is used during training: each time a sampling action is completed, a cumulative reward is calculated as the feedback value of the current sequence, and the feedback difference between two consecutive time steps is taken as the word-level reward.

9. The method of claim 8, wherein the MLE training objective is combined with the RL objective, and the combined loss function L_combine is:

L_combine = γ × L_mle + (1 − γ) × L_rl

wherein the cross-entropy loss and the reinforcement learning objective are balanced through the value of γ, so that the model benefit is maximized.

Technical Field

The invention belongs to the field of natural language processing in computer intelligent information processing, and particularly relates to a Chinese-Korean machine translation method based on reinforcement learning and machine translation quality estimation.

Background

Machine translation studies how to use computers to automatically translate between different languages; it is an important research field of natural language processing and artificial intelligence, and is also one of the common services on the internet today. Although machine translation still falls far short of professional translators, in scenarios with low requirements on translation quality or in translation tasks for specific domains, it has obvious advantages in translation speed and is widely applied. In view of the complexity and application prospects of machine translation, academia and industry treat this field as an important research direction, and it has become one of the most active research fields of natural language processing.

Due to the variety and complexity of natural language, it is still difficult to translate one language into another accurately. At present, given large-scale corpora and computing power, neural machine translation has shown huge potential and has developed into a new machine translation paradigm. The method only needs bilingual parallel corpora and is convenient for training large-scale translation models; it has high research value and strong potential for industrialization, and has become a front-line hotspot of current machine translation research.

China is a unified multi-ethnic country whose minority languages are rich in type and form, but they face problems such as large typological spans between languages, scarce language resources, and weak basic language processing technology. These problems make some currently mature machine translation methods unsuitable for translation between minority languages and Chinese. In fact, automatic translation between China's minority languages and Chinese faces many complex scientific problems, such as machine translation of morphologically rich languages and machine translation of resource-scarce languages, which are also important topics of current machine translation research. Although neural machine translation technology has developed rapidly and has become the mainstream technology of machine translation research in recent years, research on machine translation of minority languages has mainly focused on languages such as Mongolian, Tibetan, and Uyghur.

Korean is an official language of the Korean ethnic group in China, and is also used on the Korean peninsula and in Korean communities in areas such as the United States and the Russian Far East; it is thus used across nations and regions. The Korean ethnic group is one of the 24 ethnic minorities in China with its own written language, so research on Chinese-Korean machine translation has important practical significance and meets an urgent contemporary demand for promoting the development of minority language work and Chinese-Korean cross-language information exchange. However, for Chinese-Korean machine translation, large-scale parallel corpora between the Chinese-Korean language pair are lacking, and the pair belongs to the low-resource setting. Domestic research on this task started late, has a weak foundation, and lacks large-scale corpus resources. Improving Chinese-Korean machine translation quality in a low-resource environment is therefore a great challenge.

Disclosure of Invention

By introducing machine translation quality estimation into a Chinese-Korean machine translation model, the invention effectively solves the problems of exposure bias and poor translation diversity caused by the teacher forcing strategy used in traditional neural machine translation models; by properly designing the sampling strategy, the reward function, and the loss function, the training process is greatly stabilized and the performance of the model is improved and maximized.

In order to achieve the purpose, the invention provides the following scheme:

the middle-heading machine translation method based on reinforcement learning and machine translation quality evaluation comprises the following steps:

introducing a sentence-level evaluation mechanism into the translation model to guide the training of the model, wherein the evaluation mechanism adopts machine translation quality estimation and the guidance strategy adopts a reinforcement learning method; in the machine translation process, the NMT system serves as the reinforcement learning agent, continuously interacts with the environment to obtain the current environment state information, decides the next word to select according to the current environment state, obtains the reward value for performing the word-selection action in the current state, and enters the next state;
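Purely for illustration (this is not code from the disclosure; the policy and reward below are hypothetical stand-ins), the decoding loop just described can be sketched as a reinforcement-learning episode in which the target prefix is the state, choosing the next word is the action, and a reward is collected after each action:

```python
import random

# Toy sketch (not the disclosure's code) of decoding as an RL episode:
# the target prefix is the state, choosing the next word is the action,
# and a stand-in reward is collected after each action.
random.seed(0)
VOCAB = ["나는", "학교에", "간다", "<eos>"]

def select_word(state):
    """Stand-in policy; a real NMT decoder conditions on the source and state."""
    return random.choice(VOCAB)

def step_reward(word):
    """Stand-in reward; the disclosure derives it from QE and BLEU feedback."""
    return 0.1 if word != "<eos>" else 0.0

state, total_reward = [], 0.0
for _ in range(10):                  # cap the episode length
    action = select_word(state)      # decide the next word from the current state
    total_reward += step_reward(action)
    state = state + [action]         # enter the next state
    if action == "<eos>":
        break
print(state, round(total_reward, 1))
```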

generating a feedback signal through a machine translation quality estimation model and taking the output of the machine translation quality estimation model as part of the reward score QE, wherein the machine translation quality estimation model comprehensively scores the generated translation through its network structure;

and adopting a beam-search-based action sampling strategy that selects candidate words from the vocabulary as actions, and, after the decoder generates the target sentence, learning the translation corresponding to the highest score through the reward given by the environment.

Preferably, the model guided by the evaluation mechanism comprises a machine translation module and a machine translation quality estimation module; the machine translation module adopts an encoder-decoder architecture consistent with the Transformer, and the machine translation quality estimation module adopts the sentence-level quality estimation model Bilingual Expert for quality estimation.

Preferably, the machine translation quality estimation model comprises a bidirectional-Transformer-based word prediction module and a Bi-LSTM-based regression prediction model; the bidirectional-Transformer-based word prediction module comprises a self-attention encoder for the source sentence, a bidirectional self-attention encoder for the target sentence, and a reconstructor for the target sentence, and the hidden state feature h is obtained by pre-training on a large-scale parallel corpus.
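As a minimal PyTorch sketch (an illustration under stated assumptions, not the released Bilingual Expert implementation; the class name and dimensions are hypothetical), the Bi-LSTM regression head described above could read the pre-trained hidden-state features h and regress a single sentence-level quality score:

```python
import torch
import torch.nn as nn

# Minimal sketch: a single-layer Bi-LSTM reads the pre-trained hidden-state
# features h and regresses one sentence-level quality score.
class QERegressor(nn.Module):
    def __init__(self, feat_dim=512, hidden=512):
        super().__init__()
        self.bilstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.score = nn.Linear(2 * hidden, 1)

    def forward(self, h):                  # h: (batch, seq_len, feat_dim)
        out, _ = self.bilstm(h)            # (batch, seq_len, 2 * hidden)
        return self.score(out[:, -1, :])   # one QE score per sentence

print(QERegressor()(torch.randn(2, 7, 512)).shape)  # torch.Size([2, 1])
```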

Preferably, in the training process, after the decoder generates the target sentence, the model learns the translation corresponding to the highest score by referring to the reward given by the environment.

Preferably, the QE value obtained by passing the generated sentence through the machine translation quality estimation module is used as the training target of the machine translation quality estimation model.

Preferably, the feedback function based on the QE value and the BLEU value is:

R = α × BLEU_norm + (1 − α) × QE_norm

wherein BLEU_norm is the normalized BLEU value between the generated translation and the reference translation, QE_norm is the normalized QE evaluation score of the generated translation, and the hyperparameter α is used to balance the weight between the BLEU value and the QE score.
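For illustration only, here is a one-line Python sketch of this feedback function, assuming (as the reconstruction above does) that both scores are already normalized to [0, 1] with higher values meaning better translations:

```python
# Sketch of the feedback function R = alpha * BLEU_norm + (1 - alpha) * QE_norm.
# The normalization itself is not specified here; both inputs are assumed
# to lie in [0, 1] with higher values meaning better translations.

def feedback(bleu_norm: float, qe_norm: float, alpha: float = 0.5) -> float:
    """Combine the normalized BLEU and QE scores into one reward."""
    return alpha * bleu_norm + (1.0 - alpha) * qe_norm

print(feedback(bleu_norm=0.42, qe_norm=0.77, alpha=0.7))  # ≈ 0.525
```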

Preferably, during training, an action sample represents the conditional probability of selecting a word given the source sentence and the previously generated words, and the goal is to maximize the expected reward; when a complete target sentence has been generated, the quality estimation score of the generated sentence is used as label information to calculate the feedback value, and the maximum expected return is obtained by combining the Policy Gradient method from reinforcement learning.
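A minimal sketch of the Policy Gradient objective, assuming the standard REINFORCE form in which the loss is the negative reward-weighted log-likelihood of the sampled sentence (the baseline term anticipates the baseline feedback method mentioned under the advantages below; all numbers are hypothetical):

```python
import math

# Sketch of the Policy Gradient (REINFORCE-style) loss for one sampled
# sentence: the negative reward-weighted log-likelihood of the chosen words.
# A baseline is subtracted from the reward to reduce gradient variance.

def policy_gradient_loss(word_probs, reward, baseline=0.0):
    """word_probs: model probabilities p(y_t | y_<t, x) of the sampled words."""
    log_likelihood = sum(math.log(p) for p in word_probs)
    return -(reward - baseline) * log_likelihood

# Hypothetical numbers: three sampled words, sentence-level reward 0.8.
print(policy_gradient_loss([0.4, 0.9, 0.7], reward=0.8, baseline=0.5))
```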

Preferably, reward shaping is used during training, i.e., each time a sampling action is completed, a cumulative reward is calculated as the feedback value of the current sequence, and the feedback difference between two consecutive time steps is taken as the word-level reward.
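A short sketch of this shaping rule: given the cumulative feedback value of each prefix, the word-level reward at step t is the difference of two consecutive feedback values (the prefix scores below are made-up numbers):

```python
# Sketch of reward shaping: prefix_scores[t] is the cumulative feedback value
# R(y_1..t) of the partial sequence after sampling the t-th word; the
# word-level reward is the difference of consecutive feedback values.

def shaped_rewards(prefix_scores):
    rewards, previous = [], 0.0
    for score in prefix_scores:
        rewards.append(score - previous)  # r_t = R_t - R_(t-1)
        previous = score
    return rewards

print(shaped_rewards([0.10, 0.25, 0.20, 0.60]))  # ≈ [0.10, 0.15, -0.05, 0.40]
```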

Preferably, the MLE training objective is combined with the RL objective, and the combined loss function L_combine is:

L_combine = γ × L_mle + (1 − γ) × L_rl

The cross-entropy loss and the reinforcement learning objective are balanced through the value of γ, so that the model benefit is maximized.
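An illustrative evaluation of the combined loss for several values of γ (the two loss values are hypothetical; γ = 1 reduces to pure MLE training and γ = 0 to pure RL training):

```python
# Illustrative evaluation of L_combine = gamma * L_mle + (1 - gamma) * L_rl
# for several gamma values; the two loss values are hypothetical.
l_mle, l_rl = 2.31, 0.58

for gamma in (0.0, 0.3, 0.7, 1.0):
    l_combine = gamma * l_mle + (1.0 - gamma) * l_rl
    print(f"gamma={gamma:.1f}  L_combine={l_combine:.3f}")
```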

Compared with the prior art, the invention has the following advantages:

(1) According to the method, sentence-level machine translation quality estimation is introduced into the neural machine translation model, so that the translation generated by the translation model does not completely converge to the reference translation, which solves the problem of poor translation diversity in traditional neural machine translation models;

(2) The invention adopts a reinforcement learning method to train the model. Unlike the common maximum likelihood estimation method, the policy-optimization reinforcement learning method optimizes the model's target sequence at the sentence level and solves the exposure bias problem caused by the teacher forcing strategy;

(3) The invention provides a reward function based on QE evaluation, which mitigates the model bias caused by directly using the BLEU value as the reward function and further enhances the diversity of the translations generated by the model;

(4) The cross-entropy loss function of the traditional neural machine translation model is linearly combined with the reinforcement learning reward function, the problems of unstable training and large variance in reinforcement learning are alleviated by using a baseline feedback method, and the performance of the model is improved and maximized.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the embodiments are briefly described below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.

FIG. 1 is a flow chart of the method of the present invention;

FIG. 2 is a schematic diagram of a model framework of the present invention;

FIG. 3 is a schematic diagram of the beam search sampling strategy of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.

As shown in FIG. 1, the present invention provides a Chinese-Korean machine translation method based on reinforcement learning and machine translation quality estimation.

The invention adopts an action sampling strategy based on beam search. In the reinforcement-learning view of machine translation, the action space is huge and discrete: its size is the capacity of the whole vocabulary. When sampling the action space, exhaustive search guarantees the optimal action but its computational cost is too high, while a greedy strategy has low computational cost but cannot guarantee the optimal sequence. A reasonable strategy is therefore needed to trade off performance against computational cost. The framework of the Chinese-Korean machine translation model is shown in FIG. 2.

The beam search method selected by the invention can effectively balance performance and computational cost by flexibly choosing the beam size. The beam search principle is schematically illustrated in FIG. 3.
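To make the principle concrete, the following toy Python sketch (a stand-in, not the invention's decoder: the per-step distributions are fixed in a table rather than conditioned on the prefix) expands every beam with every word at each step and keeps only the beam_size highest-scoring prefixes:

```python
import math

# Toy sketch of beam search. A real decoder conditions each step's
# distribution on the chosen prefix; a fixed per-step table stands in here
# so the example stays self-contained and runnable.

def beam_search(step_probs, beam_size=2):
    beams = [([], 0.0)]  # (word prefix, cumulative log-probability)
    for probs in step_probs:
        candidates = [
            (prefix + [word], score + math.log(p))
            for prefix, score in beams
            for word, p in probs.items()
        ]
        # Keep only the beam_size highest-scoring prefixes.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    return beams

steps = [{"a": 0.6, "b": 0.4}, {"a": 0.3, "b": 0.7}]
for prefix, score in beam_search(steps, beam_size=2):
    print(prefix, round(math.exp(score), 2))  # ['a', 'b'] 0.42 / ['b', 'b'] 0.28
```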

The environment configuration of this embodiment is as follows: Ubuntu (Linux), a CPU clocked at 3.20 GHz, and 16 GB of memory; the method is implemented in the PyCharm integrated development environment with Python as the programming language.

In this embodiment, the detailed parameters of each module are set as follows. The translation module is implemented on a self-attention-based encoder-decoder framework; the Transformer system is specifically implemented with the tensor2tensor open-source tool built by Google Brain, dropout is set to 0.1, the word vector dimension is 512, and MLE training uses the Adam gradient optimization algorithm with learning rate decay scheduling. The encoder and decoder of the feature extraction part each have 2 layers, the feed-forward sublayer has 1024 hidden units, and the attention mechanism has 4 heads. The quality estimation part uses a single-layer Bi-LSTM with 512 hidden units, the Adam optimizer, and a learning rate of 0.001. The reinforcement learning training process is initialized with the parameters of the MLE (maximum likelihood estimation) model, the learning rate is set to 0.0001, and the beam search width is set to 6.

In this embodiment, comparison experiments are carried out on a real data set, the corpus constructed by the "Chinese-Korean science and technology information processing integrated platform" project. The original corpus contains 30,000 sentence pairs covering 3 fields: biotechnology, marine environment, and aerospace. To alleviate the data sparsity problem, the experiments also use additional monolingual corpora. The details of the data set obtained after preprocessing are shown in table 1:

TABLE 1

Category             Language          Scale (sentences)
Parallel corpus      Chinese-Korean    30,000
Monolingual corpus   Chinese           115,000
                     Korean            115,000
QE corpus            Chinese           30,000
                     Korean            30,000

Korean belongs to the low-resource languages and lacks large-scale corpora, so a large number of low-frequency words exist in the corpus and the quality of the word vectors is low. To address this problem, word embedding is carried out at more flexible Korean granularities during preprocessing, which alleviates the data sparsity problem. The Korean text is preprocessed at three granularities: phonemes, syllables, and words. Phonemes are obtained with the open-source phoneme decomposition tool hgtk, syllables are obtained directly by reading the characters, and the Kkma word segmentation tool is used for word segmentation.
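As an illustrative sketch (assuming the open-source packages hgtk and konlpy, which wraps the Kkma analyzer, are installed; the sample sentence is arbitrary), the three granularities can be obtained as follows:

```python
# Illustrative preprocessing at the three granularities, assuming the
# open-source packages hgtk and konlpy (which wraps the Kkma analyzer)
# are installed; the sample sentence is arbitrary.
import hgtk
from konlpy.tag import Kkma

sentence = "자연어 처리"

phonemes = hgtk.text.decompose(sentence)       # phoneme (jamo) level
syllables = list(sentence.replace(" ", ""))    # syllable level: read characters
words = Kkma().morphs(sentence)                # word level: Kkma segmentation

print(phonemes)
print(syllables)
print(words)
```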

In order to verify the effectiveness of the method, translation experiments are carried out on an attention-based LSTM model and a Transformer model under the same hardware and corpus environment. The word vector dimensions used by these models are the same as in the present invention, and the translation performance results are given in table 2:

TABLE 2

The translation performance shows that the method achieves an excellent translation effect across the different models: compared with LSTM + attention, the BLEU value for Chinese-to-Korean is improved by 9.87 and the QE score is reduced by 59.68, while the BLEU value for Korean-to-Chinese is improved by 10.99 and the QE score is reduced by 57.76; compared with the Transformer, the BLEU value for Chinese-to-Korean is improved by 5.39 and the QE score is reduced by 5.16, while the BLEU value for Korean-to-Chinese is improved by 2.73 and the QE score is reduced by 2.82.

The invention introduces a machine translation quality estimation module to reinforce the training of the translation module, so the performance of the machine translation quality estimation module is verified experimentally to ensure the rationality and effectiveness of the strategy. The comparison baseline adopts the open-source system QuEst++, the official baseline system of the WMT machine translation quality estimation shared tasks from 2013 to 2019. The performance verification results are shown in table 3:

TABLE 3

Metric        Bilingual Expert   Baseline system
Pearson's ↑   0.476              0.397
MAE ↓         0.118              0.136
RMSE ↓        0.166              0.173

The quality estimation verification results show that, compared with the baseline system of the QE task, the Bilingual Expert model adopted by the invention achieves a clear performance improvement: the Pearson correlation coefficient is improved by 0.079, the MAE is reduced by 0.018, and the RMSE is reduced by 0.007, showing a higher correlation with manual evaluation and thereby proving the effectiveness of the machine translation quality estimation model adopted by the invention.

In order to show the translation effect of the model more clearly, taking the Chinese-to-Korean and Korean-to-Chinese translation tasks as examples, the translations obtained by passing source sentences through the QR-Transformer model are shown in Table 4:

TABLE 4

In order to solve the problems of exposure bias and poor translation diversity caused by the teacher forcing strategy in machine translation tasks, the invention provides QR-Transformer, a Chinese-Korean machine translation model based on reinforcement learning and machine translation quality estimation. The model introduces a sentence-level evaluation mechanism to guide model prediction so that it does not fully converge to the reference translation. The evaluation mechanism adopts reference-free machine translation quality estimation, and the guidance strategy adopts a reinforcement learning method. Experimental results show that the method can effectively improve the performance of Chinese-Korean machine translation.

The above-described embodiments merely illustrate preferred embodiments of the present invention and do not limit its scope. Various modifications and improvements made by those skilled in the art to the technical solutions of the present invention without departing from its spirit shall fall within the protection scope of the present invention as defined by the claims.
