Data processing method, device, electronic equipment and medium

Document No.: 1889980 · Publication date: 2021-11-26

Reading note: This technique, "Data processing method, device, electronic equipment and medium", was designed and created by Yan Jianhao (颜建昊), Wang Fusheng (王福升), and Meng Fandong (孟凡东) on 2021-03-26. Its main content is as follows: the embodiment of the application discloses a data processing method, a data processing device, electronic equipment, and a medium, applied to the technical field of machine learning. The method comprises: obtaining sample text data and the corresponding translated text data; inputting the sample text data into a first translation model to obtain a first translation result corresponding to each text word, and into a second translation model to obtain a second translation result corresponding to each text word; obtaining the important index parameter of the translation word corresponding to each text word, and determining target text words from the N text words according to the important index parameters; determining a model translation loss function according to the first translation result, the second translation result, and the translation word corresponding to each target text word; and correcting the first translation model according to the model translation loss function to obtain a target translation model. By adopting the embodiment of the application, the accuracy of the obtained target translation model can be improved.

1. A data processing method, comprising:

acquiring sample text data and translation text data of the sample text data; the sample text data comprises N text words, wherein N is a positive integer; the translated text data comprises a translated term corresponding to each of the N text terms; the sample text data and the translated text data have different text types;

inputting the sample text data into a first translation model, generating a first translation result corresponding to each text word in the first translation model, inputting the sample text data into a second translation model, and generating a second translation result corresponding to each text word in the second translation model;

acquiring important index parameters of the translation words corresponding to each text word, and determining a target text word from the N text words according to the important index parameters of the translation words corresponding to each text word;

determining a model translation loss function according to a first translation result corresponding to the target text word, a second translation result corresponding to the target text word and a translation word corresponding to the target text word in the translation text data, and correcting a model parameter of the first translation model according to the model translation loss function to obtain a target translation model.

2. The method of claim 1, wherein the N text terms include an ith text term, i being a positive integer less than or equal to N; the first translation result of the ith text word comprises a prediction probability of the ith text word for each word in a translation lexicon; the translation words in the translation text data belong to the translation lexicon;

the obtaining of the important index parameter of the translation term corresponding to each text term respectively includes:

determining the prediction probability of a translation word corresponding to the ith text word in the first translation result of the ith text word as a target prediction probability;

and generating an important index parameter of a translation word corresponding to the ith text word according to the target prediction probability.

3. The method of claim 1, wherein the N text terms include an ith text term, i being a positive integer less than or equal to N;

the obtaining of the important index parameter of the translation term corresponding to each text term respectively includes:

acquiring at least two important evaluation index values of the ith text word;

and aggregating the at least two important evaluation index values to obtain the important index parameters of the translation words corresponding to the ith text word.

4. The method according to claim 3, wherein the aggregating the at least two important evaluation index values to obtain an important index parameter of the translated term corresponding to the i-th text term comprises:

obtaining an evaluation weight corresponding to each important evaluation index value in the at least two important evaluation index values;

weighting each important evaluation index value according to the evaluation weight corresponding to each important evaluation index value respectively to obtain the weighted index value corresponding to each important evaluation index value respectively;

and determining the important index parameter of the translation word corresponding to the ith text word according to the weighted index value corresponding to each important evaluation index value.

5. The method of claim 1, wherein the determining a target text word from the N text words according to the important index parameter of the translation word corresponding to each text word comprises:

sorting the N text words in descending order of the important index parameters of the translation words corresponding to each text word, to obtain the sorted N text words;

obtaining a sample selection number K, and determining the first K text words in the sorted N text words as the target text words; K is a non-negative integer.

6. The method of claim 1, wherein obtaining sample text data comprises:

acquiring Z text words of the first translation model in the (j-1)th model training process; j is a positive integer greater than 1, Z is equal to N;

adding the Z text words to a first-in first-out queue;

in the jth model training process of the first translation model, acquiring newly added sample text data, and adding the text words contained in the newly added sample text data to the first-in first-out queue containing the Z text words to obtain a target queue;

determining the sample text data from the text words in the target queue.

7. The method of claim 1, further comprising:

acquiring text data to be translated; the text data to be translated and the sample text data have the same text type;

inputting the text data to be translated into the target translation model, and outputting target translation text data of the text data to be translated based on the target translation model; the target translation text data and the translation text data have the same text type.

8. A data processing apparatus, comprising:

the acquisition module is used for acquiring sample text data and translation text data of the sample text data; the sample text data comprises N text words, wherein N is a positive integer; the translated text data comprises a translated term corresponding to each of the N text terms; the sample text data and the translated text data have different text types;

the processing module is used for inputting the sample text data into a first translation model, generating a first translation result corresponding to each text word in the first translation model, inputting the sample text data into a second translation model, and generating a second translation result corresponding to each text word in the second translation model;

the determining module is used for acquiring the important index parameters of the translation words corresponding to each text word and determining a target text word from the N text words according to the important index parameters of the translation words corresponding to each text word;

the determining module is further configured to determine a model translation loss function according to the first translation result corresponding to the target text word, the second translation result corresponding to the target text word, and the translation word corresponding to the target text word in the translation text data, and correct the model parameter of the first translation model according to the model translation loss function to obtain the target translation model.

9. An electronic device comprising a processor and a memory, the processor being interconnected with the memory, wherein the memory is configured to store computer program instructions, and the processor is configured to execute the program instructions to implement the method of any one of claims 1-7.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions that, when executed by a processor, cause the processor to carry out the method according to any one of claims 1-7.

Technical Field

The present application relates to the field of machine learning technologies, and in particular, to a data processing method and apparatus, an electronic device, and a medium.

Background

With the continuous development of computer technology, Artificial Intelligence (AI) technology is becoming increasingly mature; machine learning is one of the technologies that AI involves.

In the prior art, a model can be trained through machine learning techniques, and the trained model can be applied to data prediction (such as text translation prediction). Sample data is input into the model to be trained, the model outputs a prediction result for the input sample data, and the model parameters of the model to be trained are then corrected according to the difference between the prediction result and the actual data attribute of the sample data, finally yielding the trained target model. However, because sample data varies in content and type, the sample data input into a model usually contains data that adversely affects model training; a model whose parameters are corrected directly with such sample data is therefore inaccurate.

Disclosure of Invention

The embodiment of the application provides a data processing method, a data processing device, an electronic device and a medium, which can improve the accuracy of an acquired target translation model.

In one aspect, an embodiment of the present application provides a data processing method, where the method includes:

acquiring sample text data and translation text data of the sample text data; the sample text data comprises N text words, wherein N is a positive integer; the translated text data comprises a translated term corresponding to each of the N text terms; the sample text data and the translated text data have different text types;

inputting the sample text data into a first translation model, generating a first translation result corresponding to each text word in the first translation model, inputting the sample text data into a second translation model, and generating a second translation result corresponding to each text word in the second translation model;

acquiring important index parameters of the translation words corresponding to each text word, and determining a target text word from the N text words according to the important index parameters of the translation words corresponding to each text word;

determining a model translation loss function according to a first translation result corresponding to the target text word, a second translation result corresponding to the target text word and a translation word corresponding to the target text word in the translation text data, and correcting a model parameter of the first translation model according to the model translation loss function to obtain a target translation model.

In one aspect, an embodiment of the present application provides a data processing apparatus, where the apparatus includes:

the acquisition module is used for acquiring sample text data and translation text data of the sample text data; the sample text data comprises N text words, wherein N is a positive integer; the translated text data comprises a translated term corresponding to each of the N text terms; the sample text data and the translated text data have different text types;

the processing module is used for inputting the sample text data into a first translation model, generating a first translation result corresponding to each text word in the first translation model, inputting the sample text data into a second translation model, and generating a second translation result corresponding to each text word in the second translation model;

the determining module is used for acquiring the important index parameters of the translation words corresponding to each text word and determining a target text word from the N text words according to the important index parameters of the translation words corresponding to each text word;

the determining module is further configured to determine a model translation loss function according to the first translation result corresponding to the target text word, the second translation result corresponding to the target text word, and the translation word corresponding to the target text word in the translation text data, and correct the model parameter of the first translation model according to the model translation loss function to obtain the target translation model.

In one aspect, the present application provides an electronic device, which includes a processor and a memory, where the processor and the memory are connected to each other, where the memory is used to store computer program instructions, and the processor is configured to execute the computer program instructions to implement part or all of the steps in the above method.

In one aspect, the present application provides a computer-readable storage medium, in which computer program instructions are stored, and when the computer program instructions are executed by a processor, the computer program instructions are used to perform some or all of the steps of the above method.

Accordingly, according to an aspect of the present application, there is provided a computer program product or a computer program comprising computer instructions stored in a computer-readable storage medium; a processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the data processing method provided above.

In the embodiment of the application, sample text data and corresponding translated text data may be obtained. The sample text data is input into a first translation model, which generates a first translation result corresponding to each text word, and into a second translation model, which generates a second translation result corresponding to each text word. The important index parameter of the translation word corresponding to each text word is obtained, and target text words are determined from the N text words according to these parameters. A model translation loss function is then determined according to the first translation result, the second translation result, and the translation word corresponding to each target text word in the translated text data, and the model parameters of the first translation model are corrected according to the model translation loss function to obtain a target translation model. By implementing this scheme, the sample text data can be processed and target text words selected from the text words it contains for knowledge distillation, which effectively prevents overfitting, improves knowledge distillation efficiency, and improves the accuracy of the obtained target translation model.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.

Fig. 1 is a schematic diagram of an application architecture according to an embodiment of the present application;

fig. 2 is a schematic flowchart of a data processing method according to an embodiment of the present application;

fig. 3 is a schematic view of a scenario for determining a target translation model according to an embodiment of the present application;

fig. 4 is a schematic flowchart of a data processing method according to an embodiment of the present application;

fig. 5 is a schematic view of a scenario for determining a target queue according to an embodiment of the present application;

FIG. 6a is a schematic flow chart of model training provided in the embodiments of the present application;

FIG. 6b is a schematic flow chart of model training according to an embodiment of the present disclosure;

fig. 7 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;

fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.

The embodiment of the application provides a data processing method, which can enable a target translation model obtained through training to translate a text more accurately.

The data processing method provided by the embodiment of the application is implemented in an electronic device, which may be a server or a terminal. The server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, big data, and artificial intelligence platforms. The terminal may be, but is not limited to, a smartphone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, and the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, which is not limited in the present application.

The embodiment of the application relates to the technical field of machine learning. Machine Learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It specializes in studying how a computer can simulate or realize human learning behavior so as to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to endow computers with intelligence, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and teaching-based learning.

Referring to fig. 1, fig. 1 is a diagram illustrating an application architecture according to the present disclosure; the application architecture can implement the data processing method of the present disclosure. Specifically, fig. 1 includes an electronic device in which a first translation model, a second translation model, and related data stored for training are deployed. The electronic device may input sample text data from the related data into the first translation model, which outputs a first translation result, and into the second translation model, which outputs a second translation result; the electronic device may then train the first translation model according to the first translation result, the second translation result, the translation text data in the related data, and the like. The second translation model is a pre-trained model, and the related data used for training it may be the same as or different from the related data used for training the first translation model. The model translation performance of the second translation model indicates that it can translate text with high accuracy. By training the first translation model with the second translation results generated by the second translation model, the model translation performance of the second translation model can be transferred to the first translation model, so that the trained target translation model has model translation performance similar to that of the second translation model. During the training of the first translation model, the model parameters of the first translation model are corrected, while the model parameters of the second translation model may be kept unchanged.

Optionally, in some embodiments, the electronic device may process the training data through the technical solution of the present application and select suitable sample data from the training data for knowledge distillation, thereby implementing selective knowledge distillation. Knowledge Distillation (KD), as involved in the present application, transfers the knowledge one model generates for input sample data to another model for learning. Knowledge distillation involves a Teacher Model, the model that transfers knowledge, and a Student Model, the model that learns and receives knowledge. It is understood that the student model in knowledge distillation is the first translation model in fig. 1 mentioned above, and the teacher model is the second translation model in fig. 1 mentioned above.

Optionally, in some embodiments, the electronic device may execute the data processing method according to actual service requirements. For example, the technical solution of the present application can be applied to a language translation model training scenario (such as Chinese-to-English translation or English-to-German translation). When the electronic device obtains sample text data (such as a Chinese training sample sentence) and the translated text data of the sample text data (such as the English translation of that sentence), it executes the data processing method: target text words are determined from the N text words included in the sample text data, knowledge distillation is performed based on the target text words, a model translation loss function is determined according to the additional supervised learning signal obtained after knowledge distillation, and the model parameters of the first translation model are modified according to the model translation loss function to obtain a target translation model. The training of the language translation model is thereby realized with a better model training effect and knowledge distillation effect, achieving better translation performance and accuracy. For another example, the technical scheme of the present application can also be applied to a model compression scenario: by executing the data processing method of the present application, the model can be compressed and its volume reduced, so that the target translation model can be deployed more easily in practical applications, reducing computation and storage overhead, lowering application cost, improving translation efficiency, and bringing a better experience to the user.

Optionally, data related to the present application, such as the important index parameters of the translation words corresponding to each text word included in the sample text data, may be stored in a database or in a blockchain (for example, via blockchain distributed storage), which is not limited in the present application.

It is to be understood that the foregoing scenarios are only examples, and do not constitute a limitation on application scenarios of the technical solutions provided in the embodiments of the present application, and the technical solutions of the present application may also be applied to other scenarios. For example, as can be known by those skilled in the art, with the evolution of system architecture and the emergence of new service scenarios, the technical solution provided in the embodiments of the present application is also applicable to similar technical problems.

The scheme provided by the embodiment of the application relates to artificial intelligence technologies such as machine learning, and is specifically explained by the following embodiments:

based on the above description, the present application embodiment proposes a data processing method that can be executed by the above-mentioned electronic device. As shown in fig. 2, the flow of the data processing method in the embodiment of the present application may include:

s201, sample text data and translation text data of the sample text data are obtained.

The process and principle of training the first translation model with each sample text data are the same; the final trained model (such as the target translation model described below) is obtained by iteratively training the first translation model several times with one or more sample text data. Therefore, the following description takes the process of training the first translation model with one sample text data as an example. In addition, one sample text data is composed of one or more training sample sentences, so the N text words included in the sample text data can be obtained from the training sample sentences composing it; N is a positive integer whose specific value is determined by the actual application scenario.

The translated text data may include a translation word corresponding to each of the N text words; the sample text data and the translated text data have different text types. It is understood that the embodiment of the present application can be applied to training any language translation model, such as Chinese-to-English or English-to-German translation. In a Chinese-to-English translation scenario, the text type of the sample text data is Chinese and the text type of the corresponding translated text data is English; the sample text data may be one or more Chinese training sample sentences (one is taken as an example here), and the corresponding translated text data is the English translation of that sentence.

In one possible implementation, the sample text data (also referred to as the source-side sentence) is segmented into N text words, and the translation word corresponding to each text word is determined in the corresponding translated text data (also referred to as the correct target-side sentence). For example, if the sample text data is the Chinese sentence "我很高兴", it includes the 3 text words "我", "很", and "高兴", and the corresponding translated text data is "I'm very happy"; the translation word corresponding to "我" is "I'm", the translation word corresponding to "很" is "very", and the translation word corresponding to "高兴" is "happy".
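For illustration only, the aligned structure above can be represented as follows; this is a minimal sketch that assumes the word segmentation and word-level alignment are already given (the embodiment does not prescribe a particular tool):

```python
# A minimal sketch: one aligned training example, assuming the word
# segmentation and word-level alignment are already given.
sample_text_words = ["我", "很", "高兴"]        # N = 3 text words (source side)
translation_words = ["I'm", "very", "happy"]   # corresponding translation words

# Pair the i-th text word with its corresponding translation word.
aligned_pairs = list(zip(sample_text_words, translation_words))
print(aligned_pairs)  # [('我', "I'm"), ('很', 'very'), ('高兴', 'happy')]
```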

S202, inputting the sample text data into a first translation model, generating a first translation result corresponding to each text word in the first translation model, inputting the sample text data into a second translation model, and generating a second translation result corresponding to each text word in the second translation model.

In one possible embodiment, the first translation result includes, for each text word, a prediction probability over each word in a translation lexicon. Specifically, a translation lexicon containing a plurality of words may be defined in advance; text data of type A is translated into text data of type B, where type B is the text type to which the translation lexicon belongs, so the text type of the translation lexicon is the same as that of the translated text data, and the translation words in the translated text data belong to the translation lexicon. Based on the translation lexicon, the first translation model generates, for each text word, a prediction probability for each word in the lexicon; that is, the first translation model predicts the probability that the predicted translation word corresponding to the text word is each word in the translation lexicon, and these prediction probabilities are determined as the first translation result. The second translation result corresponding to each text word may be generated in the second translation model in the same manner as the first translation result, which is not repeated here.

For example, suppose the sample text data includes a text word "XX" whose translation may be any one of the words "A", "B", "C", "D", and "E" in the translation lexicon. The text word is input into the first translation model, which generates the probability that the predicted translation word corresponding to "XX" is each word in the lexicon: for example, the prediction probability of the word "A" is 20%, of the word "B" 10%, of the word "C" 40%, of the word "D" 5%, and of the word "E" 25%. These prediction probabilities are taken as the first translation result of the text word "XX".
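As a hedged illustration of such a first translation result, the sketch below uses made-up logits whose softmax reproduces the probabilities in the example; the lexicon and the numbers are assumptions, not values from the embodiment:

```python
import torch
import torch.nn.functional as F

lexicon = ["A", "B", "C", "D", "E"]   # hypothetical translation lexicon

# Made-up logits for the text word "XX"; chosen so the softmax reproduces
# the probabilities in the example (20%, 10%, 40%, 5%, 25%).
logits = torch.tensor([0.20, 0.10, 0.40, 0.05, 0.25]).log()

first_translation_result = F.softmax(logits, dim=-1)
for word, p in zip(lexicon, first_translation_result.tolist()):
    print(f"P(translation of 'XX' = {word}) = {p:.2f}")
```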

S203, obtaining the important index parameters of the translation words corresponding to each text word, and determining the target text word from the N text words according to the important index parameters of the translation words corresponding to each text word.

The important index parameter of a translation word can be obtained by calculating important evaluation index values of the translation word. An important evaluation index value measures how difficult a text word is: the larger the important evaluation index value of a text word, the more difficult, and therefore the more important, a sample that text word is for the training of the first translation model. Performing model training with target text words determined according to the important evaluation index values can effectively improve the benefit of training, giving the trained model higher translation accuracy and translation efficiency and enabling it to learn more effective features. Optionally, the important evaluation index value may be the Cross Entropy (CE) of the translation word corresponding to each text word.

In a possible embodiment, the N text words include an ith text word, where i is a positive integer less than or equal to N, and the first translation result of the ith text word includes the prediction probability of the ith text word for each word in the translation lexicon. The important index parameter of the translation word corresponding to the ith text word may be obtained by determining the prediction probability of that translation word in the first translation result of the ith text word as the target prediction probability, and generating the important index parameter of the translation word from the target prediction probability. Specifically, the prediction probability of the translation word corresponding to the ith text word is looked up in the first translation result and determined as the target prediction probability. The important index parameter generated from the target prediction probability may be the cross entropy value of the translation word corresponding to the ith text word, determined according to the cross entropy loss function:

$$\mathcal{L}_{CE} = -\sum_{k=1}^{|V|} \mathbb{1}\{\hat{y}_i = k\}\,\log p(y_i = k \mid y_{<i}, x;\ \theta_S)$$

where $|V|$ is the size of the translation lexicon, $\mathbb{1}\{\cdot\}$ is the indicator function, $p(\cdot \mid \cdot)$ is the prediction probability given by the first translation model, $\theta_S$ is the model parameter of the first translation model, $y_i$ is the predicted translation word corresponding to the ith text word, $\hat{y}_i$ is the correct translation word corresponding to the ith text word, and $x$ is the sample text data.

Illustratively, the text word contained in the sample text data is "XX" and the words in the translation lexicon are "A", "B", "C", "D", and "E". The text word is input into the first translation model, and the generated first translation result is: the prediction probability that the predicted translation word corresponding to "XX" is the word "A" is 20%, the word "B" 10%, the word "C" 40%, the word "D" 5%, and the word "E" 25%. If the correct translation word corresponding to the text word "XX" is "C", the prediction probability of the word "C" (40%) is determined as the target prediction probability, and the target prediction probability is substituted into the cross entropy loss function to obtain the cross entropy value of the translation word corresponding to "XX".
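A minimal sketch of this computation, continuing the example above; since the indicator keeps only the gold word's term, the cross entropy collapses to the negative log of the target prediction probability (all names and numbers are illustrative):

```python
import math

# Continuing the "XX" example: first translation result over the lexicon,
# with the correct translation word "C".
first_translation_result = {"A": 0.20, "B": 0.10, "C": 0.40, "D": 0.05, "E": 0.25}
gold_word = "C"

# The indicator in L_CE keeps only the gold word's term, so the cross
# entropy collapses to -log(target prediction probability).
target_prediction_probability = first_translation_result[gold_word]
importance = -math.log(target_prediction_probability)
print(f"cross entropy / important index parameter = {importance:.3f}")  # 0.916
```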

In a possible implementation manner, the target text words may be determined from the N text words according to the important index parameters of the translation words corresponding to each text word as follows: sort the N text words in descending order of the important index parameters of their corresponding translation words to obtain the sorted N text words, obtain the sample selection number K, and determine the first K of the sorted N text words as the target text words, where K is a non-negative integer less than or equal to N.

Alternatively, the sample selection number K may be determined as the number of text words whose corresponding translation word has an important index parameter greater than a preset threshold. For example, the sample text data includes the text words "A1", "B1", "C1", "D1", and "E1", whose corresponding translation words have important index parameters of 0.7, 0.9, 0.4, 0.1, and 0.55 respectively. Sorted in descending order of important index parameter, the text words are "B1", "A1", "E1", "C1", "D1". If the threshold is 0.6, the sample selection number K is 2 and the determined target text words are "B1" and "A1".

Alternatively, the sample selection number K may be determined as the number of text words in the top r% of the sorted list. For example, the sample text data includes the text words "A2", "B2", "C2", "D2", and "E2", whose corresponding translation words have important index parameters of 0.7, 0.9, 0.4, 0.1, and 0.55 respectively; sorted in descending order, they are "B2", "A2", "E2", "C2", "D2". If r% is 20%, the sample selection number K is 1 and the determined target text word is "B2".

Optionally, the sample selection number K may also be determined as the number of text words that are both in the top r% of the sorted list and whose corresponding translation word has an important index parameter greater than the preset threshold. For example, the sample text data includes the text words "A3", "B3", "C3", "D3", and "E3", whose corresponding translation words have important index parameters of 0.7, 0.9, 0.4, 0.1, and 0.55 respectively; sorted in descending order, they are "B3", "A3", "E3", "C3", "D3". If r% is set to 40% and the threshold to 0.8, the sample selection number K is 1 and the determined target text word is "B3".
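The three optional ways of fixing the sample selection number K can be sketched in one small function; the function name, the strategy switch, and the default values are illustrative assumptions, not part of the embodiment:

```python
def select_target_words(importance, strategy="both", threshold=0.6, ratio=0.2):
    """Return the target text words given {text word: important index parameter}.

    strategy "threshold": K = number of words scoring above `threshold`
    strategy "ratio":     K = number of words in the top r% (ratio) of the list
    strategy "both":      K = number of words in the top r% scoring above it
    """
    ranked = sorted(importance, key=importance.get, reverse=True)
    if strategy == "threshold":
        k = sum(importance[w] > threshold for w in ranked)
    elif strategy == "ratio":
        k = int(len(ranked) * ratio)
    else:  # "both"
        top = ranked[: int(len(ranked) * ratio)]
        k = sum(importance[w] > threshold for w in top)
    return ranked[:k]

scores = {"A1": 0.7, "B1": 0.9, "C1": 0.4, "D1": 0.1, "E1": 0.55}
print(select_target_words(scores, "threshold"))          # ['B1', 'A1']
print(select_target_words(scores, "ratio", ratio=0.2))   # ['B1']
print(select_target_words(scores, "both", 0.8, 0.4))     # ['B1']
```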

S204, determining a model translation loss function according to a first translation result corresponding to the target text word, a second translation result corresponding to the target text word and a translation word corresponding to the target text word in the translation text data, and correcting a model parameter of the first translation model according to the model translation loss function to obtain a target translation model.

In one possible embodiment, during the knowledge distillation process, the first translation model acquires additional supervised learning information by learning the second translation result output by the second translation model. For word-level knowledge distillation, the one-hot label in the cross entropy loss function $\mathcal{L}_{CE}$ is replaced with the second translation result output by the second translation model, obtaining the distillation loss function:

$$\mathcal{L}_{KD} = -\sum_{k=1}^{|V|} q(y_i = k \mid y_{<i}, x;\ \theta_T)\,\log p(y_i = k \mid y_{<i}, x;\ \theta_S)$$

where $q(y_i = k \mid y_{<i}, x;\ \theta_T)$ is the second translation result output by the second translation model, and $\theta_T$ and $\theta_S$ are the model parameters of the second translation model and the first translation model, respectively.

Therefore, according to the first translation result corresponding to the target text word, the second translation result corresponding to the target text word, and the translation word corresponding to the target text word in the translated text data, the obtained model translation loss function is:

$$\mathcal{L} = (1 - \alpha)\,\mathcal{L}_{CE} + \alpha\,\mathcal{L}_{KD}$$

where $\alpha$ can be regarded as a weight balancing the values of the two cross entropy loss functions.
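A minimal PyTorch sketch of this model translation loss is given below. The interpolated form (1 - alpha) * L_CE + alpha * L_KD and the choice to apply plain cross entropy at non-selected positions follow standard word-level selective distillation practice and are assumptions here, as are all names:

```python
import torch
import torch.nn.functional as F

def model_translation_loss(student_logits, teacher_logits, gold_ids,
                           target_mask, alpha=0.5):
    """Combined loss over word positions.

    student_logits, teacher_logits: [num_words, |V|] raw scores
    gold_ids:    [num_words] indices of the correct translation words
    target_mask: [num_words] bool, True at the selected target text words
    """
    log_p = F.log_softmax(student_logits, dim=-1)       # first model, log p(.)
    q = F.softmax(teacher_logits, dim=-1).detach()      # second model, frozen

    ce = F.nll_loss(log_p, gold_ids, reduction="none")  # per-word L_CE
    kd = -(q * log_p).sum(dim=-1)                       # per-word L_KD

    # Distill only at target text words; ordinary cross entropy elsewhere.
    per_word = torch.where(target_mask, (1 - alpha) * ce + alpha * kd, ce)
    return per_word.mean()
```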

In one possible embodiment, modifying the model parameters of the first translation model according to the model translation loss function is implemented by optimizing the model translation loss function $\mathcal{L}$ during model training, i.e., by minimizing $\mathcal{L}$, thereby obtaining the target translation model. It is understood that only one round of training is taken as an example here; the above training method may be performed on the first translation model multiple times, with the same or different sample text data used across rounds. The number of training rounds can be determined by a technician according to the actual training conditions, or the first translation model can be trained repeatedly until the model converges, i.e., until the loss function $\mathcal{L}$ is minimized, at which point training is complete and the final target translation model is obtained. Across the multiple rounds of training, the above steps are performed regardless of whether the same or different sample text data is used, and the process and principle of every round are the same.
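A hedged sketch of the resulting parameter-correction loop, reusing model_translation_loss from the sketch above; the stand-in linear models, optimizer, and iteration count are illustrative assumptions:

```python
import torch
import torch.nn as nn

V = 5                                   # toy translation lexicon size
student = nn.Linear(8, V)               # stand-in for the first translation model
teacher = nn.Linear(8, V)               # stand-in for the second translation model
for p in teacher.parameters():
    p.requires_grad_(False)             # second model's parameters stay unchanged

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

features = torch.randn(3, 8)                      # 3 text-word positions
gold_ids = torch.tensor([2, 0, 4])                # correct translation word ids
target_mask = torch.tensor([True, False, True])   # selected target text words

for step in range(200):                 # iterate toward minimizing the loss
    loss = model_translation_loss(student(features), teacher(features),
                                  gold_ids, target_mask, alpha=0.5)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                    # corrects only the first model
```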

For example, referring to fig. 3, fig. 3 is a schematic view of a scenario for determining a target translation model. The sample text data includes N text words (word 1, word 2, …, word N). The sample text data is input into the first translation model to obtain the first translation result corresponding to each text word, and into the second translation model to obtain the second translation result corresponding to each text word. The important index parameter of the translation word corresponding to each text word is obtained, and target text words are determined from the N text words according to those parameters. The first and second translation results corresponding to the target text words are then obtained, and based on the first translation result, the second translation result, and the translation word corresponding to each target text word in the translated text data, the model parameters of the first translation model are corrected to obtain the target translation model.

In the embodiment of the application, the electronic device obtains sample text data and the translated text data of the sample text data. The sample text data is input into the first translation model, which generates a first translation result corresponding to each text word, and into the second translation model, which generates a second translation result corresponding to each text word. The important index parameter of the translation word corresponding to each text word is obtained, and target text words are determined from the N text words according to those parameters. A model translation loss function is determined according to the first translation result, the second translation result, and the translation word corresponding to each target text word in the translated text data, and the model parameters of the first translation model are corrected according to the model translation loss function to obtain the target translation model. By implementing this scheme, the sample text data can be processed and appropriate target text words selected from the text words it contains for knowledge distillation. This avoids, to a certain extent, the negative effect that distilling some text words would have on the overall distillation result, effectively prevents overfitting, improves the efficiency and effect of knowledge distillation, and effectively improves the translation accuracy of the obtained target translation model.

Referring to fig. 4, fig. 4 is a flowchart illustrating a data processing method according to an embodiment of the present application, where the method can be executed by the above-mentioned electronic device. As shown in fig. 4, the flow of the data processing method in the embodiment of the present application may include:

s401, sample text data and translation text data of the sample text data are obtained. For a specific implementation of step S401, reference may be made to the related description of step S201 in the foregoing embodiment, and details are not described here again.

S402, inputting sample text data into a first translation model, generating a first translation result corresponding to each text word in the first translation model, inputting the sample text data into a second translation model, and generating a second translation result corresponding to each text word in the second translation model. For a specific implementation of step S402, reference may be made to the related description of step S202 in the foregoing embodiment, and details are not described here again.

And S403, acquiring important index parameters of the translation words corresponding to each text word.

In one possible embodiment, the important index parameter may be determined by one or more important evaluation index values, wherein the important evaluation index values may include any one or more of the following: the cross entropy value of the translated words, the word frequency of the translated words and the length of the translated words.

Optionally, if the important evaluation index value includes the cross entropy value of the translation word, it is obtained as described for the cross entropy calculation in step S203, which is not repeated here. Optionally, if the important evaluation index value includes the word frequency of the translation word, it may be obtained as follows: obtain the translated text data corresponding to all sample text data used for model training, count the total number of translation words in all the translated text data (denoted W), and count how many of those translation words are the specified translation word (denoted M); the word frequency of the specified translation word is then M/W. For example, if the translation word I corresponding to text word I is the English word "happy", the total number of translation words in all translated text data is 1000, and "happy" occurs 10 times among them, the word frequency of translation word I is 10/1000 = 0.01. Optionally, if the important evaluation index value includes the length of the translation word, it may be obtained by calculating the character length of the translation word; for example, if the translation word I is an English word, its length is the number of letters composing it, so for the English word "happy" the length is 5.

In a possible implementation manner, if the important evaluation index value includes the word frequency of the translation word, then, because translation words with smaller word frequency are the ones to be selected, the word frequency can first be processed: the word frequency of the translation word is input into the logarithmic formula log(word frequency), and the processed word frequency is determined as the important evaluation index value.
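A small sketch of computing the word frequency, its logarithmic processing, and the character length on a toy corpus; the corpus and, as noted in the comment, the sign of the logarithm are assumptions:

```python
import math

# Toy corpus of all translation words across the training translated text
# data: "happy" occurs M = 10 times out of W = 1000 translation words.
all_translation_words = ["happy"] * 10 + ["w"] * 990

M = all_translation_words.count("happy")
W = len(all_translation_words)
word_frequency = M / W                         # 10 / 1000 = 0.01

# Logarithmic processing of the word frequency as described above; whether
# log or -log is intended is not explicit in the text, so the sign here
# is an assumption for illustration.
processed_frequency = math.log(word_frequency)

word_length = len("happy")                     # character length = 5
print(word_frequency, processed_frequency, word_length)
```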

In a possible implementation manner, the N text words include an ith text word, where i is a positive integer less than or equal to N. If the important index parameter is determined by a single important evaluation index value, the important evaluation index value of the translation word corresponding to the ith text word is determined as its important index parameter. If the important index parameter is determined by at least two important evaluation index values, then, after the at least two important evaluation index values of the ith text word are obtained, they are aggregated to obtain the important index parameter of the translation word corresponding to the ith text word. Specifically, the aggregation may be performed as follows: obtain the evaluation weight corresponding to each of the at least two important evaluation index values; weight each important evaluation index value by its corresponding evaluation weight to obtain the weighted index value corresponding to each important evaluation index value; and determine the important index parameter of the translation word corresponding to the ith text word according to the weighted index values, optionally by summing them. The evaluation weights may be determined by a technician according to the actual model training process, the structural designs of the second translation model and the first translation model, or factors such as the text types of the sample text data and the translated text data.

Illustratively, suppose the important index parameter is determined by three important evaluation index values: the cross entropy value of the translation word (index value 1), the processed word frequency of the translation word (index value 2), and the length of the translation word (index value 3). If the ith text word is "An", and the translation word corresponding to "An" has index value 1 equal to 1.6, index value 2 equal to 0.3, and index value 3 equal to 0.04, with evaluation weights of 0.4 for index value 1, 1 for index value 2, and 5 for index value 3, then weighting each important evaluation index value by its evaluation weight gives weighted index values of 0.64 for index value 1, 0.3 for index value 2, and 0.2 for index value 3. Summing the weighted index values gives the important index parameter of the translation word corresponding to the ith text word, which is 1.14.
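The aggregation example can be reproduced as follows; note the evaluation weights (0.4, 1, 5) are inferred from the example's weighted values and are otherwise left to the technician:

```python
# Reproducing the aggregation example for the text word "An".
index_values = {"cross_entropy": 1.6, "processed_frequency": 0.3, "length": 0.04}

# Evaluation weights inferred from the example's weighted values
# (0.64, 0.3, 0.2); the choice of weights is left to the technician.
evaluation_weights = {"cross_entropy": 0.4, "processed_frequency": 1.0, "length": 5.0}

weighted = {name: value * evaluation_weights[name]
            for name, value in index_values.items()}
important_index_parameter = sum(weighted.values())

print(weighted)                              # {'cross_entropy': 0.64, ...}
print(f"{important_index_parameter:.2f}")    # 1.14
```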

S404, sorting the N text words in descending order of the important index parameters of the translation words corresponding to each text word to obtain the sorted N text words, obtaining the sample selection number K, and determining the first K text words in the sorted N text words as target text words.

Wherein K is a non-negative integer less than or equal to N.

In one possible embodiment, the N text words to be sorted may come from one training sample sentence or from multiple training sample sentences in one model training process. If the sample text data including the N text words consists of multiple training sample sentences, a batch of training sample sentences is divided out for one round of training of the first translation model and used as the sample text data, so that the important index parameter of each text word included in the batch of training sample sentences can be determined.

Optionally, the important index parameter of the translation word corresponding to each text word in a batch of training sample sentences is determined in the manner described above: taking the batch of training sample sentences as a unit, obtain the text words included in each training sample sentence and the corresponding translated text data, determine the important index parameter of the translation word corresponding to each of those text words, and sort the N text words obtained from the batch in descending order of these important index parameters to obtain the sorted N text words. On this basis, when the N text words to be sorted come from the same batch of training sample sentences, this way of selecting the target text words is also called the local selection method. Optionally, the sample selection number K may be determined according to the number of text words in the top r% of the sorted list, according to the number of text words whose corresponding translation word has an important index parameter greater than the preset threshold, or according to the number of text words satisfying both conditions.

In a possible embodiment, the N text words to be sorted may also come from multiple training sample sentences across multiple rounds of model training, that is, from different batches of training sample sentences. In this case, the sample text data is obtained as follows: obtain the Z text words of the first translation model in the (j-1)th model training process and add them to a First-In First-Out (FIFO) queue; in the jth model training process of the first translation model, obtain newly added sample text data and add the text words it contains to the FIFO queue containing the Z text words, obtaining a target queue; and determine the sample text data from the text words in the target queue. Here j is a positive integer greater than 1 and Z is equal to N. The newly added sample text data may be a batch of training sample sentences. Each time model training is performed, a new batch of training sample sentences and the text words they include are obtained and added to the FIFO queue, which already contains text words from the training sample sentences of one or more previous rounds of model training. When the text words of the new training sample sentences have been added to the FIFO queue, the target queue for this round of training is obtained, and the sample text data is determined from the N text words in the target queue; the sample text data thus includes both the text words of the new training sample sentences of this round and text words of training sample sentences from previous rounds.

It can be understood that when the text words contained in the newly added sample text data are added to the FIFO queue, the text words that entered the queue first are correspondingly removed. For example, as shown in fig. 5, before the j-th model training pass the FIFO queue already contains some text words from earlier passes (for example, the (j-1)-th and (j-2)-th); in the j-th pass, as the text words of the newly added sample text data are added to the queue, the oldest text words are removed, yielding a target queue that contains all text words of the newly added sample text data of the j-th pass plus part of the text words from the previous pass (for example, the (j-1)-th). The sample text data of the j-th model training pass is determined according to the text words in this target queue, and the target text words are selected from the text words in the target queue. In addition, if, after the text words of the newly added sample text data are added, the number of text words in the FIFO queue does not exceed the maximum number the queue can accommodate, no text words are removed. Further, in the j-th model training pass, the sample text data can be determined according to the N text words in the target queue: the new training sample sentence in the sample text data is input into the first translation model to obtain a first translation result for each of its text words, and these, together with the first translation results generated in earlier passes for the text words of the remaining training sample sentences in the sample text data, are taken as the first translation results of the N text words of the j-th pass; the second translation results are handled in the same way. Likewise, the important index parameters of the translation words corresponding to the text words of the new training sample sentence are obtained, and these, together with the important index parameters obtained in earlier passes for the remaining text words, are taken as the important index parameters of the translation words corresponding to the N text words; the target text words are then determined based on these important index parameters. On this basis, when the N text words to be ranked come from different batches of training sample sentences, the way of selecting the target text words is also called the global selection method.
Optionally, the target text words may be determined in the same specific manner as in the local selection method; a minimal sketch is given below.
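As an illustrative sketch of the global selection method, assuming a fixed queue capacity and that each text word is stored together with the importance index parameter computed when its batch was trained (both assumptions for illustration):

```python
from collections import deque

# Sketch of the global selection method. Each queue entry pairs a text word
# with the importance index parameter computed when its batch was trained;
# the capacity and K below are illustrative assumptions.

class GlobalSelector:
    def __init__(self, capacity=8, k=3):
        self.queue = deque(maxlen=capacity)  # first-in words evicted first
        self.k = k                           # sample selection number K

    def step(self, new_words, new_scores):
        # Add the newly added sample text data; deque(maxlen=...) removes
        # the earliest-in text words automatically once capacity is exceeded.
        self.queue.extend(zip(new_words, new_scores))
        ranked = sorted(self.queue, key=lambda p: p[1], reverse=True)
        return [w for w, _ in ranked[: self.k]]  # targets, from any batch

sel = GlobalSelector()
print(sel.step(["A1", "A2"], [0.3, 0.8]))  # -> ['A2', 'A1'] (only 2 words yet)
```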

Performing knowledge distillation on some text words may negatively affect the overall distillation effect. Therefore, adaptively selecting suitable target text words for knowledge distillation, in combination with the training condition of the first translation model, can bring positive benefit to the training of the first translation model. Such a target text word may also be called a difficult sample: it is a sample that is hard for the first translation model during training, so knowledge distillation through the second translation model is needed to provide an additional supervised learning signal, which the first translation model can learn from to achieve a better training effect. The important index parameter is thus an index measuring how difficult a text word is.

Illustratively, batch I includes the text words "A1", "A2", "A3", "A4", "A5"; batch II includes the text words "B1", "B2", "B3", "B4"; and batch III includes the text words "C1", "C2", "C3", "C4", "C5". (1) If the method of selecting the target text words is the local selection method, then in the z-th model training pass the first translation model is trained with batch I as the sample text data: the important index parameter of the translation word corresponding to each text word in batch I is obtained, and the target text words for knowledge distillation are selected from the text words included in batch I. In the (z+1)-th pass the first translation model is trained with batch II as the sample text data, the important index parameters of the translation words corresponding to the text words in batch II are obtained, and the target text words for knowledge distillation are selected from batch II. In the (z+2)-th pass the first translation model is trained with batch III as the sample text data, the important index parameters are obtained, and the target text words for knowledge distillation are selected from batch III. For example, as shown in fig. 6a, fig. 6a is a schematic flowchart of selecting target text words by the local selection method and performing model training. Fig. 6a involves the first translation model, the second translation model, the related data for model training (multiple batches of training sample sentences, the translation text data corresponding to the training sample sentences, the translation word stock, and the like), and a server executing the model training steps, the related data being stored in the server. Specifically:

Firstly, the server divides out a batch of training sample sentences for the z-th model training pass to obtain batch I, and uses batch I as the sample text data of the z-th model training pass.

Secondly, the server inputs batch I into the first translation model to obtain first translation results, and inputs batch I into the second translation model to obtain second translation results.

Thirdly, the server obtains the important index parameter of the translation word corresponding to each text word in batch I, and determines the target text words from the text words of batch I according to the important index parameters.

Fourthly, the server trains the first translation model according to the first translation results corresponding to the target text words, the second translation results corresponding to the target text words, and the translation words corresponding to the target text words in the translation text data.

Fifthly, the server divides out a batch of training sample sentences for the (z+1)-th model training pass to obtain batch II, and uses batch II as the sample text data of the (z+1)-th model training pass.

Sixthly, the server inputs batch II into the first translation model to obtain first translation results, and inputs batch II into the second translation model to obtain second translation results.

Seventhly, the server obtains the important index parameter of the translation word corresponding to each text word in batch II, and determines the target text words from the text words of batch II according to the important index parameters.

Eighthly, the server trains the first translation model according to the first translation results corresponding to the target text words, the second translation results corresponding to the target text words, and the translation words corresponding to the target text words in the translation text data. It can be understood that the same steps are also performed for the (z+2)-th model training pass. A toy sketch of these steps follows.
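The following toy pass, with random tensors standing in for the two models and the encoded batch, mirrors steps Firstly through Thirdly above; the importance formula (negative log-probability of the reference word) and K = 2 are illustrative assumptions, not details fixed by this application:

```python
import torch
import torch.nn.functional as F

# Toy version of steps Firstly through Thirdly, with random stand-ins for
# the two translation models and the encoded batch I.
vocab, n_words, dim = 100, 5, 16
student = torch.nn.Linear(dim, vocab)          # "first translation model"
teacher = torch.nn.Linear(dim, vocab)          # "second translation model"
feats   = torch.randn(n_words, dim)            # one vector per batch I text word
ref_ids = torch.randint(0, vocab, (n_words,))  # reference translation words

s_logits = student(feats)                      # first translation results
t_logits = teacher(feats)                      # second translation results

# Importance index parameter per text word, then keep the K hardest words.
scores = -F.log_softmax(s_logits, -1)[torch.arange(n_words), ref_ids]
target_idx = scores.topk(2).indices            # indices of the target words
```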

(2) If the method of selecting the target text words is the global selection method, then in the z-th model training pass batch I is taken as the newly added sample text data and the text words included in batch I are added to the first-in first-out queue. Suppose the maximum number of text words the queue can accommodate is 8 and no text words were added before; the queue then contains "A1", "A2", "A3", "A4" and "A5", which constitutes the target queue, and the sample text data is determined according to the text words in this target queue. The important index parameter of the translation word corresponding to each text word in batch I is obtained, and the target text words for knowledge distillation are selected from the text words of batch I. In the (z+1)-th model training pass, batch II is taken as the newly added sample text data and its text words are added to the queue. Because the 5 text words of batch I already occupy the queue, moving in the text words of batch II yields a target queue containing part of the text words of batch I together with the text words of batch II, namely "A2", "A3", "A4", "A5", "B1", "B2", "B3" and "B4". This target queue is determined as the sample text data of the (z+1)-th pass. The important index parameters of the translation words corresponding to the batch II text words are obtained in the current pass, while those of the batch I text words in the sample text data were obtained in the z-th pass; the target text words for knowledge distillation in the (z+1)-th pass are then determined over all these text words, that is, according to the important index parameters obtained in the z-th pass for "A2", "A3", "A4" and "A5" and those obtained in the (z+1)-th pass for "B1", "B2", "B3" and "B4". Similarly, in the (z+2)-th model training pass batch III is taken as the newly added sample text data and its text words are added to the queue, giving a target queue containing "B2", "B3", "B4", "C1", "C2", "C3", "C4" and "C5"; the important index parameters of the translation words corresponding to the batch III text words are obtained in the current pass, those of the batch II text words were obtained in the (z+1)-th pass, and the target text words for knowledge distillation in the (z+2)-th pass are determined according to all of them. For example, as shown in fig. 6b, fig. 6b is a schematic flowchart of selecting target text words by the global selection method and performing model training. Fig. 6b involves the first translation model, the second translation model, the related data for model training (multiple batches of training sample sentences, the translation text data corresponding to the training sample sentences, the translation word stock, and the like), and a server executing the model training steps, the related data being stored in the server. A small queue sketch reproducing this example is given after step S8 below. Specifically:

s1, the server obtains a target queue I of the model training of the z-th time, and determines sample text data according to the target queue I, wherein the target queue I comprises text words of batch I: the server divides a batch of training sample sentences for the training of the model of the z th time to obtain a batch I, the batch I is used as newly-added sample text data for the training of the model of the z th time, text words included in the batch I are added into a first-in first-out queue, and the text words which are not added before in the first-in first-out queue are set, so that the text words of the batch I are included in the first-in first-out queue to obtain a target queue I, and the sample text data for the training of the model of the z th time is determined according to the text words included in the target queue I.

S2, the server inputs batch I into the first translation model to obtain first translation results, and inputs batch I into the second translation model to obtain second translation results.

S3, the server obtains the important index parameter of the translation word corresponding to each text word in batch I, and determines the target text words from the text words of batch I according to the important index parameters.

S4, the server trains the first translation model according to the first translation results corresponding to the target text words, the second translation results corresponding to the target text words, and the translation words corresponding to the target text words in the translation text data.

S5, the server obtains target queue II of the (z+1)-th model training pass and determines the sample text data according to target queue II, where target queue II contains text words of batch I and batch II. Specifically, the server divides out a batch of training sample sentences for the (z+1)-th pass to obtain batch II, takes batch II as the newly added sample text data of the (z+1)-th pass, and adds the text words included in batch II to the first-in first-out queue, which at this point is target queue I of the z-th pass. If adding all text words of batch II would exceed the maximum number the queue can accommodate, the text words that entered the queue first are correspondingly removed as the batch II text words are moved in. Suppose the queue then contains part of the text words of batch I and the text words of batch II; this constitutes target queue II, and the sample text data of the (z+1)-th pass is determined according to the text words included in target queue II.

S6, the server inputs batch II into the first translation model to obtain the first translation results corresponding to batch II, and inputs batch II into the second translation model to obtain the second translation results corresponding to batch II.

S7, the server obtains the important index parameter of the translation word corresponding to each text word in batch II, and determines the target text words from the text words in target queue II according to these important index parameters and the important index parameters of the translation words corresponding to the batch I text words remaining in target queue II. The target text words may come from batch I or from batch II.

S8, the server trains the first translation model according to the first translation results corresponding to the target text words, the second translation results corresponding to the target text words, and the translation words corresponding to the target text words in the translation text data. It can be understood that the same steps are also performed for the (z+2)-th model training pass.
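As a cross-check on the worked example in (2) above, the first-in first-out behaviour of the target queue can be reproduced with Python's collections.deque; the capacity of 8 and the word labels are the illustrative values used in that example:

```python
from collections import deque

fifo = deque(maxlen=8)                        # maximum number of text words

fifo.extend(["A1", "A2", "A3", "A4", "A5"])   # z-th pass: batch I moved in
print(list(fifo))  # ['A1', 'A2', 'A3', 'A4', 'A5']            -> target queue I

fifo.extend(["B1", "B2", "B3", "B4"])         # (z+1)-th pass: batch II moved in
print(list(fifo))  # ['A2', 'A3', 'A4', 'A5', 'B1', 'B2', 'B3', 'B4']

fifo.extend(["C1", "C2", "C3", "C4", "C5"])   # (z+2)-th pass: batch III moved in
print(list(fifo))  # ['B2', 'B3', 'B4', 'C1', 'C2', 'C3', 'C4', 'C5']
```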

S405, determining a model translation loss function according to a first translation result corresponding to the target text word, a second translation result corresponding to the target text word and a translation word corresponding to the target text word in the translation text data, and correcting a model parameter of the first translation model according to the model translation loss function to obtain a target translation model.

The specific manner for determining the model translation loss function and obtaining the target translation model may refer to the related description in step S204, which is not described herein again.
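Although the application defers the exact formulation to step S204, one common way to realise such a model translation loss is to combine a cross-entropy term against the reference translation words with a distillation term that, on the selected target text words only, pulls the first (student) model's distribution toward the second (teacher) model's. The KL-divergence form and the weight alpha below are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

def model_translation_loss(student_logits, teacher_logits, ref_ids,
                           target_mask, alpha=0.5):
    """student_logits/teacher_logits: [num_words, vocab]; ref_ids: [num_words];
    target_mask: bool [num_words], True for the selected target text words
    (assumed non-empty). alpha is an assumed weighting, not fixed here."""
    # Supervised term on every text word: predict the reference translation word.
    ce = F.cross_entropy(student_logits, ref_ids)
    # Distillation term on the target text words only: match the teacher.
    log_p_student = F.log_softmax(student_logits[target_mask], dim=-1)
    p_teacher = F.softmax(teacher_logits[target_mask], dim=-1)
    kd = F.kl_div(log_p_student, p_teacher, reduction="batchmean")
    return ce + alpha * kd
```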

In a possible implementation manner, after the target translation model is obtained, the target translation model may be tested. Specifically, text data to be translated may be obtained and input into the target translation model, and the target translation text data of the text data to be translated is output based on the target translation model. The text data to be translated has the same text type as the sample text data, and the target translation text data has the same text type as the translation text data.

Optionally, extensive tests of the technical scheme of the present application show that the data processing method provided herein improves, to a certain extent, the translation efficiency and accuracy of the model as well as the efficiency and effect of knowledge distillation. For example, in an English translation scenario, a Neural Machine Translation (NMT) model, namely a Transformer model, is trained as the first translation model, and the target translation model obtained by training according to the technical scheme of the present application is tested; the BLEU (Bilingual Evaluation Understudy) score is greatly improved, that is, the translation effect of the translation model is significantly improved. For another example, in a Chinese-English translation scenario, a Transformer model is trained as the first translation model, and the target translation model obtained by training according to the technical scheme of the present application is tested; compared with the translation effect of the Transformer model without knowledge distillation and that of the Transformer model with knowledge distillation but without selection of sample data, the BLEU score is greatly improved, demonstrating that the technical scheme of the present application can be applied to translation model training scenarios of different languages and obtains stable improvements in translation effect. Besides Transformer model training, the method can also be applied to the training of models such as the RNN (Recurrent Neural Network), LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit), and tests show that the model training effect is likewise greatly improved.
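As a hedged illustration of how such a BLEU test might be scripted, the sacrebleu package is one common implementation; the hypothesis and reference sentences below are made-up placeholders, not results from this application:

```python
import sacrebleu  # one common BLEU implementation (pip install sacrebleu)

# Hypothetical outputs of a trained target translation model on two test
# sentences, scored against the reference translation text data.
hypotheses = ["the cat sat on the mat", "it is raining today"]
references = ["the cat sat on the mat", "it rains today"]

bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU = {bleu.score:.2f}")  # higher score => better translation effect
```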

In the embodiment of the present application, the electronic device obtains sample text data and the translation text data of the sample text data; inputs the sample text data into a first translation model and generates, in the first translation model, a first translation result corresponding to each text word; inputs the sample text data into a second translation model and generates, in the second translation model, a second translation result corresponding to each text word; obtains the important index parameter of the translation word corresponding to each text word; ranks the N text words in descending order of these important index parameters to obtain the ranked N text words; obtains a sample selection number K and determines the first K of the ranked N text words as the target text words; determines a model translation loss function according to the first translation results corresponding to the target text words, the second translation results corresponding to the target text words, and the translation words corresponding to the target text words in the translation text data; and corrects the model parameters of the first translation model according to the model translation loss function to obtain the target translation model. By implementing this scheme, the sample text data can be processed and suitable target text words can be dynamically selected, in combination with the training condition of the first translation model, from the text words contained in the sample text data for knowledge distillation. This avoids, to a certain extent, the negative effect on the overall distillation effect that may arise when knowledge distillation is performed on some text words, effectively prevents over-fitting, improves the efficiency and effect of knowledge distillation, and effectively improves the translation accuracy of the obtained target translation model.

Referring to fig. 7, fig. 7 is a schematic structural diagram of a data processing apparatus provided in the present application. It should be noted that the data processing apparatus shown in fig. 7 is used for executing the methods of the embodiments shown in fig. 2 and fig. 4 of the present application; for convenience of description, only the portions related to the embodiments of the present application are shown, and for undisclosed specific technical details, reference is made to the embodiments shown in fig. 2 and fig. 4. The data processing apparatus 700 may include: an obtaining module 701, a processing module 702, and a determining module 703. Wherein:

an obtaining module 701, configured to obtain sample text data and translation text data of the sample text data; the sample text data comprises N text words, wherein N is a positive integer; the translated text data comprises a translated word corresponding to each text word in the N text words; the sample text data and the translated text data have different text types;

the processing module 702 is configured to input sample text data into a first translation model, generate a first translation result corresponding to each text word in the first translation model, input the sample text data into a second translation model, and generate a second translation result corresponding to each text word in the second translation model;

the determining module 703 is configured to obtain an important index parameter of a translation word corresponding to each text word, and determine a target text word from the N text words according to the important index parameter of the translation word corresponding to each text word;

the determining module 703 is further configured to determine a model translation loss function according to the first translation result corresponding to the target text word, the second translation result corresponding to the target text word, and the translation word corresponding to the target text word in the translation text data, and correct the model parameter of the first translation model according to the model translation loss function to obtain the target translation model.

In one possible embodiment, the N text words include an ith text word, i being a positive integer less than or equal to N; the first translation result of the ith text word comprises a prediction probability of the ith text word for each word in the translation word stock; the translation words in the translation text data belong to the translation word stock. When the determining module 703 is configured to obtain the important index parameter of the translation word corresponding to each text word, it is specifically configured to:

determining the prediction probability of a translation word corresponding to the ith text word in a first translation result of the ith text word as a target prediction probability;

and generating an important index parameter of the translation word corresponding to the ith text word according to the target prediction probability.
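Purely as an illustration, these two steps might be realised as follows; using the negative log of the target prediction probability as the index is an assumed choice, not one fixed by the application:

```python
import torch
import torch.nn.functional as F

# Sketch of the two steps above: take the prediction probability that the
# first translation result assigns to the reference translation word, then
# turn it into an importance index parameter. Low probability means the
# text word is "difficult" and thus a good distillation target.

def importance_from_probability(logits_i, ref_word_id):
    probs = F.softmax(logits_i, dim=-1)     # distribution over the word stock
    target_prob = probs[ref_word_id]        # target prediction probability
    return -torch.log(target_prob + 1e-9)   # larger => more important/harder
```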

In one possible embodiment, the N text words include an ith text word, i being a positive integer less than or equal to N; when the determining module 703 is configured to obtain the important index parameter of the translation word corresponding to each text word, it is specifically configured to:

acquiring at least two important evaluation index values of the ith text word;

and aggregating the at least two important evaluation index values to obtain the important index parameter of the translation word corresponding to the ith text word.

In a possible implementation manner, when configured to aggregate the at least two important evaluation index values to obtain the important index parameter of the translation word corresponding to the ith text word, the determining module 703 is specifically configured to:

obtaining an evaluation weight corresponding to each important evaluation index value in at least two important evaluation index values;

weighting each important evaluation index value according to the evaluation weight corresponding to each important evaluation index value respectively to obtain the weighted index value corresponding to each important evaluation index value respectively;

and determining the important index parameters of the translation words corresponding to the ith text word according to the weighted index values corresponding to each important evaluation index value.
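As an illustrative sketch only, the weighted aggregation described in the three steps above might look like this in Python; the index values, the evaluation weights, and the use of a plain weighted sum are all assumptions:

```python
# Sketch of the weighted aggregation: weight each important evaluation index
# value by its evaluation weight, then combine the weighted index values into
# the important index parameter of the corresponding translation word.

def aggregate_importance(index_values, weights):
    assert len(index_values) == len(weights)
    weighted = [v * w for v, w in zip(index_values, weights)]  # weighted index values
    return sum(weighted)  # important index parameter of the translation word

# e.g. two evaluation index values weighted 0.7 / 0.3:
print(aggregate_importance([2.3, 1.1], [0.7, 0.3]))  # ~1.94
```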

In a possible implementation manner, the determining module 703, when configured to determine the target text word from the N text words according to the importance index parameter of the translation word corresponding to each text word, is specifically configured to:

sequencing the N text words according to the descending order of the important index parameters of the translation words corresponding to each text word respectively to obtain N sequenced text words;

obtaining a sample selection number K, and determining the first K text words in the sequenced N text words as target text words; k is a non-negative integer.

In a possible implementation, the obtaining module 701, when configured to obtain the sample text data, is specifically configured to:

acquiring Z text words of the first translation model in the (j-1)-th model training process; j is a positive integer greater than 1, Z is equal to N;

adding Z text words to a first-in first-out queue;

in the j-th model training process of the first translation model, acquiring newly added sample text data, and adding the text words contained in the newly added sample text data to the first-in first-out queue containing the Z text words to obtain a target queue;

sample text data is determined from the text words in the target queue.

In one possible implementation, the processing module 702 is further configured to:

acquiring text data to be translated; the text data to be translated and the sample text data have the same text type;

inputting the text data to be translated into a target translation model, and outputting target translation text data of the text data to be translated based on the target translation model; the target translation text data has the same text type as the translation text data.

In the embodiment of the present application, the obtaining module obtains sample text data and the translation text data of the sample text data; the processing module inputs the sample text data into a first translation model, generates in it a first translation result corresponding to each text word, inputs the sample text data into a second translation model, and generates in it a second translation result corresponding to each text word; the determining module obtains the important index parameter of the translation word corresponding to each text word and determines the target text words from the N text words according to these important index parameters; and the determining module further determines a model translation loss function according to the first translation results corresponding to the target text words, the second translation results corresponding to the target text words, and the translation words corresponding to the target text words in the translation text data, and corrects the model parameters of the first translation model according to the model translation loss function to obtain the target translation model. By implementing this scheme, the sample text data can be processed and suitable target text words can be selected from the text words contained in the sample text data for knowledge distillation, which avoids, to a certain extent, the negative effect on the overall distillation effect that may arise when knowledge distillation is performed on some text words, effectively prevents over-fitting, improves the efficiency and effect of knowledge distillation, and effectively improves the translation accuracy of the obtained target translation model.

The functional modules in the embodiments of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules are integrated into one module. The integrated module may be implemented in a form of hardware, or may be implemented in a form of software functional module, which is not limited in this application.

Please refer to fig. 8, fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. As shown in fig. 8, the electronic device 800 includes: at least one processor 801 and a memory 802. Optionally, the electronic device may also include a network interface 803. Data can be exchanged between the processor 801, the memory 802 and the network interface 803; the network interface 803 is controlled by the processor 801 for transceiving messages; the memory 802 is used for storing a computer program comprising program instructions; and the processor 801 is configured to invoke the program instructions stored in the memory 802 to perform the methods described above.

The memory 802 may include volatile memory (volatile memory), such as random-access memory (RAM); the memory 802 may also include a non-volatile memory (non-volatile memory), such as a flash memory (flash memory), a solid-state drive (SSD), etc.; the memory 802 may also comprise a combination of the above-described types of memory.

The processor 801 may be a Central Processing Unit (CPU). In one embodiment, the processor 801 may also be a Graphics Processing Unit (GPU). The processor 801 may also be a combination of a CPU and a GPU.

In one embodiment, memory 802 is used to store program instructions. The processor 801 may call the program instructions to perform the following steps:

acquiring sample text data and translation text data of the sample text data; the sample text data comprises N text words, wherein N is a positive integer; the translated text data comprises a translated word corresponding to each text word in the N text words; the sample text data and the translated text data have different text types;

inputting sample text data into a first translation model, generating a first translation result corresponding to each text word in the first translation model, inputting the sample text data into a second translation model, and generating a second translation result corresponding to each text word in the second translation model;

acquiring important index parameters of translation words corresponding to each text word, and determining a target text word from the N text words according to the important index parameters of the translation words corresponding to each text word;

determining a model translation loss function according to a first translation result corresponding to the target text word, a second translation result corresponding to the target text word and a translation word corresponding to the target text word in the translation text data, and correcting a model parameter of the first translation model according to the model translation loss function to obtain a target translation model.

In one possible embodiment, the N text words include an ith text word, i being a positive integer less than or equal to N; the first translation result of the ith text word comprises a prediction probability of the ith text word for each word in the translation word stock; the translation words in the translation text data belong to the translation word stock. When the processor 801 is configured to obtain the important index parameter of the translation word corresponding to each text word, the processor is specifically configured to:

determining the prediction probability of a translation word corresponding to the ith text word in a first translation result of the ith text word as a target prediction probability;

and generating an important index parameter of the translation word corresponding to the ith text word according to the target prediction probability.

In one possible embodiment, the N text words include an ith text word, i being a positive integer less than or equal to N; when the processor 801 is configured to obtain the important index parameter of the translation word corresponding to each text word, the processor is specifically configured to:

acquiring at least two important evaluation index values of the ith text word;

and aggregating the at least two important evaluation index values to obtain the important index parameter of the translation word corresponding to the ith text word.

In a possible implementation manner, when the processor 801 is configured to aggregate the at least two important evaluation index values to obtain the important index parameter of the translation word corresponding to the ith text word, the processor is specifically configured to:

obtaining an evaluation weight corresponding to each important evaluation index value in at least two important evaluation index values;

weighting each important evaluation index value according to the evaluation weight corresponding to each important evaluation index value respectively to obtain the weighted index value corresponding to each important evaluation index value respectively;

and determining the important index parameters of the translation words corresponding to the ith text word according to the weighted index values corresponding to each important evaluation index value.

In one possible implementation, the processor 801, when configured to determine the target text word from the N text words according to the importance index parameter of the translation word corresponding to each text word, is specifically configured to:

sequencing the N text words according to the descending order of the important index parameters of the translation words corresponding to each text word respectively to obtain N sequenced text words;

obtaining a sample selection number K, and determining the first K text words in the sequenced N text words as target text words; k is a non-negative integer.

In one possible implementation, the processor 801, when configured to obtain sample text data, is specifically configured to:

acquiring Z text words of the first translation model in the (j-1)-th model training process; j is a positive integer greater than 1, Z is equal to N;

adding Z text words to a first-in first-out queue;

in the j-th model training process of the first translation model, acquiring newly added sample text data, and adding the text words contained in the newly added sample text data to the first-in first-out queue containing the Z text words to obtain a target queue;

sample text data is determined from the text words in the target queue.

In one possible implementation, the processor 801 is further configured to:

acquiring text data to be translated; the text data to be translated and the sample text data have the same text type;

inputting the text data to be translated into a target translation model, and outputting target translation text data of the text data to be translated based on the target translation model; the target translation text data has the same text type as the translation text data.

In specific implementation, the apparatus, the processor 801, the memory 802 and the like described in the embodiments of the present application may perform the implementations described in the above method embodiments, which are not described herein again.

Also provided in the embodiments of the present application is a computer-readable storage medium storing a computer program comprising program instructions that, when executed by a processor, perform some or all of the steps performed in the above method embodiments. Optionally, the computer storage medium may be volatile or non-volatile.

Reference herein to "a plurality" means two or more. "and/or" describes the association relationship of the associated objects, meaning that there may be three relationships, e.g., a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.

It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments may be implemented by a computer program. The program may be stored in a computer storage medium, which may be a computer-readable storage medium, and when the program is executed, it may include the processes of the above method embodiments. The computer storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.

While the present disclosure has been described with reference to particular embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present disclosure.
