Information processing method and device and electronic equipment

Document No.: 191273    Publication date: 2021-11-02

Reading note: this technology, "Information processing method and device and electronic equipment", was designed and created by Cao Jun, Jiang Qingnan, Zhao Chengqi, Wang Mingxuan, Li Lei and Wang Xiaohui on 2021-07-29. Its main content is as follows. The embodiments of the disclosure disclose an information processing method, an information processing device and electronic equipment. The method includes: acquiring a first hidden state vector obtained by inputting information to be translated, expressed in a source language, into a pre-trained first translation model, together with a first probability distribution over the words in a preset vocabulary predicted from the first hidden state vector; acquiring, from a vector index library of the target language, at least one target index entry that satisfies a preset condition with the first hidden state vector, the target index entry including a second hidden state vector; determining a second probability distribution for the second hidden state vector; fusing the first probability distribution and the second probability distribution to obtain a fused probability distribution; and determining a translation result using the fused probability distribution. The method builds a data index in real time and, based on nearest-neighbor retrieval, intervenes in the decoding process of a neural machine translation model, which can improve the in-domain performance of the machine translation model.

1. An information processing method comprising:

acquiring a first hidden state vector obtained by inputting information to be translated, expressed in a source language, into a pre-trained first translation model, and a first probability distribution over the words in a preset vocabulary predicted from the first hidden state vector;

acquiring, from a vector index library of a target language, at least one target index entry that satisfies a preset condition with the first hidden state vector, wherein the target index entry comprises a second hidden state vector; and determining a second probability distribution, over the words in the preset vocabulary, predicted from the second hidden state vector;

fusing the first probability distribution and the second probability distribution to obtain a fused probability distribution;

and returning the fused probability distribution to the first translation model, so that the first translation model determines a translation result according to the fused probability distribution.

2. The method of claim 1, wherein said fusing the first probability distribution and the second probability distribution to obtain a fused probability distribution comprises:

determining a first fusion proportion and a second fusion proportion corresponding to the first probability distribution and the second probability distribution respectively by using a pre-trained fusion proportion determination model;

and fusing the first probability distribution and the second probability distribution according to the first fusion proportion and the second fusion proportion to obtain the fused probability distribution.

3. The method of claim 1, wherein said fusing the first probability distribution and the second probability distribution to obtain a fused probability distribution comprises:

and taking the sum of the product of the first probability distribution and the first fusion proportion and the product of the second probability distribution and the second fusion proportion as the fused probability distribution.

4. The method of claim 2, wherein the second fusion ratio corresponding to the second probability distribution is determined by the following formula:

wherein

q_t is the first hidden state vector; k_i is the i-th second hidden state vector; 1 ≤ i ≤ k, where k is the number of target index entries satisfying the preset condition; and

K(q_t, k_i; σ) is a kernel function with σ as a parameter.

5. The method of claim 1, wherein the vector index library is built based on the steps of:

inputting a preset parallel corpus into a pre-trained second translation model, and decoding by the second translation model to obtain reference hidden state vectors corresponding to a plurality of morphemes of the target language in the preset parallel corpus, wherein the preset parallel corpus comprises a source-language preset corpus and a target-language preset corpus with the same semantics;

building the vector index library based on a plurality of the reference hidden state vectors; wherein

the second translation model is obtained by training a translation model of the same structure as the first translation model using the same training scheme.

6. The method of claim 1, wherein the acquiring of the first hidden state vector obtained by inputting the information to be translated, expressed in the source language, into the pre-trained first translation model, and of the first probability distribution over the words in the preset vocabulary predicted from the first hidden state vector, comprises:

and acquiring the first hidden state vector and the first probability distribution by utilizing a first preset remote calling interface.

7. The method according to claim 1, wherein the obtaining, from a vector index library of a target language, at least one target index entry satisfying a preset condition with the first hidden-state vector comprises:

and acquiring at least one target index item meeting a preset condition with the first hidden state vector from a vector index library of a target language by utilizing a second preset remote calling interface.

8. An information processing model, comprising: a first translation model, a second translation model, an index building module and a fusion proportion determination model, wherein,

the first translation model is configured to: convert input information to be translated, expressed in a source language, into a first hidden state vector, and predict from the first hidden state vector a first probability distribution over each morpheme in a preset vocabulary of a target language; output the first hidden state vector and the first probability distribution to the fusion proportion determination model through a first preset remote calling interface; and receive the fused probability distribution output by the fusion proportion determination model, and determine a translation result corresponding to the information to be translated according to the fused probability distribution;

the second translation model is configured to: decode an input preset corpus to obtain reference hidden state vectors corresponding to a plurality of preset morphemes of the preset corpus, and send the reference hidden state vectors to the index building module;

the index building module is configured to: establish a vector index library of the target language based on the reference hidden state vectors; acquire, from the vector index library, at least one target index entry that satisfies a preset condition with the first hidden state vector, wherein the target index entry comprises a second hidden state vector; and output the second hidden state vector to the fusion proportion determination model through a second preset remote calling interface;

the fusion proportion determination model is configured to: determine a second probability distribution, over the words in the preset vocabulary, predicted from the second hidden state vector; determine the respective fusion proportions of the first probability distribution and the second probability distribution; and fuse the first probability distribution and the second probability distribution according to the fusion proportions to obtain the fused probability distribution.

9. An information processing apparatus comprising:

a first obtaining unit, configured to obtain a first hidden state vector obtained by inputting information to be translated, expressed in a source language, into a pre-trained first translation model, and a first probability distribution over each word in a preset vocabulary predicted from the first hidden state vector;

a second obtaining unit, configured to obtain, from a vector index library of a target language, at least one target index entry that satisfies a preset condition with the first hidden state vector, where the target index entry includes a second hidden state vector, and to determine a second probability distribution, over the words in the preset vocabulary, predicted from the second hidden state vector;

a fusion unit, configured to fuse the first probability distribution and the second probability distribution to obtain a fused probability distribution;

and a translation unit, configured to return the fused probability distribution to the first translation model, so that the first translation model determines a translation result according to the fused probability distribution.

10. An electronic device, comprising:

one or more processors;

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.

11. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-7.

Technical Field

The present disclosure relates to the field of artificial intelligence technologies, and in particular, to an information processing method and apparatus, and an electronic device.

Background

Neural Machine Translation (NMT) has risen rapidly in recent years. Compared with statistical machine translation, a neural translation model is relatively simple and mainly comprises two parts, an encoder and a decoder. The encoder represents the source-language input as a high-dimensional vector after a series of neural network transformations, and the decoder is responsible for decoding (translating) this high-dimensional vector into the target language.

With the development of deep learning techniques, NMT models have surpassed statistical-based approaches in most languages with the help of massive parallel corpora.

Disclosure of Invention

This disclosure is provided to introduce concepts in a simplified form that are further described below in the detailed description. This disclosure is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

The embodiment of the disclosure provides an information processing method and device and electronic equipment.

In a first aspect, an embodiment of the present disclosure provides an information processing method, including: inputting information to be translated, expressed in a source language, into a pre-trained first translation model, and acquiring a first hidden state vector generated by the first translation model according to the information to be translated, together with a first probability distribution over each morpheme in a preset vocabulary of the target language predicted from the first hidden state vector; acquiring, from a vector index library of the target language, at least one target index entry that satisfies a preset condition with the first hidden state vector, wherein the target index entry comprises a second hidden state vector; determining a second probability distribution, over the words in the preset vocabulary, predicted from the second hidden state vector; fusing the first probability distribution and the second probability distribution to obtain a fused probability distribution; and returning the fused probability distribution to the first translation model, so that the first translation model determines a translation result according to the fused probability distribution.

In a second aspect, an embodiment of the present disclosure provides an information processing model, including a first translation model, a second translation model, an index building module and a fusion proportion determination model, wherein the first translation model is configured to: convert input information to be translated, expressed in a source language, into a first hidden state vector, and predict from the first hidden state vector a first probability distribution over each word in a preset vocabulary; output the first hidden state vector and the first probability distribution through a first preset remote calling interface; and receive the fused probability distribution output by the fusion proportion determination model, and determine a translation result corresponding to the information to be translated according to the fused probability distribution; the second translation model is configured to: decode an input preset corpus to obtain reference hidden state vectors corresponding to a plurality of preset morphemes of the preset corpus, and send the reference hidden state vectors to the index building module; the index building module is configured to: establish the vector index library based on the reference hidden state vectors; and the fusion proportion determination model is configured to: fuse the first probability distribution and the second probability distribution to obtain the fused probability distribution.

In a third aspect, an embodiment of the present disclosure provides an information processing apparatus, including: a first obtaining unit, configured to input information to be translated, expressed in a source language, into a pre-trained first translation model, and to obtain a first hidden state vector generated by the first translation model according to the information to be translated, together with a first probability distribution over each word in a preset vocabulary predicted from the first hidden state vector; a second obtaining unit, configured to obtain, from a vector index library of a target language, at least one target index entry that satisfies a preset condition with the first hidden state vector, where the target index entry includes a second hidden state vector, and to determine a second probability distribution, over the words in the preset vocabulary, predicted from the second hidden state vector; a fusion unit, configured to fuse the first probability distribution and the second probability distribution to obtain a fused probability distribution; and a translation unit, configured to return the fused probability distribution to the first translation model, so that the first translation model determines a translation result according to the fused probability distribution.

In a fourth aspect, an embodiment of the present disclosure provides an electronic device, including: one or more processors; a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the information processing method according to the first aspect.

In a fifth aspect, the disclosed embodiments provide a computer readable medium, on which a computer program is stored, which when executed by a processor implements the information processing method according to the first aspect.

According to the information processing method, the information processing device and the electronic equipment provided by the embodiments of the present disclosure, a first hidden state vector obtained by inputting information to be translated, expressed in a source language, into a pre-trained first translation model is acquired, together with a first probability distribution over each word in a preset vocabulary predicted from the first hidden state vector; at least one target index entry that satisfies a preset condition with the first hidden state vector is acquired from a vector index library of the target language, the target index entry including a second hidden state vector; a second probability distribution, over the words in the preset vocabulary, predicted from the second hidden state vector is determined; the first probability distribution and the second probability distribution are fused to obtain a fused probability distribution; and the fused probability distribution is returned to the first translation model, so that the first translation model determines a translation result according to the fused probability distribution. In this way, the decoding process of the neural machine translation model is intervened in, based on nearest-neighbor retrieval over a data index built for the field to be applied. A trained machine translation model can therefore be applied to a specific field without retraining or adjusting its parameters and still obtain relatively accurate translation results, which improves the in-domain performance of the machine translation model. Without adjusting the parameters of the machine translation model, its real-time performance and generalization are also improved.

Drawings

The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale.

FIG. 1 is a flow diagram of one embodiment of an information processing method according to the present disclosure;

FIG. 2 is a flow diagram of another embodiment of an information processing method according to the present disclosure;

FIG. 3 is a schematic block diagram of one embodiment of an information handling model according to the present disclosure;

FIG. 4 is a schematic diagram comparing translation results before and after using the information processing model shown in FIG. 3;

FIG. 5 is a schematic structural diagram of one embodiment of an information processing apparatus according to the present disclosure;

FIG. 6 is an exemplary system architecture to which the information processing method and the information processing apparatus of one embodiment of the present disclosure may be applied;

FIG. 7 is a schematic diagram of a basic structure of an electronic device provided according to an embodiment of the present disclosure.

Detailed Description

Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.

It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.

The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.

It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.

It is noted that references to "a", "an", and "the" modifications in this disclosure are intended to be illustrative rather than limiting, and that those skilled in the art will recognize that "one or more" may be used unless the context clearly dictates otherwise.

The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.

Referring to fig. 1, a flow of one embodiment of an information processing method according to the present disclosure is shown. As shown in fig. 1, the information processing method includes the steps of:

Step 101, acquiring a first hidden state vector obtained by inputting information to be translated, expressed in a source language, into a pre-trained first translation model, and a first probability distribution over each word in a preset vocabulary predicted from the first hidden state vector.

The first translation model here may be any machine learning model, for example a neural machine translation model.

The first translation model may be a pre-trained model. The training for the first translation model may be supervised training and is not described in further detail herein.

The source language may be any language such as English, Chinese, French, etc. The target language may be any other language than the source language.

The information to be translated may include a word, a sentence group, etc.

After the information to be translated is input into the first translation model, the first translation model may encode the information to be translated in the source language to obtain an encoding vector. The encoding vector is then transformed to obtain a first hidden state vector corresponding to the target language. After the first hidden state vector is obtained, it may be mapped to the words in a preset vocabulary: for each word, the first translation model may calculate the probability of mapping the first hidden state vector to that word, thereby obtaining the first probability distribution.
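The mapping from the first hidden state vector to the first probability distribution can be illustrated with a minimal Python sketch; the dimensions and the parameter names W_out and b_out are illustrative assumptions, not names used elsewhere in this disclosure:

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over the last axis
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
d_model, vocab_size = 512, 32000          # assumed sizes of the hidden state and the preset vocabulary

# q_t: first hidden state vector produced by the decoder for the current target position
q_t = rng.normal(size=d_model).astype(np.float32)

# W_out, b_out: output projection of the first translation model (illustrative names)
W_out = rng.normal(scale=0.02, size=(vocab_size, d_model)).astype(np.float32)
b_out = np.zeros(vocab_size, dtype=np.float32)

# first probability distribution p1 over every word in the preset vocabulary
p1 = softmax(W_out @ q_t + b_out)
print(p1.shape, float(p1.sum()))          # (32000,) and a value close to 1.0
```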

The preset vocabulary of the target language may be a general vocabulary or a domain-specific vocabulary, and can be selected according to the specific application scenario.

If the input information to be translated comprises a plurality of words, the encodings corresponding to the words can be numbered. For example, the three words "I", "love" and "hometown" each receive a numbered encoding, which can be denoted h_j with j = 1, 2, 3, respectively.

As an implementation manner, the first hidden state vector and the first probability distribution may be obtained from the first translation model by using a pre-established first preset Remote Procedure Call (RPC) interface.

The remote calling interface is established in advance based on a preset calling protocol. Through the RPC interface, a first hidden state vector and a first probability distribution of the current information to be translated generated in the first translation model can be acquired at any time.

Step 102, acquiring at least one target index entry that satisfies a preset condition with the first hidden state vector from a vector index library of a target language, wherein the target index entry comprises a second hidden state vector; and determining a second probability distribution, over the words in the preset vocabulary, predicted from the second hidden state vector.

The vector index library of the target language may be pre-established and may include a plurality of reference hidden state vectors. Each reference hidden state vector may correspond to a target-language morpheme in a preset vocabulary. The preset vocabulary may be a vocabulary corresponding to the target language and may include a plurality of target-language morphemes. The target-language morphemes may be words, sentences, or the like. Each morpheme in the preset vocabulary may correspond to a label, and the labels of different morphemes may be different.

The vector index library may store each reference hidden state vector in association with a corresponding label. The label corresponding to a reference hidden state vector may be the same as the label, in the preset vocabulary, of the target-language morpheme to which that reference hidden state vector corresponds.

The vector index library can be established based on the following steps:

First, a preset parallel corpus is input into a pre-trained second translation model, and the second translation model decodes it to obtain reference hidden state vectors corresponding to a plurality of morphemes of the target language in the preset parallel corpus, wherein the preset parallel corpus comprises a source-language preset corpus and a target-language preset corpus with the same semantics.

The second translation model here may be a model of the same structure as the first translation model. The second translation model may be obtained using the same training data and the same training method as those of the first translation model.

The preset parallel corpus may include a first preset corpus in the source language and a second preset corpus in the target language, the second preset corpus having the same semantics as the first preset corpus.

In addition, the preset parallel corpus may be a user-customized parallel corpus.

The first preset corpus and the second preset corpus in the preset parallel corpus each comprise a plurality of morphemes, where a morpheme may be a character, a word, a sentence, and the like. The reference hidden state vector corresponding to each morpheme can be obtained through forced decoding.

By inputting the above-described preset parallel corpus into the second translation model, the second translation model can determine the correspondence between the morphemes in the source language and the morphemes in the target language. Each morpheme in the target language may correspond to a reference hidden state vector. In addition, the label of a morpheme of the target language may be the same as the label of the same morpheme in the preset vocabulary of the target language.
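The collection of reference hidden state vectors and their labels by force-decoding the preset parallel corpus can be sketched as follows; second_model.encode, second_model.decode_step and the vocab mapping are hypothetical helpers standing in for the second translation model, not interfaces defined by this disclosure:

```python
import numpy as np

def build_datastore(second_model, parallel_corpus, vocab):
    """Force-decode each (source, target) pair and collect one
    (reference hidden state vector, target-morpheme label) entry per target position.
    `second_model` and its methods are hypothetical stand-ins for the pre-trained
    second translation model; `vocab` maps target morphemes to integer labels."""
    keys, labels = [], []
    for src_sentence, tgt_sentence in parallel_corpus:
        memory = second_model.encode(src_sentence)                 # assumed encoder call
        prev_tokens = ["<bos>"]
        for tgt_morpheme in tgt_sentence:
            # teacher forcing: feed the ground-truth prefix, keep the decoder hidden state
            hidden = second_model.decode_step(memory, prev_tokens)  # assumed decoder call
            keys.append(np.asarray(hidden, dtype=np.float32))
            labels.append(vocab[tgt_morpheme])
            prev_tokens.append(tgt_morpheme)
    return np.stack(keys), np.asarray(labels, dtype=np.int64)
```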

Second, the vector index library is established based on the reference hidden state vectors.

The first hidden state vector may be matched with a plurality of reference hidden state vectors, and at least one second hidden state vector may be determined according to a matching result.

Specifically, the distance between the first hidden state vector and the plurality of reference hidden state vectors may be calculated, and at least one reference hidden state vector whose distance satisfies a preset condition is determined as the at least one second hidden state vector. In some application scenarios, the preset condition may be that the distance is smaller than a preset distance threshold. In some other application scenarios, the preset condition may be selecting the first k reference hidden state vectors with the smallest distances to the first hidden state vector, where k is an integer greater than or equal to 1 and less than the number of reference hidden state vectors.

After determining the at least one second hidden state vector, the at least one target index entry may be further determined. The target index entry may include a second hidden state vector, a tag corresponding to the second hidden state vector, and a distance between the second hidden state vector and the first hidden state vector.
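A brute-force sketch of this retrieval step is given below; it illustrates both forms of the preset condition (top-k by distance and a distance threshold), and in practice an approximate nearest-neighbor library could replace the exhaustive search:

```python
import numpy as np

def retrieve_target_index_entries(q_t, keys, labels, k=8, max_dist=None):
    """Return up to k target index entries for the first hidden state vector q_t.
    Each entry contains a second hidden state vector, its label, and its squared
    Euclidean distance to q_t. keys/labels come from the vector index library."""
    dists = np.sum((keys - q_t) ** 2, axis=1)         # squared Euclidean distances
    order = np.argsort(dists)[:k]                     # top-k nearest (one possible preset condition)
    if max_dist is not None:                          # optional distance-threshold condition
        order = order[dists[order] < max_dist]
    return [{"vector": keys[i], "label": int(labels[i]), "distance": float(dists[i])}
            for i in order]
```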

A second probability distribution, over each morpheme in the preset vocabulary, mapped from the second hidden state vectors can then be determined.

When determining the second probability distribution, normalized weights of the plurality of target index entries may be calculated according to the similarity between the first hidden state vector and the second hidden state vector of each target index entry. The normalized weights can be understood as a probability distribution over the target index entries. The probabilities of target index entries that share the same morpheme are then combined to obtain the probabilities, in the preset vocabulary, of the morphemes contained in the target index entries, while the probability of any word in the preset vocabulary that does not appear in the target index entries is set to 0. The probability distribution on the preset vocabulary thus obtained is the second probability distribution.

Specifically, the second probability distribution may be determined according to the following formula:

p_2(y_t = v) = Σ_{i=1..r} 1[v_i = v] × K(q_t, k_i; σ) / Σ_{j=1..r} K(q_t, k_j; σ)    (1)

wherein

q_t is the first hidden state vector corresponding to the t-th morpheme to be translated in the source language; r is the number of second hidden state vectors, determined from the vector index library, that satisfy the preset condition with the first hidden state vector; k_i is the i-th of the r second hidden state vectors, and v_i is the label corresponding to k_i; K(q_t, k_i; σ) is a kernel function with σ as a parameter; 1[v_i = v] equals 1 when v_i is the label v and 0 otherwise, so that the numerator is the sum of the kernel function values of the second hidden state vectors corresponding to the same label v.

p_2(y_t = v) is the probability, on the preset vocabulary, of the word v for the t-th morpheme to be translated.

The kernel function K(q_t, k_i; σ) is a Gaussian kernel:

K(q_t, k_i; σ) = exp( -||q_t - k_i||² / (2σ²) )    (2)

wherein ||q_t - k_i||² is the squared Euclidean distance between q_t and k_i.

The bandwidth parameter σ may be represented by an exponential activation function of the first hidden state vector q_t and the mean k_avg of the r second hidden state vectors that satisfy the preset condition, for example

σ = exp( W_1 [q_t; k_avg] + b_1 )    (3)

wherein [q_t; k_avg] denotes the concatenation of q_t and k_avg, and W_1 and b_1 are trainable parameters.

This yields the second probability distribution of the second hidden state vectors over the preset vocabulary. It should be noted that, for morphemes in the vocabulary (and their corresponding labels) that are not referred to by any key-value pair retrieved from the index library, the probability in the second probability distribution is 0.
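The computation of the second probability distribution from the retrieved entries can be sketched as follows, using the Gaussian kernel of equation (2) and the bandwidth of equation (3); the concatenation-based bandwidth network is a reconstruction, and W1 and b1 are assumed to have matching shapes:

```python
import numpy as np

def second_probability_distribution(q_t, retrieved, vocab_size, W1, b1):
    """Kernel-smoothed second probability distribution over the preset vocabulary.
    `retrieved` is the list of target index entries (dicts with "vector" and "label");
    W1 (shape 2*d) and b1 (scalar) parameterize the assumed bandwidth network
    sigma = exp(W1 @ [q_t; k_avg] + b1)."""
    keys = np.stack([e["vector"] for e in retrieved])            # r second hidden state vectors
    labels = np.array([e["label"] for e in retrieved])
    k_avg = keys.mean(axis=0)
    sigma = np.exp(W1 @ np.concatenate([q_t, k_avg]) + b1)       # exponential activation, equation (3)
    sq_dists = np.sum((keys - q_t) ** 2, axis=1)
    kernel = np.exp(-sq_dists / (2.0 * sigma ** 2))              # Gaussian kernel, equation (2)
    weights = kernel / kernel.sum()                              # normalized weights of the index entries
    p2 = np.zeros(vocab_size, dtype=np.float64)
    np.add.at(p2, labels, weights)                               # merge entries that share a label
    return p2                                                    # words absent from the entries keep probability 0
```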

In some optional implementations, the obtaining, from the vector index library of the target language, of at least one target index entry that satisfies a preset condition with the first hidden state vector may be performed by sending the first hidden state vector to the vector index library through a second preset remote calling interface; the vector index library then determines the at least one target index entry among its multiple reference hidden state vectors.

After determining at least one target index item, the vector index library may return the target index item through the second preset remote invocation interface.

Retrieval can thus be performed in the vector index library at any time through the second preset remote calling interface, and the retrieval result is obtained in real time.

Step 103, fusing the first probability distribution and the second probability distribution to obtain a fused probability distribution.

The fusion proportions corresponding respectively to the first probability distribution and the second probability distribution can be determined according to a preset method, and the first probability distribution and the second probability distribution are fused according to these proportions to obtain the fused probability distribution. Specifically, the fused probability distribution may be the sum of the product of the first probability distribution and the first fusion proportion and the product of the second probability distribution and the second fusion proportion.

The fusion probability distribution can be represented by, for example, the following formula:

p(y_t) = λ × p_2(y_t) + (1 - λ) × p_1(y_t)    (5)

wherein p_1(y_t) is the first probability distribution, p_2(y_t) is the second probability distribution, and λ is the second fusion proportion.

It is understood that the fused probability distribution includes a probability for each morpheme in the preset vocabulary, that is, the probability that the morpheme to be translated is mapped to each morpheme in the preset vocabulary under the influence of the index entries given by the index library.

Step 104, returning the fused probability distribution to the first translation model, so that the first translation model determines a translation result according to the fused probability distribution.

The morpheme of the target language corresponding to the label with the maximum probability value in the fused probability distribution can be used as the translation result.
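Formula (5) and the final selection of the translation result can be sketched as follows; id_to_morpheme is an assumed mapping from labels of the preset vocabulary back to target-language morphemes:

```python
import numpy as np

def fuse_and_pick(p1, p2, lam, id_to_morpheme):
    """Fuse the two distributions as in formula (5) and pick the target-language
    morpheme with the largest fused probability. `id_to_morpheme` maps labels of
    the preset vocabulary back to target-language morphemes (assumed helper)."""
    fused = lam * p2 + (1.0 - lam) * p1
    best = int(np.argmax(fused))
    return id_to_morpheme[best], fused
```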

In the information processing method provided by this embodiment, a first hidden state vector obtained by inputting information to be translated, expressed in a source language, into a pre-trained first translation model is acquired, together with a first probability distribution over each word in a preset vocabulary predicted from the first hidden state vector; at least one target index entry that satisfies a preset condition with the first hidden state vector is acquired from a vector index library of the target language, the target index entry including a second hidden state vector; a second probability distribution, over the words in the preset vocabulary, predicted from the second hidden state vector is determined; the first probability distribution and the second probability distribution are fused to obtain a fused probability distribution; and the fused probability distribution is returned to the first translation model, so that the first translation model determines a translation result according to the fused probability distribution. The decoding process of the neural machine translation model is thus intervened in, based on nearest-neighbor retrieval over a data index built for the field to be applied, so that a trained machine translation model can be applied to a specific field without retraining or adjusting its parameters and still obtain relatively accurate translation results.

In the related art, when a trained translation model is applied to a new field, the model parameters generally need to be retrained and adjusted using a parallel corpus of that field; a translation model trained on a general corpus therefore cannot be directly applied to a specific field, and its in-domain performance is poor. According to the scheme provided by this embodiment, the decoding process of the neural machine translation model is intervened in using a data index of the field to be applied, based on nearest-neighbor retrieval, so that when the trained machine translation model is applied to a specific field, the model parameters do not need to be trained or adjusted again, and a relatively accurate translation result can still be obtained. The in-domain performance of the translation model can thereby be improved.

In addition, in the related art, the parallel corpus of the field to be applied can be stored in advance as whole-sentence key-value pairs; when the translation model is applied to that field, it queries the stored key-value pairs during translation, which gives high accuracy. However, this scheme does not return a corresponding translation unless the user enters the complete original text. When the information to be translated does not appear in the pre-stored key-value pairs, accurate translation cannot be achieved, so the scheme lacks generalization. In the present scheme, the translation result is determined from the fusion of different probability distributions for the same information to be translated, which improves the generalization of the translation model compared with translating only according to stored key-value pairs.

Referring to fig. 2, a flow diagram of another embodiment of an information processing method according to the present disclosure is shown. As shown in fig. 2, the method comprises the steps of:

Step 201, acquiring a first hidden state vector obtained by inputting information to be translated, expressed in a source language, into a pre-trained first translation model, and a first probability distribution over each word in a preset vocabulary predicted from the first hidden state vector.

Step 202, acquiring at least one target index entry that satisfies a preset condition with the first hidden state vector from a vector index library of a target language, wherein the target index entry comprises a second hidden state vector; and determining a second probability distribution, over the words in the preset vocabulary, predicted from the second hidden state vector.

For specific implementation of step 201 to step 202, reference may be made to step 101 and step 102 in the embodiment shown in fig. 1, which is not described herein again.

Step 203, determining the fusion proportions corresponding respectively to the first probability distribution and the second probability distribution by using a pre-trained fusion proportion determination model.

The fusion proportion determination model may include a multi-layer perceptron.

The fusion proportion determination model may first determine the second fusion proportion λ corresponding to the second probability distribution. The second fusion proportion may be expressed as a function of the kernel similarities between the first hidden state vector and the retrieved second hidden state vectors, wherein

q_t is the first hidden state vector corresponding to the t-th morpheme to be translated in the source language; r is the number of second hidden state vectors, determined from the vector index library, that satisfy the preset condition with the first hidden state vector; k_i is the i-th of the r second hidden state vectors; K(q_t, k_i; σ) is a kernel function with σ as a parameter; and W_2, b_2, W_3 and b_3 are trainable parameters.

K(q_t, k_i; σ) may be the Gaussian kernel function of equation (2), which is not repeated here.
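Since the exact formula is not reproduced here, the following sketch shows only one plausible parameterization of the second fusion proportion, assumed to be a two-layer perceptron over the first hidden state vector and a kernel-weighted summary of the retrieved second hidden state vectors; the roles of W_2, b_2, W_3 and b_3 follow the definitions above, but the concrete form is an assumption:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def second_fusion_proportion(q_t, keys, sigma, W2, b2, W3, b3):
    """Assumed two-layer parameterization of the second fusion proportion lambda.
    keys: the r second hidden state vectors retrieved from the index library;
    W2 (h, 2*d), b2 (h,), W3 (h,), b3 (scalar) are trainable parameters."""
    sq_dists = np.sum((keys - q_t) ** 2, axis=1)
    kernel = np.exp(-sq_dists / (2.0 * sigma ** 2))   # Gaussian kernel values, equation (2)
    weights = kernel / kernel.sum()
    k_weighted = weights @ keys                       # kernel-weighted summary of the retrieved vectors
    hidden = np.tanh(W2 @ np.concatenate([q_t, k_weighted]) + b2)
    lam = sigmoid(W3 @ hidden + b3)                   # scalar in (0, 1)
    return float(lam)
```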

The two neural networks that estimate the bandwidth parameter σ and the fusion weight coefficient λ require additional training. During training, the label y_t at step t is first converted into a one-hot probability distribution over the preset vocabulary, and label smoothing is applied to it to obtain the smoothed label distribution p_ls(v | y_t) = (1 - ε) × 1[v = y_t] + ε / V, where V is the size of the preset vocabulary of the target language and ε is the label smoothing factor.

The loss function for a single label is the cross entropy between the fused probability distribution p(y_t) and the smoothed label distribution p_ls(v | y_t).

The loss function of a single translation sample is the sum of the loss functions of all tokens at the target end.

During training, the translation samples corresponding to a plurality of target-language labels are packed into batches, and the loss function of each batch is the sum of the loss functions of all sentences in the batch. The gradient of the loss function with respect to the parameters of the probability distribution fusion module is calculated using the back-propagation algorithm, and the parameters of the model are updated using an Adam optimizer. A converged model is obtained after a preset number of iterations.
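A PyTorch-style sketch of one training step is given below; fusion_module, the batch layout and the smoothing factor eps are assumptions, and the translation models themselves are kept frozen:

```python
import torch

def label_smoothed_targets(gold_ids, vocab_size, eps=0.1):
    # one-hot targets smoothed as p_ls(v | y_t) = (1 - eps) * 1[v = y_t] + eps / V (eps is assumed)
    smooth = torch.full((gold_ids.size(0), vocab_size), eps / vocab_size)
    smooth.scatter_(1, gold_ids.unsqueeze(1), (1.0 - eps) + eps / vocab_size)
    return smooth

def train_step(fusion_module, batch, optimizer, vocab_size):
    """One batch update for the probability-fusion parameters (the sigma and lambda networks).
    `fusion_module` is a hypothetical module returning the fused distribution p(y_t);
    `batch` is assumed to hold the queries, retrieved entries, p1 and gold labels."""
    optimizer.zero_grad()
    fused = fusion_module(batch["q_t"], batch["retrieved"], batch["p1"])   # (num_tokens, V)
    targets = label_smoothed_targets(batch["gold_ids"], vocab_size)
    # cross entropy between the fused distribution and the smoothed label distribution,
    # summed over all target-side tokens in the batch
    loss = -(targets * torch.log(fused.clamp_min(1e-9))).sum()
    loss.backward()
    optimizer.step()                                   # e.g. torch.optim.Adam over fusion_module parameters
    return loss.item()
```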

After the second fusion proportion λ is obtained, the first fusion proportion can be determined as 1 - λ.

Step 204, fusing the first probability distribution and the second probability distribution according to the first fusion proportion and the second fusion proportion to obtain a fused probability distribution.

The first probability distribution and the second probability distribution may be fused with reference to the method of equation (5).

Step 205, returning the fusion probability distribution to the first translation model, so that the first translation model determines a translation result according to the fusion probability distribution.

Compared with the embodiment shown in FIG. 1, this embodiment highlights determining the fusion proportions corresponding respectively to the first probability distribution and the second probability distribution with the fusion proportion determination model, thereby realizing adaptive fusion proportions and improving the portability of the information processing method provided by this application.

Please refer to FIG. 3, which shows a schematic structural diagram of an information processing model provided by the present disclosure. As shown in FIG. 3, the information processing model includes a first translation model, a second translation model, an index building module, and a fusion proportion determination model.

The first translation model is configured to: convert input information to be translated, expressed in a source language, into a first hidden state vector, and predict from the first hidden state vector a first probability distribution over each word in a preset vocabulary; output the first hidden state vector and the first probability distribution through a first preset remote calling interface; and receive the fused probability distribution output by the fusion proportion determination model, and determine a translation result corresponding to the information to be translated according to the fused probability distribution;

the second translation model is configured to: decode an input preset corpus to obtain reference hidden state vectors corresponding to a plurality of preset morphemes of the preset corpus, and send the reference hidden state vectors to the index building module;

the index building module is configured to: establish the vector index library based on the reference hidden state vectors; acquire, from the vector index library of the target language, at least one target index entry that satisfies a preset condition with the first hidden state vector, wherein the target index entry comprises a second hidden state vector; and output the second hidden state vector to the fusion proportion determination model through a second preset remote calling interface;

the fusion proportion determination model is configured to: determine a second probability distribution, over the words in the preset vocabulary, predicted from the second hidden state vector; determine the respective fusion proportions of the first probability distribution and the second probability distribution; and fuse the first probability distribution and the second probability distribution according to the fusion proportions to obtain the fused probability distribution.
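The cooperation of the four parts can be summarized with the following skeleton; the class and method names are illustrative, and the two preset remote calling interfaces are replaced by direct function calls for brevity:

```python
class InformationProcessingPipeline:
    """Illustrative wiring of the four parts of the information processing model.
    The translation model, index module and fusion model are assumed to expose
    the methods used below; they are not interfaces defined by this disclosure."""

    def __init__(self, first_model, index_module, fusion_model):
        self.first_model = first_model
        self.index_module = index_module
        self.fusion_model = fusion_model

    def translate_step(self, source_text):
        # 1. first translation model: hidden state + first probability distribution
        q_t, p1 = self.first_model.hidden_state_and_distribution(source_text)
        # 2. index building module: nearest target index entries for q_t
        entries = self.index_module.retrieve(q_t, k=8)
        # 3. fusion proportion determination model: second distribution, proportions, fused distribution
        fused = self.fusion_model.fuse(q_t, entries, p1)
        # 4. back to the first translation model to pick the translation result
        return self.first_model.pick_morpheme(fused)
```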

Referring to FIG. 4, a schematic diagram is shown comparing translation results before and after using the information processing model of FIG. 3. As shown in FIG. 4, the NMT model may be the model used by both the first translation model and the second translation model. The KNN retrieval performs a nearest-neighbor search in the index library of FIG. 3.

By itself, the first translation model translates the input English information to be translated, namely "I'm a bad case", into a Chinese sentence meaning "I am a wrong case".

After the information processing model is used, a second hidden state vector that satisfies the preset condition with the first hidden state vector obtained by the first translation model is retrieved from the index library; this second hidden state vector influences the probability that the currently translated morpheme is mapped to each morpheme in the preset Chinese vocabulary, and the translation result changes accordingly.

The index library may hold a plurality of reference hidden state vectors and the labels corresponding to them, determined from the input parallel corpus. For example, the second translation model (an NMT model) may determine the reference hidden state vectors and the labels of the corresponding words in the preset vocabulary from the parallel corpus pair "I'm a good case" and the Chinese sentence meaning "I am a correct case", and the index building module may then build the index from these reference hidden state vectors and their labels.

When the information to be translated, namely "We're all bad cases", is input into the first translation model (the NMT model), the first translation model sends the first hidden state vector it generates to the index library through the index retrieval interface. The index library matches it against its reference hidden state vectors to obtain at least one second hidden state vector. The first probability distribution, over each morpheme in the preset vocabulary of the target language, predicted from the first hidden state vector, and the second probability distribution, over each word in the preset vocabulary, predicted from the second hidden state vector, are fused to obtain a fused probability distribution, and the translation result, a Chinese sentence meaning "all are correct cases", is determined according to the fused probability distribution.

With further reference to fig. 5, as an implementation of the methods shown in the above figures, the present disclosure provides some embodiments of an information processing apparatus, which correspond to the method embodiment shown in fig. 1, and which may be applied in various electronic devices in particular.

As shown in FIG. 5, the information processing apparatus of the present embodiment includes: a first obtaining unit 501, a second obtaining unit 502, a fusion unit 503, and a translation unit 504. The first obtaining unit 501 is configured to obtain a first hidden state vector obtained by inputting information to be translated, expressed in a source language, into a pre-trained first translation model, and a first probability distribution over each word in a preset vocabulary predicted from the first hidden state vector; the second obtaining unit 502 is configured to obtain, from a vector index library of a target language, at least one target index entry that satisfies a preset condition with the first hidden state vector, where the target index entry includes a second hidden state vector, and to determine a second probability distribution, over the words in the preset vocabulary, predicted from the second hidden state vector; the fusion unit 503 is configured to fuse the first probability distribution and the second probability distribution to obtain a fused probability distribution; and the translation unit 504 is configured to return the fused probability distribution to the first translation model, so that the first translation model determines a translation result according to the fused probability distribution.

In some optional implementations, the fusion unit 503 is further configured to: determine, by using a pre-trained fusion proportion determination model, a first fusion proportion and a second fusion proportion corresponding respectively to the first probability distribution and the second probability distribution; and fuse the first probability distribution and the second probability distribution according to the first fusion proportion and the second fusion proportion to obtain the fused probability distribution.

In some optional implementations, the fusion unit 503 is further configured to take the sum of the product of the first probability distribution and the first fusion proportion and the product of the second probability distribution and the second fusion proportion as the fused probability distribution.

In some optional implementations, the second fusion proportion corresponding to the second probability distribution is determined by the following formula:

wherein

q_t is the first hidden state vector; k_i is the i-th second hidden state vector; 1 ≤ i ≤ k, where k is the number of target index entries satisfying the preset condition; and

K(q_t, k_i; σ) is a kernel function with σ as a parameter.

In some optional implementations, the vector index library is built based on the following steps: inputting a preset parallel corpus into a pre-trained second translation model, and decoding by the second translation model to obtain reference hidden state vectors corresponding to a plurality of morphemes of the target language in the preset parallel corpus, wherein the preset parallel corpus comprises a source-language preset corpus and a target-language preset corpus with the same semantics; and building the vector index library based on a plurality of the reference hidden state vectors, wherein the second translation model is obtained by training a translation model of the same structure as the first translation model using the same training scheme.

In some optional implementations, the first obtaining unit 501 is further configured to acquire the first hidden state vector and the first probability distribution by using a first preset remote calling interface.

In some optional implementations, the second obtaining unit 502 is further configured to acquire, by using a second preset remote calling interface, at least one target index entry that satisfies a preset condition with the first hidden state vector from the vector index library of the target language.

Referring to FIG. 6, FIG. 6 illustrates an exemplary system architecture to which the information processing method and the information processing apparatus of an embodiment of the present disclosure may be applied.

As shown in FIG. 6, the system architecture may include terminal devices 601, 602, 603, a network 604, and a server 605. The network 604 serves to provide a medium for communication links between the terminal devices 601, 602, 603 and the server 605. Network 604 may include various types of connections, such as wired or wireless communication links, or fiber optic cables, to name a few.

The terminal devices 601, 602, 603 may interact with the server 605 via the network 604 to receive or send messages or the like. The terminal devices 601, 602, 603 may have various client applications installed thereon, such as a web browser application, a search-type application, and a news-information-type application. The client application in the terminal device 601, 602, 603 may receive the instruction of the user, and complete a corresponding function according to the instruction of the user, for example, send the information to be translated to the server 605 according to the instruction of the user.

The terminal devices 601, 602, 603 may be hardware or software. When the terminal devices 601, 602, 603 are hardware, they may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III, mpeg compression standard Audio Layer 3), MP4 players (Moving Picture Experts Group Audio Layer IV, mpeg compression standard Audio Layer 4), laptop portable computers, desktop computers, and the like. When the terminal device 601, 602, 603 is software, it can be installed in the electronic devices listed above. It may be implemented as multiple pieces of software or software modules (e.g., software or software modules used to provide distributed services) or as a single piece of software or software module. And is not particularly limited herein.

The server 605 may be a server that provides various services, and for example, analyzes information to be translated transmitted by the terminal apparatuses 601, 602, and 603 to obtain translation results, and transmits the translation results to the terminal apparatuses 601, 602, and 603.

It should be noted that the information processing method provided by the embodiment of the present disclosure may be executed by the server 605, and accordingly, the information processing apparatus may be provided in the server 605. In addition, the information processing method may be executed by the terminal devices 601, 602, 603, and accordingly, the information processing apparatus may be provided in the terminal devices 601, 602, 603.

It should be understood that the number of terminal devices, networks, and servers in fig. 6 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

Referring now to fig. 7, shown is a schematic diagram of an electronic device (e.g., the server or terminal device of fig. 6) suitable for use in implementing embodiments of the present disclosure. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a stationary terminal such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.

As shown in fig. 7, the electronic device may include a processing device (e.g., central processing unit, graphics processor, etc.) 701, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)702 or a program loaded from a storage device 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data necessary for the operation of the electronic apparatus 700 are also stored. The processing device 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.

Generally, the following devices may be connected to the I/O interface 705: input devices 706 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 707 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 708 including, for example, magnetic tape, hard disk, etc.; and a communication device 709. The communication device 709 may allow the electronic device to communicate wirelessly or by wire with other devices to exchange data. While fig. 7 illustrates an electronic device having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.

In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such embodiments, the computer program may be downloaded and installed from a network via the communication means 709, or may be installed from the storage means 708, or may be installed from the ROM 702. The computer program, when executed by the processing device 701, performs the above-described functions defined in the methods of the embodiments of the present disclosure.

It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.

In some embodiments, the client and the server may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication in any form or medium (e.g., a communications network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and a peer-to-peer network (e.g., an ad hoc peer-to-peer network), as well as any currently known or future developed network.

The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.

The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring a first hidden state vector obtained by inputting information to be translated expressed by using a source language into a pre-trained first translation model, and predicting the first hidden state vector into a first probability distribution of each word in a preset vocabulary; acquiring at least one target index item meeting a preset condition with the first hidden state vector from a vector index library of a target language, wherein the target index item comprises a second hidden state vector; determining a second probability distribution that the second latent state vector is predicted as a word in the preset vocabulary; fusing the first probability distribution and the second probability distribution to obtain a fused probability distribution; and returning the fusion probability distribution to the first translation model so as to determine a translation result according to the fusion probability distribution by the first translation model.
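As one illustration of these steps, the following Python sketch (using NumPy) retrieves the index entries whose second hidden state vectors are nearest to the first hidden state vector, converts them into a second probability distribution over the preset vocabulary, and interpolates the two distributions. The L2 distance as the preset condition, the temperature, the fixed fusion weight `lam`, and all function and parameter names are illustrative assumptions and do not prescribe the specific implementation of the disclosure.

```python
# A minimal sketch of the retrieval-and-fusion step, assuming a NumPy setup.
# The L2 metric, temperature, and fixed fusion weight are illustrative only.
import numpy as np

def knn_fused_distribution(first_hidden, first_probs,
                           index_vectors, index_token_ids,
                           vocab_size, k=8, temperature=10.0, lam=0.5):
    """Fuse the model's first probability distribution with a second
    distribution derived from nearest-neighbor entries of the index library."""
    # 1. Retrieve the k index entries whose second hidden-state vectors are
    #    closest to the first hidden-state vector (the "preset condition").
    dists = np.linalg.norm(index_vectors - first_hidden, axis=1)
    nearest = np.argsort(dists)[:k]

    # 2. Turn the retrieved entries into a second probability distribution
    #    over the preset vocabulary, weighting closer neighbors more heavily.
    weights = np.exp(-dists[nearest] / temperature)
    weights /= weights.sum()
    second_probs = np.zeros(vocab_size)
    for w, idx in zip(weights, nearest):
        second_probs[index_token_ids[idx]] += w

    # 3. Fuse the two distributions; lam plays the role of the fusion ratio.
    return lam * first_probs + (1.0 - lam) * second_probs
```

In practice, retrieval over the vector index library would typically be backed by an approximate nearest-neighbor index rather than the brute-force distance computation used here, and the fusion ratio need not be a fixed constant.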

Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages or a combination thereof, including object oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units described in the embodiments of the present disclosure may be implemented by software or by hardware. In some cases, the name of a unit does not constitute a limitation on the unit itself.

The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

The foregoing description is merely a description of the preferred embodiments of the present disclosure and of the technical principles employed. It will be appreciated by those skilled in the art that the scope of the disclosure is not limited to technical solutions formed by the particular combination of features described above, but also encompasses other technical solutions formed by any combination of the above features or their equivalents without departing from the concept of the disclosure. For example, a technical solution may be formed by replacing the above features with (but not limited to) features having similar functions disclosed in the present disclosure.

Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.
