Fully parallel text generation method based on normalizing flows

Document No.: 1556933    Publication date: 2020-01-21

Abstract: This technology, a fully parallel text generation method based on normalizing flows, was designed by Cai Xiang on 2019-10-12. To solve the low efficiency of existing text generation algorithms built on the sequence-to-sequence framework, the invention proposes a fully parallel text generation method based on normalizing flows, comprising a training process and an application process with the following steps: the standard answers are input into an encoder, which outputs intermediate hidden-layer information; this information is fed directly into both a normalizing-flow module and a decoder, and on receiving it the normalizing-flow module processes it directly to obtain the flow output result. The condition information is input into a condition-information module, which produces conditional hidden-layer information that is fed directly into the decoder; once the decoder has received both the conditional hidden-layer information and the intermediate hidden-layer information, it applies an attention mechanism and a nonlinear transformation to them to obtain the decoder output result.

1. A fully parallel text generation method based on normalizing flows, comprising a training process and an application process, and characterized by the following processing steps:

training process:

S1, input the standard answers into an encoder; after processing them, the encoder outputs intermediate hidden-layer information, which is fed directly into both a normalizing-flow module and a decoder; on receiving the intermediate hidden-layer information, the normalizing-flow module processes it directly to obtain the flow output result;

S2, input the condition information into a condition-information module; after processing by the condition-information module, conditional hidden-layer information is obtained and fed directly into the decoder; once the decoder has received both the conditional hidden-layer information and the intermediate hidden-layer information, it applies an attention mechanism and a nonlinear transformation to them to obtain the decoder output result;

S3, compute a loss between the normalizing-flow output result and Gaussian white noise, using the KL divergence as the loss function; compute a loss between the decoder output result and the standard answer, using the cross entropy as the loss function;

S4, optimize the two loss terms of step S3, the KL divergence and the cross entropy, by gradient descent; together they form the evidence lower bound (ELBO) of the variational autoencoder, which is back-propagated to the normalizing-flow module and the encoder, and the parameters of the neural network are updated after back-propagation;

S5, in the next training iteration, the encoder computes with the updated network parameters, generates adjusted intermediate hidden-layer information, and passes it to the normalizing-flow module and the decoder; the normalizing-flow module computes with its updated parameters to obtain a new flow output result; the decoder receives the conditional hidden-layer information input by the condition-information module together with the adjusted intermediate hidden-layer information and produces a decoder output result; steps S3-S4 are then repeated;

S6, step S5 is repeated until the back-propagated KL divergence and cross entropy of the ELBO of the whole neural network fall below a fixed threshold, at which point training ends.

The application process comprises the following steps:

S7, input Gaussian white noise into the trained normalizing-flow module to obtain the flow output information and pass it to the decoder; the decoder, combining it with the conditional hidden-layer information input by the condition-information module, applies an attention mechanism and a nonlinear transformation to the two to obtain the decoder output result.

2. The fully parallel text generation method based on normalizing flows of claim 1, wherein: the method is built as an extension of the variational autoencoder framework.

3. The fully parallel text generation method based on normalizing flows of claim 1, wherein: after the standard answers are input into the encoder, the processing steps inside the encoder are, in order, word-embedding processing, stacked multi-layer long short-term memory (LSTM) and/or convolutional layers, and a fully connected nonlinear transformation, after which the intermediate hidden-layer information is output.

4. The fully parallel text generation method based on normalizing flows of claim 1, wherein: the normalizing-flow module comprises a plurality of invertible transformations.

5. The fully parallel text generation method based on normalizing flows of claim 1, wherein: during training, after the intermediate hidden-layer information output by the encoder is input into the normalizing-flow module, it is processed in order by a masked autoregressive flow module, a 1×1 invertible convolution module, and an affine coupling layer module; the three modules are cycled through 8 times in sequence to obtain the flow output result.

6. The fully parallel text generation method based on normalizing flows of claim 1, wherein: the processing steps of the condition-information module are, in order, word-embedding processing, stacked multi-layer long short-term memory (LSTM) and/or convolutional layers, and a fully connected nonlinear transformation, after which the conditional hidden-layer information is obtained.

7. The fully parallel text generation method based on normalizing flows of claim 1, wherein: during inference, the length of the generated text is determined by the number of Gaussian white noise samples, i.e. the number of generated words equals the number of Gaussian white noise sample points.

8. The fully parallel text generation method based on normalizing flows of claim 1, wherein: the decoder output result is the ID sequence of the generated characters; each character has a unique ID in the vocabulary, and the corresponding character can be found by its ID.

Technical Field

The invention relates to the technical field of natural language processing, and in particular to a fully parallel text generation method based on normalizing flows, which can be applied to article title generation, automatic summarization, news generation, machine translation, question answering, and similar tasks.

Background

With the development of technology, information is exchanged ever more frequently and in ever more diverse forms. Demands on the speed of text generation algorithms keep rising so that they can be applied to machine translation, machine-generated news, and related fields, and the automatic generation of text by machines has become an inevitable trend of future technological development.

Existing text generation algorithms adopt the sequence-to-sequence (seq2seq) framework, which comprises an encoder and a decoder. The decoder is autoregressive: the preceding character or word must be fed back into the model to obtain the next character, and whole sentences and paragraphs are produced by repeating this step.

The normalizing flow is a new approach to density estimation: its principle is to transform one distribution into a target distribution through a sequence of invertible steps. Because the manually specified Gaussian noise is independent across positions, the method can break the autoregressive constraint on the decoder in the sequence-to-sequence (seq2seq) framework and allow the character sequence to be generated in parallel.
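For reference, this density-estimation principle is usually written with the change-of-variables formula (standard normalizing-flow background, not specific to this invention). For an invertible mapping f from data x to the base Gaussian variable z = f(x),

\log p_X(x) = \log p_Z\big(f(x)\big) + \log\left|\det \frac{\partial f(x)}{\partial x}\right|,

so each invertible transformation contributes a tractable log-determinant term, and sampling simply runs the transformations in the inverse direction.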

In view of this, a new text generation method is urgently needed in which normalizing flows are applied, so that all characters of a sequence can be generated in parallel, text generation efficiency is greatly improved, large-scale character generation becomes possible, and the method can be applied to the generation of long articles.

Disclosure of Invention

In view of this, in order to solve the low efficiency of existing text generation algorithms that use the sequence-to-sequence (seq2seq) framework, the present invention provides a fully parallel text generation method based on normalizing flows. Through the invertible property of the normalizing-flow module, the encoder output consumed by the decoder is mapped to Gaussian white noise during training, and at generation time this input is replaced by Gaussian white noise, so that the decoder is not constrained by autoregression when generating a text sequence and can generate text in parallel along the time dimension, greatly improving the efficiency of text generation.

A fully parallel text generation method based on normalizing flows comprises a training process and an application process, and is characterized by the following processing steps:

training process:

S1, input the standard answers into an encoder; after processing them, the encoder outputs intermediate hidden-layer information, which is fed directly into both a normalizing-flow module and a decoder; on receiving the intermediate hidden-layer information, the normalizing-flow module processes it directly to obtain the flow output result;

S2, input the condition information into a condition-information module; after processing by the condition-information module, conditional hidden-layer information is obtained and fed directly into the decoder; once the decoder has received both the conditional hidden-layer information and the intermediate hidden-layer information, it applies an attention mechanism and a nonlinear transformation to them to obtain the decoder output result;

S3, compute a loss between the normalizing-flow output result and Gaussian white noise, using the KL divergence (also known as relative entropy) as the loss function. The KL divergence measures how well the probability distribution of the flow output matches that of the Gaussian white noise: the greater the difference between the two distributions, the larger the KL divergence, and the smaller the difference, the smaller the KL divergence. Compute a loss between the decoder output result and the standard answer, using the cross entropy as the loss function. Cross entropy is the standard loss function for classification problems and differs from the KL divergence above: since only specific standard-answer samples are available rather than the distribution of the standard answers, cross entropy is used for the loss between the decoder output result and the standard answer;
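As an illustrative sketch only (PyTorch; the function names and the choice of estimator are assumptions, not taken from this description), the two loss terms of S3 could be computed roughly as follows. The flow term is written as the negative log-likelihood of the flow output under a standard Gaussian plus the flow's log-determinant, which matches the KL objective up to a constant; the decoder term is an ordinary token-level cross entropy.

import torch.nn.functional as F
from torch.distributions import Normal

def flow_kl_loss(z, log_det):
    """Divergence between the flow output and Gaussian white noise.

    z:       flow output, shape (batch, seq, dim)
    log_det: log|det Jacobian| of the flow, broadcastable to (batch, seq)
    Minimizing -(log N(z; 0, I) + log_det) matches the KL term up to a constant.
    """
    log_p = Normal(0.0, 1.0).log_prob(z).sum(dim=-1)
    return -(log_p + log_det).mean()

def decoder_ce_loss(decoder_logits, target_ids):
    """Cross entropy between decoder logits (batch, seq, vocab) and gold token IDs."""
    return F.cross_entropy(decoder_logits.reshape(-1, decoder_logits.size(-1)),
                           target_ids.reshape(-1))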

S4, optimize the two loss terms of step S3, the KL divergence and the cross entropy, by gradient descent. Together they form the evidence lower bound (ELBO) of the variational autoencoder. The ELBO is back-propagated to the normalizing-flow module and the encoder, and the parameters of the neural network are updated after back-propagation, so that the value of the objective function, namely the negative ELBO, becomes smaller and smaller; the purpose of computing the ELBO as the loss function is to obtain the gradients for back-propagation;
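A minimal sketch of this joint optimization (PyTorch; encoder, flow, cond_encoder and decoder are assumed nn.Module instances corresponding to the modules sketched elsewhere in this description, and the two loss functions are those sketched above):

import torch

params = (list(encoder.parameters()) + list(flow.parameters())
          + list(cond_encoder.parameters()) + list(decoder.parameters()))
optimizer = torch.optim.Adam(params, lr=1e-4)

def training_step(answers, cond_inputs, target_ids):
    hidden = encoder(answers)                 # intermediate hidden-layer information
    z, log_det = flow(hidden)                 # normalizing-flow output
    cond_hidden = cond_encoder(cond_inputs)   # conditional hidden-layer information
    logits = decoder(hidden, cond_hidden)     # attention + nonlinear transformation

    loss = flow_kl_loss(z, log_det) + decoder_ce_loss(logits, target_ids)  # negative ELBO
    optimizer.zero_grad()
    loss.backward()                           # gradients reach the flow module and the encoder
    optimizer.step()                          # update the network parameters
    return loss.item()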

S5, in the next training iteration, the encoder computes with the updated network parameters, generates adjusted intermediate hidden-layer information, and passes it to the normalizing-flow module and the decoder; the normalizing-flow module computes with its updated parameters to obtain a new flow output result; the decoder receives the conditional hidden-layer information input by the condition-information module together with the adjusted intermediate hidden-layer information and produces a decoder output result; steps S3-S4 are then repeated;

S6, step S5 is repeated continuously until the back-propagated KL divergence and cross entropy of the ELBO of the whole neural network fall below a fixed threshold; at that point the output of the neural network is sufficiently close to the target, and the training process ends.

The application process comprises the following steps:

S7, input Gaussian white noise into the trained normalizing-flow module to obtain the flow output information and pass it to the decoder; the decoder, combining it with the conditional hidden-layer information input by the condition-information module, applies an attention mechanism and a nonlinear transformation to the two to obtain the decoder output result.
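A sketch of this application step (hypothetical names; flow.inverse stands for the inverse pass of the trained flow module, and cond_encoder and decoder are the modules described elsewhere in this text):

import torch

@torch.no_grad()
def generate(cond_inputs, num_tokens, noise_dim):
    noise = torch.randn(1, num_tokens, noise_dim)   # one Gaussian sample per output token
    flow_out = flow.inverse(noise)                  # inverse pass of the trained flow module
    cond_hidden = cond_encoder(cond_inputs)         # conditional hidden-layer information
    logits = decoder(flow_out, cond_hidden)         # attention + nonlinear transformation
    return logits.argmax(dim=-1)                    # IDs of the generated character sequence

Because every token's noise vector is available at once, the decoder can produce all positions in a single parallel pass.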

Further, the whole framework is an extension of the variational autoencoder framework.

Further, after the standard answers are input into the encoder, the processing steps inside the encoder are, in order, word-embedding (embedding) processing, stacked multi-layer long short-term memory (LSTM) and/or convolutional layers, and a fully connected nonlinear transformation, after which the intermediate hidden-layer information is output. The multi-layer LSTM mainly learns features of the whole sequence, such as a sentence, while the stacked convolutional layers mainly learn features between neighboring words within a sentence.
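A minimal PyTorch sketch of such an encoder (the class name, layer sizes and the exact combination of LSTM and convolution layers are assumptions for illustration, not fixed by this description):

import torch
import torch.nn as nn

class SeqEncoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hidden_dim=512):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)              # word-embedding model
        self.lstm = nn.LSTM(emb_dim, hidden_dim, num_layers=2,
                            batch_first=True)                           # multi-layer LSTM
        self.conv = nn.Conv1d(hidden_dim, hidden_dim, kernel_size=3,
                              padding=1)                                # convolutional stacking
        self.fc = nn.Linear(hidden_dim, hidden_dim)                     # fully connected layer

    def forward(self, token_ids):
        x = self.embedding(token_ids)                       # (batch, seq, emb_dim)
        x, _ = self.lstm(x)                                 # whole-sequence (sentence) features
        x = self.conv(x.transpose(1, 2)).transpose(1, 2)    # local word-to-word features
        return torch.tanh(self.fc(x))                       # nonlinear transform -> hidden info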

Further, the normalizing-flow module comprises a plurality of invertible transformations. More specifically, during training, after the intermediate hidden-layer information output by the encoder is input into the normalizing-flow module, it is processed in order by a masked autoregressive flow (MAF) module, a 1×1 invertible convolution module, and an affine coupling layer module; the three modules are cycled through 8 times in sequence to obtain the flow output.
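A structural sketch of that composition (not a full implementation: MaskedAutoregressiveFlow, Invertible1x1Conv and AffineCoupling are placeholders for the respective standard flow layers, each assumed to return its output together with the log-determinant of its Jacobian and to provide an inverse):

import torch.nn as nn

class NormalizingFlowStack(nn.Module):
    """Eight cycles of (masked autoregressive flow -> 1x1 invertible conv -> affine coupling)."""

    def __init__(self, dim, num_cycles=8):
        super().__init__()
        steps = []
        for _ in range(num_cycles):
            steps += [MaskedAutoregressiveFlow(dim),   # placeholder layer classes; their
                      Invertible1x1Conv(dim),          # implementations are not shown here
                      AffineCoupling(dim)]
        self.steps = nn.ModuleList(steps)

    def forward(self, x):
        log_det_total = 0.0
        for step in self.steps:
            x, log_det = step(x)                       # each step returns output and log|det J|
            log_det_total = log_det_total + log_det
        return x, log_det_total

    def inverse(self, z):
        for step in reversed(self.steps):
            z = step.inverse(z)                        # run the invertible steps in reverse order
        return z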

Further, the attention mechanism is an important component of neural translation frameworks, and many variants of it exist; the one used here is the most common attention mechanism. Attention is a mathematical operation that can fuse two pieces of non-aligned information, here mainly the conditional hidden-layer information and the current hidden-layer information. For example, when a three-character Chinese sentence meaning "well done" is translated into the German "Gut gemacht" (Gut means good, gemacht indicates that something has been done), the Chinese has three characters while the German has only two words, and the word for "good" is at the end of the Chinese sentence while "Gut" is at the beginning of the German one; the attention mechanism automatically aligns "Gut" with "good" and outputs a numeric matrix.
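A minimal sketch of one such attention computation (standard scaled dot-product attention; this specific variant is an assumption, since the text only states that the most common mechanism is used), fusing the conditional hidden-layer information with the current hidden-layer information:

import math
import torch
import torch.nn.functional as F

def fuse_by_attention(hidden, cond_hidden):
    """Scaled dot-product attention: `hidden` attends over `cond_hidden`.

    hidden:      (batch, tgt_len, dim)  current / intermediate hidden-layer information
    cond_hidden: (batch, src_len, dim)  conditional hidden-layer information
    Returns a (batch, tgt_len, dim) fusion of the two non-aligned sequences.
    """
    scores = torch.matmul(hidden, cond_hidden.transpose(1, 2)) / math.sqrt(hidden.size(-1))
    weights = F.softmax(scores, dim=-1)           # soft alignment, e.g. "Gut" <-> "good"
    context = torch.matmul(weights, cond_hidden)  # weighted sum of the conditional information
    return context + hidden                       # simple fusion of the two sources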

Further, the structure of the condition-information module is the same as that of the encoder; it represents the information required to generate the corresponding text. For example, if Chinese is translated into English, the Chinese text is the condition information; if an article is generated, the article's keywords are the condition information. The processing steps of the condition-information module are, in order, word-embedding (embedding) processing, stacked multi-layer long short-term memory (LSTM) and/or convolutional layers, and a fully connected nonlinear transformation, after which the conditional hidden-layer information is obtained.
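Since the two modules share the same structure, the SeqEncoder sketch above could simply be instantiated twice (the vocabulary size is a hypothetical value):

# Same architecture for both modules, but separate weights and inputs.
answer_encoder = SeqEncoder(vocab_size=30000)   # processes the standard answers
cond_encoder = SeqEncoder(vocab_size=30000)     # processes the condition information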

Furthermore, during inference the length of the generated text is determined by the number of Gaussian white noise samples: the number of generated words equals the number of Gaussian white noise sample points. For example, to output a 100-word text, the Gaussian white noise input to the normalizing-flow module has length 100.

Furthermore, the decoder output result is the ID sequence of the generated characters; each character has a unique ID in the vocabulary, and the corresponding character can be found by its ID.
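For illustration (the vocabulary below is hypothetical), mapping the generated ID sequence back to characters is a simple lookup:

id_to_token = {0: "<pad>", 1: "你", 2: "好", 3: "。"}   # hypothetical vocabulary

def ids_to_text(token_ids):
    """Look up each generated ID in the vocabulary and join into a string."""
    return "".join(id_to_token[i] for i in token_ids)

print(ids_to_text([1, 2, 3]))  # -> "你好。"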

Therefore, with the fully parallel text generation method based on normalizing flows, the encoder branch can be replaced at generation time by Gaussian white noise and the normalizing-flow module, which breaks the time-dimension dependency of the decoder in the traditional sequence-to-sequence (seq2seq) framework, so that all characters of a paragraph or sentence can be generated simultaneously and the generation speed is greatly improved.

Secondly, the invention adds an attention mechanism to the decoder, so that the information needed for generation can be fused into the decoder in a non-aligned way, which is convenient in practice. In practical applications the condition information and the target text are often not aligned; for example, when translating from Chinese to English, the Chinese and English text lengths usually differ, and so do the word orders.

Moreover, the invention constructs the network in the form of a variational autoencoder, which makes training stable and reliable; its learning scheme is completely different from that of the traditional seq2seq approach.

Drawings

FIG. 1 is a block diagram of the fully parallel text generation method based on normalizing flows according to the present invention.

The following detailed description will further illustrate the invention in conjunction with the above-described figures.

Detailed Description
