Method for generating a summary based on an abstractive neural network with known attention distribution

Document No.: 1043302 · Publication date: 2020-10-09

Reading note: This technique, "基于注意力分布已知的抽象式神经网络生成摘要的方法" (Method for generating a summary based on an abstractive neural network with known attention distribution), was devised by 马晔 and 宗璐 on 2020-06-29. Its main content is as follows: the invention provides a method for generating a summary based on an abstractive neural network with known attention distribution, comprising the steps of: step S1, inputting the source text into a summary model, generating a plurality of candidate summaries and their corresponding attention distributions using beam search, and obtaining sentence vectors from the encoder of the summary model; step S2, inputting the sentence vectors into a pre-trained attention distribution prediction model to obtain the predicted optimal attention distribution; step S3, calculating the attention score of each candidate summary according to the distance between its attention distribution and the optimal attention distribution; step S4, calculating the final score of each candidate summary according to the conditional probability of each summary sequence and its attention score; step S5, selecting the candidate summary with the highest final score as the final summary. The final summary obtained by the invention is closer to the source text.

1. A method for generating a summary based on an abstractive neural network with known attention distribution, the method comprising the steps of:

step S1, inputting the source text into a summary model, generating a plurality of candidate summaries and their corresponding attention distributions using beam search, and obtaining sentence vectors from the encoder of the summary model;

step S2, inputting the sentence vectors into a pre-trained attention distribution prediction model to obtain the predicted optimal attention distribution;

step S3, calculating the attention score of each candidate summary according to the distance between its attention distribution and the optimal attention distribution;

step S4, calculating the final score of each candidate summary according to the conditional probability of each summary sequence and its attention score;

step S5, selecting the candidate summary with the highest final score as the final summary.

2. The method according to claim 1, wherein in step S3 the attention score of each candidate summary is calculated according to the following formula:

$$\mathrm{attAw}(Y) = \sum_{p} \min\big(\bar{\alpha}_p,\ \psi_p(X)\big)$$

wherein attAw(Y) is the attention score, $\alpha_{t,p}$ is the attention weight of the t-th word on the p-th sentence, Y is the generated sequence, |Y| is the length of the generated sequence,

$$\bar{\alpha}_p = \frac{1}{|Y|} \sum_{t=1}^{|Y|} \alpha_{t,p}$$

is the attention distribution of the actually generated summary, and $\psi_p(X)$ is the predicted optimal attention distribution.

3. The method according to claim 2, wherein in step S4 the final score of each candidate summary is calculated according to the following formula:

$$\mathrm{Score}(Y) = \frac{\log p(Y \mid X)}{|Y|} + \beta \cdot \mathrm{attAw}(Y)$$

wherein Score is the final score, $p(Y \mid X)$ is the conditional probability of each summary sequence Y, and β is a scaling factor used to balance the conditional probability against the attention score; the final score is thus the conditional probability with the influence of the summary length removed, plus a certain proportion of the attention score.

4. The method according to claim 1, wherein in step S2 the attention distribution prediction model models the dependency between sentence vectors and the attention distribution, and the training process of the attention distribution prediction model is as follows: a set of sentence vectors is input into a Transformer encoder to obtain a set of sentence vectors containing context information; each sentence vector is passed through a fully-connected layer with a single neuron to obtain a corresponding attention weight; the attention weights are normalized by softmax to obtain an attention distribution; the attention distribution obtained from the previously trained summary model is taken as the optimal attention distribution; and the attention distribution prediction model is trained to minimize the mean square error between its output attention distribution and the optimal attention distribution.

5. The method according to claim 4, wherein the set of sentence vectors is obtained by splitting the source text into sentences and inputting them into the summary model, which also yields the attention assigned by each predicted word to each sentence.

6. The method according to claim 5, wherein the attention weights are normalized by softmax to obtain the attention distribution as follows:

$$\alpha_p = \frac{\exp\big(\sum_{t} \alpha_{t,p}\big)}{\sum_{p'} \exp\big(\sum_{t} \alpha_{t,p'}\big)}$$

wherein $\alpha_p$ is the total attention weight assigned to the p-th sentence and $\alpha_{t,p}$ is the attention weight of the t-th word on the p-th sentence: the attention weights of all words on the p-th sentence are summed and then normalized to obtain the total attention weight assigned to that sentence, and these weights together form the attention distribution.

Technical Field

The invention relates to the technical field of artificial intelligence, and in particular to a method for generating a summary based on an abstractive neural network with known attention distribution.

Background

Beam search is currently the most commonly used decoding algorithm for neural abstractive summarization: it generates the summary word by word while approximately maximizing the probability of the output sequence. The algorithm has a known weakness, however: in pursuit of probability maximization it tends to produce common, generic phrases and to ignore the specific information in the source text.
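For reference, the following minimal Python sketch (an illustration only; the function name, data layout, and the exhaustive vocabulary loop are our own simplifications, not part of the original disclosure) shows the expand-and-prune step at the heart of beam search, and why it is biased toward high-probability, generic continuations:

```python
import numpy as np

def beam_search_step(beams, step_log_probs, beam_width):
    """One expansion step of beam search.

    beams: list of (token_ids, cumulative_log_prob) hypotheses.
    step_log_probs: array of shape (len(beams), vocab_size) holding the
        model's log-probabilities for the next token of each hypothesis.
    Returns the beam_width highest-scoring extended hypotheses.
    """
    candidates = []
    for (tokens, score), log_probs in zip(beams, step_log_probs):
        for token_id, lp in enumerate(log_probs):
            candidates.append((tokens + [token_id], score + lp))
    # Keep only the most probable sequences; this greedy pruning is what
    # biases the search toward frequent, generic phrasing.
    candidates.sort(key=lambda c: c[1], reverse=True)
    return candidates[:beam_width]
```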

Disclosure of Invention

In view of the shortcomings of the prior art, the present invention provides a method for generating a summary based on an abstractive neural network with known attention distribution, whereby the generated summary stays closer to the source text.

In order to achieve the above purpose, the embodiment of the invention adopts the following technical scheme:

a method of generating a summary based on an abstract neural network for which attention distribution is known, the method comprising the steps of:

step S1, inputting the source text into a summary model, generating a plurality of candidate summaries and corresponding attention distribution by using the beam search, and obtaining a sentence vector after the coding of the summary model;

step S2, inputting the sentence vector into a pre-trained attention distribution prediction model to obtain the predicted optimal attention distribution;

step S3, calculating the attention score of each candidate abstract according to the distance between the attention distribution of each candidate abstract and the optimal attention distribution;

step S4, calculating the final score of each candidate abstract according to the conditional probability and the attention score of each abstract sequence;

in step S5, the candidate summary with the highest final score is selected as the final summary.

Further, in step S3, the attention score of each candidate summary is calculated according to the following formula:

$$\mathrm{attAw}(Y) = \sum_{p} \min\big(\bar{\alpha}_p,\ \psi_p(X)\big)$$

wherein attAw(Y) is the attention score, $\alpha_{t,p}$ is the attention weight of the t-th word on the p-th sentence, Y is the generated sequence, |Y| is the length of the generated sequence,

$$\bar{\alpha}_p = \frac{1}{|Y|} \sum_{t=1}^{|Y|} \alpha_{t,p}$$

is the attention distribution of the actually generated summary, and $\psi_p(X)$ is the predicted optimal attention distribution. If a candidate summary assigns too little attention to some sentences, its attention score decreases; if it assigns too much attention to a sentence, that sentence's own contribution does not change, but the attention available to the other sentences decreases, which likewise lowers the score.

Further, in step S4, the final score of each candidate summary is calculated according to the following formula:

$$\mathrm{Score}(Y) = \frac{\log p(Y \mid X)}{|Y|} + \beta \cdot \mathrm{attAw}(Y)$$

wherein Score is the final score, $p(Y \mid X)$ is the conditional probability of each summary sequence Y, and β is a scaling factor used to balance the conditional probability against the attention score; the final score is thus the conditional probability with the influence of the summary length removed, plus a certain proportion of the attention score.

Further, in step S2, the attention distribution prediction model models the dependency between sentence vectors and the attention distribution. The training process of the attention distribution prediction model is as follows: a set of sentence vectors is input into a Transformer encoder to obtain a set of sentence vectors containing context information; each sentence vector is passed through a fully-connected layer with a single neuron to obtain a corresponding attention weight; the attention weights are normalized by softmax to obtain an attention distribution; the attention distribution obtained from the previously trained summary model is taken as the optimal attention distribution; and the prediction model is trained to minimize the mean square error between its output attention distribution and the optimal attention distribution.

Further, the set of sentence vectors is obtained by splitting the source text into sentences and inputting them into the summary model, which also yields the attention assigned by each predicted word to each sentence.

Further, the attention weights are normalized by softmax to obtain the attention distribution as follows:

$$\alpha_p = \frac{\exp\big(\sum_{t} \alpha_{t,p}\big)}{\sum_{p'} \exp\big(\sum_{t} \alpha_{t,p'}\big)}$$

wherein $\alpha_p$ is the total attention weight assigned to the p-th sentence and $\alpha_{t,p}$ is the attention weight of the t-th word on the p-th sentence: the attention weights of all words on the p-th sentence are summed and then normalized to obtain the total attention weight assigned to that sentence, and these weights together form the attention distribution.

In summary, the invention discloses a method for generating a summary based on an abstractive neural network with known attention distribution, comprising steps S1 to S5 as set out above. Because the candidate summaries produced by beam search are re-ranked by how closely their attention distributions match the predicted optimal distribution, the final summary obtained is closer to the source text.

Drawings

In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly described below. The drawings in the following description are only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic flowchart of a method for generating a summary based on an abstractive neural network with known attention distribution according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of the training process of the attention distribution prediction model in a method for generating a summary based on an abstractive neural network with known attention distribution according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by those skilled in the art from the given embodiments without creative effort fall within the protection scope of the present invention.

Referring first to FIG. 1, an embodiment of the present invention provides a method for generating a summary based on an abstractive neural network with known attention distribution, comprising the following steps:

step S1, inputting the source text into a summary model, generating a plurality of candidate summaries and their corresponding attention distributions using beam search, and obtaining sentence vectors from the encoder of the summary model;

step S2, inputting the sentence vectors into a pre-trained attention distribution prediction model to obtain the predicted optimal attention distribution;

step S3, calculating the attention score of each candidate summary according to the distance between its attention distribution and the optimal attention distribution;

step S4, calculating the final score of each candidate summary according to the conditional probability of each summary sequence and its attention score;

step S5, selecting the candidate summary with the highest final score as the final summary.

In step S3, the attention score of each candidate summary is calculated according to the following formula:

$$\mathrm{attAw}(Y) = \sum_{p} \min\big(\bar{\alpha}_p,\ \psi_p(X)\big), \qquad \bar{\alpha}_p = \frac{1}{|Y|} \sum_{t=1}^{|Y|} \alpha_{t,p}$$

wherein attAw(Y) is the attention score, $\alpha_{t,p}$ is the attention weight of the t-th word on the p-th sentence, Y is the generated sequence, |Y| is the length of the generated sequence, $\bar{\alpha}_p$ is the attention distribution of the actually generated summary, and $\psi_p(X)$ is the predicted optimal attention distribution. If a candidate summary assigns too little attention to some sentences, its attention score decreases; if it assigns too much attention to a sentence, that sentence's own contribution does not change, but the attention available to the other sentences decreases, which likewise lowers the score. In general, the closer the attention distribution of a candidate summary is to the predicted optimal attention distribution, the higher its attention score.
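As a concrete illustration, this scoring rule can be written in a few lines of NumPy. The sketch below follows the formula as reconstructed above; the function and variable names are our own:

```python
import numpy as np

def attention_score(alpha, psi):
    """attAw(Y): overlap between a candidate's attention distribution
    and the predicted optimal distribution.

    alpha: (|Y|, P) matrix; alpha[t, p] is the attention of the t-th
        generated word on the p-th source sentence (each row sums to 1).
    psi: (P,) predicted optimal attention distribution psi_p(X).
    """
    alpha_bar = alpha.mean(axis=0)  # candidate's per-sentence distribution
    # Each sentence contributes at most psi[p]: attention below the
    # optimum is penalized, attention above it earns no extra credit
    # (but starves the other sentences, since alpha_bar sums to 1).
    return np.minimum(alpha_bar, psi).sum()
```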

In step S4, the final score of each candidate summary is calculated according to the following formula:

$$\mathrm{Score}(Y) = \frac{\log p(Y \mid X)}{|Y|} + \beta \cdot \mathrm{attAw}(Y)$$

where Score is the final score, $p(Y \mid X)$ is the conditional probability of each summary sequence Y, and β is a scaling factor used to balance the conditional probability against the attention score. The final score is thus the conditional probability with the influence of the summary length removed, plus a certain proportion of the attention score.
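Steps S4 and S5 then amount to re-ranking the beam search candidates. A minimal sketch, reusing attention_score from the previous example (the length normalization log p(Y|X)/|Y| follows the stated removal of the summary-length influence, and β = 1.0 is an illustrative default, not a value from the original disclosure):

```python
def final_score(log_p, alpha, psi, beta=1.0):
    """Score(Y) = length-normalized log p(Y|X) + beta * attAw(Y)."""
    length = alpha.shape[0]  # |Y|, the number of generated words
    return log_p / length + beta * attention_score(alpha, psi)

def select_summary(candidates, psi, beta=1.0):
    """Step S5: pick the candidate summary with the highest final score.

    candidates: list of (summary_text, log_p, alpha) triples produced
        by beam search over the summary model.
    """
    best = max(candidates, key=lambda c: final_score(c[1], c[2], psi, beta))
    return best[0]
```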

In step S2, the attention distribution prediction model models the dependency between sentence vectors and the attention distribution. Referring to FIG. 2, the training process of the attention distribution prediction model is as follows: a set of sentence vectors is input into a Transformer encoder to obtain a set of sentence vectors containing context information; each sentence vector is passed through a fully-connected layer with a single neuron to obtain a corresponding attention weight; the attention weights are normalized by softmax to obtain an attention distribution; the attention distribution obtained from the previously trained summary model is taken as the optimal attention distribution; and the prediction model is trained to minimize the mean square error between its output attention distribution and the optimal attention distribution.
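The architecture just described maps directly onto a small PyTorch module. The sketch below is one plausible realization; the hyperparameters (d_model, nhead, num_layers, learning rate) are illustrative assumptions, not values from the original disclosure:

```python
import torch
import torch.nn as nn

class AttentionDistributionPredictor(nn.Module):
    """Predicts an attention distribution over the P sentences of a
    source document from their sentence vectors."""

    def __init__(self, d_model=768, nhead=8, num_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        # Fully-connected layer with a single neuron: one raw weight per sentence.
        self.scorer = nn.Linear(d_model, 1)

    def forward(self, sent_vecs):
        """sent_vecs: (batch, P, d_model) sentence vectors."""
        ctx = self.encoder(sent_vecs)           # sentence vectors with context
        weights = self.scorer(ctx).squeeze(-1)  # (batch, P) raw attention weights
        return torch.softmax(weights, dim=-1)   # normalized attention distribution

model = AttentionDistributionPredictor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()  # mean square error against the optimal distribution

def train_step(sent_vecs, target_dist):
    """One training step: fit the predicted distribution to the target
    distribution extracted from the trained summary model."""
    optimizer.zero_grad()
    loss = loss_fn(model(sent_vecs), target_dist)
    loss.backward()
    optimizer.step()
    return loss.item()
```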

The set of sentence vectors here is obtained by splitting the source text into sentences and inputting them into the summary model, which also yields the attention assigned by each predicted word to each sentence.

The attention weights are normalized by softmax to obtain the attention distribution as follows:

$$\alpha_p = \frac{\exp\big(\sum_{t} \alpha_{t,p}\big)}{\sum_{p'} \exp\big(\sum_{t} \alpha_{t,p'}\big)}$$

wherein $\alpha_p$ is the total attention weight assigned to the p-th sentence and $\alpha_{t,p}$ is the attention weight of the t-th word on the p-th sentence: the attention weights of all words on the p-th sentence are summed and then normalized to obtain the total attention weight assigned to that sentence, and these weights together form the attention distribution.
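A short NumPy sketch of this target-distribution computation (the softmax over sentence totals follows the description above; the max-subtraction is a standard numerical-stability detail we add, not part of the original text):

```python
import numpy as np

def target_distribution(alpha):
    """Per-sentence target attention distribution from the word-level
    weights of the trained summary model.

    alpha: (T, P) matrix; alpha[t, p] is the attention of the t-th
        predicted word on the p-th source sentence.
    """
    totals = alpha.sum(axis=0)                  # total weight per sentence
    exp_totals = np.exp(totals - totals.max())  # stable softmax numerator
    return exp_totals / exp_totals.sum()        # alpha_p for every sentence p
```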

To conclude, the invention discloses a method for generating a summary based on an abstractive neural network with known attention distribution, comprising steps S1 to S5 as described above: the candidate summaries produced by beam search are re-scored by combining their conditional probabilities with the distance between their attention distributions and the predicted optimal attention distribution, and the highest-scoring candidate is selected. The final summary thus obtained is closer to the source text.

The above description is only an embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any changes or substitutions that can readily be conceived by those skilled in the art within the technical scope disclosed herein shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the appended claims.
