Humor text generation method with enhanced external knowledge

Document No.: 1310783  Publication date: 2020-07-10

Reading note: this technology, "External-knowledge-enhanced humorous text generation method", was designed and created by Lü Jiancheng, Zhang Hang, Yang Kexin, Peng Dezhong, Peng Xi, Sun Yanan and He Zhenan on 2020-03-11. Its main content is as follows: the invention discloses an external-knowledge-enhanced humorous text generation method, which comprises preprocessing a short-joke data set to obtain data in which set-up sentences, background knowledge and punchline sentences are aligned, constructing a humorous text generation model, and generating humorous text with the model. The invention proposes using a graph attention network to aggregate the background knowledge graph of a joke's set-up sentence, enhancing the node representations, and fusing the background knowledge graph into the punchline decoder, so that, given a set-up sentence and the related background knowledge, a punchline rich in humor can be generated.

1. An external-knowledge-enhanced humorous text generation method, characterized by comprising the following steps:

S1, acquiring a short-joke data set and preprocessing it to obtain data in which set-up sentences, background knowledge and punchline sentences are aligned;

S2, constructing a humorous text generation model comprising a background knowledge encoder, a set-up sentence encoder and a knowledge-fused punchline decoder;

and S3, processing the aligned set-up sentence/background knowledge/punchline data obtained in step S1 with the humorous text generation model constructed in step S2 to generate humorous text.

2. The method for generating humor text according to claim 1, wherein the step S1 comprises the following sub-steps:

S1-1, acquiring a short-joke data set and performing joke filtering, punchline segmentation and joke de-duplication;

S1-2, taking the last sentence of each short joke as the punchline sentence and the remaining sentences as the set-up sentence;

S1-3, linking the entities in the set-up sentence to Wikipedia with the entity linking tool TagMe to obtain each entity's Wikipedia title;

and S1-4, linking the entities to Wikidata with SPARQL and retrieving the entity-related triples, thereby obtaining data in which set-up sentences, background knowledge and punchline sentences are aligned.

3. The method for generating humor text according to claim 2, wherein the step S2 comprises the following sub-steps:

S2-1, constructing a background knowledge graph from the background knowledge triples;

S2-2, fusing the features of neighboring nodes of the background knowledge graph with the background knowledge encoder to obtain hidden features of the background knowledge;

S2-3, encoding the set-up sentence with the set-up sentence encoder;

and S2-4, integrating the hidden features of the background knowledge obtained in step S2-2 and the hidden features of the set-up sentence obtained in step S2-3 into the current state of the punchline decoder, and decoding the punchline with the knowledge-fused punchline decoder.

4. The method for generating humor text with external knowledge enhanced according to claim 3, wherein the step S2-1 comprises the following sub-steps:

S2-1-1, collapsing co-referent entities in the background knowledge triples into single entity nodes and mapping each relation to a relation node;

S2-1-2, adding reverse relation nodes;

S2-1-3, encoding the text of each entity and relation node with a bidirectional long short-term memory network and taking the final hidden state as the node's initial feature.

5. The method for generating humor text with enhanced external knowledge according to claim 4, wherein the step S2-2 is specifically:

let the background knowledge graph be $G = (V, E, H^l)$, where $V = \{v_1, v_2, \dots, v_I\}$ is the node set, $E$ is the edge set, and $h_i^l$ is the feature of node $v_i$ at layer $l$, initialized as $h_i^0$; each node fuses the information of its neighbors through a multi-head attention mechanism to update its features, expressed as:

$$h_i^{l+1} = \Big\Vert_{m=1}^{M} \sigma\Big(\sum_{j \in \mathcal{N}_i} \alpha_{ij}^{m}\, W^{m,l} h_j^{l}\Big)$$

where $h_i^{l+1}$ is the feature of node $v_i$ at layer $l+1$, $M$ is the number of heads of the multi-head attention operation, $\Vert$ denotes concatenation of the outputs of the $M$ attention heads, $\mathcal{N}_i$ denotes the neighbors of node $v_i$, $\sigma$ is the activation function, $\alpha_{ij}^{m}$ is the attention weight of edge $(v_i, v_j)$ in the $m$-th head, and $W^{m,l}$ maps $h_i^{l}$ and $h_j^{l}$ into the $m$-th head's subspace.

6. The method for generating humor text with enhanced external knowledge according to claim 5, wherein the step S2-3 is specifically:

The set-up sentence sequence after word embedding and positional encoding is set as $\{x_1, x_2, \dots, x_p\}$ and then fed into the sub-layers of the encoder block for processing.

7. The method for generating humor text with enhanced external knowledge according to claim 6, wherein the sub-layer processing steps are specifically:

A1, the input sequence $\{x_1, x_2, \dots, x_p\}$ is passed through multi-head self-attention to obtain a globally informed representation $\{x'_1, x'_2, \dots, x'_p\}$ of each element;

A2, a residual connection and layer normalization are applied, expressed as:

$$\{l_1, l_2, \dots, l_p\} = \mathrm{LayerNorm}(\{x_1 + x'_1, x_2 + x'_2, \dots, x_p + x'_p\})$$

A3, the sequence $\{f_1, f_2, \dots, f_p\}$ is obtained through a feed-forward neural network;

A4, a residual connection and layer normalization are applied, expressed as:

$$\{f'_1, f'_2, \dots, f'_p\} = \mathrm{LayerNorm}(\{l_1 + f_1, l_2 + f_2, \dots, l_p + f_p\})$$

A5, $\{f'_1, f'_2, \dots, f'_p\}$ is passed to the next block, and the sub-layer operations are repeated $N$ times;

A6, the final set-up sentence encoding $\{q_1, q_2, \dots, q_p\}$ is obtained.

8. The method for generating humor text with enhanced external knowledge according to claim 7, wherein the step S2-4 comprises the following sub-steps:

S2-4-1, let the node features obtained by the knowledge encoder be $H = \{h_1, h_2, \dots, h_I\}$, the encoded sequence obtained by the set-up sentence encoder be $Q_p = \{q_1, q_2, \dots, q_p\}$, and the decoder input sequence at the current step be $Y_t = \{y_1, y_2, \dots, y_t\}$; the decoder-side lexical representation is computed by a stack of identical blocks, each comprising a self-attention layer, a set-up attention layer, a knowledge fusion layer and a linear layer;

S2-4-2, in the $n$-th block, the target sequence $Y_t = \{y_1, y_2, \dots, y_t\}$ first undergoes masked multi-head self-attention and then multi-head attention over the set-up sentence representation $Q_p = \{q_1, q_2, \dots, q_p\}$; the resulting hidden state is denoted $S^n$;

S2-4-3, the knowledge features are integrated into the current state, expressed as:

$$A^n = \mathrm{MultiHead}(S^n, H, H)$$

S2-4-4, a gating mechanism is introduced, expressed as:

$$\mathrm{Gate}(S^n) = \lambda^n S^n + (1 - \lambda^n) A^n$$

S2-4-5, the gated features are fed into the Transformer feed-forward layer, and the final states $\{e_1, e_2, \dots, e_t\}$ are obtained after $N$ block operations;

S2-4-6, the probability of generating the next target word $y_{t+1}$ is expressed as:

$$P(y_{t+1} \mid X, K, y_{\le t}; \theta) \propto \exp(W_o e_t)$$

where $W_o \in \mathbb{R}^{|V_y| \times d}$ is a model parameter matrix ($d$ is the hidden dimension) and $|V_y|$ is the size of the target vocabulary.

Technical Field

The invention belongs to the technical field of text generation, and particularly relates to an external-knowledge-enhanced humorous text generation method.

Background

Humor describes sentence expressions that are interesting, laughable and meaningful. It has distinct cultural characteristics and conveys what the speaker wants to express in a witty, relaxed or sarcastic form of language. With the rapid development of artificial intelligence technology, people's expectations of computer capabilities keep rising. One important reason for the popularity of intelligent assistants such as Microsoft XiaoIce, Xiao AI and Tmall Genie is their good interaction ability. We want intelligent assistants to show more emotion and "warmth" in communication, that is, to have a higher emotional quotient. Humor is regarded as an important expression of emotional quotient and "warmth" in communication, and is of great significance in application fields such as intelligent assistants and dialogue generation. At present, most dialogue generation in intelligent assistants is based on retrieval and matching; although such systems can give humorous responses, they cannot analyze or understand humor and merely copy responses from human chat data. According to humor theory, humorous language, i.e. a joke, generally consists of two parts: a set-up and a punchline. For example, in "What does Hitler eat in the morning? He eats Jews!", the set-up sentence "What does Hitler eat in the morning?" provides the background of the joke and sets up the reader's expectation, while the punchline "He eats Jews!" usually appears at the end of the joke and produces an unexpected effect that triggers laughter.

In recent years, the computational generation of humor has attracted increasing attention, but these efforts have mainly focused on filling fixed templates or replacing words to create humor. Few researchers currently work on more open forms of humor generation. Furthermore, background knowledge is crucial for a person to understand or produce a joke. In the example above, to feel the humor of the sentence one has to know the background about Hitler, which must be brought in as general external knowledge. To our knowledge, however, the background knowledge of jokes has not yet been introduced into computational humor research.

The article "A step towards creative language generation: automatic generation and exploration of Chinese humor" was the first to generate punchlines with deep learning, adopting a Seq2Seq network and an adversarial network and achieving a certain effect. However, their algorithm does not take the background knowledge of the joke into account, so the model's perception of the set-up sentence is insufficient and it fails to generate strongly related punchlines, which weakens the humorous effect.

Disclosure of Invention

To address problems of current humor text generation methods, such as their fixed form and lack of background knowledge, the invention provides an external-knowledge-enhanced humorous text generation method.

To achieve the object of the invention, the following technical solution is adopted:

a humorous text generation method with enhanced external knowledge comprises the following steps:

S1, acquiring a short-joke data set and preprocessing it to obtain data in which set-up sentences, background knowledge and punchline sentences are aligned;

S2, constructing a humorous text generation model comprising a background knowledge encoder, a set-up sentence encoder and a knowledge-fused punchline decoder;

and S3, processing the aligned set-up sentence/background knowledge/punchline data obtained in step S1 with the humorous text generation model constructed in step S2 to generate humorous text.

Further, the step S1 specifically includes the following sub-steps:

S1-1, acquiring a short-joke data set and performing joke filtering, punchline segmentation and joke de-duplication;

S1-2, taking the last sentence of each short joke as the punchline sentence and the remaining sentences as the set-up sentence;

S1-3, linking the entities in the set-up sentence to Wikipedia with the entity linking tool TagMe to obtain each entity's Wikipedia title;

and S1-4, linking the entities to Wikidata with SPARQL and retrieving the entity-related triples, thereby obtaining data in which set-up sentences, background knowledge and punchline sentences are aligned.

Further, the step S2 specifically includes the following sub-steps:

S2-1, constructing a background knowledge graph from the background knowledge triples;

S2-2, fusing the features of neighboring nodes of the background knowledge graph with the background knowledge encoder to obtain hidden features of the background knowledge;

S2-3, encoding the set-up sentence with the set-up sentence encoder;

and S2-4, integrating the hidden features of the background knowledge obtained in step S2-2 and the hidden features of the set-up sentence obtained in step S2-3 into the current state of the punchline decoder, and decoding the punchline with the knowledge-fused punchline decoder.

Further, the step S2-1 specifically includes the following sub-steps:

S2-1-1, collapsing co-referent entities in the background knowledge triples into single entity nodes and mapping each relation to a relation node;

S2-1-2, adding reverse relation nodes;

S2-1-3, encoding the text of each entity and relation node with a bidirectional long short-term memory network and taking the final hidden state as the node's initial feature.

Further, the step S2-2 specifically includes:

Let the background knowledge graph be $G = (V, E, H^l)$, where $V = \{v_1, v_2, \dots, v_I\}$ is the node set, $E$ is the edge set, and $h_i^l$ is the feature of node $v_i$ at layer $l$, initialized as $h_i^0$. Each node fuses the information of its neighbors through a multi-head attention mechanism to update its features, expressed as:

$$h_i^{l+1} = \Big\Vert_{m=1}^{M} \sigma\Big(\sum_{j \in \mathcal{N}_i} \alpha_{ij}^{m}\, W^{m,l} h_j^{l}\Big)$$

where $h_i^{l+1}$ is the feature of node $v_i$ at layer $l+1$, $M$ is the number of heads of the multi-head attention operation, $\Vert$ denotes concatenation of the outputs of the $M$ attention heads, $\mathcal{N}_i$ denotes the neighbors of node $v_i$, $\sigma$ is the activation function, $\alpha_{ij}^{m}$ is the attention weight of edge $(v_i, v_j)$ in the $m$-th head, and $W^{m,l}$ maps $h_i^{l}$ and $h_j^{l}$ into the $m$-th head's subspace.

Further, the step S2-3 specifically includes:

The set-up sentence sequence after word embedding and positional encoding is set as $\{x_1, x_2, \dots, x_p\}$ and then fed into the sub-layers of the encoder block for processing.

Further, the processing steps of the sub-layers are specifically:

A1, the input sequence $\{x_1, x_2, \dots, x_p\}$ is passed through multi-head self-attention to obtain a globally informed representation $\{x'_1, x'_2, \dots, x'_p\}$ of each element;

A2, a residual connection and layer normalization are applied, expressed as:

$$\{l_1, l_2, \dots, l_p\} = \mathrm{LayerNorm}(\{x_1 + x'_1, x_2 + x'_2, \dots, x_p + x'_p\})$$

A3, the sequence $\{f_1, f_2, \dots, f_p\}$ is obtained through a feed-forward neural network;

A4, a residual connection and layer normalization are applied, expressed as:

$$\{f'_1, f'_2, \dots, f'_p\} = \mathrm{LayerNorm}(\{l_1 + f_1, l_2 + f_2, \dots, l_p + f_p\})$$

A5, $\{f'_1, f'_2, \dots, f'_p\}$ is passed to the next block, and the sub-layer operations are repeated $N$ times;

A6, the final set-up sentence encoding $\{q_1, q_2, \dots, q_p\}$ is obtained.

Further, the step S2-4 specifically includes the following sub-steps:

S2-4-1, let the node features obtained by the knowledge encoder be $H = \{h_1, h_2, \dots, h_I\}$, the encoded sequence obtained by the set-up sentence encoder be $Q_p = \{q_1, q_2, \dots, q_p\}$, and the decoder input sequence at the current step be $Y_t = \{y_1, y_2, \dots, y_t\}$; the decoder-side lexical representation is computed by a stack of identical blocks, each comprising a self-attention layer, a set-up attention layer, a knowledge fusion layer and a linear layer;

S2-4-2, in the $n$-th block, the target sequence $Y_t = \{y_1, y_2, \dots, y_t\}$ first undergoes masked multi-head self-attention and then multi-head attention over the set-up sentence representation $Q_p = \{q_1, q_2, \dots, q_p\}$; the resulting hidden state is denoted $S^n$;

S2-4-3, the knowledge features are integrated into the current state, expressed as:

$$A^n = \mathrm{MultiHead}(S^n, H, H)$$

S2-4-4, a gating mechanism is introduced, expressed as:

$$\mathrm{Gate}(S^n) = \lambda^n S^n + (1 - \lambda^n) A^n$$

S2-4-5, the gated features are fed into the Transformer feed-forward layer, and the final states $\{e_1, e_2, \dots, e_t\}$ are obtained after $N$ block operations;

S2-4-6, the probability of generating the next target word $y_{t+1}$ is expressed as:

$$P(y_{t+1} \mid X, K, y_{\le t}; \theta) \propto \exp(W_o e_t)$$

where $W_o \in \mathbb{R}^{|V_y| \times d}$ is a model parameter matrix ($d$ is the hidden dimension) and $|V_y|$ is the size of the target vocabulary.

The invention has the following beneficial effects: it uses a graph attention network to aggregate the background knowledge graph of a joke's set-up sentence, enhancing the node representations, and fuses the background knowledge graph into the punchline decoder, so that, given a set-up sentence and the related background knowledge, a punchline rich in humor can be generated.

Drawings

FIG. 1 is a flow diagram of the external-knowledge-enhanced humorous text generation method of the present invention;

FIG. 2 is a schematic diagram of a humorous text generation model architecture in an embodiment of the present invention;

FIG. 3 is a schematic diagram of beam search in the embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

As shown in fig. 1, an embodiment of the present invention provides a humor text generating method with enhanced external knowledge, including the following steps S1 to S3:

a humorous text generation method with enhanced external knowledge comprises the following steps:

S1, acquiring a short-joke data set and preprocessing it to obtain data in which set-up sentences, background knowledge and punchline sentences are aligned;

in this embodiment, step S1 specifically includes the following sub-steps:

S1-1, acquiring a short-joke data set and performing joke filtering, punchline segmentation and joke de-duplication;

the invention selects a short Joke data set and a Reddit-Joke data set as raw data, which are disclosed on a Kaggle website. And then perform joke filtering, smile point segmentation, and joke de-duplication. The data containing the special characters is first deleted and only the jokes of at least two words, at least 15 words per sentence, are retained.

To eliminate duplicates, we use bag-of-words (BoW) representations and cosine similarity to detect sentence similarity, and filter out jokes with similarity greater than 0.93.
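For concreteness, the de-duplication step can be sketched as follows; this is a minimal illustration assuming scikit-learn is available, and all function and variable names are illustrative rather than taken from the patent. Only the BoW/cosine technique and the 0.93 threshold come from the description above.

```python
# Minimal de-duplication sketch: bag-of-words vectors + cosine similarity, threshold 0.93.
# Assumes scikit-learn; names (deduplicate_jokes, etc.) are illustrative only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def deduplicate_jokes(jokes, threshold=0.93):
    """Keep each joke only if its BoW cosine similarity to every kept joke is <= threshold."""
    vectors = CountVectorizer().fit_transform(jokes)   # sparse BoW matrix, one row per joke
    sims = cosine_similarity(vectors)                  # pairwise cosine similarities
    kept, kept_idx = [], []
    for i, joke in enumerate(jokes):
        if all(sims[i, j] <= threshold for j in kept_idx):
            kept.append(joke)
            kept_idx.append(i)
    return kept


if __name__ == "__main__":
    sample = [
        "Why did the chicken cross the road? To get to the other side.",
        "Why did the chicken cross the road? To reach the other side.",
        "I told my computer a joke. It didn't laugh.",
    ]
    print(deduplicate_jokes(sample))
```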

S1-2, taking the last sentence of each short joke as the punchline sentence and the remaining sentences as the set-up sentence;

S1-3, linking the entities in the set-up sentence to Wikipedia with the entity linking tool TagMe to obtain each entity's Wikipedia title;

to gain background, we use the entity linking tool TagMe, which links entities appearing in the body sentence to the wikipedia website. To guarantee the credibility of entity links, only entities with credibility greater than 0.1 are reserved.

And S1-4, linking the entities to Wikidata with SPARQL and retrieving the entity-related triples, thereby obtaining data in which set-up sentences, background knowledge and punchline sentences are aligned.

After obtaining each entity's Wikipedia title, SPARQL is used to link the entity to Wikidata and retrieve the entity-related triples, yielding approximately 10,700 aligned set-up sentence/background knowledge/punchline examples.
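One plausible query shape for this step is sketched below against the public Wikidata SPARQL endpoint; the patent does not give the exact query, so the query structure and all names here are assumptions for illustration. It maps an English Wikipedia title to its Wikidata item and returns labelled (property, value) pairs as triples.

```python
# Minimal sketch: fetch Wikidata triples for an entity identified by its English
# Wikipedia title, via the public Wikidata SPARQL endpoint. The query shape is an
# assumption; the patent does not specify the exact SPARQL used.
import requests

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

QUERY_TEMPLATE = """
SELECT ?propLabel ?valueLabel WHERE {
  ?article schema:about ?item ;
           schema:isPartOf <https://en.wikipedia.org/> ;
           schema:name "%s"@en .
  ?item ?p ?value .
  ?prop wikibase:directClaim ?p .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 50
"""


def fetch_triples(wikipedia_title):
    """Return (entity title, property label, value label) triples for one entity."""
    resp = requests.get(
        WDQS_ENDPOINT,
        params={"query": QUERY_TEMPLATE % wikipedia_title, "format": "json"},
        headers={"User-Agent": "humor-kg-example/0.1"},
        timeout=60,
    )
    resp.raise_for_status()
    rows = resp.json()["results"]["bindings"]
    return [(wikipedia_title, r["propLabel"]["value"], r["valueLabel"]["value"]) for r in rows]
```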

S2, constructing a humorous text generation model comprising a background knowledge encoder, a set-up sentence encoder and a knowledge-fused punchline decoder;

in this embodiment, the humorous text generation model architecture is as shown in fig. 2, and step S2 specifically includes the following sub-steps:

S2-1, constructing a background knowledge graph from the background knowledge triples, specifically comprising the following steps:

S2-1-1, collapsing co-referent entities in the background knowledge triples into single entity nodes and mapping each relation to a relation node; within a triple, the subject, relation and object nodes are connected in sequence.

S2-1-2, adding reverse relation nodes so that information can flow from the object into the subject node; the content of a reverse relation node is the original relation text concatenated with the <rev> symbol. For example, for the triple (apple, a kind of, fruit), the invention constructs four nodes: apple, "a kind of", fruit and "a kind of<rev>".

S2-1-3, encoding the text of each entity and relation node with a bidirectional long short-term memory network and taking the final hidden state as the node's initial feature.

Since the entities and relations in Wikidata are usually multi-word expressions, the invention encodes these words with a bidirectional long short-term memory network (Bi-LSTM) and takes the final hidden state as each node's initial feature. The resulting background knowledge graph is denoted $G = (V, E, H^0)$, where $V$ is the set of nodes, $E$ is the set of edges, and $H^0$ is the initial feature set of $V$.
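A minimal PyTorch sketch of this graph construction and node initialization is given below. Tokenization, tensor dimensions and all names are illustrative assumptions, not the patent's exact implementation; in particular, creating one relation node per triple occurrence is an interpretation of the example above.

```python
# Sketch of S2-1: build the background knowledge graph (entity nodes, relation nodes,
# reverse relation nodes) and initialize node features with a Bi-LSTM.
# Dimensions, tokenization and names are illustrative assumptions.
import torch
import torch.nn as nn


def build_graph(triples):
    """triples: list of (subject, relation, object) strings -> (node texts, edge list)."""
    nodes, entity_index, edges = [], {}, []

    def entity_node(text):
        if text not in entity_index:          # collapse co-referent entities into one node
            entity_index[text] = len(nodes)
            nodes.append(text)
        return entity_index[text]

    def relation_node(text):                  # one relation node per triple occurrence
        nodes.append(text)
        return len(nodes) - 1

    for subj, rel, obj in triples:
        s, o = entity_node(subj), entity_node(obj)
        r, r_rev = relation_node(rel), relation_node(rel + "<rev>")
        edges += [(s, r), (r, o),             # subject -> relation -> object
                  (o, r_rev), (r_rev, s)]     # object -> reverse relation -> subject
    return nodes, edges


class NodeInitializer(nn.Module):
    """Encode each node's text with a Bi-LSTM; the final hidden states form H^0."""

    def __init__(self, vocab_size, emb_dim=128, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.bilstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True, bidirectional=True)

    def forward(self, node_token_ids):
        # node_token_ids: (num_nodes, max_tokens) padded token ids of each node's text
        emb = self.embed(node_token_ids)
        _, (h_n, _) = self.bilstm(emb)                 # h_n: (2, num_nodes, hidden_dim)
        return torch.cat([h_n[0], h_n[1]], dim=-1)     # (num_nodes, 2 * hidden_dim) = H^0
```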

S2-2, fusing the features of neighboring nodes of the background knowledge graph with the background knowledge encoder to obtain hidden features of the background knowledge, which specifically comprises the following steps:

Let the background knowledge graph be $G = (V, E, H^l)$, where $V = \{v_1, v_2, \dots, v_I\}$ is the node set, $E$ is the edge set, and $h_i^l$ is the feature of node $v_i$ at layer $l$, initialized as $h_i^0$. Each node fuses the information of its neighbors through a multi-head attention mechanism to update its features, expressed as:

$$h_i^{l+1} = \Big\Vert_{m=1}^{M} \sigma\Big(\sum_{j \in \mathcal{N}_i} \alpha_{ij}^{m}\, W^{m,l} h_j^{l}\Big) \qquad (1)$$

where $h_i^{l+1}$ is the feature of node $v_i$ at layer $l+1$, $M$ is the number of heads of the multi-head attention operation, $\Vert$ denotes concatenation of the outputs of the $M$ attention heads, $\mathcal{N}_i$ denotes the neighbors of node $v_i$, $\sigma$ is the activation function, $\alpha_{ij}^{m}$ is the attention weight of edge $(v_i, v_j)$ in the $m$-th head, and $W^{m,l}$ maps $h_i^{l}$ and $h_j^{l}$ into the $m$-th head's subspace.

The invention computes the connection weight of each edge by Equation (2):

$$\alpha_{ij}^{m} = \underset{j \in \mathcal{N}_i}{\mathrm{softmax}}\!\left(\frac{(W^{m,l} h_i^{l})^{\top} (W^{m,l} h_j^{l})}{\sqrt{d_m}}\right) \qquad (2)$$

where $d_m$ is the dimension of the $m$-th head's subspace. This calculation can also be expressed as $H^{l+1} = \mathrm{MultiHead}(H^{l}, H^{l}, H^{l})$, with attention restricted to each node's neighborhood.

Following the graph attention network paradigm, the invention computes at each layer the attention weights between each node and its neighbors, and updates each node's features as the attention-weighted sum of its neighbors' features.
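A minimal sketch of such a multi-head graph attention layer in PyTorch is shown below. It assumes the scaled dot-product form of Equations (1)-(2) above, masks attention with the adjacency matrix (including self-loops), and uses illustrative dimensions and names; it is not the patent's exact implementation.

```python
# Sketch of one background knowledge encoder layer: multi-head attention in which
# each node attends only to its neighbors (adjacency-masked), cf. Eqs. (1)-(2).
# Head count, dimensions and names are illustrative assumptions.
import math
import torch
import torch.nn as nn


class GraphAttentionLayer(nn.Module):
    def __init__(self, dim, num_heads=4):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads, self.head_dim = num_heads, dim // num_heads
        self.proj = nn.Linear(dim, dim)   # maps node features into the per-head subspaces (W^{m,l})
        self.act = nn.ReLU()              # activation sigma

    def forward(self, h, adj):
        # h: (I, dim) node features H^l; adj: (I, I) 0/1 adjacency with self-loops
        I = h.size(0)
        p = self.proj(h).view(I, self.num_heads, self.head_dim).transpose(0, 1)   # (M, I, d_m)
        scores = p @ p.transpose(-2, -1) / math.sqrt(self.head_dim)               # (M, I, I)
        scores = scores.masked_fill(adj.unsqueeze(0) == 0, float("-inf"))
        alpha = torch.softmax(scores, dim=-1)                                     # Eq. (2)
        out = (alpha @ p).transpose(0, 1).reshape(I, -1)                          # weighted sum + head concat, Eq. (1)
        return self.act(out)                                                      # H^{l+1}
```

Stacking several such layers gives the final hidden features $H = \{h_1, \dots, h_I\}$ consumed by the decoder.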

S2-3, encoding the set-up sentence with the set-up sentence encoder, specifically:

The set-up sentence sequence after word embedding and positional encoding is set as $\{x_1, x_2, \dots, x_p\}$ and then fed into the sub-layers of the encoder block for processing.

The processing steps of the sub-layers are as follows; a code sketch of one encoder block is given after the list:

A1, the input sequence $\{x_1, x_2, \dots, x_p\}$ is passed through multi-head self-attention to obtain a globally informed representation $\{x'_1, x'_2, \dots, x'_p\}$ of each element;

A2, a residual connection and layer normalization are applied, expressed as:

$$\{l_1, l_2, \dots, l_p\} = \mathrm{LayerNorm}(\{x_1 + x'_1, x_2 + x'_2, \dots, x_p + x'_p\})$$

A3, the sequence $\{f_1, f_2, \dots, f_p\}$ is obtained through a feed-forward neural network;

A4, a residual connection and layer normalization are applied, expressed as:

$$\{f'_1, f'_2, \dots, f'_p\} = \mathrm{LayerNorm}(\{l_1 + f_1, l_2 + f_2, \dots, l_p + f_p\})$$

A5, $\{f'_1, f'_2, \dots, f'_p\}$ is passed to the next block, and the sub-layer operations are repeated $N$ times;

A6, the final set-up sentence encoding $\{q_1, q_2, \dots, q_p\}$ is obtained.
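The following compact PyTorch sketch implements one such encoder block (steps A1-A4); stacking N of them realizes A5-A6. Dimensions and names are illustrative assumptions.

```python
# Sketch of one set-up sentence encoder block (steps A1-A4); N stacked blocks
# yield the final encoding {q_1, ..., q_p} (A5-A6). Dimensions/names are assumptions.
import torch
import torch.nn as nn


class EncoderBlock(nn.Module):
    def __init__(self, dim=256, num_heads=8, ff_dim=1024):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, ff_dim), nn.ReLU(), nn.Linear(ff_dim, dim))
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x):
        # x: (batch, p, dim), the embedded + positionally encoded set-up sentence
        x_prime, _ = self.self_attn(x, x, x)   # A1: multi-head self-attention
        l = self.norm1(x + x_prime)            # A2: residual connection + layer normalization
        f = self.ffn(l)                        # A3: feed-forward network
        return self.norm2(l + f)               # A4: residual connection + layer normalization


encoder_blocks = nn.Sequential(*[EncoderBlock() for _ in range(6)])   # A5-A6: N stacked blocks
```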

S2-4, integrating the hidden features of the background knowledge obtained in step S2-2 and the hidden features of the set-up sentence obtained in step S2-3 into the current state of the punchline decoder, and decoding the punchline with the knowledge-fused punchline decoder, which specifically comprises the following sub-steps (a code sketch of one decoder block is given after these sub-steps):

S2-4-1, let the node features obtained by the knowledge encoder be $H = \{h_1, h_2, \dots, h_I\}$, the encoded sequence obtained by the set-up sentence encoder be $Q_p = \{q_1, q_2, \dots, q_p\}$, and the decoder input sequence at the current step be $Y_t = \{y_1, y_2, \dots, y_t\}$. The decoder-side lexical representation is computed by a stack of identical blocks, each comprising a self-attention layer, a set-up attention layer, a knowledge fusion layer and a linear layer;

S2-4-2, in the $n$-th block, the target sequence $Y_t = \{y_1, y_2, \dots, y_t\}$ first undergoes masked multi-head self-attention and then multi-head attention over the set-up sentence representation $Q_p = \{q_1, q_2, \dots, q_p\}$; the resulting hidden state is denoted $S^n$;

S2-4-3, the knowledge fusion layer comprises a multi-head attention layer and a gating mechanism inspired by the Highway Network; the invention integrates the knowledge features into the current state, expressed as:

$$A^n = \mathrm{MultiHead}(S^n, H, H)$$

S2-4-4, due to inaccuracies of the entity linking tool, the node information in the background knowledge graph may contain noise. To address this problem, the invention introduces a gating mechanism that better balances the influence of the background knowledge against the information from the set-up encoder, expressed as:

$$\mathrm{Gate}(S^n) = \lambda^n S^n + (1 - \lambda^n) A^n$$

where $\lambda^n$ denotes the gating weight, computed as

$$\lambda^n = \mathrm{sigmoid}(W_{\lambda} S^n + b_{\lambda})$$

where $W_{\lambda}$ and $b_{\lambda}$ are network parameters.

S2-4-5, the gated features are fed into the Transformer feed-forward layer, and the final states $\{e_1, e_2, \dots, e_t\}$ are obtained after $N$ block operations;

S2-4-6, the probability of generating the next target word $y_{t+1}$ is expressed as:

$$P(y_{t+1} \mid X, K, y_{\le t}; \theta) \propto \exp(W_o e_t)$$

where $W_o \in \mathbb{R}^{|V_y| \times d}$ is a model parameter matrix ($d$ is the hidden dimension) and $|V_y|$ is the size of the target vocabulary.
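The decoder block described in S2-4-1 to S2-4-5 can be sketched in PyTorch as follows. The sigmoid form of the λⁿ gate, the dimensions and all names are assumptions for illustration, not the patent's exact formulation.

```python
# Sketch of one knowledge-fused punchline decoder block (S2-4-2 to S2-4-5).
# The lambda^n gate (sigmoid of a linear map of S^n), dimensions and names are
# assumptions for illustration.
import torch
import torch.nn as nn


class KnowledgeFusedDecoderBlock(nn.Module):
    def __init__(self, dim=256, num_heads=8, ff_dim=1024):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.setup_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.know_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.gate = nn.Linear(dim, 1)   # produces lambda^n (assumed sigmoid gate)
        self.ffn = nn.Sequential(nn.Linear(dim, ff_dim), nn.ReLU(), nn.Linear(ff_dim, dim))
        self.norm1, self.norm2, self.norm3 = nn.LayerNorm(dim), nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, y, q, h, causal_mask):
        # y: (B, t, dim) target prefix; q: (B, p, dim) set-up encoding; h: (B, I, dim) node features
        s, _ = self.self_attn(y, y, y, attn_mask=causal_mask)   # masked self-attention
        y = self.norm1(y + s)
        s, _ = self.setup_attn(y, q, q)                         # attention over the set-up sentence
        s_n = self.norm2(y + s)                                 # hidden state S^n
        a_n, _ = self.know_attn(s_n, h, h)                      # A^n = MultiHead(S^n, H, H)
        lam = torch.sigmoid(self.gate(s_n))                     # lambda^n (assumed form)
        gated = lam * s_n + (1 - lam) * a_n                     # Gate(S^n)
        return self.norm3(gated + self.ffn(gated))              # feed-forward layer
```

A causal mask can be built as `torch.triu(torch.ones(t, t, dtype=torch.bool), diagonal=1)`, and N such blocks are stacked before the final linear projection over the target vocabulary.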

And S3, processing the aligned set-up sentence/background knowledge/punchline data obtained in step S1 with the humorous text generation model constructed in step S2 to generate humorous text.

In this embodiment, the humorous text generation model is trained in a standard way; the training procedure is as follows, and a code sketch of the loop is given after the procedure:

Input: training data set $D = \{(X_1, Y_1), (X_2, Y_2), \dots, (X_n, Y_n)\}$

Hyper-parameters: network parameters, learning rate η, maximum number of epochs epoch, batch size batch_size, etc.

The process is as follows:

1: initialize the network parameters

2: while epoch number < epoch:

3: while batch number < (size of D) / batch_size + 1:

4: fetch a batch of data of size batch_size

5: forward propagation, compute the loss

6: back propagation, update the network parameters

7: save the network parameters

8: print the training-set and test-set losses

Output:

a trained sequence-to-sequence network model;

loss curves of the model on the training set and the test set as a function of the number of iterations;

test results of the model.
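Steps 1-8 of this procedure correspond to a standard mini-batch training loop; a condensed PyTorch sketch is given below. The cross-entropy loss, the Adam optimizer, the DataLoader usage and the `model(setup, knowledge, target_prefix)` call signature are assumptions, since the patent only names the hyper-parameters.

```python
# Condensed sketch of the training procedure (steps 1-8). Loss, optimizer, data
# handling and the model call signature are assumptions; the patent only lists
# the hyper-parameters (learning rate, epochs, batch_size).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader


def evaluate(model, test_set, criterion, batch_size):
    model.eval()
    total, count = 0.0, 0
    with torch.no_grad():
        for setup, knowledge, target in DataLoader(test_set, batch_size=batch_size):
            logits = model(setup, knowledge, target[:, :-1])
            total += criterion(logits.reshape(-1, logits.size(-1)), target[:, 1:].reshape(-1)).item()
            count += 1
    return total / max(count, 1)


def train(model, train_set, test_set, lr=1e-4, epochs=20, batch_size=32, pad_id=0):
    criterion = nn.CrossEntropyLoss(ignore_index=pad_id)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)              # step 1: parameters initialized
    loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)

    for epoch in range(epochs):                                          # step 2: epoch loop
        model.train()
        for setup, knowledge, target in loader:                          # steps 3-4: one batch at a time
            logits = model(setup, knowledge, target[:, :-1])             # step 5: forward pass
            loss = criterion(logits.reshape(-1, logits.size(-1)), target[:, 1:].reshape(-1))
            optimizer.zero_grad()
            loss.backward()                                              # step 6: backward pass
            optimizer.step()                                             #         parameter update
        torch.save(model.state_dict(), f"model_epoch{epoch}.pt")         # step 7: save parameters
        test_loss = evaluate(model, test_set, criterion, batch_size)
        print(f"epoch {epoch}: last-batch train loss {loss.item():.4f}, test loss {test_loss:.4f}")  # step 8
```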

In the testing phase, the model's objective is to output the sentence with the highest probability. The output of each time step serves as the input of the next time step. The network output at each time step is a discrete probability distribution, from which we need to sample the input for the next time step. There are two common approaches to this sampling: greedy search and beam search. The idea of greedy search is simple: sample the word with the highest probability from the network output as the input of the next time step. The problem is that picking the most probable word at every step does not guarantee that the finally generated sequence has the highest overall probability, so high-probability sequences are very likely to be missed. Beam search therefore gives more satisfactory results.

Beam search is applied to the output sampling phase of each time step, and the algorithm retains only a few high-probability nodes. That is, at each output step of the network, the k most probable nodes are selected and kept, so the finally generated sequence avoids the local-optimum problem of greedy search, as shown in FIG. 3.

In this figure, the dictionary is assumed to contain only five words; the decoder outputs a discrete probability distribution over the words at each time step, and beam search keeps the two most probable nodes at each step. The goal is to obtain the most probable sentence. With greedy search, the locally most probable word is chosen at every step, yet the resulting sentence can have a lower overall probability than an alternative sentence, so the potentially most probable sentence is missed. Because beam search keeps the two most probable nodes at each step, it can, to a large extent, recover the sentence with the highest probability.
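A minimal beam search sketch with beam width k (k = 2 as in FIG. 3) is shown below. The `step_log_probs` function, which should return the decoder's next-word log-probabilities for a given prefix, is a placeholder assumption; with k = 1 the procedure reduces to greedy search.

```python
# Minimal beam search sketch with beam width k (k = 2 in FIG. 3).
# `step_log_probs(prefix)` is a placeholder for the decoder: it should return a
# dict mapping each vocabulary word to its log-probability given the prefix.
import math


def beam_search(step_log_probs, bos, eos, k=2, max_len=20):
    beams = [([bos], 0.0)]                                  # (sequence, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for word, logp in step_log_probs(seq).items():  # expand each kept node
                candidates.append((seq + [word], score + logp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:k]:                   # keep only the k best nodes
            (finished if seq[-1] == eos else beams).append((seq, score))
        if not beams:                                       # all surviving hypotheses have ended
            break
    best = max(finished + beams, key=lambda c: c[1])
    return best[0]
```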

It will be appreciated by those of ordinary skill in the art that the embodiments described herein are intended to help the reader understand the principles of the invention, and that the scope of the invention is not limited to the specifically described embodiments and examples. Those skilled in the art can make various other specific changes and combinations based on the teachings of the present invention without departing from the spirit of the invention, and these changes and combinations are within the scope of the invention.
