Event element gridding extraction method based on character embedding, storage medium and electronic device

Document number: 1889947; Publication date: 2021-11-26

Reading note: This technology, "Event element gridding extraction method based on character embedding, storage medium and electronic device", was designed and created by Chen Xingshu, Jiang Mengting, Yuan Lei, Liu Peng, Huang Tiemai, Liao Zhihong, Song Ke'er, Feng Ke, Wang Haizhou, Wang Wenxian and Luo Yonggang on 2021-09-03. The invention discloses an event element gridding extraction method based on character embedding, a storage medium and an electronic device. The method comprises the following steps: first, an event element extraction basic model is constructed, comprising a BERT-based information pre-learning representation layer, a character coding embedding layer, a BiGRU bidirectional gated recurrent neural network layer, a self-attention layer and a CRF conditional random field output layer, and the basic model is refined by function into 3 grid modules: event trigger word extraction, event argument extraction and event attribute extraction; then the event trigger word extraction model, the event argument extraction model and the event attribute extraction model are each optimized; finally, the trained event element extraction model is used to predict event element extraction on the test data. The method performs well in the event element extraction task and obtains higher accuracy.

1. An event element gridding extraction method based on character embedding is characterized by comprising the following steps:

step 1: constructing an event element extraction basic model;

the basic model is a character-embedding neural network deep learning model and comprises a BERT-based information pre-learning representation layer, a character coding embedding layer, a BiGRU bidirectional gated recurrent neural network layer, a self-attention layer and a CRF conditional random field output layer; the operation steps are as follows:

step 1.1: the BERT-based information pre-learning representation layer pre-learns the context semantic features of the sample data to generate a text representation model of the emergent meta-event domain;

step 1.2: the character coding embedding layer inputs the semantic representation vectors generated by the trained BERT language model into the BiGRU bidirectional gated recurrent neural network layer;

step 1.3: the BiGRU bidirectional gated recurrent neural network layer extracts the context-dependent long-distance deep features of the input sequence;

step 1.4: the self-attention layer performs weighted transformation on the deep features learned by the BiGRU bidirectional gated recurrent neural network layer to highlight important vocabulary information in the text sequence;

step 1.5: the CRF conditional random field output layer converts trigger word extraction into a sequence labeling task, solving the problem of missing context labeling information after traditional word vectors are converted into character vectors;

the basic model is divided into 3 grid modules of event trigger word extraction, event argument extraction and event attribute extraction according to function refinement, and the grid modules are used for optimizing the models respectively in the subsequent steps according to the functional characteristics of different models;

step 2: extracting and optimizing an event trigger word extraction model: extracting a main event trigger word from an event sentence as an event trigger word, and taking redundant event trigger words as external features to assist in representing the main event; all event trigger words in the labeled data are used as a knowledge base and are used as prior characteristics for extracting the event trigger words; extracting trigger words matched with event trigger words in a knowledge base from sentences, marking the trigger words, and splicing the trigger words with character embedded vectors obtained according to a sentence BERT semantic representation model; splicing the event type vectors into character embedding vectors; the target vector of the event trigger word extraction task is represented by the extraction result of the event trigger word and corresponds to the labeling result of the event trigger word;

step 3: extracting and optimizing an event argument extraction model: on the basis of the BERT semantic features of the original text, the relative distances from all characters in the text to the event trigger word are used as text structure features, with the event trigger word itself at relative distance 0; the event subject and object are combined, the event time and place are combined, and two independent models are adopted for extraction; the target vector of the event argument extraction task corresponds to the labeled extraction result of the event argument;

step 4: extracting and optimizing an event attribute extraction model: the event attributes are defined as event tense and event polarity; the model output is converted into a multi-classification problem, and the CRF conditional random field output layer in the basic model is replaced to construct two classifiers; the features obtained by dynamically pooling the event trigger word and the segments at its left and right ends are taken as global features and spliced with the character embedding vectors obtained from the sentence BERT semantic representation model, and ten-fold cross validation is adopted for optimization;

step 5: predicting the event element extraction result of the test data by adopting the event element extraction model obtained by training in steps 1 to 4.

2. The character embedding-based event element gridding extraction method according to claim 1, wherein in the step 1.3, the BiGRU bidirectional gated recurrent neural network layer contains one forward GRU and one reverse GRU; the forward GRU captures the preceding-context feature information a_t of positions 0:t, and the reverse GRU captures the following-context feature information a'_t of positions t:n-1; the context information y_t of the sentence is obtained by splicing the captured context feature information, as shown in formulas (1) to (3):

a_t = GRU(x_t, a_(t-1)) (1)

a'_t = GRU(x_t, a'_(t+1)) (2)

y_t = [a_t, a'_t] (3)

in the formulas, x_t represents the word sequence feature vector; a_t represents the preceding-context feature information of 0:t captured by the forward GRU; a'_t represents the following-context feature information of t:n-1 captured by the reverse GRU; y_t represents the obtained context information of the sentence;

the weighted transformation formula (4) shows:

in the formula, eijRepresenting the importance of the features of sentence j to sentence i; a denotes the attention mechanism; w represents a linear transformation weight matrix of the shared parameters; y isiAnd yjRepresenting the obtained context information of sentence i and sentence j, respectively.

3. The character embedding-based event element gridding extraction method according to claim 1, wherein in the step 2, in the event trigger word extraction task, assuming that the event trigger word type target vector of word w_i is [tri_0, tri_1, tri_2, ..., tri_j, ..., tri_n], tri_j is set as shown in formula (5):

tri_j = 1, if w_i carries the j-th trigger word label; tri_j = 0, otherwise (5)

in the step 3, in the event argument extraction task, assuming that the event argument type target vector of word w_i is [arg_0, arg_1, ..., arg_j, ..., arg_n], arg_j is set as shown in formula (6):

arg_j = 1, if w_i carries the j-th argument label; arg_j = 0, otherwise (6)

4. A storage medium, in which a computer program is stored, wherein the computer program is arranged to perform the method of any of claims 1 to 3 when executed.

5. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method of any of claims 1 to 3 by means of the computer program.

Technical Field

The invention relates to the technical field of event extraction, in particular to an event element gridding extraction method based on character embedding, a storage medium and an electronic device.

Background

Information extraction technology extracts the information of interest from massive unstructured text data and converts it into structured data. Through information extraction, low-value information content can be filtered out, and accurate, high-quality information can be obtained quickly. The event is an important expression form of information, and event extraction is a key research direction in the field of information extraction. The authoritative ACE (Automatic Content Extraction) program gives a clear definition of event extraction: event extraction requires converting the unstructured data representing event information in text into structured, accurate knowledge that can be stored and used.

In today's society, online news media push hot events of all sizes in real time. Faced with the ever-growing volume of Internet information, quickly locating the specific events discussed by the public becomes critical. Event extraction not only helps public opinion supervisors quickly locate specific events and understand their specific elements, but its results can also be provided to other natural language processing tasks for further analysis and application. Influenced by network and social factors, research interest in event extraction technology is increasing year by year at home and abroad.

Disclosure of Invention

In view of the above problems, an object of the present invention is to provide a character-embedding-based event element gridding extraction method, a storage medium, and an electronic device, where event element gridding extraction is to refine a model into 3 grid modules, namely, event trigger extraction, event argument extraction, and event attribute extraction, on the basis of an event detection task, and each grid module not only jointly shares event semantic information of a basic model, but also independently optimizes extraction performance of each element. The technical scheme is as follows:

an event element gridding extraction method based on character embedding comprises the following steps:

step 1: constructing an event element extraction basic model;

the basic model is a character-embedding neural network deep learning model and comprises a BERT-based information pre-learning representation layer, a character coding embedding layer, a BiGRU bidirectional gated recurrent neural network layer, a self-attention layer and a CRF conditional random field output layer; the operation steps are as follows:

step 1.1: the BERT-based information pre-learning representation layer pre-learns the context semantic features of the sample data to generate a text representation model of the emergent meta-event domain;

step 1.2: the character coding embedding layer inputs the semantic representation vectors generated by the trained BERT language model into the BiGRU bidirectional gated recurrent neural network layer;

step 1.3: extracting context-dependent long-distance deep features of an input sequence by a BiGRU bidirectional gating recurrent neural network layer;

step 1.4: the self-attention layer performs weighted transformation on the deep features learned by the BiGRU bidirectional gated recurrent neural network layer to highlight important vocabulary information in the text sequence;

step 1.5: the CRF conditional random field output layer converts trigger word extraction into a sequence labeling task, solving the problem of missing context labeling information after traditional word vectors are converted into character vectors;

the basic model is refined into 3 grid modules of event trigger word extraction, event argument extraction and event attribute extraction;

step 2: extracting and optimizing an event trigger word extraction model: extracting a main event trigger word from an event sentence as an event trigger word, and taking redundant event trigger words as external features to assist in representing the main event; all event trigger words in the labeled data are used as a knowledge base and are used as prior characteristics for extracting the event trigger words; extracting and marking trigger words matched with event trigger words in a knowledge base in the sentence, and splicing the trigger words with the output character embedded vectors of the BERT semantic codes of the sentence; splicing the event type vectors into character embedding vectors; the target vector of the event trigger word extraction task is represented by the extraction result of the event trigger word and corresponds to the labeling result of the event trigger word;

and step 3: extracting and optimizing an event argument extraction model: on the basis of BERT semantic features of an original text, relative distances from all characters in the text to event trigger words are used as text structure features, and the relative distances of the event trigger words are 0; combining an event subject and an object, combining event time and place, and extracting by adopting two independent models; the target vector of the event argument extraction task corresponds to the extraction result label of the event argument;

step 4: extracting and optimizing an event attribute extraction model: the event attributes are defined as event tense and event polarity; the model output is converted into a multi-classification problem, and two classifiers are constructed by replacing the CRF conditional random field output layer in the basic model; the features obtained by dynamically pooling the event trigger word and the segments at its left and right ends are taken as global features and spliced with the output character embedding vectors of the sentence BERT semantic code, and ten-fold cross validation is adopted for optimization;

step 5: predicting the event element extraction result of the test data by adopting the event element extraction model obtained by training in steps 1 to 4.

Further, in step 1.3, the BiGRU bidirectional gated recurrent neural network layer contains one forward GRU and one reverse GRU; the forward GRU captures the preceding-context feature information a_t of positions 0:t, and the reverse GRU captures the following-context feature information a'_t of positions t:n-1; the context information y_t of the sentence is obtained by splicing the captured context feature information, as shown in formulas (1) to (3):

a_t = GRU(x_t, a_(t-1)) (1)

a'_t = GRU(x_t, a'_(t+1)) (2)

y_t = [a_t, a'_t] (3)

in the formulas, x_t represents the word sequence feature vector; a_t represents the preceding-context feature information of 0:t captured by the forward GRU; a'_t represents the following-context feature information of t:n-1 captured by the reverse GRU; y_t represents the obtained context information of the sentence;

the weighted transformation formula (4) shows:

in the formula, eijRepresenting the importance of the features of sentence j to sentence i; a denotes the attention mechanism; a linear transformation weight matrix representing the shared parameters; y isiAnd yjRepresenting the obtained context information of sentence i and sentence j, respectively.

Further, in the event trigger word extraction task, assuming that the event trigger word type target vector of word w_i is [tri_0, tri_1, tri_2, ..., tri_j, ..., tri_n], tri_j is set as shown in equation (5):

tri_j = 1, if w_i carries the j-th trigger word label; tri_j = 0, otherwise (5)

in the step 3, in the event argument extraction task, assuming that the event argument type target vector of word w_i is [arg_0, arg_1, ..., arg_j, ..., arg_n], arg_j is set as shown in equation (6):

arg_j = 1, if w_i carries the j-th argument label; arg_j = 0, otherwise (6)

a storage medium having a computer program stored therein, wherein the computer program is arranged to perform the above method when run.

An electronic device comprising a memory having a computer program stored therein and a processor arranged to execute the above method by the computer program.

The invention has the beneficial effects that: the invention utilizes the event element extraction basic model to respectively optimize the 3 refined grid modules of event trigger word extraction, event argument extraction and event attribute extraction; each grid module not only jointly shares the event semantic information of the basic model, but also independently optimizes the extraction performance of its element. The results show that the character-embedding-based event element gridding extraction model performs well in the event element extraction task and obtains higher accuracy; in addition, the model can be used for further research.

Drawings

FIG. 1 is a schematic flow diagram of the process of the present invention.

FIG. 2 is a schematic diagram of an event element extraction basic model established in the present invention.

FIG. 3 is a diagram of the comparative experimental results of event argument extraction in modules according to the present invention.

FIG. 4 is a schematic diagram of the comparative analysis experiment results of the sub-module event attribute extraction optimization method of the present invention.

FIG. 5 shows the results of comparative experiments of different methods of event element extraction according to the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments. An event element gridding extraction method based on character embedding comprises the following steps:

step 1: constructing an event element extraction basic model;

as shown in FIG. 2, the event element extraction basic model mainly comprises a BERT-based information pre-learning representation layer, a word coding embedding layer, a BiGRU bidirectional gating cyclic neural network layer, a Self-attention layer and a CRF conditional random field output layer.

The BERT model pre-learns the context semantic features of the sample data to generate a text representation model of the emergent meta-event domain. The semantic representation vectors generated by the trained BERT language model are input into the BiGRU, which extracts the context-dependent long-distance deep features of the input sequence.

The BERT language model is a well-known pre-training model proposed by the Google AI research institute in October 2018. The invention uses the BERT model only for semantic representation.

The BiGRU bidirectional gated recurrent neural network layer simultaneously contains one forward GRU and one reverse GRU; the forward GRU captures the preceding-context feature information a_t of positions 0:t, and the reverse GRU captures the following-context feature information a'_t of positions t:n-1; the context information y_t of the sentence is obtained by splicing the captured context feature information, as shown in formulas 1 to 3.

a_t = GRU(x_t, a_(t-1)) (1)

a'_t = GRU(x_t, a'_(t+1)) (2)

y_t = [a_t, a'_t] (3)
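The bidirectional pass of formulas 1 to 3 can be sketched as follows. This is an illustrative sketch, not part of the original disclosure; the standard GRU update equations and the random weight initialization are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, a_prev, W):
    """One standard GRU update: update gate z, reset gate r, candidate state."""
    z = sigmoid(W["Wz"] @ x_t + W["Uz"] @ a_prev)
    r = sigmoid(W["Wr"] @ x_t + W["Ur"] @ a_prev)
    a_cand = np.tanh(W["Wh"] @ x_t + W["Uh"] @ (r * a_prev))
    return (1 - z) * a_prev + z * a_cand

def bigru(xs, W, hidden):
    """y_t = [a_t, a'_t]: concatenate forward (0:t) and reverse (t:n-1) states."""
    n = len(xs)
    fwd, a = [], np.zeros(hidden)
    for t in range(n):                 # forward GRU, formula (1)
        a = gru_step(xs[t], a, W)
        fwd.append(a)
    bwd, a = [None] * n, np.zeros(hidden)
    for t in reversed(range(n)):       # reverse GRU, formula (2)
        a = gru_step(xs[t], a, W)
        bwd[t] = a
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]  # formula (3)

# Demo with random weights (hypothetical dimensions: 4-dim input, 3-dim hidden).
rng = np.random.default_rng(0)
d, h = 4, 3
W = {k: rng.standard_normal((h, d if k[0] == "W" else h)) * 0.1
     for k in ["Wz", "Uz", "Wr", "Ur", "Wh", "Uh"]}
ys = bigru([rng.standard_normal(d) for _ in range(5)], W, h)
print(len(ys), ys[0].shape)   # 5 positions, each y_t has 2h dimensions
```

For both directions the same cell structure is used; only the traversal order of the sequence differs.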

The Self-attention layer performs weighted transformation on the deep features learned by the BiGRU and highlights important vocabulary information in the text sequence, as shown in formula 4:

e_ij = a(W y_i, W y_j) (4)

Finally, the CRF converts trigger word extraction into a sequence labeling task, solving the problem of missing context labeling information after traditional word vectors are converted into character vectors.
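A minimal sketch of the weighted transformation follows. It is not part of the original disclosure; the scoring function a is assumed here to be a dot product over the shared transform W.

```python
import numpy as np

def self_attention(Y, W):
    """Score e_ij = (W y_i) . (W y_j), softmax-normalize per row,
    and return the attention weights plus the reweighted outputs."""
    P = Y @ W.T                                 # shared linear transform W y_i
    E = P @ P.T                                 # pairwise importance scores e_ij
    E = E - E.max(axis=1, keepdims=True)        # numerical stability
    expE = np.exp(E)
    A = expE / expE.sum(axis=1, keepdims=True)  # attention weights per position
    return A, A @ Y                             # weights and weighted features

rng = np.random.default_rng(0)
Y = rng.standard_normal((5, 6))                 # 5 positions, 6-dim BiGRU outputs
A, out = self_attention(Y, rng.standard_normal((6, 6)))
print(A.sum(axis=1))                            # each row of weights sums to 1
```

The softmax normalization makes each position's weights a distribution over all positions, so important vocabulary receives proportionally larger weight in the output.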

Step 2: extracting and optimizing an event trigger word extraction model;

there may be a plurality of event trigger words in one event description sentence. In the extraction process of the event element, not only the extraction of the event element is completed, but also the event element and the event trigger word must be associated with each other. Meanwhile, the information elements in one event description sentence are limited, and primary and secondary relations exist in a plurality of events. In order to extract main attention events and richer event elements, one event sentence extracts a main event trigger word as an event trigger word, and redundant event trigger words are used as external features to assist in representing the main event. All event trigger words in the annotation data are used as a knowledge base, similar to a remote supervision mode, and are used as prior characteristics of event trigger word extraction. Extracting and marking out trigger words matched with event trigger words in the knowledge base in the sentences, and splicing the trigger words with output character embedded vectors of the BERT semantic codes of the sentences.

In addition, the composition of event elements is strongly related to the event type. For example, in terror- and explosion-related events such as "attack", the trigger word implies a conflict between two parties, and the two conflicting parties generally appear adjacent to the trigger word; in a major disaster event such as an earthquake, the trigger word expresses that a disaster occurred somewhere, and a location element is very likely to appear adjacent to the trigger word. Therefore, the event type provides important semantic clues for event element extraction, and the event type vector is spliced into the character embedding vector.

In the event trigger word extraction task, the target vector is represented by the extraction result of the event trigger word and corresponds to its labeling result. As shown in Table 1, the three event trigger word label types under the BIO labeling scheme are "B-Trigger", "I-Trigger" and "Other".

TABLE 1 Event trigger word labels and their meanings

Assume that the event trigger word type target vector of word w_i is [tri_0, tri_1, tri_2], where tri_j is set as shown in formula (1):

tri_j = 1, if w_i carries the j-th trigger word label; tri_j = 0, otherwise (1)
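The one-hot target construction of formula (1) can be illustrated as follows. This is a sketch, not part of the original disclosure; the label order is assumed to follow Table 1.

```python
def bio_trigger_targets(tags):
    """Map BIO trigger labels to one-hot target vectors [tri_0, tri_1, tri_2]."""
    labels = ["B-Trigger", "I-Trigger", "Other"]
    out = []
    for t in tags:
        v = [0] * len(labels)
        v[labels.index(t)] = 1      # tri_j = 1 for the matching label, else 0
        out.append(v)
    return out

# Character-level tags for a sentence whose trigger word spans two characters.
print(bio_trigger_targets(["Other", "B-Trigger", "I-Trigger", "Other"]))
# → [[0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
```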

and step 3: extracting and optimizing an event argument extraction model;

four elements of an event subject, an event object, an event time and an event place in the event argument are significantly influenced by an event trigger word in a semantic structure. In order to obtain the potential characteristics of the event argument elements on the sentence semantic structure, on the basis of the BERT semantic characteristics of the original text, the relative distance from all characters in the text to the event trigger word is taken as the text structural characteristics, and the relative distance of the event trigger word is 0. And combining the subject and the object of the event, combining the time and the place of the event, and extracting by adopting two independent models.

In the event argument extraction task, the target vector corresponds to the labeled extraction result of the event argument. The event argument label types and their meanings are shown in Table 2; the nine label types under the BIO labeling scheme are "B-Subject", "I-Subject", "B-Object", "I-Object", "B-Time", "I-Time", "B-Location", "I-Location" and "Other".

TABLE 2 Event argument labels and their meanings

Assume that the event argument type target vector of word w_i is [arg_0, arg_1, ..., arg_j, ..., arg_8], where arg_j is set as shown in formula (2):

arg_j = 1, if w_i carries the j-th argument label; arg_j = 0, otherwise (2)

in the event argument extraction, the element distribution difference of an event subject, an event object, event time and event location is large, and one model can cause the extraction effect of the two elements of the event time and the event location to be poor. In order to improve the extraction effect of each event element in the event argument extraction, table 3 shows the comparison experiment result of whether the event argument extraction is performed by using the submodules.

TABLE 3 Comparative experiment results of sub-module event argument extraction

Sub & Obj and Tim & Loc indicate that the four event elements are split into two event argument pairs, with two models trained independently for event argument extraction. As can be seen from FIG. 3, extracting event arguments by sub-modules alleviates the uneven distribution of arguments in the data and effectively improves the argument extraction effect.

Step 4: extracting and optimizing the event attribute extraction model;

the event attributes are defined as event tenses and event polarities, the event tenses are divided into "past", "present", "future" and "others", and the event polarities are divided into "positive", "negative" and "possible". The output of the model is converted into a multi-classification problem, and two classifiers are constructed by replacing the CRF output layer of the basic model. Classifier activation used a softmax multi-classification function, with a loss function of crossEntropyLoss.

Most words expressing the event tense and event polarity occur near the event trigger word. Compared with using only the global features of the text, setting pooling windows near the event trigger word and extracting the closely related local features benefits the extraction of event attributes. The method takes the features obtained by dynamically pooling the event trigger word and the segments at its left and right ends as global features and splices them with the character embedding vectors output by the sentence BERT semantic coding. In addition, to improve the generalization performance of the model, ten-fold cross validation is adopted for optimization.
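The left-right dynamic pooling around the trigger word can be sketched as follows. This is an illustrative sketch; max pooling over the three segments is an assumption about the pooling operator.

```python
import numpy as np

def trigger_dynamic_pooling(H, trig_start, trig_end):
    """Split hidden states H (n x d) into the segment left of the trigger,
    the trigger span itself, and the segment to its right; max-pool each
    and concatenate into a single global feature vector."""
    segments = [H[:trig_start], H[trig_start:trig_end + 1], H[trig_end + 1:]]
    pooled = [s.max(axis=0) if len(s) else np.zeros(H.shape[1]) for s in segments]
    return np.concatenate(pooled)

H = np.arange(12.0).reshape(4, 3)      # 4 positions, 3-dim hidden states
print(trigger_dynamic_pooling(H, 1, 2))
# → [ 0.  1.  2.  6.  7.  8.  9. 10. 11.]
```

Pooling the trigger span separately from its left and right context preserves where attribute-bearing words sit relative to the trigger.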

In the event attribute optimization extraction, two optimization methods of event trigger word left-right dynamic pooling and ten-fold cross validation are added. In order to verify the effectiveness of the optimization method adopted in the event attribute element extraction model, the results of the comparative analysis experiment are shown in table 4.

TABLE 4 Comparative analysis results of event attribute extraction optimization methods

As can be seen from FIG. 4, compared with the unoptimized basic model, adding the trigger word pooling feature or applying ten-fold cross validation each improves the extraction of event attributes, and applying both together greatly improves the extraction performance. Analysis shows that the left-right dynamic pooling features of the event trigger word exploit the potential relation between the trigger word and the event attributes, improving the extraction of event attribute elements, while ten-fold cross validation reduces overfitting to a certain extent, obtains as much effective information as possible from limited data, alleviates the uneven distribution of elements in the data, and improves the generalization ability of the model.
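Ten-fold cross validation partitions the training data as sketched below. This is an illustrative sketch; the strided fold assignment is an assumption, and in practice the samples would typically be shuffled first.

```python
def k_fold_splits(n_samples, k=10):
    """Yield (train, val) index lists: each fold serves once as the
    validation set while the remaining k-1 folds form the training set."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, val

splits = list(k_fold_splits(25, k=10))
print(len(splits))                                              # → 10
print(sorted(splits[0][0] + splits[0][1]) == list(range(25)))   # → True
```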

Step 5: predicting the event element extraction result of the test data by adopting the event element extraction model obtained by training in steps 1 to 4. BiGRU-SATT-CRF is the character-embedding-based event element extraction method provided by the invention; the experimental results are shown in Table 5.

TABLE 5 Comparative experimental results of different event element extraction methods

From the experimental results in FIG. 5, the character-embedding-based neural network event element extraction method outperforms the other extraction methods in every case, which illustrates that the character-embedding-based, module-optimized neural network method has certain advantages in the event element extraction task.

The methods of the invention may be programmed as program code stored on a computer-readable storage medium; the program code is transmitted to a processor, which is caused to perform the methods of the invention.

According to the invention, the event element extraction basic model is utilized to respectively extract and optimize the event trigger word extraction, the event argument extraction and the event attribute extraction 3 grid modules of the model refinement, different characteristic vectors and target vectors are constructed, and the result shows that the character-embedded-based event element gridding extraction model performs well in the event element extraction task. In addition, the model can be used for carrying out more researches subsequently.
