Event information extraction method and device, storage medium and electronic equipment

Document No.: 830107 | Published: 2021-03-30

Note: This technology, "Event information extraction method and device, storage medium and electronic equipment" (事件信息的抽取方法及装置、存储介质、电子设备), was created by Wang Shuo, Yang Kang, Xu Chengguo and Zhou Xingjie on 2020-12-11.

Abstract: The invention discloses an event information extraction method and device, a storage medium, and electronic equipment, belonging to the field of artificial intelligence. The method comprises: acquiring text features of event elements in an unstructured text, and acquiring syntactic dependencies among a plurality of event elements, wherein the event elements comprise event trigger words and event arguments; taking the text features and the syntactic dependencies as input information, encoding graph structure features of the unstructured text with a graph attention network (GAT); and inputting the graph structure features into a fully connected layer to map them to a sample space, then mapping them to category labels of the event elements using Softmax. The invention solves the technical problem that event extraction methods in the related art cannot encode the dependency features among event elements, and improves the model's ability to represent text, thereby improving event extraction performance.

1. An event information extraction method, comprising:

acquiring text features of event elements in an unstructured text, and acquiring syntactic dependencies among a plurality of event elements, wherein the event elements comprise event trigger words and event arguments;

taking the text features and the syntactic dependencies as input information, and encoding graph structure features of the unstructured text using a graph attention network GAT;

inputting the graph structure features into a fully connected layer to map them to a sample space, and mapping the graph structure features to category labels of the event elements using Softmax.

2. The method of claim 1, wherein obtaining text features of event elements in unstructured text comprises:

extracting feature vectors of the event elements using M consecutive intermediate hidden layers of a BERT model, wherein M is an integer greater than 1;

and extracting local features of the feature vectors using a multi-scale convolutional neural network (CNN).

3. The method of claim 2, wherein extracting local features of the feature vector using multi-scale CNN comprises:

extracting n-gram features of the text at different scales from the feature vectors using CNN convolution kernels with scales of 1×1, 3×3 and 5×5, wherein the activation function of the multi-scale CNN is the rectified linear unit (ReLU).

4. The method of claim 1, wherein obtaining syntactic dependencies between a plurality of event elements comprises:

performing dependency parsing using StanfordNLP to analyze the syntactic dependencies between words in the unstructured text, wherein a syntactic dependency represents a directed dependency between two event elements;

and storing the syntactic dependencies between event elements in an adjacency matrix of a directed graph.

5. The method of claim 1, wherein encoding graph structure features of the unstructured text using a graph attention network (GAT) comprises:

for each vertex word in a syntactic dependency tree, calculating attention factors for the vertices adjacent to it in the tree, wherein the syntactic dependencies in the unstructured text are represented as the syntactic dependency tree;

and normalizing the attention factors to obtain attention coefficients, and computing a weighted sum of the text features using the attention coefficients to obtain an attention vector for each vertex word.

6. The method of claim 1, wherein encoding graph structure features of the unstructured text using a graph attention network (GAT) comprises:

stacking the GAT into n layers according to the complexity of the unstructured text, wherein each layer corresponds to one sub-GAT network, and the number of layers is positively correlated with the complexity;

and calculating intermediate attention features of the unstructured text using the first n-1 GAT layers, inputting the average of the intermediate attention features of the n-1 sub-GAT networks into the last GAT layer, and outputting the graph structure features of the unstructured text.

7. The method of claim 1, wherein mapping the graph structure feature to a category label for the event element comprises:

inputting the graph structure features into a fully connected layer to map them to a sample space, and mapping them to category labels of the corresponding event elements using softmax, wherein the network model containing the fully connected layer uses a cross entropy loss function for loss calculation and L2 regularization to prevent overfitting.

8. An event information extraction device, comprising:

an obtaining module, configured to obtain text features of event elements in an unstructured text and syntactic dependencies among a plurality of event elements, wherein the event elements comprise event trigger words and event arguments;

a processing module, configured to take the text features and the syntactic dependencies as input information and encode graph structure features of the unstructured text using a graph attention network GAT;

a mapping module, configured to input the graph structure features into a fully connected layer to map them to a sample space, and to map the graph structure features to category labels of the event elements using Softmax.

9. A storage medium, characterized in that the storage medium comprises a stored program, wherein the program, when run, performs the method steps of any one of claims 1 to 7.

10. An electronic device, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus; wherein:

a memory for storing a computer program;

a processor for performing the method steps of any one of claims 1 to 7 by executing the program stored in the memory.

Technical Field

The invention relates to the field of artificial intelligence, and in particular to an event information extraction method and device, a storage medium, and electronic equipment.

Background

In the related art, event extraction is a very challenging task in the information extraction area of natural language processing. It aims to extract structured event information, such as time, place, people, and events, from unstructured text, and has a wide range of applications.

In the related art, there are two approaches to event extraction. The first is the pipeline approach, which first identifies and classifies trigger words and then identifies and classifies argument roles. Pipeline-based models are prone to error propagation: once a trigger word is identified incorrectly, argument role identification is also incorrect. Moreover, the argument identification task cannot assist the trigger word identification task, so the dependency between trigger words and arguments cannot be exploited for event extraction. The second is the joint approach, which builds a joint model using the relationship between trigger words and arguments and predicts the roles of both together, thereby solving the cascading-error problem of pipeline models. With the rapid development of deep learning, more and more researchers adopt joint event extraction to jointly model event trigger words and event arguments, but most of this work targets English events, and methods for Chinese event extraction are fewer. As a result, first, the variable-length features of the words and phrases that express event trigger words and event arguments in Chinese cannot be extracted; second, the syntactic dependency that exists between an event trigger word and an event argument cannot be extracted, so event features are lost and the complete semantics cannot be recognized, or the semantic recognition accuracy is low.

In view of the above problems in the related art, no effective solution has been found at present.

Disclosure of Invention

The embodiment of the invention provides an event information extraction method and device, a storage medium and electronic equipment.

According to one aspect of the embodiments of the present application, an event information extraction method is provided, including: acquiring text features of event elements in an unstructured text, and acquiring syntactic dependencies among a plurality of event elements, wherein the event elements comprise event trigger words and event arguments; taking the text features and the syntactic dependencies as input information, encoding graph structure features of the unstructured text with a graph attention network GAT; and inputting the graph structure features into a fully connected layer to map them to a sample space, then mapping them to category labels of the event elements using Softmax.

Further, acquiring the text features of the event elements in the unstructured text includes: extracting feature vectors of the event elements using M consecutive intermediate hidden layers of a BERT model, where M is an integer greater than 1; and extracting local features of the feature vectors using a multi-scale convolutional neural network (CNN).

Further, extracting local features of the feature vectors using the multi-scale CNN includes: extracting n-gram features of the text at different scales from the feature vectors using CNN convolution kernels with scales of 1×1, 3×3 and 5×5, where the activation function of the multi-scale CNN is the rectified linear unit (ReLU).

Further, obtaining the syntactic dependencies between the plurality of event elements includes: performing dependency parsing with StanfordNLP to analyze the syntactic dependencies between words in the unstructured text, where a syntactic dependency represents a directed dependency between two event elements; and storing the syntactic dependencies between event elements in an adjacency matrix of a directed graph.

Further, encoding the graph structure features of the unstructured text with the graph attention network GAT includes: for each vertex word in a syntactic dependency tree, calculating attention factors for the vertices adjacent to it in the tree, where the syntactic dependencies in the unstructured text are represented as the syntactic dependency tree; and normalizing the attention factors to obtain attention coefficients, then computing a weighted sum of the text features with the attention coefficients to obtain an attention vector for each vertex word.

Further, encoding the graph structure features of the unstructured text with the graph attention network GAT includes: stacking the GAT into n layers according to the complexity of the unstructured text, where each layer corresponds to one sub-GAT network and the number of layers is positively correlated with the complexity; and calculating intermediate attention features of the unstructured text with the first n-1 GAT layers, inputting the average of the intermediate attention features of the n-1 sub-GAT networks into the last GAT layer, and outputting the graph structure features of the unstructured text.

Further, mapping the graph structure features to the category labels of the event elements includes: inputting the graph structure features into a fully connected layer to map them to a sample space, and mapping them to category labels of the corresponding event elements using softmax, where the network model containing the fully connected layer uses a cross entropy loss function for loss calculation and L2 regularization to prevent overfitting.

According to another aspect of the embodiments of the present application, an event information extraction apparatus is also provided, including: an obtaining module, configured to obtain text features of event elements in an unstructured text and syntactic dependencies among a plurality of event elements, where the event elements comprise event trigger words and event arguments; a processing module, configured to take the text features and the syntactic dependencies as input information and encode graph structure features of the unstructured text using a graph attention network GAT; and a mapping module, configured to input the graph structure features into a fully connected layer to map them to a sample space, and to map the graph structure features to category labels of the event elements using Softmax.

Further, the obtaining module includes: a first extraction unit, configured to extract feature vectors of the event elements using M consecutive intermediate hidden layers of a BERT model, where M is an integer greater than 1; and a second extraction unit, configured to extract local features of the feature vectors using a multi-scale convolutional neural network (CNN).

Further, the second extraction unit includes: an extraction subunit, configured to extract n-gram features of the text at different scales from the feature vectors using CNN convolution kernels with scales of 1×1, 3×3 and 5×5, where the activation function of the multi-scale CNN is the rectified linear unit (ReLU).

Further, the obtaining module includes: an analysis unit, configured to perform dependency parsing with StanfordNLP and analyze the syntactic dependencies between words in the unstructured text, where a syntactic dependency represents a directed dependency between two event elements; and a storage unit, configured to store the syntactic dependencies between event elements in an adjacency matrix of a directed graph.

Further, the processing module includes: a first calculation unit, configured to calculate, for each vertex word in a syntactic dependency tree, attention factors for the vertices adjacent to it in the tree, where the syntactic dependencies in the unstructured text are represented as the syntactic dependency tree; and a second calculation unit, configured to normalize the attention factors to obtain attention coefficients, and to compute a weighted sum of the text features using the attention coefficients to obtain an attention vector for each vertex word.

Further, the processing module includes: a stacking unit, configured to stack the GAT into n layers according to the complexity of the unstructured text, where each layer corresponds to one sub-GAT network and the number of layers is positively correlated with the complexity; and a third calculation unit, configured to calculate intermediate attention features of the unstructured text using the first n-1 GAT layers, input the average of the intermediate attention features of the n-1 sub-GAT networks into the last GAT layer, and output the graph structure features of the unstructured text.

Further, the mapping module includes: a mapping unit, configured to input the graph structure features into a fully connected layer to map them to a sample space, and to map the graph structure features to category labels of the corresponding event elements using softmax, where the network model containing the fully connected layer uses a cross entropy loss function for loss calculation and L2 regularization to prevent overfitting.

According to another aspect of the embodiments of the present application, a storage medium is also provided, including a stored program which, when run, performs the above steps.

According to another aspect of the embodiments of the present application, an electronic device is also provided, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with each other through the communication bus; wherein: the memory is for storing a computer program; and the processor is for executing the steps of the method by running the program stored in the memory.

Embodiments of the present application also provide a computer program product containing instructions, which when run on a computer, cause the computer to perform the steps of the above method.

With the invention, text features of event elements in an unstructured text are obtained, syntactic dependencies among a plurality of event elements are obtained, the text features and syntactic dependencies are taken as input information, the attention features of the unstructured text are output by the GAT, and those attention features are mapped to category labels of the event elements. By obtaining the text features and syntactic dependencies of the event elements and encoding the graph structure features of the unstructured text with the GAT, the model's attention can be focused on words related to event elements that have syntactic dependencies, and the features of dependent words in the syntactic dependency structure are aggregated into the text features of the current word. This solves the technical problem that event extraction methods in the related art cannot encode dependency features among event elements, and improves the model's ability to represent text, thereby improving event extraction performance.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:

FIG. 1 is a block diagram of a hardware configuration of a computer according to an embodiment of the present invention;

FIG. 2 is a flow chart of a method for extracting event information according to an embodiment of the present invention;

FIG. 3 is a schematic flow chart of a scheme according to an embodiment of the present invention;

fig. 4 is a block diagram of an event information extraction apparatus according to an embodiment of the present invention;

fig. 5 is a block diagram of an electronic device implementing an embodiment of the invention.

Detailed Description

In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.

It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

Example 1

The method provided in Embodiment 1 of the present application may be executed on a server, a computer, or a similar computing device. Taking the invention running on a computer as an example, fig. 1 is a block diagram of a hardware structure of a computer according to an embodiment of the present invention. As shown in fig. 1, computer 10 may include one or more (only one shown in fig. 1) processors 102 (processor 102 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA) and a memory 104 for storing data, and optionally may also include a transmission device 106 for communication functions and an input-output device 108. It will be appreciated by those of ordinary skill in the art that the configuration shown in FIG. 1 is illustrative only and is not intended to limit the configuration of the computer described above. For example, computer 10 may also include more or fewer components than shown in FIG. 1, or have a different configuration than shown in FIG. 1.

The memory 104 may be used to store a computer program, for example, a software program and a module of application software, such as a computer program corresponding to an event information extraction method in the embodiment of the present invention, and the processor 102 executes various functional applications and data processing by running the computer program stored in the memory 104, so as to implement the above-mentioned method. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, memory 104 may further include memory located remotely from processor 102, which may be connected to computer 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The transmission device 106 is used for receiving or transmitting data via a network. Specific examples of such networks may include wireless networks provided by the communications provider of computer 10. In one example, the transmission device 106 includes a Network adapter (NIC), which can be connected to other Network devices through a base station so as to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is used for communicating with the internet in a wireless manner.

In the present embodiment, an event information extraction method is provided, and fig. 2 is a flowchart of an event information extraction method according to an embodiment of the present invention, as shown in fig. 2, the flowchart includes the following steps:

step S202, acquiring text characteristics of event elements in an unstructured text, and acquiring syntactic dependency relations among a plurality of event elements, wherein the event elements comprise event trigger words and event arguments;

the embodiment can be applied to texts of various languages, and is particularly suitable for texts with syntactic dependency, such as Chinese. In this embodiment, taking an unstructured input text (unstructured text) as an example of a chinese text, an event trigger is: the core word representing the event occurrence can be a verb or a noun, the event argument is a participant of the event and consists of an entity, a value and time, and the event trigger word is the projection of the event concept at the word and phrase level and is the basis and the recourse of the event recognition. The syntactic dependency of the embodiment includes the syntactic dependency between the event trigger and the event argument, and the syntactic dependency between the event argument and the event argument.

Step S204, taking the text features and the syntactic dependencies as input information, and encoding graph structure features of the unstructured text using a graph attention network GAT;

Because the syntactic dependencies that exist between event trigger words and event arguments play an important guiding role in event extraction, this embodiment uses a Graph Attention Network (GAT) to encode the graph-structured data. An attention mechanism focuses the model's attention on words related to event elements and assigns them larger feature weights, thereby weakening the interference of irrelevant words and improving the richness and accuracy of the text feature representation.

Step S206, inputting the graph structure features into a fully connected layer to map them to a sample space, and mapping the graph structure features to category labels of the event elements using Softmax;

the category labels of the present embodiment are structured event information, and may be divided according to a context or an application scenario, for example, into time, place, person, and event, or into time, place, person, subject, and object, where each category includes a plurality of preset labels. The scheme of the embodiment can be applied to application scenes such as automatic summarization, public opinion analysis, intelligent retrieval, data mining and the like.

Through the above steps, text features of event elements in the unstructured text and syntactic dependencies among a plurality of event elements are obtained, where the event elements include event trigger words and event arguments; with the text features and syntactic dependencies as input information, graph structure features of the unstructured text are encoded by a graph attention network GAT; the graph structure features are mapped to a sample space through a fully connected layer, and then mapped to category labels of the event elements using Softmax. This solves the technical problem that event extraction methods in the related art cannot encode dependency features among event elements, and improves the model's ability to represent text, thereby improving event extraction performance.

In an implementation manner of this embodiment, acquiring text features of event elements in the unstructured text includes:

s11, extracting feature vectors of event elements by adopting M continuous hidden layers in the middle of the BERT model, wherein M is an integer larger than 1;

in one example, the middle M consecutive hidden layers are the 9 th-12 th hidden layers of the BERT (Bidirectional Encoder reproduction from Transformers) model.

In this embodiment, a BERT pre-trained language model is used for text embedding. The semantic information contained in each encoder layer of BERT differs: lower-layer encoders tend to contain more syntactic structure information, while higher-layer encoders tend to contain deeper semantic information. Different tasks require different feature information, and whether features are concatenated or added also affects event extraction performance. In the event extraction task, the concatenation of the output vectors of the 9th-12th hidden layers of the BERT model is used as the extracted feature vector, which more effectively improves the performance of subsequent tasks. The text vector $x_i$ output by the pre-trained language model BERT is given by formula (1):

$$x_i = \mathrm{concat}(X_i, \ldots, X_L) \tag{1}$$

where $X_i$ is the output of the $i$-th hidden layer of BERT and $L$ is the number of BERT hidden layers (here $i = 9$ and $L = 12$).
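For illustration, a minimal sketch of this feature-extraction step is shown below, assuming the HuggingFace transformers API; the bert-base-chinese checkpoint and the example sentence are illustrative assumptions, not taken from the patent.

```python
import torch
from transformers import BertModel, BertTokenizer

# Illustrative checkpoint; any 12-layer Chinese BERT would fit the description.
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertModel.from_pretrained("bert-base-chinese", output_hidden_states=True)
model.eval()

inputs = tokenizer("公司于12月11日在北京发布了新产品", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.hidden_states is a tuple of 13 tensors for 12-layer BERT:
# index 0 is the embedding output, index i the output of encoder layer i.
hidden_states = outputs.hidden_states

# Concatenate the outputs of hidden layers 9-12 along the feature axis,
# giving one (batch, seq_len, 4 * 768) vector per token, as in formula (1).
token_features = torch.cat(hidden_states[9:13], dim=-1)
```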

S12, extracting local features of the feature vectors using a multi-scale convolutional neural network (CNN).

In one example, extracting local features of the feature vectors using the multi-scale CNN includes: extracting n-gram features at different scales from the feature vectors using CNN convolution kernels with scales of 1×1, 3×3 and 5×5, and selecting the rectified linear unit (ReLU) as the activation function of the multi-scale CNN to capture local features of different scales from the feature vectors.

In the Chinese event extraction task, event elements appear in the form of words or phrases, and a single-scale CNN cannot capture the local features of event elements of different lengths well. Therefore, drawing on the Inception network model and considering the particularity of Chinese events, this embodiment uses convolution kernels with scales of 1×1, 3×3 and 5×5 to extract n-gram features of the text at different scales, selects ReLU as the activation function of the multi-scale CNN, and captures local features of different scales through the multi-scale CNN to enrich the text embedding. The convolution operation is given by formula (2):

$$h_{ij} = f(w_j \cdot x_{i:i+h-1} + b_j) \tag{2}$$

where $b_j \in \mathbb{R}$ is the bias term, $w_j$ denotes the $j$-th filter, $f$ denotes a non-linear function, and $h$ denotes the convolution kernel size.
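A minimal PyTorch sketch of such a multi-scale CNN follows; it interprets the 1×1, 3×3 and 5×5 kernels as 1D convolutions of width 1, 3 and 5 over the token sequence, and the channel counts and 'same' padding are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultiScaleCNN(nn.Module):
    """Parallel 1D convolutions over the token sequence, one branch per
    n-gram window size, with ReLU activation as in formula (2)."""
    def __init__(self, in_dim, out_channels, kernel_sizes=(1, 3, 5)):
        super().__init__()
        # padding=k//2 keeps the sequence length so the branches align
        self.convs = nn.ModuleList(
            nn.Conv1d(in_dim, out_channels, k, padding=k // 2)
            for k in kernel_sizes
        )
        self.relu = nn.ReLU()

    def forward(self, x):              # x: (batch, seq_len, in_dim)
        x = x.transpose(1, 2)          # Conv1d expects (batch, channels, seq_len)
        feats = [self.relu(conv(x)) for conv in self.convs]
        out = torch.cat(feats, dim=1)  # concatenate the three scales channel-wise
        return out.transpose(1, 2)     # (batch, seq_len, 3 * out_channels)

# e.g. applied to the concatenated BERT features (4 * 768 = 3072 dims)
cnn = MultiScaleCNN(in_dim=3072, out_channels=256)
local_feats = cnn(torch.randn(1, 20, 3072))   # -> (1, 20, 768)
```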

In one implementation of this embodiment, obtaining the syntactic dependencies between the plurality of event elements includes: performing dependency parsing with the StanfordNLP toolkit to analyze the syntactic dependencies between words in the unstructured text, where a syntactic dependency represents a directed dependency between two event elements; and storing the syntactic dependencies between event elements in an adjacency matrix of a directed graph.

The GAT network module is based on dependency parsing. The module first uses StanfordNLP to perform dependency parsing. Because event extraction focuses on the dependencies among event elements, the parsing ignores the directivity of the root node and the types of dependencies, attending only to the dependencies between words. In addition, because the dependencies among event elements are directional, the invention stores the syntactic dependencies of the text in the adjacency matrix of a directed graph: if a dependency exists between two words, the corresponding adjacency matrix element is 1; otherwise it is 0. The text features output by the multi-scale CNN and the adjacency matrix obtained from dependency parsing are input into the GAT network to model the structural features of the syntactic dependencies present in the text.
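A sketch of this parsing step is given below. It assumes the stanza package (the current release of the StanfordNLP toolkit) with its Chinese models already downloaded; the pipeline configuration is illustrative.

```python
import numpy as np
import stanza

# Assumes stanza.download("zh") has been run once beforehand.
nlp = stanza.Pipeline("zh", processors="tokenize,pos,lemma,depparse")

def dependency_adjacency(text):
    """Directed adjacency matrix of the dependency arcs in the first sentence:
    1 where an arc exists, 0 elsewhere; root arcs and relation types ignored."""
    sent = nlp(text).sentences[0]
    n = len(sent.words)
    adj = np.zeros((n, n), dtype=np.float32)
    for word in sent.words:
        if word.head > 0:                          # head == 0 marks the root
            adj[word.head - 1, word.id - 1] = 1.0  # directed arc: head -> dependent
    return adj
```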

In one implementation of this embodiment, encoding the graph structure features of the unstructured text with the graph attention network GAT includes: for each vertex word in a syntactic dependency tree, calculating attention factors for the vertices adjacent to it in the tree, where the syntactic dependencies in the unstructured text are represented as the syntactic dependency tree; and normalizing the attention factors to obtain attention coefficients, then computing a weighted sum of the text features with the attention coefficients to obtain an attention vector for each vertex word.

Dependency parsing of the text yields a syntactic dependency tree. For each vertex word in the tree, the GAT network calculates an attention factor $e_{ij}$ with respect to each vertex adjacent to it in the syntactic dependency structure, normalizes the attention factor to obtain the attention coefficient $\alpha_{ij}$, and uses $\alpha_{ij}$ to compute a weighted sum of the features, yielding the output feature $h_i$ of each vertex word at the GAT layer, i.e., the attention vector. The calculation formulas are as follows:

$$e_{ij} = \mathrm{LeakyReLU}\big(a^{T}[W x_i \,\|\, W x_j]\big)$$

$$\alpha_{ij} = \mathrm{softmax}_j(e_{ij}) = \frac{\exp(e_{ij})}{\sum_{k \in N_i}\exp(e_{ik})}$$

$$h_i = \sigma\Big(\sum_{j \in N_i} \alpha_{ij} W x_j\Big)$$

where $N_i$ is the set of nodes adjacent to node $i$, $e_{ij}$ is the attention factor, $\alpha_{ij}$ is $e_{ij}$ normalized by softmax, $x_j$ is the input text feature of vertex $j$, $h_i$ is the output feature of each vertex in the GAT network, and $a^{T}$ is the transpose of the attention weight vector.

In another implementation of this embodiment, encoding the graph structure features of the unstructured text with the graph attention network GAT includes: stacking the GAT into n layers according to the complexity of the unstructured text, where each layer corresponds to one sub-GAT network; and calculating the hidden features of the unstructured text with the first n-1 GAT layers, inputting the average of the hidden features of the n-1 sub-GAT networks into the last GAT layer, and outputting the graph structure features of the unstructured text.

The learning process of a single-layer GAT network has a certain instability. In one example, to make the learning and training process of the GAT more stable, a multi-head attention scheme can be adopted, in which mutually independent attention vectors are concatenated as the output feature vector $h_i'$ of the single-layer GAT network:

$$h_i' = \Big\Vert_{k=1}^{K} \sigma\Big(\sum_{j \in N_i} \alpha_{ij}^{k} W^{k} x_j\Big)$$

where $K$ is the number of heads in the multi-head attention.

In another example, during event extraction the GAT network may be stacked into n layers according to the complexity of events in the event extraction domain. The average of the outputs of the first n-1 GAT layers is input into the last GAT layer, which outputs the feature vector $h_i'$. By encoding the graph-structured data through the GAT network and using the attention mechanism to place the model's attention on words related to event elements, assigning them larger weights, the interference of irrelevant words is weakened and the richness and accuracy of the text feature representation are improved.
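The following sketch, building on the GATLayer above, shows one way to realize this stacking: the first n-1 layers are applied progressively, their outputs are averaged, and the average is fed to the last layer. Keeping a single hidden dimension across all layers is an assumption made for simplicity.

```python
import torch
import torch.nn as nn

class StackedGAT(nn.Module):
    """n stacked GAT layers: the average of the first n-1 layer outputs
    is input to the last layer, which emits the graph structure features."""
    def __init__(self, dim, num_layers):
        super().__init__()
        self.front = nn.ModuleList(GATLayer(dim, dim) for _ in range(num_layers - 1))
        self.last = GATLayer(dim, dim)

    def forward(self, h, adj):
        intermediate = []
        for layer in self.front:
            h = layer(h, adj)          # progressive encoding
            intermediate.append(h)
        avg = torch.stack(intermediate).mean(dim=0)  # average of the n-1 outputs
        return self.last(avg, adj)     # graph structure features
```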

In one implementation of this embodiment, mapping the graph structure features to the category labels of the event elements includes: inputting the graph structure features into a fully connected layer to map them to a sample space, inputting the output of the fully connected layer into softmax, and mapping the graph structure features to the category labels of the corresponding event elements using softmax, where the network model containing the fully connected layer (from the input layer for the unstructured text to the softmax output layer) uses a cross entropy loss function for loss calculation and L2 regularization to prevent overfitting.

At the output layer, the output features of the GAT network are input into the fully connected layer, mapped to the sample space, and mapped to the category labels of the corresponding event elements using softmax. This embodiment uses a cross entropy loss function for loss calculation and L2 regularization to prevent overfitting:

$$\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{M} y_{ij}\log \hat{y}_{ij} + \lambda \Vert\theta\Vert_{2}^{2}$$

where $N$ is the text length, $M$ is the number of event element label categories, $\hat{y}_{ij}$ denotes the finally predicted event element label, $\theta$ denotes the model parameters, and $\lambda$ is the L2 regularization parameter. During training of the event extraction model, Adam is used as the optimizer to progressively minimize the loss.
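A sketch of this output layer and training objective is given below; the feature dimension, label count, learning rate, and regularization strength are illustrative, and using Adam's weight_decay to realize the L2 penalty term is an implementation assumption.

```python
import torch
import torch.nn as nn

class EventClassifier(nn.Module):
    """Fully connected output layer mapping GAT features to event element
    label logits; the softmax is folded into the cross entropy loss."""
    def __init__(self, feat_dim, num_labels):
        super().__init__()
        self.fc = nn.Linear(feat_dim, num_labels)

    def forward(self, h):               # h: (num_tokens, feat_dim)
        return self.fc(h)               # logits over the label set

model = EventClassifier(feat_dim=256, num_labels=34)
criterion = nn.CrossEntropyLoss()       # the cross entropy loss above
# weight_decay adds the lambda * ||theta||^2 L2 penalty during optimization
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-5)

logits = model(torch.randn(10, 256))    # 10 tokens of 256-dim GAT features
labels = torch.randint(0, 34, (10,))    # dummy gold labels, one per token
loss = criterion(logits, labels)
loss.backward()
optimizer.step()
```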

Fig. 3 is a schematic flow chart of the scheme of an embodiment of the present invention. The scheme avoids propagated errors between trigger word recognition and argument recognition and classification by using a joint event extraction method. The context vector representation of the text is modeled with a pre-trained BERT model, and the local features of event trigger words and arguments of different lengths are then captured by a multi-scale CNN. In addition, the syntactic dependencies of the text are modeled through a GAT network: different attention weights are assigned to the neighbors of the current candidate event element in the syntactic dependency structure, modeling both the syntactic dependencies among event elements and their strength, which improves the model's text feature modeling capability. Finally, the output features are mapped to the sample space through a fully connected network to recognize and classify event trigger words and event arguments, thereby improving the extraction performance for Chinese events.

This embodiment provides a Chinese event joint extraction scheme based on multi-feature fusion. To address the cascading errors of pipeline-based event extraction, event trigger words and event arguments are identified simultaneously in a joint manner: the event extraction task is formalized as a sequence labeling problem, and trigger words and arguments are extracted jointly. According to the scale characteristics of event elements in Chinese events, the n-gram feature information of the text, i.e., the local features of event elements at different scales, is modeled with a multi-scale CNN. To remedy the shortcomings of existing methods in modeling the syntactic dependency structure between event elements, the syntactic dependencies of the text are obtained through dependency parsing, and the GAT network assigns different attention weights to the neighbors of candidate event elements in the syntactic dependency structure, so that their features are encoded into the current candidate event element. This improves the model's ability to represent the text, and thus the extraction performance for Chinese events.

Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.

Example 2

In this embodiment, an event information extraction device is further provided, which is used to implement the foregoing embodiments and preferred embodiments; what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware, is also possible and contemplated.

Fig. 4 is a block diagram of an event information extraction apparatus according to an embodiment of the present invention, as shown in fig. 4, the apparatus includes: an acquisition module 40, a processing module 42, a mapping module 44, wherein,

the obtaining module 40 is configured to obtain text features of event elements in the unstructured text, and obtain a syntactic dependency relationship between a plurality of event elements, where the event elements include event trigger words and event arguments;

the processing module 42 is configured to take the text features and the syntactic dependencies as input information and encode the graph structure features of the unstructured text using a graph attention network GAT;

a mapping module 44, configured to input the graph structure features into a fully connected layer to map them to a sample space, and to map the graph structure features to category labels of the event elements using Softmax.

Optionally, the obtaining module includes: a first extraction unit, configured to extract feature vectors of the event elements using M consecutive intermediate hidden layers of a BERT model, where M is an integer greater than 1; and a second extraction unit, configured to extract local features of the feature vectors using a multi-scale convolutional neural network (CNN).

Optionally, the second extraction unit includes: an extraction subunit, configured to extract n-gram features of the text at different scales from the feature vectors using CNN convolution kernels with scales of 1×1, 3×3 and 5×5, where the activation function of the multi-scale CNN is the rectified linear unit (ReLU).

Optionally, the obtaining module includes: an analysis unit, configured to perform dependency parsing with StanfordNLP and analyze the syntactic dependencies between words in the unstructured text, where a syntactic dependency represents a directed dependency between two event elements; and a storage unit, configured to store the syntactic dependencies between event elements in an adjacency matrix of a directed graph.

Optionally, the processing module includes: a first calculation unit, configured to calculate, for each vertex word in a syntactic dependency tree, attention factors for the vertices adjacent to it in the tree, where the syntactic dependencies in the unstructured text are represented as the syntactic dependency tree; and a second calculation unit, configured to normalize the attention factors to obtain attention coefficients, and to compute a weighted sum of the text features using the attention coefficients to obtain an attention vector for each vertex word.

Optionally, the processing module includes: a stacking unit, configured to stack the GAT into n layers according to the complexity of the unstructured text, where each layer corresponds to one sub-GAT network and the number of layers is positively correlated with the complexity; and a third calculation unit, configured to calculate intermediate attention features of the unstructured text using the first n-1 GAT layers, input the average of the intermediate attention features of the n-1 sub-GAT networks into the last GAT layer, and output the graph structure features of the unstructured text.

Optionally, the mapping module includes: a mapping unit, configured to input the graph structure features into a fully connected layer to map them to a sample space, and to map the graph structure features to category labels of the corresponding event elements using softmax, where the network model containing the fully connected layer uses a cross entropy loss function for loss calculation and L2 regularization to prevent overfitting.

It should be noted that the above modules may be implemented by software or hardware; for the latter, implementations include, but are not limited to, the following: the modules are all located in the same processor, or the modules are located in different processors in any combination.

Example 3

Embodiments of the present invention also provide a storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the above method embodiments when executed.

Alternatively, in the present embodiment, the storage medium may be configured to store a computer program for executing the steps of:

s1, acquiring text characteristics of event elements in the unstructured text, and acquiring syntactic dependencies among a plurality of event elements, wherein the event elements comprise event trigger words and event arguments;

s2, using the text features and the syntactic dependency relationship as input information, and adopting a graph attention network GAT to encode graph structure features of the unstructured text;

s3, mapping the graph structure feature input full-link layer to a sample space, and mapping the graph structure feature to the class label of the event element by adopting Softmax.

Optionally, in this embodiment, the storage medium may include, but is not limited to: various media capable of storing computer programs, such as a usb disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.

Embodiments of the present invention also provide an electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the steps of any of the above method embodiments.

Optionally, the electronic device may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.

Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:

s1, acquiring text characteristics of event elements in the unstructured text, and acquiring syntactic dependencies among a plurality of event elements, wherein the event elements comprise event trigger words and event arguments;

s2, using the text features and the syntactic dependency relationship as input information, and adopting a graph attention network GAT to encode graph structure features of the unstructured text;

s3, mapping the graph structure feature input full-link layer to a sample space, and mapping the graph structure feature to the class label of the event element by adopting Softmax.

Optionally, the specific examples in this embodiment may refer to the examples described in the above embodiments and optional implementation manners, and this embodiment is not described herein again.

Fig. 5 is a block diagram of an electronic device according to an embodiment of the present invention, as shown in fig. 5, including a processor 51, a communication interface 52, a memory 53 and a communication bus 54, where the processor 51, the communication interface 52, and the memory 53 complete communication with each other through the communication bus 54, and the memory 53 is used for storing computer programs; and a processor 51 for executing the program stored in the memory 53.

The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.

In the above embodiments of the present application, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be substantially implemented or contributed to by the prior art, or all or part of the technical solution may be embodied in a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.

The foregoing is only a preferred embodiment of the present application and it should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications should also be considered as the protection scope of the present application.
