Method for extracting process knowledge events in communication field

Document No.: 1953451    Publication date: 2021-12-10

Reading note: this technique, "Method for extracting process knowledge events in the communication field", was designed and created by Li Fei, Zhou Yuan, Wan Fei, Wang Dexuan, and Xia Xianjun on 2021-09-07. Its main content is as follows: the invention discloses a method for extracting process knowledge events in the communication field, belonging to the field of information technology, comprising the following steps: S1: defining the event extraction problem of the communication field and selecting an extraction method; S2: preprocessing the data of process knowledge in the communication field; S3: constructing a hierarchical sequence labeling task; S4: obtaining an enhanced semantic representation using a pre-training model and a graph convolution neural network; S5: obtaining long-distance semantic dependency information of the semantic representation using a gated neural unit; S6: solving the label bias problem of step S5 using a conditional random field; S7: extracting events with a communication field process knowledge event extraction model based on model transfer learning and the graph convolution neural network. The invention uses a fusion model based on model transfer learning and a graph convolution neural network to extract semantic representations, uses a gated neural unit to obtain long-distance dependency information of the semantic representations, and uses a conditional random field to overcome the label bias problem.

1. A method for extracting process knowledge events in the communication field is characterized by comprising the following steps:

S1: defining the event extraction problem of the communication field, and selecting an extraction method;

S2: preprocessing the data of process knowledge in the communication field;

S3: constructing a hierarchical sequence labeling task;

S4: obtaining an enhanced semantic representation by using a pre-training model and a graph convolution neural network;

S5: obtaining long-distance semantic dependency information of the semantic representation by using a gated neural unit;

S6: solving the label bias problem existing in step S5 by using a conditional random field;

S7: extracting events by using a communication field process knowledge event extraction model based on model transfer learning and a graph convolution neural network, to obtain a prediction result of communication field process knowledge event extraction.

2. The method for extracting process-class knowledge events in the communication field according to claim 1, wherein: in step S1, the specific process of defining the communication domain event extraction problem is as follows:

S11: identifying whether a related communication field event exists in the text corpus;

S12: identifying the related elements of the related event;

S13: determining the role each element plays.

3. The method for extracting process-class knowledge events in the communication field according to claim 1, wherein: in step S1, a pipeline extraction method is selected as the event extraction method; the pipeline extraction method models the event trigger words and the event elements separately and independently, and sequentially extracts the trigger words and the event elements contained in the event.

4. The method for extracting process-class knowledge events in the communication field according to claim 1, wherein: in step S2, the data preprocessing process is specifically as follows:

s21: data cleansing

Identifying the obviously mislabeled data present in the annotated communication field knowledge extraction corpus text, and directly discarding this data;

s22: data deduplication

Performing a deduplication operation on the duplicate data generated by repeatedly recording the same equipment state within a certain time window;

s23: text normalization

Converting the text and symbols with inconsistent full-width and half-width forms in the sample data into a uniform format.

5. The method for extracting process-class knowledge events in the communication field according to claim 1, wherein: in step S3, the hierarchical sequence labeling refers to the task of dividing the data, according to event type, into a structured hierarchy of 8 major categories and 30 subcategories based on the event types and event elements in the data Schema, and performing sequence labeling by using a BIO labeling policy.

6. The method for extracting process-class knowledge events in the communication field according to claim 5, wherein: b in the BIO labeling strategy represents the beginning of an event element, I represents a middle or ending word of the event element, and O represents an irrelevant word.

7. The method for extracting process-class knowledge events in the communication field according to claim 1, wherein: the specific process of step S4 is as follows:

s41: obtaining semantic representation of a corpus text by using a pre-training model;

S42: taking the event trigger words and event elements in the corpus text as nodes, with adjacent-edge relations between the nodes in each corpus, a dynamic network topology graph is constructed; through a message passing mechanism, the feature information of each node is transformed and then transmitted to its neighbor nodes, realizing the extraction and transformation of node feature information, and then the transmitted information of the neighbor nodes around each target node is aggregated through a message receiving mechanism:

H^{(l+1)} = σ(D^{-1/2} A D^{-1/2} H^{(l)} W^{(l)})

wherein A represents the adjacency matrix of the target nodes, D represents the degree matrix of the target nodes, H^{(l)} is the node semantic representation at layer l, H^{(l+1)} is the node semantic representation at layer l+1, W^{(l)} is the feature weight matrix of the layer-l target nodes, and σ is the Sigmoid activation function.

8. The method for extracting process-class knowledge events in the communication field according to claim 1, wherein: the specific process of step S5 is as follows:

S51: at time step t, the information of the previous time step is aggregated with the information of the current time step by using an update gate:

z_t = σ(W^{(z)} x_t + U^{(z)} h_{t-1})

wherein h_{t-1} represents the latent semantic output of time step t-1, x_t represents the original semantic input of the t-th time step, W^{(z)} and U^{(z)} are weight matrices, and σ is the Sigmoid activation function;

S52: at time step t, the information of the previous time step is aggregated with the information of the current time step by using a reset gate:

r_t = σ(W^{(r)} x_t + U^{(r)} h_{t-1});

S53: at time step t, the magnitude of the reset gate value is measured, and it is determined whether the previous information is retained or forgotten:

h'_t = tanh(W x_t + r_t ⊙ U h_{t-1})

wherein tanh is the activation function, r_t is the result of the reset gate, and ⊙ denotes the Hadamard product;

S54: at the end of time step t, the magnitude of the update gate value is measured, and it is determined whether the information transmitted to the next unit is the hidden-layer information of the previous time step or the candidate state:

h_t = z_t ⊙ h_{t-1} + (1 - z_t) ⊙ h'_t

9. The method for extracting process-class knowledge events in the communication field according to claim 1, wherein: in step S6, the conditional random field is constructed from K feature functions and the corresponding K weights, and, for an observation sequence x = {x_1, x_2, x_3, ..., x_n}, predicts the optimal tag sequence y* = {y*_1, y*_2, ..., y*_n}.

10. The method for extracting process-class knowledge events in the communication field according to claim 1, wherein: in step S7, the communication field process knowledge event extraction model based on model transfer learning and graph convolution neural network includes a text input layer, a BERT pre-training model layer, a GRU layer, a CRF layer, and an output layer, and the specific working process of the model is as follows:

S71: in the text input layer, data preprocessing is performed on the original corpus, and the BERT pre-training model is used to tokenize the text corpus by Chinese characters, obtaining the tokenization result:

x = {x_1, x_2, x_3, ..., x_n}

wherein x_i represents the i-th character of the input sentence and n is the number of characters contained in the sentence; if the sentence length is less than n, the sequence is automatically padded with 0s at the end to the uniform length;

s72: obtaining semantic representation of an original text corpus through a BERT pre-training model layer;

s73: will be pre-trainedSemantic representation of exercise modelsInputting a GRU layer to extract key information of event corpus of the communication field:

wherein the content of the first and second substances,is the input of the current GRU, Ht is the hidden layer state vector of the GRU;

S74: the conditional random field is used to model the dependency between labels, overcoming the label bias problem; for an input sequence x = {x_1, x_2, x_3, ..., x_n}, the LOSS score from an input label sequence y = {y_1, y_2, y_3, ..., y_n} to the target label sequence y* is calculated:

score(x, y) = Σ_{i=0}^{m} A_{y_i, y_{i+1}} + Σ_{i=1}^{m} P_{i, y_i}

wherein A is the transition probability matrix, A_{i,j} is the transition score from label i to label j, P_{i,j} represents the score of the j-th label at the i-th position, and m is the maximum length of a single text corpus;

during the training process, the maximum likelihood function of {x_i, y_i} is optimized:

L(Θ) = Σ_i log P(y_i | x_i) - (λ/2)‖Θ‖²

where λ is the regularization coefficient, Θ denotes the model parameters, and P(y_i | x_i) is the probability from the original sequence to the predicted sequence; the final sequence labels are obtained according to the maximum likelihood function;

S75: in the output layer, the output labels y = {y_1, y_2, y_3, ..., y_n} adjusted by the CRF layer are converted into BIO labels, and the event trigger words and event elements are output.

Technical Field

The invention relates to the technical field of information, in particular to a method for extracting process knowledge events in the communication field.

Background

In recent years, with the rapid development of natural language processing technology and the wide application of 5G technology in the communication field, how to use natural language processing technology to extract process-class knowledge in the communication field has attracted increasing attention. Communication field event extraction aims to extract specified event attributes from unstructured process-class knowledge text; it is one of the important steps of text structuring and is also the basis for the wide application of knowledge graphs.

The current communication field event extraction task generally faces the problems of high annotation cost and scarce labeled samples. Achieving high-quality event extraction with few labeled samples therefore has important value for the wide application of event extraction technology in the communication field. Rule-based event extraction methods have difficulty formulating unified and complete rules because of the uncertainty of language structure, while traditional machine learning is mostly based on supervised learning and struggles with diversified event-element expressions and missing event elements (missed extraction and incomplete text descriptions). Therefore, a method for extracting process knowledge events in the communication field is proposed.

Disclosure of Invention

The technical problem to be solved by the invention is as follows: by extracting and organizing the "events" and "event relations" in the communication operation and maintenance process, the logic of fault occurrence in the communication operation and maintenance process can be presented more intuitively, which is an important prerequisite for subsequent fault troubleshooting and front-line handling of live-network faults.

The invention solves the technical problems through the following technical scheme, and the invention comprises the following steps:

s1: problem definition and method selection for extracting process knowledge events in the communication field;

s2: preprocessing data of process knowledge in the communication field;

s3: constructing a hierarchical sequence marking task;

s4: obtaining an enhanced semantic representation using a pre-trained model and a graph convolution neural network (GCN);

s5: obtaining long-distance semantic dependency information of semantic representations by using a gated neural unit (GRU);

s6: overcoming the label bias problem present in step S5 using Conditional Random Fields (CRF);

s7: and carrying out a data extraction process on the communication field process knowledge event extraction model based on model transfer learning and the graph convolution neural network.

Further, the problem definition in step S1 specifies which event elements are to be extracted from the event text corpus of the communication domain; after requirement analysis, the problem definition of event extraction is given: first, identify whether relevant communication field events exist in the text corpus; second, identify the related elements of the relevant events; finally, determine the role each element plays. The pipeline extraction method is selected as the event extraction method.

Furthermore, the data preprocessing in step S2 refers to operations such as data cleaning, data deduplication, and text normalization, which solve the problems of non-standard data, missing features, and labeling errors existing in the original manually labeled data.

Further, the hierarchical sequence labeling in step S3 refers to the task of dividing the data, by programmatic means and according to event type, into a structured hierarchy of 8 major categories and 30 subcategories based on the event types and event elements in the data Schema, and performing sequence labeling by using a BIO labeling policy.

Furthermore, the pre-training model in step S4 is obtained by running a self-supervised learning method on a massive corpus; the pre-training model provides a basis for model transfer learning on other tasks, and can be used as a feature extractor after being fine-tuned or frozen according to the task; the graph convolution neural network uses a message passing mechanism and a message receiving mechanism on a graph, and mines deep relationships between nodes in the graph through convolution operations on the graph, thereby obtaining enhanced node semantic representations.

Furthermore, the gated neural unit (GRU) in step S5 is a simplified LSTM model with a reset gate and an update gate; the GRU has fewer parameters and higher efficiency. Long-range semantic dependency is captured by the reset gate and the update gate: the reset gate determines how to combine the new input information with the previous memory, and the update gate defines how much of the previous memory is saved to the current time step.

Further, the Conditional Random Field (CRF) in step S6 is determined according to K feature functions, the corresponding K weights, and an observation sequence x = {x_1, x_2, x_3, ..., x_n}, predicting the optimal tag sequence y* = {y*_1, y*_2, ..., y*_n}.

Further, the data flow in step S7 is that the text corpus passes through a text input layer, a pre-training model layer, a GCN layer, a GRU layer, a CRF layer, and an output layer to obtain a prediction result of the process knowledge event extraction in the communication field.

Compared with the prior art, the invention has the following advantages: according to the method for extracting the process knowledge event in the communication field, the semantic representation extraction is realized by using a fusion model based on model transfer learning and a graph convolution neural network, long-distance dependence information of the semantic representation is acquired by using a gated neural unit (GRU), and the problem of label deviation is solved by using a Conditional Random Field (CRF).

Drawings

FIG. 1 is a schematic diagram illustrating a masking prediction performed on a communication field process knowledge corpus by a pre-training model according to an embodiment of the present invention;

FIG. 2 is a diagram illustrating a graph convolution neural network (GCN) implementation of multi-level semantic updating according to an embodiment of the present invention;

FIG. 3 is a diagram illustrating a gated neural unit (GRU) data update process at time t according to an embodiment of the present invention;

fig. 4 is an execution flow diagram of a communication domain process knowledge event extraction model based on model transfer learning and graph convolution neural network in the embodiment of the present invention.

Detailed Description

The following examples are given for the detailed implementation and specific operation of the present invention, but the scope of the present invention is not limited to the following examples.

As shown in fig. 1 to 4, the present embodiment provides a technical solution: a communication field process knowledge event extraction method based on model transfer learning and graph convolution neural network comprises the following steps:

s1: problem definition for event extraction

The communication field event extraction problem can be described as follows: first, identify whether relevant communication field events exist in the text corpus; second, identify the related elements of the relevant events; finally, determine the role each element plays. For the example sentence below, inputting it into the event extraction model requires extracting E1, A1, A2, A3, and A4, where E1 is called the trigger and A1, A2, A3, and A4 are called event elements.

Example sentence: XX cell after 8 o'clock, apple terminal (A1) access (E1) 5G network (A2) failure (A3)

The trigger in the example sentence is "access", which indicates an event of the software and hardware exception type (SoftHardwareFault); the extracted elements A1, A2, and A3 respectively represent the position of the fault, the fault-related object, and the fault state in the software and hardware exception event.

At present, there are two machine-learning-based event extraction methods: the pipeline method (the Pipeline Approach) and the joint learning method (the Joint Approach). In the pipeline method, trigger word recognition and event type determination are carried out in the first stage, and event element recognition in the second stage; that is, E1 in the example sentence is extracted first and the event type is determined, and then A1, A2, A3, and A4 are extracted according to the E1 event schema. The joint learning method extracts trigger words and event elements simultaneously, i.e., E1, A1, A2, A3, and A4 in the example sentence are extracted at the same time. The pipeline method suffers from error propagation: if the event type is judged incorrectly in the first stage, the extraction of event elements in the second stage will also be wrong, so under normal circumstances the joint learning method performs better than the pipeline method. However, in the communication field, event types and event elements are both extremely complex, and event trigger words often overlap with event elements; for example, when the trigger word is "access" and an event element is "access terminal", the model easily marks both occurrences of "access" as the event trigger word, causing the extraction task to fail. Therefore, in the invention, the pipeline method is adopted to model event trigger words and event elements separately and independently, and the trigger words and event elements contained in an event are extracted in sequence, as sketched below. Experiments show that in communication field event extraction tasks with complex contexts, the pipeline method performs significantly better than the joint learning method.
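A minimal sketch of the two-stage pipeline described above, assuming hypothetical trigger_model and argument_model components (their names and predict interfaces are illustrative, not from the source):

def pipeline_extract(sentence, trigger_model, argument_model):
    # stage 1: identify the trigger word and determine the event type
    trigger, event_type = trigger_model.predict(sentence)
    if trigger is None:
        return None  # no communication-field event in this sentence
    # stage 2: extract event elements under the schema of the detected type
    arguments = argument_model.predict(sentence, event_type)
    return {'trigger': trigger, 'type': event_type, 'arguments': arguments}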

S2: data preprocessing for communication domain process class knowledge

There is a great deal of process-class knowledge in the communication field, and common process-class knowledge event types include: index degradation, software and hardware exceptions, data collection, verification, configuration faults, external events, machine adjustment, machine operation, and so on. The training data set identifies in detail the type each event belongs to and the trigger words and event elements contained under each type. "pair_ID" is an event-pair ID. Statistics of the data sets and examples of the training and validation data sets are shown in Table 1, Table 2, and Table 3, respectively.

Table 1 data set statistics

Table 2 training data set example

Table 3 verification data set example

id Text
15001 But not connected to an antenna feeder
15002 Even more unfortunately the handset is mobile!
15003 But because the site is not configured with the adjacent region of the GERAN system
15004 A phenomenon of RRC establishment failure occurs
15005 Many sites cannot be adjusted

Because process knowledge in the communication field is generated in real time during equipment operation, a large amount of non-standard data, missing features, and even labeling errors remain after manual cleaning and annotation. Therefore, data preprocessing is required before the data is input into the model.

S21: and (6) data cleaning. The marked communication field knowledge extraction corpus text has part of obvious marking errors, and the part of data needs to be directly discarded in the data cleaning process.

S22: and (5) data deduplication. Sometimes, the device records the state of the same device within a certain time, so that a lot of repeated data is generated. The large amount of repeated data affects the sample distribution, so the repeated data is subjected to the deduplication operation in the preprocessing loop.

S23: and (5) text normalization. The problem of non-uniformity of all half angles of texts and symbols existing in the sample is uniformly processed.

S3: constructing hierarchical sequence annotation tasks

The sequence labeling problem is the most common problem in NLP, and most NLP problems can be converted into sequence labeling problems. "Sequence labeling" means that for a one-dimensional linear input sequence x = {x_1, x_2, x_3, ..., x_n}, each element in the linear sequence is assigned a certain label from the label set, yielding y = {y_1, y_2, y_3, ..., y_n}. The sequence labeling task is therefore essentially the problem of classifying each element in a linear sequence according to its context.

The invention takes event extraction as a sequence labeling task, the labeling strategy adopts a BIO strategy, B represents the beginning of an event element, I represents the middle or ending word of the event element, and O represents an irrelevant word.

Based on the event types and event elements in the data Schema, the data is divided by programmatic means into a structured hierarchy of 8 major categories and 30 subcategories; trigger words under the 8 categories are marked A-H, event elements under each category are marked An-Hn, the starting position is marked with B, and the middle and ending positions are marked with I. The labeling specification is shown in Table 4.

Table 4: sequence annotation tag definition rules

Label    Definition
B-A1     Start position of SoftHardwareFault
I-A1     Middle or end position of SoftHardwareFault
B-A2     Subject start position
I-A2     Subject middle or end position
B-A3     Object/object start position
I-A3     Object/object middle or end position
B-A4     State start position
I-A4     State middle or end position
B-A5     Owner start position
I-A5     Owner middle or end position
B-B1     Start position of CollectData
I-B1     CollectData middle or end position
B-B2     Object/object start position
I-B2     Object/object middle or end position
B-B3     Source start position
I-B3     Source middle or end position
...      ...

The rules for labeling are as follows:

trigger_dic={'SoftHardwareFault':'A1','CollectData':'B1','Check':'C1','SettingFault':'D1','ExternalFault':'E1','SetMachine':'F1','Operate':'G1','IndexFault':'H1'}

a_dic={'Subject':'A2','Object':'A3','object':'A3','State':'A4','Owner':'A5'}

b_dic={'Object':'B2','object':'B2','Source':'B3'}

c_dic={'Object':'C2','object':'C2','Standard':'C3'}

d_dic={'Setting':'D2','Owner':'D3','Reference':'D4','State':'D5'}

e_dic={'State':'E2'}

f_dic={'Object':'F2','object':'F2','Network':'F3','InitialState':'F4','FinalState':'F5'}

g_dic={'Object':'G2','object':'G2','Owner':'G3'}

h_dic={'Index':'H2','Owner':'H3','State':'H4'}
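A minimal sketch of converting span annotations into BIO tags with these dictionaries; the annotation record format ({'role', 'type', 'start', 'end'}) and the role_dics lookup table are assumptions for illustration:

def to_bio(text, annotations, trigger_dic, role_dics):
    tags = ['O'] * len(text)  # O marks irrelevant characters
    for ann in annotations:
        if ann['role'] == 'trigger':
            code = trigger_dic[ann['type']]             # e.g. 'SoftHardwareFault' -> 'A1'
        else:
            code = role_dics[ann['type']][ann['role']]  # e.g. a_dic['Subject'] -> 'A2'
        tags[ann['start']] = 'B-' + code                # B marks the start position
        for i in range(ann['start'] + 1, ann['end']):
            tags[i] = 'I-' + code                       # I marks middle/end positions
    return tags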

s4: obtaining enhanced semantic representations using a pre-trained model and a graph-convolution neural network (GCN)

S41: the invention uses a pre-training model to perform token processing on the corpus. Firstly, a word segmentation method is used for segmenting a text into word units such as a word or a phrase. Because the original corpus needs to be input into the pre-trained model, word segmentation is required. For a given sentence x ═ x1,x2,x3,...,xnIn which xiThe ith character representing the input sentence, n is the number of characters contained in the sentence, when the ith character is input into the layer, a word segmentation device provided with a pre-training model is used, and when the word segmentation device processes Chinese, the word segmentation is carried out by using the characters. After word segmentation, 0 is supplemented to the uniform length after the sequence, and a word segmentation result omega is obtainedi∈Rm(i=1,2,...,m),ωiIs the ith mark in the sentence, and m is the length of the sequence after the sentence is participated.

The invention obtains the semantic representation of the corpus text by using a pre-training model. The pre-training model provides a basis for model transfer learning on other tasks, and can be used as a feature extractor after being fine-tuned or frozen according to the task. The method uses the position encoding of characters as the input of the Transformer, randomly masks a portion of the words in the corpus, and then predicts the masked words using context information, so that the meaning of the masked words can be better understood from the corpus context. The masking prediction performed on the communication field process-class knowledge corpus by the pre-training model is shown in FIG. 1.
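A minimal masked-prediction sketch with the transformers fill-mask pipeline, again assuming a "bert-base-chinese" checkpoint; the example sentence is illustrative:

from transformers import pipeline

fill = pipeline('fill-mask', model='bert-base-chinese')
# predict the masked character from context, as in FIG. 1
for candidate in fill('苹果终端接入5G网络失[MASK]'):
    print(candidate['token_str'], round(candidate['score'], 3))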

The method uses the graph convolution neural network to enhance the node semantic representation of the pre-training model.

S42: taking the event trigger words and event elements in the corpus text as nodes, with adjacent-edge relations between the nodes in each corpus, a dynamic network topology graph is constructed; through a message passing mechanism, the feature information of each node is transformed and then transmitted to its neighbor nodes, realizing the extraction and transformation of node feature information, and then the transmitted information of the neighbor nodes around each target node is aggregated through a message receiving mechanism:

H^{(l+1)} = σ(D^{-1/2} A D^{-1/2} H^{(l)} W^{(l)})

wherein A represents the adjacency matrix of the target nodes, D represents the degree matrix of the target nodes, H^{(l)} is the node semantic representation at layer l, H^{(l+1)} is the node semantic representation at layer l+1, W^{(l)} is the feature weight matrix of the layer-l target nodes, and σ is the Sigmoid activation function.

Deep relationships between nodes in the graph can be mined through multi-layer convolution operations on the graph; the layer-(l+1) convolution operation on the graph is shown in FIG. 2.
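A minimal GCN layer sketch in PyTorch implementing H^{(l+1)} = σ(D^{-1/2} A D^{-1/2} H^{(l)} W^{(l)}); whether self-loops are added to A is left to the caller, and all dimensions are assumptions:

import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.weight = nn.Linear(in_dim, out_dim, bias=False)  # W^{(l)}

    def forward(self, h, adj):
        # D^{-1/2} from the node degrees of the adjacency matrix A
        d_inv_sqrt = adj.sum(dim=-1).clamp(min=1e-12).pow(-0.5)
        # message passing + receiving: normalized neighborhood aggregation
        norm_adj = d_inv_sqrt.unsqueeze(-1) * adj * d_inv_sqrt.unsqueeze(-2)
        # σ is the Sigmoid activation, as defined in S42
        return torch.sigmoid(norm_adj @ self.weight(h))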

S5: obtaining long-range semantically-dependent information of semantic representations using gated neural units (GRUs)

The gated neural unit (GRU) is a simplified model of the LSTM; the GRU layer is added to the model to obtain long-distance semantic dependency information of the input vectors. Compared with the recurrent neural network, the GRU alleviates the problems of gradient vanishing and gradient explosion; compared with the LSTM, which has three gates (an input gate, a forget gate, and an output gate), the GRU has only two gates (a reset gate and an update gate). The GRU does not separately control and retain an internal memory and has no output gate as in the LSTM, so for the same structure the GRU has fewer parameters, higher efficiency, and better performance on many tasks. Because the event subjects appearing in communication field events are numerous and complex, the feature information extracted by the pre-training model can be further refined by the GRU, and relations between distant event elements in the communication field can be captured.

As shown in FIG. 3, the GRU uses an update gate and a reset gate: the reset gate determines how the new input information is combined with the previous memory, and the update gate defines the amount of previous memory saved to the current time step.

S51: at time step t, the information of the previous time step is aggregated with the information of the current time step by using an update gate:

z_t = σ(W^{(z)} x_t + U^{(z)} h_{t-1})

wherein h_{t-1} represents the latent semantic output of time step t-1, x_t represents the original semantic input of the t-th time step, W^{(z)} and U^{(z)} are weight matrices, and σ is the Sigmoid activation function.

S52: at time step t, the information of the previous time step is aggregated with the information of the current time step by using a reset gate:

r_t = σ(W^{(r)} x_t + U^{(r)} h_{t-1})

Compared with the update gate, different weight matrices are used in the semantic aggregation.

S53: at time step t, the magnitude of the reset gate value is measured, and it is determined whether the previous information is retained or forgotten:

h'_t = tanh(W x_t + r_t ⊙ U h_{t-1})

wherein tanh is the activation function, r_t is the result of the reset gate, and ⊙ denotes the Hadamard product.

S54: at the end of time step t, the magnitude of the update gate value is measured, and it is determined whether the information transmitted to the next unit is the hidden-layer information of the previous time step or the candidate state, as sketched below:

h_t = z_t ⊙ h_{t-1} + (1 - z_t) ⊙ h'_t
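A minimal GRU cell sketch mirroring the S51-S54 equations; in practice PyTorch's built-in nn.GRU would be used, and this version only makes the gating explicit:

import torch
import torch.nn as nn

class GRUCellSketch(nn.Module):
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.W_z = nn.Linear(input_dim, hidden_dim, bias=False)
        self.U_z = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.W_r = nn.Linear(input_dim, hidden_dim, bias=False)
        self.U_r = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.W = nn.Linear(input_dim, hidden_dim, bias=False)
        self.U = nn.Linear(hidden_dim, hidden_dim, bias=False)

    def forward(self, x_t, h_prev):
        z_t = torch.sigmoid(self.W_z(x_t) + self.U_z(h_prev))    # S51: update gate
        r_t = torch.sigmoid(self.W_r(x_t) + self.U_r(h_prev))    # S52: reset gate
        h_cand = torch.tanh(self.W(x_t) + r_t * self.U(h_prev))  # S53: candidate state
        return z_t * h_prev + (1 - z_t) * h_cand                 # S54: new hidden state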

s6: overcoming the problem of label bias present in step S5 using Conditional Random Fields (CRF)

Conditional Random Fields (CRFs) can model the dependency between labels, overcoming the label bias problem. After the feature vector H carrying contextual semantic dependencies is passed through a linear layer, an n × m matrix P is obtained, where P_{i,j} is the score of the j-th label at the i-th position, m is the number of labels, and n is the maximum sentence length set by the model.

Input: the K feature functions of the model, the corresponding K weights, and an observation sequence x = {x_1, x_2, x_3, ..., x_n}

Output: the optimal tag sequence y* = {y*_1, y*_2, ..., y*_n}

S61: and (3) performing modeling initialization on the CRF, and solving the probability of each mark combination at the initial position:

wherein i represents the position of the mark, δ1(l) Representing the probability, w, of each combination of marks at the initial positionkCRF model parameters for the k-th pair of marker combinations, fkA feature function representing the k-th pair of mark combinations,indicating that delta is at the initial position1(l) The marker value reaching the maximum value;

s62: recursion is performed on i 1, 2.. n, and the maximum non-normalized probability of each marker l 1, 2.. n to position i is obtained:

wherein, deltai+1(l) Represents the maximum value of the unnormalized probability, δ, corresponding to each possible value of the label/at position ii(j) Representing the probability of marking a combination of j at location i;

s63: recording the path of the maximum value of the unnormalized probability:

wherein the content of the first and second substances,is expressed as deltai+1(l) The marker value of the position i which reaches the maximum value;

s64: and when i finishes traversing all n corpus samples, stopping the recursion process, wherein the maximum value of the non-normalized probability is as follows:

simultaneously, the end point of the optimal path can be obtained:

s65: and backtracking the end point of the optimal path to obtain the whole optimal path:

wherein the content of the first and second substances,an optimal mark representing the ith position;

connecting the nodes on the optimal path to obtain the marking sequence of the optimal path:
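A minimal Viterbi decoding sketch mirroring S61-S65, expressed over an emission score matrix P (n positions × m labels) and a transition matrix A (m × m) rather than raw feature functions; this reparameterization is an assumption consistent with S74:

import numpy as np

def viterbi_decode(P, A):
    n, m = P.shape
    delta = np.zeros((n, m))           # max unnormalized score per label
    psi = np.zeros((n, m), dtype=int)  # S63: backpointers
    delta[0] = P[0]                    # S61: initialization
    for i in range(1, n):              # S62: recursion over positions
        scores = delta[i - 1][:, None] + A + P[i][None, :]
        psi[i] = scores.argmax(axis=0)
        delta[i] = scores.max(axis=0)
    best = [int(delta[-1].argmax())]   # S64: end point of the optimal path
    for i in range(n - 1, 0, -1):      # S65: backtracking
        best.append(int(psi[i][best[-1]]))
    return best[::-1]                  # optimal tag sequence y*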

s7: communication field process knowledge event extraction model data flow based on model transfer learning and graph convolution neural network

As shown in fig. 4, the communication field process knowledge event extraction model based on model transfer learning and graph convolution neural network mainly includes a text input layer, a BERT pre-training model layer, a GRU layer, a CRF layer, and an output layer.

S71: in the text input layer, data preprocessing is performed on the original corpus, and the BERT pre-training model is used to tokenize the text corpus by Chinese characters, obtaining the tokenization result:

x = {x_1, x_2, x_3, ..., x_n}

wherein x_i represents the i-th character of the input sentence and n is the number of characters contained in the sentence; if the sentence length is less than n, the sequence is automatically padded with 0s at the end to the uniform length.

S72: obtaining semantic representation of an original text corpus through a BERT pre-training model layer;

s73: semantic characterization of a pre-trained modelInputting a GRU layer to extract key information of event corpus of the communication field:

wherein the content of the first and second substances,is the input of the current GRU, HtIs the hidden layer state vector of the GRU.

S74: the dependence relationship between labels is modeled by using a Conditional Random Field (CRF), so that the label deviation problem is solved. For one input sequence: x ═ x1,x2,x3,...,xnCalculating an input label sequence y ═ y1,y2,y3,...,ynTo the target tag sequenceLOSS value score of (a):

wherein A is a transition probability matrix, Ai,jIs the conversion score, P, of label i to label ji,jAnd m is the maximum length of a single text corpus.

During the training process, the maximum likelihood function of {x_i, y_i} is optimized:

L(Θ) = Σ_i log P(y_i | x_i) - (λ/2)‖Θ‖²

where λ is the regularization coefficient, Θ denotes the model parameters, and P(y_i | x_i) is the probability from the original sequence to the predicted sequence. The final sequence labels are obtained according to the maximum likelihood function.

S75: in the output layer, the output labels y = {y_1, y_2, y_3, ..., y_n} adjusted by the CRF layer are converted into BIO labels according to the sequence labeling definition rules in Table 4, thereby obtaining the event trigger words and event elements of the corpus text during inference.
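A minimal end-to-end sketch of the layer stack described in S71-S75, using HuggingFace transformers and the pytorch-crf package; the GCN enhancement of step S4 is omitted for brevity, and the checkpoint name and dimensions are assumptions:

import torch.nn as nn
from transformers import BertModel
from torchcrf import CRF

class EventExtractor(nn.Module):
    def __init__(self, num_tags, hidden_dim=256):
        super().__init__()
        self.bert = BertModel.from_pretrained('bert-base-chinese')
        self.gru = nn.GRU(self.bert.config.hidden_size, hidden_dim,
                          batch_first=True, bidirectional=True)
        self.emit = nn.Linear(2 * hidden_dim, num_tags)  # emission scores P
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, input_ids, attention_mask, tags=None):
        h = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        h, _ = self.gru(h)             # S73: long-distance dependency extraction
        emissions = self.emit(h)
        mask = attention_mask.bool()
        if tags is not None:           # training: negative log-likelihood (S74)
            return -self.crf(emissions, tags, mask=mask)
        return self.crf.decode(emissions, mask=mask)  # S75: Viterbi decoding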

To sum up, in the method for extracting process-class knowledge events in the communication field according to this embodiment, semantic representation extraction is realized by using a fusion model based on model transfer learning and a graph convolution neural network, long-distance dependency information of the semantic representation is acquired by using a gated neural unit (GRU), and the label bias problem is overcome by using a Conditional Random Field (CRF).

Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.
