Military field named entity identification method with cooperation of multiple neural networks

Document No.: 1170290    Publication date: 2020-09-18    Language: Chinese

Description: This technology, "A multi-neural-network-cooperative military-domain named entity recognition method" (Military field named entity identification method with cooperation of multiple neural networks), was created by 尹学振, 赵慧, 陈沁蕙, and 李欣妍 on 2020-04-21. Its main content is as follows: The invention provides a military-domain named entity recognition method based on multi-neural-network cooperation, comprising the following steps. Step A: acquire public microblog data to form an original data set. Step B: combining domain knowledge, propose a military-domain entity labeling strategy that considers fuzzy entity boundaries, and formulate a military-domain named entity classification standard. Step C: perform text preprocessing on the original data set, and construct the military corpus MilitaryCorpus by combining the entity labeling strategy and entity classification standard of step B. Step D: using a deep learning and statistical learning framework, train a multi-neural-network-cooperative military-domain named entity recognition model based on a BERT-BiLSTM-CRF network structure, to perform the military-domain named entity recognition task on Chinese social-media text represented by microblogs.

1. A military field named entity identification method with cooperation of multiple neural networks is characterized by comprising the following steps:

step A: acquiring public text data to form an original data set;

step B: combining domain knowledge, proposing a military-domain entity labeling strategy that considers fuzzy entity boundaries, and formulating a military-domain named entity classification standard;

step C: performing text preprocessing on the original data set, and constructing the military corpus MilitaryCorpus by combining the entity labeling strategy and entity classification standard of step B;

step D: training a multi-neural-network-cooperative military-domain named entity recognition model based on a BERT-BiLSTM-CRF network structure using a deep learning and statistical learning framework, so as to carry out the military-domain named entity recognition task on text data.

2. The method of multi-neural-network-cooperative military-domain named entity recognition of claim 1, wherein the textual data is derived from a microblog.

3. The method of multi-neural-network-cooperative military-domain named entity recognition of claim 1, wherein step B comprises:

step B1: in addition to the general categories of person names, times, and place names, definitions are given for 5 entity categories specific to the military domain: military ranks and occupations, military agencies, military facilities, military events, and weaponry. Buildings, sites, and facilities serving military purposes are labeled as military facility entities; institutions, military administrations, government agencies, combat units, and military-associated organizations are labeled as military agency entities; military exercises, armed conflicts, armed attacks, and political events related to the military domain are labeled as military event entities; firearms, artillery, aircraft, ammunition, tanks, naval vessels, missiles, biochemical weapons, and nuclear weapons are labeled as weaponry entities;

step B2: and (3) combining professional knowledge and literature data, proposing an entity labeling rule considering the fuzzy boundary:

rule 1: where English letters, dashes, or numerals are connected to weaponry, the letters, dashes, numerals, and weaponry are labeled together as a single weaponry entity;

rule 2: where a military agency is connected to weaponry, if the weaponry is unique to that agency, the two are labeled separately as a military agency entity and a weaponry entity; if the weaponry is not unique to the agency, the agency and the weaponry are labeled together as a single weaponry entity;

rule 3: where military agencies are connected to one another, the whole formed by the connected agencies is labeled as a single military agency entity, taking the lowest agency level as the standard;

rule 4: where a military agency or military place name is connected to a military rank, the connected whole is labeled as a military rank and occupation entity;

rule 5: where a military place name or military agency is connected to a military facility, if the facility has a specific name, the two are labeled separately as a military place name or military agency entity and a military facility entity; if the facility does not have a specific name, the connected whole is labeled as a military facility entity.

4. The method of multi-neural-network-cooperative military-domain named entity recognition of claim 1, wherein step C comprises:

step C1: cleaning the original data set, deleting data that contains no military information, and deleting special symbols in the data, the special symbols including emoticons and special characters;

step C2: combining the fuzzy-boundary-aware military-domain entity labeling strategy of step B with the military-domain named entity classification standard, performing character-level labeling on the text processed in step C1 to form the military-domain named entity corpus MilitaryCorpus.

5. The method of multi-neural-network-cooperative military-domain named entity recognition of claim 1, wherein step D comprises:

step D1: the military corpus is segmented at the sentence level; for each character x_ijk in the text sequence, a feature vector c_k is generated, and a Transformer-based bidirectional encoder converts the feature vector c_k into a word vector E_k carrying word and position features;

Step D2: inputting the word vector sequence into a bidirectional long short-term memory neural network to extract context features and generate a feature matrix P_k;

Step D3: CRF layer based on feature vector ckCapturing the dependency relationship between adjacent labels, and determining the label sequence optimized by the whole sentence according to the dependency relationship between the adjacent labels.

6. The method for multi-neural-network-cooperative military-domain named entity recognition of claim 5, wherein the BERT-based word vector representation layer in step D1 implements the following:

for each sentence s_ij = (x_ij1, x_ij2, ..., x_ijn) in the MilitaryCorpus corpus, 3 features are calculated for each character: a word feature, a sentence feature, and a position feature;

in word-feature generation, the word vector corresponding to x_ijk is determined using the BERT vocabulary; since each recognition unit is a single sentence, the sentence feature is set to 0; the position feature of the k-th character is k; the input of the BERT-based word vector representation layer is the combination C_k of the word, sentence, and position features, where C_k ∈ C, C = (C_1, C_2, C_3, ..., C_n);

said C = (C_1, C_2, C_3, ..., C_n) is passed through multi-layer Transformer calculation to output the final feature vectors E = (E_1, E_2, E_3, ..., E_n); the output matrix of each Transformer node is used as the input of all Transformer nodes in the next layer, and the BERT computation mechanism is then used to obtain the character-level feature vector sequence E = (E_1, E_2, E_3, ..., E_n) as the input of the BiLSTM neural network layer.

7. The method for multi-neural-network-cooperative military-domain named entity recognition of claim 5, wherein step D2 is implemented by the bidirectional long short-term memory neural network layer as follows:

the output E = (E_1, E_2, E_3, ..., E_n) of the BERT-based word vector representation layer is used as the input of this layer, and feature calculation is performed in the hidden node corresponding to each time step; the output sequence of the forward LSTM hidden layer is F = (F_1, F_2, F_3, ..., F_n): the input of F_1 is E_1, and from F_2 onward the input at step k is E_k together with the preceding hidden state F_(k-1); the output sequence of the backward LSTM hidden layer is B = (B_1, B_2, B_3, ..., B_n): the input of B_1 is E_1, and from B_2 onward the input at step k is E_k together with the adjacent hidden state B_(k-1); for each E_k, an output vector P_k is calculated from F_k and B_k.

8. The method for multi-neural-network-cooperative military-domain named entity recognition of claim 5, wherein the CRF layer in step D3 implements the following:

for the output matrix P of the BiLSTM neural network layer, the input of the model is defined as (x_ij1, x_ij2, ..., x_ijn), its tag sequence as y = (y_1, y_2, y_3, ..., y_n), and the transition matrix as A, where element a_ij denotes the probability of label i transitioning to label j; the score of generating the tag sequence y = (y_1, y_2, y_3, ..., y_n) is

score(s_ij, y) = Σ_{k=0..n} A_{y_k, y_{k+1}} + Σ_{k=1..n} P_{k, y_k}    (1)

wherein y_0 denotes the start tag of s_ij and y_{n+1} denotes the end tag of s_ij, which are used only as markers and are not included in the final predicted tag sequence;

the probability of each possible predicted sequence for s_ij is calculated as

p(y | s_ij) = exp(score(s_ij, y)) / Σ_{y' ∈ Y_X} exp(score(s_ij, y'))    (2)

wherein Y_X denotes all possible predicted tag sequences for s_ij, and y denotes the actual tag sequence;

in the training process, in order to obtain the optimal predicted tag sequence, p(y | s_ij) must be maximized; for ease of calculation, the log likelihood of p(y | s_ij) is taken:

log(p(y | s_ij)) = score(s_ij, y) − log( Σ_{y' ∈ Y_X} exp(score(s_ij, y')) )    (3)

log(p(y | s_ij)) is maximized, and the CRF encoding part yields the globally optimal tag sequence; in the decoding stage, the sequence with the highest overall score is obtained as the optimal tag sequence based on formula (4), and is output as the CRF-encoded globally optimal tag sequence:

y* = argmax_{y' ∈ Y_X} score(s_ij, y')    (4)

Technical Field

The invention belongs to the military technical field and relates to a named entity recognition method based on multi-neural-network cooperation, in particular to a named entity recognition method for entities in the military domain.

Background

Named entity recognition is fundamental to natural language processing research such as intelligent question answering and knowledge graph construction, and has long attracted researchers' attention. Early named entity recognition research mostly adopted rule-based and dictionary-based methods, which rely on a large number of manually crafted recognition rules, struggle to cover the corpus fully, and tie rule formulation to a particular data set, so the rules must be updated whenever the data set changes. Entity recognition methods based on statistical learning avoid formulating large numbers of rules by converting the named entity recognition problem into a sequence labeling problem; however, they rely on predefined features, and feature engineering is not only costly but also tied to a specific field, so domain knowledge improves the model's recognition effectiveness at the expense of its generalization and transfer capability.

Improvements in computing power and the support of distributed word representation techniques have freed deep-neural-network-based named entity recognition from feature engineering, and remarkable research progress has been achieved. Research has shown that for Chinese named entity recognition, character-vector representations outperform word-vector representations in recognition accuracy; earlier researchers have designed model network structures using convolutional neural networks, BiLSTM, CRF, and the like, obtaining good recognition results in specialized fields such as biomedicine; for the military field, some researchers have carried out entity recognition on standardized texts such as combat documents and scenario documents, with positive results. However, while entities in standardized texts such as electronic medical records, military texts, and combat documents are relatively densely distributed, follow certain patterns, and have relatively clear boundaries, entities in social media data such as microblogs and Tweets are sparsely distributed, expressed non-standardly, and often have unclear boundaries. How to perform military-domain named entity recognition on social media data such as microblogs, which contain fuzzy-boundary entities, has therefore become a new research problem.

Disclosure of Invention

The invention aims to provide a military-domain named entity recognition method that: proposes an entity labeling mechanism for fuzzy entity boundaries, to solve the problems that entity boundaries are hard to delimit and simplified entity expressions are hard to handle in entity recognition tasks; and uses a multi-neural-network-cooperative military-domain named entity recognition model (BERT-BiLSTM-CRF) based on a Transformer-based bidirectional encoder (BERT), a bidirectional long short-term memory neural network (BiLSTM), and a conditional random field (CRF), to overcome the problems that a single CRF model depends on a large amount of manual feature selection and an LSTM model depends on a huge corpus to construct word vectors, thereby improving the entity recognition effect.

The invention provides a military field named entity identification method with multi-neural network cooperation, which comprises the following steps:

step A: acquiring public microblog text data to form an original data set;

step B: combining domain knowledge, proposing a military-domain entity labeling strategy that considers fuzzy entity boundaries, and formulating a military-domain named entity classification standard;

step C: performing text preprocessing on the original data set, and constructing the military corpus MilitaryCorpus by combining the entity labeling strategy and entity classification standard of step B;

step D: training a multi-neural-network-cooperative military-domain named entity recognition model based on a BERT-BiLSTM-CRF network structure using a deep learning and statistical learning framework, so as to carry out the military-domain named entity recognition task on text data.

In the present invention, step B comprises:

step B1: in addition to the general categories of person names, times, and place names, definitions are given for 5 entity categories specific to the military domain: military ranks and occupations, military agencies, military facilities, military events, and weaponry. Buildings, sites, and facilities serving military purposes are labeled as military facility entities; institutions, military administrations, government agencies, combat units, and military-associated organizations are labeled as military agency entities; military exercises, armed conflicts, armed attacks, and political events related to the military domain are labeled as military event entities; firearms, artillery, aircraft, ammunition, tanks, naval vessels, missiles, biochemical weapons, and nuclear weapons are labeled as weaponry entities;

step B2: and (3) combining professional knowledge and literature data, proposing an entity labeling rule considering the fuzzy boundary:

rule 1: where English letters, dashes, or numerals are connected to weaponry, the letters, dashes, numerals, and weaponry are labeled together as a single weaponry entity;

rule 2: where a military agency is connected to weaponry, if the weaponry is unique to that agency, the two are labeled separately as a military agency entity and a weaponry entity; if the weaponry is not unique to the agency, the agency and the weaponry are labeled together as a single weaponry entity;

rule 3: where military agencies are connected to one another, the whole formed by the connected agencies is labeled as a single military agency entity, taking the lowest agency level as the standard;

rule 4: where a military agency or military place name is connected to a military rank, the connected whole is labeled as a military rank and occupation entity;

rule 5: where a military place name or military agency is connected to a military facility, if the facility has a specific name, the two are labeled separately as a military place name or military agency entity and a military facility entity; if the facility does not have a specific name, the connected whole is labeled as a military facility entity.

In the present invention, step C includes:

step C1: cleaning the original data set, deleting data that contains no military information, and deleting special symbols in the data, the special symbols including emoticons and special characters;

step C2: combining the fuzzy-boundary-aware military-domain entity labeling strategy of step B with the military-domain named entity classification standard, performing character-level labeling on the text processed in step C1 to form the military-domain named entity corpus MilitaryCorpus.

In the present invention, step D comprises:

step D1: the military corpus is segmented at the sentence level; for each character x_ijk in the text sequence, a feature vector c_k is generated, and a Transformer-based bidirectional encoder converts the feature vector c_k into a word vector E_k carrying word and position features;

Step D2: inputting the word vector sequence into a bidirectional long short-term memory neural network to extract context features and generate a feature matrix P_k;

Step D3: the CRF layer captures the dependency relationships between adjacent labels based on the feature vectors, and determines the label sequence that is optimal for the whole sentence according to those dependencies.

Wherein, the BERT-based word vector representation layer in step D1 implements the following:

for each sentence s_ij = (x_ij1, x_ij2, ..., x_ijn) in the MilitaryCorpus corpus, 3 features are calculated for each character: a word feature, a sentence feature, and a position feature;

in word-feature generation, the word vector corresponding to x_ijk is determined using the BERT vocabulary; since each recognition unit is a single sentence, the sentence feature is set to 0; the position feature of the k-th character is k; the input of the BERT-based word vector representation layer is the combination C_k of the numerical values of the word, sentence, and position features, where C_k ∈ C, C = (C_1, C_2, C_3, ..., C_n);

said C = (C_1, C_2, C_3, ..., C_n) is passed through multi-layer Transformer calculation to output the final feature vectors E = (E_1, E_2, E_3, ..., E_n); the output matrix of each Transformer node is used as the input of all Transformer nodes in the next layer, and the BERT computation mechanism is then used to obtain the character-level feature vector sequence E = (E_1, E_2, E_3, ..., E_n) as the input of the BiLSTM neural network layer.

Step D2 is implemented by the bidirectional long short-term memory neural network layer as follows:

the output E = (E_1, E_2, E_3, ..., E_n) of the BERT-based word vector representation layer is used as the input of this layer, and feature calculation is performed in the hidden node corresponding to each time step; the output sequence of the forward LSTM hidden layer is F = (F_1, F_2, F_3, ..., F_n): the input of F_1 is E_1, and from F_2 onward the input at step k is E_k together with the preceding hidden state F_(k-1); the output sequence of the backward LSTM hidden layer is B = (B_1, B_2, B_3, ..., B_n): the input of B_1 is E_1, and from B_2 onward the input at step k is E_k together with the adjacent hidden state B_(k-1); for each E_k, an output vector P_k is calculated from F_k and B_k;

finally, given E = (E_1, E_2, E_3, ..., E_n), the BiLSTM neural network outputs a feature matrix P, where element p_yz represents the probability that character x_y of the input sentence s_ij takes the z-th tag; the feature matrix P is used as the input of the CRF layer to generate the tag sequence of s_ij = (x_ij1, x_ij2, ..., x_ijn).

Wherein, the CRF layer in step D3 implements the following:

for the output matrix P of the BiLSTM neural network layer, the input of the model is defined as (x_ij1, x_ij2, ..., x_ijn), its tag sequence as y = (y_1, y_2, y_3, ..., y_n), and the transition matrix as A, where m is the number of entity types and element a_ij represents the probability of label i transitioning to label j. The score of generating the tag sequence y = (y_1, y_2, y_3, ..., y_n) is:

score(s_ij, y) = Σ_{k=0..n} A_{y_k, y_{k+1}} + Σ_{k=1..n} P_{k, y_k}    (1)

wherein y_0 denotes the start tag of s_ij and y_{n+1} denotes the end tag of s_ij, which are used only as markers and are not included in the final predicted tag sequence; A_{y_k, y_{k+1}} denotes the probability of transitioning from label y_k to label y_{k+1}, and P_{k, y_k}, obtained from the output matrix of the BiLSTM-based context feature extraction part, represents the probability that x_ijk in s_ij has label y_k. The score of y = (y_1, y_2, y_3, ..., y_n) is calculated from the transition matrix and the output matrix of the BiLSTM-based context feature extraction part and input to the softmax function; the probability of each possible predicted sequence for s_ij is calculated using equation (2):

p(y | s_ij) = exp(score(s_ij, y)) / Σ_{y' ∈ Y_X} exp(score(s_ij, y'))    (2)

wherein Y_X denotes all possible predicted tag sequences for s_ij, and y denotes the actual tag sequence.

In the training process, in order to obtain the optimal predicted tag sequence, p(y | s_ij) must be maximized; for ease of calculation, the log likelihood of p(y | s_ij) is taken based on equation (3):

log(p(y | s_ij)) = score(s_ij, y) − log( Σ_{y' ∈ Y_X} exp(score(s_ij, y')) )    (3)

log(p(y | s_ij)) is maximized, and the CRF encoding part yields the globally optimal tag sequence. In the decoding stage, the sequence with the highest overall score is obtained as the optimal tag sequence based on formula (4), and is output as the CRF-encoded globally optimal tag sequence:

y* = argmax_{y' ∈ Y_X} score(s_ij, y')    (4)
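The CRF score, sequence probability, log-likelihood, and decoding steps described above can be checked numerically with a brute-force sketch. The matrix contents below are toy values and the start/end tag indexing (m and m+1) is an illustrative convention; only the formulas follow the text, and the exhaustive search stands in for Viterbi decoding.

```python
import numpy as np
from itertools import product

def sequence_score(P, A, y):
    """Eq. (1): emission scores P[k, y_k] plus transitions, with start tag m and end tag m+1."""
    n, m = P.shape
    score = A[m, y[0]] + A[y[-1], m + 1]                    # y_0 -> y_1 and y_n -> y_{n+1}
    score += sum(P[k, y[k]] for k in range(n))              # emission terms
    score += sum(A[y[k], y[k + 1]] for k in range(n - 1))   # inner transitions
    return score

def log_likelihood(P, A, y):
    """Eq. (3): log p(y|s) = score(y) - log sum over all sequences of exp(score)."""
    n, m = P.shape
    scores = [sequence_score(P, A, list(s)) for s in product(range(m), repeat=n)]
    return sequence_score(P, A, y) - np.log(np.sum(np.exp(scores)))

def decode(P, A):
    """Eq. (4): optimal tag sequence by exhaustive search (Viterbi in practice)."""
    n, m = P.shape
    return max(product(range(m), repeat=n),
               key=lambda s: sequence_score(P, A, list(s)))
```

By construction, exponentiating the log-likelihoods of all possible sequences sums to 1, which is a quick sanity check on equation (2).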

the technical scheme adopted by the invention has the following technical characteristics:

1) The invention proposes an entity labeling mechanism for fuzzy entity boundaries in combination with the opinions of domain experts, and constructs the military-domain corpus MilitaryCorpus based on open data.

2) A multi-neural-network-cooperative military-domain named entity recognition model (BERT-BiLSTM-CRF) based on a Transformer-based bidirectional encoder (BERT), a bidirectional long short-term memory neural network (BiLSTM), and a conditional random field (CRF) is provided as the core military-domain entity recognition method.

3) Compared with mainstream entity recognition models such as the CRF-based named entity recognition model (CRF), the model based on bidirectional LSTM (BiLSTM) and CRF (BiLSTM-CRF), and the model based on a convolutional neural network (CNN), BiLSTM, and CRF (CNN-BiLSTM-CRF), the multi-neural-network-cooperative military-domain entity recognition method provided by the invention combines word features, sentence features, and position features to generate word vectors, uses a Transformer to train the word vectors, and fully considers the influence of context information on an entity, overcoming the defect of performing entity recognition with word-level feature vectors that consider only the characteristics of the word itself while neglecting its context. The method therefore achieves higher effectiveness and a better recognition effect.

Drawings

FIG. 1 is a block diagram of a multi-neural network collaborative military domain entity recognition model according to the present invention

FIG. 2 is a schematic diagram of the input of a word vector expression layer based on BERT of the named entity recognition model proposed by the present invention;

FIG. 3 is an overall schematic diagram of the BERT-based representation layer in the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the specific embodiments. The embodiments described below are only a part of the embodiments of the present invention, and not all of them. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.

The technical scheme of the multi-neural-network-cooperative military-domain named entity recognition model provided by the invention is as follows:

1) an entity labeling strategy considering entity fuzzy boundaries is proposed, and a military corpus set MilitaryCorpus based on microblog data is constructed by combining domain expert knowledge, and the method specifically comprises the following steps:

a) selecting 21,711 microblog posts from 3 microblog accounts, 'New wave military', 'gathering number', and 'micro military situation', published from December 2013 to December 2018, as the original data set;

b) an entity labeling rule considering the fuzzy boundary is proposed by combining the professional knowledge of a domain expert and the existing literature data, and the division category of the named entity facing the military domain is determined;

c) cleaning the original data, deleting microblog posts that contain no military information, such as those containing only emoticons, advertising information, or recruitment information;

d) labeling the corpus in the unlabeled original corpus at the character level to form the military-domain named entity corpus MilitaryCorpus.
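The cleaning step c) can be sketched as a keyword filter plus symbol stripping. The keyword list and the emoticon/symbol patterns below are illustrative placeholders, not the patent's actual filters.

```python
import re

# Hypothetical military-content hints and noise patterns (illustrative only).
MILITARY_HINTS = ("军", "武器", "导弹", "演习")
# Matches bracketed emoticons like "[微笑]" and stray hashtag/at symbols.
NOISE = re.compile(r"\[[^\[\]]{1,8}\]|[#@【】]")

def clean(posts):
    """Keep only posts with military information; strip special symbols."""
    kept = []
    for post in posts:
        if not any(hint in post for hint in MILITARY_HINTS):
            continue                      # delete data without military information
        kept.append(NOISE.sub("", post))  # delete special symbols in the data
    return kept
```

In practice the filter would be built from the domain vocabulary rather than a fixed keyword list.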

2) A multi-neural-network-cooperative military-domain named entity recognition model is constructed, as shown in FIG. 1, with the following specific steps:

a) the microblog text is segmented at the sentence level; for each character x_ijk in the text sequence, a feature vector c_k is generated, and a Transformer-based bidirectional encoder (BERT) converts c_k into a word vector E_k carrying word and position features;

b) the word vector sequence is input into a bidirectional long short-term memory network (BiLSTM) to extract context features and generate a feature matrix P_k;

c) finally, the CRF layer captures the dependency relationships between adjacent labels according to the feature vectors, and determines the label sequence that is optimal for the whole sentence according to those dependencies.

Further, the specific content of the fuzzy-boundary-aware entity labeling rules in step (1b) is as follows:

In addition to the general categories of person names, times, and place names, definitions are given for 5 entity categories specific to the military domain: military ranks and occupations, military agencies, military facilities, military events, and weaponry. Buildings, sites, and facilities serving military purposes are labeled as military facility entities; institutions, military administrations, government agencies, combat units, and military-associated organizations are labeled as military agency entities; military exercises, armed conflicts, armed attacks, and political events related to the military domain are labeled as military event entities; firearms, artillery, aircraft, ammunition, tanks, naval vessels, missiles, biochemical weapons, and nuclear weapons are labeled as weaponry entities. Meanwhile, with reference to military language usage and the suggestions of domain experts, entity labeling rules considering fuzzy boundaries are proposed.

Rule 1: English letters, dashes, and numerals connected to weaponry are labeled, together with the weaponry, as a single weaponry entity. Example: {ZTZ-99 type tank}.

Rule 2: where a military agency is connected to weaponry, if the weaponry is unique to that agency, the two are labeled separately as a military agency entity and a weaponry entity; if the weaponry is not unique to the agency, the agency and the weaponry are labeled together as a single weaponry entity. Examples: {the Russian army}, {Tu-160 bomber}; and {the Russian bomber}.

Rule 3: where military agencies are connected to one another, the whole formed by the connected agencies is labeled as a single military agency entity, taking the lowest agency level as the standard. Example: {a certain engineer infantry brigade}.

Rule 4: where a military agency or military place name is connected to a military rank, the connected whole is labeled as a military rank and occupation entity. Example: {Japanese Defense Minister}.

Rule 5: where a military place name (or military agency) is connected to a military facility, if the facility has a specific name, the two are labeled separately as a military place name (or military agency) entity and a military facility entity; if the facility does not have a specific name, the connected whole is labeled as a military facility entity. Example: {Australia, Willingston Air Force Base}.
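Rule 1 above can be approximated mechanically: a run of ASCII letters, dashes, and digits immediately preceding a weaponry noun is merged into one weaponry-entity span. The pattern and the tiny noun list below are illustrative assumptions, not the patent's implementation; the real rules are applied by human annotators.

```python
import re

# Hypothetical weaponry head nouns (tank / bomber / missile) — illustrative only.
WEAPON_NOUNS = "坦克|轰炸机|导弹"
# Letters, dashes, digits, optional "型" (type), then a weaponry noun.
RULE1 = re.compile(rf"[A-Za-z0-9\-]+型?(?:{WEAPON_NOUNS})")

def rule1_spans(text):
    """Return (start, end, surface) spans merged under Rule 1."""
    return [(m.start(), m.end(), m.group()) for m in RULE1.finditer(text)]
```

Applied to a sentence containing "ZTZ-99型坦克", the designation and the noun come out as one span rather than two fragments.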

This entity labeling mechanism solves the problems of fuzzy entity boundaries and difficult entity boundary determination in labeling named entities in the military domain.

Further, the specific content of the military-domain named entity classification categories in step (1d) is as follows:

Given that named entities in the military field use many professional terms and show little ambiguity, the invention adopts a simple and efficient BIO labeling mechanism, with labeling performed jointly by domain experts. BIO labeling marks the character-level position of each entity in the data set: B denotes the beginning of a named entity, I denotes the inside of a named entity, and O denotes a character that does not belong to any named entity. The specific labels are shown in Table 1.

TABLE 1 Military-domain named entity labeling categories

Entity class                      Entity beginning   Entity inside
Person name (P)                   B-P                I-P
Military place name (L)           B-L                I-L
Time (T)                          B-T                I-T
Military rank or occupation (R)   B-R                I-R
Military agency (G)               B-G                I-G
Military facility (F)             B-F                I-F
Military event (E)                B-E                I-E
Weaponry (W)                      B-W                I-W
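The BIO scheme of Table 1 can be illustrated by a small decoder that recovers entity spans from a character-level tag sequence. The helper name and the toy tagged sentence are illustrative, not from the patent.

```python
def extract_entities(chars, tags):
    """Recover (entity_text, class) pairs from character-level BIO tags.

    B-<class> starts an entity, I-<class> continues it, O is outside; an I tag
    whose class differs from the open entity closes it, per standard BIO reading.
    """
    entities, cur, cls = [], [], None
    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            if cur:
                entities.append(("".join(cur), cls))
            cur, cls = [ch], tag[2:]
        elif tag.startswith("I-") and cur and tag[2:] == cls:
            cur.append(ch)
        else:
            if cur:
                entities.append(("".join(cur), cls))
            cur, cls = [], None
    if cur:
        entities.append(("".join(cur), cls))
    return entities
```

For example, the characters of "坦克营地" tagged B-W, I-W, B-F, I-F decode to a weaponry entity and a facility entity.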

Further, the implementation details of the BERT-based word vector representation layer in step (2a) are as follows:

according to the invention, a word vector expression layer based on the BERT is constructed according to the fine tuning mechanism provided by the BERT and the particularity of the naming recognition problem in the field of Chinese military, and the generation work of the word vector is completed.

To sij=(xij1,xij2,...,xijn) Calculates 3 features per word: word feature, sentence feature, position feature definition sij=(xij1,xij2,...,xijn) Is characterized by Is a sentence feature, a location feature is. In word feature generation, for xijkDetermining the corresponding word vector by adopting the BERT vocabulary provided by GoogleThe model is input based on sentences, each time the recognition unit is a sentence, the characteristics of the sentence are invalid for the entity recognition, so the invention sets the sentence characteristics to 0, namely for the sentenceIs provided with

Figure BDA0002459490570000087

The position characteristic of the kth word is represented as k. The BERT-based word vector representation layer inputs numerical values and position characteristics of word characteristics, sentence characteristics and position characteristics

Figure BDA0002459490570000088

Wherein C isk∈C,C=(C1,C2,C3,...,Cn) As shown in fig. 2.

Obtained C ═ C1,C2,C3,...,Cn) And (E) outputting a final characteristic vector E through multi-layer transform calculation1,E2,E3,...,En). The output matrix of each Transformer node is used as the input of all the Transformer nodes in the previous layer, and then the calculation is carried out by using the calculation mechanism of BERT to obtain the word-level eigenvector sequence E ═ E1,E2,E3,...,En) As a BilsTM neural netThe input of the envelope layer is not shown in FIG. 3.
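A minimal numerical sketch of the input construction above (token + segment + position features summed per character); the vocabulary size, embedding width, and lookup tables are toy stand-ins for the real BERT parameters, not the patent's actual values:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, max_len, dim = 100, 16, 8   # toy sizes, not BERT's real ones

token_emb = rng.normal(size=(vocab_size, dim))   # token features T_k
position_emb = rng.normal(size=(max_len, dim))   # position features P_k
# Segment features S_k are fixed to 0 for single-sentence input.

def bert_input(char_ids):
    """C_k = T_k + S_k + P_k for each character id of one sentence."""
    n = len(char_ids)
    T = token_emb[char_ids]           # (n, dim) token lookup
    S = np.zeros((n, dim))            # S_k = 0
    P = position_emb[np.arange(n)]    # k-th position vector
    return T + S + P

C = bert_input([5, 17, 42])
print(C.shape)  # → (3, 8)
```

The sequence C would then be fed through the Transformer stack; that part is omitted here since it is the standard BERT computation.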

Further, the specific content of the BiLSTM neural network layer in step (2b) is as follows:

Named entities in the military field exhibit long-distance dependencies, which the invention resolves with a bidirectional long short-term memory network (BiLSTM). The output of the BERT-based word vector representation layer, $E=(E_1,E_2,E_3,\ldots,E_n)$, serves as the input of this layer, and feature computation is performed in the hidden node corresponding to each time step. The output sequence of the forward LSTM hidden layer is $F=(F_1,F_2,F_3,\ldots,F_n)$: the input of the first step is $E_1$, and from $F_2$ onward the input is

$$(E_k, F_{k-1})$$

The output sequence of the backward LSTM hidden layer is $B=(B_1,B_2,B_3,\ldots,B_n)$: the input of the last step is $E_n$, and from $B_{n-1}$ backward the input is $(E_k, B_{k+1})$. For each $E_k$, the output vector $P_k$ is obtained by concatenating the two directions:

$$P_k = F_k \oplus B_k$$

Finally, given $E=(E_1,E_2,E_3,\ldots,E_n)$, the BiLSTM network generates a feature matrix $P \in \mathbb{R}^{n \times m}$:

$$P = (P_1, P_2, P_3, \ldots, P_n)^{\top}$$

where $p_{yz}$ denotes the probability that character $x_y$ of the input sentence $s_{ij}$ bears label $z$, and m is the number of labels. The feature matrix P serves as the input of the CRF layer to generate the tag sequence of $s_{ij}=(x_{ij1},x_{ij2},\ldots,x_{ijn})$.
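The bidirectional pass and concatenation described above can be sketched as follows; a plain tanh cell stands in for the real LSTM cell, and all sizes are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
dim, hidden = 8, 4   # toy sizes

Wx = rng.normal(size=(dim, hidden)) * 0.1
Wh = rng.normal(size=(hidden, hidden)) * 0.1

def run_direction(E, reverse=False):
    """One recurrent pass; a tanh cell stands in for a real LSTM cell."""
    steps = reversed(range(len(E))) if reverse else range(len(E))
    h = np.zeros(hidden)
    out = [None] * len(E)
    for k in steps:
        # Input at each step is (E_k, previous hidden state).
        h = np.tanh(E[k] @ Wx + h @ Wh)
        out[k] = h
    return out

def bilstm_features(E):
    """P_k = F_k concatenated with B_k, giving an (n, 2*hidden) matrix."""
    F = run_direction(E)                  # forward pass F_1..F_n
    B = run_direction(E, reverse=True)    # backward pass B_1..B_n
    return np.stack([np.concatenate([f, b]) for f, b in zip(F, B)])

E = rng.normal(size=(5, dim))   # stand-in for the BERT output E_1..E_5
print(bilstm_features(E).shape)  # → (5, 8)
```

In the full model, a linear layer would further project each $P_k$ down to the m per-label scores that feed the CRF layer.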

Further, the specific content of the CRF layer in step (2c) is:

The CRF layer obtains a globally optimal tag sequence from the relations between adjacent labels, adding the following constraints to the final predicted labels:

1) the first character of a sentence must start with a "B-" tag or the "O" tag, and an "O" tag cannot be directly followed by an "I-" tag;

2) in a tag sequence B-X1 I-X2 I-X3, the categories X1, X2 and X3 must be identical. Under these constraints, the probability of illegal sequences appearing in the predicted tag sequence is reduced.
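The two constraints can be checked mechanically; the sketch below (an illustration, not from the patent) validates a BIO tag sequence against them:

```python
def is_legal(tags):
    """Check the two CRF constraints on a BIO tag sequence."""
    prev = "O"  # virtual start: only B- or O may open a sentence
    for tag in tags:
        if tag.startswith("I-"):
            # An I- tag must continue a B-/I- tag of the same category.
            if prev == "O" or prev[2:] != tag[2:]:
                return False
        prev = tag
    return True

print(is_legal(["B-W", "I-W", "O"]))  # → True
print(is_legal(["O", "I-W"]))         # → False: O cannot precede I-
print(is_legal(["B-W", "I-P"]))       # → False: categories differ
```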

For the output matrix P of the BiLSTM neural network layer, define the input of the model as $(x_{ij1},x_{ij2},\ldots,x_{ijn})$ and its tag sequence as $y=(y_1,y_2,y_3,\ldots,y_n)$, and define the transition matrix as $A \in \mathbb{R}^{(m+2)\times(m+2)}$, where m is the number of entity tags and $a_{ij}$ denotes the probability of label i transitioning to label j. The score function of the tag sequence $y=(y_1,y_2,y_3,\ldots,y_n)$ is:

$$score(s_{ij}, y) = \sum_{k=0}^{n} A_{y_k, y_{k+1}} + \sum_{k=1}^{n} P_{k, y_k} \tag{1}$$

where $y_0$ denotes the start tag of $s_{ij}$ and $y_{n+1}$ denotes the end tag of $s_{ij}$; both are used only as markers and are not included in the final predicted tag sequence. $A_{y_k, y_{k+1}}$ denotes the probability of transitioning from label $y_k$ to label $y_{k+1}$, and $P_{k, y_k}$, obtained from the output matrix of the BiLSTM-based context feature extraction part, denotes the probability that $x_{ijk}$ in $s_{ij}$ bears label $y_k$. The score of the sequence $y=(y_1,y_2,y_3,\ldots,y_n)$ is computed from the transition matrix and the output matrix of the context feature extraction part, and fed into the softmax function; the probability of each possible predicted sequence of $s_{ij}$ is computed by equation (2):

$$p(y \mid s_{ij}) = \frac{e^{score(s_{ij}, y)}}{\sum_{\tilde{y} \in Y_X} e^{score(s_{ij}, \tilde{y})}} \tag{2}$$

where $Y_X$ denotes all possible predicted tag sequences of $s_{ij}$, $\tilde{y}$ denotes one candidate tag sequence, and $y$ denotes the actual tag sequence.

During training, in order to obtain the optimal predicted tag sequence, $p(y \mid s_{ij})$ must be maximized; for ease of calculation, the log-likelihood of $p(y \mid s_{ij})$ is taken based on equation (3):

$$\log p(y \mid s_{ij}) = score(s_{ij}, y) - \log \sum_{\tilde{y} \in Y_X} e^{score(s_{ij}, \tilde{y})} \tag{3}$$

By maximizing $\log p(y \mid s_{ij})$, the globally optimal tag sequence is obtained by the CRF-based encoding part. In the decoding stage, the sequence with the highest overall score is obtained as the optimal tag sequence based on equation (4) and serves as the output of the CRF-based globally optimal tag encoding:

$$y^{*} = \mathop{\arg\max}_{\tilde{y} \in Y_X} \; score(s_{ij}, \tilde{y}) \tag{4}$$
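The scoring of equation (1) and the decoding of equation (4) can be sketched in a few lines of numpy; virtual start/end tags occupy indices m and m+1 of the transition matrix, and the matrices below are random stand-ins for trained parameters:

```python
import numpy as np

def score(P, A, y):
    """Equation (1): emission scores P[k, y_k] plus transitions A[y_k, y_{k+1}],
    with the virtual start tag at index m and the end tag at index m+1."""
    m = P.shape[1]
    path = [m] + list(y) + [m + 1]   # prepend start tag, append end tag
    trans = sum(A[path[k], path[k + 1]] for k in range(len(path) - 1))
    emit = sum(P[k, y[k]] for k in range(len(y)))
    return trans + emit

def viterbi(P, A):
    """Equation (4): the tag sequence with the highest overall score."""
    n, m = P.shape
    delta = A[m, :m] + P[0]          # paths ending at each possible first tag
    back = []
    for k in range(1, n):
        # cand[i, j] = best score ending in tag i, then transitioning to j.
        cand = delta[:, None] + A[:m, :m] + P[k]
        back.append(cand.argmax(axis=0))
        delta = cand.max(axis=0)
    delta = delta + A[:m, m + 1]     # transition to the virtual end tag
    best = [int(delta.argmax())]
    for bp in reversed(back):
        best.append(int(bp[best[-1]]))
    return best[::-1]

rng = np.random.default_rng(2)
m, n = 3, 4                           # 3 toy tags, 4 characters
P = rng.normal(size=(n, m))           # stand-in for the BiLSTM output matrix
A = rng.normal(size=(m + 2, m + 2))   # transition matrix, (m+2) x (m+2)
y_star = viterbi(P, A)
print(y_star)
```

Training would additionally require the log-sum-exp normalizer of equation (3) (the forward algorithm); only the decoding stage is shown here.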
