Information extraction method and device, computer equipment and storage medium

Document No.: 1953433    Publication date: 2021-12-10

Reading note: This invention, "Information extraction method and device, computer equipment and storage medium" (信息抽取方法、装置、计算机设备和存储介质), was created by 张文泽, 文博, 刘云峰 and 吴悦 on 2021-08-11. Abstract: The application relates to an information extraction method, an information extraction device, computer equipment and a storage medium. The method comprises the following steps: acquiring a target question and target content; inputting the target question and the target content into a pre-trained information extraction model for information extraction to obtain target information which is output by the information extraction model and used for answering the target question; the information extraction model is used for determining a target probability of extracting information from a preset vocabulary according to the target question and the target content, and extracting the target information from the preset vocabulary or the target content according to the target probability. By adopting the method, the difficulty of information extraction can be reduced, and the accuracy of information extraction can be improved.

1. An information extraction method, the method comprising:

acquiring a target question and target content;

inputting the target question and the target content into a pre-trained information extraction model for information extraction to obtain target information which is output by the information extraction model and used for answering the target question;

the information extraction model is used for determining a target probability of extracting information from a preset vocabulary according to the target question and the target content, and extracting the target information from the preset vocabulary or the target content according to the target probability.

2. The method of claim 1, wherein the information extraction model comprises an encoding sub-model and a decoding sub-model, and wherein determining a target probability of extracting information from a preset vocabulary according to the target question and the target content and extracting the target information from the preset vocabulary or the target content according to the target probability comprises:

encoding the target question and the target content by using the encoding sub-model to obtain feature vectors;

performing multiple rounds of decoding processing on the feature vectors by using the decoding sub-model to obtain the target probability corresponding to each round of decoding, a first probability distribution corresponding to the preset vocabulary and a second probability distribution corresponding to the feature vectors; the first probability distribution is used for representing first extracted probabilities respectively corresponding to the words in the preset vocabulary, and the second probability distribution is used for representing second extracted probabilities respectively corresponding to the words in the target content;

determining a third probability distribution corresponding to each round of decoding according to the target probability, the first probability distribution and the second probability distribution corresponding to each round of decoding; the third probability distribution is used for representing a third extracted probability respectively corresponding to each word in the preset vocabulary and the target content;

and extracting information according to the third probability distribution corresponding to each round of decoding to obtain a plurality of candidate information, and screening the target information from the candidate information.

3. The method of claim 2, wherein the feature vectors include a classification feature vector, a question feature vector and a content feature vector, the decoding sub-model includes a bidirectional long short-term memory network, a first fully-connected layer and a second fully-connected layer, and performing multiple rounds of decoding processing on the feature vectors by using the decoding sub-model to obtain the target probability corresponding to each round of decoding, the first probability distribution corresponding to the preset vocabulary and the second probability distribution corresponding to the feature vectors comprises:

for each round of decoding, splicing the classification feature vector with the network feature vector corresponding to the current round of the bidirectional long short-term memory network to obtain a spliced vector, and inputting the spliced vector into the first fully-connected layer to obtain the target probability;

inputting the classification feature vector into the second fully-connected layer to obtain the first probability distribution;

and performing regularization processing on the question feature vector and the content feature vector, and obtaining the second probability distribution according to a regularization result.

4. The method of claim 2, wherein determining a third probability distribution for each decoding round according to the target probability, the first probability distribution, and the second probability distribution for each decoding round comprises:

for each round of decoding, determining a first weight corresponding to the first probability distribution and a second weight corresponding to the second probability distribution according to the target probability;

and performing weighted summation processing on the first probability distribution and the second probability distribution according to the first weight and the second weight to obtain the third probability distribution.

5. The method of claim 2, wherein the extracting information according to the third probability distribution corresponding to each decoding round to obtain a plurality of candidate information comprises:

and for each round of decoding, determining, according to the third probability distribution, the word with the maximum third extracted probability among the preset vocabulary and the target content, and determining the word with the maximum third extracted probability as the candidate information.

6. The method of claim 2, wherein encoding the target question and the target content by using the encoding sub-model to obtain feature vectors comprises:

splicing the target question and the target content to obtain spliced information;

performing word segmentation processing on the spliced information to obtain a plurality of information segments; wherein the information segments comprise a classification identifier segment;

inputting the plurality of information segments into the encoding sub-model for encoding processing to obtain a feature vector corresponding to each information segment output by the encoding sub-model; wherein the classification identifier segment corresponds to a classification feature vector.

7. The method of claim 1, wherein before inputting the target question and the target content into a pre-trained information extraction model for information extraction, the method further comprises:

acquiring a general corpus from the Internet by using a crawler tool; wherein the general corpus consists of questions, content and answers;

performing data processing on the general corpus to obtain a training question, training content and a training label;

and performing model training according to the training question, the training content and the training label to obtain the information extraction model.

8. An information extraction apparatus, characterized in that the apparatus comprises:

the question content acquisition module is used for acquiring a target question and target content;

the information extraction module is used for inputting the target question and the target content into a pre-trained information extraction model for information extraction to obtain target information which is output by the information extraction model and used for answering the target question;

the information extraction model is used for determining a target probability of extracting information from a preset vocabulary according to the target question and the target content, and extracting the target information from the preset vocabulary or the target content according to the target probability.

9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 7.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.

Technical Field

The present application relates to the field of information extraction technologies, and in particular, to an information extraction method and apparatus, a computer device, and a storage medium.

Background

The acquisition of user information is an essential step in industries such as insurance and banking. Extracting important information quickly and accurately is crucial for the development of enterprises and for improving service quality and service indexes.

In the related art, model training is usually performed on training samples and their corresponding manual labels to obtain an information extraction model. A document containing the information to be extracted is then input into the information extraction model, which extracts the information from the document.

However, this information extraction approach has the following problems: it is difficult for the information extraction model to extract information from the document, or the information the model extracts from the document is inaccurate.

Disclosure of Invention

In view of the above, it is necessary to provide an information extraction method, an information extraction apparatus, a computer device, and a storage medium, which can reduce difficulty in information extraction and improve accuracy of information extraction.

An information extraction method, the method comprising:

acquiring a target question and target content;

inputting the target question and the target content into a pre-trained information extraction model for information extraction to obtain target information which is output by the information extraction model and used for answering the target question;

the information extraction model is used for determining the target probability of extracting information from the preset vocabulary according to the target question and the target content, and extracting the target information from the preset vocabulary or the target content according to the target probability.

In one embodiment, the information extraction model includes an encoding sub-model and a decoding sub-model, and the determining the target probability of extracting the information from the preset vocabulary according to the target question and the target content and extracting the target information from the preset vocabulary or the target content according to the target probability includes:

encoding the target question and the target content by using the encoding sub-model to obtain feature vectors;

performing multiple rounds of decoding processing on the feature vectors by using the decoding sub-model to obtain a target probability corresponding to each round of decoding, a first probability distribution corresponding to a preset vocabulary and a second probability distribution corresponding to the feature vectors; the first probability distribution is used for representing first extracted probabilities respectively corresponding to the words in the preset vocabulary, and the second probability distribution is used for representing second extracted probabilities respectively corresponding to the words in the target content;

determining a third probability distribution corresponding to each round of decoding according to the target probability, the first probability distribution and the second probability distribution corresponding to each round of decoding; the third probability distribution is used for representing a third extracted probability respectively corresponding to each word in the preset vocabulary and the target content;

and extracting information according to the third probability distribution corresponding to each round of decoding to obtain a plurality of candidate information, and screening target information from the candidate information.

In one embodiment, the feature vectors include a classification feature vector, a question feature vector and a content feature vector, the decoding sub-model includes a bidirectional long short-term memory network, a first fully-connected layer and a second fully-connected layer, and the performing multiple rounds of decoding processing on the feature vectors by using the decoding sub-model to obtain a target probability corresponding to each round of decoding, a first probability distribution corresponding to a preset vocabulary and a second probability distribution corresponding to the feature vectors includes:

for each round of decoding, splicing the classification feature vector with the network feature vector corresponding to the current round of the bidirectional long short-term memory network to obtain a spliced vector, and inputting the spliced vector into the first fully-connected layer to obtain the target probability;

inputting the classification feature vector into the second fully-connected layer to obtain the first probability distribution;

and performing regularization processing on the question feature vector and the content feature vector, and obtaining the second probability distribution according to a regularization result.

In one embodiment, the determining a third probability distribution corresponding to each decoding round according to the target probability, the first probability distribution and the second probability distribution corresponding to each decoding round includes:

for each round of decoding, determining a first weight corresponding to the first probability distribution and a second weight corresponding to the second probability distribution according to the target probability;

and carrying out weighted summation processing on the first probability distribution and the second probability distribution according to the first weight and the second weight to obtain a third probability distribution.

In one embodiment, the extracting information according to the third probability distribution corresponding to each decoding round to obtain a plurality of candidate information includes:

and for each round of decoding, determining, according to the third probability distribution, the word with the maximum third extracted probability among the preset vocabulary and the target content, and determining that word as the candidate information.

In one embodiment, encoding the target question and the target content by using the encoding sub-model to obtain the feature vectors includes:

splicing the target question and the target content to obtain spliced information;

performing word segmentation processing on the spliced information to obtain a plurality of information segments; the information segments include a classification identifier segment;

inputting the plurality of information segments into the encoding sub-model for encoding processing to obtain the feature vector corresponding to each information segment output by the encoding sub-model; wherein the classification identifier segment corresponds to the classification feature vector.

In one embodiment, before inputting the target question and the target content into the pre-trained information extraction model for information extraction, the method further includes:

acquiring a general corpus from the Internet by using a crawler tool; the general corpus consists of questions, content and answers;

performing data processing on the general corpus to obtain training questions, training content and training labels;

and performing model training according to the training questions, the training content and the training labels to obtain the information extraction model.

An information extraction apparatus, the apparatus comprising:

the question content acquisition module is used for acquiring a target question and target content;

the information extraction module is used for inputting the target question and the target content into a pre-trained information extraction model for information extraction to obtain target information which is output by the information extraction model and used for answering the target question;

the information extraction model is used for determining the target probability of extracting information from the preset vocabulary according to the target question and the target content, and extracting the target information from the preset vocabulary or the target content according to the target probability.

In one embodiment, the information extraction model includes an encoding sub-model and a decoding sub-model, and the information extraction module includes:

the encoding sub-module is used for encoding the target question and the target content by using the encoding sub-model to obtain feature vectors;

the decoding sub-module is used for performing multiple rounds of decoding processing on the feature vectors by using the decoding sub-model to obtain a target probability corresponding to each round of decoding, a first probability distribution corresponding to a preset vocabulary and a second probability distribution corresponding to the feature vectors; the first probability distribution is used for representing first extracted probabilities respectively corresponding to the words in the preset vocabulary, and the second probability distribution is used for representing second extracted probabilities respectively corresponding to the words in the target content;

the probability determination sub-module is used for determining a third probability distribution corresponding to each round of decoding according to the target probability, the first probability distribution and the second probability distribution corresponding to each round of decoding; the third probability distribution is used for representing a third extracted probability respectively corresponding to each word in the preset vocabulary and the target content;

and the information extraction sub-module is used for extracting information according to the third probability distribution corresponding to each round of decoding to obtain a plurality of candidate information and screening the target information from the candidate information.

In one embodiment, the feature vectors include a classification feature vector, a question feature vector and a content feature vector, and the decoding sub-model includes a bidirectional long short-term memory network, a first fully-connected layer and a second fully-connected layer. The decoding sub-module is specifically configured to, for each round of decoding, splice the classification feature vector with the network feature vector corresponding to the current round of the bidirectional long short-term memory network to obtain a spliced vector, and input the spliced vector into the first fully-connected layer to obtain the target probability; input the classification feature vector into the second fully-connected layer to obtain the first probability distribution; and perform regularization processing on the question feature vector and the content feature vector, obtaining the second probability distribution according to the regularization result.

In one embodiment, the probability determining submodule is specifically configured to determine, for each decoding round, a first weight corresponding to the first probability distribution and a second weight corresponding to the second probability distribution according to the target probability; and carrying out weighted summation processing on the first probability distribution and the second probability distribution according to the first weight and the second weight to obtain a third probability distribution.

In one embodiment, the information extraction sub-module is specifically configured to, for each round of decoding, determine, according to the third probability distribution, the word with the largest third extracted probability among the preset vocabulary and the target content, and determine that word as the candidate information.

In one embodiment, the encoding sub-module is specifically configured to splice the target question and the target content to obtain spliced information; perform word segmentation processing on the spliced information to obtain a plurality of information segments, the information segments including a classification identifier segment; and input the plurality of information segments into the encoding sub-model for encoding processing to obtain the feature vector corresponding to each information segment output by the encoding sub-model, wherein the classification identifier segment corresponds to the classification feature vector.

In one embodiment, the apparatus further comprises:

the corpus acquisition module is used for acquiring a general corpus from the Internet by using a crawler tool; the general corpus consists of questions, content and answers;

the corpus processing module is used for performing data processing on the general corpus to obtain training questions, training content and training labels;

and the training module is used for performing model training according to the training questions, the training content and the training labels to obtain the information extraction model.

A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:

acquiring a target question and target content;

inputting the target question and the target content into a pre-trained information extraction model for information extraction to obtain target information which is output by the information extraction model and used for answering the target question;

the information extraction model is used for determining the target probability of extracting information from the preset vocabulary according to the target question and the target content, and extracting the target information from the preset vocabulary or the target content according to the target probability.

A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:

acquiring a target question and target content;

inputting the target question and the target content into a pre-trained information extraction model for information extraction to obtain target information which is output by the information extraction model and used for answering the target question;

the information extraction model is used for determining the target probability of extracting information from the preset vocabulary according to the target question and the target content, and extracting the target information from the preset vocabulary or the target content according to the target probability.

The information extraction method, the information extraction apparatus, the computer device and the storage medium acquire a target question and target content, and input the target question and the target content into a pre-trained information extraction model for information extraction, obtaining target information which is output by the information extraction model and used for answering the target question. The information extraction model first determines the target probability of extracting information from the preset vocabulary according to the target question and the target content, and then extracts the target information from the preset vocabulary or the target content according to the target probability, so that the problem of information being difficult to extract from the target content can be avoided and the difficulty of information extraction reduced; moreover, because the preset vocabulary provides information beyond the target content, the accuracy of information extraction can be improved.

Drawings

FIG. 1 is a flow diagram illustrating a method for information extraction in one embodiment;

FIG. 2 is a flow chart illustrating the target information extraction step in one embodiment;

FIG. 3 is a flow diagram illustrating the steps of an encoding process in one embodiment;

FIG. 4 is a flow diagram illustrating the steps of the decoding process in one embodiment;

FIG. 5 is a flowchart illustrating the step of determining a third probability distribution in one embodiment;

FIG. 6 is a schematic flow chart diagram illustrating the model training steps in one embodiment;

FIG. 7 is a block diagram showing the structure of an information extracting apparatus according to an embodiment;

FIG. 8 is a diagram illustrating an internal structure of a computer device according to an embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

The present application provides an information extraction method. As shown in FIG. 1, the method is described as applied to a terminal for illustration; it can be understood that the method can also be applied to a server, or to a system including a terminal and a server and implemented through interaction between the terminal and the server. The terminal can be, but is not limited to, various personal computers, notebook computers, smart phones and tablet computers, and the server can be implemented by an independent server or a server cluster composed of a plurality of servers. In the disclosed embodiment, the method may include the following steps:

step 101, obtaining a target question and target content.

The target question indicates the target of information extraction, and the target content indicates the basis for information extraction. For example, the target question is "what color of clothes makes the wearer look fair", and the target content includes "black is a versatile color … bright red can make a person look fair …".

The terminal may acquire a target question and target content input by a user, or acquire a target question and target content stored in the terminal in advance; it may also acquire a target question input by the user together with target content stored in advance. The embodiment of the present disclosure does not limit the manner of acquiring the target question and the target content.

And 102, inputting the target question and the target content into a pre-trained information extraction model for information extraction to obtain target information which is output by the information extraction model and used for answering the target question.

The information extraction model is used for determining the target probability of extracting information from the preset vocabulary according to the target question and the target content, and extracting the target information from the preset vocabulary or the target content according to the target probability. The preset vocabulary includes a plurality of words, and it may be constructed for a specific technical field (such as the financial field or the insurance field) or for a general scenario, which is not limited in the embodiments of the present disclosure.

The terminal is provided with the pre-trained information extraction model and the preset vocabulary. After the target question and the target content are obtained, they are input into the information extraction model, and the information extraction model determines the target probability of extracting information from the preset vocabulary according to the target question and the target content. If the target probability is greater than a preset probability threshold, the target information is extracted from the preset vocabulary; if the target probability is less than or equal to the preset probability threshold, the target information is extracted from the target content.

For example, suppose the preset probability threshold is 70%. If the information extraction model determines a target probability of 80% according to the target question and the target content, the target probability is greater than the preset probability threshold, and the target information is extracted from the preset vocabulary; if the model determines a target probability of 60%, the target probability is less than the preset probability threshold, and the target information is extracted from the target content.
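
As a minimal sketch of this routing logic (the threshold value and function name are illustrative assumptions, not part of the patent):

```python
# Hypothetical sketch of the threshold-based routing described above; the
# 0.7 threshold mirrors the 70% example and is an illustrative assumption.
PRESET_PROBABILITY_THRESHOLD = 0.7

def route_extraction(target_probability: float) -> str:
    """Decide where the target information is extracted from."""
    if target_probability > PRESET_PROBABILITY_THRESHOLD:
        return "preset_vocabulary"  # e.g. target probability 80% > 70%
    return "target_content"         # e.g. target probability 60% <= 70%

print(route_extraction(0.8))  # -> preset_vocabulary
print(route_extraction(0.6))  # -> target_content
```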

In this information extraction method, a target question and target content are obtained and input into a pre-trained information extraction model for information extraction, obtaining target information which is output by the information extraction model and used for answering the target question. The information extraction model first determines the target probability of extracting information from the preset vocabulary according to the target question and the target content, and then extracts the target information from the preset vocabulary or the target content according to the target probability, so that the problem of information being difficult to extract from the target content can be avoided and the difficulty of information extraction reduced; moreover, because the preset vocabulary provides information beyond the target content, the accuracy of information extraction can be improved.

In one embodiment, the information extraction model includes an encoding sub-model and a decoding sub-model, and as shown in fig. 2, the step of determining a target probability of extracting information from the preset vocabulary according to the target question and the target content and extracting the target information from the preset vocabulary or the target content according to the target probability may include:

step 201, using the coding sub-model to code the target problem and the target content to obtain the feature vector.

The feature vectors include a question feature vector and a content feature vector.

The target question and the target content are input into the encoding sub-model of the information extraction model, and the encoding sub-model encodes them to obtain the question feature vector corresponding to the target question and the content feature vector corresponding to the target content.

Step 202, performing multiple rounds of decoding processing on the feature vectors by using the decoding sub-model to obtain a target probability corresponding to each round of decoding, a first probability distribution corresponding to the preset vocabulary and a second probability distribution corresponding to the feature vectors.

The first probability distribution is used for representing the first extracted probabilities respectively corresponding to the words in the preset vocabulary, and the second probability distribution is used for representing the second extracted probabilities respectively corresponding to the words in the target content; both the first and the second extracted probability characterize the probability that a word is extracted.

The decoding sub-model can perform the multiple rounds of decoding processing in various ways. One way is as follows: the decoding sub-model performs N rounds of decoding processing on the feature vectors, where N is a maximum number of decoding rounds preset by the user and N is a positive integer. In each round of decoding, the decoding sub-model obtains a target probability, a first probability distribution corresponding to the preset vocabulary and a second probability distribution corresponding to the feature vectors.

For example, in the first round of decoding, the decoding sub-model obtains a target probability of 80%; the first extracted probability of word 1 in the preset vocabulary is a1, that of word 2 is a2, and so on; the second extracted probability of word 1 in the target content is b1, that of word 2 is b2, and so on.

Another way is as follows: the decoding sub-model performs decoding processing on the feature vectors and ends the decoding once an end identifier is generated. The embodiment of the present disclosure does not limit the number of decoding rounds.
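
The two stopping strategies can be sketched as follows; `decode_step`, `END_TOKEN` and the returned tuple are illustrative placeholders under stated assumptions, not the patent's actual interface:

```python
END_TOKEN = "[END]"  # assumed end identifier

def multi_round_decode(decode_step, feature_vectors, max_rounds: int):
    """Run at most `max_rounds` decoding rounds, stopping early on END_TOKEN."""
    outputs, state = [], None
    for _ in range(max_rounds):  # N rounds, N preset by the user
        word, target_prob, first_dist, second_dist, state = decode_step(
            feature_vectors, state)
        outputs.append((word, target_prob, first_dist, second_dist))
        if word == END_TOKEN:    # the other mode: stop once the end
            break                # identifier is generated
    return outputs
```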

Step 203, determining a third probability distribution corresponding to each round of decoding according to the target probability, the first probability distribution and the second probability distribution corresponding to each round of decoding.

The third probability distribution is used for representing the third extracted probability respectively corresponding to each word in the preset vocabulary and the target content; the third extracted probability characterizes the probability that a word is extracted.

In each round of decoding, the first probability distribution and the second probability distribution are combined according to the target probability to obtain the third extracted probability corresponding to each word in the preset vocabulary and the third extracted probability corresponding to each word in the target content.

Step 204, extracting information according to the third probability distribution corresponding to each round of decoding to obtain a plurality of candidate information, and screening the target information from the candidate information.

After each round of decoding, information extraction can be performed according to the third extracted probability corresponding to each word in the preset vocabulary and the third extracted probability corresponding to each word in the target content, so as to obtain at least one piece of candidate information.

In one embodiment, for each round of decoding, the word with the largest third extracted probability among the preset vocabulary and the target content is determined according to the third probability distribution, and that word is determined as the candidate information.

For example, the words in the preset vocabulary and the words in the target content are sorted in descending order of the third extracted probability, and the word with the largest third extracted probability is determined as the candidate information.

In practical applications, words whose third extracted probability is greater than a preset extraction threshold may also be determined as candidate information. The embodiment of the present disclosure does not limit the manner of extracting candidate information.

After each round of decoding, at least one piece of candidate information can be obtained, so multiple rounds of decoding yield a plurality of candidate information, from which the terminal screens out the target information. Specifically, the candidate information occurring most frequently may be determined as the target information, or the candidate information with the largest third extracted probability may be determined as the target information; the screening manner is not limited in the embodiments of the present disclosure.
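
A small sketch of this per-round argmax plus frequency-based screening, assuming PyTorch tensors; the function names are illustrative:

```python
import collections
import torch

def pick_candidate(third_dist: torch.Tensor, words: list) -> str:
    """Per round: the word with the largest third extracted probability.
    `third_dist` ranges over the preset-vocabulary words followed by the
    target-content words, and `words` lists them in the same order."""
    return words[int(torch.argmax(third_dist))]

def screen_target(candidates: list) -> str:
    """Across rounds: keep the most frequent candidate (one of the two
    screening options described above)."""
    return collections.Counter(candidates).most_common(1)[0][0]
```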

In the above embodiment, the terminal encodes the target question and the target content with the encoding sub-model to obtain feature vectors; performs multiple rounds of decoding processing on the feature vectors with the decoding sub-model to obtain a target probability corresponding to each round of decoding, a first probability distribution corresponding to the preset vocabulary and a second probability distribution corresponding to the feature vectors; determines a third probability distribution corresponding to each round of decoding according to the target probability, the first probability distribution and the second probability distribution; and extracts information according to the third probability distribution of each round to obtain a plurality of candidate information, screening the target information from them. By encoding and decoding with the two sub-models, the information extraction model determines whether to extract the information from the preset vocabulary or from the target content. Because the preset vocabulary provides information beyond the target content, the difficulty of information extraction can be reduced and the accuracy of information extraction improved.

In an embodiment, as shown in fig. 3, the step of obtaining the feature vector by encoding the target question and the target content by using the encoding sub-model may include:

and step 2011, splicing the target problem and the target content to obtain splicing information.

After the target question and the target content are input into the information extraction model, the information extraction model splices them. Specifically, a classification mark [cls] is added before the target question, and a sentence mark [sep] is added between the target question and the target content.

For example, the spliced information is: [cls] what color of clothes makes the wearer look fair [sep] black is a versatile color … bright red can make a person look fair …

In practical application, other splicing manners may also be adopted, which is not limited in the embodiments of the present disclosure.

Step 2012, performing word segmentation processing on the spliced information to obtain a plurality of information segments.

The information extraction model performs word segmentation processing on the spliced information according to preset rules to obtain a plurality of information segments, for example, a classification identifier segment corresponding to the classification mark, question information segments corresponding to the target question, and content information segments corresponding to the target content.

If the text of the target question is long, a plurality of question information segments can be obtained; if the text of the target content is long, a plurality of content information segments can be obtained.

The preset rules may include word length, the Euclidean distance between words, and the like; the preset rules are not limited in the embodiment of the disclosure. It can be understood that the number of information segments obtained by word segmentation varies with the actual segmentation situation, which the embodiment of the present disclosure also does not limit.
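
For illustration only, a BERT-style tokenizer reproduces the splicing and segmentation of steps 2011 and 2012; the checkpoint name is an assumption, and the patent's own segmentation rules may differ:

```python
from transformers import BertTokenizer

# Assumed checkpoint; any BERT-style tokenizer inserts the classification
# mark [CLS] before the question and the sentence mark [SEP] between the
# question and the content.
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
question = "穿什么颜色的衣服显白"   # the document's example question
content = "黑色是百搭色。正红色能让人显得白皙。"
enc = tokenizer(question, content)  # -> [CLS] question [SEP] content [SEP]
segments = tokenizer.convert_ids_to_tokens(enc["input_ids"])
print(segments[0])                  # '[CLS]' — the classification identifier segment
```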

Step 2013, inputting the plurality of information segments into the encoding sub-model for encoding processing to obtain the feature vector corresponding to each information segment output by the encoding sub-model.

The encoding sub-model may be a BERT (Bidirectional Encoder Representations from Transformers) model.

After the information extraction model performs word segmentation to obtain the plurality of information segments, the information segments are input into the encoding sub-model, which encodes them and outputs the feature vector corresponding to each information segment.

For example, the encoding sub-model outputs a classification feature vector H_cls corresponding to the classification identifier segment, a question feature vector H_q corresponding to the question information segment, and a content feature vector H_p corresponding to the content information segment.
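
A sketch of the encoding step under the assumption of a standard BERT encoder; the checkpoint name and the slicing of the output into H_cls, H_q and H_p are illustrative:

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")  # assumed checkpoint
model = BertModel.from_pretrained("bert-base-chinese")

enc = tokenizer("穿什么颜色的衣服显白", "黑色是百搭色。", return_tensors="pt")
with torch.no_grad():
    hidden = model(**enc).last_hidden_state[0]  # (seq_len, 768): one vector per segment

sep = enc["input_ids"][0].tolist().index(tokenizer.sep_token_id)
H_cls = hidden[0]         # classification feature vector (the [CLS] position)
H_q = hidden[1:sep]       # question feature vectors
H_p = hidden[sep + 1:-1]  # content feature vectors (drop the trailing [SEP])
```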

In the above embodiment, the terminal splices the target question and the target content to obtain spliced information; performs word segmentation processing on the spliced information to obtain a plurality of information segments; and inputs the information segments into the encoding sub-model for encoding to obtain the feature vector corresponding to each information segment output by the encoding sub-model. By splicing, segmenting and encoding the target question and the target content with the information extraction model, the feature vectors corresponding to the target question and the target content are obtained, so that the target probability of extracting information from the preset vocabulary can subsequently be determined and the target information can then be determined.

In one embodiment, the feature vectors include a classification feature vector, a question feature vector and a content feature vector; the decoding sub-model includes a bidirectional long short-term memory network (BiLSTM), a first fully-connected layer and a second fully-connected layer. As shown in FIG. 4, the step of performing multiple rounds of decoding processing on the feature vectors by using the decoding sub-model to obtain a target probability corresponding to each round of decoding, a first probability distribution corresponding to the preset vocabulary and a second probability distribution corresponding to the feature vectors may include:

step 2021, for each decoding round, performing stitching processing on the classified feature vectors and the network feature vectors corresponding to the current round of the bidirectional long-short term memory network to obtain stitched vectors, and inputting the stitched vectors into the first full connection layer to obtain the target probability.

In each round of decoding, the decoding sub-model splices the classification feature vector with the network feature vector corresponding to the current round of the BiLSTM to obtain a spliced vector, and inputs the spliced vector into the first fully-connected layer; the output of the first fully-connected layer is then activated with a first preset activation function to obtain the target probability.

For example, in the first round of decoding, the decoding sub-model splices the classification feature vector H_cls1 with the network feature vector Ht1 corresponding to the first round of the BiLSTM to obtain a spliced vector H1', inputs H1' into the first fully-connected layer, and activates the first-round output of the first fully-connected layer with the first preset activation function to obtain the target probability Pt1. Then, H_cls1 is input into the BiLSTM to obtain the classification feature vector H_cls2 output by the BiLSTM, and the BiLSTM is updated to obtain the network feature vector Ht2 corresponding to its second round. In the second round of decoding, the decoding sub-model splices H_cls2 with Ht2 to obtain a spliced vector H2', inputs H2' into the first fully-connected layer, and activates the second-round output with the first preset activation function to obtain the target probability Pt2. By analogy, when the maximum number of decoding rounds set by the user is reached or the end identifier is generated, the decoding process ends.

The first preset activation function may be a sigmoid function, or may be another function, which is not limited in this disclosure.
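
A minimal sketch of step 2021, assuming sigmoid as the first preset activation function; the dimensions are illustrative:

```python
import torch
import torch.nn as nn

hidden_dim = 768                         # assumed size of H_cls and Ht
first_fc = nn.Linear(hidden_dim * 2, 1)  # the first fully-connected layer

def target_probability(H_cls: torch.Tensor, Ht: torch.Tensor) -> torch.Tensor:
    """Splice H_cls with the current-round BiLSTM vector Ht, apply the first
    fully-connected layer, then activate with sigmoid to get Pt."""
    spliced = torch.cat([H_cls, Ht], dim=-1)  # the spliced vector H'
    return torch.sigmoid(first_fc(spliced))   # Pt in (0, 1)
```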

Step 2022, inputting the classification feature vector into the second fully-connected layer to obtain a first probability distribution.

In each round of decoding, the classification feature vector is input into the second fully-connected layer, and the output of the second fully-connected layer is activated with a second preset activation function to obtain the first probability distribution corresponding to the preset vocabulary.

For example, in the first round of decoding, the classification feature vector H_cls1 is input into the second fully-connected layer, and the first-round output of the second fully-connected layer is activated with the second preset activation function to obtain the first probability distribution Pwt1 corresponding to the preset vocabulary. Then, the classification feature vector H_cls2 is input into the second fully-connected layer, and the second-round output is activated with the second preset activation function to obtain the first probability distribution Pwt2. By analogy, the first probability distribution of each round is obtained.

The second preset activation function may be a softMax function, or may be another function, which is not limited in this disclosure.

It is to be understood that the first preset activation function may be the same as or different from the second preset activation function.
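
Step 2022 can be sketched the same way, assuming softMax as the second preset activation function; the vocabulary size is an assumed value:

```python
import torch
import torch.nn as nn

hidden_dim, vocab_size = 768, 30000            # assumed dimensions
second_fc = nn.Linear(hidden_dim, vocab_size)  # the second fully-connected layer

def first_distribution(H_cls: torch.Tensor) -> torch.Tensor:
    """Pwt: one extracted probability per word of the preset vocabulary."""
    return torch.softmax(second_fc(H_cls), dim=-1)
```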

Step 2023, regularizing the question feature vector and the content feature vector, and obtaining the second probability distribution according to the regularization result.

The decoding sub-model performs regularization processing on the question feature vector and the content feature vector to obtain a regularization result, and then calculates the second probability distribution from the regularization result.

For example, the question feature vector H_q and the content feature vector H_p are subjected to L2 regularization to obtain the regularization results Lq and Lp, and the second probability distribution Put is then calculated by the formula Put = softMax(Lp^T · Lq), where Lp^T is the transpose of Lp. In practical applications, other regularization processing manners may also be adopted, which is not limited in this disclosure.

It is to be understood that, when the texts of the target question and the target content are long, the question feature vector may be a matrix composed of the feature vectors corresponding to the plurality of question information segments, and the content feature vector may be a matrix composed of the feature vectors corresponding to the plurality of content information segments.
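
A sketch of step 2023 under stated assumptions: the exact shapes in Put = softMax(Lp^T · Lq) are not spelled out above, so this version averages the question vectors to obtain one score per content word; that reduction is an illustrative choice:

```python
import torch
import torch.nn.functional as F

def second_distribution(H_q: torch.Tensor, H_p: torch.Tensor) -> torch.Tensor:
    """Put: one extracted probability per word of the target content."""
    Lq = F.normalize(H_q, p=2, dim=-1)  # L2 regularization of H_q, (q_len, d)
    Lp = F.normalize(H_p, p=2, dim=-1)  # L2 regularization of H_p, (p_len, d)
    scores = Lp @ Lq.mean(dim=0)        # assumed reduction over question positions
    return torch.softmax(scores, dim=-1)
```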

In the above embodiment, for each round of decoding, the classification feature vector and the network feature vector corresponding to the current round of the bidirectional long short-term memory network are spliced to obtain a spliced vector, and the spliced vector is input into the first fully-connected layer to obtain the target probability; the classification feature vector is input into the second fully-connected layer to obtain the first probability distribution; and the question feature vector and the content feature vector are regularized, with the second probability distribution obtained from the regularization result. In this way, the target probability, the first probability distribution and the second probability distribution of each round of decoding can be determined, so that candidate information can subsequently be screened out and the target information determined.

In one embodiment, as shown in fig. 5, the step of determining a third probability distribution corresponding to each decoding round according to the target probability, the first probability distribution and the second probability distribution corresponding to each decoding round may include:

step 2031, for each round of decoding, determining a first weight corresponding to the first probability distribution and a second weight corresponding to the second probability distribution according to the target probability.

In each round of decoding, the target probability is determined as the first weight corresponding to the first probability distribution, and the difference between 1 and the target probability is determined as the second weight corresponding to the second probability distribution.

For example, Pt is determined as the first weight corresponding to the first probability distribution Pwt, and 1 − Pt is determined as the second weight corresponding to the second probability distribution Put.

Step 2032, performing weighted summation processing on the first probability distribution and the second probability distribution according to the first weight and the second weight to obtain a third probability distribution.

The third probability distribution P is calculated according to the formula P = Pt · Pwt + (1 − Pt) · Put.
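
The weighted summation transcribes directly; aligning Pwt and Put on one joint axis (vocabulary words followed by content words) is an illustrative choice, and a word appearing in both would need its two terms summed, a case omitted here:

```python
import torch

def third_distribution(Pt: torch.Tensor, Pwt: torch.Tensor,
                       Put: torch.Tensor) -> torch.Tensor:
    """P = Pt * Pwt + (1 - Pt) * Put over the joint vocabulary/content axis."""
    return torch.cat([Pt * Pwt, (1 - Pt) * Put], dim=-1)
```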

In the above embodiment, for each round of decoding, a first weight corresponding to the first probability distribution and a second weight corresponding to the second probability distribution are determined according to the target probability, and the two probability distributions are weighted and summed according to these weights to obtain the third probability distribution. In this way, the first probability distribution and the second probability distribution are weighted according to the target probability to obtain the extracted probability of each word in the preset vocabulary and the target content, so that information extraction can subsequently be performed according to the extracted probability of each word.

In an embodiment, as shown in fig. 6, on the basis of the above embodiment, a training process of the information extraction model may also be included, such as the following steps:

and 301, acquiring the universal linguistic data from the Internet by using a crawler tool.

The general corpus consists of questions, content and answers.

A large amount of question-and-answer data exists on the Internet; crawling such data with a crawler tool yields the general corpus.

For example, the question "what color of clothes makes the wearer look fair" and the answer data "black is a versatile color … bright red can make a person look fair …" are crawled from an encyclopedia website on the Internet, yielding a general-corpus entry whose question is "what color of clothes makes the wearer look fair", whose content is "black is a versatile color … bright red can make a person look fair …", and whose answer is "black and bright red".

Step 302, performing data processing on the general corpus to obtain training questions, training content and training labels.

After acquiring the general corpus, the terminal performs data processing on it to obtain the training questions, training content and training labels used to train the information extraction model.

For example, from the general-corpus entry whose question is "what color of clothes makes the wearer look fair", whose content is "black is a versatile color … bright red can make a person look fair …" and whose answer is "black and bright red", the training question can be "what color of clothes makes the wearer look fair", the training content "black is a versatile color … bright red can make a person look fair …", and the training label "black and bright red". By analogy, a large number of training questions, training contents and training labels can be obtained to form a training set.
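
A sketch of the data processing, with field names assumed for illustration:

```python
def to_training_example(crawled: dict) -> dict:
    """Turn one crawled question-answer record into a training triple.
    The key names are assumptions, not the patent's actual format."""
    return {
        "training_question": crawled["question"],
        "training_content": crawled["content"],
        "training_label": crawled["answer"],
    }

example = to_training_example({
    "question": "穿什么颜色的衣服显白",
    "content": "黑色是百搭色。正红色能让人显得白皙。",
    "answer": "黑色和正红色",
})
```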

Step 303, performing model training according to the training questions, the training content and the training labels to obtain the information extraction model.

The training question and training content are input into the initial model to obtain a training result output by the initial model, and whether the model meets a preset convergence condition is judged according to the training label and the training result. If the condition is not met, the adjustable parameters of the model are adjusted and training continues; if it is met, training ends and the trained model is determined as the information extraction model.
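
A skeleton of this training procedure under the assumption of a standard gradient-descent setup; `model`, `loss_fn`, the optimizer and the convergence test are placeholders for whatever the implementation actually uses:

```python
import torch

def train(model, loss_fn, optimizer, dataset, max_epochs: int = 10,
          tol: float = 1e-4):
    """Train until a preset convergence condition (loss change < tol) is met."""
    prev_loss = float("inf")
    for _ in range(max_epochs):
        total = 0.0
        for question, content, label in dataset:
            optimizer.zero_grad()
            loss = loss_fn(model(question, content), label)  # result vs. label
            loss.backward()      # adjust the adjustable parameters
            optimizer.step()
            total += loss.item()
        if abs(prev_loss - total) < tol:  # preset convergence condition met
            break                         # training ends
        prev_loss = total
    return model  # the trained information extraction model
```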

In the above embodiment, a crawler tool is used to obtain a general corpus from the Internet; data processing is performed on the general corpus to obtain training questions, training content and training labels; and model training is performed according to them to obtain the information extraction model. Because the terminal derives the training questions, training content and training labels from a general corpus that is not limited to a particular field, the information extraction model trained on them has strong universality and transferability. Further, because the training labels can be obtained without manual labeling, labor cost can be reduced, label acquisition time saved, label acquisition and model training efficiency improved, the risk of user information leakage avoided, and information security improved.

It should be understood that, although the steps in the flowcharts of FIGS. 1 to 6 are shown in sequence as indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated otherwise, the order of these steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in FIGS. 1 to 6 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with sub-steps or stages of other steps.

In one embodiment, as shown in fig. 7, there is provided an information extraction apparatus including:

a question content acquiring module 401, configured to acquire a target question and target content;

an information extraction module 402, configured to input the target question and the target content into a pre-trained information extraction model for information extraction, so as to obtain target information output by the information extraction model and used for answering the target question;

the information extraction model is used for determining the target probability of extracting information from the preset vocabulary according to the target question and the target content, and extracting the target information from the preset vocabulary or the target content according to the target probability.

In one embodiment, the information extraction model includes an encoding sub-model and a decoding sub-model, and the information extraction module 402 includes:

the encoding sub-module is used for encoding the target question and the target content by using the encoding sub-model to obtain feature vectors;

the decoding submodule is used for carrying out multi-round decoding processing on the feature vector by utilizing the decoding submodel to obtain a target probability corresponding to each round of decoding, a first probability distribution corresponding to a preset word list and a second probability distribution corresponding to the feature vector; the first probability distribution is used for representing first extracted probabilities corresponding to all the vocabularies in the preset vocabulary table respectively, and the second probability distribution is used for representing second extracted probabilities corresponding to all the vocabularies in the target content respectively;

the probability determination submodule is used for determining a third probability distribution corresponding to each round of decoding according to the target probability, the first probability distribution and the second probability distribution corresponding to each round of decoding; the third probability distribution is used for representing a third extracted probability corresponding to each vocabulary in the preset vocabulary and the target content respectively;

and the information extraction submodule is used for extracting information according to the third probability distribution corresponding to each round of decoding to obtain a plurality of candidate information and screening target information from the candidate information.

In one embodiment, the feature vector includes a classification feature vector, a question feature vector and a content feature vector, and the decoding sub-model includes a bidirectional long short-term memory network, a first fully-connected layer and a second fully-connected layer. The decoding submodule is specifically configured to, for each round of decoding, splice the classification feature vector with the network feature vector output by the bidirectional long short-term memory network for the current round to obtain a spliced vector, and input the spliced vector into the first fully-connected layer to obtain the target probability; input the classification feature vector into the second fully-connected layer to obtain the first probability distribution; and perform regularization processing on the question feature vector and the content feature vector, and obtain the second probability distribution according to the regularization result.
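A minimal sketch of one round of decoding, under several assumptions, is given below: the hidden sizes, the sigmoid on the target probability, and reading "regularization" as length normalization followed by a softmax over dot-product scores are all choices of this sketch, not details fixed by the embodiment.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative decoding sub-model; dimensions and scoring are assumptions.
class DecodingSubmodel(nn.Module):
    def __init__(self, dim: int, vocab_size: int):
        super().__init__()  # dim assumed even so the BiLSTM output size is dim
        self.bilstm = nn.LSTM(dim, dim // 2, bidirectional=True, batch_first=True)
        self.fc_target = nn.Linear(2 * dim, 1)      # first fully-connected layer
        self.fc_vocab = nn.Linear(dim, vocab_size)  # second fully-connected layer

    def step(self, cls_vec, prev_emb, state, qc_vecs):
        # one round of decoding; qc_vecs holds question/content feature vectors
        out, state = self.bilstm(prev_emb.unsqueeze(1), state)
        net_vec = out.squeeze(1)                           # network feature vector
        spliced = torch.cat([cls_vec, net_vec], dim=-1)    # spliced vector
        p_target = torch.sigmoid(self.fc_target(spliced))  # target probability
        p_vocab = F.softmax(self.fc_vocab(cls_vec), -1)    # first distribution
        scores = torch.einsum("bd,bld->bl", net_vec,
                              F.normalize(qc_vecs, dim=-1))  # "regularization"
        p_content = F.softmax(scores, dim=-1)              # second distribution
        return p_target, p_vocab, p_content, state
```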

In one embodiment, the probability determination submodule is specifically configured to, for each round of decoding, determine a first weight corresponding to the first probability distribution and a second weight corresponding to the second probability distribution according to the target probability, and perform weighted summation on the first probability distribution and the second probability distribution according to the first weight and the second weight to obtain the third probability distribution.
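A natural reading, taken here as an assumption rather than a fixed formula, uses the target probability itself as the first weight and its complement as the second weight, with both distributions aligned beforehand to a single index space covering the preset word list plus the words of the target content:

```python
import torch

def third_distribution(p_target: torch.Tensor, p_vocab: torch.Tensor,
                       p_content: torch.Tensor) -> torch.Tensor:
    # Assumed weighting: first weight = target probability, second weight =
    # 1 - target probability; p_vocab and p_content are both assumed to be
    # padded to the same combined index space beforehand.
    return p_target * p_vocab + (1.0 - p_target) * p_content
```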

In one embodiment, the information extraction submodule is specifically configured to, for each round of decoding, determine, according to the third probability distribution, the word with the largest third extracted probability among the preset word list and the target content, and determine that word as the candidate information.
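A greedy selection consistent with this step might be sketched as follows; treating each round's argmax as one piece of candidate information is a straightforward reading, while the embodiment leaves the later screening criterion open.

```python
import torch

def pick_candidate(p_third: torch.Tensor, words: list[str]) -> str:
    # words lists the preset word list followed by the words of the target
    # content, aligned with the indices of the third probability distribution
    return words[int(torch.argmax(p_third))]
```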

In one embodiment, the coding submodule is specifically configured to splice the target question and the target content to obtain splicing information; perform word segmentation processing on the splicing information to obtain a plurality of information segments, the information segments including a classification identification segment; and input the plurality of information segments into the coding sub-model for coding processing to obtain the feature vector corresponding to each information segment output by the coding sub-model, wherein the classification identification segment corresponds to the classification feature vector.
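As one possible realization, sketched under the assumption that the coding sub-model is a BERT-style encoder whose [CLS] token plays the role of the classification identification segment (the embodiment only requires some coding sub-model):

```python
import torch
from transformers import BertModel, BertTokenizer  # assumed encoder choice

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
encoder = BertModel.from_pretrained("bert-base-chinese")

def encode(question: str, content: str) -> tuple[torch.Tensor, torch.Tensor]:
    # splicing: the tokenizer joins question and content and inserts the
    # [CLS] marker, which serves as the classification identification segment
    inputs = tokenizer(question, content, return_tensors="pt", truncation=True)
    hidden = encoder(**inputs).last_hidden_state  # one vector per segment
    cls_vec = hidden[:, 0, :]    # classification feature vector ([CLS])
    qc_vecs = hidden[:, 1:, :]   # question and content feature vectors
    return cls_vec, qc_vecs
```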

In one embodiment, the apparatus further comprises:

a corpus acquiring module, configured to acquire a general corpus from the Internet by using a crawler tool, the general corpus consisting of questions, contents and answers;

a corpus processing module, configured to perform data processing on the general corpus to obtain training questions, training contents and training labels;

a training module, configured to perform model training according to the training questions, the training contents and the training labels to obtain the information extraction model.

For specific limitations of the information extraction apparatus, reference may be made to the above limitations of the information extraction method, which are not repeated here. The modules in the information extraction apparatus can be implemented in whole or in part by software, hardware, or a combination thereof. Each module may be embedded in hardware form in, or independent of, a processor in the computer device, or may be stored in software form in a memory of the computer device, so that the processor can invoke and execute the operations corresponding to the above modules.

In one embodiment, a computer device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 8. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The communication interface of the computer device is used for wired or wireless communication with an external terminal, and the wireless communication can be realized through Wi-Fi, an operator network, near field communication (NFC) or other technologies. The computer program is executed by the processor to implement an information extraction method. The display screen of the computer device can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer device can be a touch layer covering the display screen, a key, a trackball or a touch pad arranged on the housing of the computer device, or an external keyboard, touch pad or mouse.

Those skilled in the art will appreciate that the architecture shown in fig. 8 is merely a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the solution of the present application is applied; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.

In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program:

acquiring a target question and target content;

inputting the target question and the target content into a pre-trained information extraction model for information extraction to obtain target information which is output by the information extraction model and used for answering the target question;

the information extraction model is used for determining the target probability of extracting information from the preset word list according to the target question and the target content, and extracting the target information from the preset word list or the target content according to the target probability.

In one embodiment, the information extraction model includes a coding sub-model and a decoding sub-model, and the processor, when executing the computer program, further implements the following steps:

coding the target question and the target content by using the coding sub-model to obtain a feature vector;

performing multiple rounds of decoding processing on the feature vector by using the decoding sub-model to obtain a target probability corresponding to each round of decoding, a first probability distribution corresponding to a preset word list, and a second probability distribution corresponding to the feature vector; the first probability distribution is used for representing the first extracted probability corresponding to each word in the preset word list, and the second probability distribution is used for representing the second extracted probability corresponding to each word in the target content;

determining a third probability distribution corresponding to each round of decoding according to the target probability, the first probability distribution and the second probability distribution corresponding to that round; the third probability distribution is used for representing the third extracted probability corresponding to each word in the preset word list and the target content;

and performing information extraction according to the third probability distribution corresponding to each round of decoding to obtain a plurality of pieces of candidate information, and screening the target information from the candidate information.

In one embodiment, the feature vector includes a classification feature vector, a question feature vector and a content feature vector, the decoding sub-model includes a bidirectional long short-term memory network, a first fully-connected layer and a second fully-connected layer, and the processor, when executing the computer program, further implements the following steps:

for each round of decoding, splicing the classification feature vector with the network feature vector corresponding to the current round of the bidirectional long short-term memory network to obtain a spliced vector, and inputting the spliced vector into the first fully-connected layer to obtain the target probability;

inputting the classification feature vector into the second fully-connected layer to obtain the first probability distribution;

and performing regularization processing on the question feature vector and the content feature vector, and obtaining the second probability distribution according to the regularization result.

In one embodiment, the processor, when executing the computer program, further performs the steps of:

for each round of decoding, determining a first weight corresponding to the first probability distribution and a second weight corresponding to the second probability distribution according to the target probability;

and performing weighted summation on the first probability distribution and the second probability distribution according to the first weight and the second weight to obtain the third probability distribution.

In one embodiment, the processor, when executing the computer program, further performs the steps of:

and for each round of decoding, determining, according to the third probability distribution, the word with the largest third extracted probability among the preset word list and the target content, and determining that word as the candidate information.

In one embodiment, the processor, when executing the computer program, further performs the steps of:

splicing the target question and the target content to obtain splicing information;

performing word segmentation processing on the splicing information to obtain a plurality of information segments, the information segments including a classification identification segment;

and inputting the plurality of information segments into the coding sub-model for coding processing to obtain the feature vector corresponding to each information segment output by the coding sub-model, wherein the classification identification segment corresponds to the classification feature vector.

In one embodiment, the processor, when executing the computer program, further performs the steps of:

acquiring a general corpus from the Internet by using a crawler tool, the general corpus consisting of questions, contents and answers;

performing data processing on the general corpus to obtain training questions, training contents and training labels;

and performing model training according to the training questions, the training contents and the training labels to obtain the information extraction model.

In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:

acquiring a target question and target content;

inputting the target question and the target content into a pre-trained information extraction model for information extraction to obtain target information which is output by the information extraction model and used for answering the target question;

the information extraction model is used for determining the target probability of extracting information from the preset word list according to the target question and the target content, and extracting the target information from the preset word list or the target content according to the target probability.

In one embodiment, the information extraction model includes a coding sub-model and a decoding sub-model, and the computer program, when executed by the processor, further implements the following steps:

coding the target question and the target content by using the coding sub-model to obtain a feature vector;

performing multiple rounds of decoding processing on the feature vector by using the decoding sub-model to obtain a target probability corresponding to each round of decoding, a first probability distribution corresponding to a preset word list, and a second probability distribution corresponding to the feature vector; the first probability distribution is used for representing the first extracted probability corresponding to each word in the preset word list, and the second probability distribution is used for representing the second extracted probability corresponding to each word in the target content;

determining a third probability distribution corresponding to each round of decoding according to the target probability, the first probability distribution and the second probability distribution corresponding to that round; the third probability distribution is used for representing the third extracted probability corresponding to each word in the preset word list and the target content;

and performing information extraction according to the third probability distribution corresponding to each round of decoding to obtain a plurality of pieces of candidate information, and screening the target information from the candidate information.

In one embodiment, the feature vector includes a classification feature vector, a question feature vector and a content feature vector, the decoding sub-model includes a bidirectional long short-term memory network, a first fully-connected layer and a second fully-connected layer, and the computer program, when executed by the processor, further implements the following steps:

for each round of decoding, splicing the classification feature vector with the network feature vector corresponding to the current round of the bidirectional long short-term memory network to obtain a spliced vector, and inputting the spliced vector into the first fully-connected layer to obtain the target probability;

inputting the classification feature vector into the second fully-connected layer to obtain the first probability distribution;

and performing regularization processing on the question feature vector and the content feature vector, and obtaining the second probability distribution according to the regularization result.

In one embodiment, the computer program when executed by the processor further performs the steps of:

for each round of decoding, determining a first weight corresponding to the first probability distribution and a second weight corresponding to the second probability distribution according to the target probability;

and performing weighted summation on the first probability distribution and the second probability distribution according to the first weight and the second weight to obtain the third probability distribution.

In one embodiment, the computer program when executed by the processor further performs the steps of:

and for each round of decoding, determining, according to the third probability distribution, the word with the largest third extracted probability among the preset word list and the target content, and determining that word as the candidate information.

In one embodiment, the computer program when executed by the processor further performs the steps of:

splicing the target question and the target content to obtain splicing information;

performing word segmentation processing on the splicing information to obtain a plurality of information segments, the information segments including a classification identification segment;

and inputting the plurality of information segments into the coding sub-model for coding processing to obtain the feature vector corresponding to each information segment output by the coding sub-model, wherein the classification identification segment corresponds to the classification feature vector.

In one embodiment, the computer program when executed by the processor further performs the steps of:

acquiring a general corpus from the Internet by using a crawler tool, the general corpus consisting of questions, contents and answers;

performing data processing on the general corpus to obtain training questions, training contents and training labels;

and performing model training according to the training questions, the training contents and the training labels to obtain the information extraction model.

It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments can be implemented by a computer program instructing relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory or optical storage. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM).

The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.

The above embodiments express only several implementations of the present application, and their descriptions are specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be noted that a person of ordinary skill in the art can make several variations and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.
