Entity relationship type determination method, device and equipment and storage medium

Document No.: 361616  Publication date: 2021-12-07  Views: 4  Language: Chinese

Reading note: This technology, "Entity relationship type determination method, device and equipment and storage medium", was designed and created by Yang Tao on 2021-05-20. Main content: The application discloses a method, a device, equipment and a storage medium for determining entity relationship types, relating to the technical field of artificial intelligence and intended to improve the accuracy of entity relationship type determination. The method comprises: acquiring a target sentence bag associated with a target entity pair; inputting the target sentence bag into a trained entity relationship determination model and, for each sentence in the target sentence bag, obtaining a sentence representation vector of the sentence based on the character representation vectors corresponding to the characters in that sentence; determining a sentence weight value for each sentence based on the obtained sentence representation vectors, where each sentence weight value characterizes how important that sentence is for determining the relationship of the target entity pair; determining a sentence bag representation vector of the target sentence bag based on the sentence representation vectors and sentence weight values corresponding to the respective sentences; and determining, based on the sentence bag representation vector, the target relationship type between the two entities included in the target entity pair.

1. A method for determining entity relationship types, the method comprising:

acquiring a target sentence bag associated with a target entity pair; the target sentence bag comprises a plurality of sentences, and each sentence contains the target entity pair;

inputting the target sentence bag into a trained entity relationship determination model, and respectively executing the following operations for each sentence in the target sentence bag: for a sentence, obtaining a sentence representation vector of the sentence based on the character representation vector corresponding to each character in the sentence;

determining, by the trained entity relationship determination model, a sentence weight value for each sentence based on the obtained sentence representation vectors, wherein each sentence weight value characterizes the importance of a sentence for determining the relationship of the target entity pair;

and determining, by the trained entity relationship determination model, a sentence bag representation vector of the target sentence bag based on the sentence representation vector and sentence weight value corresponding to each sentence, and determining, based on the sentence bag representation vector, a target relationship type between the two entities included in the target entity pair.

2. The method of claim 1, wherein the training process of the entity-relationship determination model comprises:

determining a plurality of relationship types to be output by a preset entity relationship determination model, and acquiring a plurality of triples; each triple comprises an entity pair, and one of the plurality of relationship types is labeled in association with the corresponding entity pair;

for each triple, performing the following operation: carrying out sentence matching using the entity pair contained in the triple, to obtain sentence samples containing that entity pair;

constructing corresponding training samples respectively based on the obtained sentence samples, wherein each training sample comprises a plurality of sentence samples containing the same entity pair;

for each obtained training sample, labeling it with the relationship type of the entity pair associated with that training sample;

and carrying out iterative training on the entity relationship determination model to be trained based on the marked training samples until a convergence condition is met, and obtaining the trained entity relationship determination model.

3. The method of claim 2, wherein prior to constructing respective training samples based on the obtained plurality of sentence samples, the method further comprises:

performing word segmentation operation on the obtained multiple sentence samples to obtain multiple word segments;

for the multiple relation types, the following operations are respectively executed:

for a relationship type, determining a mutual information coefficient between each of the plurality of word segments and that relationship type, wherein a mutual information coefficient characterizes the importance of a word segment to the relationship type;

selecting, based on the obtained mutual information coefficients, at least one word segment whose mutual information coefficient is greater than a set threshold;

filtering out, from the plurality of sentence samples corresponding to the relationship type, the sentence samples that do not contain any of the at least one word segment;

wherein constructing the corresponding training samples based on the obtained sentence samples comprises:

constructing the corresponding training samples based on the remaining sentence samples.

4. The method of claim 3, wherein determining, for a relationship type, the mutual information coefficients between each of the plurality of word segments and the relationship type comprises:

for each of the plurality of word segments, performing the following operations:

for one word segment, determining a first probability that the word segment occurs;

determining a second probability that the relationship type occurs, and determining a third probability that the word segment occurs when the relationship type occurs;

and determining the mutual information coefficient corresponding to the word segment based on the first probability, the second probability and the third probability.

5. The method of claim 1, wherein before obtaining the sentence-representation vector of the one sentence based on the character-representation vectors corresponding to the respective characters in the one sentence, the method further comprises:

performing character splitting on the sentence to obtain a plurality of characters included in the sentence;

for the plurality of characters, the following operations are respectively executed:

performing feature coding on one character to obtain a content representation vector, a position representation vector and a source representation vector of the character; the content representation vector is used for representing content corresponding to the character, the position representation vector represents the position of the character in the sentence, and the source representation vector represents the sentence from which the character comes;

and obtaining a character representation vector of the character based on the content representation vector, the position representation vector and the source representation vector.

6. The method of claim 1, wherein before obtaining the sentence-representation vector of the one sentence based on the character-representation vectors corresponding to the respective characters in the one sentence, the method further comprises:

performing character splitting on the sentence to obtain a plurality of characters included in the sentence;

according to the order in which the characters appear in the sentence, sequentially performing feature coding on each of the characters to obtain the character representation vector corresponding to each character; when one character is feature-coded, feature extraction is performed on that character to obtain a basic representation vector of the character, and the character representation vector of the character is obtained based on the basic representation vector and the character representation vector of the character preceding it.

7. The method according to any one of claims 1-5, wherein obtaining the sentence representation vector of the one sentence based on the character representation vectors corresponding to the respective characters in the one sentence comprises:

performing mean pooling on the obtained character representation vectors to obtain the sentence representation vector; or,

determining a character weight value for each character, and obtaining the sentence representation vector based on the character representation vector and the character weight value corresponding to each character; wherein each character weight value characterizes the importance of one character to the one sentence.

8. The method according to any one of claims 1-5, wherein determining a sentence weight value for a respective sentence based on the obtained respective sentence representation vector comprises:

for each sentence representation vector, the following operations are respectively executed:

for one sentence representation vector, obtaining an intermediate representation vector based on that sentence representation vector and a pre-trained weight matrix included in the entity relationship determination model;

obtaining a vector dot product between the intermediate representation vector and a pre-trained parameter vector included in the entity relationship determination model;

and normalizing the obtained vector dot products to obtain the sentence weight value corresponding to each sentence representation vector.

9. An entity relationship type determination apparatus, the apparatus comprising:

an acquisition unit, configured to acquire a target sentence bag associated with a target entity pair and input the target sentence bag into a trained entity relationship determination model; the target sentence bag comprises a plurality of sentences, and each sentence contains the target entity pair;

a sentence coding unit, configured to use the trained entity relationship determination model to perform the following operations for each sentence in the target sentence bag, respectively: for a sentence, obtaining a sentence representation vector of the sentence based on the character representation vector corresponding to each character in the sentence;

a sentence bag encoding unit, configured to determine, using the trained entity relationship determination model, a sentence weight value for each sentence based on the obtained sentence representation vectors, each sentence weight value characterizing the importance of a sentence for determining the relationship of the target entity pair, and to determine, using the trained entity relationship determination model, a sentence bag representation vector of the target sentence bag based on the sentence representation vectors and sentence weight values corresponding to the respective sentences;

and a prediction unit, configured to determine, using the trained entity relationship determination model, a target relationship type between the two entities included in the target entity pair based on the sentence bag representation vector.

10. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor,

the processor, when executing the computer program, realizes the steps of the method of any one of claims 1 to 8.

11. A computer storage medium having computer program instructions stored thereon, wherein,

the computer program instructions, when executed by a processor, implement the steps of the method of any one of claims 1 to 8.

Technical Field

The application relates to the technical field of computers, in particular to the technical field of Artificial Intelligence (AI), and provides a method, a device and equipment for determining entity relationship types and a storage medium.

Background

With the development of network technology, a great deal of knowledge is contained in ordinary text, and mining relevant knowledge from text is very necessary work. Entity relationship extraction is the work of mining the relationships between entities from ordinary text to construct triple data that enriches the knowledge graph, and it belongs to the basic technologies of Natural Language Processing (NLP). For example, the sentence "Zhang San was born in City A on September 27, 1961" contains the three entities "Zhang San", "September 27, 1961" and "City A", and these entities have certain association relationships: from the sentence, "Zhang San" and "September 27, 1961" are in a "time of birth" relationship, and "Zhang San" and "City A" are in a "place of birth" relationship. Therefore, after entity relationship extraction based on this sentence, the two triples (Zhang San, time of birth, September 27, 1961) and (Zhang San, place of birth, City A) can be obtained, and these triples can then be added to the knowledge graph.

Knowledge graphs have a wide range of applications in many fields. For example, in the search field, a user may ask knowledge-based questions such as "When was Zhang San born?" or "How high is Mount Qomolangma?"; these two questions can be resolved into the two queries (Zhang San, time of birth) and (Qomolangma, altitude) and answered from the knowledge graph. In the recommendation field, the knowledge in the knowledge graph can be combined with a recommendation model to provide better recommendation results for the user. In the conversation field, when a user asks a relevant question, it can be answered accurately based on the knowledge graph.

Therefore, the accuracy of the relationship extraction task directly determines the accuracy of the knowledge graph, which in turn affects the experience of downstream applications. How to improve the accuracy of the relationship extraction task so as to construct a high-quality knowledge graph is a problem that needs to be considered.

Disclosure of Invention

The embodiment of the application provides a method, a device and equipment for determining an entity relationship type and a storage medium, which are used for improving the accuracy of determining the entity relationship type.

In one aspect, a method for determining an entity relationship type is provided, where the method includes:

acquiring a target sentence bag associated with a target entity pair; the target sentence bag comprises a plurality of sentences, and each sentence contains the target entity pair;

inputting the target sentence bag into a trained entity relationship determination model, and respectively executing the following operations for each sentence in the target sentence bag: for a sentence, obtaining a sentence representation vector of the sentence based on the character representation vector corresponding to each character in the sentence;

determining, by the trained entity relationship determination model, a sentence weight value for each sentence based on the obtained sentence representation vectors, wherein each sentence weight value characterizes the importance of a sentence for determining the relationship of the target entity pair;

and determining, by the trained entity relationship determination model, a sentence bag representation vector of the target sentence bag based on the sentence representation vector and sentence weight value corresponding to each sentence, and determining, based on the sentence bag representation vector, a target relationship type between the two entities included in the target entity pair.

In one aspect, an entity relationship type determining apparatus is provided, the apparatus includes:

an acquisition unit, configured to acquire a target sentence bag associated with a target entity pair and input the target sentence bag into a trained entity relationship determination model; the target sentence bag comprises a plurality of sentences, and each sentence contains the target entity pair;

a sentence coding unit, configured to use the trained entity relationship determination model to perform the following operations for each sentence in the target sentence bag, respectively: for a sentence, obtaining a sentence representation vector of the sentence based on the character representation vector corresponding to each character in the sentence;

a sentence bag encoding unit, configured to determine, using the trained entity relationship determination model, a sentence weight value for each sentence based on the obtained sentence representation vectors, each sentence weight value characterizing the importance of a sentence for determining the relationship of the target entity pair, and to determine, using the trained entity relationship determination model, a sentence bag representation vector of the target sentence bag based on the sentence representation vectors and sentence weight values corresponding to the respective sentences;

and a prediction unit, configured to determine, using the trained entity relationship determination model, a target relationship type between the two entities included in the target entity pair based on the sentence bag representation vector.

Optionally, the apparatus further includes a model training unit, configured to:

determining a plurality of relationship types to be output by a preset entity relationship determination model, and acquiring a plurality of triples; each triple comprises an entity pair, and one of the plurality of relationship types is labeled in association with the corresponding entity pair;

for each triple, performing the following operation: carrying out sentence matching using the entity pair contained in the triple, to obtain sentence samples containing that entity pair;

constructing corresponding training samples respectively based on the obtained sentence samples, wherein each training sample comprises a plurality of sentence samples containing the same entity pair;

for each obtained training sample, labeling it with the relationship type of the entity pair associated with that training sample;

and carrying out iterative training on the entity relationship determination model to be trained based on the marked training samples until a convergence condition is met, and obtaining the trained entity relationship determination model.

Optionally, the model training unit is further configured to:

performing word segmentation operation on the obtained multiple sentence samples to obtain multiple word segments;

for the multiple relation types, the following operations are respectively executed:

for a relationship type, determining a mutual information coefficient between each of the plurality of word segments and that relationship type, wherein a mutual information coefficient characterizes the importance of a word segment to the relationship type;

selecting, based on the obtained mutual information coefficients, at least one word segment whose mutual information coefficient is greater than a set threshold;

filtering out, from the plurality of sentence samples corresponding to the relationship type, the sentence samples that do not contain any of the at least one word segment;

and constructing the corresponding training samples based on the remaining sentence samples.
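As an illustrative sketch of the filtering step described above (not taken from the patent; the function name, the example data and the simple substring containment test are all hypothetical):

```python
# Illustrative sketch: given per-word-segment mutual information
# coefficients for one relationship type, keep only the sentence
# samples that contain at least one high-MI word segment.

def filter_sentence_samples(sentences, mi_coefficients, threshold):
    """Drop sentences that contain none of the word segments whose
    mutual information coefficient exceeds the set threshold."""
    # Select word segments strongly associated with the relationship type.
    keywords = {w for w, mi in mi_coefficients.items() if mi > threshold}
    # Keep a sentence only if it mentions at least one selected segment.
    return [s for s in sentences if any(w in s for w in keywords)]

sentences = [
    "Zhang San was born in City A.",
    "Zhang San visited City A last week.",
]
mi = {"born": 0.9, "visited": 0.1}
remaining = filter_sentence_samples(sentences, mi, threshold=0.5)
# Only the first sentence mentions a high-MI segment ("born").
```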

Optionally, the model training unit is further configured to:

for each of the plurality of word segments, performing the following operations:

for one word segment, determining a first probability that the word segment occurs;

determining a second probability that the relationship type occurs, and determining a third probability that the word segment occurs when the relationship type occurs;

and determining the mutual information coefficient corresponding to the word segment based on the first probability, the second probability and the third probability.
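One common way to combine the three probabilities named above is pointwise mutual information; the text does not fix the exact formula, so the following is a hedged sketch under that assumption:

```python
import math

def mutual_information(p_word, p_relation, p_word_given_relation):
    """Pointwise mutual information between a word segment and a
    relationship type, from the three probabilities named in the text:
      p_word                -- first probability  P(w)
      p_relation            -- second probability P(r)
      p_word_given_relation -- third probability  P(w | r)
    PMI(w, r) = log( P(w, r) / (P(w) * P(r)) )
    with P(w, r) = P(w | r) * P(r).
    """
    joint = p_word_given_relation * p_relation
    return math.log(joint / (p_word * p_relation))

# A word segment that appears in 1% of all sentences but in 20% of the
# sentences labeled with the relationship type is strongly indicative.
score = mutual_information(p_word=0.01, p_relation=0.05,
                           p_word_given_relation=0.20)
# score = log(0.20 / 0.01) = log(20) ≈ 3.0
```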

Optionally, the sentence encoding unit is specifically configured to:

performing character splitting on the sentence to obtain a plurality of characters included in the sentence;

for the plurality of characters, the following operations are respectively executed:

performing feature coding on one character to obtain a content representation vector, a position representation vector and a source representation vector of the character; the content representation vector is used for representing content corresponding to the character, the position representation vector represents the position of the character in the sentence, and the source representation vector represents the sentence from which the character comes;

and obtaining a character representation vector of the character based on the content representation vector, the position representation vector and the source representation vector.
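The three-vector character encoding above can be sketched as follows. This is illustrative only: the dimensions, the random lookup tables and the element-wise summation used to combine the three vectors are assumptions, not details given in the text:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8           # embedding dimension (illustrative)
vocab_size = 100  # character vocabulary size (illustrative)
max_len = 32      # maximum sentence length
max_sents = 16    # maximum number of sentences in a bag

# Three lookup tables, one per representation named in the text.
content_emb = rng.normal(size=(vocab_size, dim))   # what the character is
position_emb = rng.normal(size=(max_len, dim))     # where it sits in the sentence
source_emb = rng.normal(size=(max_sents, dim))     # which sentence it came from

def character_vector(char_id, position, sentence_id):
    """Character representation obtained from the content, position and
    source representation vectors. Element-wise summation is one common
    way to combine them; the text does not fix the combination operator."""
    return content_emb[char_id] + position_emb[position] + source_emb[sentence_id]

v = character_vector(char_id=42, position=3, sentence_id=1)
```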

Optionally, the sentence encoding unit is specifically configured to:

performing character splitting on the sentence to obtain a plurality of characters included in the sentence;

according to the order in which the characters appear in the sentence, sequentially performing feature coding on each of the characters to obtain the character representation vector corresponding to each character; when one character is feature-coded, feature extraction is performed on that character to obtain a basic representation vector of the character, and the character representation vector of the character is obtained based on the basic representation vector and the character representation vector of the character preceding it.
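The sequential encoding described above, where each character's representation depends on its own basic vector and on the preceding character's representation, has the shape of a plain recurrent cell. The sketch below is illustrative; the tanh cell and the random weight matrices are assumptions, not the architecture specified by the text:

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 8
W_in = rng.normal(size=(dim, dim)) * 0.1   # transforms the basic vector
W_rec = rng.normal(size=(dim, dim)) * 0.1  # carries the previous character's vector

def encode_sequence(basic_vectors):
    """Encode characters left to right: each character representation
    vector is computed from that character's basic representation vector
    and the representation vector of the character before it."""
    h = np.zeros(dim)  # there is no character before the first one
    outputs = []
    for x in basic_vectors:
        h = np.tanh(W_in @ x + W_rec @ h)
        outputs.append(h)
    return outputs

chars = [rng.normal(size=dim) for _ in range(5)]
vectors = encode_sequence(chars)
```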

Optionally, the sentence encoding unit is specifically configured to:

performing mean pooling on the obtained character representation vectors to obtain the sentence representation vector; or,

determining a character weight value for each character, and obtaining the sentence representation vector based on the character representation vector and the character weight value corresponding to each character; wherein each character weight value characterizes the importance of one character to the one sentence.
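The two pooling options above can be sketched as follows (illustrative only; normalizing the character weights to sum to one is an assumption):

```python
import numpy as np

def mean_pool(char_vectors):
    """Sentence representation vector as the mean of the character vectors."""
    return np.mean(char_vectors, axis=0)

def weighted_pool(char_vectors, char_weights):
    """Sentence representation vector as a weighted sum of the character
    vectors, each weight reflecting how important that character is to
    the sentence."""
    w = np.asarray(char_weights, dtype=float)
    w = w / w.sum()                      # normalize so the weights sum to 1
    return (w[:, None] * np.asarray(char_vectors)).sum(axis=0)

chars = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
s_mean = mean_pool(chars)                           # -> [2/3, 2/3]
s_weighted = weighted_pool(chars, [1.0, 1.0, 2.0])  # emphasizes the third character
```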

Optionally, the sentence bag encoding unit is specifically configured to:

for each sentence representation vector, the following operations are respectively executed:

for one sentence representation vector, obtaining an intermediate representation vector based on that sentence representation vector and a pre-trained weight matrix included in the entity relationship determination model;

obtaining a vector dot product between the intermediate representation vector and a pre-trained parameter vector included in the entity relationship determination model;

and normalizing the obtained vector dot products to obtain the sentence weight value corresponding to each sentence representation vector.
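The three steps above (intermediate vector, dot product with a parameter vector, normalization over the bag) have the shape of a standard attention mechanism over the sentences of a bag. The sketch below is illustrative: the tanh non-linearity, the softmax normalization and the random stand-ins for the pre-trained weight matrix and parameter vector are assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
dim = 8
W = rng.normal(size=(dim, dim)) * 0.1  # stand-in for the pre-trained weight matrix
q = rng.normal(size=dim)               # stand-in for the pre-trained parameter vector

def sentence_weights(sentence_vectors):
    """Sentence weight values for a bag: intermediate representation,
    dot product with the parameter vector, then normalization (softmax)
    across all sentences of the bag."""
    S = np.asarray(sentence_vectors)
    inter = np.tanh(S @ W.T)      # intermediate representation vectors
    scores = inter @ q            # dot product with the parameter vector
    scores = scores - scores.max()  # numerical stability before exponentiation
    exp = np.exp(scores)
    return exp / exp.sum()        # weights sum to 1 across the bag

def bag_vector(sentence_vectors, weights):
    """Bag representation: sentence vectors weighted by their importance."""
    return (np.asarray(weights)[:, None] * np.asarray(sentence_vectors)).sum(axis=0)

sents = [rng.normal(size=dim) for _ in range(3)]
w = sentence_weights(sents)
b = bag_vector(sents, w)
```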

In one aspect, a computer device is provided, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of any of the above methods when executing the computer program.

In one aspect, a computer storage medium is provided having computer program instructions stored thereon that, when executed by a processor, implement the steps of any of the above-described methods.

In one aspect, a computer program product or computer program is provided that includes computer instructions stored in a computer-readable storage medium. The computer instructions are read by a processor of a computer device from a computer-readable storage medium, and the computer instructions are executed by the processor to cause the computer device to perform the steps of any of the methods described above.

In the embodiment of the application, when the relationship type of a target entity pair is determined, a plurality of sentences corresponding to the target entity pair are used as the input of the entity relationship determination model. When the model determines the entity relationship based on the target sentence bag, it obtains a sentence representation vector from the characters included in each sentence, determines a sentence weight value for each sentence on that basis, and obtains a bag representation vector from the sentence representation vectors and sentence weight values. Because each sentence weight value characterizes how important a sentence is to the relationship determination of the target entity pair, noisy sentences receive smaller weights, so the noise sentences in the sentence bag are filtered to a certain extent and their influence on the determination of the relationship type is reduced. The accuracy of determining the entity relationship type is thereby improved, and correspondingly the accuracy of the constructed knowledge graph is also improved.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments or the related art, the drawings needed in the description of the embodiments or the related art are briefly introduced below. It is obvious that the drawings in the following description show only embodiments of the present application, and that those skilled in the art can obtain other drawings from the provided drawings without creative effort.

Fig. 1 is an application scenario diagram provided in an embodiment of the present application;

fig. 2 is a schematic diagram of a training process of an entity relationship determination model according to an embodiment of the present application;

fig. 3 is a schematic structural diagram of an entity relationship determination model according to an embodiment of the present application;

FIG. 4 is a schematic diagram of a pooling process provided by an embodiment of the present application;

FIG. 5 is a schematic diagram illustrating a generation process of training samples according to an embodiment of the present disclosure;

FIG. 6 is a schematic flowchart of a process for constructing training samples according to an embodiment of the present disclosure;

FIG. 7 is a flowchart illustrating feature encoding of each character according to an embodiment of the present application;

fig. 8 is a schematic diagram of obtaining a character representation vector of each character according to an embodiment of the present application;

fig. 9 is another schematic flowchart of feature encoding of each character according to the embodiment of the present application;

fig. 10 is a schematic flowchart of an entity relationship type determination method according to an embodiment of the present application;

fig. 11 is a schematic structural diagram of an entity relationship type determining apparatus according to an embodiment of the present application;

fig. 12 is a schematic structural diagram of a computer device according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions in the embodiments of the present application will be described clearly and completely with reference to the accompanying drawings. It is obvious that the described embodiments are only some, and not all, of the embodiments of the present application. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present application. In the present application, the embodiments and the features of the embodiments may be combined with each other without conflict. Also, although a logical order is shown in the flow diagrams, in some cases the steps shown or described may be performed in a different order.

For the convenience of understanding the technical solutions provided by the embodiments of the present application, some key terms used in the embodiments of the present application are explained first:

Entity pair: an entity pair includes two entities, also called named entities, which refer to names of persons, organizations, places and other entities identified by a name, and may also include numbers, dates, currencies, addresses, and so on.

For example, the sentence "Zhang San was born in City A on September 27, 1961" contains the three entities "Zhang San", "September 27, 1961" and "City A", and every two of these entities can form an entity pair: for example, "Zhang San" and "September 27, 1961" can form an entity pair, and "Zhang San" and "City A" can form an entity pair.

Entity relationship type: characterizes the correlation attribute between the two entities included in an entity pair. In the above example, "Zhang San" and "September 27, 1961" are in a "time of birth" relationship, so the entity relationship type of the entity pair consisting of "Zhang San" and "September 27, 1961" is time of birth; "Zhang San" and "City A" are in a "place of birth" relationship, so the entity relationship type of the entity pair consisting of "Zhang San" and "City A" is place of birth.

Knowledge graph: combines theories and methods from disciplines such as mathematics, graphics, information visualization and information science with methods such as bibliometric citation analysis and co-occurrence analysis, and uses a visual map to display the core structure and overall knowledge framework of a discipline. A knowledge graph is a huge semantic network graph in which each entity is regarded as a node, and the nodes are connected by entity relationship types.

Triple: a triple includes an entity pair and the relationship type between the two entities of the pair, and can be represented as (entity 1, relationship type, entity 2). Following the above example, the two triples (Zhang San, time of birth, September 27, 1961) and (Zhang San, place of birth, City A) can be obtained.

Sentence bag (bag): a sentence bag is made up of a plurality of sentences containing the same entity pair.
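To make the triple and sentence-bag definitions concrete, here is a small illustrative sketch (the example sentences and the simple substring containment test are hypothetical):

```python
from collections import defaultdict

# Triples as (entity 1, relationship type, entity 2), matching the text.
triples = [
    ("Zhang San", "time of birth", "September 27, 1961"),
    ("Zhang San", "place of birth", "City A"),
]

sentences = [
    "Zhang San was born in City A.",
    "Zhang San has lived in City A since birth.",
    "Zhang San was born on September 27, 1961.",
]

# A sentence bag groups all sentences that contain the same entity pair.
bags = defaultdict(list)
for e1, _, e2 in triples:
    for s in sentences:
        if e1 in s and e2 in s:
            bags[(e1, e2)].append(s)

# bags[("Zhang San", "City A")] holds the first two sentences.
```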

Entity relationship determination model: a trained network model for determining the relationship type between entity pairs. The entity relationship determination model is based on Machine Learning (ML); it processes and understands the meaning of sentences and sentence bags, and determines the relationship type of the corresponding entity pair based on the sentence and sentence bag representations. In addition, the entity relationship determination model can identify how important different sentences are for determining the relationship of one entity pair and, on that basis, attenuate the noise data in the sentence bags to a certain extent, thereby improving the accuracy of the finally determined relationship type. After the entity relationship determination model is trained, training parameters such as the pre-trained weight matrix and the pre-trained parameter vector included in the model are obtained and used in the subsequent process of actually determining relationship types.

The embodiments of the present application relate to artificial intelligence and machine learning technology, and are mainly designed based on machine learning within artificial intelligence.

Artificial intelligence is a body of theory, methods, techniques, and application systems that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines have the capabilities of perception, reasoning, and decision making.

Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, at both the hardware level and the software level. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly comprises computer vision, speech processing, natural language processing, and machine learning/deep learning.

Entity relationship type determination, or entity relationship extraction, is a basic technology in Natural Language Processing (NLP), an important direction in the fields of computer science and artificial intelligence. NLP studies theories and methods that enable effective communication between humans and computers in natural language; it is a science integrating linguistics, computer science, and mathematics. Research in this field involves natural language, that is, the language people use every day, and is therefore closely related to linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering, knowledge graphs, and the like.

Machine learning is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It studies how a computer can simulate or realize human learning behavior to acquire new knowledge or skills, and reorganize existing knowledge structures to continuously improve its performance. Machine learning is the core of artificial intelligence, the fundamental way to make computers intelligent, and is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and teaching-based learning.

An Artificial Neural Network (ANN) abstracts the human brain's neuron network from an information-processing perspective, builds a simple model, and forms different networks according to different connection modes. A neural network is a computational model formed by a large number of interconnected nodes (neurons). Each node represents a specific output function called an activation function; each connection between two nodes carries a weight for the signal passing through it, which is equivalent to the memory of the artificial neural network. The output of the network differs according to its connection mode, weights, and activation functions. The network itself is usually an approximation of some algorithm or function in nature, and may also be the expression of a logic strategy.

The embodiments of the present application use a machine learning method to train an entity relationship determination model, and then apply the trained model to actual entity relationship prediction. Specifically, entity relationship type determination in the embodiments of the present application can be divided into two parts: training and application. In the training part, an artificial neural network model (the entity relationship determination model mentioned below) is trained with machine learning technology on the sentence-bag-based training samples given in the embodiments of the present application, and the model parameters are continuously adjusted by an optimization algorithm until the model converges. In the application part, the trained artificial neural network model represents the sentence bag corresponding to an actual entity pair, and predicts the relationship type of the entity pair based on the obtained sentence bag representation vector. In addition, in the embodiments of the present application the artificial neural network model may be trained online or offline, which is not limited herein; offline training is taken as the example here.

The following briefly introduces the design concept of the embodiments of the present application.

At present, knowledge graphs are widely used in many fields, so constructing a high-quality knowledge graph that meets user requirements is a great challenge. The traditional way to acquire knowledge is to collect encyclopedia information from the Internet; such web page data is mainly structured, so structured knowledge can be obtained by parsing the data and then performing processing such as denoising. However, the amount of knowledge in web page data is limited: compared with the vast corpus data on the Internet, structured data accounts for only a small part, while a large amount of knowledge is contained in plain text, so mining knowledge from text is very necessary. Entity relationship extraction is the work of mining relationships between entities from plain text and constructing triple data to enrich the knowledge graph.

A common relationship extraction method first collects training data, generally generated with human review, and then trains a classification model with that data to classify the entity pairs in each sentence, i.e., identify entities from the sentences and classify any two entities to determine the relationship between them. However, manually constructed training data contains relatively large noise, and how to alleviate the influence of such noise data on relationship type determination is a problem to be considered in entity relationship extraction.

In view of this, an embodiment of the present application provides an entity relationship type determination method. When determining the relationship type of a target entity pair, a plurality of sentences corresponding to the target entity pair are used as inputs to an entity relationship determination model. When the model determines the entity relationship based on the target sentence bag, a sentence representation vector is obtained from the characters of each sentence, a sentence weight value is determined for each sentence accordingly, and the sentence bag representation vector is obtained from the sentence representation vectors and sentence weight values. Each sentence weight value represents how important a sentence is for determining the relationship of the target entity pair, so noise sentences receive smaller weights and are filtered out of the sentence bag to a certain extent. This reduces the influence of noise sentences on relationship type determination, improves the accuracy of entity relationship type determination, and accordingly improves the accuracy of the constructed knowledge graph.

In addition, in the embodiment of the present application, training samples are constructed by remote supervision based on an existing knowledge graph, which solves the problem of scarce training data caused by the difficulty of manual labeling. For the noise introduced by remote supervision, noise sentence samples are filtered by a mutual-information-based filtering method, which reduces the loss of model accuracy caused by noise sentences and improves the accuracy of entity relationship type determination.

After introducing the design concept of the embodiments of the present application, some brief descriptions are given below of application scenarios to which the technical solution can be applied. It should be noted that the application scenarios described below are only used to describe the embodiments of the present application and are not limiting. In a specific implementation, the technical scheme provided by the embodiments of the present application can be flexibly applied according to actual needs.

The scheme provided by the embodiment of the present application may be applied to knowledge-graph-related application scenarios. As shown in fig. 1, the application scenario may include a terminal device 101 and a server 102.

The terminal device 101 may be, for example, a mobile phone, a tablet computer (PAD), a personal computer (PC), a smart television, a smart in-car device, a wearable device, and the like. The terminal device 101 may install applications, for example a chat robot application, a content recommendation application, or a search application.

The server 102 may be a background server corresponding to an application installed on the terminal device 101, for example, an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, a middleware service, a domain name service, a security service, a CDN, and a big data and artificial intelligence platform, but is not limited thereto.

The server 102 may include one or more processors 1021, a memory 1022, and an I/O interface 1023 for interacting with the terminal, among other things. In addition, the server 102 may be configured with a database 1024, which may store model data, data related to the target entity pairs to be predicted (e.g., sentences, texts, and prediction results), knowledge graph data, and the like. The memory 1022 may further store program instructions of the entity relationship type determination method provided in the embodiments of the present application; when executed by the processor 1021, these instructions implement the steps of the method to determine the relationship type of a target entity pair, add the determined relationship type to the knowledge graph, and extend the knowledge graph so that it can be used by downstream services to implement the corresponding service applications.

Taking the example that the application installed on the terminal device 101 is a chat robot application, the user may initiate conversation content in the application; correspondingly, the server 102 performs semantic recognition on the conversation content with its stored semantic recognition model and recognizes the semantics expressed by the user. For example, if the user's conversation content is "who is the author of article A?", a query such as (article A, author, ?) can be constructed, and the answer retrieved from the knowledge graph and returned to the user.

Of course, the knowledge graph may also be applied to other scenarios; for example, a recommendation model may be constructed in combination with the knowledge context of the knowledge graph to obtain better recommendation results for the user, which is not limited in the embodiment of the present application.

Terminal device 101 and server 102 may be communicatively coupled directly or indirectly through one or more networks 103. The network 103 may be a wired or wireless network; for example, the wireless network may be a mobile cellular network or a Wireless Fidelity (WIFI) network, and may of course be another possible network, which is not limited in this embodiment of the present application.

In the embodiment of the present application, the server 102 may be divided into different sub-servers according to function, for example a sub-server 1 that provides the background service for the application and a sub-server 2 that constructs the knowledge graph. Sub-server 1 and sub-server 2 may be different functional modules of the same physical server, or different physical servers.

In a possible application scenario, in the embodiment of the present application, the model data, the data related to the target entity pairs to be predicted, and the knowledge graph data may be stored with cloud storage technology. A distributed cloud storage system is a storage system that aggregates a large number of storage devices (also called storage nodes) of different types in a network, through application software or application interfaces, using functions such as cluster applications, grid technology, and distributed storage file systems, so that they work together and provide data storage and service access functions externally.

In a possible application scenario, the servers 102 may be deployed in different regions to reduce communication delay, or different servers 102 may each serve the region corresponding to its terminal devices 101 for load balancing. The plurality of servers 102 share data through a blockchain; that is, the servers 102 located in various regions constitute a blockchain-based data sharing system. For example, a terminal device 101 located at site a is communicatively connected to one server 102, and a terminal device 101 located at site b is communicatively connected to another server 102.

Each server 102 in the data sharing system has a corresponding node identifier, and each server 102 may store the node identifiers of the other servers 102 in the system, so that a generated block can be broadcast to the other servers 102 according to their node identifiers. Each server 102 may maintain a node identifier list as shown in Table 1, storing server names and node identifiers. A node identifier may be an Internet Protocol (IP) address or any other information that can identify the node; Table 1 uses IP addresses only as an example.

Server name    Node identification
Node 1         119.115.151.174
Node 2         118.116.189.145
Node N         119.124.789.258

TABLE 1

Of course, the method provided in the embodiment of the present application is not limited to the application scenario shown in fig. 1 and may also be used in other possible application scenarios, which is not limited by the embodiment of the present application. The functions that each device in the application scenario of fig. 1 can implement are described in the following method embodiments and are not detailed here. The subsequent method flow may be executed by the server 102 or the terminal device 101 in fig. 1, or by both; the following description mainly takes execution by the server 102 as an example.

In this embodiment of the present application, the entity relationship type may be determined by a trained entity relationship determination model; therefore, before describing the flow of the entity relationship type determination method, the training process of the entity relationship determination model is described here.

Please refer to fig. 2, which is a schematic diagram of a training process of an entity relationship determination model according to an embodiment of the present application.

Step 201: a plurality of training samples are obtained.

A traditional entity relationship classification model classifies each single sentence independently to judge the relationship of the entity pair in that sentence. This, however, may introduce a noise problem: the relationship type reflected by a particular sentence may be inconsistent with the labeled relationship type, and such noise sentences strongly influence the model during training, making it less accurate. Therefore, the embodiment of the present application adopts a sentence-bag-based training mode: each training sample is a sentence bag, each sentence bag includes a plurality of sentences containing the same entity pair, and the entity pair corresponding to each sentence bag is labeled with a relationship type label.

For example, one possible training sample is the following example:

sentence 1: xiaozhuangzhang and wife Xiao Wu's eye with tear

Sentence 2: apartment with small apartment living under the name of wu xianzi

Sentence 3: small Zhang and Wu to hold a great wedding ceremony in a certain hotel

Sentence 4: registration of marriage with xiao wu in 2017 on 8.1.month.

Each sentence includes the two entities Xiao Zhang and Xiao Wu, and from the meaning of the sentences it can be seen that the relationship between Xiao Zhang and Xiao Wu is that of a couple. Therefore, the sentence bag formed by these sentences can serve as a training sample corresponding to the entity pair (Xiao Zhang, Xiao Wu), and this entity pair is labeled with the relationship type "couple".

It should be noted that the number of sentences in the above example is only one possibility; in practical applications, the number of sentences in each training sample may be set according to the actual situation, which is not limited in the embodiment of the present application.
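The bag construction described above can be sketched as follows. The function and field names are hypothetical, and a real system would match entities on character positions rather than raw substring tests:

```python
# Sketch: group sentences containing the same entity pair into one labeled bag.
# Field names ("entity_pair", "sentences", "label") are illustrative only.
def build_bag(entity_pair, sentences, relation_label):
    """Keep only the sentences that mention both entities; attach the label."""
    e1, e2 = entity_pair
    bag = [s for s in sentences if e1 in s and e2 in s]
    return {"entity_pair": entity_pair, "sentences": bag, "label": relation_label}

bag = build_bag(
    ("Xiao Zhang", "Xiao Wu"),
    [
        "Xiao Zhang and his wife Xiao Wu appeared with tears in their eyes",
        "Xiao Zhang registered his marriage with Xiao Wu on August 1, 2017",
        "An unrelated sentence about someone else",
    ],
    "couple",
)
```

Under remote supervision, the label would come from an existing knowledge-graph triple rather than manual annotation.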

In the embodiment of the present application, after the training samples are obtained, the initial entity relationship determination model may be iteratively trained on them until it satisfies the convergence condition. Since each iteration is similar, one iteration of the training process is taken as the example here.

Step 202: for each sentence in each training sample, obtain a sentence representation vector of the sentence based on the character representation vectors of the characters in the sentence.

In the embodiment of the present application, the entity relationship determination model needs to represent each training sample and then predict the corresponding relationship type based on that representation. Since the encoding process is similar for the sentence representation vectors of the sentences within each sentence bag, and for the bag representation vectors of different sentence bags, the following description mainly takes one sentence bag and one sentence as examples; other sentence bags and sentences are handled in the same way.

Referring to fig. 3, a schematic structural diagram of the entity relationship determination model provided in the embodiment of the present application, the model includes an input layer, a character feature encoding layer, a character feature fusion layer, a sentence feature fusion layer, and an output layer; the use of each layer is introduced one by one below.

Specifically, after each training sample is input to the input layer, for each sentence in the training sample, identifiers are used to mark the entity pair in the sentence.

In one possible implementation, different entities in the entity pair may be identified by different identifiers inserted into the sentence to mark where each entity is located. Referring to fig. 3, the two entities in the entity pair can be identified by "<e1>" and "<e2>" respectively. For the entity pair "Xiao Zhang" and "Xiao Wu", the entity "Xiao Zhang" can be marked by inserting "<e1>" before its starting position and "</e1>" after its ending position, and the entity "Xiao Wu" can be marked by inserting "<e2>" before its starting position and "</e2>" after its ending position.

In another possible embodiment, the entity pair of a sentence may be identified by adding an additional label to the sentence, such as a label at the beginning or end of the sentence, indicating the two entities included in the entity pair and their positions in the sentence.
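A minimal sketch of the first marking scheme (inserting "<e1>"/"<e2>" markers around the entities) might look like this, assuming each entity occurs once and matching on raw substrings rather than true character positions:

```python
# Sketch only: wrap the two entities of a sentence in <e1>/<e2> markers.
# Assumes each entity string appears once; a real pipeline would use
# character offsets from the entity linker instead of str.replace.
def mark_entities(sentence, e1, e2):
    """Insert <e1>...</e1> and <e2>...</e2> around the entity pair."""
    sentence = sentence.replace(e1, "<e1>" + e1 + "</e1>", 1)
    sentence = sentence.replace(e2, "<e2>" + e2 + "</e2>", 1)
    return sentence
```

The marked sentence is then split into characters and fed to the character feature encoding layer.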

In the embodiment of the present application, since a sentence is composed of individual characters, in order to obtain the sentence representation vector, the sentence may first be split into individual characters (tokens), a character representation vector may be obtained for each character, and the sentence representation vector may then be obtained from these character representation vectors.

Specifically, the character representation vectors can be obtained by the character feature encoding layer of the entity relationship determination model. The character feature encoding layer includes a character feature encoding module, and the encoding modules for the sentences can share network parameters. The network parameters can be assigned randomly at the beginning, assigned by an initialization scheme, or transplanted from other pre-trained models.

In this embodiment of the present application, the character feature encoding module may adopt any possible encoding model, for example a Bidirectional Encoder Representations from Transformers (BERT) model, a Gated Recurrent Unit (GRU) model, or a Long Short-Term Memory (LSTM) network; of course, other possible models may also be adopted, which is not limited in this embodiment of the present application.

In the embodiment of the present application, referring to fig. 3, for each sentence the character encoding module obtains the character representation vector of each character in the sentence, and the sentence representation vector of the sentence can then be obtained from these character representation vectors.

The process of obtaining sentence representation vectors from the character representation vectors can be realized by the character feature fusion layer of the entity relationship determination model. The character feature fusion layer includes a character feature fusion module, and the fusion modules for the sentences can also share network parameters; the network parameters can be assigned randomly at the beginning, assigned by an initialization scheme, or transplanted from other pre-trained models.

Specifically, the character feature fusion module may obtain the sentence representation vector in a variety of ways, including but not limited to the following:

(1) Mean pooling (mean-pooling)

Specifically, mean pooling is performed over the character representation vectors of a sentence to obtain the sentence representation vector.

In mean pooling, the values at each position of the character representation vectors are averaged, and each mean is the feature value at the corresponding position of the sentence representation vector. As shown in fig. 4, for a sentence containing m characters, the values at the 1st position of the m character representation vectors are added and averaged to obtain the feature value at the 1st position of the sentence representation vector, and the feature values at the 2nd position, the 3rd position, and so on are obtained in the same way, yielding the sentence representation vector.

In practical applications, the sentence representation vector may also be obtained by another pooling process such as max pooling.
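The mean pooling described above can be sketched in a few lines of pure Python (the function name is illustrative; a real model would do this with tensor operations):

```python
# Sketch: feature j of the sentence vector is the mean of feature j
# over all m character representation vectors, as described above.
def mean_pool(char_vectors):
    """Mean-pool character representation vectors into one sentence vector."""
    m = len(char_vectors)
    dim = len(char_vectors[0])
    return [sum(v[j] for v in char_vectors) / m for j in range(dim)]
```

Max pooling would simply replace the mean at each position with the maximum.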

(2) Weighted summation

Considering that different characters differ in importance for understanding the semantics of a sentence, the characters can be treated differently according to their importance when representing the sentence. Therefore, after the character representation vector of each character is obtained, a character weight value can be determined for each character, and the sentence representation vector can then be obtained from the character representation vectors and character weight values, where each character weight value represents the importance of one character to the sentence.

In one possible approach, a self-attention mechanism may be employed to determine the character weight values for individual characters.

Specifically, the character fusion module may include a weight vector matrix and a score conversion vector. For character A, a weight representation vector is obtained from the character representation vector of character A and the weight vector matrix; a vector dot product of the transpose of the score conversion vector and the weight representation vector then gives the weight score of character A. The weight scores of the other characters in the sentence are obtained in the same way, and the weight scores of all characters are normalized to obtain the character weight values. The weight vector matrix and the score conversion vector are trainable parameters learned during model training: the weight vector matrix re-models the character representation vector, and the score conversion vector converts the weight representation vector into a floating point number.
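A pure-Python sketch of this weighted-summation fusion follows; here `W` plays the role of the weight vector matrix and `v` the score conversion vector, both of which would be trainable parameters in the real model rather than fixed inputs:

```python
import math

def matvec(W, x):
    """Multiply matrix W by vector x."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    """Normalize weight scores into weights that sum to 1."""
    mx = max(scores)
    exps = [math.exp(s - mx) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(char_vectors, W, v):
    """Score each character with v . (W x), normalize the scores,
    and take the weighted sum of the character vectors."""
    scores = [dot(v, matvec(W, x)) for x in char_vectors]
    weights = softmax(scores)
    dim = len(char_vectors[0])
    return [sum(w * x[j] for w, x in zip(weights, char_vectors)) for j in range(dim)]
```

The same self-attention pattern reappears at the sentence level in step 203.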

(3) Convolution

When convolution is used, the character representation vectors of the characters in a sentence can be concatenated into a sentence representation matrix, and the character fusion module performs feature extraction on the sentence representation matrix to obtain the sentence representation vector.

The character fusion module can include convolution kernels of various sizes, so features of receptive fields of different sizes in the sentence representation matrix can be extracted.
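The convolution fusion can be sketched as follows; each kernel is a list of k weight vectors (one per character position), and max pooling reduces each kernel's responses to one feature. This is an illustrative single-channel version, not the model's actual implementation:

```python
# Sketch: slide kernels of several sizes over the stacked character vectors
# (the sentence representation matrix) and max-pool each kernel's responses,
# so kernels of different sizes cover receptive fields of different sizes.
def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def conv_features(sent_matrix, kernels):
    """Return one max-pooled feature per kernel."""
    feats = []
    for kernel in kernels:          # kernel: list of k weight vectors
        k = len(kernel)
        responses = [
            sum(_dot(kernel[i], sent_matrix[pos + i]) for i in range(k))
            for pos in range(len(sent_matrix) - k + 1)
        ]
        feats.append(max(responses))
    return feats
```

Concatenating the features of all kernels yields the sentence representation vector.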

Step 203: for each training sample, determine the sentence weight value of each sentence from the obtained sentence representation vectors, where each sentence weight value represents how important the sentence is for determining the relationship of the entity pair.

In the embodiment of the present application, considering the existence of noise sentences, in order to improve the accuracy of the final sentence bag representation vector of the whole sentence bag, noise sentences may be given smaller weights so that only a small part of their features enters the sentence bag representation vector. To this end, a sentence weight value must first be determined for each sentence in the sentence bag.

In the embodiment of the present application, the sentence weight value of each sentence may be obtained with a self-attention mechanism.

In one possible implementation, the following self-attention mechanism may be employed to calculate the sentence weight value for each sentence.

Specifically, the sentence fusion module may include a pre-training weight matrix and a pre-training parameter vector. The attention weight parameters that training is expected to finally learn, namely the pre-training weight matrix and pre-training parameter vector, can distinguish noise sentences from normal sentences, so that after processing, the weight difference between noise sentences and normal sentences becomes larger.

Here, one sentence bag is taken as an example to describe how the sentence weight values are obtained. For each sentence representation vector in the sentence bag, an intermediate representation vector is obtained from the sentence representation vector and the pre-training weight matrix, and the vector dot product between the intermediate representation vector and the pre-training parameter vector included in the entity relationship determination model is computed. The pre-training weight matrix and pre-training parameter vector are trainable parameters learned during model training: the pre-training weight matrix re-models the sentence representation vector, and the pre-training parameter vector converts the intermediate representation vector into a floating point number.

Based on the above, a vector dot product is obtained for each sentence, which represents the weight score of the sentence; the vector dot products are then normalized to obtain the sentence weight value corresponding to each sentence representation vector.

The sentence weight value calculation for each sentence in the sentence bag may be expressed as follows:

s_i = v^T tanh(W x_i)

A=softmax(S)

wherein W represents the pre-training weight matrix, v represents the pre-training parameter vector, s_i and x_i respectively represent the vector dot product and the sentence representation vector of the i-th sentence, S represents the set of vector dot products of all sentences in the sentence bag, and A represents the set of sentence weight values of all sentences in the sentence bag.
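The two formulas can be sketched directly in pure Python; W and v stand for the trainable pre-training weight matrix and parameter vector, which are fixed inputs here only for illustration:

```python
import math

def sentence_weights(X, W, v):
    """Compute s_i = v^T tanh(W x_i) for each sentence vector x_i in X,
    then A = softmax(S), matching the two formulas above."""
    def matvec(M, x):
        return [sum(m * xj for m, xj in zip(row, x)) for row in M]
    S = [sum(vk * math.tanh(hk) for vk, hk in zip(v, matvec(W, x))) for x in X]
    mx = max(S)                      # stabilized softmax
    exps = [math.exp(s - mx) for s in S]
    total = sum(exps)
    return [e / total for e in exps]
```

Sentences whose score s_i is low (the intended behavior for noise sentences) receive correspondingly small weights after the softmax.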

Step 204: and determining a sentence bag representation vector of the target sentence bag based on the sentence representation vector and the sentence weight value respectively corresponding to each sentence.

In the embodiment of the present application, the sentence bag representation vector can be expressed as follows:

Y=AX

where X represents the set of sentence representation vectors of all sentences in the sentence bag, and Y represents the sentence bag representation vector. During calculation, the sentence representation vectors of the sentences are weighted and summed with their respective sentence weight values, yielding the sentence bag representation vector of the sentence bag.
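The attention weighting and the weighted sum above can be sketched in pure Python. This is a minimal toy illustration, not the model's actual implementation; `W`, `v`, and the sentence vectors stand in for trained parameters and encoder outputs.

```python
import math

def sentence_bag_vector(sentence_vecs, W, v):
    """Compute s_i = v^T tanh(W x_i), A = softmax(S), then Y = A X."""
    def matvec(M, x):
        return [sum(m * xi for m, xi in zip(row, x)) for row in M]
    # Vector dot product (weight score) for each sentence.
    scores = [sum(vj * math.tanh(h) for vj, h in zip(v, matvec(W, x)))
              for x in sentence_vecs]
    # Softmax normalization yields the sentence weight values A.
    mx = max(scores)
    exps = [math.exp(s - mx) for s in scores]
    weights = [e / sum(exps) for e in exps]
    # Weighted sum of sentence vectors yields the bag representation Y.
    dim = len(sentence_vecs[0])
    bag = [sum(w * x[d] for w, x in zip(weights, sentence_vecs))
           for d in range(dim)]
    return weights, bag
```

Because the weights are softmax-normalized, they always sum to 1, so a noise sentence with a low score contributes proportionally little to the bag vector.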

Step 205: and determining the prediction relation type of the entity pair corresponding to each training sample based on the sentence bag expression vector corresponding to each training sample.

Specifically, the relationship type may be predicted by a multi-classifier. For example, a fully connected layer may map the sentence bag representation vector to each relationship type, yielding a probability value of the sentence bag representation vector for each predicted relationship type; the predicted relationship type is then determined according to these probability values.

Of course, other similar multi-classifiers may also be used, such as a Support Vector Machine (SVM) or Softmax, which is not limited in this embodiment of the present application.
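As an illustration of the fully-connected-layer-plus-softmax option, the sketch below maps a bag vector to relation probabilities. The weight matrix, bias, and relation names are hypothetical toy values, not the model's real parameters.

```python
import math

def predict_relation(bag_vec, W_cls, b_cls, relation_names):
    """Linear layer + softmax over relationship types; returns the argmax."""
    logits = [sum(w * x for w, x in zip(row, bag_vec)) + b
              for row, b in zip(W_cls, b_cls)]
    mx = max(logits)
    exps = [math.exp(l - mx) for l in logits]
    probs = [e / sum(exps) for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return relation_names[best], probs
```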

Step 206: and determining a model loss value of the entity relationship determination model according to the prediction relationship type and the labeling relationship type of each training sample.

In the embodiments of the present application, each training sample is labeled with a relationship type in advance, so a model loss (loss) value of the entity relationship determination model can be determined from the predicted relationship type and the labeled relationship type of each training sample. The model loss value may be obtained with a Cross Entropy Loss function, a Mean Squared Error (MSE) Loss function, or a Mean Absolute Error (MAE) Loss function, or with other possible loss functions, which is not limited in the embodiments of the present application.
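A minimal sketch of the cross-entropy option over predicted probability vectors; the function names are illustrative, and a real implementation would typically work on logits for numerical stability.

```python
import math

def cross_entropy(probs, label_idx, eps=1e-12):
    """Negative log-probability of the labeled relationship type."""
    return -math.log(max(probs[label_idx], eps))

def batch_loss(batch_probs, labels):
    """Average cross-entropy loss over a batch of training samples."""
    return sum(cross_entropy(p, y) for p, y in zip(batch_probs, labels)) / len(labels)
```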

Step 207: determining whether the entity relationship determination model satisfies a convergence condition.

The convergence condition may include any one of the following conditions:

(1) The model loss value is less than the set loss threshold. The model loss value represents the degree of difference between the predicted relationship type and the labeled relationship type of each training sample. When the model loss value is smaller than the set loss threshold, the difference between the predicted and labeled relationship types is small enough (or zero), indicating that the accuracy of the entity relationship determination model is high; the entity relationship determination model can therefore be considered to satisfy the convergence condition.

(2) The iterative training times of the model reach a set upper limit value.

Step 208: and when the entity relationship determination model is determined not to meet the convergence condition, adjusting model parameters of the entity relationship determination model according to the model loss value, and skipping to the step 202 to continue the next training process.

Step 209: and when the entity relation determination model is determined to meet the convergence condition, finishing the training.

When either of the above convergence conditions is satisfied, the training process of the entity relationship determination model ends, i.e., the trained entity relationship determination model is obtained; it can then be used in the subsequent actual relationship type determination process.
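The training loop with the two convergence conditions (loss threshold, iteration cap) from steps 202-209 might be skeletonized as follows; `model_step` is a hypothetical callback standing in for one forward pass, loss computation, and parameter update.

```python
def train(model_step, loss_threshold=0.01, max_iters=100):
    """Iterate until the loss drops below the threshold or the cap is hit."""
    loss = float("inf")
    for it in range(1, max_iters + 1):
        loss = model_step()          # one training iteration (steps 202-206)
        if loss < loss_threshold:    # convergence condition (1)
            break
    return loss, it                  # loop exhaustion = condition (2)
```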

In the embodiment of the application, a sample acquisition mode based on remote supervision is adopted in consideration of huge workload of manually labeling the relationship types of the entity pairs in each sentence one by one. Referring to fig. 5, a schematic diagram of a generation process of the training samples is shown.

Step 2011: Determine the multiple relationship types preset as outputs of the entity relationship determination model, and acquire multiple triples.

Each triple comprises an entity pair, and one relationship type in a plurality of relationship types is labeled in association with the corresponding entity pair.

Specifically, before training, the multiple relationship types that the model outputs are preset for the entity relationship determination model to be trained, i.e., which relationship types the model is able to predict; based on these preset relationship types, the corresponding triples can be obtained from existing data in the knowledge base.

For example, if the relationship types preset for the entity relationship determination model to be trained include "couple", then triples already labeled in association with the relationship type "couple" may be obtained from the knowledge base, such as (Xiao Zhang, couple, Xiao Wu) and (Xiao Qi, couple, Xiao Ba).

Step 2012: and for each triple, carrying out sentence matching on the entity pairs contained in the triple, and obtaining a sentence sample containing the entity pairs in the triple.

In the embodiments of the present application, the acquired triples are used to back-label acquired unsupervised texts; since a large amount of unsupervised text exists on the network, this back-labeling turns the large amount of unsupervised text into supervised data.

Specifically, for the "couple" attribute mentioned above, the entity pair (Xiao Zhang, Xiao Wu) is used to match sentences; if both entities appear in a sentence, the semantics of the sentence are considered likely to express the information of the triple (Xiao Zhang, couple, Xiao Wu). For example, the back-labeled sentences are as follows:

Sentence 1: Xiao Zhang and his wife Xiao Wu watched with tears in their eyes

Sentence 2: Xiao Zhang lives in an apartment under Xiao Wu's name

From the two sentences above, it can be seen that relatively large noise actually exists: sentence 1 expresses the "couple" attribute, but sentence 2 does not express it at all, i.e., sentence 2 is a noise sentence.
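The back-labeling step can be sketched as a substring match between each triple's entity pair and an unsupervised corpus. This is a toy illustration with made-up names; a production system would use proper entity linking rather than raw substring matching.

```python
def back_label(triples, corpus):
    """For each (head, relation, tail), keep sentences containing both entities."""
    bags = {}
    for head, relation, tail in triples:
        matched = [s for s in corpus if head in s and tail in s]
        if matched:
            bags[(head, tail)] = {"relation": relation, "sentences": matched}
    return bags
```

Note that this matching is exactly what introduces noise: a sentence can mention both entities without expressing the relation.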

Step 2013: and constructing corresponding training samples respectively based on the obtained sentence samples, wherein each training sample comprises a plurality of sentence samples containing the same entity pair.

In the embodiment of the present application, a bag-of-sentences-based model training is adopted, so that each training sample contains a plurality of sentence samples, and the plurality of sentence samples have the same entity pair, such as the bag-of-sentences sample mentioned in the introduction of the embodiment section shown in fig. 2.

Specifically, the number of sentence samples included in a sentence bag can be set, and a sentence bag, i.e., a training sample, is then formed by randomly selecting the specified number of sentence samples containing the same entity pair. Of course, in practical applications, the number of sentence samples included in each training sample may also differ, in which case each sentence bag may still be formed by random selection; this is not limited in the embodiments of the present application.

Step 2014: and respectively labeling the corresponding relation types of the entity pairs associated with the corresponding training samples aiming at the obtained training samples.

The labeled relationship type of each training sample is the relationship type of the shared entity pair in that training sample. Continuing the "couple" example above, after a training sample is constructed from sentences such as "Xiao Zhang and his wife Xiao Wu watched with tears in their eyes" and "Xiao Zhang lives in an apartment under Xiao Wu's name", since the triple (Xiao Zhang, couple, Xiao Wu) indicates that Xiao Zhang and Xiao Wu are a couple, the relationship type can be labeled "couple".

And then, performing iterative training on the entity relationship determination model to be trained based on the marked multiple training samples.

In the embodiments of the present application, as seen in the above process, a large number of noise sentences still exist among the back-labeled sentences. Therefore, before training samples are constructed, certain means can be adopted to filter out noise sentences, and training samples are constructed from the sentence samples remaining after filtering. Referring to fig. 6, a schematic flow chart of constructing training samples is shown.

Step 20131: and performing word segmentation operation on the obtained multiple sentence samples to obtain multiple word segments.

The word segmentation operation can be performed with word segmentation tools such as the jieba word segmentation tool, StandardAnalyzer, or ChineseAnalyzer.

For the sentence "Xiao Zhang lives in an apartment under Xiao Wu's name", the sentence can be divided into participles such as "Xiao Zhang", "lives in", "Xiao Wu", "name", and "apartment".

Step 20132: and determining mutual information coefficients of each participle in the participles and the relation type aiming at each relation type, wherein one mutual information coefficient is used for representing the importance degree of one participle to one relation type.

In the embodiments of the present application, considering that some words occur with very high probability under a given relationship type, those words can be considered closely related to that relationship type; for example, under the "couple" attribute, words such as "wife" and "husband" appear with very high probability. Sentence samples can therefore be screened with such important words, alleviating the noise problem caused by the remote supervision approach.

Specifically, taking relationship type A as an example, when determining the mutual information coefficients of the participles with respect to relationship type A, a first probability of each participle occurring, a second probability of relationship type A occurring, and a third probability of each participle occurring when relationship type A exists are determined; based on the first, second, and third probabilities, the mutual information coefficient corresponding to each participle is then determined.

The mutual information coefficient may be calculated as follows:

MI(e_t, e_c) = P(U = e_t, C = e_c) · log [ P(U = e_t, C = e_c) / (P(U = e_t) · P(C = e_c)) ]

where U denotes a participle and C denotes a relationship type; P(U = e_t) represents the first probability of participle e_t occurring; P(C = e_c) represents the second probability of relationship type e_c occurring; and P(U = e_t, C = e_c) represents the third probability of participle e_t occurring when relationship type e_c exists.
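The exact formula is garbled in the source; assuming the coefficient takes a pointwise-mutual-information form over the three probabilities described above, it could be computed as:

```python
import math

def mutual_information(p_joint, p_word, p_rel):
    """MI-style coefficient from joint and marginal probabilities (assumed form).

    p_joint: probability of the participle occurring with the relationship type
    p_word:  first probability (participle occurring)
    p_rel:   second probability (relationship type occurring)
    """
    if p_joint == 0:
        return 0.0
    return p_joint * math.log(p_joint / (p_word * p_rel))
```

When the participle and the relationship type are independent (p_joint = p_word * p_rel), the coefficient is 0; a positive value indicates the participle co-occurs with the relationship type more often than chance.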

Step 20133: and selecting at least one word segmentation corresponding to the mutual information coefficient larger than the set threshold value based on the obtained multiple mutual information coefficients.

Similarly, taking relationship type A as an example, after the mutual information coefficients of the participles with respect to relationship type A are obtained, the participles can be sorted by mutual information coefficient in descending order, and at least one participle whose mutual information coefficient is greater than the set threshold can be selected.

The set threshold may be a preset fixed threshold; alternatively, the number of participles to select may be preset, in which case that number of participles is taken from the sorted sequence.

Step 20134: and screening out sentence samples which do not contain any participle in at least one participle from a plurality of sentence samples corresponding to each relationship type.

Similarly, taking relationship type A as an example: the at least one participle selected for relationship type A consists of participles of high importance to relationship type A, which occur with high probability whenever relationship type A exists. The multiple sentence samples corresponding to relationship type A can therefore be filtered with these participles: sentence samples containing any of the at least one participle are retained, and sentence samples containing none of them are screened out.
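The retain/screen-out rule reduces to a keyword filter; a minimal sketch (the keyword list is illustrative):

```python
def filter_noise(sentences, keywords):
    """Keep only sentence samples containing at least one high-MI participle."""
    return [s for s in sentences if any(k in s for k in keywords)]
```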

Step 20135: and respectively constructing corresponding training samples based on the plurality of residual sentence samples.

Constructing training samples from the sentences remaining after screening filters out some noise sentences to a certain extent, improving the accuracy of the training samples, correspondingly improving the accuracy of the trained entity relationship determination model, and further improving the accuracy of the finally predicted entity relationship types.

Next, the acquisition process of the character representation vector is described.

In one possible embodiment, the process of obtaining the character representation vector may be performed as follows. Here, a sentence, such as sentence a, is taken as an example, and the process of obtaining the vector representing each character in the sentence is described. Referring to fig. 7, a flow chart of feature coding of each character of the sentence a is shown.

S2021 a: and performing character splitting on the sentence A to obtain a plurality of characters included in the sentence A.

After the identifier of the entity is inserted into the sentence a, the sentence a may be split into individual characters, which may also include the inserted identifier.

S2022 a: and respectively carrying out feature coding on each character to obtain a content representation vector, a position representation vector and a source representation vector of each character.

The content representation vector represents the content corresponding to a character, and can be obtained based on the meaning that each character expresses, or by querying an existing lexicon that maps a character to a vector. The position representation vector represents the position of a character in the sentence, capturing the relative positional relationship between the character and the other characters in sentence A; the position can be represented by the character's serial number in sentence A or by the word vectors before and after the character. The source representation vector characterizes the sentence from which a character is derived, i.e., sentence A.

S2023 a: based on the content representation vector, the position representation vector and the source representation vector, a character representation vector of a character is obtained.

Referring to fig. 8, a schematic diagram of character representation vector acquisition for each character is shown. After the content representation vector, position representation vector, and source representation vector of each character are obtained, the character representation vector of the corresponding character may be obtained from these three vectors; the character representation vector is a vector that simultaneously represents the information of the content, position, and source representation vectors.

Specifically, the content representation vector, the position representation vector, and the source representation vector may be superimposed (summed element-wise) to obtain the character representation vector of the corresponding character. As shown in fig. 8, taking character 1 as an example, the content representation vector Ec1, position representation vector Eb1, and source representation vector Ea1 of character 1 are superimposed to obtain the character representation vector E1 of character 1.

Specifically, the content representation vector, the position representation vector, and the source representation vector may be further subjected to a splicing process to obtain a character representation vector of a corresponding character, for example, the position representation vector Eb1 of the character 1 may be spliced to the rear of the content representation vector Ec1, and the source representation vector Ea1 may be spliced to the rear of the position representation vector Eb1 to obtain the character representation vector E1 of the character 1.

Specifically, the content representation vector, the position representation vector, and the source representation vector of each character may be pooled to obtain the character representation vector of the corresponding character. Also taking character 1 as an example, when the maximum pooling is performed, the values of the content expression vector Ec1, the position expression vector Eb1, and the source expression vector Ea1 of character 1 at the same position are maximized, thereby obtaining a character expression vector E1 of character 1.
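The three fusion options just described (superposition, concatenation, max pooling) can be sketched in one helper; the vector lengths are toy values:

```python
def fuse(content, position, source, mode="sum"):
    """Combine the three per-character vectors into one character representation."""
    if mode == "sum":       # element-wise superposition
        return [c + p + s for c, p, s in zip(content, position, source)]
    if mode == "concat":    # splice position and source behind content
        return content + position + source
    if mode == "maxpool":   # element-wise maximum across the three vectors
        return [max(c, p, s) for c, p, s in zip(content, position, source)]
    raise ValueError(mode)
```

Superposition and max pooling keep the original dimensionality, while concatenation triples it; which to use is a design choice the embodiments leave open.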

In the embodiment of the application, in order to obtain the sentence expression vector more accurately, information of other characters can be blended into each character expression vector. For example, after the character representation vector is obtained through the above-mentioned process, information of other characters can be merged by using an attention mechanism.

Specifically, the character encoding layer may further include at least one attention vector matrix, so that at least one attention vector corresponding to each character is obtained according to each character representation vector and the corresponding attention vector matrix in sentence a. For example, the at least one attention vector matrix may include a request (query) vector matrix, a key (key) vector matrix, and a value (value) vector matrix, and accordingly, the at least one attention vector includes a query vector, a key vector, and a value vector.

Furthermore, an attention weight vector of each character may be obtained based on the at least one attention vector of each character, where each value in a character's attention weight vector represents the attention weight of one character with respect to that character. For example, if sentence A contains 4 characters, then for character 1, the attention weight vector of character 1 contains 4 values, each representing the attention weight of one character contained in sentence A with respect to character 1.

Specifically, the attention weight of the character 2 to the character 1 can be obtained through the similarity between the key vector of the character 2 and the query vector of the character 1, and similarly, the attention weight of the character 1 to the character 1 can be obtained through the similarity between the key vector of the character 1 and the query vector of the character 1.

Finally, the final character representation vector of a character is obtained by performing weighted summation with the corresponding attention vector according to each attention weight in the attention weight vector of the character, for example, the character representation vector corresponding to the character 1 is obtained by performing weighted summation with each value in the attention weight vector of the character 1 and the corresponding value vector.

Further, the finally obtained character representation vector is used for participating in the synthesis of the sentence representation vector.
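The query/key/value refinement described above corresponds to standard scaled dot-product self-attention; a minimal pure-Python sketch with hypothetical projection matrices:

```python
import math

def self_attention(X, Wq, Wk, Wv):
    """Refine each character vector with attention over all characters."""
    def matmul(A, B):
        return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
                for row in A]
    Q, K, V = matmul(X, Wq), matmul(X, Wk), matmul(X, Wv)
    d = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query with every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        mx = max(scores)
        exps = [math.exp(s - mx) for s in scores]
        w = [e / sum(exps) for e in exps]   # attention weight vector
        # Weighted sum of value vectors gives the refined character vector.
        out.append([sum(wi * v[j] for wi, v in zip(w, V))
                    for j in range(len(V[0]))])
    return out
```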

In another possible embodiment, the process of obtaining the character representation vector may also be performed as follows. Here, also taking sentence a as an example, see fig. 9, which is another flow diagram for feature coding of each character of sentence a.

S2021 b: and performing character splitting on the sentence A to obtain a plurality of characters included in the sentence A.

S2022 b: and performing feature coding on the first character in the sentence A to obtain a character representation vector corresponding to the first character.

S2023 b: and extracting the characteristics of the current character to be processed in the sentence A to obtain a basic expression vector of the character.

In the embodiments of the present application, feature coding is performed on each character sequentially, in the order of the characters in the sentence, obtaining the character representation vector of each character in turn. Thus, if the previous character is the first character, the current character to be processed is the second character; similarly, if the previous character is the second character, the current character to be processed is the third character, and so on.

The basic expression vector is used for representing the content corresponding to one character, and can be obtained based on the meaning of each character representation, or can be obtained by querying an existing word stock, wherein the word stock is a mapping word stock between one character and one vector.

S2024 b: based on the obtained base representation vector and the character representation vector of the previous character, a character representation vector of the character is obtained.

For example, if the current character is the second character, then the character representation vector for the second character may be obtained based on the base representation vector obtained for the current character and the character representation vector for the first character.

S2025 b: and judging whether the character representation vector acquisition of all the characters is finished.

If the judgment result of step S2025b is yes, i.e., the character representation vectors of all the characters have been acquired, the process ends; otherwise, the process jumps back to step S2023b, i.e., the character representation vector of the next character after the current one is acquired.
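Steps S2023b to S2025b amount to a sequential (RNN-style) pass over the characters; the sketch below assumes a simple tanh cell, which is one plausible realization of fusing each base vector with the previous character's representation.

```python
import math

def encode_sequence(base_vecs, W_in, W_rec):
    """Sequentially fuse each character's base vector with the previous
    character's representation vector."""
    def matvec(M, x):
        return [sum(m * xi for m, xi in zip(row, x)) for row in M]
    reps = []
    prev = [0.0] * len(W_rec)          # the first character has no predecessor
    for x in base_vecs:
        h = [math.tanh(a + b)
             for a, b in zip(matvec(W_in, x), matvec(W_rec, prev))]
        reps.append(h)                  # character representation vector
        prev = h                        # carried into the next character
    return reps
```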

In the embodiment of the present application, after the training of the entity relationship determination model is finished, the trained entity relationship determination model may be used to participate in the prediction of the actual relationship type. Please refer to fig. 10, which is a flowchart illustrating a method for determining an entity relationship type according to an embodiment of the present application, where the method includes the following steps.

Step 1001: and acquiring a target sentence bag associated with the target entity pair, wherein the target sentence bag comprises a plurality of sentences, and each sentence comprises the target entity pair.

The target entity pair is an entity pair whose relationship type is to be determined. A plurality of sentences containing the target entity pair can be collected; similarly, certain noise sentences can be screened out in advance using mutual information, and the target sentence bag associated with the target entity pair is then constructed based on the remaining sentences.

For example, for the entity pair "Zhang San" and "April 2, 1956", a plurality of sentences containing the two entities may be collected, for example:

Sentence 1: Zhang San was born on April 2, 1956

Sentence 2: Zhang San's birthday is April 2, 1956

Sentence 3: Zhang San was born in a hospital on April 2, 1956

The above sentences are only a partial example; further, a target sentence bag may be constructed based on the collected sentences for determining the relationship type between "Zhang San" and "April 2, 1956".

Step 1002: and inputting the target sentence bag into the trained entity relationship determination model, and aiming at each sentence in the target sentence bag, obtaining a sentence expression vector of one sentence based on the character expression vector corresponding to each character in the sentence.

Step 1003: and respectively determining sentence weight values of corresponding sentences based on the obtained sentence expression vectors by adopting a trained entity relationship determination model, wherein each sentence weight value represents the importance degree of a sentence for determining the relationship of the target entity pair.

Step 1004: and determining a sentence bag expression vector of the target sentence bag based on the sentence expression vector and the sentence weight value respectively corresponding to each sentence by adopting the trained entity relation determination model.

Step 1005: and determining a target relation type between the two entities included in the target entity pair based on the sentence bag representation vector by adopting the trained entity relation determination model.

The processes of steps 1002 to 1005 are similar to the corresponding description in model training, so reference may be made to the training description; details are not repeated here.

Taking the entity pair "Zhang San" and "April 2, 1956" as an example: when the trained entity relationship determination model is accurate enough, it may output the predicted relationship type "birth time" for this pair, yielding a new triple (Zhang San, birth time, April 2, 1956) that can be added to the knowledge graph for downstream application scenarios. For example, when a user later queries Zhang San's birth time, the triple can be matched and the answer "April 2, 1956" successfully output, implementing accurate question answering for question search.
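The downstream question-answering use of the new triple reduces to a lookup over the stored triples; a toy sketch:

```python
def answer(kg_triples, head, relation):
    """Return the tail of the first triple matching the queried head and relation."""
    for h, r, t in kg_triples:
        if h == head and r == relation:
            return t
    return None
```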

In summary, in the embodiments of the present application, to address the difficulty of manual labeling and the scarcity of training data, a large amount of training data is constructed via knowledge-graph-based remote supervision. To address the noise problem of training data constructed via remote supervision, on one hand the noise is alleviated through mutual information, and on the other hand the noise in the training data is further effectively alleviated by an entity relationship determination model based on attention and sentence-bag prediction.

Referring to fig. 11, based on the same inventive concept, an embodiment of the present application further provides an entity relationship type determining apparatus 110, including:

an obtaining unit 1101, configured to obtain a target sentence bag associated with the target entity pair, and input the target sentence bag to the trained entity relationship determination model; the target sentence bag comprises a plurality of sentences, and each sentence comprises a target entity pair;

a sentence encoding unit 1102, configured to use the trained entity relationship determination model to perform the following operations for each sentence in the target sentence bag: aiming at a sentence, obtaining a sentence expression vector of the sentence based on the character expression vector corresponding to each character in the sentence;

a sentence bag encoding unit 1103, configured to determine, using the trained entity relationship determination model, sentence weight values of corresponding sentences, respectively, based on the obtained respective sentence expression vectors, each sentence weight value representing an importance degree of a sentence determined for a relationship of the target entity pair, and determine, using the trained entity relationship determination model, a sentence bag expression vector of the target sentence bag, based on the sentence expression vector and the sentence weight value corresponding to each sentence, respectively;

and a prediction unit 1104, configured to determine a target relationship type between two entities included in the target entity pair based on the bag of sentences representation vector by using the trained entity relationship determination model.

Optionally, the apparatus further comprises a model training unit 1105 configured to:

determining the multiple relationship types preset as outputs of the entity relationship determination model, and acquiring multiple triples; each triple comprises an entity pair, and one of the multiple relationship types is labeled in association with the corresponding entity pair;

for a plurality of triples, the following operations are respectively executed: aiming at a triple, carrying out sentence matching by adopting an entity pair contained in the triple to obtain a sentence sample containing the entity pair contained in the triple;

constructing corresponding training samples respectively based on the obtained sentence samples, wherein each training sample comprises a plurality of sentence samples containing the same entity pair;

respectively labeling the corresponding relation types of entity pairs associated with the corresponding training samples aiming at the obtained training samples;

and carrying out iterative training on the entity relationship determination model to be trained based on the marked training samples until a convergence condition is met, and obtaining the trained entity relationship determination model.

Optionally, the model training unit 1105 is further configured to:

performing word segmentation operation on the obtained multiple sentence samples to obtain multiple word segments;

for multiple relation types, the following operations are respectively executed:

determining mutual information coefficients of each participle in a plurality of participles and a relation type aiming at the relation type, wherein one mutual information coefficient is used for representing the importance degree of one participle to the relation type;

selecting at least one word segmentation corresponding to the mutual information coefficient larger than a set threshold value based on the obtained multiple mutual information coefficients;

screening out sentence samples which do not contain any participle in at least one participle from a plurality of sentence samples corresponding to a relation type;

and respectively constructing corresponding training samples based on the plurality of residual sentence samples.

Optionally, the model training unit 1105 is specifically configured to:

for a plurality of word segments, the following operations are respectively executed:

determining a first probability of occurrence of a word segmentation for a word segmentation;

determining a second probability of occurrence of a relationship type and determining a third probability of occurrence of a word-segmentation when a relationship type exists;

and determining a mutual information coefficient corresponding to a word segmentation based on the first probability, the second probability and the third probability.

Optionally, the sentence encoding unit 1102 is specifically configured to:

splitting one sentence into characters to obtain a plurality of characters included in the one sentence;

for each of the plurality of characters, the following operations are respectively performed:

performing feature encoding on one character to obtain a content representation vector, a position representation vector and a source representation vector of the one character; the content representation vector represents the content of the one character, the position representation vector represents the position of the one character in the one sentence, and the source representation vector represents the sentence from which the one character comes;

and obtaining a character representation vector of the one character based on the content representation vector, the position representation vector and the source representation vector.
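The application only says the character representation vector is obtained "based on" the three feature vectors; elementwise addition, as in BERT-style token, position and segment embeddings, is one common choice and is the assumption made in this sketch:

```python
def character_representation(content_vec, position_vec, source_vec):
    """Combine the content, position and source vectors of one character
    into its character representation vector. Elementwise addition is an
    assumption here; the application does not specify the combination."""
    assert len(content_vec) == len(position_vec) == len(source_vec)
    return [c + p + s
            for c, p, s in zip(content_vec, position_vec, source_vec)]
```

With this choice all three vectors must share one dimensionality, and the result keeps that dimensionality, so downstream encoders need no extra projection.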

Optionally, the sentence encoding unit 1102 is specifically configured to:

splitting one sentence into characters to obtain a plurality of characters included in the one sentence;

performing feature encoding on each of the plurality of characters in sequence, according to the order of the plurality of characters in the one sentence, to obtain the character representation vectors respectively corresponding to the plurality of characters; when feature encoding is performed on one character, feature extraction is performed on the one character to obtain a basic representation vector of the one character, and the character representation vector of the one character is obtained based on the basic representation vector and the character representation vector of the character preceding the one character.
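This sequential dependence on the preceding character is the shape of a recurrent encoder. A deliberately tiny scalar recurrence is sketched below; the real model's cell type and weights are not specified in the application, so `w_in` and `w_rec` are illustrative assumptions:

```python
import math

def recurrent_encode(basic_vectors, w_in=0.5, w_rec=0.5):
    """Encode characters left to right: each character representation is
    computed from its own basic representation and the representation of
    the preceding character (toy scalar RNN, for illustration only)."""
    h = 0.0  # "previous character" representation; zero for the first character
    outputs = []
    for x in basic_vectors:
        # combine the basic representation with the preceding character's vector
        h = math.tanh(w_in * x + w_rec * h)
        outputs.append(h)
    return outputs
```

Because each output feeds into the next step, the vector of the last character summarizes the whole prefix, which is why order-sensitive information survives this encoding.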

Optionally, the sentence encoding unit 1102 is specifically configured to:

performing mean pooling on the obtained character representation vectors to obtain the sentence representation vector; or,

determining a character weight value of each character, and obtaining the sentence representation vector based on the character representation vector and the character weight value corresponding to each character; where each character weight value characterizes the importance of one character to the one sentence.
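The two pooling alternatives above can be sketched as follows; how the character weight values are produced is left open here, matching the application:

```python
def mean_pool(char_vectors):
    """Mean pooling: average the character vectors dimension by dimension."""
    n = len(char_vectors)
    return [sum(dim_values) / n for dim_values in zip(*char_vectors)]

def weighted_pool(char_vectors, char_weights):
    """Weighted pooling: each character contributes to the sentence vector
    in proportion to its character weight value."""
    return [sum(w * x for w, x in zip(char_weights, dim_values))
            for dim_values in zip(*char_vectors)]
```

Mean pooling is the special case of weighted pooling in which every character weight equals 1/n, so the weighted variant strictly generalizes it.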

Optionally, the sentence bag encoding unit 1103 is specifically configured to:

for each sentence representation vector, the following operations are respectively executed:

for one sentence representation vector, obtaining an intermediate representation vector of the one sentence representation vector based on the one sentence representation vector and a pre-trained weight matrix included in the entity relationship determination model;

obtaining a vector dot product between the intermediate representation vector and a pre-trained parameter vector included in the entity relationship determination model;

and normalizing the obtained vector dot products to obtain the sentence weight values corresponding to the sentence representation vectors.
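The three steps above form a standard attention scoring pass, sketched here with plain lists. Whether a nonlinearity follows the projection is not specified in the application; none is applied in this sketch, and softmax is assumed as the normalization:

```python
import math

def sentence_weights(sentence_vectors, weight_matrix, param_vector):
    """Attention-style sentence weighting:
    (1) project each sentence vector with the pre-trained weight matrix,
    (2) take the dot product with the pre-trained parameter vector,
    (3) normalize all scores across the sentence bag (softmax assumed)."""
    scores = []
    for s in sentence_vectors:
        # step 1: intermediate representation = weight_matrix @ s
        intermediate = [sum(w * x for w, x in zip(row, s))
                        for row in weight_matrix]
        # step 2: dot product with the parameter vector
        scores.append(sum(v * h for v, h in zip(param_vector, intermediate)))
    # step 3: softmax over all sentences (max-shifted for numerical stability)
    m = max(scores)
    exps = [math.exp(sc - m) for sc in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

The resulting weights sum to one across the bag, so the bag representation vector can then be formed as the weight-sum of the sentence representation vectors.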

The apparatus may be configured to perform the methods shown in the embodiments of fig. 2 to 10; therefore, for the functions that can be realized by each functional module of the apparatus, reference may be made to the description of the embodiments shown in fig. 2 to 10, which is not repeated here. Since the model training unit 1105 is not an essential functional module, the model training unit 1105 is shown with a dotted line in fig. 11.

Referring to fig. 12, based on the same technical concept, an embodiment of the present application further provides a computer device 120, which may include a memory 1201 and a processor 1202.

The memory 1201 is used for storing the computer programs executed by the processor 1202. The memory 1201 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application program required for at least one function, and the like, and the data storage area may store data created according to the use of the computer device, and the like. The processor 1202 may be a Central Processing Unit (CPU), a digital processing unit, or the like. The embodiment of the present application does not limit the specific connection medium between the memory 1201 and the processor 1202. In the embodiment of the present application, the memory 1201 and the processor 1202 are connected by a bus 1203, the bus 1203 is represented by a thick line in fig. 12, and the connection manner between other components is only schematically illustrated and is not limited thereto. The bus 1203 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in fig. 12, but this does not mean that there is only one bus or one type of bus.

The memory 1201 may be a volatile memory, such as a random-access memory (RAM); the memory 1201 may also be a non-volatile memory, such as, but not limited to, a read-only memory (ROM), a flash memory, a hard disk drive (HDD) or a solid-state drive (SSD), or any other medium which can be used to carry or store desired program code in the form of instructions or data structures and which can be accessed by a computer. The memory 1201 may also be a combination of the above memories.

A processor 1202, configured to execute the method performed by the apparatus in the embodiments shown in fig. 2 to fig. 10 when calling the computer program stored in the memory 1201.

In some possible embodiments, various aspects of the methods provided by the present application may also be implemented in the form of a program product including program code for causing a computer device to perform the steps of the methods according to various exemplary embodiments of the present application described above in this specification when the program product is run on the computer device, for example, the computer device may perform the methods performed by the devices in the embodiments shown in fig. 2-10.

The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.
