Entity disambiguation method, device, storage medium and computer equipment

Document No.: 1170277    Published: 2020-09-18

Reading note: This technology, "Entity disambiguation method, device, storage medium and computer equipment", was designed and created by Liu Wanzeng, Zhai Xi and Yin Chuan on 2020-05-18. Summary: The invention discloses an entity disambiguation method and apparatus, a storage medium and a computer device, and relates to the field of information technology. Its main purpose is to introduce an institution field when the expert relationship graph is created, and to process the constructed sub-network to be disambiguated with social network analysis techniques, thereby disambiguating entities in the knowledge graph as fully as possible, reducing the workload of later disambiguation, improving the efficiency of knowledge graph construction and saving human resources. The method comprises: constructing an expert relationship graph according to a collaboration relationship model and a pre-created expert-institution relationship table; and disambiguating the expert relationship graph using a holistic analysis method. The invention is applicable to entity disambiguation.

1. An entity disambiguation method, comprising:

constructing an expert relationship graph according to a collaboration relationship model and a pre-created expert-institution relationship table; and

disambiguating the expert relationship graph using a holistic analysis method.

2. The method according to claim 1, wherein constructing the expert relationship graph according to the collaboration relationship model and the pre-created expert-institution relationship table comprises:

extracting each item of entity field information and the relationship information from the expert-institution relationship table;

adding the entity field information to the entity set of the collaboration relationship model; and

adding the relationship information to the relationship set of the collaboration relationship model to construct the expert relationship graph.

3. The method according to claim 1, wherein disambiguating the expert relationship graph using the holistic analysis method comprises:

constructing a sub-network to be disambiguated from the entities to be disambiguated acquired from the expert relationship graph; and

disambiguating the sub-network to be disambiguated using a social network analysis technique.

4. The method according to claim 1, wherein before constructing the expert relationship graph according to the collaboration relationship model and the pre-created expert-institution relationship table, the method further comprises:

judging, according to the acquired expert information and institution information, whether the experts and the institutions are in a many-to-many correspondence;

if so, extracting only the correspondence between the first-listed expert and the first-listed institution and storing it in the expert-institution relationship table; and

if not, directly extracting the correspondence between the experts and the institutions and storing it in the expert-institution relationship table.

5. The method according to claim 1, wherein before disambiguating the expert relationship graph using the holistic analysis method, the method further comprises:

disambiguating the acquired entities to be disambiguated based on a similarity clustering algorithm.

6. The method according to claim 1, wherein constructing the expert relationship graph comprises:

generating the expert relationship graph using the Gephi tool, the collaboration relationship model and the pre-created expert-institution relationship table.

7. An entity disambiguation apparatus, comprising:

a construction unit configured to construct an expert relationship graph according to a collaboration relationship model and a pre-created expert-institution relationship table; and

a processing unit configured to disambiguate the expert relationship graph using a holistic analysis method.

8. The apparatus of claim 7, wherein the processing unit comprises:

a construction module configured to construct a sub-network to be disambiguated from the entities to be disambiguated acquired from the expert relationship graph; and

a processing module configured to disambiguate the sub-network to be disambiguated using a social network analysis technique.

9. A storage medium storing at least one executable instruction, the executable instruction causing a processor to perform operations corresponding to the entity disambiguation method according to any one of claims 1-6.

10. A computer device comprising a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another via the communication bus, and the memory stores at least one executable instruction that causes the processor to perform operations corresponding to the entity disambiguation method according to any one of claims 1-6.

Technical Field

The present invention relates to the field of information technology, and in particular, to an entity disambiguation method, apparatus, storage medium, and computer device.

Background

A knowledge map is a family of graphs that display the development process and structure of knowledge. It uses visualization techniques to describe knowledge resources and their carriers, and to mine, analyze, construct, draw and display knowledge and the interrelationships among them. An expert relationship graph is obtained by mining, extracting and integrating a large number of documents in various ways: information about experts is normalized, and algorithms are then used to build a relationship network linking experts, institutions, related research and so on.

At present, expert relationship graphs are generally constructed on the basis of the collaboration relationships between experts. However, a graph built this way contains a large number of nodes for same-name experts. For example, if experts sharing the name P have published M papers, the number of ambiguous expert pairs to be processed reaches $C_M^2 = M(M-1)/2$. The workload of later manual disambiguation is therefore huge, which lowers the efficiency of knowledge graph construction and consumes a large amount of human resources.

Disclosure of Invention

In view of the above, the present invention provides an entity disambiguation method and apparatus, a storage medium and a computer device. Its main purpose is to introduce an institution field when the expert relationship graph is created, and to process the constructed sub-network to be disambiguated with social network analysis techniques, so as to disambiguate entities in the knowledge graph as fully as possible, reduce the workload of later disambiguation, improve the efficiency of knowledge graph construction and save human resources.

According to a first aspect of the present invention, an entity disambiguation method is provided, comprising:

constructing an expert relationship graph according to a collaboration relationship model and a pre-created expert-institution relationship table; and

disambiguating the expert relationship graph using a holistic analysis method.

Further, constructing the expert relationship graph according to the collaboration relationship model and the pre-created expert-institution relationship table includes:

extracting each item of entity field information and the relationship information from the expert-institution relationship table;

adding the entity field information to the entity set of the collaboration relationship model; and

adding the relationship information to the relationship set of the collaboration relationship model to construct the expert relationship graph.

Further, disambiguating the expert relationship graph using the holistic analysis method includes:

constructing a sub-network to be disambiguated from the selected expert nodes to be disambiguated; and

disambiguating the sub-network to be disambiguated using a social network analysis technique.

Further, pre-creating the expert-institution relationship table includes:

judging, according to the acquired expert information and institution information, whether the experts and the institutions are in a many-to-many correspondence;

if not, directly extracting the correspondence between the experts and the institutions and storing it in the expert-institution relationship table; and

if so, extracting only the correspondence between the first-listed expert and the first-listed institution and storing it in the expert-institution relationship table.

Further, before disambiguating the expert relationship graph using the holistic analysis method, the method further includes:

disambiguating the acquired entities to be disambiguated based on a similarity clustering algorithm.

Further, constructing the expert relationship graph includes:

generating the expert relationship graph using the Gephi tool, the collaboration relationship model and the pre-created expert-institution relationship table.

According to a second aspect of the present invention, an entity disambiguation apparatus is provided, comprising:

a construction unit configured to construct an expert relationship graph according to a collaboration relationship model and a pre-created expert-institution relationship table; and

a processing unit configured to disambiguate the expert relationship graph using a holistic analysis method.

Further, the construction unit includes:

an extraction module configured to extract each item of entity field information and the relationship information from the expert-institution relationship table; and

an adding module configured to add the entity field information to the entity set of the collaboration relationship model;

the adding module is further configured to add the relationship information to the relationship set of the collaboration relationship model to construct the expert relationship graph.

Further, the processing unit includes:

a construction module configured to construct the sub-network to be disambiguated from the selected expert nodes to be disambiguated; and

a processing module configured to disambiguate the sub-network to be disambiguated using a social network analysis technique.

Further, the construction unit includes:

a judging module configured to judge, according to the acquired expert information and institution information, whether the experts and the institutions are in a many-to-many correspondence;

a first extraction module configured to, if they are, extract only the correspondence between the first-listed expert and the first-listed institution and store it in the expert-institution relationship table; and

a second extraction module configured to, if they are not, directly extract the correspondence between the experts and the institutions and store it in the expert-institution relationship table.

Further, the apparatus further comprises:

a clustering unit configured to disambiguate the acquired entities to be disambiguated based on a similarity clustering algorithm.

Further, the construction unit is specifically configured to generate the expert relationship graph using the Gephi tool, the collaboration relationship model and the pre-created expert-institution relationship table.

According to a third aspect of the present invention, a storage medium is provided in which at least one executable instruction is stored, the executable instruction causing a processor to perform the following steps: constructing an expert relationship graph according to a collaboration relationship model and a pre-created expert-institution relationship table; and disambiguating the expert relationship graph using a holistic analysis method.

According to a fourth aspect of the present invention, a computer device is provided, comprising a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another via the communication bus, and the memory stores at least one executable instruction that causes the processor to perform the following steps: constructing an expert relationship graph according to a collaboration relationship model and a pre-created expert-institution relationship table; and disambiguating the expert relationship graph using a holistic analysis method.

Compared with the prior art, in which the expert relationship graph is built solely on the collaboration relationships between experts, the entity disambiguation method and apparatus, storage medium and computer device provided by the present invention construct the expert relationship graph according to a collaboration relationship model and a pre-created expert-institution relationship table, and disambiguate the expert relationship graph using a holistic analysis method. Because an institution field is introduced when the expert relationship graph is created, and the constructed sub-network to be disambiguated is processed with social network analysis techniques, entities in the knowledge graph are disambiguated as fully as possible, the workload of later disambiguation is reduced, the efficiency of knowledge graph construction is improved, and human resources are saved.

The foregoing description is only an overview of the technical solutions of the present invention. Specific embodiments are described below so that the technical means of the present invention can be understood more clearly, and so that the above and other objects, features and advantages of the present invention become more readily apparent.

Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:

FIG. 1 is a flow chart of a method for entity disambiguation according to an embodiment of the present invention;

FIG. 2 is a schematic diagram illustrating a process of creating an expert-institution relationship table according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of an expert relationship graph according to an embodiment of the present invention;

FIG. 4 is a schematic diagram illustrating the construction of a subnet to be disambiguated according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of an entity disambiguation apparatus according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of the physical structure of a computer device according to an embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

As described in the background, expert relationship graphs are currently constructed on the basis of the collaboration relationships between experts. However, a graph built this way may contain a large number of nodes for same-name experts. For example, if experts sharing the name P have published M papers, the number of ambiguous expert pairs to be processed reaches $C_M^2 = \frac{M(M-1)}{2}$. The workload of later manual disambiguation is therefore huge, which lowers the efficiency of knowledge graph construction and consumes a large amount of human resources.

To solve the above problem, an embodiment of the present invention provides an entity disambiguation method. As shown in FIG. 1, the method includes:

101. Construct an expert relationship graph according to a collaboration relationship model and a pre-created expert-institution relationship table.

The collaboration relationship model may be a prior-art graph model that generates an expert relationship graph from the collaboration relationships between experts, and may specifically be as follows:

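$G_M = \{V, E\}$

$V = \{v_1, v_2, \ldots, v_n\}$

$E = \{(x, y) \mid x \in V,\ y \in V,\ x \neq y\}$

$v_i = \{\mathrm{Identifier}, \mathrm{Name}, \mathrm{Publications}_i\}$

$\mathrm{Publications}_i = \{p_{i1}, p_{i2}, \ldots, p_{ik}\}$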

where $G_M$ denotes the expert relationship graph, $V$ the set of expert nodes, and $E$ the set of collaboration relationships. Each entity $v_i$ comprises three fields, Identifier, Name and Publications: Identifier is the entity's identifier, Name is the expert's name, and Publications holds the titles of the documents corresponding to the entity, of which there may be one or more.
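The graph model above can be sketched as a small Python data structure. This is a minimal illustration under stated assumptions, not the patent's implementation: the class names `ExpertNode` and `CollaborationGraph` are hypothetical, and storing each pair $(x, y)$ as an undirected edge is an assumption based on collaboration being symmetric.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ExpertNode:
    """Entity v_i = {Identifier, Name, Publications_i} (hypothetical class)."""
    identifier: str
    name: str
    publications: frozenset  # one or more document titles


@dataclass
class CollaborationGraph:
    """G_M = {V, E}; each edge is stored as an unordered pair (assumption)."""
    nodes: dict = field(default_factory=dict)  # identifier -> ExpertNode
    edges: set = field(default_factory=set)    # frozenset({x, y})

    def add_node(self, node: ExpertNode) -> None:
        self.nodes[node.identifier] = node

    def add_edge(self, x: str, y: str) -> None:
        # Enforce the definition of E: x in V, y in V, x != y.
        if x == y:
            raise ValueError("E excludes self-loops (x != y)")
        if x not in self.nodes or y not in self.nodes:
            raise KeyError("both endpoints must already be in V")
        self.edges.add(frozenset((x, y)))


g = CollaborationGraph()
g.add_node(ExpertNode("e1", "P", frozenset({"Paper A"})))
g.add_node(ExpertNode("e2", "Q", frozenset({"Paper A", "Paper B"})))
g.add_edge("e1", "e2")
g.add_edge("e2", "e1")  # same unordered pair, so E still holds one edge
print(len(g.edges))  # 1
```

Storing edges as `frozenset` pairs keeps E free of duplicates regardless of argument order.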

The expert-institution relationship table may be a pre-created set of relationships between experts and their institutions. Because the table can accurately distinguish experts with the same name, the constructed expert relationship graph contains fewer same-name ambiguities, which reduces the workload of later disambiguation. Note that when an expert relationship graph is constructed in the prior art, the graph structure is generally built only from the collaboration relationships between experts, and papers by single authors are discarded; the vertices of the graph represent experts, the edges represent co-authorship relationships, and the expert attributes include fields such as paper title and keywords. This algorithm is relatively simple to implement, but because the resulting graph lacks the important institution attribute, two same-name experts from different institutions may be merged into one vertex, which adds work to the subsequent splitting of same-name experts. In the embodiments of the present invention, institution information is introduced into the construction of the expert relationship graph, which enriches the attribute information of the expert relationship network and, more importantly, reduces the workload of the same-name splitting step.
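The vertex-merging problem described above, and the pair count from the background section, can be illustrated with a short sketch. The byline data and the function names are hypothetical; the sketch only shows that keying vertices by name alone collapses same-name experts from different institutions, while adding the institution field keeps them apart.

```python
from math import comb


def merge_vertices(byline_entries, use_institution: bool) -> set:
    """Collapse author occurrences into graph vertices.

    Without the institution field the vertex key is the bare name, so
    same-name experts from different institutions fall into one vertex.
    """
    if use_institution:
        return {(name, inst) for name, inst in byline_entries}
    return {name for name, _ in byline_entries}


def ambiguous_pairs(m: int) -> int:
    """Pairwise comparisons among m same-name nodes: C(m, 2) = m(m-1)/2."""
    return comb(m, 2)


# Hypothetical (name, institution) occurrences taken from bylines.
entries = [("P", "Inst-1"), ("P", "Inst-2"), ("P", "Inst-1")]
print(len(merge_vertices(entries, use_institution=False)))  # 1
print(len(merge_vertices(entries, use_institution=True)))   # 2
print(ambiguous_pairs(10))  # 45
```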

102. Disambiguate the expert relationship graph using a holistic analysis method.

Here, holistic analysis is defined in contrast to the existing path analysis method. The path analysis method constructs an entity relationship graph from the collaboration relationships between experts and the affiliation relationships between experts and institutions, and uses a breadth-first search strategy to find the valid paths between every two same-name experts in the graph; it then calculates the connection strength between the two same-name experts from the length, number and types of the valid paths, and compares the strength with a threshold to perform same-name disambiguation. The holistic analysis method of the embodiments of the present invention instead extracts the feature sub-network between same-name experts and performs social network analysis on it. Specifically, after the expert relationship graph has been preliminarily constructed, the sub-network to be disambiguated is extracted with the holistic analysis method and processed with social network analysis techniques, thereby achieving same-name disambiguation.
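For contrast, the prior-art path analysis described above can be sketched as follows. This is an assumption-laden illustration: the breadth-first path enumeration follows the description, but the strength function (sum of 1/length over valid paths), the maximum path length and the threshold are hypothetical, and path types are ignored.

```python
from collections import deque


def simple_paths_bfs(adj, src, dst, max_len):
    """Breadth-first enumeration of simple src->dst paths with <= max_len edges."""
    found = []
    queue = deque([[src]])
    while queue:
        path = queue.popleft()
        if len(path) - 1 >= max_len:
            continue  # extending would exceed max_len edges
        for nxt in adj.get(path[-1], ()):
            if nxt in path:
                continue  # keep paths simple (no repeated vertices)
            if nxt == dst:
                found.append(path + [nxt])
            else:
                queue.append(path + [nxt])
    return found


def connection_strength(paths):
    """Hypothetical strength: shorter and more numerous paths score higher."""
    return sum(1.0 / (len(p) - 1) for p in paths)


# Two same-name vertices "P#1" and "P#2" in a toy collaboration graph.
adj = {
    "P#1": ["A", "B"], "A": ["P#1", "P#2"], "B": ["P#1", "C"],
    "C": ["B", "P#2"], "P#2": ["A", "C"],
}
paths = simple_paths_bfs(adj, "P#1", "P#2", max_len=3)
THRESHOLD = 0.8  # hypothetical decision threshold
print(len(paths), connection_strength(paths) >= THRESHOLD)  # 2 True
```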

Further, to better illustrate the entity disambiguation method, and as a refinement and extension of the above embodiment, the embodiments of the present invention provide the following optional embodiments, without being limited thereto:

In an optional embodiment of the present invention, step 101 may specifically include: extracting each item of entity field information and the relationship information from the expert-institution relationship table; adding the entity field information to the entity set of the collaboration relationship model; and adding the relationship information to the relationship set of the collaboration relationship model to construct the expert relationship graph.

The collaboration relationship model and the expert-institution relationship table have been described above and are not repeated here. Specifically, parsing the expert-institution relationship table yields the expert entity field information, such as the name, publications and institution fields, as well as the expert-institution relationships recorded in the table. The expert entity field information and the institution entity field information are added to the entity set of the collaboration relationship model, and the expert-institution relationship information is added to its relationship set, thereby constructing a visual expert relationship graph. Because the expert relationship graph is built from the graph model together with the pre-created expert-institution relationship table, the embodiments of the present invention resolve the problem that expert entities and institution entities in existing documents cannot be matched one to one, and can generate the expert relationship graph quickly and efficiently.
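A minimal sketch of step 101 as described above: rows of the expert-institution relationship table are merged into the entity set and the relationship set of the collaboration model. The tuple encoding, the relation labels `coauthor` and `affiliated_with`, and the use of expert identifiers as keys are all hypothetical choices for illustration.

```python
def build_expert_graph(collab_entities, collab_relations, expert_inst_rows):
    """Return (entity set, relationship set) after adding the table's contents.

    collab_entities:  set of ("expert", expert_id) nodes from the collaboration model
    collab_relations: set of (node, label, node) triples from the collaboration model
    expert_inst_rows: iterable of (expert_id, institution_name) table rows
    """
    entities = set(collab_entities)
    relations = set(collab_relations)
    for expert_id, institution in expert_inst_rows:
        expert = ("expert", expert_id)
        inst = ("institution", institution)
        entities.update((expert, inst))                    # entity field information
        relations.add((expert, "affiliated_with", inst))   # relationship information
    return entities, relations


zs, ls = ("expert", "zhang-san-1"), ("expert", "li-si-1")
entities, relations = build_expert_graph(
    {zs, ls},
    {(zs, "coauthor", ls)},
    [("zhang-san-1", "Institution 1"), ("li-si-1", "Institution 1")],
)
print(len(entities), len(relations))  # 3 3
```

The resulting node and edge sets could then be exported to a format Gephi can open, such as GEXF or CSV edge lists, for visualization.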

In an optional embodiment of the present invention, step 102 may specifically include: constructing a sub-network to be disambiguated from the entities to be disambiguated acquired from the expert relationship graph; and disambiguating this sub-network (the feature sub-network) using a social network analysis technique.

The sub-network to be disambiguated may be constructed on the basis of the shortest path between the two entities to be disambiguated. In the expert relationship graph, the existence of a path reflects whether two entities are related, and the length of the shortest path directly reflects how strongly they are related; the embodiments of the present invention therefore construct the sub-network to be disambiguated according to the shortest-path principle. Specifically, assuming that the shortest path between the two expert nodes to be disambiguated has length n and the longest path of the sub-network to be disambiguated has length w, with w > n, the sub-network to be disambiguated may consist of all paths between the two nodes whose lengths range from n to w, where w may be determined by the degrees and publication counts of the two entities to be disambiguated. The value of w determines the scale of the sub-network, and choosing an appropriate scale is a key prerequisite for disambiguation accuracy: if the sub-network is too small, the similarity relationships may not be captured, and if it is too large, a large amount of computing resources is consumed. The embodiments of the present invention analyzed the relationship between sub-network scale and path length on a large number of samples. FIG. 4 shows the sub-networks to be disambiguated for two typical samples, in both of which the initial shortest path is 4. In panels a to d, m1 and m2 are the two entities to be disambiguated, and the panels show how the sub-network grows when the path length is set to 4, 5, 6 and 7; panels e to h show the same for the two entities n1 and n2 with the path length set to 4, 5, 6 and 7.

The sub-network constructed for m1 and m2 grows rapidly as the path length increases and already meets the requirements of feature extraction at a path length of 5, whereas the sub-network constructed for n1 and n2 changes little as the path length increases, with no obvious difference between path lengths 5, 6 and 7. Moreover, each additional expert node on the path greatly increases the computation time of the sub-network: for example, sub-network c in FIG. 4 takes about 30 seconds to compute, whereas sub-network d, whose path is one expert node longer, takes about 5 minutes. Therefore, three settings, w - n = 1, 2 and 3, may be used; that is, with shortest path length n, the longest path of the sub-network to be disambiguated is n + 1, n + 2 or n + 3, and the specific value of w is determined by the scale change of the sub-network to be disambiguated, as described below.
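The shortest-path construction of the sub-network to be disambiguated can be sketched as below. Assumptions: path length is counted in edges, the graph is an undirected adjacency dict, and the hypothetical parameter `extra` stands for w - n (1, 2 or 3 in the text). Enumerating every simple path grows quickly with w, which is consistent with the computation-time observation above.

```python
from collections import deque


def shortest_path_len(adj, src, dst):
    """BFS distance in edges, or None if dst is unreachable."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            return dist[u]
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return None


def subnet_to_disambiguate(adj, m1, m2, extra):
    """Union of all simple m1-m2 paths with length in [n, w], where w = n + extra."""
    n = shortest_path_len(adj, m1, m2)
    if n is None:
        return set(), set()
    w = n + extra
    nodes, edges = set(), set()

    def dfs(path):
        u = path[-1]
        if u == m2:  # a complete path; its length is automatically >= n
            nodes.update(path)
            edges.update(frozenset(e) for e in zip(path, path[1:]))
            return
        if len(path) - 1 == w:
            return  # cannot add another edge without exceeding w
        for v in adj.get(u, ()):
            if v not in path:
                dfs(path + [v])

    dfs([m1])
    return nodes, edges


# Paths between m1 and m2 of length 2 (via a), 3 (via b, c) and 4 (via d, e, f).
adj = {
    "m1": ["a", "b", "d"], "a": ["m1", "m2"], "b": ["m1", "c"], "c": ["b", "m2"],
    "d": ["m1", "e"], "e": ["d", "f"], "f": ["e", "m2"], "m2": ["a", "c", "f"],
}
for extra in (0, 1, 2):
    nodes, _ = subnet_to_disambiguate(adj, "m1", "m2", extra)
    print(extra, len(nodes))  # 0 3, then 1 5, then 2 8
```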

In the embodiments of the present invention, the scale change of the sub-network to be disambiguated is determined by the degrees and publication counts of the two entities to be disambiguated. The degree of an entity may be the number of edges incident to it, and the publication count may be the number of documents associated with the entity's edges. The scale change of the sub-network to be disambiguated may be calculated as:

$M = 0.7\,D_x + 0.3\,F_x$

where $M$ is the scale change of the sub-network to be disambiguated, $D_x$ is the mean-normalized degree of the two entities to be disambiguated, and $F_x$ is the mean-normalized publication count of the two entities; the weight coefficients of $D_x$ and $F_x$ may be 0.7 and 0.3, respectively.
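The scale-change formula can be computed as follows. How the two entities' values are combined and normalized is not spelled out in the text, so this sketch assumes $D_x$ and $F_x$ are the mean of the two entities' values, min-max scaled against the range observed in the graph; the function and parameter names are hypothetical.

```python
def min_max(value, lo, hi):
    """Scale value into [0, 1] given the observed range [lo, hi]."""
    return 0.0 if hi == lo else (value - lo) / (hi - lo)


def scale_change(deg_pair, pub_pair, deg_range, pub_range, w_deg=0.7, w_pub=0.3):
    """M = 0.7 * D_x + 0.3 * F_x for the two entities to be disambiguated."""
    d_x = min_max(sum(deg_pair) / 2, *deg_range)  # normalized mean degree
    f_x = min_max(sum(pub_pair) / 2, *pub_range)  # normalized mean publication count
    return w_deg * d_x + w_pub * f_x


# Degrees 10 and 30 in a graph whose degrees span 0..40;
# publication counts 5 and 15 where counts span 0..20.
m = scale_change((10, 30), (5, 15), (0, 40), (0, 20))
print(round(m, 3))  # 0.5
```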

In the embodiments of the present invention, the longest path of the sub-network to be disambiguated can be determined from the scale change of the sub-network, i.e., from the value M above. Specifically, the longest path of the sub-network to be disambiguated may be calculated as follows:

where w is the longest path of the sub-network to be disambiguated, M is the scale change of the sub-network, and 0.3 and 0.5 are preset thresholds for that scale change. The weight coefficients may be set empirically and are not specifically limited in the embodiments of the present invention.

In yet another optional embodiment of the present invention, step 101 may specifically include: judging, according to the acquired expert information and institution information, whether the experts and the institutions are in a many-to-many correspondence; and if so, extracting only the correspondence between the first-listed expert and the first-listed institution and storing it in the expert-institution relationship table.

The first expert specifically refers to the expert listed first in the document's byline, and likewise the first institution refers to the institution listed first in the byline. The expert information and the institution information can be obtained from documents: the expert information may include expert names and the institution information may include institution names, and both may be crawled from a document database with crawler software. A many-to-many correspondence specifically means that the byline of a document lists two or more expert names and two or more institution names. For example, suppose the byline of a document lists the experts Zhang San, Li Si, Wang Wu and Zhao Liu and the institutions Institution 1, Institution 2 and Institution 3. Because the numbers of expert names and institution names differ, they cannot be matched one to one directly. However, under the usual conventions for documents, the first expert and the first institution are guaranteed to correspond, i.e., Zhang San necessarily belongs to Institution 1, whereas Li Si may belong to Institution 2, Institution 3, or both. The embodiments of the present invention therefore extract only the first expert name and the first institution name when expert names and institution names are in a many-to-many correspondence.

It should be noted that, because the purpose is entity disambiguation, every expert involved in the embodiments of the present invention must be the first-listed author of at least one document; an author who appears in many bylines but is never listed first is not counted within the scope of experts. This ensures that the correspondences between all in-scope experts and their institutions can be collected. Specifically, if the expert names and institution names are in a many-to-many correspondence, only the correspondence between the first expert name and the first institution name is extracted and stored in the expert-institution relationship table.

In this embodiment, step 101 may further include: if not, directly extracting the correspondence between the experts and the institutions and storing it in the expert-institution relationship table.

If the expert names and institution names are not in a many-to-many relationship, only three relationships are possible, namely 1:1, 1:n and n:1, as shown in FIG. 2, and in each case the correspondence between expert names and institution names is uniquely determined. For example, for a document: 1) if the byline lists only one expert, Zhang San, and only one institution, Institution 1, the two correspond uniquely; 2) if the byline lists only one expert, Zhang San, and two institutions, Institution 1 and Institution 2, then Zhang San can be determined to belong to both institutions at the same time, which may occur while an expert is still studying or is changing jobs; 3) if the byline lists two experts, Zhang San and Li Si, and only one institution, Institution 1, then both Zhang San and Li Si can be determined to belong to Institution 1. Therefore, when the expert names and institution names are judged not to be in a many-to-many relationship, the correspondence between the experts and the institutions can be extracted directly and stored in the expert-institution relationship table.
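The table-building rule from the two branches above (many-to-many versus 1:1, 1:n and n:1) can be sketched as follows. Documents are represented as hypothetical (authors, institutions) lists in byline order; a real pipeline would obtain them from the crawled document database, and the function name is an illustrative choice.

```python
from collections import defaultdict


def build_expert_institution_table(documents):
    """documents: iterable of (authors, institutions) lists in byline order."""
    table = defaultdict(set)
    for authors, institutions in documents:
        if not authors or not institutions:
            continue  # nothing to extract from an incomplete byline
        if len(authors) >= 2 and len(institutions) >= 2:
            # Many-to-many: only first author <-> first institution is certain.
            table[authors[0]].add(institutions[0])
        else:
            # 1:1, 1:n or n:1: every pairing is uniquely determined.
            for author in authors:
                table[author].update(institutions)
    return dict(table)


docs = [
    (["Zhang San", "Li Si", "Wang Wu", "Zhao Liu"],
     ["Institution 1", "Institution 2", "Institution 3"]),  # many-to-many
    (["Li Si"], ["Institution 2"]),                          # 1:1
    (["Zhang San"], ["Institution 1", "Institution 4"]),     # 1:n
]
table = build_expert_institution_table(docs)
print(sorted(table["Zhang San"]))  # ['Institution 1', 'Institution 4']
```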

Through the above steps, the expert information and institution information of each document are crawled from a pre-established document database, and the correspondences are stored in the expert-institution relationship table. After all documents have been traversed, the correspondences between all experts and institutions in the database are available. Once the table has been built, it can be used to resolve the expert-institution matching problem in the n:n case: starting with the first author listed in the author field, look up that expert's institution list in the expert-institution relationship table and intersect it with all the institutions listed on the paper. The result is the institution of that expert. The institutions of the remaining authors are obtained in the same way.
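The n:n lookup-and-intersect step can be sketched as follows, assuming the relationship table is held in memory as a mapping from expert name to a set of institution names. The helper names are hypothetical, and the text does not say how an expert with an empty intersection is handled; in this sketch such an expert simply receives an empty list.

```python
from collections import defaultdict


def build_relation_table(pairs):
    """Expert-institution relationship table: expert name -> set of institution names."""
    table = defaultdict(set)
    for expert, institution in pairs:
        table[expert].add(institution)
    return table


def resolve_many_to_many(experts, institutions, table):
    """Match each author of an n:n paper to institutions by table lookup plus intersection."""
    paper_institutions = set(institutions)
    result = {}
    for expert in experts:  # first author first, then the remaining authors
        # Institutions known for this expert that also appear on this paper.
        matched = table.get(expert, set()) & paper_institutions
        # Keep the order in which the institutions are listed on the paper.
        result[expert] = [inst for inst in institutions if inst in matched]
    return result
```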

It should be noted that the existing method for constructing an expert relationship graph from cooperation relationships discards papers by a single author and uses only co-authored papers. In other words, the prior art discards the expert documents of the 1:1 and 1:n cases above, so the resulting expert relationship graph is missing many entities and is therefore incomplete. The embodiment of the present invention adds an institution-name field to the prior approach and, by creating the expert-institution relationship table, allows these isolated entities to be extracted as well.

In yet another optional embodiment of the present invention, the method further includes: performing disambiguation processing on the obtained entities to be disambiguated based on a similarity clustering algorithm.

Specifically, disambiguating the obtained entities to be disambiguated with the similarity clustering algorithm may include the following steps.

S1: Train the two attribute features of each entity, the expert name and the cooperation relationship, into word vectors using the Word2Vec tool; normalize each word vector to a decimal in the interval (0, 1); and use the two normalized decimals to form the feature vector that represents the entity.

S2: Take all entities with the same name and calculate the similarity between every pair of them. Compare each similarity with a similarity threshold, take the largest similarity that exceeds the threshold, and cluster the two same-name entities with that similarity into one cluster, yielding an entity set. The similarity may be calculated as:

$S_{ij} = \mathrm{sim}_{attr}(a_i, a_j)$

where $S_{ij}$ denotes the similarity between two same-name entities $a_i$ and $a_j$, and $\mathrm{sim}_{attr}(\cdot)$ denotes a similarity calculation function.

S3: Take any other entity with the same name as the entity set; if its similarity to any entity in the entity set is greater than the similarity threshold, add it to the entity set.

S4: Process the remaining same-name entities according to steps S2 and S3 until every same-name entity has been matched to an entity set.

S5: Merge all entities in the same entity set into a single entity.
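Steps S2 to S5 can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: it assumes the S1 feature vectors already exist as tuples of floats keyed by hypothetical entity IDs, it uses cosine similarity as a stand-in for $\mathrm{sim}_{attr}$ (which the text does not specify), and it repeats S3 until the entity set stops growing. Entities that never exceed the threshold with anyone are kept as single-entity sets.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Assumed stand-in for sim_attr: cosine similarity of two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster_same_name(features, threshold, sim=cosine):
    """features: dict entity_id -> feature vector, all entities sharing one expert name.
    Returns a list of entity sets; each set is then merged into one entity (S5)."""
    remaining = set(features)
    clusters = []
    while remaining:  # S4: repeat S2 and S3 on whatever is left
        # S2: find the most similar pair among the remaining entities above the threshold.
        best = None
        for a, b in combinations(sorted(remaining), 2):
            s = sim(features[a], features[b])
            if s > threshold and (best is None or s > best[0]):
                best = (s, a, b)
        if best is None:
            # No pair passes the threshold: each leftover entity forms its own set.
            clusters.extend({e} for e in sorted(remaining))
            break
        cluster = {best[1], best[2]}
        remaining -= cluster
        # S3: absorb any entity similar enough to some member, until nothing changes.
        grew = True
        while grew:
            grew = False
            for e in sorted(remaining):
                if any(sim(features[e], features[m]) > threshold for m in cluster):
                    cluster.add(e)
                    remaining.discard(e)
                    grew = True
        clusters.append(cluster)
    return clusters
```

Each returned set corresponds to one real-world expert; collapsing its members into a single node implements S5.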

In yet another optional embodiment of the present invention, step 101 further includes: generating the expert relationship graph using the Gephi tool, the cooperation relationship model and the pre-created expert-institution relationship table.

Gephi is free, open-source, cross-platform, JVM-based software for complex network analysis. It is mainly used as an open-source tool for the interactive visualization and exploration of all kinds of networks and complex systems, including dynamic and hierarchical graphs. In the embodiment of the present invention, the cooperation relationship model and the pre-created expert-institution relationship table are used to generate the corresponding expert relationship graph. The graph generated at this stage still contains ambiguous expert names, so it must subsequently be disambiguated with the sub-network-to-be-disambiguated extraction method provided by the embodiment of the present invention to obtain the disambiguated expert relationship graph.
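To illustrate how the institution field separates same-name experts before the graph is loaded into Gephi, the sketch below builds an entity set and a weighted co-authorship relationship set, keying each entity by (expert name, institution), and writes them as node and edge CSV text. It assumes the author-institution pairs of each paper have already been resolved as described above. The function names and the `name@institution` node-ID scheme are hypothetical; the `Id`/`Label` and `Source`/`Target`/`Type`/`Weight` headers follow the convention of Gephi's spreadsheet (CSV) import.

```python
import csv
import io
from collections import Counter
from itertools import combinations


def build_expert_graph(papers):
    """papers: list of lists of (expert_name, institution) pairs, one list per paper.
    Returns the entity set and a relationship set with co-authorship counts."""
    entities = set()
    relations = Counter()
    for authors in papers:
        nodes = sorted(set(authors))
        entities.update(nodes)  # single-author papers still contribute an entity
        for a, b in combinations(nodes, 2):
            relations[(a, b)] += 1
    return entities, relations


def to_gephi_csv(entities, relations):
    """Return (nodes_csv, edges_csv) text suitable for Gephi's spreadsheet import."""
    ids = {e: f"{e[0]}@{e[1]}" for e in sorted(entities)}
    nodes = io.StringIO()
    writer = csv.writer(nodes, lineterminator="\n")
    writer.writerow(["Id", "Label", "Institution"])
    for entity, node_id in ids.items():
        writer.writerow([node_id, entity[0], entity[1]])
    edges = io.StringIO()
    writer = csv.writer(edges, lineterminator="\n")
    writer.writerow(["Source", "Target", "Type", "Weight"])
    for (a, b), count in sorted(relations.items()):
        writer.writerow([ids[a], ids[b], "Undirected", count])
    return nodes.getvalue(), edges.getvalue()
```

Two "Zhang San" entries at different institutions remain separate nodes here; deciding whether they denote the same person is left to the subsequent disambiguation step.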

Further, as a specific implementation of the method shown in Fig. 1, an embodiment of the present invention provides an entity disambiguation apparatus. As shown in Fig. 5, the apparatus includes a construction unit 21 and a processing unit 22.

The construction unit 21 may be configured to construct an expert relationship graph according to the cooperation relationship model and a pre-created expert-institution relationship table;

the processing unit 22 may be configured to perform disambiguation processing on the expert relationship graph using an integral analysis method.

Further, the construction unit 21 includes:

an extraction module 211, which may be configured to extract the field information of each entity and the relationship information in the expert-institution relationship table;

an adding module 212, which may be configured to add the entity field information to the entity set of the cooperation relationship model;

the adding module 212 may be further configured to add the relationship information to the relationship set of the cooperation relationship model, so as to construct the expert relationship graph.

Further, the processing unit 22 includes:

a building module 221, which may be configured to build a sub-network to be disambiguated from the entities to be disambiguated obtained from the expert relationship graph;

a processing module 222, which may be configured to disambiguate the sub-network to be disambiguated using social network analysis techniques.
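The text does not spell out how the sub-network is built or which social network analysis measures are applied. As one hypothetical reading, the sketch below takes every node carrying the ambiguous name, adds their direct collaborators and the edges among these nodes, and then counts shared collaborators between each pair of same-name nodes, a common social-network signal that two nodes may denote the same person. Nodes are assumed to be (expert name, institution) tuples in an adjacency mapping, and the function names are illustrative only.

```python
from itertools import combinations


def extract_subnet(adjacency, name):
    """Hypothetical sub-network for one ambiguous name.
    adjacency: dict node -> set of neighbour nodes, where node = (name, institution).
    Returns (same-name seed nodes, all sub-network nodes, undirected edge set)."""
    seeds = {n for n in adjacency if n[0] == name}
    nodes = set(seeds)
    for seed in seeds:
        nodes |= adjacency[seed]  # add direct collaborators
    edges = {
        tuple(sorted((u, v)))
        for u in nodes
        for v in adjacency.get(u, ())
        if v in nodes
    }
    return seeds, nodes, edges


def shared_collaborators(adjacency, seeds):
    """Common-neighbour count for every pair of same-name nodes."""
    return {
        (a, b): len(adjacency[a] & adjacency[b])
        for a, b in combinations(sorted(seeds), 2)
    }
```

A high shared-collaborator count between two same-name nodes would be one reason to consider merging them; a threshold or further measures would have to be chosen separately.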

Further, the apparatus further comprises:

a judging module 23, which may be configured to judge, according to the acquired expert information and institution information, whether the experts and institutions are in a many-to-many correspondence;

an extraction unit 24, which may be configured to, if so, extract only the correspondence between the first expert (the first author) and the first institution and store it in the expert-institution relationship table;

the extraction unit 24 may be further configured to, if not, directly extract the correspondences between the experts and institutions and store them in the expert-institution relationship table.

Further, the apparatus further comprises:

the clustering unit 25 may be configured to perform disambiguation processing on the obtained entity to be disambiguated based on a similarity clustering algorithm.

In addition, the construction unit 21 may be configured to generate the expert relationship graph using the Gephi tool, the cooperation relationship model and the pre-created expert-institution relationship table.

It should be noted that other corresponding descriptions of the functional modules related to the entity disambiguation apparatus provided in the embodiment of the present invention may refer to the corresponding descriptions of the method shown in fig. 1, and are not described herein again.

Correspondingly, based on the method shown in Fig. 1, an embodiment of the present invention further provides a storage medium storing at least one executable instruction, where the executable instruction causes a processor to perform the following steps: constructing an expert relationship graph according to the cooperation relationship model and a pre-created expert-institution relationship table; and performing disambiguation processing on the expert relationship graph using an integral analysis method.

Based on the embodiments of the method shown in Fig. 1 and the apparatus shown in Fig. 5, an embodiment of the present invention further provides a computer device. As shown in Fig. 6, it includes a processor 31, a communication interface 32, a memory 33 and a communication bus 34. The processor 31, the communication interface 32 and the memory 33 communicate with one another via the communication bus 34. The communication interface 32 is used to communicate with network elements of other devices, such as clients or other servers. The processor 31 is configured to execute a program and may specifically perform the relevant steps of the entity disambiguation method embodiments above. In particular, the program may include program code comprising computer operating instructions. The processor 31 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention.

The computer device includes one or more processors, which may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs. The memory 33 is used to store the program. The memory 33 may include high-speed RAM and may further include non-volatile memory, such as at least one disk memory. The program may specifically cause the processor 31 to perform the following operations: constructing an expert relationship graph according to the cooperation relationship model and a pre-created expert-institution relationship table; and performing disambiguation processing on the expert relationship graph using an integral analysis method.

With the above technical solution, an expert relationship graph can be constructed according to the cooperation relationship model and the pre-created expert-institution relationship table, and disambiguation processing can be performed on it using an integral analysis method. Because an institution field is introduced when the expert relationship graph is created, and the constructed sub-network to be disambiguated is processed with social network relationship analysis techniques, entities in the knowledge graph are disambiguated to the greatest possible extent. This reduces the workload of later-stage disambiguation, improves the efficiency of knowledge graph construction and saves human resources.

It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and alternatively, they may be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, and in some cases, the steps shown or described may be performed in an order different than that described herein, or they may be separately fabricated into individual integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.

The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
