Cross-network entity identification method, device, electronic equipment and medium

Document No.: 1378990  Published: 2020-08-14

Abstract: This technology, "Cross-network entity identification method, device, electronic equipment and medium", was designed by 常文睿冯天恒 on 2020-04-16. The embodiments of this specification disclose a cross-network entity identification method. A heterogeneous network is constructed according to a plurality of entities and the association relationships among them, the entities being located in a plurality of networks; an entity is selected from the heterogeneous network as a target entity, recall identification is performed on the target entity based on a link prediction algorithm, and a recall entity is identified from the heterogeneous network, the target entity and the recall entity being located in different networks of the plurality of networks; and the recall entity is identified based on a pre-created entity identification model to determine whether the recall entity and the target entity are the same entity.

1. A cross-network entity identification method, comprising:

constructing a heterogeneous network according to a plurality of entities and association relationships among the entities, wherein the entities are located in a plurality of networks;

selecting an entity from the heterogeneous network as a target entity, performing recall identification on the target entity based on a link prediction algorithm, and identifying a recall entity from the heterogeneous network, wherein the target entity and the recall entity are located in different networks of the plurality of networks;

and identifying the recall entity based on a pre-created entity identification model to determine whether the recall entity and the target entity are the same entity.

2. The method of claim 1, wherein selecting an entity from the heterogeneous network as a target entity, performing recall identification on the target entity based on a link prediction algorithm, and identifying a recall entity from the heterogeneous network comprises:

selecting the target entity from the heterogeneous network;

predicting a link between the target entity and each alternative entity through a link prediction algorithm, wherein each alternative entity is not connected to the target entity in the heterogeneous network;

determining the recall entity from all alternative entities based on a similarity between the target entity and each alternative entity, wherein the recall entity and the target entity are located in different networks of the plurality of networks.

3. The method of claim 2, wherein determining the recall entity from all alternative entities based on a similarity between the target entity and each alternative entity comprises:

predicting a link between the target entity and each alternative entity through a link prediction algorithm to obtain the similarity between the target entity and each alternative entity;

and selecting each alternative entity whose similarity is greater than a set similarity as the recall entity.

4. The method of any of claims 1-3, wherein identifying the recall entity based on the pre-created entity identification model to determine whether the recall entity and the target entity are the same entity comprises:

if the number of recall entities is 1, identifying the recall entity by using the entity identification model to determine whether the recall entity and the target entity are the same entity;

if the number of recall entities is greater than or equal to 2, screening the recall entities through a pre-established entity screening model to screen out the recall entity with the highest credibility; and identifying the screened recall entity by using the entity identification model to determine whether the screened recall entity and the target entity are the same entity.

5. The method of claim 4, after identifying a recall entity from the heterogeneous network, the method further comprising:

acquiring a recall network between the target entity and the recall entity;

and screening the edges in the recall network based on preset edge strength to obtain the screened recall network.

6. The method of claim 5, after obtaining the screened recall network, further comprising:

grading each link between a target entity and a recall entity in the screened recall network to obtain the grade of each link between the target entity and the recall entity;

and acquiring a target link between the target entity and the recall entity according to the level of each link between the target entity and the recall entity, wherein the level of the target link is greater than a preset level.

7. The method of claim 6, wherein identifying the recall entity based on the pre-created entity identification model to determine whether the recall entity and the target entity are the same entity comprises:

if the number of recall entities is 1, identifying the target link between the target entity and the recall entity by using the entity identification model to determine whether the recall entity and the target entity are the same entity;

and if the number of recall entities is greater than or equal to 2, identifying the target link between the target entity and the screened recall entity by using the entity identification model to determine whether the screened recall entity and the target entity are the same entity.

8. A cross-network entity identification apparatus, comprising:

a heterogeneous network construction unit, configured to construct a heterogeneous network according to a plurality of entities and association relationships among the entities, wherein the entities are located in a plurality of networks;

a recall entity acquiring unit, configured to select an entity from the heterogeneous network as a target entity, perform recall identification on the target entity based on a link prediction algorithm, and identify a recall entity from the heterogeneous network, wherein the target entity and the recall entity are located in different networks of the plurality of networks;

and an identification unit, configured to identify the recall entity based on a pre-created entity identification model to determine whether the recall entity and the target entity are the same entity.

9. The apparatus of claim 8, wherein the recall entity acquiring unit is configured to select the target entity from the heterogeneous network; predict a link between the target entity and each alternative entity through a link prediction algorithm, wherein each alternative entity is not connected to the target entity in the heterogeneous network; and determine the recall entity from all alternative entities based on a similarity between the target entity and each alternative entity, wherein the recall entity and the target entity are located in different networks of the plurality of networks.

10. The apparatus of claim 9, wherein the recall entity acquiring unit is configured to predict a link between the target entity and each alternative entity through a link prediction algorithm to obtain the similarity between the target entity and each alternative entity; and select each alternative entity whose similarity is greater than a set similarity as the recall entity.

11. The apparatus according to any one of claims 8-10, wherein the identification unit is configured to: if the number of recall entities is 1, identify the recall entity by using the entity identification model to determine whether the recall entity and the target entity are the same entity; and if the number of recall entities is greater than or equal to 2, screen the recall entities through a pre-established entity screening model to screen out the recall entity with the highest credibility, and identify the screened recall entity by using the entity identification model to determine whether the screened recall entity and the target entity are the same entity.

12. The apparatus of claim 11, further comprising:

a recall network screening unit, configured to acquire a recall network between the target entity and the recall entity after identifying the recall entity from the heterogeneous network; and screening the edges in the recall network based on preset edge strength to obtain the screened recall network.

13. The apparatus of claim 12, further comprising:

the link grading unit is used for grading each link between a target entity and a recall entity in the screened recall network after the screened recall network is obtained, so as to obtain the grade of each link between the target entity and the recall entity; and acquiring a target link between the target entity and the recall entity according to the level of each link between the target entity and the recall entity, wherein the level of the target link is greater than a preset level.

14. The apparatus of claim 13, wherein the identification unit is configured to: if the number of recall entities is 1, identify the target link between the target entity and the recall entity by using the entity identification model to determine whether the recall entity and the target entity are the same entity; and if the number of recall entities is greater than or equal to 2, identify the target link between the target entity and the screened recall entity by using the entity identification model to determine whether the screened recall entity and the target entity are the same entity.

15. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method of any one of claims 1-7 when executing the program.

16. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.

Technical Field

The embodiments of the present disclosure relate to the field of block chain processing technologies, and in particular, to a method and an apparatus for identifying a cross-network entity, an electronic device, and a medium.

Background

With the rapid development of mobile electronic devices, more and more applications run on them. The internal network of an application usually stores a plurality of entities, such as enterprises, stores and users; a plurality of entities also exist in networks external to the application, and some entities exist in both the internal network and the external network.

Disclosure of Invention

Embodiments of the present specification provide a method, an apparatus, an electronic device, and a medium for identifying a cross-network entity, which can identify whether two entities in different networks are the same entity and improve identification accuracy.

A first aspect of an embodiment of the present specification provides a method for identifying a cross-network entity, including:

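constructing a heterogeneous network according to a plurality of entities and association relationships among the entities, wherein the entities are located in a plurality of networks;

selecting an entity from the heterogeneous network as a target entity, performing recall identification on the target entity based on a link prediction algorithm, and identifying a recall entity from the heterogeneous network, wherein the target entity and the recall entity are located in different networks of the plurality of networks;

and identifying the recall entity based on a pre-created entity identification model to determine whether the recall entity and the target entity are the same entity.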

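A second aspect of embodiments of the present specification provides an apparatus for identifying a cross-network entity, including: a heterogeneous network construction unit, configured to construct the heterogeneous network; a recall entity acquiring unit, configured to select the target entity and identify the recall entity from the heterogeneous network based on a link prediction algorithm; and an identification unit, configured to identify, based on the entity identification model, whether the recall entity and the target entity are the same entity.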

A third aspect of the embodiments of the present specification further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the above method for identifying a cross-network entity when executing the program.

A fourth aspect of the embodiments of the present specification further provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program performs the steps of the above method for identifying a cross-network entity.

The beneficial effects of the embodiment of the specification are as follows:

Based on the above technical solution, a heterogeneous network is constructed according to a plurality of entities and the association relationships among them, the entities being located in a plurality of networks; an entity is selected from the heterogeneous network as a target entity, recall identification is performed on the target entity based on a link prediction algorithm, and a recall entity is identified from the heterogeneous network; the recall entity is then identified based on an entity identification model to determine whether the recall entity and the target entity are the same entity. Because the link prediction algorithm can accurately predict the similarity index between the target entity and other entities, recall identification based on that index yields recall entities that match the target entity more closely, and entity identification performed on these better-matched recall entities is correspondingly more accurate. In addition, because the recall entity and the target entity are located in different networks of the heterogeneous network, this accuracy gain is obtained for cross-network entity identification.

Drawings

Fig. 1 is a flow chart of a cross-network entity identification method in an embodiment of the present specification;

Fig. 2 is a schematic structural diagram of a heterogeneous network in an embodiment of the present specification;

Fig. 3 is a schematic structural diagram of a recall network in an embodiment of the present specification;

Fig. 4 is a schematic structural diagram of a screened recall network in an embodiment of the present specification;

Fig. 5 is a schematic structural diagram of a cross-network entity identification apparatus in an embodiment of the present specification;

Fig. 6 is a schematic structural diagram of an electronic device in an embodiment of the present specification.

Detailed Description

For a better understanding of the technical solutions, the technical solutions of the embodiments of this specification are described in detail below with reference to the drawings and specific embodiments. It should be understood that the embodiments and the specific features in them are detailed descriptions of the technical solutions rather than limitations on them, and that the embodiments and their technical features may be combined with each other where no conflict arises.

In a first aspect, as shown in fig. 1, an embodiment of the present specification provides a method for identifying a cross-network entity, including:

S102, constructing a heterogeneous network according to a plurality of entities and association relationships among the entities, wherein the entities are located in a plurality of networks;

S104, selecting an entity from the heterogeneous network as a target entity, performing recall identification on the target entity based on a link prediction algorithm, and identifying a recall entity from the heterogeneous network, wherein the target entity and the recall entity are located in different networks of the plurality of networks;

S106, identifying the recall entity based on a pre-created entity identification model to determine whether the recall entity and the target entity are the same entity.

In the embodiment of the specification, the entity can be a natural person, an enterprise, an account, a shop, a mobile phone number, equipment and the like; the association relationship between entities may include membership, equity, legal person and link, etc.

The identification method in the embodiments of this specification can associate an external entity with an internal entity and identify whether they are the same, for example associating a natural person at an enterprise, for whom no credential information is available, with a payment account of a financial application. The financial application may be, for example, a payment application, a third-party application, and the like. Moreover, many natural persons in the financial application share the same name, and the natural person information and data obtained from a large number of external data sources are usually keyed by name, so the obtained natural person information needs to be attached to, and accurately matched with, a user account or identity information. Similar situations arise for other entities such as enterprises, institutions, products and stores because of ambiguous or duplicate names.

When step S102 is executed, the entities included in each of the two networks are first obtained; entity alignment is then performed on all entities of the two networks to obtain aligned entities and unaligned entities; and the unaligned entities and the aligned entities are connected to construct the heterogeneous network. Specifically, in the entity alignment process, if an entity in one network is aligned with an entity in another network, the two are the same entity.

In the embodiments of this specification, the plurality of networks includes an external network and an internal network; of course, it may also include a plurality of external networks and one internal network. Further, in the embodiments of this specification, a plurality means two or more.

For example, by analyzing external data sources, i.e., the external network, entities including natural persons a1, a2, a3, a4, a5, a6, a8 and a10 and stores b1, b3, b4 and b5 are extracted; and by analyzing the internal database of the financial application, i.e., the internal network, entities including natural persons a3, a5, a6, a7, a8, a9 and a10 and stores b1, b2, b3, b4 and b6 are extracted. Each entity in the external network is then entity-aligned with each entity in the internal network to obtain aligned entities and unaligned entities, and a1, a2, a3, a4, a5, a6, a7, a8, a9 and a10 are connected with b1, b2, b3, b4, b5 and b6 to obtain the heterogeneous network shown in Fig. 2.
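
The construction in step S102 can be sketched as follows. This is a minimal illustration only: it assumes that entities are aligned by identical identifiers and that association relationships are supplied as plain pairs. The function name `build_heterogeneous_network` and the sample edges are hypothetical and are not taken from Fig. 2.

```python
from collections import defaultdict

def build_heterogeneous_network(external, internal, relations):
    """Merge two entity sets into one graph.

    external, internal: sets of entity ids (alignment by equal id is an assumption).
    relations: iterable of (entity, entity) association pairs.
    """
    aligned = external & internal      # entities present in both networks
    unaligned = external ^ internal    # entities present in only one network
    nodes = aligned | unaligned
    adjacency = defaultdict(set)
    for u, v in relations:
        if u in nodes and v in nodes:  # ignore relations to unknown entities
            adjacency[u].add(v)
            adjacency[v].add(u)
    return aligned, unaligned, dict(adjacency)

external = {"a1", "a2", "a3", "a4", "a5", "a6", "a8", "a10", "b1", "b3", "b4", "b5"}
internal = {"a3", "a5", "a6", "a7", "a8", "a9", "a10", "b1", "b2", "b3", "b4", "b6"}
relations = [("a4", "b5"), ("a3", "b1"), ("a7", "b2")]  # hypothetical edges
aligned, unaligned, adjacency = build_heterogeneous_network(external, internal, relations)
```

With the entity lists from the example above, the aligned and unaligned sets together cover a1 to a10 and b1 to b6, one node per entity.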

In the embodiment of the present specification, each node in the heterogeneous network corresponds to one entity, so that the nodes and the entities correspond to one another.

After the heterogeneous network is obtained, step S104 is performed.

In step S104, each entity in the heterogeneous network may be used in turn as the target entity, or one entity may be randomly selected from the heterogeneous network as the target entity; of course, an entity may also be selected from the heterogeneous network as the target entity according to a user selection operation, and this specification is not particularly limited in this respect.

After the target entity is obtained, the target entity is recalled and identified in the heterogeneous network based on a link prediction algorithm. Specifically, after the target entity is obtained, the link between the target entity and each candidate entity is predicted through a link prediction algorithm, so that a similarity index between the target entity and each candidate entity is predicted, and according to the similarity index between the target entity and each candidate entity, the similarity between the target entity and each candidate entity can be obtained; and determining a recall entity from all the alternative entities based on the similarity between the target entity and each alternative entity, wherein each alternative entity is not connected with the target entity in the heterogeneous network, and the recall entity and the target entity are positioned in different networks of the plurality of networks.

Specifically, after the target entity is obtained, each alternative entity unconnected with the target entity is obtained from the heterogeneous network according to path search; at this time, a link between the target entity and each alternative entity may be acquired; then, predicting the link between the target entity and each alternative entity through a link prediction algorithm to obtain the similarity between the target entity and each alternative entity; and selecting an alternative entity corresponding to the similarity greater than the set similarity as a recall entity, wherein the target entity and the recall entity are positioned in different networks of the plurality of networks. For example, the plurality of networks include an external network and an internal network, in this case, the target entity may be located in any one of the external network and the internal network, and the recall entity is located in another one of the plurality of networks, for example, if the target entity is located in the external network, the recall entity is located in the internal network; and if the target entity is located in the internal network, the recall entity is located in the external network.

In the embodiment of the present specification, the set similarity may be set by a user or a device, or may be set according to actual requirements, and the set similarity may be, for example, 90%, 95%, 98%, 99%, or the like.

For example, as shown in Fig. 2, if the target entity is a4, the alternative entities corresponding to a4 obtained from the heterogeneous network include a1, a2, a3, a8, a9 and a10; the links between a4 and the alternative entities are, in turn, a4-a1, a4-a2, a4-a3, a4-a8, a4-a9 and a4-a10, and predicting these 6 links with the link prediction algorithm yields the similarities between a4 and a1, a2, a3, a8, a9 and a10 as, in turn, a41, a42, a43, a48, a49 and a410; if the set similarity is 95% and, among a41, a42, a43, a48, a49 and a410, only a43 > 95%, the recall entity is determined to be a3.
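
The recall step can be illustrated with a short sketch. The specification does not name a particular link prediction algorithm, so the Jaccard coefficient of neighbor sets is used here purely as an assumed similarity index; the toy graph, the single-network label per entity, and the names `jaccard` and `recall_entities` are likewise hypothetical.

```python
def jaccard(adjacency, u, v):
    """Jaccard coefficient of the neighbor sets of u and v (an assumed similarity index)."""
    nu, nv = adjacency.get(u, set()), adjacency.get(v, set())
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0

def recall_entities(adjacency, network_of, target, threshold):
    """Return {candidate: similarity} for unconnected candidates in another network."""
    recalled = {}
    for candidate in adjacency:
        if candidate == target or candidate in adjacency.get(target, set()):
            continue  # alternative entities are not directly connected to the target
        if network_of[candidate] == network_of[target]:
            continue  # the recall entity must come from a different network
        score = jaccard(adjacency, target, candidate)
        if score > threshold:
            recalled[candidate] = score
    return recalled

# Hypothetical toy graph: natural persons linked to stores.
adjacency = {
    "a4": {"b1", "b3"},
    "a3": {"b1", "b3"},
    "a9": {"b1", "b2", "b6"},
    "b1": {"a4", "a3", "a9"},
    "b2": {"a9"},
    "b3": {"a4", "a3"},
    "b6": {"a9"},
}
network_of = {"a4": "external", "a3": "internal", "a9": "internal",
              "b1": "internal", "b2": "internal", "b3": "internal", "b6": "internal"}
recalled = recall_entities(adjacency, network_of, "a4", threshold=0.95)
```

In this toy graph only a3 shares all of its neighbors with a4, so it is the only candidate whose similarity exceeds the 95% setting.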

And after the recall entity is determined, acquiring a recall network between the target entity and the recall entity, and screening edges in the recall network based on the preset edge strength to obtain the screened recall network.

An edge in the embodiment of the present specification refers to an edge between two connected entities in a heterogeneous network and a recall network; further, the model evaluation index may include accuracy, precision, recall, and the like.

In another embodiment of the present description, when determining the preset edge strength, a link strength model may be created for a heterogeneous network, and the strength of each link in the heterogeneous network is used as training data to perform model training, so as to obtain a link strength model; evaluating the link strength model through the model evaluation index, adjusting the link strength model according to an evaluation result to obtain an adjusted link strength model, wherein the adjusted link strength model meets constraint conditions, and predicting each edge in the heterogeneous network through the adjusted link strength model to obtain the strength of each edge; and determining the preset edge strength according to the strength of each edge.

Specifically, the edges may be sorted by strength and the preset edge strength determined from the sorting result; for example, if the edges are sorted from high to low strength, the strength of the edge at a set proportion of the ordering, such as 80%, 70% or 65%, may be taken as the preset edge strength. Alternatively, the average of the edge strengths may be taken as the preset edge strength. Of course, a value may also be set manually or by a device as the preset edge strength, and this specification is not particularly limited in this respect.
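
The two ways of deriving the preset edge strength described above might look as follows. This is a sketch under stated assumptions: the function name `preset_edge_strength`, the way the proportion is mapped to a position in the ordering, and the sample strengths are all illustrative.

```python
def preset_edge_strength(strengths, proportion=None):
    """Pick a threshold from a list of edge strengths.

    proportion=None -> the mean strength; otherwise the strength found at the
    given position in the high-to-low ordering (0.8 keeps roughly the top 80%).
    """
    if not strengths:
        raise ValueError("no edge strengths given")
    if proportion is None:
        return sum(strengths) / len(strengths)
    ordered = sorted(strengths, reverse=True)
    index = max(0, min(len(ordered) - 1, int(len(ordered) * proportion) - 1))
    return ordered[index]

strengths = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]  # hypothetical values
by_rank = preset_edge_strength(strengths, proportion=0.8)
by_mean = preset_edge_strength(strengths)
```

The rank-based variant adapts to how many edges should survive screening, while the mean-based variant is sensitive to a few very strong or very weak edges.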

Specifically, when the link strength model is created, the model may be built with a logistic regression or gradient boosting decision tree (GBDT) algorithm to predict whether each link is correct, and the score output by the model in this process may be used as the link strength.
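
As a rough illustration of using a classifier score as link strength, the sketch below trains a small logistic regression with plain stochastic gradient descent. The feature definitions, labels, learning rate and function names are assumptions made for the example; the specification does not state which features or training procedure are used.

```python
import math

def train_link_strength_model(features, labels, lr=0.5, epochs=500):
    """Fit a logistic regression with plain stochastic gradient descent."""
    weights, bias = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            p = 1.0 / (1.0 + math.exp(-z))
            error = p - y  # gradient of the log loss with respect to z
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias

def link_strength(model, x):
    """Model score in (0, 1), used here as the link strength."""
    weights, bias = model
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical link features (e.g. shared-neighbor ratio, interaction frequency),
# labeled 1 for a correct link and 0 for an incorrect link.
features = [[1.0, 0.9], [0.8, 1.0], [0.1, 0.2], [0.2, 0.0]]
labels = [1, 1, 0, 0]
model = train_link_strength_model(features, labels)
strong = link_strength(model, [0.9, 0.9])
weak = link_strength(model, [0.1, 0.1])
```

A link resembling the correct examples receives a score above 0.5 and one resembling the incorrect examples a score below 0.5; these scores are what the screening step would compare against the preset edge strength.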

Then, each edge in the recall network is predicted with the adjusted link strength model to obtain the strength of each edge in the recall network. The edges in the recall network are then screened according to the preset edge strength: edges whose strength is less than the preset edge strength are deleted, edges whose strength is not less than the preset edge strength are retained, and the network formed by the retained edges is the screened recall network.

Deleting the edges whose strength is less than the preset edge strength removes less reliable entities from the recall network, which effectively prevents the recall network from becoming too diffuse and ensures its accuracy.

For example, entity B in the heterogeneous network is taken as the target entity, recall identification is performed on entity B based on the link prediction algorithm, and recall entities B1 and B2 are identified from the heterogeneous network, giving the recall network shown in Fig. 3. The recall network includes entities B, B1, B2, C1, C2, C3, C4 and C5, where B, B1 and B2 are all natural persons and C1, C2, C3, C4 and C5 may be enterprises, businesses, or the like. Model training is performed for the heterogeneous network to obtain the adjusted link strength model; each edge in the recall network is predicted with the adjusted link strength model to obtain its strength; and the strength of each edge is compared with the preset edge strength, retaining the edges whose strength is not less than the preset edge strength. Here the strengths of edges 40, 41, 42 and 43 are all less than the preset edge strength, so edges 40, 41, 42 and 43 are deleted from the recall network to obtain the screened recall network shown in Fig. 4, which includes entities B, B1, B2, C1 and C2.

Therefore, every edge in the screened recall network has a strength not less than the preset edge strength, which guarantees the association strength between the entities in the screened recall network; subsequent calculation is carried out on these more strongly associated entities, and its accuracy improves accordingly.
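
The edge screening described above reduces to a threshold filter. The sketch below assumes the recall network is given as a mapping from edges to predicted strengths; the numeric strengths are hypothetical, since Fig. 3 gives no values, and `screen_recall_network` is an illustrative name.

```python
def screen_recall_network(edges, preset_strength):
    """Keep edges whose strength is not less than the preset edge strength.

    edges: dict mapping (u, v) -> strength.
    Returns the retained edges and the entities that still have an edge.
    """
    kept = {edge: s for edge, s in edges.items() if s >= preset_strength}
    nodes = {n for edge in kept for n in edge}
    return kept, nodes

# Hypothetical strengths for a recall network around target entity B.
edges = {
    ("B", "C1"): 0.9, ("B1", "C1"): 0.8, ("B", "C2"): 0.7, ("B2", "C2"): 0.75,
    ("B", "C3"): 0.2, ("B1", "C4"): 0.1, ("B2", "C5"): 0.3, ("C3", "C4"): 0.25,
}
kept, nodes = screen_recall_network(edges, preset_strength=0.5)
```

Four weak edges are removed, and the entities C3, C4 and C5 drop out because no retained edge touches them, leaving B, B1, B2, C1 and C2.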

In another embodiment of the present disclosure, after the recall network or the filtered recall network is obtained, each link between the target entity and the recall entity in the recall network may be classified to obtain a level of each link between the target entity and the recall entity; acquiring a target link between a target entity and a recall entity according to the level of each link between the target entity and the recall entity, wherein the level of the target link is greater than a preset level; at this time, the target link is a link with a higher level between the target entity and the recall entity, for example, a first level link and a second level link, and the target link is used as a final link between the target entity and the recall entity, wherein the level of the first level link is greater than that of the second level link.

Specifically, each link between the target entity and the recall entity in the recall network may be ranked according to a preset link strength. Specifically, the strength of each link between the target entity and the recall entity may be obtained, and the strength of each link is compared with a preset link strength; and grading each link between the target entity and the recall entity according to the comparison result.

In the embodiment of the present specification, the preset link strength may be set manually or by a device, or may be set according to actual requirements.

In the embodiments of this specification, the preset level is set with reference to the first level of the links between the target entity and the recall entity; it is generally lower than the first level and not lower than the second level of those links. Of course, the preset level may also be set lower than the second level, and this specification is not particularly limited in this respect.

For example, taking fig. 4 as an example, the links of the entities B to B1 include links d1, d2 and d3, and taking preset link strength as f as an example, if d3> f > d2> d1, it is determined that d3 is a first-level link, d2 is a second-level link, and d1 is a third-level link; according to the hierarchy of d3, d2, and d1, since the level of d3 is the highest, d3 is determined to be the target link.
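
One possible reading of the grading step is sketched below: links are ranked by strength, the strongest becoming the first-level link, and a link counts as a target link when it ranks above the preset level. Ranking by strength order, the numeric strengths, and the names `grade_links` and `target_links` are assumptions for illustration; the specification itself only states that each strength is compared with a preset link strength.

```python
def grade_links(link_strengths):
    """Assign level 1 to the strongest link, level 2 to the next, and so on."""
    ordered = sorted(link_strengths, key=link_strengths.get, reverse=True)
    return {link: rank for rank, link in enumerate(ordered, start=1)}

def target_links(levels, preset_level):
    """Links ranked above the preset level (a numerically smaller level number)."""
    return [link for link, level in levels.items() if level < preset_level]

link_strengths = {"d1": 0.3, "d2": 0.5, "d3": 0.9}  # hypothetical values with d3 > d2 > d1
levels = grade_links(link_strengths)
targets = target_links(levels, preset_level=2)
```

With the preset level set to the second level, only the first-level link d3 is kept as the target link, matching the example above.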

If the similarity greater than the set similarity corresponds to a plurality of alternative entities, all the alternative entities are taken as recall entities, and at the moment, the number of the recall entities is more than or equal to 2; and if the similarity greater than the set similarity corresponds to one alternative entity, taking the alternative entity corresponding to the similarity greater than the set similarity as the recall entity.

For example, referring to Fig. 2, if the target entity is a3 and the similarities between a3 and a7 and between a3 and a6 are both greater than the set similarity, then both a6 and a7 are taken as recall entities.

After the recall entity is identified, step S106 is performed.

In step S106, the number of recall entities may be determined first. If the number of recall entities is 1, the entity identification model is used directly to identify the recall entity and determine whether the recall entity and the target entity are the same entity. If the number of recall entities is greater than or equal to 2, the recall entities are screened through a pre-established entity screening model to screen out the recall entity with the highest credibility; the screened recall entity is then identified by using the entity identification model to determine whether it and the target entity are the same entity.

When the number of recall entities is greater than or equal to 2, only the recall entity with the highest credibility is screened out and identified by the entity identification model. Because not every recall entity needs to be identified, the number of identifications is reduced and identification efficiency is effectively improved; and because the recall entity with the highest credibility is the one identified, a high degree of matching between the screened recall entity and the target entity is effectively ensured.

In the embodiment of the present specification, edge features and entity features in a historical heterogeneous network may be used as training data to perform model training to obtain an entity screening model, where the edge features include connection relationships between entities, edge strength, the number of links, and the like; the entity characteristics comprise the business type, the business state, the age of natural people, the academic calendar, the type and the level of an organization and the like of the enterprise.

Specifically, the entity screening model may be a binary classification model or a multi-class classification model, which is not specifically limited in this specification.

Thus, after the entity screening model is obtained, the edge features and entity features of each of the plurality of recall entities are input into the entity screening model to obtain a predicted credibility for each recall entity, and the recall entity with the highest credibility is selected according to these predicted credibilities. For example, referring to fig. 4, the entity screening model predicts the credibility of B1 and of B2 respectively; if the credibility of B1 is greater than that of B2, B1 is determined to be the recall entity with the highest credibility.
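The B1/B2 example can be made concrete with a small sketch. Everything specific here is an assumption for illustration: the three features, the weights, and the logistic scoring function stand in for whatever trained binary classifier is actually used as the entity screening model.

```python
import math
from dataclasses import dataclass


@dataclass
class CandidateFeatures:
    edge_strength: float       # edge feature (assumed scale 0..1)
    link_count: int            # edge feature: number of links to the target
    same_business_type: bool   # entity feature compared with the target


# Hypothetical weights standing in for a trained screening model.
WEIGHTS = {"edge_strength": 2.0, "link_count": 0.5, "same_business_type": 1.0}
BIAS = -2.0


def credibility(f: CandidateFeatures) -> float:
    """Map edge and entity features to a credibility in (0, 1)."""
    z = (
        BIAS
        + WEIGHTS["edge_strength"] * f.edge_strength
        + WEIGHTS["link_count"] * f.link_count
        + WEIGHTS["same_business_type"] * float(f.same_business_type)
    )
    return 1.0 / (1.0 + math.exp(-z))


candidates = {
    "B1": CandidateFeatures(edge_strength=0.8, link_count=3, same_business_type=True),
    "B2": CandidateFeatures(edge_strength=0.3, link_count=1, same_business_type=False),
}
scores = {name: credibility(f) for name, f in candidates.items()}
best = max(scores, key=scores.get)  # recall entity with the highest credibility
```

With these assumed values B1 scores higher than B2, so B1 is the recall entity passed on for recognition.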

In the embodiments of this specification, the entity recognition model may be obtained by model training on historical entity data contained in the plurality of networks, and the entity recognition model may be a deep learning model. For example, deep learning may be performed on the entity pairs and non-entity pairs labeled in the historical entity data of the plurality of networks to obtain the entity recognition model.
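As a rough illustration of training on labeled pairs, the sketch below fits a tiny logistic regression with stochastic gradient descent. It is deliberately not a deep learning model; it only shows the shape of the data (one feature vector per candidate pair, labeled 1 for an entity pair and 0 for a non-entity pair). The two features and all sample values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(
    samples: Sequence[Sequence[float]],
    labels: Sequence[int],
    epochs: int = 500,
    lr: float = 0.5,
) -> Tuple[List[float], float]:
    """Fit weights and bias by plain stochastic gradient descent."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
            err = p - y  # gradient of the log loss with respect to the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def is_same_entity(w: Sequence[float], b: float, x: Sequence[float]) -> bool:
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) >= 0.5


# Hypothetical pair features: [name similarity, address agreement].
pairs = [[0.9, 1.0], [0.8, 1.0], [0.95, 0.5],   # labeled entity pairs
         [0.2, 0.0], [0.1, 0.5], [0.3, 0.0]]    # labeled non-entity pairs
labels = [1, 1, 1, 0, 0, 0]

w, b = train(pairs, labels)
```

A real embodiment would replace `train` with the chosen deep learning framework while keeping the same labeled-pair input.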

When there is exactly one recall entity, the entity recognition model may recognize it in either of two ways. All edge features and entity features between the recall entity and the target entity may be input into the entity recognition model to determine whether the recall entity and the target entity are the same entity. Alternatively, the target link between the target entity and the recall entity may first be obtained in the target link obtaining manner, and the entity recognition model is then used to recognize the target link; that is, all edge features and entity features in the target link are input into the entity recognition model to determine whether the recall entity and the target entity are the same entity.

When there are two or more recall entities, the recall entity with the highest credibility is first selected from them and is then recognized by the entity recognition model. In this case, the target link between the target entity and the selected recall entity may be obtained in the target link obtaining manner, and the entity recognition model is used to recognize the target link; that is, all edge features and entity features in the target link are input into the entity recognition model to determine whether the selected recall entity and the target entity are the same entity. Alternatively, all edge features and entity features between the target entity and the selected recall entity may be input into the entity recognition model to determine whether they are the same entity.

When the entity recognition model is used for recognition, the data input into the model may be all edge features and entity features of the target link, and the level of the target link is greater than the preset level. The higher the level of a link, the higher its credibility, so the model receives link data from links of higher credibility. The more credible the link data input into the entity recognition model, the more credible the result it produces, which effectively ensures the accuracy of determining whether the recall entity and the target entity are the same entity.
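The level filter on target links can be sketched as below. The `Link` structure, the feature names, and the flat dictionary fed to the model are hypothetical choices made for illustration; the specification only requires that the target link's level exceed the preset level and that its edge features and entity features be input into the model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Link:
    level: int  # a higher level means a more credible link
    edge_features: Dict[str, float] = field(default_factory=dict)
    entity_features: Dict[str, float] = field(default_factory=dict)


def select_target_links(links: List[Link], preset_level: int) -> List[Link]:
    """Keep only links whose level is strictly greater than the preset level."""
    return [link for link in links if link.level > preset_level]


def model_input(link: Link) -> Dict[str, float]:
    """Merge all edge and entity features of one target link into one record."""
    merged = {f"edge.{k}": v for k, v in link.edge_features.items()}
    merged.update({f"entity.{k}": v for k, v in link.entity_features.items()})
    return merged


links = [
    Link(level=1, edge_features={"strength": 0.2}, entity_features={"age": 30.0}),
    Link(level=3, edge_features={"strength": 0.9}, entity_features={"age": 41.0}),
]
target_links = select_target_links(links, preset_level=2)
inputs = [model_input(link) for link in target_links]
```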

In the embodiments of this specification, after the entity recognition model has determined whether the recall entity and the target entity are the same entity, any entity recognized with high credibility may be used as an intermediate node for further iteration. The links and the network are thereby extended continuously, so that more entities are recalled and recognized and the number of recognized entity pairs increases.
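One way to picture this iteration is a breadth-first expansion in which every high-credibility match becomes the next intermediate node. The function name, the `recall` and `confidence` callables, the threshold of 0.9, and the step limit are all assumptions introduced for this sketch.

```python
from collections import deque
from typing import Callable, Iterable, List, Tuple


def expand_matches(
    seed: str,
    recall: Callable[[str], Iterable[str]],    # stand-in for recall from a node
    confidence: Callable[[str, str], float],   # stand-in for recognition credibility
    threshold: float = 0.9,
    max_steps: int = 100,
) -> List[Tuple[str, str]]:
    """Collect entity pairs, reusing each confident match as an intermediate node."""
    pairs: List[Tuple[str, str]] = []
    visited = {seed}
    frontier = deque([seed])
    steps = 0
    while frontier and steps < max_steps:
        node = frontier.popleft()
        steps += 1
        for candidate in recall(node):
            if candidate in visited:
                continue
            if confidence(node, candidate) >= threshold:
                pairs.append((node, candidate))
                visited.add(candidate)
                frontier.append(candidate)  # iterate from the new high-credibility entity
    return pairs


# Toy data (assumed): who can be recalled from whom, and with what credibility.
toy_recall = {"A": ["B", "C"], "B": ["D"], "C": [], "D": []}
toy_conf = {("A", "B"): 0.95, ("A", "C"): 0.5, ("B", "D"): 0.92}
found = expand_matches("A", lambda n: toy_recall[n], lambda a, b: toy_conf[(a, b)])
```

Here D is reached only because B was first recognized with high credibility, which is how the iteration increases the number of recognized pairs.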

The technical solution adopted in the embodiments of this specification is as follows: a heterogeneous network is constructed according to a plurality of entities and the association relationships among them, where the entities are located in a plurality of networks; an entity is selected from the heterogeneous network as the target entity, recall identification is performed on the target entity based on a link prediction algorithm, and a recall entity is identified from the heterogeneous network; the recall entity is then recognized based on the entity recognition model to determine whether the recall entity and the target entity are the same entity. Because the target entity is recalled through the link prediction algorithm, the recall entity matches the target entity more closely, and entity recognition performed on this closer match is more accurate. Because the recall entity and the target entity are located in different networks of the plurality of networks, this gain in accuracy is obtained while recognizing entities across networks.

In a second aspect, based on the same technical concept, embodiments of the present specification provide an apparatus for identifying a cross-network entity, referring to fig. 5, including:

a heterogeneous network constructing unit 501, configured to construct a heterogeneous network according to a plurality of entities and association relationships between the entities, where the entities are located in a plurality of networks;

a recall entity obtaining unit 502, configured to select an entity from the heterogeneous networks as a target entity, perform recall identification on the target entity based on a link prediction algorithm, and identify a recall entity from the heterogeneous networks, where the target entity and the recall entity are located in different networks of the multiple networks;

an identifying unit 503, configured to identify the recall entity based on the created entity identification model, and identify whether the recall entity and the target entity are the same entity.

In an optional implementation manner, the recall entity obtaining unit 502 is configured to select the target entity from the heterogeneous network; predict a link between the target entity and each alternative entity through a link prediction algorithm, wherein each alternative entity is not connected to the target entity in the heterogeneous network; and determine the recall entity from all alternative entities based on the similarity between the target entity and each alternative entity, wherein the recall entity and the target entity are located in different networks of the plurality of networks.

In an optional implementation manner, the recall entity obtaining unit 502 is configured to predict, through a link prediction algorithm, a link between the target entity and each alternative entity, so as to obtain the similarity between the target entity and each alternative entity; and to select, as the recall entity, each alternative entity whose similarity is greater than a set similarity.
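The similarity-threshold recall can be sketched with a neighborhood-based index. The specification names only "a link prediction algorithm"; the Jaccard coefficient used here is one common choice and is an assumption, as are the function names, the toy graph, and the network labels.

```python
from typing import Dict, List, Set, Tuple


def jaccard(adj: Dict[str, Set[str]], u: str, v: str) -> float:
    """Jaccard coefficient of the neighbor sets of u and v."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def recall_entities(
    adj: Dict[str, Set[str]],
    network_of: Dict[str, str],
    target: str,
    min_similarity: float,
) -> List[Tuple[str, float]]:
    """Recall unconnected, cross-network entities whose similarity exceeds the threshold."""
    recalled = []
    for candidate in adj:
        if candidate == target or candidate in adj[target]:
            continue  # alternative entities are not connected to the target
        if network_of[candidate] == network_of[target]:
            continue  # the recall entity must lie in a different network
        score = jaccard(adj, target, candidate)
        if score > min_similarity:
            recalled.append((candidate, score))
    return sorted(recalled, key=lambda item: item[1], reverse=True)


# Toy heterogeneous network (assumed): x, y, z are shared attribute nodes.
adj = {
    "A": {"x", "y"}, "B1": {"x", "y"}, "B2": {"x", "z"},
    "x": {"A", "B1", "B2"}, "y": {"A", "B1"}, "z": {"B2"},
}
network_of = {"A": "net1", "x": "net1", "y": "net1",
              "B1": "net2", "B2": "net2", "z": "net2"}
recalled = recall_entities(adj, network_of, "A", min_similarity=0.3)
```

Raising `min_similarity` narrows the recall set, which trades recall coverage against the number of candidates the screening model has to rank.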

In an optional implementation manner, the identifying unit 503 is configured to: if the number of recall entities is 1, recognize the recall entity by using the entity identification model and determine whether the recall entity and the target entity are the same entity; if the number of recall entities is greater than or equal to 2, screen the recall entities through a pre-created entity screening model to select the recall entity with the highest credibility, recognize the selected recall entity by using the entity identification model, and determine whether the selected recall entity and the target entity are the same entity.

In an alternative embodiment, the identification device further comprises:

In an optional implementation manner, the identifying unit 503 is configured to: if the number of recall entities is 1, recognize a target link between the target entity and the recall entity by using the entity identification model, and determine whether the recall entity and the target entity are the same entity; and if the number of recall entities is greater than or equal to 2, recognize a target link between the target entity and the selected recall entity by using the entity identification model, and determine whether the selected recall entity and the target entity are the same entity.

In a third aspect, based on the same inventive concept as the method for identifying a cross-network entity in the foregoing embodiments, an embodiment of the present specification further provides an electronic device, as shown in fig. 6, including a memory 604, a processor 602, and a computer program stored on the memory 604 and executable on the processor 602, where the processor 602, when executing the program, implements the steps of any one of the foregoing methods for identifying a cross-network entity.

Fig. 6 shows a bus architecture (represented by bus 600). Bus 600 may include any number of interconnected buses and bridges, and links together various circuits including one or more processors, represented by processor 602, and memory, represented by memory 604. Bus 600 may also link together various other circuits such as peripherals, voltage regulators, and power management circuits, which are well known in the art and are therefore not described further herein. A bus interface 605 provides an interface between bus 600 and the receiver 601 and transmitter 603. The receiver 601 and the transmitter 603 may be the same element, that is, a transceiver, providing a means for communicating with various other apparatus over a transmission medium. The processor 602 is responsible for managing bus 600 and for general processing, and the memory 604 may be used to store data used by the processor 602 in performing operations.

In a fourth aspect, based on the inventive concept of the cross-network entity identification method in the foregoing embodiments, the present specification further provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps of any one of the foregoing cross-network entity identification methods.

The description has been presented with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the description. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While preferred embodiments of the present specification have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all changes and modifications that fall within the scope of the specification.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present specification without departing from the spirit and scope of the specification. Thus, if such modifications and variations of the present specification fall within the scope of the claims of the present specification and their equivalents, the specification is intended to include such modifications and variations.
