Method for mining key motif of complex network based on multi-attribute decision

Document No.: 86222  Published: 2021-10-08  Views: 34

Reading note: this technology, a method for mining key motifs of a complex network based on multi-attribute decision, was designed by 杨云云, 张辽, 冯彪, 郝晓亮, 谢珺, 赵文晶 and 任密蜂 on 2021-08-03. Main content: the invention discloses a method for mining key motifs of a complex network based on multi-attribute decision. By combining motif theory with node importance mining to evaluate motif importance comprehensively, the invention provides a method for identifying the importance of network motifs that integrates multiple topological attributes. The identified motifs can reveal the influence of important motifs on specific structures and functions of the network. The invention analyzes the structural characteristics of the network at the mesoscopic level, and the results can serve as an auxiliary tool for studying and analyzing the dynamics of complex systems, which is of particular significance for controlling complex systems.

1. A method for mining key motifs of a complex network based on multi-attribute decision, characterized by comprising the following steps:

Firstly, establishing an input network model

Taking the research objects as nodes and the specific relationships between the objects as edges, a network model $G = (N, M)$ is formed, indicating that the network has N nodes and M edges; the adjacency matrix $A = (a_{ij})_{N \times N}$ is obtained from the node and edge information of the network: if there is a link from j to i, then $a_{ij} = 1$; otherwise $a_{ij} = 0$;

Secondly, searching the motif structure existing in the network

Firstly, determining the number of nodes contained in the motif to be searched, and selecting three-node or four-node motifs for analysis; constructing a plurality of random networks with the same degree distribution as the target network, enumerating the number of each kind of subgraph in the network by the ESU (Enumerate Subgraphs) method, and comparing the counts with those in the constructed random networks; each subgraph is evaluated using a Z-score, i.e., a significance level, formulated as $Z = (N_{real} - \langle N_{rand} \rangle) / \sigma_{rand}$, wherein $N_{real}$ is the number of occurrences of the subgraph in the target network, $\langle N_{rand} \rangle$ is the average number of occurrences of the subgraph in the random networks, and the denominator $\sigma_{rand}$ is the standard deviation of the number of occurrences of the subgraph in the random networks having the same degree distribution as the target network; when Z > 0, the subgraph structure is considered a motif, and the larger the Z value, the more important the motif;

Thirdly, selecting the motif structure

Obtaining the Z-scores of the various three-node or four-node motifs by the method of the second step, selecting the motif type with the highest Z-score as the motif to be analyzed, and listing the set of node groups having that motif structure;

Fourthly, selecting node importance indexes according to the topological structure of the complex network: the node degree centrality D(i), the betweenness centrality BC(i) and the improved K-shell K(i) are selected;

Fifthly, obtaining a motif index matrix

According to the node group set obtained in step three, the degree centrality average value, the betweenness centrality average value and the improved K-shell average value of each motif are obtained by averaging the node importance indexes over the nodes of that motif, finally giving a motif index matrix W with n rows and 3 columns, wherein the n rows represent the n motifs and the 3 columns respectively represent the degree centrality average value, the betweenness centrality average value and the improved K-shell average value; $w_{ij}$ denotes the j-th attribute value of the i-th motif;

Sixthly, determining the weight of each index by utilizing entropy

1) According to the motif index matrix obtained in step five, the data of each column are normalized by the range normalization method:

$b_{ij} = \frac{w_{ij} - \min_i w_{ij}}{\max_i w_{ij} - \min_i w_{ij}}$

wherein $w_{ij}$ is the j-th attribute of the i-th motif, $\min_i w_{ij}$ is the minimum value of the j-th attribute, and $\max_i w_{ij}$ is the maximum value of the j-th attribute; finally the normalized matrix $B_{n \times 3}$ is obtained;

2) Let the weights of the three attribute indexes be $\mu = (\mu_1, \mu_2, \mu_3)$; according to Shannon entropy theory, the entropy of each attribute is calculated according to the following formula:

wherein $b_{ij}$ is the normalized value of the j-th attribute of the i-th motif, and $e_j$ denotes the entropy of the j-th attribute;

3) calculating the weight of each attribute according to the entropy value of each attribute obtained in the step 2):

Seventhly, ranking the importance of the motifs

The importance of each motif is calculated as follows:

the larger the s value obtained for a motif, the more important that motif is; finally, the motifs are sorted by s from largest to smallest, and the set Q is output.

2. The method for mining the key motifs of the complex network based on the multi-attribute decision as claimed in claim 1, wherein the degree centrality D, the betweenness centrality BC and the improved K-shell centrality IKS of each node in the network are calculated:

$D(i) = \frac{k_i}{N - 1}$

wherein $k_i$ is the degree of node i, N is the number of nodes of the network, and the denominator N - 1 is the maximum possible degree of a node;

$BC(i) = \sum_{j \neq i \neq k} \frac{g_{jk}(i)}{g_{jk}}$

wherein $g_{jk}(i)$ is the number of shortest paths between nodes j and k that pass through node i, and $g_{jk}$ is the total number of shortest paths from node j to node k;

wherein, for the improved K-shell centrality, KS(i) denotes the ordinary K-shell centrality value of node i, iter(i) denotes the number of iterations at which node i is peeled off, and $K_{max}$ denotes the highest K-shell value among the nodes of the network.

Technical Field

The invention relates to a method for mining a key motif of a complex network, in particular to a method for mining the key motif of the complex network based on multi-attribute decision.

Background

Motifs are important structures in a network that represent fundamental modes of interaction. The concept of the motif was first proposed by Milo et al. in 2002. Compared with subgraph structures that are not motifs, motifs locally depict the specific patterns of interconnection of a given network and play an important role in characterizing the organization of the network and in studying its overall composition. In a given network, if a connected subgraph structure occurs more often than in random networks with the same degree distribution, and its number of occurrences exceeds a threshold, the connected subgraph is called a motif.

Motifs are a manifestation of higher-order network structure. Models based on pairwise interactions may fail to capture complex dependencies between network nodes, whereas motifs, as a higher-order network model, go beyond these limitations. Motifs describe the specific patterns of connection inside a network more accurately while simplifying the network structure. Research on network motifs has also expanded from biology and mathematics to physics, social science and other disciplines, providing a new perspective for understanding complex systems.

Important information in a complex network has mainly been studied from the microscopic perspective of nodes, edges and the like, whereas motifs allow network characteristics to be analyzed from the mesoscopic perspective. Research shows that many types of motifs exist in real networks, such as protein networks; the dynamic characteristics of a network or of its nodes often depend on the motifs in which the nodes are located, and motifs of different types and at different positions play very different roles in network structure or network function. Therefore, how to accurately and efficiently identify motifs with key functions is a scientific problem to be solved.

Disclosure of Invention

Given that the mining of important information has so far mainly targeted microscopic information such as nodes, the invention takes into account the characteristics of network motifs and the heterogeneity of network structure and function, studies the importance of network structural elements at a coarser granularity with the motif as the object, and provides a method for identifying key network motifs based on multi-attribute decision.

The invention is realized by adopting the following technical scheme:

A method for identifying key motifs of a complex network based on multi-attribute decision comprises the following steps:

Firstly, establishing an input network model: taking the research objects as nodes and the specific relationships between the objects as edges, a network model $G = (N, M)$ is formed, indicating that the network has N nodes and M edges; the adjacency matrix $A = (a_{ij})_{N \times N}$ of the network is obtained from its node and edge information: if there is a link from j to i, then $a_{ij} = 1$; otherwise $a_{ij} = 0$.
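As an illustration of this step, the following minimal sketch builds the adjacency matrix from an edge list. The function name `adjacency_matrix`, the zero-based node numbering and the example edges are assumptions made for illustration; they do not come from the source.

```python
def adjacency_matrix(n, edges, directed=False):
    """Build an n x n 0/1 matrix A where A[i][j] = 1 if there is a link from j to i."""
    A = [[0] * n for _ in range(n)]
    for j, i in edges:          # each edge is given as (source j, target i)
        A[i][j] = 1
        if not directed:        # undirected network: the matrix is symmetric
            A[j][i] = 1
    return A

# Hypothetical 4-node undirected network with M = 4 edges (nodes numbered from 0).
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
A = adjacency_matrix(4, edges)
for row in A:
    print(row)
```

For an undirected network such as the one used in the embodiment, the matrix is symmetric, so the "from j to i" convention only matters when `directed=True`.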

Secondly, searching the motif structures existing in the network: firstly, the number of nodes contained in the motif to be searched is determined, and three-node or four-node motifs are selected for analysis; a plurality of random networks with the same degree distribution as the target network are constructed, the number of each kind of subgraph in the network is enumerated by the ESU method, and the counts are compared with those in the constructed random networks. Each subgraph is evaluated using a Z-score, i.e., a significance level, formulated as $Z = (N_{real} - \langle N_{rand} \rangle) / \sigma_{rand}$, wherein $N_{real}$ is the number of occurrences of the subgraph in the target network, $\langle N_{rand} \rangle$ is the average number of occurrences of the subgraph in the random networks, and the denominator $\sigma_{rand}$ is the standard deviation of the number of occurrences of the subgraph in the random networks having the same degree distribution as the target network. When Z > 0, the subgraph structure is considered a motif, and the greater the Z value, the more important the motif.
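The Z-score test can be sketched as follows, assuming the usual form Z = (N_real - mean of N_rand) / sigma_rand. The counts are hypothetical, and the use of the population standard deviation is an assumption, since the source does not say which variant is intended.

```python
from statistics import mean, pstdev

def z_score(n_real, random_counts):
    """Z = (N_real - mean(N_rand)) / std(N_rand) for one subgraph type."""
    mu = mean(random_counts)
    sigma = pstdev(random_counts)   # population std; an assumption, see the note above
    if sigma == 0:
        raise ValueError("subgraph count is identical in every random network")
    return (n_real - mu) / sigma

# Hypothetical counts: 6 occurrences in the target network, and the counts
# observed in five random networks with the same degree distribution.
z = z_score(6, [2, 3, 1, 2, 2])
is_motif = z > 0                    # the criterion stated in step two
print(round(z, 4), is_motif)
```

In practice many more than five random networks would normally be generated so that the mean and standard deviation are stable.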

Thirdly, selecting the motif structure: the Z-scores of all types of three-node or four-node motifs are obtained by the method of step two, the motif type with the highest Z-score is selected as the motif to be analyzed, and the set of node groups having that motif structure is listed.

Fourthly, selecting node importance indexes according to the topological structure of the complex network:

the node degree centrality D(i), the betweenness centrality BC(i) and the improved K-shell K(i) are selected.

Degree centrality (D): considers the influence of the number of first-order neighbors of a node on the node;

Betweenness centrality (BC): a centrality measure of a network graph based on shortest paths; if a node is a necessary waypoint for communication between other pairs of nodes in the network, it occupies an important position in the network. The higher the betweenness centrality of a node, the more important the node;

Improved K-shell: the ordinary K-shell provides a coarse-grained division of node importance; by peeling nodes off layer by layer, nodes in the interior end up with greater centrality. Compared with the ordinary K-shell algorithm, the improved K-shell algorithm gives a finer division, so that nodes in the same layer have different importance.
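The layer-by-layer peeling can be sketched as below. This only computes the two ingredients that the improved index is described as using, the ordinary K-shell value and the peeling round of each node; how the improved K-shell combines them is not shown here. Counting rounds globally across shells is an assumption about how iter(i) is defined, and the graph is hypothetical.

```python
def k_shell(adj):
    """Return (ks, iteration): shell index and peeling round of every node.

    adj maps each node to the set of its neighbors (undirected graph).
    """
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    alive = set(adj)
    ks, iteration = {}, {}
    rounds = 0
    while alive:
        k = min(deg[v] for v in alive)          # next shell to peel
        while True:
            batch = [v for v in alive if deg[v] <= k]
            if not batch:
                break
            rounds += 1                          # one peeling round
            for v in batch:
                ks[v], iteration[v] = k, rounds
                alive.discard(v)
            for v in batch:                      # update degrees of survivors
                for u in adj[v]:
                    if u in alive:
                        deg[u] -= 1
    return ks, iteration

# Hypothetical graph: triangle 1-2-3 with a tail 3-4-5.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}
ks, iteration = k_shell(adj)
print(ks, iteration)
```

In this example nodes 4 and 5 fall in the same shell but are removed in different rounds, which is the kind of within-layer distinction the improved K-shell is described as exploiting.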

Fifthly, obtaining a motif index matrix

1) Calculating the degree centrality (D), the betweenness centrality (BC) and the improved K-shell centrality (IKS) of each node in the network;

$D(i) = \frac{k_i}{N - 1}$

wherein $k_i$ is the degree of node i, N is the number of nodes in the network, and the denominator (N - 1) is the maximum possible degree of a node.

$BC(i) = \sum_{j \neq i \neq k} \frac{g_{jk}(i)}{g_{jk}}$

wherein $g_{jk}(i)$ is the number of shortest paths between nodes j and k that pass through node i, and $g_{jk}$ is the total number of shortest paths from node j to node k.

For the improved K-shell centrality, KS(i) denotes the ordinary K-shell centrality value of node i, iter(i) denotes the number of iterations at which node i is peeled off, and $K_{max}$ denotes the highest K-shell value among the nodes of the network.
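A minimal sketch of the first two node indexes follows, assuming an undirected, unweighted network. Degree centrality divides the degree by N - 1. Betweenness is computed with Brandes' accumulation scheme, which is a choice made here rather than something the source specifies, and it is left unnormalized. The example graph is hypothetical.

```python
from collections import deque

def degree_centrality(adj):
    """D(i) = k_i / (N - 1)."""
    n = len(adj)
    return {v: len(adj[v]) / (n - 1) for v in adj}

def betweenness_centrality(adj):
    """BC(i) = sum over pairs j, k of g_jk(i) / g_jk (unnormalized, undirected)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        sigma = dict.fromkeys(adj, 0)   # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:                     # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):        # accumulate dependencies back towards s
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}   # each pair was counted twice

# Hypothetical graph: triangle 1-2-3 with node 4 attached to node 3.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
D = degree_centrality(adj)
BC = betweenness_centrality(adj)
print(D, BC)
```

Node 3 lies on the only shortest paths from node 4 to nodes 1 and 2, so it is the only node with non-zero betweenness here.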

2) Calculating the degree centrality (D), the betweenness centrality (BC) and the improved K-shell centrality (IKS) of each motif;

According to the node group set obtained in step three and the three importance indexes of each node obtained in step 1), the three centralities are each averaged over all the nodes in each node group, and the averages are taken as the degree centrality value, the betweenness centrality value and the improved K-shell average value of each motif.

3) Obtaining the motif index matrix

From the three index values of the motifs, each motif is arranged as a row and its three indexes as columns. Finally a motif index matrix W with n rows and 3 columns is obtained, wherein the n rows represent the n motifs and the 3 columns respectively represent the degree centrality average value, the betweenness centrality average value and the improved K-shell average value. $w_{ij}$ denotes the j-th attribute value of the i-th motif.
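The averaging that produces W can be sketched as follows. The node-level values and the two node groups are hypothetical and only illustrate the shape of the computation.

```python
def motif_index_matrix(node_groups, degree, betweenness, iks):
    """Return W (n rows x 3 columns): per-motif means of the three node indexes."""
    W = []
    for group in node_groups:
        W.append([sum(index[v] for v in group) / len(group)
                  for index in (degree, betweenness, iks)])
    return W

# Hypothetical node-level values for four nodes and two three-node motifs.
degree = {1: 0.5, 2: 0.75, 3: 0.75, 4: 0.5}
betweenness = {1: 0.0, 2: 1.5, 3: 1.5, 4: 3.0}
iks = {1: 2.0, 2: 2.5, 3: 2.5, 4: 1.0}
W = motif_index_matrix([(1, 2, 3), (2, 3, 4)], degree, betweenness, iks)
for row in W:
    print(row)
```

Row i of `W` corresponds to the i-th node group, and the columns follow the order degree, betweenness, improved K-shell used in the text.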

Sixthly, determining the weight of each index by utilizing entropy

1) According to the motif index matrix obtained in step five, the data of each column are normalized by the range normalization method:

$b_{ij} = \frac{w_{ij} - \min_i w_{ij}}{\max_i w_{ij} - \min_i w_{ij}}$

wherein $w_{ij}$ is the j-th attribute of the i-th motif, $\min_i w_{ij}$ is the minimum value of the j-th attribute, and $\max_i w_{ij}$ is the maximum value of the j-th attribute; finally the normalized matrix $B_{n \times 3}$ is obtained.
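Range (min-max) normalization of each column can be sketched as below. Mapping a constant column to 0.0 is an assumption added to avoid division by zero; the source does not discuss that case. The matrix values are hypothetical.

```python
def range_normalize(W):
    """Column-wise min-max normalization: b_ij = (w_ij - min_j) / (max_j - min_j)."""
    columns = list(zip(*W))
    lo = [min(c) for c in columns]
    hi = [max(c) for c in columns]
    return [[(w - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0
             for j, w in enumerate(row)]
            for row in W]

# Hypothetical 3 x 3 motif index matrix; the third column is constant.
W = [[1, 10, 5],
     [3, 20, 5],
     [2, 40, 5]]
B = range_normalize(W)
for row in B:
    print(row)
```

After this step every non-constant column spans exactly the interval [0, 1], so attributes measured on different scales become comparable.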

2) Let the weights of the three attribute indexes be $\mu = (\mu_1, \mu_2, \mu_3)$; according to Shannon entropy theory, the entropy of each attribute can be calculated according to the following formula:

wherein $b_{ij}$ is the normalized value of the j-th attribute of the i-th motif, and $e_j$ denotes the entropy of the j-th attribute.
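The exact entropy and weight formulas are not spelled out in this text, so the sketch below assumes the commonly used entropy weight method: each column is turned into proportions p_ij = b_ij / sum_i b_ij, the entropy is e_j = -(1 / ln n) * sum_i p_ij ln p_ij, and the weight is mu_j = (1 - e_j) / sum_k (1 - e_k). This may differ from the formulas intended by the source, and the matrix values are hypothetical.

```python
import math

def entropy_weights(B):
    """Return (entropies, weights) for the columns of a normalized n x m matrix B.

    Assumed form (not confirmed by the source): standard entropy weight method.
    Requires n >= 2 rows.
    """
    n = len(B)
    entropies = []
    for column in zip(*B):
        total = sum(column)
        if total == 0:
            entropies.append(1.0)      # assumption: an all-zero column carries no information
            continue
        e = 0.0
        for b in column:
            if b > 0:                  # convention: 0 * ln 0 = 0
                p = b / total
                e -= p * math.log(p)
        entropies.append(min(1.0, e / math.log(n)))
    spread = [1.0 - e for e in entropies]
    total_spread = sum(spread)
    if total_spread == 0:              # every column is uninformative: equal weights
        return entropies, [1.0 / len(spread)] * len(spread)
    return entropies, [d / total_spread for d in spread]

# Hypothetical normalized matrix for three motifs and three attributes.
B = [[0.0, 0.2, 1.0],
     [1.0, 0.4, 1.0],
     [0.0, 0.4, 1.0]]
entropies, weights = entropy_weights(B)
print(entropies, weights)
```

Under this assumed form, an attribute whose values are concentrated on a few motifs has low entropy and receives a large weight, while an attribute that is nearly uniform across motifs receives a weight close to zero.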

3) Calculating the weight of each attribute according to the entropy value of each attribute obtained in the step 2):

Seventhly, ranking the importance of the motifs

The importance of each motif is calculated as follows:

the larger the s value obtained for a motif, the more important that motif is; finally, the motifs are sorted by s from largest to smallest, and the set Q is output.

On the basis of existing theoretical research, the method draws on knowledge of network motifs and extends, by analogy, the methods used to study the importance of network nodes to the mining of key motifs in a network. Starting from three attributes, namely node degree centrality, betweenness centrality and the position factor, the invention gives a mathematical description of motif importance, analyzes the structural characteristics of the network from the mesoscopic perspective, and provides a new perspective for further research and analysis of complex network dynamics. At the same time, the invention comprehensively considers several attributes that influence motif importance, avoids the shortcomings of analyzing the problem from a single angle, and improves the reliability of the results.

The method is reasonably designed: it evaluates motif importance comprehensively by combining motif theory with node importance mining, provides a way of identifying the importance of network motifs that integrates multiple topological attributes, and has good practical application value.

Drawings

FIG. 1 shows a flow diagram of key node cluster mining.

Fig. 2 shows an exemplary diagram of a small undirected network.

FIG. 3 is a diagram illustrating an example of a three-node undirected motif.

FIG. 4 shows an example graph of an ESU method enumeration subgraph.

Detailed Description

The following detailed description of specific embodiments of the invention refers to the accompanying drawings.

A method for mining key motifs of a complex network based on multi-attribute decision-making, as shown in fig. 1, includes the following steps:

Firstly, establishing an input network model: taking the small network in Fig. 2 as an example, a network model $G = (N, M)$ is formed, indicating that the network has N nodes and M edges; the adjacency matrix $A = (a_{ij})_{N \times N}$ of the network is obtained from its node and edge information: if there is a link from j to i, then $a_{ij} = 1$; otherwise $a_{ij} = 0$.

Secondly, searching the motif structures existing in the network

The number of nodes contained in the motif to be searched is determined; because motif searching is computationally expensive, three-node or four-node motifs are generally selected for analysis. A plurality of random networks with the same degree distribution as the target network are constructed, and the number of each kind of subgraph in the network is enumerated by the ESU method, which mainly comprises four steps: network construction, subgraph enumeration, subgraph isomorphism comparison and statistics of the subgraph results. Specifically, after the nodes are numbered, subgraphs are grown in the given graph according to the node numbers; in this process only neighbor nodes with a number larger than that of the starting node are recorded, so that no subgraph is repeated during the search, and the qualifying neighbors of each newly added node are also added to the candidate set for extension. Fig. 4 is a specific example of enumerating subgraphs with the ESU method. After the subgraph statistics are obtained, they are compared with those of the constructed random networks. Each subgraph is evaluated using a Z-score, i.e., a significance level, formulated as $Z = (N_{real} - \langle N_{rand} \rangle) / \sigma_{rand}$, wherein $N_{real}$ is the number of occurrences of the subgraph in the target network, $\langle N_{rand} \rangle$ is the average number of occurrences of the subgraph in the random networks, and the denominator $\sigma_{rand}$ is the standard deviation of the number of occurrences of the subgraph in the random networks having the same degree distribution as the target network. When Z > 0, the subgraph structure is considered a motif. The larger the Z value, the more important the motif; the Z-scores calculated for the two classes of motifs are shown in Table 1.
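The enumeration idea described above (grow subgraphs from a numbered starting node and only admit higher-numbered candidates) can be sketched as follows. This is an illustrative pure-Python rendering of the ESU scheme for an undirected graph; the function name `esu` and the example graph are assumptions, and isomorphism classification is reduced here to checking whether a three-node group is fully closed.

```python
def esu(adj, k):
    """Enumerate every connected k-node subgraph of an undirected graph exactly once.

    adj maps each node (a comparable number) to the set of its neighbors.
    """
    found = []

    def extend(sub, ext, root):
        if len(sub) == k:
            found.append(tuple(sorted(sub)))
            return
        ext = set(ext)
        while ext:
            w = ext.pop()
            # Nodes already in the subgraph or adjacent to it may not be re-added.
            blocked = set(sub)
            for s in sub:
                blocked |= adj[s]
            # Only neighbors of w numbered above the root enter the candidate set.
            new_ext = ext | {u for u in adj[w] if u > root and u not in blocked}
            extend(sub | {w}, new_ext, root)

    for v in adj:
        extend({v}, {u for u in adj[v] if u > v}, v)
    return sorted(found)

# Hypothetical graph: triangle 1-2-3 with node 4 attached to node 3.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
subgraphs = esu(adj, 3)
triangles = [g for g in subgraphs
             if all(b in adj[a] for a in g for b in g if a != b)]
print(subgraphs, triangles)
```

Counting how many enumerated groups fall into each subgraph class, in the target network and in each random network, gives the occurrence numbers that enter the Z-score.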

TABLE 1 Z-scores of the two classes of motifs

Thirdly, selecting the motif structure

The Z-scores of the three-node motifs are obtained by the method of step two, and the motif structure with the highest Z-score is selected as the target, namely the fully closed triangular motif structure. The set of three-node groups having this motif structure is then listed.
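Listing the node groups that form a fully closed triangle can be sketched directly, without a general subgraph enumerator, by looking for common neighbors of each edge. The graph below is hypothetical and is not the network of Fig. 2.

```python
def triangle_motifs(adj):
    """List every fully closed three-node group (u < v < w) of an undirected graph."""
    groups = []
    for u in sorted(adj):
        for v in sorted(adj[u]):
            if v <= u:
                continue
            # A common neighbor w of u and v closes the triangle.
            for w in sorted(adj[u] & adj[v]):
                if w > v:
                    groups.append((u, v, w))
    return groups

# Hypothetical graph: two triangles sharing the edge 2-3, plus a pendant node 5.
adj = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3, 5}, 5: {4}}
node_groups = triangle_motifs(adj)
print(node_groups)
```

Requiring u < v < w ensures that each triangle is reported once, which mirrors the ordering rule used to avoid repeated subgraphs during enumeration.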

Fourthly, selecting node importance indexes according to the topological structure of the complex network

The node degree centrality D(i), the betweenness centrality BC(i) and the improved K-shell K(i) are selected.

Degree centrality (D): considers the influence of the number of first-order neighbors of a node on the node.

Betweenness centrality (BC): a centrality measure of a network graph based on shortest paths; if a node is a necessary waypoint for communication between other pairs of nodes in the network, it occupies an important position in the network. The higher the betweenness centrality of a node, the more important the node.

Improved K-shell: the ordinary K-shell provides a coarse-grained division of node importance; by peeling nodes off layer by layer, nodes in the interior end up with greater centrality. Compared with the ordinary K-shell algorithm, the improved K-shell algorithm gives a finer division, so that nodes in the same layer have different importance.

Fifthly, obtaining a motif index matrix

1) Calculating the degree centrality (D), the betweenness centrality (BC) and the improved K-shell centrality (IKS) of each node in the network;

$D(i) = \frac{k_i}{N - 1}$

wherein $k_i$ is the degree of node i, N is the number of nodes in the network, and the denominator (N - 1) is the maximum possible degree of a node.

$BC(i) = \sum_{j \neq i \neq k} \frac{g_{jk}(i)}{g_{jk}}$

wherein $g_{jk}(i)$ is the number of shortest paths between nodes j and k that pass through node i, and $g_{jk}$ is the total number of shortest paths from node j to node k.

2) Calculating the degree centrality (D), the betweenness centrality (BC) and the improved K-shell centrality (IKS) of each motif;

According to the node group set obtained in step three and the three importance indexes of each node obtained in step 1), the three centralities are each averaged over the three nodes in each node group, and the averages are taken as the degree centrality value, the betweenness centrality value and the improved K-shell average value of each motif.

3) Obtaining the motif index matrix

Sixthly, determining the weight of each index by utilizing entropy

1) According to the motif index matrix obtained in step five, the data of each column are normalized by the range normalization method:

$b_{ij} = \frac{w_{ij} - \min_i w_{ij}}{\max_i w_{ij} - \min_i w_{ij}}$

wherein $w_{ij}$ is the j-th attribute of the i-th motif, $\min_i w_{ij}$ is the minimum value of the j-th attribute, and $\max_i w_{ij}$ is the maximum value of the j-th attribute; finally the normalized matrix $B_{n \times 3}$ is obtained.

2) Let the weights of the three attribute indexes be $\mu = (\mu_1, \mu_2, \mu_3)$; according to Shannon entropy theory, the entropy of each attribute can be calculated according to the following formula:

wherein $b_{ij}$ is the normalized value of the j-th attribute of the i-th motif, and $e_j$ denotes the entropy of the j-th attribute.

3) Calculating the weight of each attribute according to the entropy value of each attribute obtained in the step 2):

Seventhly, ranking the importance of the motifs

The importance of each motif is calculated as follows:

the larger the s value calculated for a motif, the more important that motif is. Table 2 shows the composite importance value of each motif. Finally, the motifs are sorted by s from largest to smallest, and the set Q is output.

TABLE 2 Composite importance values of the individual motifs

Motif (node numbers)	Composite importance value
(7, 8, 10)	20.0073
(10, 12, 13)	11.6936
(10, 12, 15)	14.9379
(10, 11, 12)	11.6936
(11, 12, 13)	2.6178
(10, 11, 13)	11.2309
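The final sorting step can be illustrated with the values listed in Table 2. The dictionary name `table2` is an assumption; keeping tied motifs in their listed order is a property of Python's stable sort, not something the source specifies.

```python
# Composite importance values as listed in Table 2, keyed by the motif's node numbers.
table2 = {
    (7, 8, 10): 20.0073,
    (10, 12, 13): 11.6936,
    (10, 12, 15): 14.9379,
    (10, 11, 12): 11.6936,
    (11, 12, 13): 2.6178,
    (10, 11, 13): 11.2309,
}

# Sort motifs from the largest to the smallest s value to obtain the output set Q.
Q = sorted(table2, key=table2.get, reverse=True)
for rank, motif in enumerate(Q, start=1):
    print(rank, motif, table2[motif])
```

With these values the motif (7, 8, 10) ranks first and (11, 12, 13) ranks last; (10, 12, 13) and (10, 11, 12) have equal values and therefore tie.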

Eighthly, in order to verify the effectiveness of the method, the result of the method is compared with the dispersion $D_F$ of the network after a third-order motif has been deleted. The deletion method is as follows: the nodes forming a motif are deleted from the original network, together with the edges incident to those nodes. The dispersion $D_F$ can be calculated from the following formula:

wherein $d_{ij}$ denotes the shortest path between nodes i and j, $D_F \in [0, 1]$, and the larger the value of $D_F$, the greater the dispersion of the network. Table 3 compares the network dispersion calculated after deleting each motif with the importance ranking obtained by the method. It can be seen that the two are consistent.

TABLE 3 comparison of network dispersion after deletion of motifs with the results of the method

Through the identified motifs, the invention can reveal the influence of important motifs on specific structures and functions of the network. It analyzes the structural characteristics of the network at the mesoscopic level, and its results can serve as an auxiliary tool for studying and analyzing the dynamics of complex systems, which is of significance for controlling complex systems.

Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the detailed description is made with reference to the embodiments of the present invention, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, which shall be covered by the claims of the present invention.
