Biological network clustering method and system based on high-order structure

Document No.: 70742. Publication date: 2021-10-01.

Reading note: This technology, "Biological network clustering method and system based on high-order structure", was designed by Hu Lun (胡伦), Zhang Jun (张俊), Zhou Xi (周喜), Jiang Tonghai (蒋同海) and Zhao Bowei (赵博伟) on 2021-07-03. Abstract: The invention relates to a biological network clustering method and system based on a high-order structure, comprising a network construction module, a model construction module, a network clustering module, a redundancy deletion module and a result display module. The rich high-order structure information in a biological network is used to identify its functional modules and, combined with the advantages of a high-order Markov random process, cluster analysis can be carried out for various types of network motifs. Clustering results based on high-order structure information offer a new approach to biological network analysis, such as the identification of overlapping protein complexes and the inference of new signaling pathways, and also reveal the rich organizational structure present in biological networks. The method acts directly on biological networks such as protein interaction networks and gene co-expression networks, achieves high accuracy, and is a reliable biological network clustering method and system.

1. A biological network clustering method based on a high-order structure is characterized by comprising the following steps:

a. in the context of bioinformatics, representing a biological network as a two-tuple of nodes and links, wherein the nodes represent individual biomolecules and the links describe the connections between them;

b. constructing a high-order network motif represented by a tensor, applying a random walk theory to the tensor of high-order structure information to form a transition probability tensor, and establishing a high-order Markov chain model;

c. for each motif in a set of network motifs, approximating the high-order Markov chain with a first-order Markov chain, clustering with the Markov clustering algorithm, and adding the clustering result to a set;

d. deleting the redundant parts of the clustering results obtained in step c, using the neighborhood affinity to verify whether clusters in the clustering results are redundant, to obtain the final result.

2. A biological network clustering system based on a high-order structure, the system comprising a network construction module, a model construction module, a network clustering module, a redundancy deletion module and a result display module, wherein:

a network construction module: constructing the biological network as a graph, represented by a two-tuple;

a model construction module: according to the graph from the network construction module, constructing a high-order network motif represented by a tensor, extending random walk theory to the tensor representing high-order structure information to form a transition probability tensor, and constructing a high-order Markov chain model;

a network clustering module: for each motif in a set of network motifs, deriving an equivalent first-order Markov chain from the stationary distribution of the spatial random walk on the high-order Markov chain of the model construction module, clustering with the Markov clustering algorithm, and putting the result into a set;

a redundancy deletion module: deleting the redundant parts of the set obtained by the network clustering module to obtain the final result;

a result display module: outputting and displaying the result obtained by the redundancy deletion module.

Technical Field

The invention relates to the technical field of computer data processing, in particular to a biological network clustering method and system based on a high-order structure.

Background

Clustering in biological networks means identifying functional modules that are meaningful from a biological perspective, which provides valuable insight into complex biological systems. Most clustering algorithms use only low-order connectivity patterns, at the level of individual biological entities and the links between them. Although the link is the basic unit of a network, considering only low-order connectivity patterns may not fully exploit the structural information available in a biological network, which limits further improvement of clustering accuracy. In short, existing clustering techniques use low-order connectivity patterns at the level of individual biomolecules and their connections, but few techniques account for high-order connectivity patterns at the level of small subnetworks, or motif structures.

Disclosure of Invention

In view of the above shortcomings, the invention aims to provide a biological network clustering method and system based on a high-order structure. The system comprises a network construction module, a model construction module, a network clustering module, a redundancy deletion module and a result display module. The rich high-order structure information in a biological network is used to identify its functional modules and, combined with the advantages of a high-order Markov random process, cluster analysis can be carried out for various types of network motifs. Clustering results based on high-order structure information offer a new approach to biological network analysis, such as the identification of overlapping protein complexes and the inference of new signaling pathways, and also reveal the rich organizational structure present in biological networks. The method acts directly on biological networks such as protein interaction networks and gene co-expression networks, achieves high accuracy, and is a reliable biological network clustering method and system.

The invention relates to a biological network clustering method based on a high-order structure, which comprises the following steps:

a. in the context of bioinformatics, representing a biological network as a two-tuple of nodes and links, wherein the nodes represent individual biomolecules and the links describe the connections between them;

b. constructing a high-order network motif represented by a tensor, applying a random walk theory to the tensor of high-order structure information to form a transition probability tensor, and establishing a high-order Markov chain model;

c. for each motif in a set of network motifs, approximating the corresponding high-order Markov chain with a first-order Markov chain, clustering with the Markov clustering algorithm, and adding the clustering result to a set;

d. deleting the redundant parts of the clustering results obtained in step c, using the neighborhood affinity to verify whether clusters in the clustering results are redundant, to obtain the final result.

A biological network clustering system based on a high-order structure, the system comprising a network construction module, a model construction module, a network clustering module, a redundancy deletion module and a result display module, wherein:

a network construction module: constructing the biological network as a graph, represented by a two-tuple;

a model construction module: according to the graph from the network construction module, constructing a high-order network motif represented by a tensor, extending random walk theory to the tensor representing high-order structure information to form a transition probability tensor, and constructing a high-order Markov chain model;

a network clustering module: for each motif in a set of network motifs, deriving an equivalent first-order Markov chain from the stationary distribution of the spatial random walk on the high-order Markov chain of the model construction module, clustering with the Markov clustering algorithm, and putting the result into a set;

a redundancy deletion module: deleting the redundant parts of the set obtained by the network clustering module to obtain the final result;

a result display module: outputting and displaying the result obtained by the redundancy deletion module.

In the biological network clustering method and system based on a high-order structure, the modules work as follows. The network construction module constructs the biological network as a graph. Based on that graph, the model construction module represents high-order network motifs by tensors, applies random walk theory to the tensor representing high-order structure information to form a transition probability tensor, and constructs the corresponding high-order Markov chain model. The network clustering module clusters the biomolecules in the network: it derives an equivalent first-order Markov chain from the stationary distribution of the spatial random walk on the high-order Markov chain, clusters with the Markov clustering algorithm, and stores the results in a set. The redundancy deletion module deletes the redundant parts of the results obtained by the network clustering module, and the result display module outputs and displays the final clustering result. The method acts directly on biological networks such as protein interaction networks and gene co-expression networks, achieves high accuracy, and is a reliable biological network clustering method and system.

Compared with the prior art, the invention has the following beneficial technical effects:

The biological network clustering method and system based on a high-order structure aim to cluster a biological network effectively. The high-order structure information available in the biological network is fully used, which further improves clustering accuracy. A clustering result is identified for each network motif and redundancy is then deleted in a post-processing step, which increases the probability of finding overlapping clusters. The method thereby addresses the shortcomings of the prior art in biological network cluster analysis.

Drawings

FIG. 1 is a logical block diagram of the present invention;

FIG. 2 is a diagram of three representative motifs in a biological network according to the present invention, wherein a is a triangular motif, b is a feedback motif, and c is a quadrilateral motif.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings and examples.

Examples

a. In the context of bioinformatics, representing a biological network as a two-tuple of nodes and links, wherein the nodes represent individual biomolecules and the links describe the connections between them;

b. constructing a high-order network motif represented by a tensor, applying a random walk theory to the tensor of high-order structure information to form a transition probability tensor, and establishing a high-order Markov chain model;

c. for each motif in a set of network motifs, approximating the corresponding high-order Markov chain with a first-order Markov chain, clustering with the Markov clustering algorithm, and adding the clustering result to a set;

d. deleting the redundant parts of the clustering results obtained in step c, using the neighborhood affinity to verify whether clusters in the clustering results are redundant, to obtain the final result;

a biological network clustering system based on a high-order structure, the system comprising a network construction module, a model construction module, a network clustering module, a redundancy deletion module and a result display module, wherein:

a network construction module: constructing the biological network as a graph, represented by a two-tuple;

a model construction module: according to the graph from the network construction module, constructing a high-order network motif represented by a tensor, extending random walk theory to the tensor representing high-order structure information to form a transition probability tensor, and constructing a high-order Markov chain model;

a network clustering module: for each motif in a set of network motifs, deriving an equivalent first-order Markov chain from the stationary distribution of the spatial random walk on the high-order Markov chain of the model construction module, clustering with the Markov clustering algorithm, and putting the result into a set;

a redundancy deletion module: deleting the redundant parts of the set obtained by the network clustering module to obtain the final result;

a result display module: outputting and displaying the result obtained by the redundancy deletion module;

as shown in fig. 1:

a network construction module:

abstracting the individual molecules in a biological network into nodes of a graph, and abstracting the connections between them into links of the graph, the graph being represented by a two-tuple $G = \{V, E\}$, where $V = \{v_i\}$ $(1 \le i \le n_V)$ is the set of all $n_V$ nodes and $E = \{e_i\}$ $(1 \le i \le n_E)$ is the set of all $n_E$ links;
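The construction of the two-tuple can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `build_network`, the string node labels and the choice to treat links as undirected and to drop self-loops are all assumptions made for the example.

```python
def build_network(interactions):
    """Build a two-tuple (V, E) from pairs of molecule names.

    Assumptions for this sketch: links are undirected, self-loops are
    dropped, and duplicate pairs are merged.
    """
    pairs = [(a, b) for a, b in interactions if a != b]
    # V: sorted list of distinct molecule names; the position is the node index.
    V = sorted({name for pair in pairs for name in pair})
    index = {name: i for i, name in enumerate(V)}
    # E: each link stored once as an ordered pair of node indices.
    E = sorted({tuple(sorted((index[a], index[b]))) for a, b in pairs})
    return V, E


# Hypothetical interaction list with one duplicate link (B, A).
V, E = build_network([("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("B", "A")])
print(V)  # node names
print(E)  # links as index pairs
```

Storing nodes as integer indices keeps the later tensor and matrix steps simple, while `V` preserves the mapping back to molecule names.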

a model construction module:

to describe high-order network motifs mathematically, the concept of a tensor is introduced: an underlined capital letter such as $\underline{T}$ denotes a tensor, and an underlined lowercase letter such as $\underline{t}$ denotes an element of a tensor. A triangular motif can be represented as a three-mode tensor $\underline{T} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$, where $n_1$, $n_2$ and $n_3$ are the numbers of elements along the different dimensions. According to the two-tuple $G$, the three-mode tensor $\underline{T}$ is defined as:

$$\underline{T} = \big(\underline{t}(i,j,k)\big) \qquad (1)$$

where $1 \le i, j, k \le n_V$ and

$$\underline{t}(i,j,k) = \begin{cases} 1, & \text{if } v_i,\ v_j \text{ and } v_k \text{ form a triangle in } G \\ 0, & \text{otherwise} \end{cases} \qquad (2)$$

When $\underline{t}(i,j,k) = 1$, the nodes $v_i$, $v_j$ and $v_k$ form a triangle. The definition extends to a high-order network motif of any structure simply by adjusting the definition of the tensor accordingly. FIG. 2 shows three representative motifs that are common in biological networks, wherein a is a triangular motif, b is a feedback motif, and c is a quadrilateral motif;
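As an illustration of the triangle tensor, the sketch below builds a dense 0/1 tensor from an undirected link list with NumPy. The function name `triangle_tensor` and the dense representation are assumptions for this example; a dense $n_V^3$ array is only practical for small networks, and a real implementation would likely use a sparse format.

```python
import numpy as np


def triangle_tensor(n, edges):
    """Return T with T[i, j, k] = 1 when nodes i, j, k form a triangle.

    `edges` is a list of undirected links given as index pairs (assumed
    to contain no self-loops).
    """
    A = np.zeros((n, n), dtype=int)
    for i, j in edges:
        A[i, j] = A[j, i] = 1
    # A[i, j] * A[j, k] * A[i, k] is 1 only if all three links exist.
    # The zero diagonal of A rules out repeated indices.
    return A[:, :, None] * A[None, :, :] * A[:, None, :]


# One triangle (0, 1, 2) plus a pendant link (2, 3).
T = triangle_tensor(4, [(0, 1), (0, 2), (1, 2), (2, 3)])
print(T[0, 1, 2], T[2, 3, 0])
```

A single triangle contributes six nonzero entries, one per ordering of its three nodes.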

secondly, random walk theory is applied to the tensor representing the high-order structure information in $G$ to obtain a transition probability tensor, denoted $\underline{P}$, whose element $\underline{p}(i,j,k)$ is the probability of moving to node $v_i$ given the current node $v_j$ and the previous node $v_k$, defined as:

$$\underline{p}(i,j,k) = \mathrm{Prob}(Z_{t+1} = v_i \mid Z_t = v_j,\ Z_{t-1} = v_k) \qquad (3)$$

where $Z_t$ denotes the node visited at time $t$. According to the two-tuple $G$, $\underline{p}(i,j,k)$ can be calculated by the following equation:

$$\underline{p}(i,j,k) = \frac{\underline{t}(i,j,k)}{\sum_{i'=1}^{n_V} \underline{t}(i',j,k)} \qquad (4)$$

By equation (4), $\underline{P}$ is column-stochastic and can therefore be viewed as the transition tensor of a high-order Markov chain over the states $Z_{t+1}$, $Z_t$ and $Z_{t-1}$: given the current state $Z_t = v_j$ and the previous state $Z_{t-1} = v_k$, the next state to visit is selected from the nodes that form a triangle with $v_j$ and $v_k$;
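The normalization can be sketched as below. It assumes, as the text suggests, that the next node is chosen uniformly among the nodes forming a triangle with $v_j$ and $v_k$, so each column $(j, k)$ of the motif tensor is divided by its sum. The function name `transition_tensor` is hypothetical, and columns with no triangle are simply left at zero here (an undefined transition).

```python
import itertools

import numpy as np


def transition_tensor(T):
    """Normalize T over its first index so each defined column sums to 1."""
    T = np.asarray(T, dtype=float)
    col_sums = T.sum(axis=0, keepdims=True)  # shape (1, n, n)
    # Columns (j, k) without any triangle keep all-zero probabilities.
    return np.divide(T, col_sums, out=np.zeros_like(T), where=col_sums > 0)


# Example motif tensor: two triangles, (0, 1, 2) and (1, 2, 3), sharing link (1, 2).
T = np.zeros((4, 4, 4))
for triangle in [(0, 1, 2), (1, 2, 3)]:
    for i, j, k in itertools.permutations(triangle):
        T[i, j, k] = 1.0

P = transition_tensor(T)
print(P[:, 1, 2])  # from current node 1 and previous node 2
```

From the pair (1, 2) the walk can close either triangle, so nodes 0 and 3 each receive probability 0.5, while the pair (0, 3) shares no triangle and its column stays zero.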

a network clustering module:

the network clustering module consists of two main parts. First, the high-order Markov chain is converted into an equivalent first-order Markov chain. According to the theory of the spatial random walk, when the process visits $Z_t$ at time $t$, it does not take into account its penultimate state $Z_{t-1}$; instead, a new state, denoted $Y_t$, is selected from the sequence of past states $H_t = \{Z_1, \ldots, Z_t\}$, with probability:

[Equation (5), giving $\mathrm{Prob}(Y_t = v_k \mid H_t)$, is missing from the source text]

where $\mathrm{Ind}\{\cdot\}$ is the indicator function: $\mathrm{Ind}\{Z_s = v_k\} = 1$ if $Z_s = v_k$, and $\mathrm{Ind}\{Z_s = v_k\} = 0$ otherwise. The transition of the process to $Z_{t+1}$ is therefore treated as depending on the two states $Z_t$ and $Y_t$. Formally, the transition probability of this stochastic process is defined as follows:

[Equation (6), giving $\mathrm{Prob}(Z_{t+1} = v_i \mid Z_t = v_j,\ Y_t = v_k)$, is missing from the source text]

where $\alpha$ is a constant and $u_i$ is the probability of a hidden state. When $(v_i, v_j) \in \Lambda(j,k)$, $\mathrm{Prob}(Z_{t+1} = v_i \mid Z_t = v_j,\ Y_t = v_k) = \mathrm{Prob}(Y_t = v_i \mid H_t)$; in all other cases $\mathrm{Prob}(Z_{t+1} = v_i \mid Z_t = v_j,\ Y_t = v_k) = \underline{p}(i,j,k)$. Note that $(v_i, v_j) \in \Lambda(j,k)$ denotes an undefined transition;

to approximate the high-order Markov chain defined by equation (3), an equivalent first-order Markov chain must be derived from the stationary distribution of the spatial random walk. Specifically, let $M$ and $x$ be the transition matrix and the corresponding stationary distribution of the first-order Markov chain; they satisfy:

$$M = \underline{P}[x] + x\,\big(e^{T} - e^{T}\underline{P}[x]\big) \qquad (7)$$

$$x = \alpha\,\underline{P}x^{2} + \alpha\,\big(1 - \lVert \underline{P}x^{2} \rVert_{1}\big)\,x + (1-\alpha)\,u \qquad (8)$$

where $e$ is the all-ones vector, $\underline{P}[x] = \sum_{k} \underline{P}(:,:,k)\,x_k$ and $\underline{P}x^{2} = \underline{P}[x]\,x$.

Stable values of $M$ and $x$ in equations (7) and (8) are therefore obtained with an iterative fixed-point algorithm, which determines the stochastic process of the first-order Markov chain;
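A fixed-point iteration for equations (7) and (8) might look like the sketch below. Several details are assumptions rather than facts from the source: $P[x]$ is read as the matrix $\sum_k P(:,:,k)\,x_k$ and $Px^2$ as $P[x]\,x$, the hidden-state distribution $u$ and the starting point are uniform, and the values of `alpha`, `max_iter` and `tol` are illustrative. The helper `triangle_transition_tensor` exists only to make the example self-contained.

```python
import itertools

import numpy as np


def triangle_transition_tensor(n, triangles):
    """Hypothetical helper: column-normalized triangle tensor for the demo."""
    T = np.zeros((n, n, n))
    for triangle in triangles:
        for i, j, k in itertools.permutations(triangle):
            T[i, j, k] = 1.0
    sums = T.sum(axis=0, keepdims=True)
    return np.divide(T, sums, out=np.zeros_like(T), where=sums > 0)


def first_order_chain(P, alpha=0.8, u=None, max_iter=1000, tol=1e-12):
    """Iterate equation (8) to a fixed point x, then form M by equation (7)."""
    n = P.shape[0]
    u = np.full(n, 1.0 / n) if u is None else np.asarray(u, dtype=float)
    x = np.full(n, 1.0 / n)  # uniform start (assumption)
    for _ in range(max_iter):
        Px = np.einsum("ijk,k->ij", P, x)  # P[x], an n-by-n matrix
        Px2 = Px @ x                       # P x^2, a vector
        # Entries are nonnegative, so the 1-norm is the plain sum.
        x_new = alpha * Px2 + alpha * (1.0 - Px2.sum()) * x + (1.0 - alpha) * u
        converged = np.abs(x_new - x).sum() < tol
        x = x_new
        if converged:
            break
    Px = np.einsum("ijk,k->ij", P, x)
    e = np.ones(n)
    # Equation (7): the second term returns the missing column mass via x.
    M = Px + np.outer(x, e - e @ Px)
    return M, x


P = triangle_transition_tensor(4, [(0, 1, 2), (1, 2, 3)])
M, x = first_order_chain(P)
print(np.round(x, 4))
print(np.round(M.sum(axis=0), 6))
```

Each update of equation (8) preserves the total probability of `x`, and the correction term in equation (7) makes every column of `M` sum to one, so `M` is a valid first-order transition matrix for the subsequent clustering step.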

then, a set of network motifs $\underline{T}_m$ is clustered: for each motif, the Markov clustering algorithm is applied and the result generated each time is put into a set $C$;

first, initializing a set $C$ for storing the results obtained by clustering each network motif;

computing $\underline{P}$ by equation (4), and randomly initializing $M$ and $x$;

setting the number of iterations $l$ and iterating $l$ times: with $x$ fixed, updating $M$ by equation (7), then updating $x$ by equation (8);

applying Markov clustering to $M$ to obtain the clusters $C_M$, and putting $C_M$ into the set $C$.
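The Markov clustering step applied to $M$ can be sketched with the standard expansion and inflation loop. The source does not give parameter values, so the `expansion`, `inflation`, iteration limit and tolerances below are assumptions, as is the function name `markov_clustering`. The input is assumed to be a nonnegative matrix with no all-zero column.

```python
import numpy as np


def markov_clustering(M, expansion=2, inflation=2.0, max_iter=100, tol=1e-8):
    """Cluster the nodes of a column-stochastic matrix with plain MCL."""
    M = np.asarray(M, dtype=float)
    M = M / M.sum(axis=0, keepdims=True)
    for _ in range(max_iter):
        previous = M
        M = np.linalg.matrix_power(M, expansion)  # expansion: spread flow
        M = M ** inflation                        # inflation: favor strong flow
        M = M / M.sum(axis=0, keepdims=True)      # renormalize the columns
        if np.abs(M - previous).max() < tol:
            break
    # Every row that still carries flow lists the members of one cluster.
    clusters = set()
    for row in M:
        members = frozenset(int(v) for v in np.flatnonzero(row > 1e-6))
        if members:
            clusters.add(members)
    return sorted(sorted(cluster) for cluster in clusters)


# Two disconnected groups with self-loops, column-normalized.
A = np.array([
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
], dtype=float)
clusters = markov_clustering(A / A.sum(axis=0, keepdims=True))
print(clusters)
```

In the method described above, one such call would be made per network motif and the resulting clusters appended to the shared set $C$.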

A redundancy deletion module:

because the set $C$ contains redundant clusters, the redundant parts need to be deleted;

sorting all the clusters in the set $C$ in descending order of their number of nodes;

letting $n_C$ be the size of the set $C$, traversing $i$ from 1 to $n_C - 1$, where $c_i$ is the $i$-th cluster in the set $C$;

if $c_i$ has not been deleted from the set $C$, setting $j = i + 1$ and traversing $j$ up to $n_C$, where $c_j$ is the $j$-th cluster in the set $C$;

calculating $NA(c_i, c_j)$ and, if the value is greater than or equal to the preset neighborhood affinity threshold $\rho$, deleting $c_j$ from the set $C$, where

$$NA(c_i, c_j) = \frac{|c_i \cap c_j|^{2}}{|c_i| \cdot |c_j|}$$

$|c_i \cap c_j|$ is the number of nodes shared by $c_i$ and $c_j$, and $|c_i|$ and $|c_j|$ are the numbers of nodes in $c_i$ and $c_j$ respectively;

after the traversal is finished, the clusters remaining in the set $C$ are the final result;
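The traversal above can be sketched as follows. The neighborhood affinity is taken here to be the commonly used definition, the squared overlap divided by the product of the two cluster sizes, which is an assumption because the formula itself is not legible in the source. The threshold value `rho=0.5` and the function names are likewise illustrative only.

```python
def neighborhood_affinity(a, b):
    """Squared overlap over the product of sizes (assumed definition)."""
    if not a or not b:
        return 0.0
    common = len(a & b)
    return common * common / (len(a) * len(b))


def remove_redundant(clusters, rho=0.5):
    """Keep larger clusters and drop later ones that overlap them too much."""
    # Sort by number of nodes, largest first.
    ordered = sorted((set(c) for c in clusters), key=len, reverse=True)
    deleted = [False] * len(ordered)
    for i in range(len(ordered) - 1):
        if deleted[i]:
            continue  # a deleted cluster does not eliminate others
        for j in range(i + 1, len(ordered)):
            if not deleted[j] and neighborhood_affinity(ordered[i], ordered[j]) >= rho:
                deleted[j] = True
    return [c for c, gone in zip(ordered, deleted) if not gone]


# Hypothetical clustering results collected from several motifs.
C = [{1, 2, 3, 4}, {1, 2, 3, 4, 5}, {6, 7, 8}, {1, 2, 3, 4, 5}]
final = remove_redundant(C, rho=0.5)
print(final)
```

Because clusters are visited from largest to smallest, a partially overlapping cluster survives whenever its affinity with every kept cluster stays below $\rho$, which is what allows overlapping modules to remain in the final result.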

a result display module:

displaying, in text form, the results obtained by the network clustering module and the redundancy deletion module, wherein each line represents one cluster and the elements of each line are individual molecules of the biological network.
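A text output of this kind could be produced as in the sketch below. The space separator, the sorting of members within a line and the function name `format_clusters` are assumptions; the source only states that each line is one cluster.

```python
def format_clusters(clusters, names):
    """Render one cluster per line, members given by molecule name."""
    lines = []
    for cluster in clusters:
        # Map node indices back to molecule names, in index order.
        lines.append(" ".join(names[i] for i in sorted(cluster)))
    return "\n".join(lines)


# Hypothetical molecule names and final clusters.
names = ["A", "B", "C", "D"]
text = format_clusters([{0, 2}, {1, 3}], names)
print(text)
```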

The foregoing shows and describes the general principles and main features of the invention. The invention is not limited to the embodiments above, which merely illustrate its principles; various changes and modifications may be made without departing from those principles, and such changes and modifications fall within the scope of the invention as claimed.
