Gene regulation and control network reconstruction method based on cross-platform causal network structure

Document No.: 1143089. Publication date: 2020-09-11.

Abstract: The invention, "Gene regulation network reconstruction method based on a cross-platform causal network structure", was created by 李弘� 张金喜曾晓南 on 2020-05-12. It discloses a gene regulatory network reconstruction method based on a cross-platform causal network structure, comprising: establishing a discrete platform node on top of a continuous causal network structure to obtain a cross-platform network structure skeleton; learning the skeleton with a learning algorithm and connecting each variable to the nodes in its directly connected variable set to obtain an undirected graph; identifying, within the undirected graph, the v-structures present in the skeleton to obtain a partially directed graph; and orienting as many of the remaining undirected edges as possible according to constraint rules to obtain a maximally oriented graph. The gene regulatory network is treated as a causal graph and the gene sequencing platform as a special node on that graph; during reconstruction the platform variable is added to the set of regulating variables of every gene's expression, which removes the differences introduced by different gene sequencing platforms.

1. A gene regulatory network reconstruction method based on a cross-platform causal network structure, characterized by comprising the following steps:

establishing a discrete platform node based on a continuous causal network structure to obtain a cross-platform network structure skeleton;

learning the cross-platform network structure skeleton with a learning algorithm, and connecting each variable to the nodes in its directly connected variable set, to obtain an undirected graph;

determining, in the undirected graph, the v-structures present in the cross-platform network structure skeleton to obtain a partially directed graph;

and orienting as many of the remaining undirected edges in the partially directed graph as possible according to constraint rules, to obtain a maximally oriented graph.

2. The method of claim 1, wherein learning the cross-platform network structure skeleton with a learning algorithm and connecting each variable to the nodes in its directly connected variable set to obtain an undirected graph specifically comprises:

according to the d-separation principle, when a variable node f_i in the parent-child node set PC(x) of the target node x is conditionally independent of x given a variable set S, determining that no edge directly connects f_i and x, and excluding f_i from PC(x).

3. The method of claim 2, wherein the variable nodes are determined as follows: by means of an algorithm with three stages, the variables in the variable set V = {v₁, v₂, …, v_n} are taken one by one as the target node until the parent-child node set PC(x) of every variable has been obtained.

4. The method for gene regulatory network reconstruction based on cross-platform causal network structure of claim 3, wherein said three stages comprise a growth stage, a pruning stage and a refining stage.

5. The method of claim 3, wherein the algorithm for determining the variable nodes is the Parents_and_Children algorithm.

6. The method of claim 1, further comprising: providing a mixed-type conditional independence test for checking conditional independence among cross-platform data, which specifically comprises:

testing, given a set of continuous variables as the condition set, the conditional independence between a continuous variable v_i and another continuous variable v_j;

testing, given a set of continuous variables as the condition set, the conditional independence between a continuous variable v_i and the platform variable p;

testing, given a set of continuous variables and p as the condition set, the conditional independence between a continuous variable v_i and another continuous variable v_j.

7. The method of claim 6, wherein testing, given a set of continuous variables as the condition set, the conditional independence between a continuous variable v_i and another continuous variable v_j comprises:

taking Z as the given set of condition variables, fitting by least squares the linear regression of v_i on Z and the linear regression of v_j on Z, and computing the residuals of each; computing the partial correlation coefficient ρ_ij·Z by the simple correlation coefficient method, and applying the Fisher z-transform; under the null hypothesis H₀: ρ_ij·Z = 0 and a significance level α, rejecting H₀ if the following inequality holds:

where Φ(·) is the cumulative distribution function of the standard normal distribution, N is the sample size, and |Z| is the number of given condition variables.

8. The method of claim 7, wherein the Fisher z-transform is performed according to the formula: z(ρ) = (1/2) ln((1 + ρ)/(1 − ρ)), where ρ is the partial correlation coefficient.

9. The method of claim 6, wherein testing, given a set of continuous variables and p as the condition set, the conditional independence between a continuous variable v_i and another continuous variable v_j comprises:

for two continuous variables v_i and v_j and a given condition set {v_K, p}, computing the partial correlation coefficient under each platform according to the platform variable of the samples, which gives L partial correlation coefficients for the L platforms; transforming the L partial correlation coefficients with the Fisher z-transform; and proposing the null hypothesis H₀ that ρ is zero overall, wherein if H₀ is accepted, v_i and v_j are considered conditionally independent given the condition set {v_K, p}, and at significance level α, H₀ is rejected if the following inequality holds:

where the threshold is given by the inverse of the cumulative distribution function of a normal distribution with mean 0 and mean square error L.

10. The method of claim 9, wherein transforming the L partial correlation coefficients yields z(i, j | k) = {z₁(i, j | k), z₂(i, j | k), …, z_L(i, j | k)}.

Technical Field

The invention relates to the field of gene regulation networks, in particular to a gene regulation network reconstruction method based on a cross-platform causal network structure.

Background

Since the post-genome era began in 2001, biological research has turned toward functional genomics. In terms of genome function, the expression of one gene may be regulated by one or more other genes or molecules. Finding regulatory relationships through traditional biological experiments is very expensive, so using computational techniques such as reverse engineering on large amounts of gene expression data to discover the regulatory relationships among genes has become a focus of gene regulatory network research. However, because sequencing platforms differ in technique and operating equipment, gene expression data from different platforms are not directly comparable. Gene expression data from a single sequencing platform suffer from a "high-dimensional, small-sample" imbalance, and to overcome it many recent studies have attempted to reconstruct gene regulatory networks from the gene expression data of multiple platforms.

One common approach integrates the data of multiple platforms and then reconstructs the network. It typically applies a stretching or compression rule, through some data transformation, to gene expression data that carry batch differences and cannot be compared directly, merging the cross-platform data into a single, directly comparable gene expression matrix. Another approach reconstructs the gene regulatory network of each platform separately and then integrates the per-platform results statistically. However, most existing network reconstruction methods are designed for gene expression data from a single platform, and because of the differences introduced by different sequencing platforms, the conditional independence tests used in causal network algorithms cannot handle discrete and continuous variables at the same time.

Disclosure of Invention

The invention provides a gene regulatory network reconstruction method based on a cross-platform causal network structure. The gene regulatory network is treated as a causal graph and the gene sequencing platform as a special node on that graph; during reconstruction of the cross-platform gene regulatory network, the platform variable is added to the set of regulating variables of every gene's expression, so as to remove the differences introduced by different gene sequencing platforms.

To solve the above technical problem, an embodiment of the present invention provides a gene regulatory network reconstruction method based on a cross-platform causal network structure, comprising:

establishing a discrete platform node based on a continuous causal network structure to obtain a cross-platform network structure skeleton;

learning the cross-platform network structure skeleton with a learning algorithm, and connecting each variable to the nodes in its directly connected variable set, to obtain an undirected graph;

determining, in the undirected graph, the v-structures present in the cross-platform network structure skeleton to obtain a partially directed graph;

and orienting as many of the remaining undirected edges in the partially directed graph as possible according to constraint rules, to obtain a maximally oriented graph.

Preferably, learning the cross-platform network structure skeleton with a learning algorithm and connecting each variable to the nodes in its directly connected variable set to obtain an undirected graph specifically comprises: according to the d-separation principle, when a variable node f_i in the parent-child node set PC(x) of the target node x is conditionally independent of x given a variable set S, determining that no edge directly connects f_i and x, and excluding f_i from PC(x).

Preferably, the variable nodes are determined as follows: by means of an algorithm with three stages, the variables in the variable set V = {v₁, v₂, …, v_n} are taken one by one as the target node until the parent-child node set PC(x) of every variable has been obtained.

Preferably, the three stages are a growth stage, a pruning stage, and a refining stage.

Preferably, the algorithm for determining the variable nodes is the Parents_and_Children algorithm.

Preferably, the method further comprises: providing a mixed-type conditional independence test for checking conditional independence among cross-platform data, which specifically comprises:

testing, given a set of continuous variables as the condition set, the conditional independence between a continuous variable v_i and another continuous variable v_j;

testing, given a set of continuous variables as the condition set, the conditional independence between a continuous variable v_i and the platform variable p;

testing, given a set of continuous variables and p as the condition set, the conditional independence between a continuous variable v_i and another continuous variable v_j.

Preferably, testing, given a set of continuous variables as the condition set, the conditional independence between a continuous variable v_i and another continuous variable v_j comprises:

taking Z as the given set of condition variables, fitting by least squares the linear regression of v_i on Z and the linear regression of v_j on Z, and computing the residuals of each; computing the partial correlation coefficient ρ_ij·Z by the simple correlation coefficient method, and applying the Fisher z-transform; under the null hypothesis H₀: ρ_ij·Z = 0 and a significance level α, rejecting H₀ if the following inequality holds:

where Φ(·) is the cumulative distribution function of the standard normal distribution, N is the sample size, and |Z| is the number of given condition variables.

Preferably, the Fisher z-transform is performed according to the formula: z(ρ) = (1/2) ln((1 + ρ)/(1 − ρ)), where ρ is the partial correlation coefficient.

Preferably, testing, given a set of continuous variables and p as the condition set, the conditional independence between a continuous variable v_i and another continuous variable v_j comprises:

for two continuous variables v_i and v_j and a given condition set {v_K, p}, computing the partial correlation coefficient under each platform according to the platform variable of the samples, which gives L partial correlation coefficients for the L platforms; transforming the L partial correlation coefficients with the Fisher z-transform; and proposing the null hypothesis H₀ that ρ is zero overall, wherein if H₀ is accepted, v_i and v_j are considered conditionally independent given the condition set {v_K, p}, and at significance level α, H₀ is rejected if the following inequality holds:

where the threshold is given by the inverse of the cumulative distribution function of a normal distribution with mean 0 and mean square error L.

Preferably, transforming the L partial correlation coefficients yields z(i, j | k) = {z₁(i, j | k), z₂(i, j | k), …, z_L(i, j | k)}.

Compared with the prior art, the embodiment of the invention has the following beneficial effects:

1. The gene regulatory network is treated as a causal graph and the gene sequencing platform as a special node on that graph; during reconstruction of the cross-platform gene regulatory network, the platform variable is added to the set of regulating variables of every gene's expression, which removes the differences introduced by different gene sequencing platforms.

2. The cross-platform causal structure learning method and the mixed type condition independence test can be realized.

Drawings

FIG. 1: the three basic connection structures among variables of a continuous causal network;

FIG. 2: schematic diagram of v₁ and v₂ being d-separated by Z in an embodiment of the invention;

FIG. 3: schematic diagram of a cross-platform causal network according to an embodiment of the invention;

FIG. 4: schematic diagram of the cross-platform causal network skeleton in an embodiment of the invention;

FIG. 5: the partially directed graph of an embodiment of the invention;

FIG. 6: the maximally oriented graph of an embodiment of the invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Referring to fig. 1 to 6, a preferred embodiment of the present invention provides a method for reconstructing a gene regulatory network based on a cross-platform causal network structure, including:

S1: establishing a discrete platform node based on the continuous causal network structure to obtain a cross-platform network structure skeleton.

Specifically, in a continuous causal network structure the data samples of all variable nodes follow continuous distributions. Given two variables v₁ and v₂ that have no direct causal relationship and a third variable v₃ acting as an intermediate variable, three basic connection structures are possible: the serial (chain) structure, the diverging (fork) structure, and the converging (collider) structure. If every path between v₁ and v₂ is blocked by a node set Z, then once the values of the variables in Z are fixed, a change in the value of v₁ (or v₂) cannot influence v₂ (or v₁); v₁ and v₂ are then said to be d-separated by Z, that is, v₁ and v₂ are conditionally independent given Z.

Specifically, a discrete platform node that influences all variables is introduced. An edge runs from the platform node to every other variable node; the platform variable is the cause and the variable node is the result variable. Directed edges between variables keep the meaning they have in a continuous causal network: an edge v_i → v_j states that v_i is the cause variable and v_j is the result variable.
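As a minimal sketch of the platform-node construction just described, the snippet below adds a node named `p` with an edge to every gene variable. The function name `add_platform_node` and the edge-set representation are illustrative assumptions, not part of the patent; the example edges are those described for FIG. 3.

```python
def add_platform_node(variables, gene_edges, platform="p"):
    """Return the directed edges of a cross-platform structure.

    gene_edges holds (cause, result) pairs among the gene variables; the
    platform node gets one outgoing edge to every variable.
    """
    if platform in variables:
        raise ValueError("platform name clashes with a variable name")
    edges = set(gene_edges)
    edges.update((platform, v) for v in variables)
    return edges


# Example mirroring FIG. 3: v1 and v2 jointly influence v3, and v3 influences v4.
variables = ["v1", "v2", "v3", "v4"]
gene_edges = {("v1", "v3"), ("v2", "v3"), ("v3", "v4")}
edges = add_platform_node(variables, gene_edges)
```

Representing the platform as an ordinary node keeps the later steps uniform: the skeleton search and the orientation rules can treat `p` like any other vertex, while the independence test handles its discrete type.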

S2: learning the cross-platform network structure skeleton with a learning algorithm, and connecting each variable to the nodes in its directly connected variable set, to obtain an undirected graph.

Specifically, the causal network skeleton learning method uses d-separation and conditional independence tests to find, for each variable, the set of variables directly connected to it, namely its parent-child node set, and then connects these nodes to obtain an undirected graph. According to the d-separation principle, if a variable node f_i in the parent-child node set PC(x) of the target node x is conditionally independent of x given a variable set S, then no edge directly connects f_i and x, and f_i should be excluded from PC(x).

Specifically, the parent and child nodes of a variable are found with the Parents_and_Children algorithm: the variables in the variable set V = {v₁, v₂, …, v_n} are taken one by one as the target node until the parent-child node set PC(x) of every variable has been obtained. The algorithm has three stages:

Growth stage: each variable v_i in the candidate node set C(x) is tested in turn for conditional independence with the target node x. If v_i and x are not conditionally independent given any subset S of the current PC(x), v_i is added to the parent-child node set PC(x) of x and deleted from the candidate node set C(x).

Pruning stage: with the newly added v_i as the condition set, any variable node v′ previously added to PC(x) that becomes conditionally independent of the target node x is removed from PC(x). The remaining variable nodes in the candidate node set C(x) are likewise tested given v_i; any v′ in C(x) that is conditionally independent of x given the newly added v_i is removed from C(x).

The growth and pruning stages are repeated until all variables in the candidate node set C(x) have been deleted or the number of variables in PC(x) reaches a set upper limit.

Refining stage: for each variable node v_j in PC(x), if there is a set S such that v_j is conditionally independent of the target node x given S, v_j is deleted from PC(x).
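The three stages can be sketched as follows. This is an illustrative reading of the description, not the patent's reference implementation: the callback `indep(a, b, S)`, the choice to drop a candidate that fails the growth test, the restriction of the refining search to subsets of the remaining PC(x) members, and the toy `chain_oracle` are all assumptions.

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of items as a set, smallest first."""
    items = list(items)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield set(combo)


def parents_and_children(x, variables, indep, max_pc=None):
    """indep(a, b, S) returns True when a and b are independent given set S."""
    candidates = [v for v in variables if v != x]
    pc = []
    while candidates and (max_pc is None or len(pc) < max_pc):
        v = candidates.pop(0)
        # Growth: keep v only if no subset of the current PC(x) separates it from x.
        if any(indep(v, x, s) for s in subsets(pc)):
            continue  # assumption: a separated candidate is simply dropped
        pc.append(v)
        # Pruning: condition on the newly added v alone.
        pc = [u for u in pc if u == v or not indep(u, x, {v})]
        candidates = [u for u in candidates if not indep(u, x, {v})]
    # Refining: re-check each member against subsets of the other members.
    for v in list(pc):
        rest = [u for u in pc if u != v]
        if any(indep(v, x, s) for s in subsets(rest)):
            pc.remove(v)
    return set(pc)


def skeleton(variables, indep):
    """Connect x and v when each lies in the other's parent-child set."""
    pcs = {x: parents_and_children(x, variables, indep) for x in variables}
    return {frozenset((x, v)) for x in variables for v in pcs[x] if x in pcs[v]}


def chain_oracle(u, w, s):
    # Toy ground truth (assumption): a - b - c, so a and c separate only given b.
    return {u, w} == {"a", "c"} and "b" in s


toy_edges = skeleton(["a", "b", "c"], chain_oracle)
```

In practice `indep` would be backed by a statistical test on data rather than an oracle, and the subset search would usually be capped in size because it grows exponentially with |PC(x)|.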

S3: determining, in the undirected graph, the v-structures present in the cross-platform network structure skeleton to obtain a partially directed graph. Specifically, a v-structure is a converging structure, and the direction of its edges can be determined by conditional independence tests.

S4: orienting as many of the remaining undirected edges in the partially directed graph as possible according to constraint rules, to obtain a maximally oriented graph. Specifically, following constraint rules such as creating no additional v-structures and creating no cycles, the remaining undirected edges are oriented one after another until no further edge can be oriented, which gives the maximally oriented causal network structure graph. Edges whose direction cannot be decided from these constraints remain in the network graph as undirected edges.

The technical solution of the present invention will be described in detail with reference to the following specific examples.

FIG. 1 shows the three basic connection structures among variables of a continuous causal network: the serial structure, the diverging structure, and the converging structure.

In a specific embodiment, the serial structure is shown in FIG. 1(a). If the value of v₃ is unknown, information obtained from v₁ changes the prediction of v₃, which in turn changes the prediction of v₂; information can therefore flow between v₁ and v₂, and the two are correlated. If v₃ is known, information obtained from v₁ no longer affects v₃ and therefore has no effect on v₂. v₁ and v₂ cannot communicate through v₃; the information channel is blocked. Thus v₁ and v₂ are independent of each other given v₃.

The diverging structure is shown in FIG. 1(b). When v₃ is unknown, information can pass between v₁ and v₂, so they are correlated; when v₃ is known, the channel between v₁ and v₂ is blocked, so v₁ and v₂ are independent of each other given v₃.

The converging structure is shown in FIG. 1(c). When v₃ is unknown, v₁ and v₂ are independent of each other; once v₃ is known, v₁ and v₂ become correlated.

FIG. 2 illustrates v₁ and v₂ being d-separated by Z, that is, v₁ and v₂ being conditionally independent given Z.

In a specific embodiment, let Z be a set of nodes, let nodes v₁ and v₂ not belong to Z, and let α be a path between v₁ and v₂. The path α between v₁ and v₂ is said to be d-separated by Z when either of the following conditions holds:

(1) α contains a serial node or a diverging node that belongs to Z, as shown in FIG. 2(a) and FIG. 2(b);

(2) α contains a converging node v₃, and Z contains neither v₃ nor any of its descendant nodes, as shown in FIG. 2(c).
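The two blocking conditions can be checked mechanically on a small directed acyclic graph. The sketch below enumerates every simple path between two nodes and applies conditions (1) and (2) to each intermediate node; the function names and the edge-set representation are assumptions made for illustration, and path enumeration is only practical for small graphs.

```python
def descendants(node, children):
    """All nodes reachable from node by following directed edges."""
    seen, stack = set(), [node]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def d_separated(edges, a, b, z):
    """edges: set of (parent, child) pairs of a DAG; z: the conditioning set."""
    z = set(z)
    children, neighbours = {}, {}
    for parent, child in edges:
        children.setdefault(parent, set()).add(child)
        neighbours.setdefault(parent, set()).add(child)
        neighbours.setdefault(child, set()).add(parent)

    def blocked(path):
        for prev, mid, nxt in zip(path, path[1:], path[2:]):
            converging = (prev, mid) in edges and (nxt, mid) in edges
            if converging:
                # Condition (2): neither the node nor any descendant is in Z.
                if not ({mid} | descendants(mid, children)) & z:
                    return True
            elif mid in z:
                # Condition (1): a serial or diverging node that is in Z.
                return True
        return False

    def paths(node, path):
        if node == b:
            yield path
            return
        for nxt in neighbours.get(node, ()):
            if nxt not in path:
                yield from paths(nxt, path + [nxt])

    return all(blocked(p) for p in paths(a, [a]))


serial = {("v1", "v3"), ("v3", "v2")}
converging = {("v1", "v3"), ("v2", "v3"), ("v3", "v4")}
```

With the `serial` graph, conditioning on `v3` separates `v1` from `v2`; with the `converging` graph the opposite holds, and conditioning on the descendant `v4` also opens the path.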

FIG. 3 shows a cross-platform causal network with 4 variable nodes, into which a discrete platform node that influences all variables has been introduced. Every variable node is influenced by the platform variable p; the variable v₃ is the result variable of the joint influence of v₁ and v₂, and is also the cause variable of v₄.

FIG. 4 shows the undirected graph corresponding to the causal network, found by learning the constructed cross-platform causal network skeleton.

In a specific embodiment, {v₁, v₃, v₅, v₆, p} is the parent-child node set of the variable node x, written PC(x) = {v₁, v₃, v₅, v₆, p}. Two variables v_i and v_j are directly connected when no subset S d-separates v_i and v_j; in that case v_i ∈ PC(v_j) and v_j ∈ PC(v_i).

FIG. 5 shows the partially directed graph obtained by determining the v-structures present in the network skeleton.

In a specific embodiment, given variable nodes v₁, v₂ and v₃, if there is a variable node set S such that v₁ and v₃ are conditionally independent given S but are not conditionally independent given {S, v₂}, then v₁, v₂ and v₃ are determined to form a v-structure, and the undirected edges v₁ - v₂ - v₃ among the three variables are marked as v₁ → v₂ ← v₃.
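The v-structure criterion above can be sketched as a search over unshielded triples of the skeleton. The function `find_v_structures`, the `indep` callback, the cap `max_cond` on the size of S, and the toy oracle are illustrative assumptions rather than the patent's implementation.

```python
from itertools import combinations


def find_v_structures(nodes, skeleton, indep, max_cond=2):
    """Return directed edges (tail, head) for every detected v-structure.

    skeleton: iterable of 2-element frozensets (undirected edges).
    indep(a, c, S): True when a and c are independent given set S.
    """
    adj = {n: set() for n in nodes}
    for edge in skeleton:
        u, w = tuple(edge)
        adj[u].add(w)
        adj[w].add(u)
    directed = set()
    for mid in nodes:
        for a, c in combinations(sorted(adj[mid]), 2):
            if c in adj[a]:
                continue  # a and c are adjacent, so the triple is shielded
            others = [n for n in nodes if n not in (a, mid, c)]
            for k in range(min(max_cond, len(others)) + 1):
                # Independent given S, dependent once mid joins the condition set.
                hit = any(
                    indep(a, c, set(s)) and not indep(a, c, set(s) | {mid})
                    for s in combinations(others, k)
                )
                if hit:
                    directed.update({(a, mid), (c, mid)})
                    break
    return directed


def converging_oracle(u, w, s):
    # Toy ground truth (assumption): v1 -> v2 <- v3.
    return {u, w} == {"v1", "v3"} and "v2" not in s


nodes = ["v1", "v2", "v3"]
skel = {frozenset(("v1", "v2")), frozenset(("v2", "v3"))}
arrows = find_v_structures(nodes, skel, converging_oracle)
```

For this toy input the two undirected edges are both oriented into `v2`; with an oracle describing a serial structure instead, no edge would be oriented.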

FIG. 6 shows the maximally oriented partially directed graph, obtained by orienting as many of the remaining undirected edges in the network graph as possible according to the constraint rules.

In a specific embodiment, given v₁ → x - v₅, the edge x - v₅ can be oriented as x → v₅ under the constraint that no additional v-structure is created, giving v₁ → x → v₅; v₃ - v₂ - v₄ remains in the causal network graph as undirected edges.
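The example above can be reproduced with a small propagation loop. The sketch applies two commonly used orientation rules that match the stated constraints (no new v-structure, no cycle); the function name and data layout are assumptions, and the patent may use further rules that are not spelled out in the text.

```python
def orient_remaining(directed, undirected):
    """Propagate orientations until no rule applies.

    directed: set of (tail, head) pairs; undirected: iterable of 2-element sets.
    Returns the updated (directed, undirected) pair.
    """
    directed = set(directed)
    undirected = {frozenset(e) for e in undirected}

    def adjacent(u, w):
        return (
            frozenset((u, w)) in undirected
            or (u, w) in directed
            or (w, u) in directed
        )

    changed = True
    while changed:
        changed = False
        for edge in list(undirected):
            u, w = tuple(edge)
            for b, c in ((u, w), (w, u)):
                # No new v-structure: a -> b - c with a, c non-adjacent gives b -> c.
                rule1 = any(
                    h == b and t != c and not adjacent(t, c) for t, h in directed
                )
                # No cycle: b -> m -> c together with b - c gives b -> c.
                rule2 = any(t == b and (h, c) in directed for t, h in directed)
                if rule1 or rule2:
                    undirected.discard(edge)
                    directed.add((b, c))
                    changed = True
                    break
    return directed, undirected


arrows, lines = orient_remaining(
    {("v1", "x")},
    [{"x", "v5"}, {"v3", "v2"}, {"v2", "v4"}],
)
```

Edges that neither rule can decide, such as `v3 - v2 - v4` here, stay in the returned undirected set.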

Constructing the cross-platform gene regulatory network with the cross-platform causal discovery algorithm avoids the harm done when over-smoothing during data preprocessing mistakenly removes part of the biological information in the gene expression data, and so yields a more generally applicable gene regulatory network. The method adds a special platform node to the general causal network model, uses the edges between the platform node and the variables to represent the platform's influence on each variable, and includes the platform variable in the condition set when learning causal relationships between variables, so as to remove the platform's differential influence on the variables. The cross-platform causal network structure learning algorithm consists of three main steps: first, learning the network skeleton to find the undirected graph corresponding to the causal network; second, determining the v-structures present in the network skeleton, which yields a partially directed graph; third, orienting as many of the remaining undirected edges in the network graph as possible according to the constraint rules, which yields the maximally oriented partially directed graph.

In another embodiment, the gene regulatory network reconstruction method based on a cross-platform causal network structure further comprises: S5, providing a mixed-type conditional independence test for checking conditional independence among cross-platform data, which specifically covers three cases:

First case: testing, given a set of continuous variables as the condition set, the conditional independence between a continuous variable v_i and another continuous variable v_j;

Second case: testing, given a set of continuous variables as the condition set, the conditional independence between a continuous variable v_i and the platform variable p;

Third case: testing, given a set of continuous variables and p as the condition set, the conditional independence between a continuous variable v_i and another continuous variable v_j.

Specifically, in the first case, with Z as the given set of condition variables, the linear regression of v_i on Z and the linear regression of v_j on Z are each fitted by least squares and their residuals are computed; the partial correlation coefficient ρ_ij·Z is then computed by the simple correlation coefficient method and the Fisher z-transform is applied: z(ρ) = (1/2) ln((1 + ρ)/(1 − ρ)), where ρ is the partial correlation coefficient.

Under the null hypothesis H₀: ρ_ij·Z = 0 and a significance level α, H₀ is rejected if the following inequality holds:

where Φ(·) is the cumulative distribution function of the standard normal distribution, N is the sample size, and |Z| is the number of given condition variables.
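The first case can be sketched in pure Python as below. The rejection inequality itself is not reproduced in this text, so the sketch assumes the form commonly used with partial correlations, rejecting H₀ when √(N − |Z| − 3)·|z| exceeds Φ⁻¹(1 − α/2); that choice, the function names, and the synthetic data are all assumptions.

```python
import math
from statistics import NormalDist


def _solve(a, b):
    """Solve the linear system a * x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def residuals(y, cond):
    """Residuals of the least-squares regression of y on cond plus an intercept."""
    cols = [[1.0] * len(y)] + [list(c) for c in cond]
    gram = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    rhs = [sum(p * q for p, q in zip(ci, y)) for ci in cols]
    beta = _solve(gram, rhs)
    return [yi - sum(b * c[i] for b, c in zip(beta, cols)) for i, yi in enumerate(y)]


def partial_corr(x, y, cond):
    """Simple correlation of the two residual vectors (both have zero mean)."""
    rx, ry = residuals(x, cond), residuals(y, cond)
    num = sum(p * q for p, q in zip(rx, ry))
    return num / math.sqrt(sum(p * p for p in rx) * sum(q * q for q in ry))


def independent(x, y, cond, alpha=0.05):
    """True when H0 (zero partial correlation) is not rejected at level alpha."""
    r = max(min(partial_corr(x, y, cond), 1 - 1e-12), -1 + 1e-12)
    z = 0.5 * math.log((1 + r) / (1 - r))  # Fisher z-transform
    stat = math.sqrt(len(x) - len(cond) - 3) * abs(z)  # assumed test statistic
    return stat <= NormalDist().inv_cdf(1 - alpha / 2)


# Synthetic diverging structure: zc drives both x and y; e1, e2 are orthogonal noise.
zc = [1, 1, 1, 1, -1, -1, -1, -1]
e1 = [1, 1, -1, -1, 1, 1, -1, -1]
e2 = [1, -1, 1, -1, 1, -1, 1, -1]
x = [3 * a + b for a, b in zip(zc, e1)]
y = [3 * a + b for a, b in zip(zc, e2)]
```

On this constructed data the marginal correlation of `x` and `y` is 0.9, while the partial correlation given `zc` is zero, so `independent(x, y, [zc])` holds and `independent(x, y, [])` does not. The solver makes no attempt to handle collinear condition variables or constant residuals.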

Specifically, in the second case the variable v_i and the platform variable p are assumed by default to be correlated, and are therefore not conditionally independent.

Specifically, in the third case, for two continuous variables v_i and v_j and a given condition set {v_K, p}, the partial correlation coefficient is computed under each platform according to the platform variable of the samples, which gives L partial correlation coefficients for the L platforms. The L partial correlation coefficients are transformed with the Fisher z-transform to obtain z(i, j | k) = {z₁(i, j | k), z₂(i, j | k), …, z_L(i, j | k)}.

The null hypothesis H₀ that ρ is zero overall is proposed. If H₀ is accepted, v_i and v_j are considered conditionally independent given the condition set {v_K, p}. At significance level α, H₀ is rejected if the following inequality holds:

where the threshold is given by the inverse of the cumulative distribution function of a normal distribution with mean 0 and mean square error L.
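One way to read the third case is sketched below. Because the inequality is not reproduced in this text, the pooled statistic is an assumption: each platform's transformed coefficient is scaled by √(N_l − |K| − 3), the scaled values are summed, and the sum is compared with a quantile of a normal distribution with mean 0 and variance L (interpreting "mean square error L" as variance). The function name and arguments are hypothetical.

```python
import math
from statistics import NormalDist


def pooled_independent(partial_corrs, sample_sizes, n_cond, alpha=0.05):
    """True when H0 (zero overall partial correlation) is not rejected.

    partial_corrs[l] and sample_sizes[l] describe platform l; n_cond is the
    number of continuous condition variables. The platform variable p is
    handled by computing one coefficient per platform.
    """
    n_platforms = len(partial_corrs)
    total = 0.0
    for r, n in zip(partial_corrs, sample_sizes):
        z = math.atanh(r)  # Fisher z-transform, equal to 0.5 * ln((1 + r) / (1 - r))
        total += math.sqrt(n - n_cond - 3) * z
    # Assumed reference distribution: N(0, L), i.e. standard deviation sqrt(L).
    threshold = NormalDist(0.0, math.sqrt(n_platforms)).inv_cdf(1 - alpha / 2)
    return abs(total) <= threshold


weak = pooled_independent([0.0, 0.0], [50, 60], n_cond=1)
strong = pooled_independent([0.6, 0.5], [50, 60], n_cond=1)
```

The per-platform coefficients would come from the first-case procedure applied to the samples of each platform separately. Summing signed values means that effects of opposite sign on different platforms can cancel, which is a property of this assumed statistic and not a claim about the patent's formula.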

To judge conditional independence among cross-platform data variables, the invention provides a mixed-type conditional independence test. It builds on the use of partial correlation coefficients for conditional independence testing, and includes the discrete platform variable in the condition set when judging conditional independence between variables.

The above-mentioned embodiments are provided to further explain the objects, technical solutions and advantages of the present invention in detail, and it should be understood that the above-mentioned embodiments are only examples of the present invention and are not intended to limit the scope of the present invention. It should be understood that any modifications, equivalents, improvements and the like, which come within the spirit and principle of the invention, may occur to those skilled in the art and are intended to be included within the scope of the invention.
