Protein interaction prediction method and system based on mixed membership degree random block model

Document No.: 70744  Published: 2021-10-01

Summary: This technology, "Protein interaction prediction method and system based on mixed membership degree random block model", was devised by Hu Lun (胡伦), Wang Xiaojuan (王小娟), Zhou Xi (周喜), Jiang Tonghai (蒋同海) and Su Xiaorui (苏小芮) on 2021-07-03. The invention discloses a protein interaction prediction method and system based on a mixed membership degree random block model. The system consists of a data processing module, a complex membership calculation module and a result generation module. The method acquires protein interaction data from a database and processes the acquired data; analyzes the protein interaction network by stochastic variational inference and calculates complex membership indices; and predicts protein interactions based on the calculated complex membership indices and displays the prediction results. By analyzing the protein interaction network with a mixed membership degree random block model and a complex identification method, the system predicts protein interactions, displays the prediction results and improves prediction accuracy.

1. A protein interaction prediction method based on a mixed membership degree random block model, characterized by comprising the following steps:

a. acquiring protein interaction data from a database and preprocessing the data to obtain a two-dimensional matrix of 0s and 1s representing the known interaction network between proteins, wherein nodes represent proteins and edges between nodes represent interactions between proteins; if two proteins interact, the corresponding entry is 1; otherwise, it is 0;

b. analyzing the protein interaction network by stochastic variational inference, and calculating the membership indices of the protein complexes, wherein:

assigning to each protein a complex membership vector $\pi_i$ obeying a Dirichlet distribution, $\pi_i \sim \mathrm{Dirichlet}(\alpha)$, the specific formula is:

assigning to each complex in the network a weight $\beta_k$ obeying a Beta distribution, $\beta_k \sim \mathrm{Beta}(\eta)$, the specific formula is:

updating the obtained parameters by stochastic variational inference;

finally obtaining two important complex-related indices: the complex membership vector $\pi$ and the complex weight vector $\beta$;

c. predicting protein interactions based on the calculated complex membership indices and displaying the prediction results, wherein the probability that two proteins interact is calculated from the complex memberships of the two proteins and the complex weights, and the prediction result is displayed.

2. The method of claim 1, wherein the probability of interaction between two proteins in step c is calculated as follows: for each pair of proteins, the likelihood of interaction is calculated from their complex membership vectors $\pi_i$ and $\pi_j$ and the complex weights $\beta_k$, the specific formula is:

the likelihood of interaction between the two proteins is calculated by formula (4), and the interaction probability is then obtained by normalization.

3. The method of claim 1, wherein the protein interaction prediction result displayed in step c is:

sorting unknown interacting protein pairs from high to low according to the probability of interaction;

when the probability that two proteins interact is greater than 0.5, an interaction is considered to exist between them; otherwise, no interaction is considered to exist.

4. A protein interaction prediction system based on a mixed membership degree random block model, characterized by comprising a data processing module (101), a complex membership calculation module (102) and a result generation module (103), wherein:

the data processing module (101) acquires protein interaction data from a database and processes the acquired data;

the complex membership calculation module (102) analyzes the protein interaction network by stochastic variational inference and calculates the complex membership indices;

the result generation module (103) predicts protein interactions according to the calculated complex membership indices and displays the prediction results.

Technical Field

The invention relates to the technical field of computer data processing, in particular to a protein interaction prediction method and system based on a mixed membership degree random block model.

Background

Research on protein-protein interactions is of great significance for understanding the mechanisms and principles of the biochemical reactions and life activities in organisms. With the rapid development of computer technology, protein interaction networks have grown steadily; they now cover a large amount of interaction information, form complex network structures, and attract more and more network-based protein interaction prediction research. At present, most network-based methods for predicting protein interactions rely on the topological similarity between proteins in the protein interaction network. The main approach, known as the common neighbor algorithm, judges whether two proteins interact according to the number of neighbors they share; its core idea is that two proteins with enough common interaction neighbors are more likely to interact.
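For illustration, the common neighbor baseline described here can be sketched in a few lines. This is a minimal sketch rather than any particular published implementation; the function name `common_neighbor_scores` and the toy network are hypothetical.

```python
import numpy as np

def common_neighbor_scores(adj: np.ndarray) -> np.ndarray:
    """Return a matrix whose (i, j) entry counts the neighbors shared by i and j."""
    a = (adj > 0).astype(int)      # astype returns a copy, so the input is untouched
    np.fill_diagonal(a, 0)         # ignore self-interactions
    scores = a @ a                 # (a @ a)[i, j] = number of k with edges i-k and k-j
    np.fill_diagonal(scores, 0)    # a protein is not paired with itself
    return scores

# Hypothetical toy network with edges 0-1, 0-2, 1-2 and 2-3.
adj = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
])
scores = common_neighbor_scores(adj)
print(scores)
```

Because the score depends only on the immediate neighborhoods of the two proteins, it uses local information only.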

However, in practical applications, existing network-based protein interaction prediction methods perform poorly, mainly because they consider only the local information of proteins in the network and cannot fully mine the linkage patterns of the whole network. Moreover, recent studies have pointed out that if two proteins share a sufficient number of interacting neighbors, this shows only that their interaction sites are highly similar, not that they interact with each other. Rather, two given proteins interact only if one of them is similar to the other's interaction partners. Because the existing methods focus on local information of the protein interaction network and ignore its global information, their prediction accuracy cannot meet the requirements of practical applications.

Disclosure of Invention

In view of the defects in the prior art, the invention aims to provide a protein interaction prediction method and system based on a mixed membership degree random block model. The method acquires protein interaction data from a database and processes the acquired data; analyzes the protein interaction network by stochastic variational inference and calculates complex membership indices; and predicts protein interactions based on the calculated complex membership indices and displays the prediction results. The system consists of a data processing module, a complex membership calculation module and a result generation module; it analyzes the protein interaction network through a mixed membership degree random block model and a complex identification method, predicts protein interactions, displays the prediction results and improves prediction accuracy.

The invention relates to a protein interaction prediction method based on a mixed membership degree random block model, which comprises the following steps:

a. acquiring protein interaction data from a database and preprocessing the data to obtain a two-dimensional matrix of 0s and 1s representing the known interaction network between proteins, wherein nodes represent proteins and edges between nodes represent interactions between proteins; if two proteins interact, the corresponding entry is 1; otherwise, it is 0;

b. analyzing the protein interaction network by stochastic variational inference, and calculating the membership indices of the protein complexes, wherein:

assigning to each protein a complex membership vector $\pi_i$ obeying a Dirichlet distribution, $\pi_i \sim \mathrm{Dirichlet}(\alpha)$, the specific formula is:

assigning to each complex in the network a weight $\beta_k$ obeying a Beta distribution, $\beta_k \sim \mathrm{Beta}(\eta)$, the specific formula is:

updating the obtained parameters by stochastic variational inference;

finally obtaining two important complex-related indices: the complex membership vector $\pi$ and the complex weight vector $\beta$;

c. predicting protein interactions based on the calculated complex membership indices and displaying the prediction results, wherein the probability that two proteins interact is calculated from the complex memberships of the two proteins and the complex weights, and the prediction result is displayed.

In step c, the probability of interaction between two proteins is calculated as follows: for each pair of proteins, the likelihood of interaction is calculated from their complex membership vectors $\pi_i$ and $\pi_j$ and the complex weights $\beta_k$, the specific formula is:

the likelihood of interaction between the two proteins is calculated by formula (4), and the interaction probability is then obtained by normalization.

The predicted result of protein interaction displayed in step c is:

sorting unknown interacting protein pairs from high to low according to the probability of interaction;

when the probability that two proteins interact is greater than 0.5, an interaction is considered to exist between them; otherwise, no interaction is considered to exist.

A protein interaction prediction system based on a mixed membership degree random block model is composed of a data processing module (101), a complex membership calculation module (102) and a result generation module (103), wherein:

the data processing module (101) acquires protein interaction data from a database and processes the acquired data;

the complex membership calculation module (102) analyzes the protein interaction network by stochastic variational inference and calculates the complex membership indices;

the result generation module (103) predicts protein interactions according to the calculated complex membership indices and displays the prediction results.

Compared with the prior art, the invention has the following beneficial technical effects:

The protein interaction prediction method and system based on a mixed membership degree random block model first obtain protein interaction data from a database and preprocess the data to form a protein interaction network; then analyze the network by stochastic variational inference and calculate the complex membership indices; and finally predict protein interactions based on the calculated complex membership indices and display the prediction results, improving prediction accuracy. The method avoids prediction based on local network information alone: it uses a complex identification method to analyze interactions across the whole protein network, and predicts from the analysis result whether two proteins interact. Compared with existing network-based prediction algorithms, the prediction accuracy is significantly improved.

The invention also discloses a system for implementing the protein interaction prediction, which mainly comprises three parts: a data processing module, a complex membership calculation module and a result generation module. First, protein interaction data are acquired from a database and processed. Then, stochastic variational inference is applied to calculate the complex membership indices. Finally, the result generation module makes predictions according to the calculated complex membership indices and displays the prediction results.

Drawings

FIG. 1 is a logical block diagram of the present invention;

FIG. 2 is a diagram illustrating data processing according to the present invention, wherein A is a diagram illustrating raw data of protein interaction; and B is a schematic diagram of the protein interaction data after being processed.

Detailed Description

The present invention will now be described in further detail with reference to specific examples, which are intended to be illustrative, but not limiting, of the invention.

Examples

The invention relates to a protein interaction prediction method based on a mixed membership degree random block model, which comprises the following steps:

a. protein interaction data are obtained from a database and preprocessed to produce a two-dimensional matrix of 0's and 1's to represent the known interaction network between proteins. In the network, nodes represent proteins, edges between the nodes represent interaction relationships between the proteins, and if there is an interaction between two proteins, the edge is 1; otherwise, it is 0;

b. analyzing the protein interaction network by stochastic variational inference, and calculating the membership indices of the protein complexes, wherein:

assigning to each protein a complex membership vector $\pi_i$ obeying a Dirichlet distribution, $\pi_i \sim \mathrm{Dirichlet}(\alpha)$, the specific formula is:

assigning to each complex in the network a weight $\beta_k$ obeying a Beta distribution, $\beta_k \sim \mathrm{Beta}(\eta)$, the specific formula is:

updating the obtained parameters by stochastic variational inference;

finally obtaining two important complex-related indices: the complex membership vector $\pi$ and the complex weight vector $\beta$;

c. predicting protein interactions based on the calculated complex membership indices and displaying the prediction results, wherein:

calculating the likelihood of interaction between the two proteins by formula (4), and then obtaining the interaction probability by normalization;

the predicted result of protein interaction displayed in step c is:

sorting unknown interacting protein pairs from high to low according to the probability of interaction;

when the probability that two proteins interact is greater than 0.5, an interaction is considered to exist between them; otherwise, no interaction is considered to exist;

a protein interaction prediction system based on a mixed membership degree random block model is composed of a data processing module, a complex membership calculation module and a result generation module, wherein:

the data processing module 101 acquires protein interaction data from a database and processes the acquired data;

the complex membership calculation module 102 analyzes the protein interaction network by stochastic variational inference and calculates the complex membership indices;

the result generation module 103 predicts protein interactions according to the calculated complex membership indices and displays the prediction results;

as shown in FIG. 1, the system mainly comprises three parts: 101 is the data processing module, 102 is the complex membership calculation module and 103 is the result generation module;

the protein interaction prediction method based on the mixed membership degree random block model comprises the following steps:

acquiring protein interaction data from a database, and preprocessing the acquired data to obtain a protein interaction network;

analyzing the protein interaction network by stochastic variational inference, and calculating the complex membership indices;

predicting protein interactions based on the calculated complex membership indices, and displaying the prediction results;

the following specifically exemplifies the operation of each module:

the data processing module 101:

collecting data from the major databases to obtain protein interaction data, such as proteins A, B, C and D in panel A of FIG. 2 (raw protein interaction data);

processing the collected raw data so that the relationship between proteins is represented by 0 and 1: if two proteins interact, the edge between them is set to 1; otherwise, it is set to 0, as shown in panel B of FIG. 2 (processed protein interaction data); an adjacency matrix is then constructed;
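The preprocessing performed by the data processing module can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `build_adjacency` is hypothetical, the input is assumed to be a list of protein-name pairs already extracted from a database, self-interactions are skipped by choice, and the specific pairs among A, B, C and D are invented for illustration.

```python
import numpy as np

def build_adjacency(pairs):
    """Turn a list of interacting protein pairs into a symmetric 0/1 adjacency matrix."""
    proteins = sorted({name for pair in pairs for name in pair})
    index = {name: i for i, name in enumerate(proteins)}
    adj = np.zeros((len(proteins), len(proteins)), dtype=np.int8)
    for a, b in pairs:
        if a == b:
            continue  # assumption: self-interactions are ignored
        i, j = index[a], index[b]
        adj[i, j] = adj[j, i] = 1  # interactions are undirected
    return proteins, adj

# Hypothetical raw records; the duplicate ("B", "A") collapses onto the same edge.
pairs = [("A", "B"), ("A", "C"), ("B", "D"), ("B", "A")]
proteins, adj = build_adjacency(pairs)
print(proteins)
print(adj)
```

Pairs absent from the input stay at 0, which matches the convention in the text that 0 marks pairs with no known interaction.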

the complex membership calculation module 102:

analyzing the protein interaction network by stochastic variational inference and calculating the complex membership indices, in preparation for predicting interactions;

the specific calculation method of the complex membership indices comprises the following steps:

assigning to each protein a complex membership vector $\pi_i$ obeying a Dirichlet distribution, $\pi_i \sim \mathrm{Dirichlet}(\alpha)$, the specific formula is as follows:

assigning to each complex in the protein interaction network a weight $\beta_k$ obeying a Beta distribution, $\beta_k \sim \mathrm{Beta}(\eta)$, the specific formula is as follows:

updating the parameters by stochastic variational inference;

finally obtaining two important complex-related indices: the complex membership vector $\pi$ and the corresponding complex weight vector $\beta$;
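The formulas for the two priors and the update are not reproduced in this text. As a hedged reference, the standard densities implied by the stated distributions, and the generic stochastic variational inference step, are written out below. The model named here is usually called the mixed-membership stochastic block model in the literature. Several symbols are assumptions that do not appear in the source: $K$ (the number of complexes), the reading of $\eta$ as a pair of shape parameters $(\eta_0, \eta_1)$, the global variational parameter $\lambda$ with its noisy estimate $\hat{\lambda}^{(t)}$, and the step-size constants $\tau_0$ and $\kappa$. The patent's own formulas may differ in form or parameterization.

```latex
\begin{align*}
% Dirichlet prior on the K-dimensional complex membership vector of protein i
p(\pi_i \mid \alpha) &= \frac{\Gamma\!\left(\sum_{k=1}^{K} \alpha_k\right)}{\prod_{k=1}^{K} \Gamma(\alpha_k)}
                        \prod_{k=1}^{K} \pi_{ik}^{\alpha_k - 1} \\
% Beta prior on the weight of complex k, assuming eta = (eta_0, eta_1)
p(\beta_k \mid \eta) &= \frac{\Gamma(\eta_0 + \eta_1)}{\Gamma(\eta_0)\,\Gamma(\eta_1)}
                        \beta_k^{\eta_0 - 1} (1 - \beta_k)^{\eta_1 - 1} \\
% Generic SVI step: blend the previous estimate with a noisy estimate from a subsample
\lambda^{(t)} &= (1 - \rho_t)\,\lambda^{(t-1)} + \rho_t\,\hat{\lambda}^{(t)},
\qquad \rho_t = (\tau_0 + t)^{-\kappa}, \quad \kappa \in (0.5, 1]
\end{align*}
```

The decaying step size $\rho_t$ is what lets the update work from subsamples of protein pairs instead of the full network at every iteration.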

The result generation module 103:

the result generation module performs a series of calculations based on the complex membership indices, calculates the probability that a pair of proteins interact, makes the interaction prediction and displays the prediction results; the specific operation is as follows:

calculating the probability that two proteins interact from the complex memberships of the two proteins and the complex weights;

the specific probability calculation method is as follows:

for each pair of proteins, the likelihood of interaction is calculated from their complex membership vectors $\pi_i$ and $\pi_j$ and the complex weights $\beta_k$, the specific formula is as follows:

the interaction probability is then obtained from the interaction likelihood by normalization, the specific formula is as follows:

displaying the prediction results:

protein pairs with unknown interaction are ranked from high to low by interaction probability; when the probability that two proteins interact is greater than 0.5, an interaction is considered to exist between them; otherwise, no interaction is considered to exist.
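The scoring, normalization, ranking and thresholding steps can be sketched as below. Since formula (4) and the normalization formula are not reproduced in this text, two assumptions are made, and neither should be read as the patent's own method: the likelihood is taken as $s_{ij} = \sum_k \pi_{ik}\,\pi_{jk}\,\beta_k$, and the normalization is taken as min-max scaling over the unknown pairs. The function names and the toy values of `pi` and `beta` are hypothetical.

```python
import numpy as np

def interaction_likelihood(pi, beta):
    """Assumed likelihood: s_ij = sum_k pi_ik * pi_jk * beta_k."""
    return (pi * beta) @ pi.T

def rank_unknown_pairs(pi, beta, adj, threshold=0.5):
    """Rank protein pairs with no known interaction by normalized likelihood."""
    rows, cols = np.triu_indices(adj.shape[0], k=1)   # each unordered pair once
    unknown = adj[rows, cols] == 0                    # keep pairs with no known edge
    rows, cols = rows[unknown], cols[unknown]
    if rows.size == 0:
        return []
    raw = interaction_likelihood(pi, beta)[rows, cols]
    span = raw.max() - raw.min()
    # Assumption: min-max scaling; the source only says "normalization".
    prob = (raw - raw.min()) / span if span > 0 else np.zeros_like(raw)
    order = np.argsort(-prob, kind="stable")          # high to low
    return [(int(rows[o]), int(cols[o]), float(prob[o]), bool(prob[o] > threshold))
            for o in order]

# Toy values: proteins 0 and 1 lean toward complex 0, proteins 2 and 3 toward complex 1.
pi = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
beta = np.array([0.9, 0.8])
adj = np.zeros((4, 4), dtype=int)
adj[0, 1] = adj[1, 0] = 1                             # the only known interaction
ranked = rank_unknown_pairs(pi, beta, adj)
for i, j, p, predicted in ranked:
    print(i, j, round(p, 3), predicted)
```

In this toy case the pair (2, 3) ranks first because both proteins place most of their membership in the same complex. Under min-max scaling the lowest-scoring pair always maps to 0, so a different normalization would change which pairs cross the 0.5 threshold.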

The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
