Global alignment method for two biological networks based on a discretized bat algorithm

Document No.: 1289184 Publication date: 2020-08-28

This technology, "Global alignment method for two biological networks based on a discretized bat algorithm", was designed by 陈璟 (Chen Jing) and 夏金芳 (Xia Jinfang) on 2020-05-25. Abstract: the invention discloses a global alignment method for two biological networks based on a discretized bat algorithm, comprising: discretizing the bat algorithm and applying it to network alignment within the framework of an objective-function-based search method; fusing prior knowledge to initialize the population; and retaining the alignment results of conserved nodes. The discretization of the bat algorithm is embodied in two aspects, velocity and solution: the velocity takes the value 0 or 1 to indicate whether the current solution needs to be perturbed, and each bat individual is encoded as an arrangement of node numbers that represents its solution. Beneficial effect of the invention: the resulting biological network alignment method performs well on both biological and topological metrics.

1. A global alignment method for two biological networks based on a discretized bat algorithm, characterized by comprising the following steps: discretizing the bat algorithm and applying it to network alignment within the framework of an objective-function-based search method; fusing prior knowledge to initialize a population; and retaining the alignment results of conserved nodes.

2. The global alignment method according to claim 1, wherein the network alignment is divided into the following four steps:

(1) input: inputting the two networks to be aligned and the sequence similarity between them;

(2) individual encoding and initialization;

(3) individual and population iteration;

(4) output of the alignment result: after all solutions in the population have been updated in each iteration, the best solution in the population is identified; if the best solution remains unchanged for N consecutive iterations, it is output as the final alignment result.

3. The global alignment method according to claim 2, wherein the individual encoding and initialization specifically comprises: (a) each bat individual represents one alignment; the nodes of the two networks are each numbered from 1; an alignment is an arrangement whose length equals the number of nodes in the smaller network, in which each number is the number of a node in the second network and its position is the number of the matched node in the first network;

(b) the bat population is initialized with a greedy algorithm that aligns nodes according to their similarity.

4. The global alignment method according to claim 2, wherein the individual and population iteration specifically comprises: in the individual iteration, a new solution is first generated by perturbation; then, if a random number between 0 and 1 is greater than the pulse frequency r of the current individual, a local perturbation is applied before deciding whether to update; otherwise, no local perturbation is applied and the update decision is made directly; the update is performed if a random number between 0 and 1 is greater than the loudness A of the current individual and the objective function of the perturbed solution has improved, and otherwise the next individual is perturbed; in the population iteration, after each complete pass over the population, the best solution in the population is identified and the global best solution is updated.

5. The global alignment method according to claim 4, wherein the individual iteration involves the following processes:

(a) generating a new solution by perturbation: the new solution is generated according to the velocity; if a node's velocity is 0, its current match is kept; if it is 1, the node is randomly matched to one of the nodes not yet aligned;

(b) local exchange perturbation: the local perturbation performs exchange operations according to the velocity; all nodes of an individual with velocity 1 form an exchange set, and the matches of the nodes in the exchange set are randomly exchanged;

(c) updating: the updated quantities are the velocity, the solution, the loudness and the frequency; the new velocity is generated while the objective function is computed, being 0 for a conserved node and 1 otherwise.

6. The global alignment method according to claim 1, wherein N is 10.

7. The global alignment method according to claim 1, wherein the discretization of the bat algorithm is embodied in two aspects, velocity and solution: the velocity takes the value 0 or 1 to indicate whether the current solution needs to be perturbed, and the bat individuals are encoded as arrangements that represent their solutions.

8. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of any of claims 1 to 7 are implemented when the program is executed by the processor.

9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.

10. A processor, characterized in that the processor is configured to run a program, wherein the program when running performs the method of any of claims 1 to 7.

Technical Field

The invention relates to the field of global biological network alignment, and in particular to a global alignment method for two biological networks based on a discretized bat algorithm.

Background

Network alignment is a research direction in complex networks and applies to common real-world networks such as traffic networks, social networks and biological networks. Biological network alignment is a common method for studying interactions between biomolecules and an important means of analyzing functional differences between species: aligning biological networks can reveal functional differences between species, knowledge transfer between species, phylogenetic relationships, and so on.

Current network alignment methods fall into two categories: two-step methods and objective-function-based search methods.

As the name suggests, a two-step method proceeds in two steps: first, the similarity between node pairs across the networks is computed; second, the alignment is completed under the guidance of these similarities.

An objective-function-based search method searches around an objective function to generate better new solutions, and the result that optimizes the objective function is output as the alignment. In this type of method an evaluation metric can be used directly as the objective function, and the search then keeps adjusting the alignment in the hope of finding a better result.

Comparing the two categories, two-step methods were studied earlier and are relatively mature, but their results can only be evaluated against the metrics after the alignment is complete. Objective-function-based search methods, in turn, have the problem that prior knowledge is difficult to integrate into the search; especially for large networks, searching without prior knowledge is inefficient.

Disclosure of Invention

The technical problem to be solved by the invention is to provide a global alignment method for two biological networks based on a discretized bat algorithm, which establishes a mapping between the nodes of the networks by aligning them. The invention discretizes the traditional bat algorithm and applies it to the network alignment problem, and it generates the initial population by greedy alignment based on the similarity of nodes across the networks, thereby addressing the low search efficiency of objective-function-based search methods that incorporate no prior knowledge.

To solve this technical problem, the invention provides a global alignment method for two biological networks based on a discretized bat algorithm, comprising the following steps: discretizing the bat algorithm and applying it to network alignment within the framework of an objective-function-based search method; fusing prior knowledge to initialize a population; and retaining the alignment results of conserved nodes.

In one embodiment, the network alignment is divided into the following four steps:

(1) input: inputting the two networks to be aligned and the sequence similarity between them;

(2) individual encoding and initialization;

(3) individual and population iteration;

(4) output of the alignment result: after all solutions in the population have been updated in each iteration, the best solution in the population is identified; if the best solution remains unchanged for N consecutive iterations, it is output as the final alignment result.

In one embodiment, the individual encoding and initialization specifically comprises: (a) each bat individual represents one alignment; the nodes of the two networks are each numbered from 1; an alignment is an arrangement whose length equals the number of nodes in the smaller network, in which each number is the number of a node in the second network and its position is the number of the matched node in the first network;

(b) the bat population is initialized with a greedy algorithm that aligns nodes according to their similarity;

In one embodiment, the individual and population iteration specifically comprises: in the individual iteration, a new solution is first generated by perturbation; then, if a random number between 0 and 1 is greater than the pulse frequency r of the current individual, a local perturbation is applied before deciding whether to update; otherwise, no local perturbation is applied and the update decision is made directly; the update is performed if a random number between 0 and 1 is greater than the loudness A of the current individual and the objective function of the perturbed solution has improved, and otherwise the next individual is perturbed; in the population iteration, after each complete pass over the population, the best solution in the population is identified and the global best solution is updated.

In one embodiment, the individual iteration involves the following processes:

(a) generating a new solution by perturbation: the new solution is generated according to the velocity; if a node's velocity is 0, its current match is kept; if it is 1, the node is randomly matched to one of the nodes not yet aligned;

(b) local exchange perturbation: the local perturbation performs exchange operations according to the velocity; all nodes of an individual with velocity 1 form an exchange set, and the matches of the nodes in the exchange set are randomly exchanged;

(c) updating: the updated quantities are the velocity, the solution, the loudness and the frequency; the new velocity is generated while the objective function is computed, being 0 for a conserved node and 1 otherwise.

In one embodiment, N is 10.

In one embodiment, the discretization of the bat algorithm is embodied in two aspects, velocity and solution: the velocity takes the value 0 or 1 to indicate whether the current solution needs to be perturbed, and the bat individuals are encoded as arrangements that represent their solutions.

Based on the same inventive concept, the present application also provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of any of the methods when executing the program.

Based on the same inventive concept, the present application also provides a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of any of the methods.

Based on the same inventive concept, the present application further provides a processor for executing a program, wherein the program executes to perform any one of the methods.

The beneficial effects of the invention are as follows:

the resulting biological network alignment method performs well on both biological metrics and topological metrics.

Drawings

FIG. 1 is a flow chart of the global alignment method for two biological networks based on the discretized bat algorithm according to the invention.

FIG. 2(a), (b) and (c) are the example-case and encoding diagrams of the method of the invention: (a) the source network, (b) the target network and (c) the node similarity matrix.

FIG. 3 is a schematic diagram of generating a new solution by perturbation in the method of the invention.

FIG. 4 is a flow chart of the individual perturbation in the method of the invention.

FIG. 5 shows how the velocity is returned from the objective-function computation in the method of the invention.

FIG. 6 shows the local perturbation in the method of the invention.

Detailed Description

The present invention is further described below in conjunction with the following figures and specific examples so that those skilled in the art may better understand the present invention and practice it, but the examples are not intended to limit the present invention.

The invention is a global alignment method for two biological networks based on a discretized bat algorithm: within the framework of an objective-function-based search method, the bat algorithm is discretized and applied to network alignment; prior knowledge is fused to initialize the population; and the alignment results of conserved nodes are retained.

The traditional bat algorithm targets continuous optimization problems; the invention discretizes it and applies it to the network alignment problem. The discretization is embodied mainly in two aspects, velocity and solution: the velocity of each node takes the value 0 or 1 to indicate whether its match in the current solution needs to be perturbed, and each bat individual is encoded as an arrangement of node numbers that represents its solution.

As shown in FIG. 1, the network alignment is divided into four parts:

(1) Input: the two networks to be aligned and the sequence similarity between them are provided as input.

(2) Individual encoding and initialization:

(a) Each bat individual represents one alignment. The nodes of the two networks are each numbered from 1, and an alignment is an arrangement whose length equals the number of nodes in the smaller network: each number in the arrangement is the number of a node in the second network, and its position is the number of the matched node in the first network. Taking the example networks in FIG. 2 (case and encoding) as an illustration, FIG. 2(a) is the source network and FIG. 2(b) is the target network; the nodes of each network are numbered from 1, and the similarity matrix is shown in FIG. 2(c), where the numbers outside the brackets are the numbers assigned to the nodes named inside the brackets.

(b) The bat population is initialized with a greedy algorithm that aligns nodes according to their similarity. FIG. 3 shows an alignment $x_i$ generated by the greedy algorithm from the similarity matrix of FIG. 2(c), from which a new $x_i$ is then generated according to the velocity $v_i$.
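The encoding and greedy initialization of step (2) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: node numbers are 0-based here (the text numbers from 1), the similarity matrix `sim` has one row per node of the smaller network G1 and one column per node of G2, and visiting the G1 nodes in a random order is a hypothetical way to obtain distinct greedy individuals, since the text does not say how the initial individuals differ.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Bat:
    # solution[i] = node of G2 matched to node i of G1 (distinct numbers)
    solution: List[int]
    # velocity[i] in {0, 1}: 1 means node i's match may be perturbed, 0 means keep it
    velocity: List[int]


def greedy_alignment(sim: List[List[float]], rng: random.Random) -> List[int]:
    """Match each G1 node to its most similar still-unused G2 node."""
    n1, n2 = len(sim), len(sim[0])
    if n1 > n2:
        raise ValueError("rows must belong to the smaller network")
    order = list(range(n1))
    rng.shuffle(order)  # hypothetical source of diversity between individuals
    used = set()
    solution = [-1] * n1
    for i in order:
        j = max((c for c in range(n2) if c not in used), key=lambda c: sim[i][c])
        solution[i] = j
        used.add(j)
    return solution


def init_population(sim: List[List[float]], size: int = 40, seed: int = 0) -> List[Bat]:
    rng = random.Random(seed)
    # Placeholder velocity of all 1s; the text recomputes the velocity from the
    # objective function (0 for conserved nodes), which this sketch does not model.
    return [Bat(greedy_alignment(sim, rng), [1] * len(sim)) for _ in range(size)]
```

For example, with `sim = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.3]]` every individual comes out as `[0, 1]`, i.e. node 1 of G1 is matched to node 1 of G2 and node 2 to node 2 in the 1-based numbering of the text.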

(3) Individual and population iteration:

As shown in FIG. 4, the individual iteration first generates a new solution by perturbation. Then, if a random number between 0 and 1 is greater than the pulse frequency r of the current individual, a local perturbation is applied before deciding whether to update; otherwise the local perturbation is skipped and the update decision is made directly. The update is performed only if both conditions hold: a random number between 0 and 1 is greater than the loudness A of the current individual, and the objective function of the perturbed solution has improved; otherwise the algorithm moves on to perturb the next individual. At the population level, after each complete pass over the population, the best solution in the population is identified and the global best solution is updated. The processes involved in an individual iteration are as follows:

(a) Generating a new solution by perturbation: the new solution is generated according to the velocity. If a node's velocity is 0, its current match is kept; if it is 1, the node is randomly matched to one of the unmatched nodes, as shown in FIG. 3.

(b) Updating: the quantities updated in the invention are the velocity, the solution, the loudness and the frequency. The new velocity is produced while the objective function is computed: a conserved node receives velocity 0 and every other node receives velocity 1. FIG. 5 shows how the objective-function computation returns the velocity.

(c) Local exchange perturbation: the local perturbation performs exchange operations according to the velocity. All nodes of an individual with velocity 1 form an exchange set, and the matches of the nodes in this set are randomly exchanged, as shown in FIG. 6.
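The three processes above can be sketched in Python as follows. This is a hedged illustration rather than the patented implementation, and several details are assumptions: the hypothetical `objective` combines edge correctness (EC) and average sequence similarity with a weight `alpha` (the text gives the weight 0.9 but not which term it applies to); a node counts as conserved (velocity 0) if it has at least one edge and all of its edges are mapped onto edges of G2; nodes with velocity 1 release their matches and draw new ones at random from the G2 nodes not held by velocity-0 nodes; and the exchange step is implemented as a random shuffle of the matches inside the exchange set. Nodes are 0-based and edges are undirected pairs.

```python
import random
from typing import List, Sequence, Tuple

Edge = Tuple[int, int]


def objective(solution: List[int], edges1: Sequence[Edge], edges2: Sequence[Edge],
              sim: List[List[float]], alpha: float = 0.9) -> Tuple[float, List[int]]:
    """(b) Return (score, new velocity); the EC/sequence weighting is hypothetical."""
    e2 = {frozenset(e) for e in edges2}
    has_edge, broken, conserved = set(), set(), 0
    for u, w in edges1:
        has_edge.update((u, w))
        if frozenset((solution[u], solution[w])) in e2:
            conserved += 1
        else:
            broken.update((u, w))  # an unmapped edge makes both endpoints non-conserved
    ec = conserved / len(edges1) if edges1 else 0.0
    seq = sum(sim[i][j] for i, j in enumerate(solution)) / len(solution)
    velocity = [0 if i in has_edge and i not in broken else 1 for i in range(len(solution))]
    return alpha * ec + (1 - alpha) * seq, velocity


def perturb(solution: List[int], velocity: List[int], n2: int, rng: random.Random) -> List[int]:
    """(a) Keep matches of velocity-0 nodes; rematch velocity-1 nodes at random."""
    kept = {solution[i] for i, v in enumerate(velocity) if v == 0}
    free = [j for j in range(n2) if j not in kept]
    rng.shuffle(free)
    new, pool = list(solution), iter(free)
    for i, v in enumerate(velocity):
        if v == 1:
            new[i] = next(pool)
    return new


def local_swap(solution: List[int], velocity: List[int], rng: random.Random) -> List[int]:
    """(c) Randomly exchange the matches among all velocity-1 nodes (the exchange set)."""
    idx = [i for i, v in enumerate(velocity) if v == 1]
    targets = [solution[i] for i in idx]
    rng.shuffle(targets)
    new = list(solution)
    for i, t in zip(idx, targets):
        new[i] = t
    return new
```

With `edges1 = [(0, 1), (1, 2)]`, `edges2 = [(0, 1), (1, 2), (2, 3)]` and the alignment `[0, 1, 3]`, the edge (1, 2) is not conserved, so `objective` returns the velocity `[0, 1, 1]`: node 0 keeps its match while nodes 1 and 2 become candidates for rematching and exchange.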

(4) Output of the alignment result: after every solution in the population has been updated in an iteration, the best solution in the population is identified; if the best solution remains unchanged for 10 consecutive iterations, it is output as the final alignment result.
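The individual and population iteration of step (3), together with the stopping rule of step (4), can be sketched as a generic Python loop. This is a hedged illustration: `evaluate`, `perturb` and `local_swap` are hypothetical callables supplied by the caller (`evaluate` returns the objective value and the new 0/1 velocity), and the acceptance test follows the wording of the text (a random number greater than the loudness A and an improved objective), whereas the classical bat algorithm accepts when the random number is below A. The loudness and pulse updates $A \leftarrow \alpha A$ and $r \leftarrow r_0(1 - e^{-\gamma t})$ with $\alpha = \gamma = 0.9$ are the standard bat-algorithm rules, assumed here because the text only refers to the recommended bat-algorithm parameters; the initial A and r values are hypothetical, and the 1000 iterations of the worked example are treated as an upper bound.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Solution = List[int]
Evaluate = Callable[[Solution], Tuple[float, List[int]]]


@dataclass
class Bat:
    solution: Solution
    velocity: List[int]
    score: float
    loudness: float = 0.5    # A, hypothetical start value (must be < 1 for "rand > A")
    pulse_rate: float = 0.5  # r, hypothetical start value
    r0: float = 0.5          # initial pulse rate used by the standard r update


def run_dba(initial: Sequence[Solution], evaluate: Evaluate,
            perturb: Callable, local_swap: Callable, rng: random.Random,
            stall_limit: int = 10, max_iter: int = 1000,
            alpha: float = 0.9, gamma: float = 0.9) -> Tuple[Solution, float]:
    bats = []
    for s in initial:
        score, vel = evaluate(list(s))
        bats.append(Bat(list(s), vel, score))
    best = max(bats, key=lambda b: b.score)
    best_sol, best_score = list(best.solution), best.score
    stall = 0
    for t in range(1, max_iter + 1):
        for bat in bats:
            cand = perturb(bat.solution, bat.velocity, rng)  # velocity-driven perturbation
            if rng.random() > bat.pulse_rate:                # optional local exchange
                cand = local_swap(cand, bat.velocity, rng)
            score, vel = evaluate(cand)
            # acceptance as worded in the text: rand > A and the objective improves
            if rng.random() > bat.loudness and score > bat.score:
                bat.solution, bat.velocity, bat.score = cand, vel, score
                bat.loudness *= alpha
                bat.pulse_rate = bat.r0 * (1 - math.exp(-gamma * t))
        gen_best = max(bats, key=lambda b: b.score)          # population-level update
        if gen_best.score > best_score:
            best_sol, best_score = list(gen_best.solution), gen_best.score
            stall = 0
        else:
            stall += 1
            if stall >= stall_limit:                         # N unchanged rounds: stop
                break
    return best_sol, best_score
```

Keeping the loop generic means the encoding-specific steps (perturbation with the size of G2, the exchange set, the conserved-node velocity) can be bound in with `functools.partial` or a lambda without changing the control flow.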

Table 1 shows the experimental data for applying the invention to real biological networks. The experiments compare the method of the invention with MAGNA++, L-GRAAL and AligNet, which represent, respectively, a typical objective-function-based search method, the latest purely topological alignment method in the GRAAL family, and the most recent available pairwise network alignment method.

Table 2 shows the performance of the invention and the compared methods on the real networks in terms of topological metrics (EC, ICS, S³) and a biological metric (FC). In the real-network experiment, the SC (Saccharomyces cerevisiae) network is aligned to the HS (Homo sapiens) network, so the topological evaluation relies mainly on EC and S³. On these two topological metrics the purely topological L-GRAAL performs best and the method of the invention ranks second. On the biological metric FC, AligNet performs best, most likely because during alignment it makes more use of sequence similarity between proteins in the same network than the other algorithms do; this dependence on inter-species similarity also constrains where AligNet can be applied. The FC value of the invention is only 0.01 below the best, while its topological scores are roughly twice those of AligNet. The invention is therefore a biological network alignment method that performs well on both biological and topological metrics.
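The topological metrics named above can be computed from an alignment with the commonly used definitions sketched below: EC is the fraction of G1 edges mapped onto G2 edges, ICS divides the same count by the number of G2 edges induced on the aligned G2 nodes, and S³ divides it by the size of the union of G1 edges and induced G2 edges. This Python sketch assumes undirected networks with 0-based node numbers and the same encoding as the text (position = G1 node, value = G2 node); it does not cover FC, which needs Gene Ontology annotations.

```python
from typing import List, Sequence, Tuple

Edge = Tuple[int, int]


def topological_scores(solution: List[int], edges1: Sequence[Edge],
                       edges2: Sequence[Edge]) -> Tuple[float, float, float]:
    """Return (EC, ICS, S3) for an injective alignment G1 -> G2."""
    e1 = {frozenset(e) for e in edges1 if e[0] != e[1]}
    e2 = {frozenset(e) for e in edges2 if e[0] != e[1]}
    mapped = {frozenset(solution[u] for u in e) for e in e1}
    conserved = len(mapped & e2)                  # |f(E1) ∩ E2|
    image = set(solution)                         # f(V1)
    induced = sum(1 for e in e2 if e <= image)    # |E(G2[f(V1)])|
    ec = conserved / len(e1) if e1 else 0.0
    ics = conserved / induced if induced else 0.0
    union = len(e1) + induced - conserved
    s3 = conserved / union if union else 0.0
    return ec, ics, s3
```

For G1 edges `[(0, 1), (1, 2)]`, G2 edges `[(0, 1), (1, 2), (0, 2), (2, 3)]` and the alignment `[0, 1, 3]`, only one of the two G1 edges is conserved and only one G2 edge is induced, giving EC = 0.5, ICS = 1.0 and S³ = 0.5, which illustrates why ICS alone can overstate alignment quality.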

A more specific example is given below:

(1) Preprocess the data by removing duplicate edges and self-loops from the networks.

(2) Input the two preprocessed networks and the sequence similarity between them.

(3) Set the parameters of the method: the initial population size is 40, the number of iterations is 1000, the weight between sequence and topological similarity is 0.9, and all other parameters take the values recommended for the bat algorithm.

(4) Analyze the topological and biological performance of the output alignment.
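Steps (1) and (3) of this example can be sketched in Python as follows. The edge lists are assumed to be undirected pairs of node identifiers, and the keys in `PARAMS` are hypothetical labels for the values given in the text; the loudness and pulse constants are assumed standard bat-algorithm defaults, since the text only says that the recommended values are used.

```python
from typing import Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable]


def preprocess(edges: List[Edge]) -> List[Edge]:
    """Drop self-loops and duplicate undirected edges, keeping first occurrences."""
    seen, cleaned = set(), []
    for u, v in edges:
        if u == v:                  # self-loop
            continue
        key = frozenset((u, v))     # (u, v) and (v, u) are the same edge
        if key in seen:             # repeated edge
            continue
        seen.add(key)
        cleaned.append((u, v))
    return cleaned


PARAMS = {
    "population_size": 40,    # initial population size
    "max_iterations": 1000,   # number of iterations
    "seq_topo_weight": 0.9,   # weight between sequence and topological similarity
    "stall_limit": 10,        # N: stop after 10 rounds without a better best solution
    "loudness_decay": 0.9,    # alpha, assumed standard bat-algorithm value
    "pulse_gamma": 0.9,       # gamma, assumed standard bat-algorithm value
}
```

Using a `frozenset` as the deduplication key treats both orientations of an edge as one, which matches the undirected protein interaction networks used in the experiments.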

The embodiments described above are merely preferred embodiments given to fully illustrate the invention, and the scope of protection of the invention is not limited to them. Equivalent substitutions or modifications made by those skilled in the art on the basis of the invention all fall within the scope of protection of the invention, which is defined by the claims.
