Method for predicting gene expression profiles after drug action based on a generative adversarial network

Document No.: 1075103. Publication date: 2020-10-16.

Reading note: This technology, "Method for predicting gene expression profiles after drug action based on a generative adversarial network", was designed by Yu Liang (鱼亮) and Dong Da (董达) on 2020-06-28. Main content: The invention discloses a method for predicting gene expression profiles after drug action based on a generative adversarial network (GAN), which mainly addresses the high cost and long turnaround time of existing biological techniques. The scheme is as follows: download gene expression profile data after the action of a given drug from a gene expression profile database; construct an autoencoder consisting of an encoder and a decoder, and use it to compress the post-drug gene expression profile data into compressed gene expression profile data; construct a generative adversarial network and train it with the compressed gene expression profile data; splice the encoder and decoder of the autoencoder and the generator of the generative adversarial network in the order encoder, generator, decoder to form a prediction model; and input the pre-drug gene expression profile into the prediction model to obtain the post-drug gene expression profile. The invention is low in cost, can rapidly produce large numbers of post-drug gene expression profiles, and can be used to screen sets of drugs with therapeutic effects.

1. A method for predicting gene expression profiles after drug action based on a generative adversarial network, characterized by comprising the following steps:

(1) randomly downloading tens of thousands of gene expression profiles, denoted G, from a gene expression profile database;

(2) building an autoencoder neural network AC consisting of an encoder EN and a decoder DE, and training it on the gene expression profile data obtained in step (1) to obtain a trained autoencoder;

(3) downloading from the gene expression profile database all gene expression profile data before and after the action of the drug under study;

(4) compressing the data from step (3) with the encoder of the trained autoencoder to obtain compressed vector data GE before and after the action of the drug;

(5) building a generative adversarial network consisting of a generator GN and a discriminator DI;

(6) training the generative adversarial network on the compressed vector data before and after the action of the drug to obtain a trained generative adversarial network;

(7) splicing the encoder and decoder of the trained autoencoder and the generator of the trained generative adversarial network in the order encoder -> generator -> decoder to obtain the final prediction model;

(8) inputting the pre-drug gene expression profile of the cell line under study into the prediction model, which outputs the post-drug gene expression profile of that cell line.

2. The method of claim 1, wherein the autoencoder neural network in step (2) is trained as follows:

(2a) randomly initializing all parameters of the autoencoder neural network;

(2b) inputting the gene expression profile data G into the autoencoder neural network to obtain an output O, and calculating the error between the two: $L = \|O - G\|^2$;

(2c) updating the parameters of the autoencoder neural network with the stochastic gradient descent algorithm until the error L no longer decreases, then stopping the updates to obtain the trained autoencoder neural network.

3. The method of claim 1, wherein the generative adversarial network in step (6) is trained as follows:

(6a) randomly initializing all parameters of the generative adversarial network;

(6b) randomly selecting 64 pairs of pre-drug and post-drug data from the compressed vector data GE, and inputting the pre-drug compressed vectors into the generator GN to obtain predicted compressed vectors of the post-drug gene expression profiles;

(6c) training the discriminator DI of the generative adversarial network:

(6c1) setting the label of the selected compressed vectors of the post-drug gene expression profiles to 1 and the label of the predicted compressed vectors of the post-drug gene expression profiles to 0, and inputting both sets of compressed vectors into the discriminator DI to obtain the discriminator output;

(6c2) fixing the parameters of the generator GN, and passing the label information from (6c1) and the discriminator output to the stochastic gradient descent algorithm to update the parameters of DI once;

(6d) training the generator GN of the generative adversarial network:

(6d1) setting the label of the predicted compressed vectors of the post-drug gene expression profiles to 1, and inputting these compressed vectors into the discriminator DI to obtain the discriminator output;

(6d2) fixing the parameters of DI, and passing the label information from (6d1) and the discriminator output to the stochastic gradient descent algorithm to update the parameters of GN once;

(6e) repeating steps (6b) to (6d) until the output probability of the discriminator is close to 1/2, yielding the trained generative adversarial network model.

Background

The drug-action gene expression profile is the gene expression profile of a cell line after a drug has been applied. The difference between this profile and the original pre-drug profile of the cell line can serve as a neutral measure of the drug's properties. In drug development, candidate drugs with a specific function, such as gene regulation, are typically screened using post-drug expression profiles before a series of drug-related biological experiments is carried out. Obtaining gene expression profiles is an extremely important step in the overall drug screening process.

Predicting the gene expression profile shows more directly the effect of the drug on each gene. The predicted profile has the same data form as a real gene expression profile, so it can be used for further analysis of drug or cell line characteristics, such as functional enrichment analysis. Prediction is also efficient and low in cost, so gene expression values of cell lines under drug action can be obtained at large scale.

Currently, gene expression profiles after drug action can only be obtained through biological experiments: a solution of the drug compound is added to a culture dish containing the cell line and, after a long culture period, the cell line is sequenced to obtain its gene expression profile. This conventional approach has the following problems: 1. culturing cell lines, purchasing chemical reagents, and sequencing gene expression profiles require substantial capital investment and are costly; 2. cell line culture is easily disturbed by the environment, so multiple groups of experiments are needed, which further increases the cost; 3. culturing cell lines takes a long time, and the success rate cannot reach one hundred percent.

Disclosure of Invention

The invention aims to address the above shortcomings of the prior art by providing a method for predicting gene expression profiles after drug action based on a generative adversarial network, so as to reduce the cost of purchasing chemical reagents, culturing cell lines, and sequencing, greatly shorten the time needed to obtain an expression profile, and improve the success rate of obtaining one.

To achieve this purpose, the technical scheme of the invention is as follows:

A method for predicting gene expression profiles after drug action based on a generative adversarial network, comprising the following steps:

(1) randomly downloading tens of thousands of gene expression profiles, denoted G, from a gene expression profile database;

(2) building an autoencoder neural network AC consisting of an encoder EN and a decoder DE, and training it on the gene expression profile data obtained in step (1) to obtain a trained autoencoder;

(3) downloading from the gene expression profile database all gene expression profile data before and after the action of the drug under study;

(4) compressing the data from step (3) with the encoder of the trained autoencoder to obtain compressed vector data GE before and after the action of the drug;

(5) building a generative adversarial network consisting of a generator GN and a discriminator DI;

(6) training the generative adversarial network on the compressed vector data before and after the action of the drug to obtain a trained generative adversarial network;

(7) splicing the encoder and decoder of the trained autoencoder and the generator of the trained generative adversarial network in the order encoder -> generator -> decoder to obtain the final prediction model;

(8) inputting the pre-drug gene expression profile of the cell line under study into the prediction model, which outputs the post-drug gene expression profile of that cell line.

Compared with the prior art, the invention has the following advantages:

1) Compared with existing biological experiment methods, the invention shortens the time needed to obtain a gene expression profile and saves the cost of purchasing biological experiment supplies such as chemical reagents.

2) The invention uses an autoencoder neural network to reduce the dimensionality of the input data, which effectively reduces the number of parameters in the generative adversarial network and facilitates training of the model.

Drawings

FIG. 1 is a flow chart of an implementation of the present invention;

FIG. 2 shows the distribution of the errors between the data predicted by the present invention and the actual data in the simulation.

Detailed Description

The invention is described in further detail below with reference to the figures and specific examples.

Referring to FIG. 1, the implementation steps of this example are as follows:

Step 1. Download the gene expression profile data.

1a) Download all gene expression profile data from the LINCS database;

1b) Shuffle the order of all gene expression profiles to obtain the shuffled gene expression profile data G.

Step 2. Construct and train the autoencoder network.

2a) Build the encoder EN of the autoencoder network with an input dimension of 978, an output dimension of 100, and two hidden layers in between, where the first hidden layer has 500 neurons, the second hidden layer has 200 neurons, and the layers are fully connected;

2b) Build the decoder DE of the autoencoder network with an input dimension of 100, an output dimension of 978, and two hidden layers in between, where the first hidden layer has 200 neurons, the second hidden layer has 500 neurons, and the layers are fully connected;

2c) Splice the encoder EN and the decoder DE end to end to obtain the autoencoder network, and randomly initialize all parameters;

2d) Input the shuffled gene expression profile data G into the autoencoder network to obtain an output O, and calculate the error between the two: $L = \|O - G\|^2$;

2e) Optimize the autoencoder parameters with an optimization algorithm. Existing optimization algorithms include gradient descent, stochastic gradient descent, and Adam. This embodiment updates the autoencoder parameters with stochastic gradient descent, although it is not limited to that algorithm, and stops updating once the error L no longer decreases, yielding the trained autoencoder neural network.
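Step 2 can be illustrated with a minimal NumPy sketch. It is not the embodiment's TensorFlow code: the ReLU hidden activation, the linear output layer, the weight initialization, the learning rate, and the tolerance `tol` are all assumptions, since the text specifies only the layer sizes, the error L, and the stopping rule.

```python
import numpy as np

# Layer widths from steps 2a) and 2b): encoder 978-500-200-100, decoder 100-200-500-978.
LAYER_SIZES = [978, 500, 200, 100, 200, 500, 978]

def init_layers(sizes, rng):
    """Randomly initialize fully connected layers as (weights, bias) pairs."""
    return [(rng.normal(0.0, np.sqrt(2.0 / m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(layers, x):
    """Return the activations of every layer (ReLU on hidden layers is an assumption)."""
    acts = [x]
    for i, (w, b) in enumerate(layers):
        z = acts[-1] @ w + b
        acts.append(z if i == len(layers) - 1 else np.maximum(z, 0.0))
    return acts

def sgd_step(layers, g, lr):
    """One gradient step on L = ||O - G||^2, averaged over the batch; returns L."""
    acts = forward(layers, g)
    diff = acts[-1] - g
    loss = float(np.mean(np.sum(diff ** 2, axis=1)))
    grad = 2.0 * diff / g.shape[0]
    for i in reversed(range(len(layers))):
        w, b = layers[i]
        if i != len(layers) - 1:
            grad = grad * (acts[i + 1] > 0)      # derivative of ReLU
        grad_w, grad_b = acts[i].T @ grad, grad.sum(axis=0)
        grad = grad @ w.T                         # gradient passed to the layer below
        layers[i] = (w - lr * grad_w, b - lr * grad_b)
    return loss

def train(layers, data, rng, lr=0.005, batch=64, max_epochs=200, tol=1e-6):
    """Mini-batch SGD that stops once the mean epoch error no longer decreases."""
    history = []
    for _ in range(max_epochs):
        order = rng.permutation(len(data))
        losses = [sgd_step(layers, data[order[s:s + batch]], lr)
                  for s in range(0, len(data), batch)]
        history.append(float(np.mean(losses)))
        if len(history) > 1 and history[-2] - history[-1] < tol:
            break
    return history
```

With `LAYER_SIZES`, the activation at index 3 returned by `forward` is the 100-dimensional compressed vector, so the first three layers play the role of the encoder EN and the last three the role of the decoder DE.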

Step 3. Download the gene expression profile data before and after drug action.

3a) Download the gene expression profile data related to the drug bortezomib from the LINCS database;

3b) Pair the gene expression profiles of the control group and the experimental group that were obtained under the same conditions to form a data set of gene expression profiles before and after drug action.
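The pairing in step 3b) could be sketched as follows. The record layout and the field names `group`, `cell_line`, `time`, `plate`, and `profile` are hypothetical, chosen only to illustrate matching control and treated profiles that share the same conditions; the text does not say which metadata fields define "the same conditions".

```python
from collections import defaultdict

def pair_profiles(records, keys=("cell_line", "time", "plate")):
    """Pair every treated profile with the control profiles sharing its condition.

    The field names are hypothetical placeholders, not actual LINCS column names.
    """
    controls = defaultdict(list)
    for rec in records:
        if rec["group"] == "control":
            controls[tuple(rec[k] for k in keys)].append(rec["profile"])
    pairs = []
    for rec in records:
        if rec["group"] == "treated":
            condition = tuple(rec[k] for k in keys)
            for before in controls.get(condition, []):
                pairs.append((before, rec["profile"]))  # (pre-drug, post-drug)
    return pairs
```

Treated profiles with no matching control are dropped in this sketch, which is one possible design choice among several.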

Step 4. Construct the training data set GE.

4a) Input the data set obtained in step 3b) into the encoder of the trained autoencoder neural network for compression to obtain the compressed vectors of the gene expression profiles;

4b) Append 10-dimensional Gaussian noise to the compressed vectors of the gene expression profiles to form the training data set GE.
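A short sketch of step 4b), assuming the noise is concatenated to each 100-dimensional compressed vector (which matches the generator input dimension of 110 given in step 5a) and is drawn from a standard normal distribution; the text does not state the mean or variance of the noise.

```python
import numpy as np

def append_noise(codes, rng, noise_dim=10):
    """Concatenate noise_dim Gaussian noise values to each compressed vector."""
    noise = rng.normal(size=(codes.shape[0], noise_dim))  # standard normal is assumed
    return np.concatenate([codes, noise], axis=1)
```

For example, `append_noise(codes, np.random.default_rng(0))` turns an array of shape `(n, 100)` into one of shape `(n, 110)`.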

Step 5. Construct the generative adversarial network and train it with the training data set GE.

5a) Build a generator GN with an input dimension of 110, an output dimension of 100, and two hidden layers in between, where each hidden layer has 1000 neurons and the layers are fully connected;

5b) Build a discriminator DI with an input dimension of 100, an output dimension of 1, and two hidden layers in between, where each hidden layer has 1000 neurons and the layers are fully connected;

5c) Splice the generator GN and the discriminator DI in sequence to form the generative adversarial network;

5d) Randomly initialize all parameters of the generative adversarial network;

5e) Randomly select 64 pairs of pre-drug and post-drug data from the training data set GE, and input the pre-drug compressed vectors into the generator GN to obtain predicted compressed vectors of the post-drug gene expression profiles;

5f) Train the discriminator DI of the generative adversarial network:

5f1) Set the label of the compressed vectors of the real post-drug gene expression profiles to 1 and the label of the compressed vectors of the predicted post-drug gene expression profiles to 0, and input both sets of compressed vectors into the discriminator DI to obtain the discriminator output;

5f2) Fix the parameters of the generator GN. The discriminator parameters can be optimized with an existing optimization algorithm such as gradient descent, stochastic gradient descent, or Adam; this embodiment uses stochastic gradient descent. Pass the label information from 5f1) and the discriminator output to the stochastic gradient descent algorithm and update the parameters of DI once;

5g) Train the generator GN of the generative adversarial network:

5g1) Set the label of the compressed vectors of the predicted post-drug gene expression profiles to 1, and input these compressed vectors into the discriminator DI to obtain the discriminator output;

5g2) Fix the parameters of DI. The generator parameters can be optimized with an existing optimization algorithm such as gradient descent, stochastic gradient descent, or Adam; this embodiment uses stochastic gradient descent. Pass the label information from 5g1) and the discriminator output to the stochastic gradient descent algorithm and update the parameters of GN once;

5h) Repeat steps 5e) to 5g) until the output probability of the discriminator is close to 1/2, yielding the trained generative adversarial network model.
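The alternating updates of steps 5e) to 5g) can be sketched in NumPy as below. This is an illustrative sketch under assumptions, not the embodiment's code: the ReLU hidden layers, the sigmoid on the discriminator output, the binary cross-entropy loss, the learning rate, and the fixed number of steps are all assumed. The mean discriminator output on generated vectors is recorded so that the "close to 1/2" condition of step 5h) can be monitored.

```python
import numpy as np

def init_mlp(sizes, rng):
    """Randomly initialize fully connected layers as (weights, bias) pairs."""
    return [(rng.normal(0.0, np.sqrt(2.0 / m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(layers, x):
    """Activations of every layer; hidden layers use ReLU, the last layer is linear."""
    acts = [x]
    for i, (w, b) in enumerate(layers):
        z = acts[-1] @ w + b
        acts.append(z if i == len(layers) - 1 else np.maximum(z, 0.0))
    return acts

def backward(layers, acts, grad):
    """Backpropagate dLoss/dOutput; return parameter gradients and dLoss/dInput."""
    grads = [None] * len(layers)
    for i in reversed(range(len(layers))):
        if i != len(layers) - 1:
            grad = grad * (acts[i + 1] > 0)
        grads[i] = (acts[i].T @ grad, grad.sum(axis=0))
        grad = grad @ layers[i][0].T
    return grads, grad

def sgd_update(layers, grads, lr):
    """Return new parameters after one gradient descent step."""
    return [(w - lr * gw, b - lr * gb) for (w, b), (gw, gb) in zip(layers, grads)]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_gan(before, after, gen, dis, rng, steps=100, batch=64, lr=0.01):
    """Alternate one discriminator update and one generator update per step."""
    history = []
    for _ in range(steps):
        idx = rng.choice(len(before), size=batch, replace=False)      # step 5e)
        g_acts = forward(gen, before[idx])
        fake = g_acts[-1]
        # Step 5f): real vectors labelled 1, generated vectors labelled 0; GN is fixed.
        d_in = np.concatenate([after[idx], fake])
        labels = np.concatenate([np.ones((batch, 1)), np.zeros((batch, 1))])
        d_acts = forward(dis, d_in)
        d_grads, _ = backward(dis, d_acts, (sigmoid(d_acts[-1]) - labels) / len(labels))
        dis = sgd_update(dis, d_grads, lr)
        # Step 5g): generated vectors labelled 1; DI is fixed, only GN is updated.
        d_acts = forward(dis, fake)
        p_fake = sigmoid(d_acts[-1])
        _, grad_fake = backward(dis, d_acts, (p_fake - 1.0) / batch)
        g_grads, _ = backward(gen, g_acts, grad_fake)
        gen = sgd_update(gen, g_grads, lr)
        history.append(float(p_fake.mean()))
    return gen, dis, history
```

With the dimensions from steps 5a) and 5b), the networks would be created as `init_mlp([110, 1000, 1000, 100], rng)` and `init_mlp([100, 1000, 1000, 1], rng)`, where `before` holds the 110-dimensional pre-drug vectors from the training data set GE and `after` the 100-dimensional post-drug vectors.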

Step 6. Construct the prediction model.

6a) Extract the encoder and the decoder from the autoencoder network trained in step 2e);

6b) Extract the generator from the generative adversarial network trained in step 5h);

6c) Splice the encoder, the generator, and the decoder in sequence to form the prediction model.
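The splicing in step 6 amounts to function composition, sketched below. The three components are passed in as callables; appending 10-dimensional Gaussian noise between the encoder and the generator at prediction time is an assumption made here so that the generator receives the 110-dimensional input described in step 5a).

```python
import numpy as np

def make_predictor(encoder, generator, decoder, rng, noise_dim=10):
    """Chain encoder -> generator -> decoder into a single prediction function."""
    def predict(profiles_before):
        codes = encoder(profiles_before)                         # e.g. 978 -> 100
        noise = rng.normal(size=(codes.shape[0], noise_dim))     # assumed, see lead-in
        codes_after = generator(np.concatenate([codes, noise], axis=1))  # 110 -> 100
        return decoder(codes_after)                              # 100 -> 978
    return predict
```

Calling `predict` on an array of pre-drug profiles with shape `(n, 978)` would return predicted post-drug profiles of the same shape.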

Step 7. Input the pre-drug gene expression profile of the cell line to be predicted into the prediction model to obtain the gene expression profile of that cell line after the action of the drug bortezomib.

The effects of the present invention can be further illustrated by the following simulations:

1. Simulation conditions:

The simulation experiments were run on an Intel(R) Core(TM) i7-8700K CPU with a base frequency of 3.70 GHz and 48 GB of memory, on the Ubuntu platform, using Python 3.6.5 with TensorFlow 1.0.

2. Simulation content:

Simulation 1: The method of the invention was used to predict the post-drug gene expression profiles of all cell lines associated with the drug bortezomib in the LINCS database. The errors between the predicted and real gene expression profiles were calculated and plotted as an error distribution, shown in FIG. 2, where the abscissa is the mean absolute error between the predicted and real values over all genes of each gene expression profile, and the ordinate is the probability density of the error distribution.

As can be seen from FIG. 2, the mean error over all gene expression profiles predicted by the invention is 1.5, indicating that the predicted gene expression profiles have low error.
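The error measure plotted in FIG. 2 can be computed along the following lines. This is a minimal sketch: the function names and the number of histogram bins are illustrative choices, and a normalized histogram is only one way to estimate the probability density.

```python
import numpy as np

def profile_mae(predicted, actual):
    """Mean absolute error over all genes, one value per gene expression profile."""
    return np.mean(np.abs(predicted - actual), axis=1)

def error_density(errors, bins=50):
    """Histogram of the per-profile errors, normalized to a probability density."""
    density, edges = np.histogram(errors, bins=bins, density=True)
    return density, edges
```

Here `predicted` and `actual` are arrays with one row per gene expression profile and one column per gene.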

Simulation 2: Using the results of simulation 1, the differential regulation of a subset of related genes was calculated, and its direction (up-regulation or down-regulation) was compared with that in the real data, as shown in Table 1:

Table 1. Predicted and actual regulatory information

[Image: Table 1, BDA0002557041470000051]

As can be seen from Table 1, for the eight genes related to the drug bortezomib, the regulatory information predicted by the method is consistent with the actual data in the LINCS database, which demonstrates the accuracy of the method's predictions.
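A comparison like the one in Table 1 could be sketched as follows. The text does not say how the regulation direction is derived, so the rule used here (the sign of the post-drug minus pre-drug expression value, with an optional `threshold`) is an assumption for illustration only.

```python
import numpy as np

def regulation_direction(before, after, threshold=0.0):
    """Return 1 for up-regulated, -1 for down-regulated, 0 for unchanged genes."""
    diff = after - before
    return np.where(diff > threshold, 1, np.where(diff < -threshold, -1, 0))

def direction_agreement(before, predicted_after, actual_after, threshold=0.0):
    """Fraction of genes whose predicted direction matches the actual direction."""
    predicted = regulation_direction(before, predicted_after, threshold)
    actual = regulation_direction(before, actual_after, threshold)
    return float(np.mean(predicted == actual))
```

An agreement of 1.0 over the selected genes would correspond to the fully consistent result described for Table 1.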
