Method and related device for predicting drug and cell line response

Document No.: 1891646   Publication date: 2021-11-26

Reading note: The technology "Method and related device for predicting drug and cell line response" (药物与细胞系反应预测方法及相关装置) was designed by Li Zechao and Zhang Jie on 2021-08-27. Abstract: The invention discloses a method for predicting drug and cell line response and a related device. The method comprises: acquiring cell line gene data and drug molecule data; performing first feature extraction processing on the cell line gene data and the drug molecule data to obtain a first feature sequence; performing second feature extraction processing on the cell line gene data and the drug molecule data to obtain a second feature sequence; and combining the first feature sequence with the second feature sequence to obtain a drug-cell line response prediction result. The method avoids manual experiments, reduces labor cost and improves efficiency. Combining the first and second feature extraction also improves the accuracy of the prediction result.

1. A method for predicting a drug-cell line response, the method comprising:

acquiring cell line gene data and drug molecule data;

performing first feature extraction processing on the cell line gene data and the drug molecule data to obtain a first feature sequence;

performing second feature extraction processing on the cell line gene data and the drug molecule data to obtain a second feature sequence; and

combining the first feature sequence with the second feature sequence to obtain a drug-cell line response prediction result.

2. The method of claim 1, wherein the combining the first feature sequence with the second feature sequence to obtain a drug-cell line response prediction result comprises:

adding the first feature sequence and the second feature sequence element by element, in one-to-one correspondence, to obtain the drug-cell line response prediction result.

3. The method of claim 1, wherein the performing first feature extraction processing on the cell line gene data and the drug molecule data to obtain a first feature sequence comprises:

performing global feature extraction processing on the cell line gene data and the drug molecule data to obtain the first feature sequence; and

the performing second feature extraction processing on the cell line gene data and the drug molecule data to obtain a second feature sequence comprises:

performing local feature extraction processing on the cell line gene data and the drug molecule data to obtain the second feature sequence.

4. The method of claim 3, wherein the performing global feature extraction processing on the cell line gene data and the drug molecule data to obtain the first feature sequence comprises:

performing feature extraction processing on the cell line gene data to obtain a first sub-feature of a first dimension, and performing feature extraction processing on the drug molecule data to obtain a second sub-feature of the first dimension; and

splicing the first sub-feature of the first dimension and the second sub-feature of the first dimension to obtain the first feature sequence.

5. The method of claim 4, wherein the performing feature extraction processing on the cell line gene data to obtain a first sub-feature of a first dimension, and the performing feature extraction processing on the drug molecule data to obtain a second sub-feature of the first dimension, comprise:

performing the feature extraction processing on the cell line gene data by using a first multilayer perceptron to obtain the first sub-feature of the first dimension; and

performing the feature extraction processing on the drug molecule data by using a second multilayer perceptron to obtain the second sub-feature of the first dimension.

6. The method of claim 4, wherein the splicing the first sub-feature of the first dimension and the second sub-feature of the first dimension to obtain the first feature sequence comprises:

splicing the first sub-feature of the first dimension and the second sub-feature of the first dimension; and

performing feature extraction processing on the spliced first and second sub-features by using a third multilayer perceptron to obtain the first feature sequence.

7. The method of claim 3, wherein the performing local feature extraction processing on the cell line gene data and the drug molecule data to obtain the second feature sequence comprises:

mapping the cell line gene data to obtain a first sub-feature of a second dimension, and mapping the drug molecule data to obtain a second sub-feature of the second dimension;

associating the first sub-feature of the second dimension with the second sub-feature of the second dimension to obtain associated features;

extracting, from the associated features, the associated features for which the interaction between the cell line and the drug is greater than a threshold; and

adding the extracted associated features to obtain the second feature sequence.

8. The method of claim 7, wherein the mapping the cell line gene data to obtain a first sub-feature of a second dimension and the mapping the drug molecule data to obtain a second sub-feature of the second dimension comprises:

mapping the cell line gene data with a first embedding layer to obtain a first sub-feature of the second dimension, and mapping the drug molecule data with a second embedding layer to obtain a second sub-feature of the second dimension.

9. The method of claim 7, wherein the associating the first sub-feature of the second dimension with the second sub-feature of the second dimension to obtain associated features comprises:

calculating a dot product of the first sub-feature of the second dimension and the second sub-feature of the second dimension to obtain the associated features.

10. The method of claim 1, wherein the predicted drug-cell line response is a predicted IC50 value.

11. A model for predicting drug and cell line response, the model comprising:

a first feature extraction network, configured to perform first feature extraction processing on cell line gene data and drug molecule data to obtain a first feature sequence;

a second feature extraction network, configured to perform second feature extraction processing on the cell line gene data and the drug molecule data to obtain a second feature sequence; and

a feature combination network, connected with the first feature extraction network and the second feature extraction network and configured to combine the first feature sequence with the second feature sequence to obtain a drug-cell line response prediction result.

12. The prediction model of claim 11, wherein the first feature extraction network comprises a first multilayer perceptron, a second multilayer perceptron and a splicing network;

the first multilayer perceptron is configured to perform feature extraction processing on the cell line gene data to obtain a first sub-feature of a first dimension, and the second multilayer perceptron is configured to perform feature extraction processing on the drug molecule data to obtain a second sub-feature of the first dimension; and

the splicing network is connected with the first multilayer perceptron and the second multilayer perceptron and is configured to splice the first sub-feature of the first dimension and the second sub-feature of the first dimension to obtain the first feature sequence.

13. The prediction model of claim 12, wherein the first feature extraction network further comprises a third multilayer perceptron, which is connected with the splicing network and is configured to perform feature extraction processing on the spliced first and second sub-features to obtain the first feature sequence.

14. The prediction model of claim 11, wherein the second feature extraction network comprises a first embedding layer, a second embedding layer, a feature association network, a feature extraction network and a computing network;

the first embedding layer is configured to map the cell line gene data to obtain a first sub-feature of a second dimension, and the second embedding layer is configured to map the drug molecule data to obtain a second sub-feature of the second dimension;

the feature association network is connected with the first embedding layer and the second embedding layer and is configured to associate the first sub-feature of the second dimension with the second sub-feature of the second dimension to obtain associated features;

the feature extraction network is connected with the feature association network and is configured to extract, from the associated features, the associated features for which the interaction between the cell line and the drug is greater than a threshold; and

the computing network is connected with the feature extraction network and is configured to add the extracted associated features to obtain the second feature sequence.

15. A drug-cell line response prediction device, comprising a memory storing program instructions and a processor that retrieves the program instructions from the memory to perform the drug-cell line response prediction method according to any one of claims 1 to 10.

16. A computer-readable storage medium storing a program file executable to implement the method for predicting a drug-cell line response according to any one of claims 1 to 10.

Technical Field

The invention relates to the field of cloud computing applications, and in particular to a method for predicting drug-cell line response and a related device.

Background

Human cancer cell lines have a stable genetic background and unlimited proliferative capacity. As clinical tumor models, they have long been one of the main experimental subjects in biomedicine. Predicting how cancer patients respond to cancer drugs is currently an important problem in precision medicine, and the field mainly follows one of two research approaches.

In the first approach, researchers perform extensive experimental verification and quantitative analysis of cancer cell lines against drugs drawn from existing anti-cancer drug databases. Researchers carry out the entire experimental workflow manually, so it consumes considerable manpower and materials and is inefficient.

In the second approach, traditional statistical or machine learning methods such as matrix factorization are applied to the genomic similarity of cancer cells. The relationship between cancer cells and drugs is then inferred from the similarity between cell lines. This approach ignores the relationships between genes, so it predicts cell line-drug responses poorly.

There is therefore a need for a method that reduces cost, improves efficiency, and predicts the response between cell lines and drugs well.

Disclosure of Invention

The invention provides a method for predicting drug-cell line response and a related device. They require no manual experiments, reduce labor cost and improve efficiency.

To solve the above technical problem, a first technical solution provided by the present invention is a method for predicting drug-cell line response, comprising: acquiring cell line gene data and drug molecule data; performing first feature extraction processing on the cell line gene data and the drug molecule data to obtain a first feature sequence; performing second feature extraction processing on the cell line gene data and the drug molecule data to obtain a second feature sequence; and combining the first feature sequence with the second feature sequence to obtain a drug-cell line response prediction result. In this way, manual experiments are eliminated, labor cost is reduced and efficiency is improved. The combination of local and global feature extraction also improves the accuracy of the prediction result.

Wherein, combining the first feature sequence with the second feature sequence to obtain the drug-cell line response prediction result comprises: adding the first feature sequence and the second feature sequence element by element, in one-to-one correspondence, to obtain the drug-cell line response prediction result.
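The one-to-one addition can be pictured as an element-wise sum of two equal-length sequences. The following is a minimal sketch, not the patent's implementation. It assumes both feature sequences are plain lists of floats of the same length; the function name and the toy values are hypothetical.

```python
from typing import List


def combine_feature_sequences(first: List[float], second: List[float]) -> List[float]:
    """Add two feature sequences element by element (one-to-one correspondence)."""
    if len(first) != len(second):
        raise ValueError("the two feature sequences must have the same length")
    return [a + b for a, b in zip(first, second)]


# Toy values chosen only to illustrate the arithmetic (hypothetical).
first_sequence = [0.5, -1.0, 2.0]    # e.g. output of the first (global) extraction
second_sequence = [0.25, 1.0, -0.5]  # e.g. output of the second (local) extraction
print(combine_feature_sequences(first_sequence, second_sequence))  # [0.75, 0.0, 1.5]
```

The length check makes a mismatch between the two branches fail loudly. A silent `zip` truncation would hide it.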

Wherein, performing first feature extraction processing on the cell line gene data and the drug molecule data to obtain the first feature sequence comprises: performing global feature extraction processing on the cell line gene data and the drug molecule data to obtain the first feature sequence. Performing second feature extraction processing on the cell line gene data and the drug molecule data to obtain the second feature sequence comprises: performing local feature extraction processing on the cell line gene data and the drug molecule data to obtain the second feature sequence.

Wherein, performing global feature extraction processing on the cell line gene data and the drug molecule data to obtain the first feature sequence comprises: performing feature extraction processing on the cell line gene data to obtain a first sub-feature of a first dimension; performing feature extraction processing on the drug molecule data to obtain a second sub-feature of the first dimension; and splicing the first sub-feature of the first dimension and the second sub-feature of the first dimension to obtain the first feature sequence. Global features are thus extracted, and the features extracted from the cell line gene data are combined with those extracted from the drug molecule data. This further improves the accuracy of the prediction result.

Wherein, performing feature extraction processing on the cell line gene data to obtain the first sub-feature of the first dimension, and performing feature extraction processing on the drug molecule data to obtain the second sub-feature of the first dimension, comprise: performing feature extraction processing on the cell line gene data by using a first multilayer perceptron to obtain the first sub-feature of the first dimension; and performing feature extraction processing on the drug molecule data by using a second multilayer perceptron to obtain the second sub-feature of the first dimension. Global features are thus extracted, and the features extracted from the cell line gene data are combined with those extracted from the drug molecule data. This further improves the accuracy of the prediction result.

Wherein, splicing the first sub-feature of the first dimension and the second sub-feature of the first dimension to obtain the first feature sequence comprises: splicing the first sub-feature of the first dimension and the second sub-feature of the first dimension; and performing feature extraction processing on the spliced first and second sub-features by using a third multilayer perceptron to obtain the first feature sequence. Combining the features extracted from the cell line gene data with those extracted from the drug molecule data further improves the accuracy of the prediction result.

Wherein, performing local feature extraction processing on the cell line gene data and the drug molecule data to obtain the second feature sequence comprises: mapping the cell line gene data to obtain a first sub-feature of a second dimension, and mapping the drug molecule data to obtain a second sub-feature of the second dimension; associating the first sub-feature of the second dimension with the second sub-feature of the second dimension to obtain associated features; extracting, from the associated features, those for which the interaction between the cell line and the drug is greater than a threshold; and adding the extracted associated features to obtain the second feature sequence. Local features are thus extracted: the features extracted from the cell line gene data are associated with those extracted from the drug molecule data to generate new features. This further improves the accuracy of the prediction result.

Wherein, mapping the cell line gene data to obtain the first sub-feature of the second dimension, and mapping the drug molecule data to obtain the second sub-feature of the second dimension, comprise: mapping the cell line gene data by using a first embedding layer to obtain the first sub-feature of the second dimension, and mapping the drug molecule data by using a second embedding layer to obtain the second sub-feature of the second dimension.

Wherein, associating the first sub-feature of the second dimension with the second sub-feature of the second dimension to obtain the associated features comprises: calculating the dot product of the first sub-feature of the second dimension and the second sub-feature of the second dimension to obtain the associated features. Associating the features extracted from the cell line gene data with those extracted from the drug molecule data generates new features, which further improves the accuracy of the prediction result.

Wherein, the drug-cell line response prediction result is a predicted IC50 value of the drug-cell line response.

To solve the above technical problem, a second technical solution provided by the present invention is a model for predicting drug and cell line response. The model comprises: a first feature extraction network, configured to perform first feature extraction processing on cell line gene data and drug molecule data to obtain a first feature sequence; a second feature extraction network, configured to perform second feature extraction processing on the cell line gene data and the drug molecule data to obtain a second feature sequence; and a feature combination network, connected with the first feature extraction network and the second feature extraction network and configured to combine the first feature sequence and the second feature sequence to obtain the drug-cell line response prediction result. The model requires no human participation, reduces labor cost and improves efficiency. Its combination of local and global feature extraction also improves the accuracy of the prediction result.

Wherein, the first feature extraction network comprises a first multilayer perceptron, a second multilayer perceptron and a splicing network. The first multilayer perceptron is configured to perform feature extraction processing on the cell line gene data to obtain a first sub-feature of a first dimension. The second multilayer perceptron is configured to perform feature extraction processing on the drug molecule data to obtain a second sub-feature of the first dimension. The splicing network is connected with the first multilayer perceptron and the second multilayer perceptron and is configured to splice the first sub-feature of the first dimension and the second sub-feature of the first dimension to obtain the first feature sequence.

Wherein, the first feature extraction network further comprises a third multilayer perceptron, which is connected with the splicing network and is configured to perform feature extraction processing on the spliced first and second sub-features to obtain the first feature sequence.

Wherein, the second feature extraction network comprises a first embedding layer, a second embedding layer, a feature association network, a feature extraction network and a computing network. The first embedding layer is configured to map the cell line gene data to obtain a first sub-feature of a second dimension. The second embedding layer is configured to map the drug molecule data to obtain a second sub-feature of the second dimension. The feature association network is connected with the first embedding layer and the second embedding layer and is configured to associate the first sub-feature of the second dimension with the second sub-feature of the second dimension to obtain associated features. The feature extraction network is connected with the feature association network and is configured to extract, from the associated features, those for which the interaction between the cell line and the drug is greater than a threshold. The computing network is connected with the feature extraction network and is configured to add the extracted associated features to obtain the second feature sequence.
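The following sketch shows one way to wire the two extraction networks and the combination network together. It is a toy illustration under stated assumptions, not the patented model. All layer sizes are hypothetical, as are the number of embedded fields per input, the use of a linear map as each "embedding layer", the zero threshold and every name in the code. Weights are random and untrained.

```python
import random
from typing import List, Tuple

Vector = List[float]
Layer = Tuple[List[List[float]], List[float]]


def make_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Random weights (n_out rows of n_in values) and zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(x: Vector, layers: List[Layer]) -> Vector:
    """Dense layers with ReLU between them (no activation after the last one)."""
    for k, (w, b) in enumerate(layers):
        x = [sum(wi * xi for wi, xi in zip(row, x)) + bj for row, bj in zip(w, b)]
        if k < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x


class DeepCrossResponseModel:
    """Hypothetical two-branch model: Deep (global) + Cross (local), then summed."""

    def __init__(self, n_genes: int, n_bits: int, first_dim: int = 4,
                 second_dim: int = 4, n_fields: int = 2,
                 threshold: float = 0.0, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.second_dim, self.n_fields, self.threshold = second_dim, n_fields, threshold
        # First feature extraction network: MLP 1 (cell line), MLP 2 (drug), MLP 3 (after splicing).
        self.mlp1 = [make_layer(n_genes, first_dim, rng)]
        self.mlp2 = [make_layer(n_bits, first_dim, rng)]
        self.mlp3 = [make_layer(2 * first_dim, 8, rng), make_layer(8, 1, rng)]
        # Second feature extraction network: "embedding layers" assumed to be linear maps
        # producing n_fields vectors of size second_dim per input.
        self.emb1 = [make_layer(n_genes, n_fields * second_dim, rng)]
        self.emb2 = [make_layer(n_bits, n_fields * second_dim, rng)]

    def _embed(self, x: Vector, emb: List[Layer]) -> List[Vector]:
        flat, d = forward(x, emb), self.second_dim
        return [flat[i * d:(i + 1) * d] for i in range(self.n_fields)]

    def predict(self, genes: Vector, drug: Vector) -> float:
        # Deep branch: extract each input, splice (concatenate), reduce to one value.
        spliced = forward(genes, self.mlp1) + forward(drug, self.mlp2)
        first_seq = forward(spliced, self.mlp3)
        # Cross branch: embed, associate by dot product, keep scores above threshold, add.
        cells, drugs = self._embed(genes, self.emb1), self._embed(drug, self.emb2)
        scores = [sum(a * b for a, b in zip(c, d)) for c, d in zip(cells, drugs)]
        second_seq = [sum(s for s in scores if s > self.threshold)]
        # Feature combination network: one-to-one addition gives the prediction.
        return first_seq[0] + second_seq[0]


model = DeepCrossResponseModel(n_genes=6, n_bits=10, seed=0)
prediction = model.predict([1.0, 0.0, 1.0, 1.0, 0.0, 1.0],
                           [0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0])
print(f"predicted response (untrained weights): {prediction:.4f}")
```

Both branches read the same raw inputs and interact only through the final addition, which mirrors the feature combination network described above. Because the weights are untrained, the printed value carries no meaning. Fitting the model to measured IC50 values is outside this sketch.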

To solve the above technical problem, a third technical solution provided by the present invention is a drug-cell line response prediction device. The device comprises a memory storing program instructions and a processor that retrieves the program instructions from the memory to perform any of the drug-cell line response prediction methods described above.

To solve the above technical problem, a fourth technical solution provided by the present invention is a computer-readable storage medium storing a program file that can be executed to implement any of the drug-cell line response prediction methods described above.

The beneficial effects of the invention are as follows. Unlike the prior art, the invention acquires cell line gene data and drug molecule data; performs first feature extraction processing on the cell line gene data and the drug molecule data to obtain a first feature sequence; performs second feature extraction processing on the cell line gene data and the drug molecule data to obtain a second feature sequence; and combines the first feature sequence with the second feature sequence to obtain a drug-cell line response prediction result. Manual experiments are therefore avoided, labor cost is reduced and efficiency is improved. The combination of local and global feature extraction also improves the accuracy of the prediction result.

Drawings

To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person skilled in the art can derive other drawings from them without creative effort. In the drawings:

FIG. 1 is a schematic flow chart of a first embodiment of the method for predicting a reaction between a drug and a cell line according to the present invention;

FIG. 2 is a flowchart illustrating an embodiment of step S12 in FIG. 1;

FIG. 3 is a flowchart illustrating an embodiment of step S122 in FIG. 2;

FIG. 4 is a flowchart illustrating an embodiment of step S13 in FIG. 1;

FIG. 5 is a schematic flow chart illustrating a first embodiment of the method for predicting a reaction between a drug and a cell line according to the present invention;

FIG. 6 is a schematic structural diagram of a model for predicting the reaction of a drug with a cell line according to a first embodiment of the present invention;

FIG. 7 is a schematic structural diagram of a model for predicting the reaction of drugs with cell lines according to a second embodiment of the present invention;

FIG. 8 is a schematic structural diagram of an embodiment of the apparatus for predicting the reaction between a drug and a cell line according to the present invention;

FIG. 9 is a schematic structural diagram of a computer-readable storage medium according to the present invention.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The present invention will be described in detail below with reference to the accompanying drawings and examples.

Please refer to FIG. 1, which is a schematic flow chart of a first embodiment of the method for predicting drug-cell line response according to the present invention. To take into account the relationship between the cell line gene data and the drug molecule data, the model adopts a Deep part and a Cross part. The Deep part is a DNN network module, a structure often used for deep combination of features. A DNN module essentially performs linear weighting of vectors. Because no artificial constraints are imposed on it, it easily overfits the training set and therefore lacks sufficient generalization capability in the prediction task. The present application therefore adds the Cross part, which mainly performs cross processing between the cell line gene data and the drug molecule data so that new features are generated from their interaction. This gives the model better generalization capability and makes the prediction result more accurate. Specifically, as shown in FIG. 1, this embodiment includes:

Step S11: acquiring cell line gene data and drug molecule data.

Specifically, cell line gene data and drug molecule data are obtained. In one embodiment, the gene data corresponding to a cell line can be looked up by the cell line's name. The molecular representation or fingerprint representation corresponding to a drug can be looked up by the drug's name and used as the drug molecule data, so that the model can process it.
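Name-based acquisition amounts to two dictionary lookups. The sketch below assumes in-memory tables keyed by cell line name and by drug name; the table names, keys and values are hypothetical placeholders. A real system would instead query a cell line genomics resource and a drug structure source.

```python
from typing import Dict, List, Tuple

# Hypothetical lookup tables; every key and value here is a made-up placeholder.
CELL_LINE_GENES: Dict[str, List[float]] = {
    "CELL_A": [1.0, 0.0, 1.0, 1.0],  # toy gene feature vector
}
DRUG_FINGERPRINTS: Dict[str, List[int]] = {
    "DRUG_X": [0, 1, 1, 0, 1, 0, 0, 1],  # toy binary molecular fingerprint
}


def acquire_inputs(cell_line: str, drug: str) -> Tuple[List[float], List[int]]:
    """Return (cell line gene data, drug molecule data) looked up by name."""
    try:
        genes = CELL_LINE_GENES[cell_line]
    except KeyError:
        raise KeyError(f"no gene data for cell line {cell_line!r}") from None
    try:
        fingerprint = DRUG_FINGERPRINTS[drug]
    except KeyError:
        raise KeyError(f"no molecular fingerprint for drug {drug!r}") from None
    return genes, fingerprint


genes, fp = acquire_inputs("CELL_A", "DRUG_X")
print(len(genes), len(fp))  # 4 8
```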

Step S12: performing first feature extraction processing on the cell line gene data and the drug molecule data to obtain a first feature sequence.

Specifically, in this embodiment, a Deep network module performs the first feature extraction processing on the cell line gene data and the drug molecule data to obtain the first feature sequence. In one embodiment, the Deep network module performs global feature extraction processing on the cell line gene data and the drug molecule data to obtain the first feature sequence.

Referring to FIG. 2, FIG. 2 is a schematic flow chart of an embodiment of step S12, which includes:

Step S121: performing feature extraction processing on the cell line gene data to obtain a first sub-feature of a first dimension, and performing feature extraction processing on the drug molecule data to obtain a second sub-feature of the first dimension.

Referring to FIG. 5, in one embodiment, a first multilayer perceptron may be used to perform feature extraction processing on the cell line gene data to obtain the first sub-feature of the first dimension. A second multilayer perceptron may be used to perform feature extraction processing on the drug molecule data to obtain the second sub-feature of the first dimension.

Specifically, the cell line gene data are input into the first multilayer perceptron, which processes them to obtain the first sub-feature of the first dimension. The drug molecule data are input into the second multilayer perceptron, which processes them to obtain the second sub-feature of the first dimension.
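The two per-input perceptrons can be sketched as small fully connected networks that project inputs of different lengths onto the same "first dimension". This is a sketch under assumptions. The layer sizes (6 genes, 10 fingerprint bits, a hidden width of 8, `FIRST_DIM = 4`), the ReLU activation, the random initialization and the toy inputs are all hypothetical and untrained.

```python
import random
from typing import List

Vector = List[float]


class MLP:
    """Minimal fully connected network: linear layers with ReLU between them."""

    def __init__(self, sizes: List[int], seed: int = 0) -> None:
        rng = random.Random(seed)
        # One (weights, biases) pair per layer; weights[j][i] links input i to output j.
        self.layers = [
            ([[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)],
             [0.0] * n_out)
            for n_in, n_out in zip(sizes[:-1], sizes[1:])
        ]

    def __call__(self, x: Vector) -> Vector:
        n_in = len(self.layers[0][0][0])
        if len(x) != n_in:
            raise ValueError(f"expected {n_in} inputs, got {len(x)}")
        for k, (w, b) in enumerate(self.layers):
            x = [sum(wi * xi for wi, xi in zip(row, x)) + bj for row, bj in zip(w, b)]
            if k < len(self.layers) - 1:  # ReLU on hidden layers only
                x = [max(0.0, v) for v in x]
        return x


FIRST_DIM = 4  # hypothetical size of the "first dimension"
cell_mlp = MLP([6, 8, FIRST_DIM], seed=1)   # first multilayer perceptron
drug_mlp = MLP([10, 8, FIRST_DIM], seed=2)  # second multilayer perceptron

cell_genes = [1.0, 0.0, 1.0, 1.0, 0.0, 1.0]                  # toy gene data
drug_fp = [0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0]  # toy fingerprint
first_sub = cell_mlp(cell_genes)
second_sub = drug_mlp(drug_fp)
print(len(first_sub), len(second_sub))  # 4 4
```

Giving both perceptrons the same output size is what lets the two sub-features be spliced or paired element by element in the next step.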

Step S122: splicing the first sub-feature of the first dimension and the second sub-feature of the first dimension to obtain the first feature sequence.

Specifically, with reference to FIG. 5, the first sub-feature of the first dimension and the second sub-feature of the first dimension are spliced to obtain the first feature sequence. In one embodiment, the splicing may place one sub-feature entirely before or after the other, or it may pair the elements of the two sub-features one-to-one. For example, suppose the first sub-feature of the first dimension includes A, B, C, D and the second sub-feature of the first dimension includes a, b, c, d. Placing one before the other gives A, B, C, D, a, b, c, d or a, b, c, d, A, B, C, D. Splicing in one-to-one correspondence gives A, a, B, b, C, c, D, d or a, A, b, B, c, C, d, D. The splicing order is not specifically limited.
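Both splicing orders in the example can be expressed as simple list operations on the A, B, C, D and a, b, c, d sub-features. In this sketch, "splicing" is read as concatenation (front-and-back) or element interleaving (one-to-one). The function names are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def splice_front_back(first: Sequence[T], second: Sequence[T],
                      first_in_front: bool = True) -> List[T]:
    """Place one sub-feature entirely before the other (concatenation)."""
    return list(first) + list(second) if first_in_front else list(second) + list(first)


def splice_interleaved(first: Sequence[T], second: Sequence[T],
                       first_leads: bool = True) -> List[T]:
    """Pair elements one-to-one and alternate them."""
    if len(first) != len(second):
        raise ValueError("one-to-one splicing needs sub-features of equal length")
    out: List[T] = []
    for a, b in zip(first, second):
        out.extend((a, b) if first_leads else (b, a))
    return out


cell = ["A", "B", "C", "D"]
drug = ["a", "b", "c", "d"]
print(splice_front_back(cell, drug))          # ['A', 'B', 'C', 'D', 'a', 'b', 'c', 'd']
print(splice_front_back(cell, drug, False))   # ['a', 'b', 'c', 'd', 'A', 'B', 'C', 'D']
print(splice_interleaved(cell, drug))         # ['A', 'a', 'B', 'b', 'C', 'c', 'D', 'd']
print(splice_interleaved(cell, drug, False))  # ['a', 'A', 'b', 'B', 'c', 'C', 'd', 'D']
```

Both orders yield a vector twice the first dimension's length. A downstream perceptron only needs the order to stay fixed between training and prediction.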

In one embodiment, as shown in FIG. 3, step S122 includes:

Step S1221: splicing the first sub-feature of the first dimension and the second sub-feature of the first dimension.

Specifically, the first sub-feature of the first dimension and the second sub-feature of the first dimension are spliced.

Step S1222: performing feature extraction processing on the spliced first and second sub-features by using a third multilayer perceptron to obtain the first feature sequence.

The spliced first and second sub-features of the first dimension are input into the third multilayer perceptron, which processes them to obtain the first feature sequence. Specifically, the third multilayer perceptron performs feature extraction processing on the spliced sub-features and outputs the first feature sequence as one-dimensional data.
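The third perceptron reduces the spliced vector to a single value. The sketch below uses one hidden layer and a one-unit output layer. The hidden width (16), the ReLU activation, the random initialization and the toy spliced input are hypothetical.

```python
import random
from typing import List, Tuple


def dense(x: List[float], w: List[List[float]], b: List[float], relu: bool) -> List[float]:
    """One fully connected layer: y[j] = sum_i w[j][i] * x[i] + b[j], optional ReLU."""
    y = [sum(wi * xi for wi, xi in zip(row, x)) + bj for row, bj in zip(w, b)]
    return [max(0.0, v) for v in y] if relu else y


def random_layer(n_in: int, n_out: int,
                 rng: random.Random) -> Tuple[List[List[float]], List[float]]:
    """Random weights (n_out rows of n_in values) and zero biases."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out


rng = random.Random(0)
w1, b1 = random_layer(8, 16, rng)  # hidden layer (size is an assumption)
w2, b2 = random_layer(16, 1, rng)  # output layer with a single unit

spliced = [0.3, -0.1, 0.8, 0.0, 0.5, 0.2, -0.4, 0.9]  # toy spliced sub-features (4 + 4)
hidden = dense(spliced, w1, b1, relu=True)
first_feature_sequence = dense(hidden, w2, b2, relu=False)
print(len(first_feature_sequence))  # 1
```

The output layer has no activation, so the value can be negative or positive. This suits a regression target such as a predicted IC50.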

Please continue to refer to fig. 1:

Step S13: performing second feature extraction processing on the cell line gene data and the drug molecule data to obtain a second feature sequence.

Specifically, in this embodiment, a Cross network module performs the second feature extraction processing on the cell line gene data and the drug molecule data to obtain the second feature sequence. In one embodiment, the Cross network module performs local feature extraction processing on the cell line gene data and the drug molecule data to obtain the second feature sequence. As shown in FIG. 4, step S13 includes:

Step S131: mapping the cell line gene data to obtain a first sub-feature of a second dimension, and mapping the drug molecule data to obtain a second sub-feature of the second dimension.

In one embodiment, the cell line gene data are mapped by a first embedding layer to obtain the first sub-feature of the second dimension, and the drug molecule data are mapped by a second embedding layer to obtain the second sub-feature of the second dimension. Specifically, the cell line gene data are input into the first embedding layer, which maps them to the first sub-feature of the second dimension. The drug molecule data are input into the second embedding layer, which maps them to the second sub-feature of the second dimension. The resulting first and second sub-features of the second dimension are dense, non-orthogonal feature vectors of the same dimension. This makes the subsequent computation convenient and gives the vectors a more meaningful interpretation.
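An embedding layer can be sketched as a lookup table that maps integer ids to dense vectors of a fixed size. Sharing that size between the two tables is what makes the later dot products well defined. The sketch assumes each input has already been reduced to a short list of ids, for example indices of selected gene features or active fingerprint bits. That encoding, the table sizes, `SECOND_DIM = 4` and the Gaussian initialization are hypothetical.

```python
import random
from typing import List


class Embedding:
    """Lookup table mapping integer ids to dense vectors of a fixed size."""

    def __init__(self, num_ids: int, dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.table = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(num_ids)]

    def __call__(self, ids: List[int]) -> List[List[float]]:
        # Return copies so callers cannot modify the table by accident.
        return [list(self.table[i]) for i in ids]


SECOND_DIM = 4  # shared embedding size ("second dimension"), an assumption
cell_embedding = Embedding(num_ids=100, dim=SECOND_DIM, seed=1)  # first embedding layer
drug_embedding = Embedding(num_ids=256, dim=SECOND_DIM, seed=2)  # second embedding layer

cell_ids = [3, 17]  # toy: ids of two cell line gene features
drug_ids = [5, 42]  # toy: ids of two active fingerprint bits
cell_vecs = cell_embedding(cell_ids)
drug_vecs = drug_embedding(drug_ids)
print([len(v) for v in cell_vecs], [len(v) for v in drug_vecs])  # [4, 4] [4, 4]
```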

Step S132: Associate the first sub-feature of the second dimension with the second sub-feature of the second dimension to obtain associated features.

Specifically, referring to fig. 5, assume that mapping the cell line gene data yields two first sub-features of the second dimension (first sub-feature 1 and first sub-feature 2), and mapping the drug molecule data yields two second sub-features of the second dimension (second sub-feature 1 and second sub-feature 2). The sub-features may be associated in one-to-one correspondence: first sub-feature 1 is associated with second sub-feature 1, and first sub-feature 2 is associated with second sub-feature 2, which yields a plurality of associated features.

In one embodiment, the association is performed by taking the dot product of each first sub-feature of the second dimension with its corresponding second sub-feature of the second dimension, so each associated feature is a dot product. For example, the dot product of first sub-feature 1 and second sub-feature 1 of the second dimension is calculated, and the dot product of first sub-feature 2 and second sub-feature 2 of the second dimension is calculated.
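The one-to-one association of fig. 5 reduces to a batched dot product between the j-th cell line sub-feature and the j-th drug sub-feature. The sketch assumes a (samples, k, d) layout for the embedding outputs; the function name `associate` and the toy numbers are illustrative only.

```python
import numpy as np

def associate(e_cell, e_drug):
    """Dot product of the j-th cell line sub-feature with the j-th drug sub-feature.

    e_cell, e_drug: arrays of shape (n, k, d) -> associated features of shape (n, k).
    """
    return np.einsum("nkd,nkd->nk", e_cell, e_drug)

e_cell = np.array([[[1.0, 2.0], [0.5, 0.0]]])  # one sample, k = 2 sub-features, d = 2
e_drug = np.array([[[3.0, 1.0], [4.0, 2.0]]])
assoc = associate(e_cell, e_drug)  # array([[5., 2.]])
```

Here 1·3 + 2·1 = 5 for sub-feature 1 and 0.5·4 + 0·2 = 2 for sub-feature 2, one scalar per matched pair.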

Step S133: Extract, from the associated features, those whose cell line-drug interaction is greater than a threshold.

Specifically, the associated features whose interaction between the cell line and the drug is greater than the threshold are extracted. In one embodiment, a MaxPooling operation is applied to the associated features in the model to perform this extraction.

Step S134: Sum the extracted associated features whose cell line-drug interaction is greater than the threshold to obtain a second feature sequence.

Specifically, the extracted associated features whose cell line-drug interaction is greater than the threshold are added together to obtain the second feature sequence. Note that the second feature sequence has the same dimension as the first feature sequence; in one embodiment, both the first and the second feature sequences are one-dimensional feature vectors.
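Steps S133 and S134 together can be sketched as max pooling, a threshold test, and a per-sample sum. The pooling window, the threshold value, and the choice to zero out pooled values at or below the threshold are assumptions made for illustration; the source states only that a MaxPooling operation extracts the strong interactions and that the extracted features are summed. The result holds one value per sample, matching the shape of the first feature sequence.

```python
import numpy as np

def max_pool_threshold(assoc, pool, threshold):
    """Max-pool associated features (n, k) over non-overlapping windows of size `pool`,
    then keep only pooled values greater than `threshold` (the others become 0)."""
    n, k = assoc.shape
    k_trim = (k // pool) * pool  # drop an incomplete tail window, if any
    pooled = assoc[:, :k_trim].reshape(n, -1, pool).max(axis=2)
    return np.where(pooled > threshold, pooled, 0.0)

def second_feature_sequence(assoc, pool=2, threshold=0.5):
    """Sum the retained interactions of each sample: (n, k) -> (n,)."""
    return max_pool_threshold(assoc, pool, threshold).sum(axis=1)

assoc = np.array([[0.2, 1.5, -0.3, 0.9],
                  [2.0, 0.1, 0.4, 0.3]])
selected = max_pool_threshold(assoc, pool=2, threshold=0.5)  # [[1.5, 0.9], [2.0, 0.0]]
second_seq = second_feature_sequence(assoc)                  # [2.4, 2.0]
```

In the second sample the window (0.4, 0.3) pools to 0.4, which is not above the threshold, so it contributes nothing to the sum.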

Please continue to refer to fig. 1:

Step S14: Combine the first feature sequence with the second feature sequence to obtain a drug-cell line response prediction result.

In one embodiment, the first feature sequence and the second feature sequence are added in one-to-one correspondence to obtain the drug-cell line response prediction result.

Specifically, each drug-cell line pair has its own prediction result, so when the first and second feature sequences are added, their entries may be matched by the corresponding sample names, for example by drug name or cell line name; the sum is the predicted drug-cell line response. The prediction result is the predicted IC50 value of the drug-cell line response. IC50 (half maximal inhibitory concentration) is the concentration of a drug or other substance (an inhibitor) needed to inhibit a given biological process, or a component of that process such as an enzyme, cellular receptor, or microorganism, by half.
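Adding the two sequences by sample name can be sketched with dictionaries keyed by (drug, cell line). The key format, the function name `combine_by_sample`, and the values are hypothetical toy numbers, not measured IC50 values. Raising an error on unmatched keys is a design choice of this sketch, so that a sample missing from one branch is not silently dropped.

```python
def combine_by_sample(first_seq, second_seq):
    """Add first and second feature sequence values that share a (drug, cell line) key."""
    unmatched = first_seq.keys() ^ second_seq.keys()  # keys present in only one dict
    if unmatched:
        raise KeyError(f"samples present in only one sequence: {sorted(unmatched)}")
    return {key: first_seq[key] + second_seq[key] for key in first_seq}

first = {("drug_A", "cell_1"): 1.25, ("drug_B", "cell_2"): 0.5}
second = {("drug_B", "cell_2"): 0.25, ("drug_A", "cell_1"): 0.5}
predicted_ic50 = combine_by_sample(first, second)
# {('drug_A', 'cell_1'): 1.75, ('drug_B', 'cell_2'): 0.75}
```

Matching by key rather than by position means the two branches do not have to emit samples in the same order.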

In this way, the method predicts the response between the cell line gene data and the drug compound data and thereby obtains the drug-cell line response prediction result.

Fig. 6 is a schematic structural diagram of a drug-cell line response prediction model according to an embodiment of the present invention. The model includes a first feature extraction network 51, a second feature extraction network 52, and a feature combination network 53.

The first feature extraction network 51 performs first feature extraction processing on the cell line gene data and the drug molecule data to obtain a first feature sequence. The second feature extraction network 52 performs second feature extraction processing on the cell line gene data and the drug molecule data to obtain a second feature sequence. The feature combination network 53 is connected to the first feature extraction network 51 and the second feature extraction network 52 and combines the first feature sequence with the second feature sequence to obtain a drug-cell line response prediction result.

In one embodiment, the first feature extraction network 51 performs global feature extraction on the cell line gene data and the drug molecule data to obtain the first feature sequence, the second feature extraction network 52 performs local feature extraction on the same data to obtain the second feature sequence, and the feature combination network 53 adds the first and second feature sequences in one-to-one correspondence to obtain the drug-cell line response prediction result.

Referring to fig. 7, the first feature extraction network 51 includes a first multilayer perceptron 511, a second multilayer perceptron 512, and a splicing network 513.

The first multilayer perceptron 511 performs feature extraction on the cell line gene data to obtain a first sub-feature of a first dimension; the second multilayer perceptron 512 performs feature extraction on the drug molecule data to obtain a second sub-feature of the first dimension.

The splicing network 513 is connected to the first multilayer perceptron 511 and the second multilayer perceptron 512 and splices the first sub-feature of the first dimension with the second sub-feature of the first dimension; the spliced result is then used to obtain the first feature sequence. In one embodiment, the splicing may place one sub-feature entirely in front of the other, or may pair the elements of the two sub-features in one-to-one correspondence. For example, if the first sub-feature of the first dimension is A, B, C, D and the second sub-feature of the first dimension is a, b, c, d, end-to-end splicing gives A, B, C, D, a, b, c, d or a, b, c, d, A, B, C, D, and one-to-one splicing gives A, a, B, b, C, c, D, d or a, A, b, B, c, C, d, D. The splicing order is not specifically limited.
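The two splicing patterns in the example above can be sketched with plain Python lists. The function names are hypothetical; on numeric arrays the same patterns correspond to concatenation and element-wise interleaving. Swapping the arguments gives the other two orders listed.

```python
def splice_end_to_end(first, second):
    """Place the second sub-feature after the first, e.g. A, B, C, D, a, b, c, d."""
    return list(first) + list(second)

def splice_interleaved(first, second):
    """Pair elements one-to-one, e.g. A, a, B, b, C, c, D, d."""
    if len(first) != len(second):
        raise ValueError("one-to-one splicing needs sub-features of equal length")
    return [x for pair in zip(first, second) for x in pair]

cell_feat = ["A", "B", "C", "D"]  # first sub-feature of the first dimension
drug_feat = ["a", "b", "c", "d"]  # second sub-feature of the first dimension
print(splice_end_to_end(cell_feat, drug_feat))   # ['A', 'B', 'C', 'D', 'a', 'b', 'c', 'd']
print(splice_interleaved(cell_feat, drug_feat))  # ['A', 'a', 'B', 'b', 'C', 'c', 'D', 'd']
print(splice_interleaved(drug_feat, cell_feat))  # ['a', 'A', 'b', 'B', 'c', 'C', 'd', 'D']
```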

The first feature extraction network 51 further includes a third multilayer perceptron 514, which is connected to the splicing network 513 and performs feature extraction on the spliced first and second sub-features to obtain the first feature sequence. In one embodiment, the first feature sequence is one-dimensional data.

The second feature extraction network 52 includes a first embedding layer 521, a second embedding layer 522, a feature association network 523, a feature extraction network 524, and a computing network 525.

The first embedding layer 521 maps the cell line gene data to a first sub-feature of a second dimension, and the second embedding layer 522 maps the drug molecule data to a second sub-feature of the second dimension. The feature association network 523 is connected to the first embedding layer 521 and the second embedding layer 522 and associates the first sub-feature of the second dimension with the second sub-feature of the second dimension to obtain associated features. Specifically, the association may be performed by taking dot products, so each associated feature is a dot product; for example, the dot product of first sub-feature 1 and second sub-feature 1 of the second dimension is calculated, and the dot product of first sub-feature 2 and second sub-feature 2 of the second dimension is calculated.

The feature extraction network 524 is connected to the feature association network 523 and extracts, from the associated features, those whose cell line-drug interaction is greater than a threshold. Specifically, the feature extraction network 524 may be a MaxPooling layer that applies a MaxPooling operation to the associated features to extract those whose cell line-drug interaction is greater than the threshold.

The computing network 525 is connected to the feature extraction network 524 and sums the extracted associated features whose cell line-drug interaction is greater than the threshold to obtain the second feature sequence.

The drug-cell line response prediction model provided by the invention combines a Deep network module and a Cross network module to predict the response between the cell line gene data and the drug compound data, which makes the resulting drug-cell line response prediction more accurate.

In one embodiment, the drug-cell line response prediction model is obtained by training a deep learning regression model. The training includes the following steps:

(1) Acquire a training sample set, which includes the gene data of cell lines, the compound data of drugs, and annotations of the response between the compound data of each drug and the gene data of each cell line.

Specifically, in one embodiment, the gene data of the cell line, the compound data of the drug, and the response label between them are obtained, where the response label is the annotated IC50 value for the drug and the cell line.

In one embodiment, after the gene data of the cell lines, the compound data of the drugs, and the response labels are obtained, the data may be preprocessed, for example by checking for and eliminating duplicate records, or by sorting and integrating the data. The main purpose of preprocessing is to make the acquired data easier for the model to recognize. Preprocessing may be performed manually or, in another embodiment, automatically by a computer; this is not specifically limited.
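Eliminating repeated records can be sketched as an order-preserving de-duplication. The record layout (drug name, cell line name, labeled IC50), the function name, and the toy rows are hypothetical. Only exact duplicates are removed; how conflicting labels for the same drug-cell line pair should be reconciled is not specified in the source and is left out of the sketch.

```python
def deduplicate(records):
    """Drop exact duplicate records, keeping the first occurrence and the original order."""
    seen = set()
    unique = []
    for rec in records:
        if rec not in seen:
            seen.add(rec)
            unique.append(rec)
    return unique

rows = [
    ("drug_A", "cell_1", 2.3),
    ("drug_B", "cell_1", 0.7),
    ("drug_A", "cell_1", 2.3),  # exact repeat of the first row
]
unique_rows = deduplicate(rows)
# [('drug_A', 'cell_1', 2.3), ('drug_B', 'cell_1', 0.7)]
```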

In one embodiment, the name of a drug is obtained, the molecular representation and/or fingerprint representation of the drug is determined from the name, and the molecular representation and/or fingerprint representation is integrated into the compound data of the drug. Likewise, the name of a cell line is obtained, and the gene expression data, copy number variation data, and point mutation data of the cell line are determined from the name. These data are subjected to at least filling, normalization, and one-hot encoding, and the processed gene expression, copy number variation, and point mutation data are then integrated into the gene data of the cell line.

Note that the drug name in a typical drug label identifies the drug but cannot be recognized by the model. The drug name therefore has to be matched to a molecular representation or fingerprint representation of the drug so that the model can recognize the drug and make predictions; that is, the molecular representation and/or fingerprint representation of each drug is looked up by drug name.

Specifically, after the drug name is obtained, the corresponding SMILES (Simplified Molecular Input Line Entry System) representation may be looked up in PubChem, a database of small organic molecules and their biological activities, or the fingerprint representation of the drug may be looked up instead.

The obtained gene expression data, copy number variation data, and point mutation data may be subjected to filling, normalization, one-hot encoding, and similar processing. For example, if the obtained gene expression data are incomplete, the missing values may be filled in according to known rules.
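The filling, normalization, and one-hot encoding steps can be sketched with NumPy. The specific rules are assumptions, since the source does not name them: missing values are filled with the per-gene column mean, each gene is z-score normalized, and point mutation status is one-hot encoded over a hypothetical category list. The processed blocks are then stacked column-wise into the cell line gene data.

```python
import numpy as np

def fill_and_normalize(x):
    """Fill NaNs with the column (gene) mean, then z-score each column."""
    x = np.array(x, dtype=float)
    col_mean = np.nanmean(x, axis=0)
    x = np.where(np.isnan(x), col_mean, x)  # assumed filling rule: column mean
    std = x.std(axis=0)
    std[std == 0] = 1.0                     # avoid division by zero for constant genes
    return (x - x.mean(axis=0)) / std

def one_hot(labels, categories):
    """One-hot encode categorical values such as a gene's point mutation status."""
    index = {c: i for i, c in enumerate(categories)}
    out = np.zeros((len(labels), len(categories)))
    out[np.arange(len(labels)), [index[lab] for lab in labels]] = 1.0
    return out

expr = [[1.0, np.nan], [3.0, 4.0], [5.0, 8.0]]  # 3 cell lines x 2 genes, one value missing
expr_norm = fill_and_normalize(expr)
mut = one_hot(["wild_type", "mutant", "wild_type"], ["wild_type", "mutant"])
cell_features = np.hstack([expr_norm, mut])     # integrated gene data of the cell lines
```

Copy number variation values, being continuous, could pass through `fill_and_normalize` in the same way before stacking.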

(2) Input the gene data of the cell lines and the compound data of the drugs into the deep learning regression model for prediction, obtaining a predicted response between the compound data of each drug and the gene data of each cell line.

Specifically, the compound data of the drug and the gene data of the cell line are input into the deep learning regression model, which predicts the response between them.

In one embodiment, the training sample set may be input into the deep learning regression model in batches, yielding predicted responses of the compound data of multiple drugs to the gene data of the cell lines. Alternatively, the whole training sample set may be input into the deep learning regression model at once to obtain the predicted responses.

(3) Iteratively train the deep learning regression model using the predicted responses and the response labels between the compound data of the drugs and the gene data of the cell lines, obtaining the drug-cell line response prediction model.

Specifically, in one embodiment, after the deep learning regression model predicts the response between the compound data of the drug and the gene data of the cell line, the model is iteratively trained using this prediction and the corresponding response label, yielding the drug-cell line response prediction model. Here, the predicted response is the predicted IC50 value of the drug-cell line response.

In one embodiment, the difference between the predicted response and the response label is calculated, and the deep learning regression model is iteratively trained according to this difference to obtain the drug-cell line response prediction model.
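The difference used for training can be sketched as the per-sample gap between predicted and labeled IC50 values, summarized as a mean squared error. Using MSE as the summary is an assumption; the source says only that a difference is computed. The function name and the numbers are illustrative.

```python
import numpy as np

def ic50_difference(pred, label):
    """Per-sample difference (prediction minus label) and its mean squared error."""
    diff = np.asarray(pred, dtype=float) - np.asarray(label, dtype=float)
    return diff, float(np.mean(diff ** 2))

diff, mse = ic50_difference([1.0, 2.5, 0.0], [1.5, 2.0, 0.0])
# diff -> [-0.5, 0.5, 0.0]; mse -> (0.25 + 0.25 + 0.0) / 3
```

Squaring keeps positive and negative errors from cancelling, which a plain mean of the differences would not.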

Specifically, in one embodiment in which the training sample set is input into the deep learning regression model in batches, the first batch is predicted, the difference between the predicted responses of the first batch and their response labels is calculated, and the model is trained according to that difference. The second batch is then input into the model, the difference between its predicted responses and response labels is calculated, and the model is trained again according to this second difference. This repeats until all data in the training sample set have been learned, yielding the drug-cell line response prediction model.

Alternatively, in another embodiment, all data in the training sample set are input into the deep learning regression model at once to obtain the predicted responses, the difference between the predicted responses and the response labels is calculated, and the model is iteratively trained according to this difference to obtain the drug-cell line response prediction model.

Specifically, in one embodiment, when the deep learning regression model is iteratively trained according to the difference, the difference may be expressed through a loss function and the model trained until the loss converges, or the model may be iteratively trained on the difference by backpropagation, to obtain the drug-cell line response prediction model.
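The batch-wise procedure of steps (2) and (3) can be sketched as mini-batch gradient descent on the mean squared difference. To keep the example short and self-contained, a linear model stands in for the deep learning regression model and its gradients are written out by hand; in the actual network the gradients would be obtained by backpropagation, typically through an automatic differentiation framework. The batch size, learning rate, epoch count, function name `train`, and the synthetic data are all assumptions.

```python
import numpy as np

def train(x, y, batch_size=32, epochs=200, lr=0.05, seed=0):
    """Mini-batch gradient descent on MSE for a linear stand-in model y_hat = x @ w + b."""
    rng = np.random.default_rng(seed)
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(epochs):
        order = rng.permutation(len(x))  # shuffle samples each epoch
        for start in range(0, len(x), batch_size):
            idx = order[start:start + batch_size]
            xb, yb = x[idx], y[idx]
            diff = xb @ w + b - yb       # prediction minus labeled value for this batch
            # Gradients of mean(diff ** 2) with respect to w and b
            grad_w = 2.0 * xb.T @ diff / len(idx)
            grad_b = 2.0 * diff.mean()
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

rng = np.random.default_rng(1)
x = rng.normal(size=(256, 5))                    # synthetic stand-in features
true_w = np.array([0.5, -1.0, 0.0, 2.0, 0.3])
y = x @ true_w + 0.1                             # synthetic, noise-free targets
w, b = train(x, y)
```

Each inner iteration corresponds to one batch in the embodiment above: predict, compute the difference against the labels, update, then move to the next batch until the whole sample set has been used.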

In this example, a drug-cell line response prediction model is trained using the above method. Compared with the prior art, the method predicts the response between a drug and a cell line without extensive manual experimentation, which reduces the consumption of labor and materials, and model-based prediction improves efficiency. In addition, the method of this embodiment trains the model on both the compound data of the drug and the gene data of the cell line, which overcomes the shortcoming of existing approaches that ignore the relationships between genes, so that the response between a cell line and a drug can be predicted well.

In this embodiment, for a cell line or drug that does not appear in the existing data, the model can predict the interaction between the gene data of the cell line and the compound data of the drug from their respective components. Through the crossing of cell line and drug features, the model analyzes the relationships among the genes within a cell line and also takes the relationships between those genes and the drug compound into account. Compared with traditional methods, this approach combines the relationships among cell line genes with the relationships between cell line genes and the drug compound, and the accuracy of its predictions is far better than that of traditional probability-based methods.

Referring to fig. 8, which is a schematic structural diagram of an embodiment of the drug-cell line response prediction apparatus according to the present invention, the apparatus includes a processor 101 and a memory 102 connected to each other.

The memory 102 stores program instructions for implementing the drug-cell line response prediction method of any of the above embodiments.

The processor 101 is configured to execute program instructions stored in the memory 102.

The processor 101 may also be referred to as a Central Processing Unit (CPU). The processor 101 may be an integrated circuit chip with signal processing capabilities. The processor 101 may also be a general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. A general-purpose processor may be a microprocessor or any conventional processor.

The memory 102 may be a memory module, a TF card, or the like, and may store all information in the drug-cell line response prediction apparatus, including input raw data, computer programs, intermediate results, and final results. It stores and retrieves information at locations specified by the controller. The memory gives the apparatus its storage capability and ensures its normal operation. By purpose, the storage of the apparatus may be divided into primary (internal) storage and secondary (external) storage. External storage is usually a magnetic medium, an optical disc, or the like, and can retain information for a long time. Internal storage refers to the storage components on the mainboard that hold the data and programs currently being executed; it stores them only temporarily, and its contents are lost when the power is turned off or cut.

In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a module or a unit is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment.

In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented as a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the essential part of the technical solution of the present application, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied as a software product. The software product is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a system server, a network device, or the like) or a processor to execute all or part of the steps of the methods of the embodiments of the present application.

Referring to fig. 9, which is a schematic structural diagram of a computer-readable storage medium according to the present invention, the storage medium stores a program file 201 capable of implementing all of the above drug-cell line response prediction methods. The program file 201 may be stored in the storage medium as a software product and includes instructions that cause a computer device (which may be a personal computer, a server, or a network device) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The storage device includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc, or terminal devices such as a computer, a server, a mobile phone, or a tablet.

The above description is only an embodiment of the present invention and is not intended to limit its scope. All equivalent structural or process modifications made using the contents of this specification and the accompanying drawings, whether applied directly or indirectly in other related technical fields, are likewise included in the scope of the present invention.
