Single-sample ceRNA network identification method, device, electronic equipment and storage medium

Document No.: 1906617. Publication date: 2021-11-30.

Reading note: This technology, "Single-sample ceRNA network identification method, device, electronic equipment and storage medium", was designed by 张俊鹏 (Zhang Junpeng), 赵春文 (Zhao Chunwen), 李司婧 (Li Sijing) and 杨燕婷 (Yang Yanting) on 2021-09-06. Abstract: The invention provides a single-sample ceRNA network identification method, a single-sample ceRNA network identification device, electronic equipment and a storage medium, and relates to the technical field of gene identification. The single-sample ceRNA network identification method comprises: identifying a plurality of ceRNA competition relationship pairs corresponding to each sample in the matched samples according to the transcriptome matrix of the miRNAs of the matched samples, the transcriptome matrices of the first target gene and the second target gene of the matched samples, and prior miRNA-target gene regulatory relationship data; and fusing the plurality of ceRNA competition relationship pairs corresponding to each sample to obtain a ceRNA network corresponding to each sample. Because the single-sample ceRNA network is identified from the transcriptome matrices of the miRNAs, the first target gene and the second target gene of the matched samples together with the prior miRNA-target gene regulatory relationship data, it helps reveal the gene regulation mechanism within a single sample and reflects the competition relationship between miRNA target genes in a single sample.

1. A method for identifying a single-sample competing endogenous RNA (ceRNA) network, characterized by comprising the following steps:

obtaining transcriptome matrices of the miRNAs, the first target gene and the second target gene of matched samples, wherein the matched samples comprise a plurality of samples, and each sample comprises a plurality of miRNAs, first target genes and second target genes;

acquiring prior miRNA-target gene regulatory relationship data related to the matched samples according to the matched samples;

identifying a plurality of ceRNA competition relationship pairs corresponding to each sample in the matched samples according to the transcriptome matrix of the miRNAs of the matched samples, the transcriptome matrices of the first target gene and the second target gene of the matched samples, and the prior miRNA-target gene regulatory relationship data;

and fusing the plurality of ceRNA competition relationship pairs corresponding to each sample to obtain a ceRNA network corresponding to each sample.

2. The method of claim 1, wherein obtaining a transcriptome matrix of the miRNA, the first target gene, and the second target gene of the matched sample comprises:

and extracting the transcriptome matrixes of the miRNA, the first target gene and the second target gene with the same biological characteristics from the given transcriptome matrixes of the miRNA, the first target gene and the second target gene to obtain the transcriptome matrixes of the miRNA, the first target gene and the second target gene of the matched sample.

3. The method according to claim 1, wherein the identifying and obtaining a plurality of pairs of the ceRNA competition relationship corresponding to each of the samples in the matching sample according to the transcriptome matrix of the miRNA in the matching sample, the transcriptome matrices of the first target gene and the second target gene in the matching sample, and the prior miRNA and target gene regulation relationship data comprises:

obtaining a significance p value of a shared miRNA, a positive correlation significance p value between a first target gene and a second target gene and a sensitivity correlation significance p value between the first target gene and the second target gene according to the transcriptome matrix of the miRNA of the sample, the transcriptome matrices of the first target gene and the second target gene of the sample and the regulation and control relation data of the prior miRNA and the target genes;

determining that the first target gene and the second target gene are a ceRNA competition relationship pair of the sample when the significance p-value of the shared miRNAs, the positive correlation significance p-value between the first target gene and the second target gene, and the sensitivity correlation significance p-value between the first target gene and the second target gene all meet preset conditions.

4. The method according to claim 3, wherein the significance p-value of the shared miRNA, the positive correlation significance p-value between the first target gene and the second target gene, and the sensitivity correlation significance p-value between the first target gene and the second target gene all meet preset conditions, comprising:

the shared miRNA having a significance p-value of less than 0.05, a positive correlated significance p-value between the first and second target genes of less than 0.05, and a sensitivity correlated significance p-value between the first and second target genes of less than 0.05.

5. The method of any one of claims 1 to 4, wherein, after obtaining the single-sample ceRNA network, the method further comprises:

analyzing and verifying the single-sample ceRNA network by at least one of the following analysis modes: single-sample ceRNA network similarity analysis, association analysis with a target disease, and sample cluster analysis.

6. A single-sample ceRNA network recognition device, comprising:

an acquisition module, configured to acquire transcriptome matrices of the miRNAs, the first target gene and the second target gene of matched samples, wherein the matched samples comprise a plurality of samples, and each sample comprises a plurality of miRNAs, first target genes and second target genes;

the acquisition module is also used for acquiring prior miRNA and target gene regulation relation data related to the matching sample according to the matching sample;

the identification module is used for identifying and obtaining a plurality of ceRNA competition relationship pairs corresponding to each sample in the matched sample according to the transcriptome matrix of the miRNA in the matched sample, the transcriptome matrices of the first target gene and the second target gene in the matched sample and the regulation and control relationship data of the prior miRNA and the target genes;

and the fusion module is used for fusing the multiple pairs of the ceRNA competition relationship corresponding to each sample to obtain a ceRNA network corresponding to each sample.

7. The apparatus according to claim 6, wherein the obtaining module is specifically configured to extract, according to the biological characteristics of the matched sample, the transcriptome matrices of the miRNA, the first target gene, and the second target gene having the same biological characteristics from the given transcriptome matrices of the miRNA, the first target gene, and the second target gene, so as to obtain the transcriptome matrices of the miRNA, the first target gene, and the second target gene of the matched sample.

8. The apparatus according to claim 6, wherein the identification module is specifically configured to obtain a significance p-value of a shared miRNA, a positive correlation significance p-value between a first target gene and a second target gene, and a sensitivity correlation significance p-value between the first target gene and the second target gene according to the transcriptome matrix of the miRNA of the sample, the transcriptome matrices of the first target gene and the second target gene of the sample, and the a priori miRNA and target gene regulatory relationship data;

and to determine that the first target gene and the second target gene are a ceRNA competition relationship pair of the sample when the significance p-value of the shared miRNAs, the positive correlation significance p-value between the first target gene and the second target gene, and the sensitivity correlation significance p-value between the first target gene and the second target gene all meet preset conditions.

9. An electronic device comprising a processor, a storage medium and a bus, wherein the storage medium stores machine-readable instructions executable by the processor, and when the electronic device is operated, the processor communicates with the storage medium via the bus, and the processor executes the machine-readable instructions to perform the method of any one of claims 1-5.

10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, performs the method of any one of claims 1-5.

Technical Field

The invention relates to the technical field of gene identification, in particular to a single-sample ceRNA network identification method, a single-sample ceRNA network identification device, electronic equipment and a storage medium.

Background

MicroRNAs (miRNAs) are a class of conserved non-coding ribonucleic acid (RNA) molecules. They typically regulate gene expression levels by negatively regulating the messenger RNAs (mRNAs) of protein-coding genes. Although miRNAs are small molecules, they play an important role in many biological processes. The competing endogenous RNA (ceRNA) hypothesis states that different RNA transcripts compete to bind miRNAs. These competing RNA transcripts, collectively referred to as ceRNAs, comprise four major classes: protein-coding gene mRNAs, long non-coding RNAs (lncRNAs), pseudogene transcripts and circular RNAs (circRNAs). The RNA competition network they form is called a ceRNA network. Previous studies have shown that ceRNA networks are associated with many complex human diseases, including malignant tumors, which suggests that ceRNAs are potential biomarkers for the diagnosis and targeted treatment of such diseases.

Because ceRNAs are involved in many important biological processes, previous research methods have mainly identified ceRNA networks at the multi-sample level, based on the transcriptome data of a plurality of samples. In the prior art, an lncRNA-associated ceRNA network at the single-cell level can be obtained from single-cell transcriptome data.

However, existing methods do not take miRNA transcriptome data into account. Because the ceRNA network is closely related to miRNA transcriptome data, the ceRNA networks obtained by these methods cannot accurately reflect the competition relationship between miRNA target genes in a single sample.

Disclosure of Invention

The present invention aims to provide a single-sample ceRNA network identification method, apparatus, electronic device and storage medium, so as to obtain a single-sample ceRNA network that reflects the competition relationship between miRNA target genes in a single sample.

In order to achieve the above purpose, the embodiment of the present invention adopts the following technical solutions:

In a first aspect, an embodiment of the present invention provides a single-sample ceRNA network identification method, including:

obtaining transcriptome matrices of the miRNAs, the first target gene and the second target gene of matched samples, wherein the matched samples comprise a plurality of samples, and each sample comprises a plurality of miRNAs, first target genes and second target genes; acquiring prior miRNA-target gene regulatory relationship data related to the matched samples according to the matched samples; identifying a plurality of ceRNA competition relationship pairs corresponding to each sample in the matched samples according to the transcriptome matrix of the miRNAs of the matched samples, the transcriptome matrices of the first target gene and the second target gene of the matched samples, and the prior miRNA-target gene regulatory relationship data; and fusing the plurality of ceRNA competition relationship pairs corresponding to each sample to obtain a ceRNA network corresponding to each sample.

In some embodiments, obtaining a transcriptome matrix of the miRNA, the first target gene, and the second target gene that match the sample comprises: extracting the transcriptome matrixes of the miRNA, the first target gene and the second target gene with the same biological characteristics from the given transcriptome matrixes of the miRNA, the first target gene and the second target gene to obtain the transcriptome matrixes of the miRNA, the first target gene and the second target gene of the matched sample.

In some embodiments, identifying and obtaining a plurality of pairs of the ceRNA competitive relationship corresponding to each sample in the matched sample according to the transcriptome matrix of the miRNA in the matched sample, the transcriptome matrices of the first target gene and the second target gene in the matched sample, and the prior miRNA and target gene regulation relationship data includes: according to the transcriptome matrix of the miRNA of the sample, the transcriptome matrix of the first target gene and the second target gene of the sample and the regulation and control relation data of the prior miRNA and the target genes, obtaining the significance p value of the shared miRNA, the positive correlation significance p value between the first target gene and the second target gene and the sensitivity correlation significance p value between the first target gene and the second target gene. And when the significance p value of the shared miRNA, the positive correlation significance p value between the first target gene and the second target gene and the sensitivity correlation significance p value between the first target gene and the second target gene all accord with preset conditions, determining that the first target gene and the second target gene are a ceRNA competition relation pair of the sample.

In some embodiments, the significance p-value of the shared miRNAs, the positive correlation significance p-value between the first target gene and the second target gene, and the sensitivity correlation significance p-value between the first target gene and the second target gene all meeting the preset conditions comprises: the significance p-value of the shared miRNAs being less than 0.05, the positive correlation significance p-value between the first target gene and the second target gene being less than 0.05, and the sensitivity correlation significance p-value between the first target gene and the second target gene being less than 0.05.

In some embodiments, after obtaining the single-sample ceRNA network, the method further comprises: analyzing and verifying the single-sample ceRNA network by at least one of single-sample ceRNA network similarity analysis, association analysis with a target disease, and sample cluster analysis.

In a second aspect, an embodiment of the present invention further provides a single-sample ceRNA network recognition apparatus, including: the acquisition module is used for acquiring a transcriptome matrix of the miRNA, the first target gene and the second target gene of the matched samples, each matched sample comprises a plurality of samples, and each sample comprises a plurality of miRNA, the first target gene and the second target gene. And the acquisition module is also used for acquiring the prior miRNA related to the matching sample and the target gene regulation relation data according to the matching sample. And the identification module is used for identifying and obtaining a plurality of ceRNA competition relationship pairs corresponding to each sample in the matched sample according to the transcriptome matrix of the miRNA in the matched sample, the transcriptome matrices of the first target gene and the second target gene in the matched sample, and the prior miRNA and target gene regulation and control relationship data. And the fusion module is used for fusing the multiple pairs of the ceRNA competition relationship corresponding to each sample to obtain a ceRNA network corresponding to each sample.

In some embodiments, the obtaining module is specifically configured to extract, according to the biological characteristics of the matched sample, the transcriptome matrices of the miRNA, the first target gene, and the second target gene having the same biological characteristics from the given transcriptome matrices of the miRNA, the first target gene, and the second target gene, so as to obtain the transcriptome matrices of the miRNA, the first target gene, and the second target gene of the matched sample.

In some embodiments, the identification module is specifically configured to obtain a significance p-value of a shared miRNA, a positive correlation significance p-value between a first target gene and a second target gene, and a sensitivity correlation significance p-value between the first target gene and the second target gene, based on the transcriptome matrix of the miRNA of the sample, the transcriptome matrices of the first target gene and the second target gene of the sample, and the prior miRNA and target gene regulatory relationship data. And when the significance p value of the shared miRNA, the positive correlation significance p value between the first target gene and the second target gene and the sensitivity correlation significance p value between the first target gene and the second target gene all accord with preset conditions, determining that the first target gene and the second target gene are a ceRNA competition relation pair of the sample.

In some embodiments, the significance p-value of the shared miRNAs, the positive correlation significance p-value between the first target gene and the second target gene, and the sensitivity correlation significance p-value between the first target gene and the second target gene all meeting the preset conditions comprises: the significance p-value of the shared miRNAs being less than 0.05, the positive correlation significance p-value between the first target gene and the second target gene being less than 0.05, and the sensitivity correlation significance p-value between the first target gene and the second target gene being less than 0.05.

In some embodiments, the apparatus further comprises a validation module for analyzing and verifying the single-sample ceRNA network by at least one of single-sample ceRNA network similarity analysis, association analysis with a target disease, and sample cluster analysis.

In a third aspect, an embodiment of the present invention provides an electronic device, including: a processor, a storage medium and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating via the bus when the electronic device is operating, the processor executing the machine-readable instructions to perform the steps of any one of the above-mentioned methods of the first aspect.

In a fourth aspect, the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the steps of any one of the methods in the first aspect.

The invention has the following beneficial effects. Transcriptome matrices of the miRNAs, the first target gene and the second target gene of matched samples are obtained, wherein the matched samples comprise a plurality of samples and each sample comprises a plurality of miRNAs, first target genes and second target genes. Prior miRNA-target gene regulatory relationship data related to the matched samples are acquired according to the matched samples. A plurality of ceRNA competition relationship pairs corresponding to each sample in the matched samples are identified according to the transcriptome matrix of the miRNAs of the matched samples, the transcriptome matrices of the first target gene and the second target gene of the matched samples, and the prior miRNA-target gene regulatory relationship data. The plurality of ceRNA competition relationship pairs corresponding to each sample are then fused. Because miRNA expression levels are closely related to the competition among miRNA target genes, the single-sample ceRNA network identified from the transcriptome matrices of the miRNAs, the first target gene and the second target gene of the matched samples together with the prior miRNA-target gene regulatory relationship data helps reveal the gene regulation mechanism within a single sample and reflects the competition relationship among miRNA target genes in a single sample.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.

FIG. 1 is a schematic flow chart of a single-sample ceRNA network identification method according to an embodiment of the present application;

FIG. 2 is a schematic flow chart of S130 in a single-sample ceRNA network identification method according to another embodiment of the present application;

FIG. 3 is a schematic diagram of single-sample ceRNA network similarity in an embodiment of the present application;

FIG. 4 is a schematic diagram of the similarity of breast cancer-associated ceRNA networks in an embodiment of the present application;

FIG. 5 is a schematic diagram illustrating cluster analysis of breast cancer samples according to an embodiment of the present application;

FIG. 6 is a schematic structural diagram of a single-sample ceRNA network identification device according to an embodiment of the present application;

FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention.

FIG. 1 is a schematic flow chart of a single-sample ceRNA network identification method according to an embodiment of the present application. The execution subject of the method may be a device with data processing capability, such as a desktop computer, a notebook computer, a server, a cloud server, an intelligent terminal or a tablet computer, which is not limited herein.

As shown in fig. 1, the method includes:

S110, obtaining transcriptome matrices of the miRNAs, the first target gene and the second target gene of the matched samples.

The matched samples comprise a plurality of samples, and each sample comprises a plurality of miRNAs, a first target gene (RNA1) and a second target gene (RNA2).

In some embodiments, the transcriptome matrices of the miRNA, the first target gene, and the second target gene of the matching sample may be obtained by extracting the transcriptome matrices of the miRNA, the first target gene, and the second target gene of the same biological characteristic from the given transcriptome matrices of the miRNA, the first target gene, and the second target gene of the matching sample according to the biological characteristic of the matching sample.
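The extraction step can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that "the same biological characteristics" can be approximated by a shared sample identifier; the function name `match_samples` and the toy data are hypothetical and do not come from the application.

```python
def match_samples(mirna, rna1, rna2):
    """Keep only the samples that appear in all three expression tables.

    Each argument maps a sample ID to that sample's expression vector.
    Assumption: a shared sample ID stands in for "same biological characteristics".
    """
    shared_ids = sorted(set(mirna) & set(rna1) & set(rna2))
    d1 = [mirna[s] for s in shared_ids]  # matched miRNA matrix
    d2 = [rna1[s] for s in shared_ids]   # matched RNA1 matrix
    d3 = [rna2[s] for s in shared_ids]   # matched RNA2 matrix
    return d1, d2, d3, shared_ids


# Toy data: only S1 and S2 are present in all three tables.
mirna = {"S1": [5.0, 1.2], "S2": [4.1, 0.9], "S4": [3.3, 2.0]}
rna1 = {"S1": [7.5], "S2": [6.8], "S3": [7.0]}
rna2 = {"S2": [2.2], "S1": [2.9], "S4": [1.1]}
d1, d2, d3, ids = match_samples(mirna, rna1, rna2)
```

Rows of the three returned matrices are aligned, so row k of each matrix describes the same sample.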

It should be noted that the transcriptome matrix of a given matched sample may be derived from the Gene Expression Omnibus (GEO) or a similar database that provides gene expression profile data (i.e., transcriptome matrices). The type of the target gene may be messenger RNA (mRNA), long non-coding RNA (lncRNA), circular RNA (circRNA), pseudogene, and the like, and is not limited herein.

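In some embodiments, the expression profile data of the miRNAs, RNA1 and RNA2 of the matched samples are respectively expressed as:

miRNAs of the matched samples: $D_1 = \{G_{1,1}; G_{1,2}; \ldots; G_{1,s}\} \in \mathbb{R}^{s \times n_1}$

RNA1: $D_2 = \{G_{2,1}; G_{2,2}; \ldots; G_{2,s}\} \in \mathbb{R}^{s \times n_2}$

RNA2: $D_3 = \{G_{3,1}; G_{3,2}; \ldots; G_{3,s}\} \in \mathbb{R}^{s \times n_3}$

wherein there are s matched samples, each row $G_{j,k}$ is the expression vector of the k-th sample, and the numbers of miRNAs, RNA1 and RNA2 in each matched sample are $n_1$, $n_2$ and $n_3$ respectively.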

S120, acquiring prior miRNA-target gene regulatory relationship data related to the matched samples according to the matched samples.

In some embodiments, the prior miRNA-target gene regulatory relationship data comprise miRNA-RNA1 and miRNA-RNA2 regulatory relationship data, which can be divided by data type into computationally predicted data and experimentally validated data. The prior miRNA-target gene regulatory relationship data used can be obtained from a single database or by fusing a plurality of different databases.
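Fusing several prior databases can be sketched as a set union of (miRNA, target) pairs. This is an illustrative assumption about what "fusing" means here; the helper names `fuse_priors` and `regulators` and the toy identifiers are hypothetical.

```python
def fuse_priors(*databases):
    """Union of (miRNA, target) pairs from any number of prior databases."""
    fused = set()
    for database in databases:
        fused.update((mirna, target) for mirna, target in database)
    return fused


def regulators(prior, target):
    """Set of miRNAs that regulate the given target according to the prior."""
    return {mirna for mirna, t in prior if t == target}


# Toy databases with one overlapping pair.
db_a = [("miR-1", "GENE_A"), ("miR-2", "GENE_A")]
db_b = [("miR-2", "GENE_A"), ("miR-2", "GENE_B")]
prior = fuse_priors(db_a, db_b)
shared = regulators(prior, "GENE_A") & regulators(prior, "GENE_B")
```

The intersection `shared` gives the miRNAs regulating both targets, which is the quantity the shared-miRNA test needs.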

S130, identifying and obtaining a plurality of ceRNA competition relationship pairs corresponding to each sample in the matched sample according to the transcriptome matrix of the miRNA in the matched sample, the transcriptome matrices of the first target gene and the second target gene in the matched sample, and the prior miRNA and target gene regulation and control relationship data.

Fig. 2 is a schematic flowchart of S130 in the single-sample ceRNA network identification method according to an embodiment of the present disclosure.

In some embodiments, referring to fig. 2, S130 may be implemented by:

S1301, acquiring a significance p-value of the shared miRNAs, a positive correlation significance p-value between the first target gene and the second target gene, and a sensitivity correlation significance p-value between the first target gene and the second target gene according to the transcriptome matrix of the miRNAs of the sample, the transcriptome matrices of the first target gene and the second target gene of the sample, and the prior miRNA-target gene regulatory relationship data.

In some embodiments, the significance p-value of the shared miRNAs may be calculated from the prior miRNA-target gene regulatory relationship data using a hypergeometric test, which measures the statistical significance of the miRNAs shared between RNA1(x) and RNA2(y) in each sample. The p-value is calculated as follows:

$$p = 1 - \sum_{i=0}^{R-1} \frac{\binom{M}{i}\binom{N-M}{K-i}}{\binom{N}{K}}$$

wherein N represents the number of miRNAs in the data set, M and K represent the numbers of miRNAs regulating RNA1(x) and RNA2(y) respectively, and R represents the number of miRNAs shared by RNA1(x) and RNA2(y).

A smaller significance p-value of the shared miRNAs indicates that, in a single sample k, the sharing of miRNAs between RNA1(x) and RNA2(y) is more significant; the threshold is typically p < 0.05.
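A minimal Python sketch of a hypergeometric upper-tail test follows, using the symbols N, M, K and R defined above. It assumes the p-value is the probability of observing at least R shared miRNAs by chance; the function name `shared_mirna_pvalue` is hypothetical.

```python
from math import comb


def shared_mirna_pvalue(N, M, K, R):
    """Upper-tail hypergeometric p-value P(X >= R).

    N: miRNAs in the data set; M, K: miRNAs regulating RNA1 and RNA2;
    R: miRNAs shared by both. Assumes 0 <= R <= min(M, K) and M, K <= N.
    """
    if R <= 0:
        return 1.0
    total = comb(N, K)
    # Probability of fewer than R shared miRNAs.
    below = sum(comb(M, i) * comb(N - M, K - i) for i in range(R))
    return 1.0 - below / total


# 10 miRNAs overall; 4 regulate RNA1, 3 regulate RNA2, 2 are shared.
p_shared = shared_mirna_pvalue(N=10, M=4, K=3, R=2)
```

In this toy case the probability of fewer than two shared miRNAs is 80/120, so `p_shared` is 1/3 and the pair would not pass a 0.05 threshold.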

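In some embodiments, the positive correlation significance p-value can be calculated as follows:

In a single sample k, the statistical correlation value $\rho_{xy}^{(k)}$ between RNA1(x) and RNA2(y) is defined as:

$$\rho_{xy}^{(k)} = \frac{n_{xy}^{(k)}}{s} - \frac{n_{x}^{(k)}}{s} \cdot \frac{n_{y}^{(k)}}{s}$$

wherein s is the number of matched samples, $n_{x}^{(k)}$ and $n_{y}^{(k)}$ are respectively the numbers of samples of RNA1(x) and RNA2(y) adjacent to the single sample k, and $n_{xy}^{(k)}$ is the number of samples adjacent to the single sample k for both RNA1(x) and RNA2(y).

$\rho_{xy}^{(k)}$ approximately follows a normal distribution, with mean and standard deviation as follows:

$$\mu_{xy}^{(k)} = 0, \qquad \sigma_{xy}^{(k)} = \sqrt{\frac{n_{x}^{(k)}\, n_{y}^{(k)}\, (s - n_{x}^{(k)})\, (s - n_{y}^{(k)})}{s^{4}\,(s-1)}}$$

Thus, the normalized statistical correlation value $\hat{\rho}_{xy}^{(k)}$ is:

$$\hat{\rho}_{xy}^{(k)} = \frac{\rho_{xy}^{(k)} - \mu_{xy}^{(k)}}{\sigma_{xy}^{(k)}} = \frac{\sqrt{s-1}\,\left(s\, n_{xy}^{(k)} - n_{x}^{(k)} n_{y}^{(k)}\right)}{\sqrt{n_{x}^{(k)}\, n_{y}^{(k)}\, (s - n_{x}^{(k)})\, (s - n_{y}^{(k)})}}$$

Each normalized statistical correlation value $\hat{\rho}_{xy}^{(k)}$ corresponds to a positive correlation significance p-value, calculated as follows:

$$p = 1 - \mathrm{pnorm}\left(\hat{\rho}_{xy}^{(k)}\right)$$

wherein the pnorm() function gives the probability that a standard normal random variable is smaller than $\hat{\rho}_{xy}^{(k)}$. A smaller p-value indicates that, in a single sample k, RNA1(x) and RNA2(y) are more likely to be positively correlated; the threshold is typically p < 0.05.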
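The neighbourhood-count statistic can be sketched in Python as below. The text does not say how "adjacent samples" are chosen, so the sketch assumes a rank window: the samples whose expression rank lies within `half_width` positions of sample k. That rule, the variance expression (the usual form for this kind of count statistic), and all function names are assumptions for illustration only.

```python
from math import erf, sqrt


def neighbours(values, k, half_width):
    """Indices whose expression rank is within half_width of sample k's rank (assumed rule)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    pos = order.index(k)
    lo = max(0, pos - half_width)
    hi = min(len(values), pos + half_width + 1)
    return set(order[lo:hi])


def normal_cdf(z):
    """Standard normal CDF, the role played by pnorm() in the text."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def positive_correlation(x, y, k, half_width):
    """Return (rho, p) for genes x and y in sample k."""
    s = len(x)
    if s < 2:
        raise ValueError("at least two samples are required")
    near_x = neighbours(x, k, half_width)
    near_y = neighbours(y, k, half_width)
    n_x, n_y, n_xy = len(near_x), len(near_y), len(near_x & near_y)
    rho = n_xy / s - (n_x / s) * (n_y / s)
    variance = n_x * n_y * (s - n_x) * (s - n_y) / (s**4 * (s - 1))
    if variance <= 0:
        return rho, 1.0  # degenerate neighbourhood: no evidence either way
    return rho, 1.0 - normal_cdf(rho / sqrt(variance))


# Toy data: y increases with x across 20 samples.
x = [float(i) for i in range(20)]
y = [2.0 * v + 1.0 for v in x]
rho, p_positive = positive_correlation(x, y, k=10, half_width=2)
```

With perfectly co-ordered toy data the two neighbourhoods coincide, so the statistic is large and the p-value falls well below 0.05.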

In some embodiments, calculating the sensitivity-related significance p-value can be performed by:

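In a single sample k, under the precondition of the shared miRNAs (z), the statistical correlation value $\rho_{xy|z}^{(k)}$ between RNA1(x) and RNA2(y) is defined as:

$$\rho_{xy|z}^{(k)} = \frac{n_{xyz}^{(k)}}{n_{z}^{(k)}} - \frac{n_{xz}^{(k)}}{n_{z}^{(k)}} \cdot \frac{n_{yz}^{(k)}}{n_{z}^{(k)}}$$

wherein $n_{z}^{(k)}$ is the number of samples of the shared miRNAs (z) adjacent to the single sample k, $n_{xyz}^{(k)}$ is the number of samples adjacent to the single sample k for the shared miRNAs (z), RNA1(x) and RNA2(y) simultaneously, $n_{xz}^{(k)}$ is the number of samples adjacent to the single sample k for both RNA1(x) and the shared miRNAs (z), and $n_{yz}^{(k)}$ is the number of samples adjacent to the single sample k for both RNA2(y) and the shared miRNAs (z).

$\rho_{xy|z}^{(k)}$ approximately follows a normal distribution, with mean and standard deviation as follows:

$$\mu_{xy|z}^{(k)} = 0, \qquad \sigma_{xy|z}^{(k)} = \sqrt{\frac{n_{xz}^{(k)}\, n_{yz}^{(k)}\, (n_{z}^{(k)} - n_{xz}^{(k)})\, (n_{z}^{(k)} - n_{yz}^{(k)})}{(n_{z}^{(k)})^{4}\,(n_{z}^{(k)}-1)}}$$

Thus, the normalized statistical correlation value $\hat{\rho}_{xy|z}^{(k)}$ is:

$$\hat{\rho}_{xy|z}^{(k)} = \frac{\rho_{xy|z}^{(k)} - \mu_{xy|z}^{(k)}}{\sigma_{xy|z}^{(k)}}$$

Each normalized statistical correlation value $\hat{\rho}_{xy|z}^{(k)}$ corresponds to a conditional correlation significance p-value, calculated as follows:

$$p = 1 - \mathrm{pnorm}\left(\hat{\rho}_{xy|z}^{(k)}\right)$$

wherein the pnorm() function gives the probability that a standard normal random variable is smaller than $\hat{\rho}_{xy|z}^{(k)}$. A smaller p-value indicates that, in a single sample k and under the precondition of the shared miRNAs (z), RNA1(x) and RNA2(y) are more likely to be conditionally dependent; the threshold is typically p < 0.05.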
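The conditional statistic can be sketched directly from the four neighbourhood counts named above. The variance expression is an assumption (the analogue of the unconditional case with the count for z in place of the total number of samples), and the function name `conditional_correlation` is hypothetical.

```python
from math import erf, sqrt


def conditional_correlation(n_z, n_xz, n_yz, n_xyz):
    """Return (rho, p) for x and y given z, from neighbourhood counts in sample k.

    n_z: samples adjacent to k for z; n_xz, n_yz: adjacent for x and z, y and z;
    n_xyz: adjacent for x, y and z together.
    """
    if n_z < 2:
        return 0.0, 1.0  # too few conditioning samples to test
    rho = n_xyz / n_z - (n_xz / n_z) * (n_yz / n_z)
    variance = n_xz * n_yz * (n_z - n_xz) * (n_z - n_yz) / (n_z**4 * (n_z - 1))
    if variance <= 0:
        return rho, 1.0
    rho_hat = rho / sqrt(variance)
    # 1 - standard normal CDF, i.e. 1 - pnorm(rho_hat)
    p_value = 1.0 - 0.5 * (1.0 + erf(rho_hat / sqrt(2.0)))
    return rho, p_value


# Strong conditional overlap versus exactly the overlap expected by chance.
rho_dep, p_dep = conditional_correlation(n_z=20, n_xz=5, n_yz=5, n_xyz=5)
rho_ind, p_ind = conditional_correlation(n_z=20, n_xz=10, n_yz=10, n_xyz=5)
```

In the second call the observed overlap equals the product of the marginal fractions, so the statistic is zero and the p-value is 0.5.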

In a single sample k, the sensitivity correlation coefficient $S_{xy}^{(k)}$ between RNA1(x) and RNA2(y) is defined as:

$$S_{xy}^{(k)} = \hat{\rho}_{xy}^{(k)} - \hat{\rho}_{xy|z}^{(k)}$$

Because $\hat{\rho}_{xy}^{(k)}$ and $\hat{\rho}_{xy|z}^{(k)}$ both follow a normal distribution, $S_{xy}^{(k)}$ also follows a normal distribution. Each sensitivity correlation coefficient $S_{xy}^{(k)}$ corresponds to a significance p-value, calculated as follows:

$$p = 1 - \mathrm{pnorm}\left(S_{xy}^{(k)}\right)$$

wherein the pnorm() function gives the probability that a normal random variable is smaller than $S_{xy}^{(k)}$. A smaller p-value indicates that, in a single sample k, the effect of the shared miRNAs (z) on the correlation between RNA1(x) and RNA2(y) is more significant; the threshold is typically p < 0.05.

S1302, when the significance p value of the shared miRNA, the positive correlation significance p value between the first target gene and the second target gene and the sensitivity correlation significance p value between the first target gene and the second target gene all accord with preset conditions, determining that the first target gene and the second target gene are a ceRNA competition relation pair of the sample.

In some embodiments, the preset condition may be that the significance p-value of the shared miRNAs is less than 0.05, the positive correlation significance p-value between the first target gene and the second target gene is less than 0.05, and the sensitivity correlation significance p-value between the first target gene and the second target gene is less than 0.05. When all three conditions hold, the first target gene and the second target gene are determined to be a ceRNA competition relationship pair of the sample.
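The three-way threshold rule can be sketched as a single predicate. The 0.05 default follows the text; the function name `is_cerna_pair` and the example p-values are hypothetical.

```python
def is_cerna_pair(p_shared, p_positive, p_sensitivity, alpha=0.05):
    """True only when all three significance p-values fall below the threshold."""
    return p_shared < alpha and p_positive < alpha and p_sensitivity < alpha


# One candidate passes all three tests; the other fails the sensitivity test.
accepted = is_cerna_pair(0.01, 0.002, 0.03)
rejected = is_cerna_pair(0.01, 0.002, 0.20)
```

Because the comparison is strict, a p-value exactly equal to the threshold does not qualify.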

S140, fusing the multiple ceRNA competition relationship pairs corresponding to each sample to obtain the ceRNA network corresponding to each sample.

In some embodiments, fusing the multiple ceRNA competition relationship pairs may mean fusing all ceRNA competition relationship pairs of the kth sample to determine the ceRNA network of the kth sample.

The fusion mentioned in the application refers to merging all ceRNA competition relationship pairs in the kth sample, and generating a ceRNA network from the first target gene and the second target gene of each competition relationship pair and the merged competition relationships.

Each sample thus generates its own corresponding ceRNA network.
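The fusion step can be pictured as taking the union of all competition relationship pairs of one sample and treating it as an undirected graph. The sketch below is one possible representation, not the source's implementation; the function name `fuse_pairs`, the dictionary layout and the gene labels are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Edge = FrozenSet[str]  # an unordered gene pair, so (a, b) equals (b, a)


def fuse_pairs(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set]:
    """Merge the ceRNA competition pairs of one sample into one network.

    Duplicate pairs and reversed duplicates collapse into a single edge;
    self-pairs are dropped because a gene does not compete with itself.
    """
    edges: Set[Edge] = {frozenset((a, b)) for a, b in pairs if a != b}
    nodes: Set[str] = {gene for edge in edges for gene in edge}
    return {"nodes": nodes, "edges": edges}


if __name__ == "__main__":
    # Hypothetical pairs identified in one sample.
    sample_pairs = [("geneA", "geneB"), ("geneB", "geneA"), ("geneB", "geneC")]
    network = fuse_pairs(sample_pairs)
    print(sorted(network["nodes"]), len(network["edges"]))
```

Storing each edge as a `frozenset` makes the pair order irrelevant, which matches the symmetric nature of a competition relationship.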

The single-sample ceRNA network identification method provided by the application can be applied to malignant tumor transcriptome data to screen out ceRNA networks specific to individual malignant tumor samples.

Herein, a single sample ceRNA network identification method is described in connection with identifying single sample ceRNA networks associated with breast cancer.

First, miRNA and target gene expression profile data of matched breast cancer samples, together with survival data for the breast cancer samples, are obtained from The Cancer Genome Atlas (TCGA). After removing duplicates and removing miRNAs and target genes without gene names, expression profile data for 894 miRNAs and 19068 target genes are obtained for 690 matched breast cancer samples.

To reduce computational complexity, a feature selection method based on a Cox regression model is used to screen feature genes (including miRNAs and target genes). After feature selection, expression profile data for 45 miRNAs and 1824 target genes are retained for the 690 matched breast cancer samples, consistent with the matrix dimensions given below.

In this example, RNA1 is mRNA, and RNA2 is also mRNA. The expression profile data of the miRNAs, RNA1 and RNA2 of the matched samples are therefore expressed, respectively, as:

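$$D_1 = \{G_{1,1}; G_{1,2}; \ldots; G_{1,690}\} \in \mathbb{R}^{690 \times 45}$$

$$D_2 = \{G_{2,1}; G_{2,2}; \ldots; G_{2,690}\} \in \mathbb{R}^{690 \times 1824}$$

$$D_3 = \{G_{3,1}; G_{3,2}; \ldots; G_{3,690}\} \in \mathbb{R}^{690 \times 1824}$$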

The prior miRNA-mRNA regulatory relationship data are obtained by merging two experimentally validated databases, mirBase v8.0 and TarBase v8.0, giving 762540 miRNA-mRNA regulatory relationship pairs in total. Breast cancer-associated mRNAs are obtained from the DisGeNET v6.0 and COSMIC v86 databases, giving a total of 5694 mRNAs associated with breast cancer.

Then, single-sample ceRNA networks are identified from the given expression profile data D1, D2 and D3 of the miRNAs, RNA1 and RNA2 of the matched breast cancer samples. The significance p-value threshold for shared miRNAs, the positive correlation significance p-value threshold and the sensitivity correlation significance p-value threshold are all set to 0.05. In each breast cancer sample, all mRNA-related ceRNA competition relationship pairs are fused to obtain the ceRNA network of that sample.

Finally, the single-sample ceRNA networks are analyzed and validated by at least one of the following: single-sample ceRNA network similarity analysis, analysis of association with the target disease, and sample cluster analysis.

First, the similarity of the single-sample ceRNA networks is analyzed.

Given the ceRNA networks $ceR_i$ and $ceR_j$ of sample i and sample j, the similarity $Sim(ceR_i, ceR_j)$ of the two single-sample ceRNA networks is calculated as follows:

$$Sim(ceR_i, ceR_j) = \frac{overlap(ceR_i, ceR_j)}{min(ceR_i, ceR_j)}$$

where $overlap(ceR_i, ceR_j)$ is the number of ceRNA competition relationship pairs shared by the two ceRNA networks, and $min(ceR_i, ceR_j)$ is the number of ceRNA competition relationship pairs in the smaller of the two ceRNA networks. $Sim(ceR_i, ceR_j)$ takes values in [0, 1]; the larger the value, the more similar the ceRNA networks of samples i and j.
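A minimal sketch of this similarity measure follows, assuming it is the number of shared pairs divided by the number of pairs in the smaller network, as the definitions of overlap and min suggest. The function name `network_similarity`, the edge-set representation and the gene labels are hypothetical, and returning 0 for an empty network is a choice made here rather than something the source specifies.

```python
from typing import FrozenSet, Set

Edge = FrozenSet[str]  # an unordered pair of competing genes


def network_similarity(net_i: Set[Edge], net_j: Set[Edge]) -> float:
    """Shared-pair count divided by the pair count of the smaller network."""
    if not net_i or not net_j:
        return 0.0  # assumption: an empty network is similar to nothing
    overlap = len(net_i & net_j)
    return overlap / min(len(net_i), len(net_j))


if __name__ == "__main__":
    def pair(a: str, b: str) -> Edge:
        return frozenset((a, b))

    net_1 = {pair("geneA", "geneB"), pair("geneB", "geneC"), pair("geneC", "geneD")}
    net_2 = {pair("geneB", "geneA"), pair("geneX", "geneY")}
    print(network_similarity(net_1, net_2))  # one shared pair, smaller network has two
```

Dividing by the smaller network's size means a network fully contained in a larger one scores 1, so the measure is not penalized by differences in network size.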

Then, the single-sample ceRNA networks associated with breast cancer are obtained.

Based on the 5694 breast cancer-associated mRNAs, the single-sample ceRNA networks associated with breast cancer are extracted. An mRNA-mRNA competition relationship is regarded as a breast cancer-associated ceRNA competition relationship if and only if both mRNAs are associated with breast cancer. The breast cancer-associated ceRNA competition relationships in each sample are then fused to obtain the single-sample ceRNA network associated with breast cancer.
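The "if and only if both mRNAs are associated" rule amounts to keeping an edge only when both of its genes appear in the disease gene set. The sketch below illustrates this under the assumption that a network is stored as a set of unordered gene pairs; the function name `disease_associated_network` and the gene labels are hypothetical.

```python
from typing import FrozenSet, Set

Edge = FrozenSet[str]  # an unordered pair of competing genes


def disease_associated_network(edges: Set[Edge], disease_genes: Set[str]) -> Set[Edge]:
    """Keep only the pairs in which both genes belong to the disease gene set."""
    return {edge for edge in edges if edge.issubset(disease_genes)}


if __name__ == "__main__":
    sample_network = {
        frozenset(("geneA", "geneB")),
        frozenset(("geneB", "geneC")),
    }
    known_disease_genes = {"geneA", "geneB"}  # hypothetical disease-associated genes
    print(disease_associated_network(sample_network, known_disease_genes))
```

A pair with only one disease-associated gene is discarded, which is stricter than keeping any edge that touches a disease gene.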

Next, a sample cluster analysis may be performed.

Refer to the similarity matrix Sim of the different single-sample ceRNA networks obtained in the above example.

This similarity matrix is also regarded as the single-sample similarity matrix. The single-sample distance matrix is defined as follows:

$$Dis = 1 - Sim$$

Based on the single-sample distance matrix Dis, cluster analysis is performed on the 690 breast cancer samples using a hierarchical clustering method. Because breast cancer has five main subtypes (Luminal A, Luminal B, HER2-overexpressing, basal-like and normal-like), the number of clusters is set to 5.
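The clustering step can be sketched as agglomerative clustering on Dis = 1 - Sim, merging the two closest clusters until the requested number remains. The source does not state which linkage it uses, so average linkage is an assumption here; the function name `hierarchical_clusters` and the toy similarity values are hypothetical.

```python
from typing import List


def hierarchical_clusters(sim: List[List[float]], k: int) -> List[List[int]]:
    """Agglomerative clustering of samples into k clusters from a similarity matrix.

    Distances are Dis = 1 - Sim; average linkage is assumed, since the
    source only says "hierarchical clustering".
    """
    n = len(sim)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of samples")
    dis = [[1.0 - sim[i][j] for j in range(n)] for i in range(n)]
    clusters: List[List[int]] = [[i] for i in range(n)]

    def avg_distance(a: List[int], b: List[int]) -> float:
        return sum(dis[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        best = None  # (distance, index of first cluster, index of second cluster)
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = avg_distance(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]  # merge the closest pair
        del clusters[q]
    return [sorted(c) for c in clusters]


if __name__ == "__main__":
    toy_sim = [
        [1.0, 0.9, 0.1, 0.1],
        [0.9, 1.0, 0.1, 0.1],
        [0.1, 0.1, 1.0, 0.8],
        [0.1, 0.1, 0.8, 1.0],
    ]
    print(hierarchical_clusters(toy_sim, 2))
```

This pure-Python version is meant only to make the procedure concrete; for hundreds of samples a library implementation of hierarchical clustering would be the more practical choice.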

FIG. 3 shows a schematic representation of single-sample ceRNA network similarity in an embodiment of the present application, FIG. 4 shows a schematic representation of breast cancer-associated ceRNA network similarity in an embodiment of the present application, and FIG. 5 shows a schematic representation of the cluster analysis of breast cancer samples in an embodiment of the present application.

In this example, the similarity between the single-sample ceRNA networks of the 690 breast cancer samples ranges over [0.11, 0.43] (as shown in FIG. 3), which shows that the ceRNA network varies widely among breast cancer samples. Likewise, the similarity between the breast cancer-associated ceRNA networks of the 690 samples ranges over [0.09, 0.44] (as shown in FIG. 4), which shows that the breast cancer-associated ceRNA network also varies widely from sample to sample. Based on the similarity of the single-sample ceRNA networks, the 690 breast cancer samples are clustered using a hierarchical clustering method (with the number of clusters set to 5), and the result shows that the 690 breast cancer samples can be well separated into 5 categories (as shown in FIG. 5).

The above results show that each breast cancer sample is unique, so studying how ceRNAs participate in the gene regulation of malignant tumors requires identifying ceRNA networks at the single-sample level. In addition, single-sample ceRNA networks can be used to construct association relationships between samples, which provides a new method for classifying malignant tumor samples.

In conclusion, the single-sample ceRNA network identification method provided by the invention can identify malignant tumor-specific ceRNA networks, provides technical support for the clinical diagnosis and treatment of human malignant tumors, and has important biological significance.

Fig. 6 is a schematic structural diagram of a single-sample ceRNA network identification device according to an embodiment of the present application. As shown in Fig. 6, the single-sample ceRNA network identification device comprises:

The obtaining module 21 is configured to obtain transcriptome matrices of the miRNAs, the first target genes and the second target genes of matched samples, where the matched samples include a plurality of samples, and each sample includes a plurality of miRNAs, first target genes and second target genes.

The obtaining module 21 is further configured to obtain, according to the matching sample, prior miRNA and target gene regulation relation data related to the matching sample.

The identification module 22 is configured to identify and obtain a plurality of ceRNA competition relationship pairs corresponding to each sample in the matched samples according to the transcriptome matrix of the miRNAs of the matched samples, the transcriptome matrices of the first target genes and the second target genes of the matched samples, and the prior miRNA and target gene regulatory relationship data.

The fusion module 23 is configured to fuse the plurality of ceRNA competition relationship pairs corresponding to each sample to obtain the ceRNA network corresponding to each sample.

In some embodiments, the obtaining module 21 is specifically configured to extract, according to the biological characteristics of the matched sample, the transcriptome matrices of the miRNA, the first target gene, and the second target gene having the same biological characteristics from the given transcriptome matrices of the miRNA, the first target gene, and the second target gene, so as to obtain the transcriptome matrices of the miRNA, the first target gene, and the second target gene of the matched sample.

In some embodiments, the identification module 22 is specifically configured to obtain the significance p-value of the shared miRNAs, the positive correlation significance p-value between the first target gene and the second target gene, and the sensitivity correlation significance p-value between the first target gene and the second target gene according to the transcriptome matrix of the miRNAs of the sample, the transcriptome matrices of the first target gene and the second target gene of the sample, and the prior miRNA and target gene regulatory relationship data. When these three p-values all satisfy preset conditions, the first target gene and the second target gene are determined to be a ceRNA competition relationship pair of the sample.

In some embodiments, the significance p-value of the shared miRNA, the positive correlated significance p-value between the first target gene and the second target gene, and the sensitivity-related significance p-value between the first target gene and the second target gene all meet predetermined conditions, comprising:

the significance p-value of the shared miRNAs is less than 0.05, the positive correlation significance p-value between the first target gene and the second target gene is less than 0.05, and the sensitivity correlation significance p-value between the first target gene and the second target gene is less than 0.05.

In some embodiments, referring to fig. 6, the apparatus further comprises a validation module 24 for performing analytical validation of the single-sample ceRNA network by at least one of single-sample ceRNA network similarity analysis, association analysis with the target disease, and sample clustering analysis.

The above-mentioned apparatus is used for executing the method provided by the foregoing embodiment, and the implementation principle and technical effect are similar, which are not described herein again.

The above modules may be one or more integrated circuits configured to implement the above method, for example: one or more Application Specific Integrated Circuits (ASICs), one or more Digital Signal Processors (DSPs), or one or more Field Programmable Gate Arrays (FPGAs), among others. As another example, when one of the above modules is implemented as program code invoked by a processing element, the processing element may be a general-purpose processor, such as a Central Processing Unit (CPU) or another processor capable of calling program code. As a further example, these modules may be integrated together and implemented in the form of a System-on-a-Chip (SoC).

Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

As shown in fig. 7, the electronic apparatus includes: a processor 31, a computer-readable storage medium 32, and a bus 33, wherein:

the electronic device may include one or more processors 31, a bus 33, and a storage medium 32, wherein the storage medium 32 is configured to store machine-readable instructions, the processor 31 is communicatively coupled to the storage medium 32 via the bus 33, and the processor 31 executes the machine-readable instructions stored by the storage medium 32 to perform the above-described method embodiments.

The electronic device may be a general-purpose computer, a server, a mobile terminal, or the like, and is not limited herein. The electronic device is used for realizing the above method embodiments of the present application.

It is noted that the processor 31 may include one or more processing cores (e.g., a single-core processor or a multi-core processor). Merely by way of example, a processor may include a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), an Application Specific Instruction Set Processor (ASIP), a Graphics Processing Unit (GPU), a Physics Processing Unit (PPU), a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), a Programmable Logic Device (PLD), a controller, a microcontroller unit, a Reduced Instruction Set Computer (RISC), a microprocessor, or the like, or any combination thereof.

The storage medium 32 may include mass storage, removable storage, volatile read-write memory, Read-Only Memory (ROM), or the like, or any combination thereof. By way of example, mass storage may include magnetic disks, optical disks, solid-state drives, and the like; removable storage may include flash drives, floppy disks, optical disks, memory cards, zip disks, magnetic tapes, and the like; volatile read-write memory may include Random Access Memory (RAM); the RAM may include Dynamic RAM (DRAM), Double Data Rate Synchronous Dynamic RAM (DDR SDRAM), Static RAM (SRAM), Thyristor-Based RAM (T-RAM), Zero-capacitor RAM (Z-RAM), and the like. By way of example, ROM may include Mask ROM (MROM), Programmable ROM (PROM), Erasable Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), Compact Disc ROM (CD-ROM), Digital Versatile Disc ROM (DVD-ROM), and the like.

For ease of illustration, only one processor 31 is depicted in the electronic device. It should be noted, however, that the electronic device in the present application may also comprise a plurality of processors 31, and thus the steps described in the present application as performed by one processor may also be performed by a plurality of processors jointly or separately. For example, if the processor 31 of the electronic device executes step A and step B, it should be understood that step A and step B may also be executed jointly by two different processors or executed separately within one processor. For example, a first processor performs step A and a second processor performs step B, or the first processor and the second processor perform steps A and B together.

Optionally, the present invention also provides a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, performs the steps of the method as described above.

In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into modules or units is only a division by logical function, and other divisions are possible in actual implementation. For example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in another form.

Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

The above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.
