Marker gene for predicting recurrence of stage II colorectal cancer and application thereof

Document No.: 1553698  Publication date: 2020-01-21

Reading note: This technology, "Marker gene for predicting recurrence of stage II colorectal cancer and application thereof", was designed by 丁克峰, 陆玮, 肖乾 and 李军 on 2019-09-29. Abstract: The invention discloses a marker gene for predicting recurrence of stage II colorectal cancer and an application thereof. The invention uses gene chip meta-analysis to identify common differentially expressed genes associated with recurrence in stage II colorectal cancer patients. The model predicts the 5-year recurrence risk of stage II colorectal cancer patients with an AUC of 0.806 and significantly separates patients at high and low risk of recurrence in the test set. The Lasso Cox regression model serves both to build the model and to screen variables by importance, which greatly reduces the number of variables in the model, helps lower the cost of gene expression testing, and facilitates adoption of the model in clinical practice.

1. A marker gene for use in the prediction of stage II colorectal cancer recurrence, characterized in that the marker gene comprises: PAOX, SIGLEC7, PHAX, XCR1, TM4SF4, TRIOBP, MCMBP, HCFC1R1, ADNP2, NUP50, GTF2A2, BCCIP, FLJ90680, NVL, ESM1, GABRR2, FAM166A, USP14, JUNB, UBAP2, AP5B1, FAM46C, LDB3, and JUP.

2. Use of the marker gene of claim 1 for constructing a model for predicting recurrence of stage II colorectal cancer.

3. Use according to claim 2, characterized in that the mathematical expression of the model is: lasso_cox score = Σ (gene expression value × regression coefficient), and the gene expression value is the mRNA expression value of the marker gene.

4. Use according to claim 2, wherein said regression coefficients are shown in table 1:

TABLE 1 Regression coefficients for the 24 genes in the lasso_cox regression model

Figure FDA0002220668140000011

5. Use according to claim 2, wherein the model is constructed by: (1) obtaining gene expression data sets: acquiring mRNA expression data of stage II colorectal cancer tumor samples, detecting outliers by cluster analysis, and removing the outliers; (2) identifying common differentially expressed genes associated with recurrence in stage II colorectal cancer patients: using gene chip meta-analysis, calculating the p value of each gene in each data set of step (1) by the log-rank test; then combining the p values of each gene across the data sets by the minP method to obtain the p_minP value of each gene; correcting the p_minP value of each gene by the Benjamini-Hochberg method to obtain the FDR value of each gene; and screening, by the criterion FDR < 0.1, the common differentially expressed genes associated with recurrence in stage II colorectal cancer patients; (3) signaling pathway enrichment analysis of the common differentially expressed genes: performing signaling pathway enrichment analysis on the common differentially expressed genes identified in step (2) using the Metascape database, and screening, with p = 0.01 as the threshold, the signaling pathways in which the differentially expressed genes are significantly enriched, namely the signaling pathways with a p value of less than 0.01; (4) constructing the model: constructing, by the Lasso Cox method, a lasso_cox model for predicting tumor recurrence in stage II colorectal cancer patients from the expression values of the common differentially expressed genes identified in step (2);

the mathematical expression of the model is: lasso_cox score = Σ (gene expression value × regression coefficient).

(I) Technical field

The invention relates to the field of bioinformatics, in particular to the use of gene chip meta-analysis to identify common differentially expressed genes associated with recurrence of stage II colorectal cancer, and to a tumor recurrence prediction model for stage II colorectal cancer patients built from these genes with a Lasso Cox model.

(II) Background of the invention

Colorectal cancer is one of the most common malignancies worldwide, ranking third in incidence and fourth in mortality among all malignant tumors. In recent years, with rising living standards and changing lifestyles, the incidence of colorectal cancer in China has been gradually increasing and patients are trending younger; the number of new colorectal cancer cases grows by about 4 percent per year.

Colorectal cancer is treated mainly by surgery, supplemented by comprehensive treatment such as chemotherapy, radiotherapy, targeted therapy and immunotherapy. For patients with early-stage colorectal cancer, high-quality radical surgery brings significant benefit. However, some early-stage patients develop local or metastatic recurrence after radical surgery. The prognosis of patients with recurrence is poor, and studies show that the shorter the RFS (recurrence-free survival) after radical surgery, the shorter the overall survival. Predicting the risk of recurrence after radical surgery for early-stage colorectal cancer is therefore important and helps guide the choice of postoperative adjuvant therapy.

The NCCN (National Comprehensive Cancer Network) guidelines list 8 high-risk factors for stage II colorectal cancer: ① T4 tumor; ② tumor perforation; ③ tumor obstruction; ④ lymphatic or vascular invasion; ⑤ perineural invasion; ⑥ fewer than 12 lymph nodes examined; ⑦ poorly differentiated or undifferentiated tumor; ⑧ positive margins. In addition, MSI-H (high microsatellite instability) or dMMR (deficient mismatch repair) has in recent years been found to be a low-risk factor for stage II colorectal cancer.

(III) Disclosure of the invention

The invention aims to use gene chip meta-analysis on gene expression data from tumor tissue of stage II colorectal cancer patients to identify common differentially expressed genes associated with recurrence of stage II colorectal cancer, and to build a recurrence prediction model for stage II colorectal cancer patients with a Lasso Cox model.

The technical scheme adopted by the invention is as follows:

The present invention provides a marker gene for predicting recurrence of stage II colorectal cancer, the marker gene comprising: PAOX, SIGLEC7, PHAX, XCR1, TM4SF4, TRIOBP, MCMBP, HCFC1R1, ADNP2, NUP50, GTF2A2, BCCIP, FLJ90680, NVL, ESM1, GABRR2, FAM166A, USP14, JUNB, UBAP2, AP5B1, FAM46C, LDB3, and JUP.

The invention also provides a use of the marker gene in constructing a model for predicting recurrence of stage II colorectal cancer, wherein the mathematical expression of the model is: lasso_cox score = Σ (gene expression value × regression coefficient).

The gene expression value is a marker gene mRNA expression value.

The regression coefficients are shown in table 1:

TABLE 1 Regression coefficients for the 24 genes in the lasso_cox regression model

Figure BDA0002220668150000021
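The score formula above can be illustrated with a minimal Python sketch. The function name `lasso_cox_score`, the three coefficients, and the expression values below are hypothetical placeholders chosen for illustration; they are not the Table 1 values, which are only available as an image in the source.

```python
from typing import Mapping


def lasso_cox_score(expression: Mapping[str, float],
                    coefficients: Mapping[str, float]) -> float:
    """Return the sum of (gene expression value x regression coefficient)."""
    # Every gene in the model needs an expression value for the patient.
    missing = [gene for gene in coefficients if gene not in expression]
    if missing:
        raise KeyError(f"missing expression values for: {missing}")
    # Genes measured but absent from the model are ignored.
    return sum(expression[gene] * coef for gene, coef in coefficients.items())


# Hypothetical coefficients for three of the 24 marker genes (NOT Table 1 values).
example_coefficients = {"PAOX": 0.5, "JUNB": -0.25, "JUP": 0.125}
# Hypothetical mRNA expression values for one patient.
example_expression = {"PAOX": 8.0, "JUNB": 4.0, "JUP": 2.0, "NVL": 9.9}

example_score = lasso_cox_score(example_expression, example_coefficients)
print(example_score)  # 8.0*0.5 + 4.0*(-0.25) + 2.0*0.125 = 3.25
```

In practice the coefficient mapping would hold all 24 genes with the fitted values from Table 1.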

The invention discloses a method for constructing a recurrence prediction model for stage II colorectal cancer, comprising the following steps:

(1) Obtaining gene expression data sets: acquiring mRNA expression data of stage II colorectal cancer tumor samples, where the detection technology includes but is not limited to gene chip technology, high-throughput transcriptome sequencing, and real-time fluorescence quantitative PCR (qPCR); detecting outliers by cluster analysis and removing them.

(2) Identifying common differentially expressed genes associated with recurrence in stage II colorectal cancer patients: using gene chip meta-analysis, calculating the p value of each gene in each data set of step (1) by the log-rank test; then combining the p values of each gene across the data sets by the minP method to obtain the p_minP value of each gene; correcting the p_minP value of each gene by the Benjamini-Hochberg method to obtain the FDR (false discovery rate) value of each gene; and screening, by the criterion FDR < 0.1, the common differentially expressed genes associated with recurrence in stage II colorectal cancer patients.

(3) Signaling pathway enrichment analysis of the common differentially expressed genes: performing signaling pathway enrichment analysis on the common differentially expressed genes identified in step (2) using the Metascape database (http://metascape.org/gp/index.html#/main/step1), and screening, with p = 0.01 as the threshold, the signaling pathways in which the differentially expressed genes are significantly enriched, namely the signaling pathways with a p value of less than 0.01.

(4) Constructing the model: constructing, by the Lasso Cox method, a lasso_cox model for predicting tumor recurrence in stage II colorectal cancer patients from the expression values of the common differentially expressed genes identified in step (2).

The mathematical expression of the model is: lasso_cox score = Σ (gene expression value × regression coefficient).
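The gene screening in step (2) can be sketched in Python as follows. This is an illustration under stated assumptions, not the inventors' implementation: the source does not say how the minP p value is computed, so the sketch assumes the analytic form for independent data sets (the minimum of K null p values follows a Beta(1, K) distribution); the function names, gene labels, and p values are hypothetical.

```python
from typing import Dict, List


def min_p_combine(p_values: List[float]) -> float:
    """Combine one gene's p values from K data sets using the minP statistic.

    Assumption: the data sets are independent, so under the null hypothesis
    the combined p value is 1 - (1 - min(p)) ** K.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p value")
    return 1.0 - (1.0 - min(p_values)) ** k


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p values (FDR), in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_genes(per_gene_p: Dict[str, List[float]],
                 fdr_cutoff: float = 0.1) -> List[str]:
    """Keep genes whose FDR after minP combination is below the cutoff."""
    genes = list(per_gene_p)
    combined = [min_p_combine(per_gene_p[g]) for g in genes]
    fdr = benjamini_hochberg(combined)
    return [g for g, q in zip(genes, fdr) if q < fdr_cutoff]


# Hypothetical log-rank p values for three genes in two data sets.
example_p = {
    "GENE_A": [0.001, 0.2],
    "GENE_B": [0.4, 0.6],
    "GENE_C": [0.01, 0.03],
}
print(select_genes(example_p))
```

The default `fdr_cutoff` of 0.1 mirrors the FDR < 0.1 criterion stated in step (2).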

In the R language, the model is expressed as: predict(lasso_cox, data). Here the gene expression values are the expression values of the common differentially expressed genes associated with recurrence in stage II colorectal cancer patients from step (2), and the regression coefficients are the coefficients of the lasso_cox model, calculated with the glmnet function of the R package glmnet. The lasso_cox score is the tumor recurrence risk score, and its magnitude in the model reflects the tumor recurrence risk. A patient's tumor recurrence risk score is obtained by passing the model (lasso_cox) and the patient's gene expression data (data) to the predict function of the stats package in R. Patients are divided into a high-risk group (recurrence risk score higher than the median) and a low-risk group (recurrence risk score lower than the median) according to the median tumor recurrence risk score of stage II colorectal cancer patients (preferably -2.748).
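The grouping rule can be sketched in Python as follows. The function name `assign_risk_groups`, the patient identifiers, and the scores are hypothetical; the source does not say how a score exactly equal to the cutoff is handled, so this sketch assigns it to the low-risk group as an assumption.

```python
from statistics import median
from typing import Dict, Optional


def assign_risk_groups(scores: Dict[str, float],
                       cutoff: Optional[float] = None) -> Dict[str, str]:
    """Label each patient 'high' or 'low' relative to the cutoff.

    If no cutoff is given, the median of the supplied scores is used.
    Assumption: a score equal to the cutoff is labeled 'low'.
    """
    if cutoff is None:
        cutoff = median(scores.values())
    return {pid: ("high" if score > cutoff else "low")
            for pid, score in scores.items()}


# Hypothetical recurrence risk scores for four patients.
example_scores = {"P1": -3.5, "P2": -2.9, "P3": -2.0, "P4": -1.2}

# Using the preferred cutoff stated in the text.
groups_fixed = assign_risk_groups(example_scores, cutoff=-2.748)
# Using the median of the cohort itself.
groups_median = assign_risk_groups(example_scores)
print(groups_fixed)
```

A fixed cutoff is useful when scoring a new patient outside the cohort from which the median was derived.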

Compared with the prior art, the invention has the following beneficial effects. One innovation of the invention is the use of gene chip meta-analysis to identify common differentially expressed genes associated with recurrence in stage II colorectal cancer patients. Compared with the traditional approach of judging recurrence risk only from the clinicopathological characteristics of stage II colorectal cancer patients, the invention provides a model that predicts recurrence risk from the gene expression information of the patient's tumor. The model predicts the 5-year recurrence risk of stage II colorectal cancer patients with an AUC of 0.806 and significantly separates patients at high and low risk of recurrence in the test set (HR 2.052, 95% CI 1.219-3.455). The Lasso Cox regression model serves both to build the model and to screen variables by importance, which greatly reduces the number of variables in the model, helps lower the cost of gene expression testing, and facilitates adoption of the model in clinical practice.
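Why a Lasso penalty reduces the number of variables can be illustrated with the soft-thresholding operator, the basic shrinkage step behind L1-penalized fitting. The Python sketch below is a simplified picture that treats each coefficient separately; it is not a Cox model fit and not the glmnet algorithm itself, and the gene labels, coefficient values, and penalty strength are hypothetical.

```python
from typing import Dict, List


def soft_threshold(z: float, lam: float) -> float:
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


# Hypothetical unpenalized coefficients for four candidate genes.
unpenalized: Dict[str, float] = {
    "GENE_A": 0.5,
    "GENE_B": -0.125,
    "GENE_C": 0.0625,
    "GENE_D": -0.75,
}
lam = 0.25  # hypothetical penalty strength

shrunk = {gene: soft_threshold(beta, lam) for gene, beta in unpenalized.items()}
# Genes whose coefficient is exactly zero drop out of the model.
selected: List[str] = [gene for gene, beta in shrunk.items() if beta != 0.0]
print(shrunk)
print(selected)
```

Weak effects are set to exactly zero while strong effects are kept in shrunken form, which is how fitting and variable screening happen in one step.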

(IV) Description of the drawings

FIG. 1: Data set screening.

FIG. 2: hierarchical clustering of GSE14333 data sets.

FIG. 3: a signal path with obviously enriched differentially expressed genes. The color of the histogram reflects the size of the p value, and the deeper the color, the smaller the p value; the signal path names are shown on the right side of the bar graph.

FIG. 4: the regularization parameter λ in the lasso _ cox regression model is related to the partial likelihood estimate bias. The abscissa is the natural logarithm of the regularization parameter lambda and the ordinate is the partial likelihood estimate deviation

FIG. 5: Time-dependent ROC curves for the training set. Time-dependent ROC curves of the lasso_cox regression model at 1 year, 3 years, and 5 years, respectively.

FIG. 6: the survival curves for the high and low risk of relapse groups predicted according to the lasso _ cox model were pooled. Line b is the high recurrence risk group predicted according to the lasso _ cox model; line a is the low risk of recurrence group predicted according to the lasso _ cox model.

(V) Detailed description of the preferred embodiments

The invention will be further described with reference to specific examples, but the scope of the invention is not limited thereto:
