Gene marker for tumor prognosis evaluation, evaluation product and application

Document No.: 70759  Published: 2021-10-01

Note: The technology "Gene marker for tumor prognosis evaluation, evaluation product and application" was created by Yang Chenggang, Song Hongtao and Dong Dong on 2021-06-29. Its main content is as follows. The invention discloses a gene marker for tumor prognosis evaluation, an evaluation product and their application. The method for constructing the evaluation product comprises the following steps: substituting the expression levels of the gene markers into the model formula to calculate a risk score for each sample; and grouping the samples according to the model's threshold and performing survival analysis between the groups. The results of the invention provide important guidance for precision medical treatment of patients.

1. A method for constructing a liver cancer prognosis model, characterized by comprising the following steps:

acquiring transcriptome expression profile data from a plurality of liver cancer patients and a plurality of reference subjects;

screening candidate genes based on the transcriptome expression profile data of the plurality of liver cancer patients and the plurality of reference subjects; and

constructing a risk scoring model based on the candidate genes;

wherein the liver cancer prognosis model comprises the risk scoring model; and

constructing the risk scoring model based on the candidate genes comprises:

acquiring a training dataset;

determining, from the candidate genes, genes associated with survival in the training dataset by univariate Cox regression analysis; and

screening the survival-associated genes by LASSO Cox regression analysis to determine the genes used to construct the risk scoring model and the risk scoring model itself, wherein the genes used to construct the risk scoring model include BZW2, HES6 and LAMTOR1.

2. The construction method of claim 1, wherein the risk scoring model is expressed as: risk score = 0.41449117 × BZW2 expression level + 0.15454888 × HES6 expression level + 0.14975255 × LAMTOR1 expression level.

3. The method of claim 1, wherein constructing the risk scoring model based on the candidate genes further comprises:

evaluating the predictive performance of the risk scoring model based on the training dataset.

4. The method of claim 3, wherein evaluating the predictive performance of the risk scoring model based on the training dataset comprises:

calculating a risk score for each liver cancer patient in the training dataset based on the risk scoring model;

evaluating the goodness of fit of the risk scoring model by time-dependent receiver operating characteristic (ROC) curve analysis of the liver cancer patients in the training dataset;

determining a grouping cutoff value from the time-dependent ROC curve analysis of the liver cancer patients in the training dataset, and dividing the liver cancer patients in the training dataset into a first high-risk group and a first low-risk group according to the grouping cutoff value; and

evaluating whether the first high-risk group and the first low-risk group have a significant difference in survival using a Kaplan-Meier curve of the training dataset.

5. The construction method of claim 4, wherein constructing the risk scoring model based on the candidate genes further comprises:

obtaining a validation dataset; and

validating the efficacy of the risk scoring model based on the validation dataset.

6. The construction method of claim 5, wherein validating the efficacy of the risk scoring model based on the validation dataset comprises:

calculating a risk score for each liver cancer patient in the validation dataset based on the risk scoring model;

evaluating the goodness of fit of the risk scoring model by time-dependent receiver operating characteristic (ROC) curve analysis of the liver cancer patients in the validation dataset; and

dividing the liver cancer patients in the validation dataset into a second high-risk group and a second low-risk group according to the grouping cutoff value, and verifying, using a Kaplan-Meier curve of the validation dataset, whether the second high-risk group and the second low-risk group differ significantly in survival.

7. A product for predicting the prognosis of liver cancer, the product comprising any one of the following:

1) a gene combination comprising BZW2, HES6 and LAMTOR1;

2) the risk scoring model of claim 2;

3) a liver cancer prognosis model obtained by the construction method according to any one of claims 1 to 6;

4) a device for predicting prognosis of liver cancer, the device comprising a prognosis prediction analysis unit for predicting prognosis of a liver cancer patient by using a liver cancer prognosis model obtained by the construction method according to any one of claims 1 to 6;

5) a kit for predicting the prognosis of liver cancer, the kit comprising reagents for detecting the expression levels of the genes used to construct the risk scoring model according to claim 1;

6) a chip for predicting the prognosis of liver cancer, the chip comprising reagents for detecting the expression levels of the genes used to construct the risk scoring model according to claim 1;

7) an electronic device comprising: a memory for non-transitory storage of computer-readable instructions; and a processor for executing the computer-readable instructions, wherein the computer-readable instructions, when executed by the processor, perform the construction method of any one of claims 1 to 6 or perform the following step: calculating a risk score with the risk scoring model of claim 2 from the expression levels, in a sample from a subject, of the genes used to construct the risk scoring model of claim 1;

8) a storage medium storing non-transitory computer-readable instructions, wherein the instructions, when executed by a computer, perform the construction method according to any one of claims 1 to 6 or perform the following step: calculating a risk score with the risk scoring model of claim 2 from the expression levels, in a sample from a liver cancer patient, of the genes used to construct the risk scoring model of claim 1.

8. The product of claim 7, wherein the device further comprises an information collection unit that detects the expression levels of the genes used to construct the risk scoring model of claim 1.

9. Use, characterized by comprising any one of the following:

1) use of the genes used to construct the risk scoring model of claim 1 in the preparation of a product for predicting the prognosis of liver cancer;

2) use of the risk scoring model of claim 2 in the preparation of a product for predicting the prognosis of liver cancer;

3) use of the liver cancer prognosis model obtained by the construction method according to any one of claims 1-6 in preparing a product for predicting liver cancer prognosis.

10. Use according to claim 9, characterized in that the product comprises the product of claim 7.

Technical Field

The invention belongs to the field of biomedicine, and particularly relates to a gene marker for tumor prognosis evaluation, an evaluation product and application.

Background

Prognosis is the experience-based prediction of how a disease will progress. It mainly concerns three questions: what outcome will occur, how likely an adverse outcome is, and when it will occur. Studying and grading prognosis helps clarify how much harm a disease does to humans, identify the factors that affect prognosis, and develop specific measures to improve it. Prognostic analysis is a highly practical form of clinical research that guides clinical practice.

The liver is one of the most important organs for maintaining the body's internal homeostasis and health. Fatty liver, hepatitis, cirrhosis and liver cancer are the four most common serious liver diseases and major threats to human health; roughly one million people die each year from cirrhosis and liver cancer. Liver cancer, the most common primary malignancy of the liver, is a leading cause of death because of its high incidence, difficulty of detection and limited treatment options. It currently ranks fifth in cancer mortality worldwide, and in some African and Asian countries it is already the leading cause of cancer death.

After decades of effort, liver cancer research has advanced considerably. In the last century, early treatment of small liver cancers and second-stage resection of liver cancers after downsizing were landmark advances, each contributing about a 10% improvement in postoperative survival. However, because liver cancer progresses rapidly and recurs at an extremely high rate, overall treatment outcomes remain poor, and the overall 5-year survival rate of liver cancer patients is still only about 5%. Although basic and clinical research on liver cancer has progressed in recent years, the mechanism of recurrence has not been clarified and no effective intervention has been found. The high recurrence rate is the bottleneck limiting further improvement in treatment outcomes.

A biomarker is an indicator that allows normal biological processes, pathological processes or responses to drug intervention to be measured and evaluated objectively. It is also an important early-warning indicator of damage to an organism and can reflect changes in the molecular structure and function of cells, changes in biochemical metabolic processes, abnormal physiological activity, and abnormal changes in individuals, populations or entire ecosystems. Biomarker research is not only a key part of basic biochemical research but is also valuable for new drug development, medical diagnosis and clinical research, helping researchers develop more effective diagnostic and therapeutic approaches, particularly for the prevention and control of chronic and complex diseases such as tumors, cardiovascular diseases, diabetes and neurological disorders. Identifying biomarkers associated with liver cancer prognosis and recurrence can therefore provide a new way to further reduce the clinical recurrence and mortality rates of liver cancer.

Disclosure of Invention

The invention provides a method for constructing a liver cancer prognosis model, comprising the following steps:

acquiring transcriptome expression profile data from a plurality of liver cancer patients and a plurality of reference subjects;

screening candidate genes based on the transcriptome expression profile data of the plurality of liver cancer patients and the plurality of reference subjects; and

constructing a risk scoring model based on the candidate genes;

wherein the liver cancer prognosis model comprises the risk scoring model.

Further, constructing the risk scoring model based on the candidate genes comprises:

acquiring a training dataset;

determining, from the candidate genes, genes associated with survival in the training dataset by univariate Cox regression analysis; and

screening the survival-associated genes by LASSO Cox regression analysis to determine the genes used to construct the risk scoring model and the risk scoring model itself, wherein the genes used to construct the risk scoring model include BZW2, HES6 and LAMTOR1.

Further, the risk scoring model is expressed as: risk score = 0.41449117 × BZW2 expression level + 0.15454888 × HES6 expression level + 0.14975255 × LAMTOR1 expression level.
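As an illustrative sketch, the risk scoring model above is a weighted sum of three expression levels and can be evaluated directly. The names `RISK_COEFFICIENTS` and `risk_score` and the example expression values are hypothetical, and the expression scale the coefficients expect (for example TPM, with or without log transformation) is an assumption that the text does not settle.

```python
# Coefficients from the three-gene risk scoring model stated above.
RISK_COEFFICIENTS = {
    "BZW2": 0.41449117,
    "HES6": 0.15454888,
    "LAMTOR1": 0.14975255,
}


def risk_score(expression: dict[str, float]) -> float:
    """Return the weighted sum of BZW2, HES6 and LAMTOR1 expression levels.

    `expression` maps gene symbol -> expression level. The expression scale
    is an assumption here; it must match the scale used to fit the model.
    """
    missing = [gene for gene in RISK_COEFFICIENTS if gene not in expression]
    if missing:
        raise KeyError(f"missing expression values for: {missing}")
    return sum(beta * expression[gene] for gene, beta in RISK_COEFFICIENTS.items())


if __name__ == "__main__":
    sample = {"BZW2": 2.0, "HES6": 1.0, "LAMTOR1": 3.0}  # hypothetical values
    print(round(risk_score(sample), 6))  # prints 1.432789
```

Raising on a missing gene, rather than silently treating it as zero, keeps an incomplete expression profile from producing a misleadingly low score.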

Further, constructing the risk scoring model based on the candidate genes further comprises: evaluating the predictive performance of the risk scoring model based on the training dataset.

Further, evaluating the predictive performance of the risk scoring model based on the training dataset comprises:

calculating a risk score for each subject in the training dataset based on the risk scoring model;

evaluating the goodness of fit of the risk scoring model by time-dependent receiver operating characteristic (ROC) curve analysis of the training dataset;

determining a grouping cutoff value from the time-dependent ROC curve analysis of the training dataset, and dividing the subjects in the training dataset into a first high-risk group and a first low-risk group according to the grouping cutoff value; and

evaluating whether the first high-risk group and the first low-risk group differ significantly in survival using a Kaplan-Meier curve of the training dataset.
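The ROC-based steps above can be sketched as follows, under stated assumptions. The text does not say how the cutoff is read off the ROC curve, so maximizing Youden's index (sensitivity + specificity - 1) is an assumption. The time-dependent ROC here uses the simple cumulative/dynamic definition and drops subjects censored before the horizon. Dedicated tools such as the R packages survivalROC or timeROC handle censoring more carefully. All function names and data are hypothetical.

```python
def split_at_horizon(times, events, scores, horizon):
    """Cumulative/dynamic split at `horizon`.

    Cases: death observed at or before the horizon. Controls: still followed
    after the horizon. Subjects censored at or before the horizon are dropped
    (naive; no inverse-probability-of-censoring weighting).
    """
    cases, controls = [], []
    for t, e, s in zip(times, events, scores):
        if t <= horizon and e == 1:
            cases.append(s)
        elif t > horizon:
            controls.append(s)
    if not cases or not controls:
        raise ValueError("need at least one case and one control at this horizon")
    return cases, controls


def time_dependent_auc(times, events, scores, horizon):
    """AUC at `horizon`: P(case score > control score), ties counted as 1/2."""
    cases, controls = split_at_horizon(times, events, scores, horizon)
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def youden_cutoff(times, events, scores, horizon):
    """Return (cutoff, J) maximizing Youden's J; score >= cutoff means high risk."""
    cases, controls = split_at_horizon(times, events, scores, horizon)
    best_cut, best_j = None, -1.0
    for cut in sorted(set(scores)):
        sensitivity = sum(s >= cut for s in cases) / len(cases)
        specificity = sum(s < cut for s in controls) / len(controls)
        j = sensitivity + specificity - 1.0
        if j > best_j:  # strict '>' keeps the lowest cutoff among ties
            best_cut, best_j = cut, j
    return best_cut, best_j


if __name__ == "__main__":
    times = [1, 2, 3, 4, 5, 6]            # hypothetical follow-up, years
    events = [1, 1, 0, 1, 0, 0]           # 1 = death, 0 = censored
    scores = [0.9, 0.8, 0.5, 0.4, 0.3, 0.2]
    print(time_dependent_auc(times, events, scores, horizon=3))  # 1.0
    print(youden_cutoff(times, events, scores, horizon=3))       # (0.5, 1.0)
```

The cutoff found on the training set would then be reused unchanged for the validation set, as the later steps describe.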

Further, constructing the risk scoring model based on the candidate genes further comprises:

obtaining a validation dataset; and

validating the efficacy of the risk scoring model based on the validation dataset.

Further, validating the efficacy of the risk scoring model based on the validation dataset comprises:

calculating a risk score for each subject in the validation dataset based on the risk scoring model;

evaluating the goodness of fit of the risk scoring model by time-dependent receiver operating characteristic (ROC) curve analysis of the validation dataset; and

dividing the subjects in the validation dataset into a second high-risk group and a second low-risk group according to the grouping cutoff value, and using a Kaplan-Meier curve of the validation dataset to verify whether the second high-risk group and the second low-risk group differ significantly in survival.
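A Kaplan-Meier comparison of two groups is commonly tested for significance with the log-rank test. The text does not name the test used, so the log-rank test here is an assumption. The minimal sketch below applies a hypothetical cutoff carried over from the training set and compares the resulting groups. Names and data are illustrative only.

```python
import math


def log_rank_test(times, events, groups):
    """Two-sample log-rank test.

    times  : follow-up times
    events : 1 = death observed, 0 = censored
    groups : 1 = high-risk group, 0 = low-risk group
    Returns (chi-square statistic with 1 degree of freedom, p-value).
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    o_minus_e = 0.0  # observed minus expected deaths in the high-risk group
    variance = 0.0
    for t in event_times:
        n = sum(1 for ti in times if ti >= t)                              # at risk, both groups
        n1 = sum(1 for ti, g in zip(times, groups) if ti >= t and g == 1)  # at risk, high-risk
        d = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)   # deaths at t
        d1 = sum(
            1 for ti, e, g in zip(times, events, groups) if ti == t and e == 1 and g == 1
        )
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of d1 given the table margins at time t.
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    chi2 = o_minus_e ** 2 / variance
    # Upper tail of a chi-square(1) variable: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    cutoff = 0.5  # hypothetical cutoff fixed on the training set
    scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]  # hypothetical validation risk scores
    groups = [1 if s >= cutoff else 0 for s in scores]
    chi2, p = log_rank_test([1, 2, 3, 4, 5, 6], [1] * 6, groups)
    print(f"chi2={chi2:.3f}, p={p:.4f}")
```

Reusing the training cutoff, rather than recomputing one on the validation data, keeps the validation an honest test of the fixed model.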

The invention also provides a liver cancer prognosis model obtained by the construction method.

The invention also provides a method of applying the liver cancer prognosis model, wherein the liver cancer prognosis model comprises the risk scoring model constructed by the above construction method, and the application method comprises the following steps:

obtaining transcriptome expression profile data of a liver cancer patient sample, the data comprising expression values of the genes used to construct the risk scoring model; and

calculating a prognostic risk score for the liver cancer patient from the sample's transcriptome expression profile data using the risk scoring model.
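The application steps above could be sketched as a single scoring function applied to one patient's profile. The grouping cutoff's numeric value is not given in the text, so it is a caller-supplied, hypothetical parameter. The `>=` rule for "high risk", the `PrognosisResult` type and the example gene values are assumptions for illustration.

```python
from dataclasses import dataclass

# Coefficients from the three-gene risk scoring model.
RISK_COEFFICIENTS = {"BZW2": 0.41449117, "HES6": 0.15454888, "LAMTOR1": 0.14975255}


@dataclass
class PrognosisResult:
    risk_score: float
    group: str  # "high" or "low"


def predict_prognosis(profile: dict[str, float], cutoff: float) -> PrognosisResult:
    """Score one patient's expression profile and assign a risk group.

    `profile` may hold the whole transcriptome; only the three model genes
    are read. `cutoff` is the grouping cutoff fixed on the training set
    (hypothetical here). Scores at or above the cutoff are called high-risk.
    """
    score = sum(beta * profile[gene] for gene, beta in RISK_COEFFICIENTS.items())
    return PrognosisResult(score, "high" if score >= cutoff else "low")


if __name__ == "__main__":
    patient = {"BZW2": 1.0, "HES6": 1.0, "LAMTOR1": 1.0, "GAPDH": 9.0}  # hypothetical
    print(predict_prognosis(patient, cutoff=0.5))  # hypothetical cutoff
```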

The invention also provides a product for predicting liver cancer prognosis.

As an example of the product, the product may be a gene combination comprising BZW2, HES6 and LAMTOR1.

As an example of the product, the product can be a device for predicting liver cancer prognosis, and the device comprises a prognosis prediction analysis unit which predicts the prognosis of a liver cancer patient by using the liver cancer prognosis model.

Further, the device comprises a data collection unit that detects the expression levels of the molecular markers.

Further, the device comprises a display unit that displays the prognosis prediction result for the liver cancer patient.

Further, the device comprises an evaluation result transmitting unit that transmits the prognosis prediction result obtained by the prognosis prediction analysis unit to the display unit.

As an example of the product, the product may be a kit for predicting liver cancer prognosis, comprising reagents for detecting the expression levels of the genes used to construct the risk scoring model described above.

Further, the reagents comprise reagents for detecting the expression levels of the genes by sequencing, nucleic acid hybridization, nucleic acid amplification and/or protein immunoassay techniques.

Still further, the reagents include primers, probes, antibodies and ligands.

Still further, the kit further comprises one or more substances selected from the group consisting of: container, instructions for use, positive control, negative control, buffer, adjuvant or solvent.

As an example of the product, the product may be a chip for predicting liver cancer prognosis, comprising reagents for detecting the expression levels of the genes used to construct the risk scoring model.

Further, the reagents are as defined above.

As an example of the product, the product may be an electronic device comprising: a memory for non-transitory storage of computer-readable instructions; and a processor for executing the computer-readable instructions, wherein the computer-readable instructions, when executed by the processor, perform the aforementioned construction method or perform the following step: calculating a risk score with the risk scoring model described above from the expression levels, in a liver cancer patient sample, of the genes used to construct that risk scoring model.

As an example of the product, the product may be a storage medium storing non-transitory computer-readable instructions, wherein the instructions, when executed by a computer, perform the aforementioned construction method or perform the following step: calculating a risk score with the risk scoring model described above from the expression levels, in a liver cancer patient sample, of the genes used to construct that risk scoring model.

The invention also provides use of a gene combination comprising BZW2, HES6 and LAMTOR1 in preparing a product for predicting liver cancer prognosis.

The invention also provides use of the risk scoring model in preparing a product for predicting liver cancer prognosis.

The invention also provides use of the liver cancer prognosis model obtained by the construction method in preparing a product for predicting liver cancer prognosis.

Further, the product is the product described above.

"BZW2" as used herein refers to a nucleic acid encoding all or part of the BZW2 protein, or a nucleic acid sequence substantially identical to all or part of such a nucleic acid sequence, or an analog thereof; its Gene ID is 28969.

"LAMTOR1" as used herein refers to a nucleic acid encoding all or part of the LAMTOR1 protein, or a nucleic acid sequence substantially identical to all or part of such a nucleic acid sequence, or an analog thereof; its Gene ID is 55004.

"HES6" as used herein refers to a nucleic acid encoding all or part of the HES6 protein, or a nucleic acid sequence substantially identical to all or part of such a nucleic acid sequence, or an analog thereof; its Gene ID is 55502.

As used herein, a "sample" may include, but is not limited to, a single cell or a plurality of cells, a layer of cells, a tissue biopsy, excised tissue, a tissue extract, a tissue culture extract, a tissue culture medium, exhaled breath, whole blood, platelets, serum, plasma, red blood cells, white blood cells, lymphocytes, neutrophils, macrophages, B cells or subsets thereof, T cells or subsets thereof, subsets of hematopoietic cells, endothelial cells, synovial fluid, lymphatic fluid, ascites fluid, interstitial fluid, bone marrow, cerebrospinal fluid, pleural fluid, tumor infiltrates, saliva, mucus, sputum, semen, sweat, urine, or any other bodily fluid. Samples may be obtained from a subject by means including, but not limited to, venipuncture, drainage, biopsy, needle aspiration, lavage, scraping, surgical excision, or other means known in the art.

Drawings

FIG. 1 shows survival curves of the high-risk and low-risk groups in the TCGA dataset;

FIG. 2 shows ROC curves for the TCGA dataset.

Detailed Description

The technical solutions of the present invention are further illustrated by the following specific examples, which do not limit the scope of the invention. Insubstantial modifications and adaptations made by others based on the concept of the invention fall within the scope of the invention.

Example 1 Screening of genes associated with liver cancer prognosis

1. Data download

Public gene expression data with complete clinical annotation were retrieved from the Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA). For the TCGA dataset, RNA sequencing data (FPKM values) and clinical information were downloaded from UCSC Xena (https:// gdc. The FPKM values were then converted to transcripts per million (TPM). Gene expression data for GSE76427 were downloaded from the GEO database (http://www.ncbi.nlm.nih.gov/GEO/) and annotated with the annotation file; where several probes mapped to the same gene, their mean was taken as that gene's expression level, yielding a gene expression matrix. The TCGA dataset served as the discovery cohort and the GEO dataset as the validation cohort. After samples with incomplete clinical information were removed, the TCGA cohort contained 50 paracancerous and 368 tumor samples, and the GEO cohort contained 52 paracancerous and 115 tumor samples.
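The two preprocessing steps just described, FPKM-to-TPM conversion and averaging of probes that map to the same gene, can be sketched as follows. This is a minimal illustration assuming one sample is held as a dictionary. The probe IDs and the probe-to-gene mapping are hypothetical inputs standing in for the platform annotation file. TPM conversion needs linear-scale FPKM, so log-transformed downloads would have to be back-transformed first.

```python
from collections import defaultdict
from statistics import mean


def fpkm_to_tpm(fpkm: dict[str, float]) -> dict[str, float]:
    """Convert one sample's FPKM values to TPM.

    TPM_i = FPKM_i / sum_j(FPKM_j) * 1e6, so each sample's TPM values sum to
    one million. Input must be on the linear FPKM scale; if values were
    stored as log2(FPKM + 1), back-transform them before calling this.
    """
    total = sum(fpkm.values())
    if total <= 0:
        raise ValueError("FPKM values must sum to a positive number")
    return {gene: value / total * 1e6 for gene, value in fpkm.items()}


def collapse_probes(
    probe_values: dict[str, float], probe_to_gene: dict[str, str]
) -> dict[str, float]:
    """Average all probes annotated to the same gene; unannotated probes are skipped."""
    by_gene: dict[str, list[float]] = defaultdict(list)
    for probe, value in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene:
            by_gene[gene].append(value)
    return {gene: mean(values) for gene, values in by_gene.items()}


if __name__ == "__main__":
    print(fpkm_to_tpm({"BZW2": 1.0, "HES6": 3.0}))  # {'BZW2': 250000.0, 'HES6': 750000.0}
    probes = {"p1": 2.0, "p2": 4.0, "p3": 5.0}       # hypothetical probe IDs
    print(collapse_probes(probes, {"p1": "BZW2", "p2": "BZW2", "p3": "HES6"}))
```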

2. Differential expression analysis

Differential expression analysis was performed with the "limma" package in R, with differentially expressed genes defined by adjusted P < 0.01 and |log2FC| > 1. Under these criteria, the TCGA dataset contained 1827 differentially expressed genes (1463 up-regulated and 364 down-regulated), and the GEO dataset contained 724 (528 up-regulated and 196 down-regulated). In total, 456 genes were differentially expressed in both databases, of which 399 were consistently up-regulated and 157 consistently down-regulated.
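The filtering logic of this step can be sketched in Python, assuming the per-gene log2 fold changes and raw P values have already been produced by limma. limma's moderated t statistics are not reimplemented here. limma's `topTable` reports adjusted P values with the Benjamini-Hochberg method by default, and the sketch uses that method too. The function names and example genes are hypothetical.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(genes, log2fc, adj_p, p_cut=0.01, fc_cut=1.0):
    """Split genes passing adj. P < p_cut and |log2FC| > fc_cut into (up, down) sets."""
    up, down = set(), set()
    for gene, fc, q in zip(genes, log2fc, adj_p):
        if q < p_cut and abs(fc) > fc_cut:
            (up if fc > 0 else down).add(gene)
    return up, down


def consistent_degs(up_a, down_a, up_b, down_b):
    """Genes changed in the same direction in both cohorts."""
    return up_a & up_b, down_a & down_b


if __name__ == "__main__":
    genes = ["A", "B", "C", "D"]  # hypothetical genes
    adj = benjamini_hochberg([0.0001, 0.0005, 0.0002, 0.04])
    up, down = select_degs(genes, [2.0, -1.5, 0.5, 3.0], adj)
    print(sorted(up), sorted(down))
```

Intersecting the up-regulated and down-regulated sets separately, rather than the combined lists, keeps only genes whose direction of change agrees between the two cohorts.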

3. Univariate Cox analysis

Univariate Cox analysis was performed on the 456 genes with consistent differential expression, and genes with P < 0.05 were considered to affect the survival of hepatocellular carcinoma patients. Under this criterion, 287 genes were identified in the TCGA database and 32 in the GEO database; 18 genes were common to both.
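The text does not name the software used for the univariate Cox step; in R it is typically done with `coxph` from the survival package. As a self-contained illustration of what that step computes, the sketch below fits a one-covariate Cox model by Newton-Raphson on the Breslow partial likelihood and returns the coefficient, hazard ratio and two-sided Wald P value. It is a minimal sketch, not a replacement for a tested library. Function names and data are hypothetical.

```python
import math


def _score_and_information(times, events, x, beta):
    """Score U(beta) and information I(beta) of the Breslow partial likelihood."""
    score, info = 0.0, 0.0
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        risk = [xj for tj, xj in zip(times, x) if tj >= t]  # risk set at t
        deaths = [xj for tj, ej, xj in zip(times, events, x) if tj == t and ej == 1]
        shift = max(beta * xj for xj in risk)  # subtract max for numerical stability
        w = [math.exp(beta * xj - shift) for xj in risk]
        s0 = sum(w)
        mean_x = sum(wi * xj for wi, xj in zip(w, risk)) / s0
        mean_x2 = sum(wi * xj * xj for wi, xj in zip(w, risk)) / s0
        score += sum(deaths) - len(deaths) * mean_x
        info += len(deaths) * (mean_x2 - mean_x * mean_x)
    return score, info


def univariate_cox(times, events, x, max_iter=50, tol=1e-9):
    """Fit h(t | x) = h0(t) * exp(beta * x); return (beta, hazard ratio, Wald p-value)."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = _score_and_information(times, events, x, beta)
        if info <= 0:
            raise ValueError("non-positive information; is the covariate constant?")
        step = score / info  # Newton-Raphson update
        beta += step
        if abs(step) < tol:
            break
    _, info = _score_and_information(times, events, x, beta)
    se = 1.0 / math.sqrt(info)
    p_value = math.erfc(abs(beta / se) / math.sqrt(2))  # two-sided Wald test
    return beta, math.exp(beta), p_value


if __name__ == "__main__":
    times = [1, 2, 3, 4, 5, 6, 7, 8]  # hypothetical follow-up times
    events = [1] * 8                  # all deaths observed
    expr = [5, 3, 4, 2, 3, 1, 2, 1]   # hypothetical expression of one gene
    beta, hr, p = univariate_cox(times, events, expr)
    print(f"beta={beta:.3f}, HR={hr:.3f}, p={p:.4f}")
```

Running this per gene in each cohort and keeping genes with p < 0.05 gives one hit set per cohort; the 18 retained genes correspond to the intersection of the TCGA and GEO sets.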

4. LASSO Cox analysis

LASSO Cox analysis was performed on the 18 genes in the TCGA dataset to select the genes forming a prognostic gene signature. The risk score of each sample was calculated according to the formula in the note below, and all samples were divided into high-risk and low-risk groups at the median risk score.

Note: the risk score is calculated as

$$\text{risk score} = \sum_{i=1}^{n} \beta_i \cdot \mathrm{exp}_i$$

where n is the number of prognostic genes, exp_i is the expression value of gene i, and β_i is the regression coefficient of gene i.
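The risk score formula and the median split can be sketched generically for any set of LASSO Cox coefficients. How samples exactly at the median are assigned is not stated in the text. This sketch assumes they go to the low-risk group. The gene names G1 and G2 and their coefficients are hypothetical; the negative coefficient illustrates a gene whose higher expression lowers risk.

```python
from statistics import median


def risk_scores(samples: list[dict[str, float]], coefficients: dict[str, float]) -> list[float]:
    """risk score = sum_i beta_i * exp_i per sample; genes outside `coefficients` are ignored."""
    return [sum(beta * sample[gene] for gene, beta in coefficients.items()) for sample in samples]


def median_split(scores: list[float]) -> tuple[list[str], float]:
    """Label scores above the median 'high', the rest 'low'; return (labels, median)."""
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores], cut


if __name__ == "__main__":
    coef = {"G1": 0.5, "G2": -1.0}  # hypothetical LASSO coefficients
    rows = [{"G1": 2, "G2": 1}, {"G1": 4, "G2": 0}, {"G1": 0, "G2": 1}, {"G1": 6, "G2": 1}]
    scores = risk_scores(rows, coef)
    print(scores, median_split(scores))
```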

The genes finally selected to construct the risk scoring model are BZW2, HES6 and LAMTOR1. Table 2 lists the relevant information and parameters for these 3 genes. The hazard ratio (HR) from univariate Cox regression characterizes relative risk: an HR greater than 1 indicates that the gene's expression is positively correlated with the risk score, so its LASSO coefficient is greater than 0, whereas an HR less than 1 indicates a negative correlation, so its LASSO coefficient is less than 0. In Table 2, 95% CI denotes the 95% confidence interval.

TABLE 2 The 3 genes in the risk scoring model

From the results in Table 2, the 3-gene risk scoring model is:

risk score = 0.41449117 × BZW2 expression level + 0.15454888 × HES6 expression level + 0.14975255 × LAMTOR1 expression level

Survival analysis showed that patients in the high-risk group had significantly shorter survival than those in the low-risk group (Fig. 1). To evaluate how accurately the 3-gene prognostic model predicts the prognosis of hepatocellular carcinoma, ROC curve analyses were performed at 1, 3 and 5 years and the corresponding AUC values were compared. The AUCs at 1, 3 and 5 years were 0.72, 0.64 and 0.62, respectively (Fig. 2), indicating that the 3-gene model can distinguish hepatocellular carcinoma patients with different prognoses, with the best discrimination at 1 year.
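Survival curves like those compared in Fig. 1 are usually Kaplan-Meier estimates computed separately for each risk group. The minimal product-limit sketch below returns the survival probability at each distinct event time. It is illustrative only: the function name and data are hypothetical, and plotting and confidence bands are omitted.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (event time, survival probability).

    events: 1 = death observed, 0 = censored. Subjects censored at an event
    time are counted as still at risk at that time (the usual convention).
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths, removed = 0, 0
        while i < len(data) and data[i][0] == t:  # gather all subjects tied at t
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed
    return curve


if __name__ == "__main__":
    high = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])  # hypothetical high-risk group
    low = kaplan_meier([3, 5, 6, 7, 8], [1, 0, 1, 0, 0])   # hypothetical low-risk group
    print("high:", high)
    print("low:", low)
```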
