Triple negative breast cancer prognosis prediction device, prediction model and construction method thereof

Document No.: 1818153. Publication date: 2021-11-09.

Reading note: This technology, a triple negative breast cancer prognosis prediction device, prediction model and construction method thereof, was designed by Jia Yongfeng (贾永峰), Liu Xia (刘霞), Kang Changyuan (康畅元), Shi Lin (施琳), Yun Fen (云芬), Liang Junqing (梁俊青), Chen Yongxia (陈永霞) and An Yanrong (安彦榕) on 2021-08-13. Abstract: The invention discloses a method for constructing a triple negative breast cancer prognosis prediction model, comprising the following steps: collecting raw gene expression data and the corresponding clinical survival data of triple negative breast cancer samples, and standardizing the gene expression data to obtain a gene expression matrix; acquiring the independent carcinogenesis-specific genes of triple negative breast cancer and their expression levels; screening, from these genes, the parameters for constructing the prognosis prediction model and obtaining the corresponding regression coefficients, the parameters being a plurality of genes; and calculating a risk score from the expression levels of the screened genes and their regression coefficients to obtain the triple negative breast cancer prognosis prediction model. The model stratifies triple negative breast cancer patients by prognostic risk and clearly separates high-risk from low-risk patients, so that it can predict the clinical outcome of triple negative breast cancer and guide individualized treatment, and it has high clinical application value.

1. A method for constructing a triple negative breast cancer prognosis prediction model, characterized by comprising the following steps:

S1, collecting raw gene expression data and the corresponding clinical survival data of triple negative breast cancer samples, and standardizing the gene expression data to obtain a gene expression matrix;

S2, acquiring the independent carcinogenesis-specific genes of triple negative breast cancer and the expression levels of these genes;

S3, screening, from the acquired independent carcinogenesis-specific genes, the parameters for constructing the prognosis prediction model and obtaining the corresponding regression coefficients, wherein the parameters are a plurality of genes;

S4, calculating a risk score from the expression levels of the screened genes and their corresponding regression coefficients to obtain the triple negative breast cancer prognosis prediction model.

2. The method for constructing a triple negative breast cancer prognosis prediction model according to claim 1, wherein: in step S1, the raw gene expression data and the corresponding clinical survival data of the triple negative breast cancer samples are collected from the UCSC-TCGA database, and samples with incomplete clinical data or with an overall survival time of less than one month are removed during collection.

3. The method for constructing a triple negative breast cancer prognosis prediction model according to claim 1, wherein step S3 specifically comprises the following steps:

S31, screening the protein pathway core genes among the carcinogenesis-specific genes, performing univariate Cox analysis on the protein pathway core genes, and screening out the prognosis core genes related to clinical prognosis;

S32, using the screened prognosis core genes as parameters for establishing an independent clinical prediction model of triple negative breast cancer, and determining the optimal tuning parameter λ according to the minimum variance by means of the LASSO regression algorithm and cross-validation, thereby determining the optimal variables for constructing the prediction model;

the LASSO regression algorithm is specifically as follows: the LASSO objective function is the sum of squared residuals plus λ times the sum of the absolute values of the coefficients, expressed as:

$$\mathrm{loss}(w) = \sum_{j=1}^{n} \Bigl( y_j - \sum_{i} x_{ji} w_i \Bigr)^2 + \lambda \sum_{i} \lvert w_i \rvert$$

wherein $\mathrm{loss}(w)$ is the LASSO objective function, $y_j$ is the $j$-th entry of the $n \times 1$ observation vector, $x_{ji}$ is the predictor variable, i.e. the value of the $i$-th prognosis core gene in sample $j$, and $w_i$ is the corresponding coefficient; the optimal tuning parameter λ is determined by cross-validation.

4. The method for constructing a triple negative breast cancer prognosis prediction model according to claim 1, wherein: the parameters for constructing the prognosis prediction model comprise ARL9, NCCRP1, SBSN, RERG, TPSB2, TPSAB1, C15orf59, GPR158, SRRM3, PSORS1C2, DSC2, SEPT3, PTPRN2, ALX3 and KLHDC7A.

5. The method for constructing a triple negative breast cancer prognosis prediction model according to claim 1, wherein: the triple negative breast cancer prognosis prediction model is established based on circulating COX and is specifically expressed as: I = Σ (F × C); wherein I is the risk score, F is the relative ratio of each model gene, and C is the regression coefficient corresponding to each model gene.

6. A triple negative breast cancer prognosis prediction model, characterized in that: the model is constructed by the construction method according to any one of claims 1 to 5.

7. A triple negative breast cancer prognosis prediction device, characterized in that: the device is implemented based on the triple negative breast cancer prognosis prediction model according to claim 6 and comprises a data collection module, a model gene type analysis module, a parameter screening module, a prognosis model construction module and a prediction output module; the data collection module is used for collecting raw gene expression data and the corresponding clinical survival data of triple negative breast cancer samples, and for preprocessing and standardizing the collected data; the model gene type analysis module is used for acquiring the model genes in tumor tissue and calculating the relative ratio of each model gene; the parameter screening module is used for screening, from the model gene types, the parameters for constructing the prognosis prediction model and obtaining the corresponding regression coefficients; the prognosis model construction module is used for calculating a risk score from the relative ratios of the model gene types selected as parameters and their corresponding regression coefficients, and constructing the triple negative breast cancer prognosis prediction model; and the prediction output module is used for determining a cutoff value by maximally selected rank statistics, comparing the obtained risk score with the cutoff value, and outputting the risk value of the tested patient.

8. The triple negative breast cancer prognosis prediction device according to claim 7, wherein the risk value of the tested patient is output as follows: if the risk score is less than or equal to the cutoff value, the tested patient is classified as low risk; if the risk score is greater than the cutoff value, the tested patient is classified as high risk.

Technical Field

The invention relates to the field of medical treatment, in particular to a triple negative breast cancer prognosis prediction device, a prediction model and a construction method thereof.

Background

Breast cancer is heterogeneous in its biological behavior, clinicopathological characteristics and molecular characteristics. According to differences in clinical diagnostic markers, breast cancer is divided into molecular subtypes, including the luminal A, luminal B/C, normal breast-like, HER-2 overexpressing and basal-like subtypes, and the clinical characteristics, treatment responsiveness and prognosis of these subtypes differ markedly.

Triple-negative breast cancer (TNBC) is a clinicopathological type of breast cancer characterized by absent or low expression of the estrogen receptor (ER), the progesterone receptor (PR) and HER2/neu. It overlaps with, but is not identical to, the basal-like subtype: about 80-90% of triple negative breast cancers are basal-like, while a small number of basal-like breast cancers express hormone receptors.

Because triple negative breast cancer lacks effective endocrine therapy and anti-HER2/neu targeted therapy, conventional treatment is mostly used in clinical practice. The tumor is characterized by rapid local recurrence and distant metastasis, a high fatality rate, poor prognosis and poor treatment response. Moreover, the prognosis of this type is only weakly related to tumor size and lymph node status; recurrence is relatively rapid, peaking at 1 to 3 years. Histologically, the tumor originates from ductal basal-like cells. It is highly invasive and carries a high risk of distant metastasis; visceral metastasis is more likely than bone metastasis, and the incidence of brain metastasis is relatively high. Metastasis peaks within 3 years, after which the risk of metastasis decreases, but the prognosis remains poor and the risk of death remains high. Triple negative breast cancer is usually managed with general comprehensive treatment, that is, professional clinical treatment combined with supportive care in daily life. ICB therapy has been developed in recent years and has shown some efficacy in triple negative breast cancer, but the long-term outlook is still not optimistic. Triple negative breast cancer carries a much higher risk of death than other types of breast cancer.

Disclosure of Invention

The invention aims to provide a triple negative breast cancer prognosis prediction device, a prediction model and a construction method thereof, which stratify triple negative breast cancer patients by prognostic risk from the perspective of molecular pathology and at the genomic level, clearly separate high-risk from low-risk patients, can predict the clinical outcome of triple negative breast cancer and guide individualized treatment, and have high clinical application value.

In order to achieve the purpose, the invention adopts the technical scheme that:

A method for constructing a triple negative breast cancer prognosis prediction model comprises the following steps:

S1, collecting raw gene expression data and the corresponding clinical survival data of triple negative breast cancer samples, and standardizing the gene expression data to obtain a gene expression matrix;

S2, acquiring the independent carcinogenesis-specific genes of triple negative breast cancer and the expression levels of these genes;

S3, screening, from the acquired independent carcinogenesis-specific genes, the parameters for constructing the prognosis prediction model and obtaining the corresponding regression coefficients, wherein the parameters are a plurality of genes;

S4, calculating a risk score from the expression levels of the screened genes and their corresponding regression coefficients to obtain the triple negative breast cancer prognosis prediction model.

Further, in step S1, the raw gene expression data and the corresponding clinical survival data of the triple negative breast cancer samples are collected from the UCSC-TCGA database, and samples with incomplete clinical data or with an overall survival time of less than one month are removed during collection.
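The sample filtering and standardization of step S1 can be sketched as follows. This is a minimal illustration only: the field names `os_days` and `event`, the 30-day reading of "one month", and the log2(x + 1) transform are assumptions made for the example, since the text does not specify the data layout or the standardization method.

```python
import math

def filter_samples(clinical, min_os_days=30):
    """Drop samples with incomplete clinical data or overall survival below the threshold.

    `clinical` maps a sample ID to a record with hypothetical keys
    "os_days" (overall survival in days) and "event" (1 = death, 0 = censored).
    """
    kept = {}
    for sample_id, record in clinical.items():
        os_days = record.get("os_days")
        event = record.get("event")
        if os_days is None or event is None:
            continue  # incomplete clinical data
        if os_days < min_os_days:
            continue  # overall survival shorter than one month (assumed 30 days)
        kept[sample_id] = record
    return kept

def log2_normalize(expression):
    """Apply log2(x + 1) to {gene: {sample: value}} (an assumed standardization)."""
    return {
        gene: {sample: math.log2(value + 1.0) for sample, value in row.items()}
        for gene, row in expression.items()
    }

def build_matrix(expression, clinical):
    """Return (genes, samples, matrix) restricted to the samples kept in `clinical`."""
    samples = sorted(clinical)
    genes = sorted(expression)
    matrix = [[expression[gene][sample] for sample in samples] for gene in genes]
    return genes, samples, matrix

# Toy data: S2 has survival under 30 days, S3 has missing survival time.
clinical_demo = {
    "S1": {"os_days": 400, "event": 1},
    "S2": {"os_days": 12, "event": 1},
    "S3": {"os_days": None, "event": 0},
    "S4": {"os_days": 900, "event": 0},
}
expression_demo = {
    "SBSN": {"S1": 3.0, "S2": 7.0, "S3": 0.0, "S4": 1.0},
    "RERG": {"S1": 0.0, "S2": 1.0, "S3": 15.0, "S4": 7.0},
}

kept_demo = filter_samples(clinical_demo)
genes_demo, samples_demo, matrix_demo = build_matrix(log2_normalize(expression_demo), kept_demo)
print(samples_demo)  # ['S1', 'S4']
print(matrix_demo)   # [[0.0, 3.0], [2.0, 1.0]]  (rows: RERG, SBSN)
```

In practice a table library would likely replace the nested dictionaries; the standard library is used here only to keep the sketch self-contained.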

Further, step S3 specifically comprises the following steps:

S31, screening the protein pathway core genes among the carcinogenesis-specific genes, performing univariate Cox analysis on the protein pathway core genes, and screening out the prognosis core genes related to clinical prognosis;

S32, using the screened prognosis core genes as parameters for establishing an independent clinical prediction model of triple negative breast cancer, and determining the optimal tuning parameter λ according to the minimum variance by means of the LASSO regression algorithm and cross-validation, thereby determining the optimal variables for constructing the prediction model. The LASSO regression algorithm is specifically as follows: the LASSO objective function is the sum of squared residuals plus λ times the sum of the absolute values of the coefficients, expressed as:

$$\mathrm{loss}(w) = \sum_{j=1}^{n} \Bigl( y_j - \sum_{i} x_{ji} w_i \Bigr)^2 + \lambda \sum_{i} \lvert w_i \rvert$$

wherein $\mathrm{loss}(w)$ is the LASSO objective function, $y_j$ is the $j$-th entry of the $n \times 1$ observation vector, $x_{ji}$ is the predictor variable, i.e. the value of the $i$-th prognosis core gene in sample $j$, and $w_i$ is the corresponding coefficient; the optimal tuning parameter λ is determined by cross-validation.
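The LASSO objective described above (sum of squared residuals plus λ times the sum of absolute coefficient values) and the cross-validated choice of λ can be sketched as follows. Coordinate descent, the deterministic fold assignment and all function names are assumptions made for illustration; the text does not state which solver or fold scheme is used. For survival outcomes, practical tools often penalize the Cox partial likelihood instead of a squared error; this sketch follows the squared-error objective as stated.

```python
def lasso_loss(X, y, w, lam):
    """Sum of squared residuals plus lam times the sum of |w_i|."""
    rss = sum(
        (y_j - sum(x_ji * w_i for x_ji, w_i in zip(row, w))) ** 2
        for row, y_j in zip(X, y)
    )
    return rss + lam * sum(abs(w_i) for w_i in w)

def soft_threshold(a, t):
    """Shrink a towards zero by t; values within [-t, t] become exactly 0."""
    if a > t:
        return a - t
    if a < -t:
        return a + t
    return 0.0

def lasso_fit(X, y, lam, n_iter=200):
    """Minimize lasso_loss by cyclic coordinate descent."""
    n_features = len(X[0])
    w = [0.0] * n_features
    for _ in range(n_iter):
        for i in range(n_features):
            rho, z = 0.0, 0.0
            for row, y_j in zip(X, y):
                # Prediction from all coefficients except the i-th.
                partial = sum(row[k] * w[k] for k in range(n_features) if k != i)
                rho += row[i] * (y_j - partial)
                z += row[i] ** 2
            # Setting the subgradient of the objective to zero gives
            # w_i = S(rho, lam / 2) / z for this (unscaled) squared-error loss.
            w[i] = soft_threshold(rho, lam / 2.0) / z if z > 0 else 0.0
    return w

def cv_select_lambda(X, y, lambdas, n_folds=3):
    """Pick the lambda with the smallest total held-out squared error."""
    n = len(y)
    best_lam, best_err = None, float("inf")
    for lam in lambdas:
        err = 0.0
        for fold in range(n_folds):
            train = [j for j in range(n) if j % n_folds != fold]
            test = [j for j in range(n) if j % n_folds == fold]
            w = lasso_fit([X[j] for j in train], [y[j] for j in train], lam)
            err += sum(
                (y[j] - sum(a * b for a, b in zip(X[j], w))) ** 2 for j in test
            )
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam

# Toy data: the response depends only on the first predictor (y = 2 * x0).
X_demo = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0], [1.0, 0.0], [-1.0, 0.0]]
y_demo = [2.0 * row[0] for row in X_demo]

print(lasso_fit(X_demo, y_demo, lam=6.0))   # [1.5, 0.0]: shrunk, second gene dropped
print(lasso_fit(X_demo, y_demo, lam=24.0))  # [0.0, 0.0]: penalty removes both
print(cv_select_lambda(X_demo, y_demo, [0.0, 6.0, 24.0]))  # 0.0 on noiseless data
```

On this noiseless toy data cross-validation naturally prefers λ = 0; with noisy expression data and many candidate genes, a positive λ is typically selected and the genes with non-zero coefficients become the model variables.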

Further, the parameters for constructing the prognosis prediction model include ARL9, NCCRP1, SBSN, RERG, TPSB2, TPSAB1, C15orf59, GPR158, SRRM3, PSORS1C2, DSC2, SEPT3, PTPRN2, ALX3 and KLHDC7A.

Further, the triple negative breast cancer prognosis prediction model is established based on circulating COX and is specifically expressed as: I = Σ (F × C); wherein I is the risk score, F is the relative ratio of each model gene, and C is the regression coefficient corresponding to each model gene.
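The risk score I = Σ (F × C) is a weighted sum over the model genes and can be sketched as below. The coefficient values are placeholders invented for the example; the text does not disclose the fitted regression coefficients, and the function name `risk_score` is likewise hypothetical.

```python
def risk_score(relative_ratio, coefficients):
    """Return I = sum of F * C over the model genes.

    relative_ratio: {gene: F}, the relative ratio of each model gene for one patient.
    coefficients:   {gene: C}, the regression coefficient of each model gene.
    """
    missing = [gene for gene in coefficients if gene not in relative_ratio]
    if missing:
        raise KeyError("missing model genes: " + ", ".join(missing))
    return sum(relative_ratio[gene] * c for gene, c in coefficients.items())

# Placeholder coefficients for three of the listed genes (NOT values from the patent).
coefficients_demo = {"SBSN": 0.5, "RERG": -0.25, "DSC2": 1.0}
patient_demo = {"SBSN": 2.0, "RERG": 4.0, "DSC2": 1.5}

print(risk_score(patient_demo, coefficients_demo))  # 1.5
```

A positive coefficient raises the score as the gene's relative ratio increases, and a negative coefficient lowers it, which is why the sign of each C matters when interpreting individual genes.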

The invention also provides a prognosis prediction model of triple negative breast cancer, which is constructed by adopting the construction method.

The invention also provides a triple negative breast cancer prognosis prediction device, which is implemented based on the above triple negative breast cancer prognosis prediction model and comprises a data collection module, a model gene type analysis module, a parameter screening module, a prognosis model construction module and a prediction output module; the data collection module is used for collecting raw gene expression data and the corresponding clinical survival data of triple negative breast cancer samples, and for preprocessing and standardizing the collected data; the model gene type analysis module is used for acquiring the model genes in tumor tissue and calculating the relative ratio of each model gene; the parameter screening module is used for screening, from the model gene types, the parameters for constructing the prognosis prediction model and obtaining the corresponding regression coefficients; the prognosis model construction module is used for calculating a risk score from the relative ratios of the model gene types selected as parameters and their corresponding regression coefficients, and constructing the triple negative breast cancer prognosis prediction model; and the prediction output module is used for determining a cutoff value by maximally selected rank statistics, comparing the obtained risk score with the cutoff value, and outputting the risk value of the tested patient.

Further, the risk value of the tested patient is output as follows: if the risk score is less than or equal to the cutoff value, the tested patient is classified as low risk; if the risk score is greater than the cutoff value, the tested patient is classified as high risk.
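The output rule amounts to a single comparison, with a score exactly equal to the cutoff counted as low risk. A minimal sketch, in which the function name, the labels and the cutoff value 1.5 are illustrative assumptions:

```python
def classify_risk(score, cutoff):
    """Return "low" when score <= cutoff, otherwise "high"."""
    return "low" if score <= cutoff else "high"

cutoff_demo = 1.5  # hypothetical cutoff, for illustration only
scores_demo = {"P1": 0.8, "P2": 1.5, "P3": 2.1}
groups_demo = {pid: classify_risk(s, cutoff_demo) for pid, s in scores_demo.items()}
print(groups_demo)  # {'P1': 'low', 'P2': 'low', 'P3': 'high'}
```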

The invention has the following beneficial effects:

1) The triple negative breast cancer prognosis prediction model is constructed from the perspective of molecular pathology and at the genomic level; it stratifies triple negative breast cancer patients by prognostic risk, clearly separates high-risk from low-risk patients, can predict the clinical outcome of triple negative breast cancer and guide individualized treatment, and therefore has high clinical application value;

2) the invention identifies model gene subtypes related to survival in triple negative breast cancer and establishes a prognosis model relating these model gene subtypes to survival time;

3) the model is built on gene expression data and clinical data of triple negative breast cancer patients downloaded from an open public database, which avoids the difficulty of sample collection, the high cost of sequencing, and the need for patient follow-up.

Detailed Description

In order that the objects and advantages of the invention will be more clearly understood, the invention is further described in detail below with reference to examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

Example 1

A method for constructing a triple negative breast cancer prognosis prediction model comprises the following steps:

S1, collecting raw gene expression data and the corresponding clinical survival data of triple negative breast cancer samples, and standardizing the gene expression data to obtain a gene expression matrix; specifically, the raw gene expression data and the corresponding clinical survival data are collected from the UCSC-TCGA database, and samples with incomplete clinical data or with an overall survival time of less than one month are removed during collection;

S2, acquiring the independent carcinogenesis-specific genes of triple negative breast cancer and the expression levels of these genes;

S3, screening, from the acquired independent carcinogenesis-specific genes, the parameters for constructing the prognosis prediction model and obtaining the corresponding regression coefficients, wherein the parameters are a plurality of genes and comprise ARL9, NCCRP1, SBSN, RERG, TPSB2, TPSAB1, C15orf59, GPR158, SRRM3, PSORS1C2, DSC2, SEPT3, PTPRN2, ALX3 and KLHDC7A;

S31, screening the protein pathway core genes among the carcinogenesis-specific genes, performing univariate Cox analysis on the protein pathway core genes, and screening out the prognosis core genes related to clinical prognosis;

S32, using the screened prognosis core genes as parameters for establishing an independent clinical prediction model of triple negative breast cancer, and determining the optimal tuning parameter λ according to the minimum variance by means of the LASSO regression algorithm and cross-validation, thereby determining the optimal variables for constructing the prediction model;

the LASSO regression algorithm is specifically as follows: the LASSO objective function is the sum of squared residuals plus λ times the sum of the absolute values of the coefficients, expressed as:

$$\mathrm{loss}(w) = \sum_{j=1}^{n} \Bigl( y_j - \sum_{i} x_{ji} w_i \Bigr)^2 + \lambda \sum_{i} \lvert w_i \rvert$$

wherein $\mathrm{loss}(w)$ is the LASSO objective function, $y_j$ is the $j$-th entry of the $n \times 1$ observation vector, $x_{ji}$ is the predictor variable, i.e. the value of the $i$-th prognosis core gene in sample $j$, and $w_i$ is the corresponding coefficient; the optimal tuning parameter λ is determined by cross-validation;

S4, calculating a risk score from the expression levels of the screened genes and their corresponding regression coefficients to obtain the triple negative breast cancer prognosis prediction model. The triple negative breast cancer prognosis prediction model is established based on circulating COX and is specifically expressed as: I = Σ (F × C);

wherein I is the risk score, F is the relative ratio of each model gene, and C is the regression coefficient corresponding to each model gene.
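The univariate Cox analysis of step S31 tests one gene at a time for association with survival. The following is a minimal sketch, assuming a single covariate, the Breslow treatment of tied event times, Newton-Raphson maximization of the partial likelihood, and a Wald p-value; the function names and the toy data are hypothetical, and an established survival-analysis package would normally be used instead.

```python
import math

def score_and_information(beta, times, events, x):
    """First derivative and negative second derivative of the Cox
    partial log-likelihood for one covariate (Breslow form)."""
    grad, info = 0.0, 0.0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            continue  # censored subjects contribute only through risk sets
        s0 = s1 = s2 = 0.0
        for t_j, x_j in zip(times, x):
            if t_j >= t_i:  # subject j is still at risk at time t_i
                r = math.exp(beta * x_j)
                s0 += r
                s1 += x_j * r
                s2 += x_j * x_j * r
        mean = s1 / s0
        grad += x[i] - mean
        info += s2 / s0 - mean * mean
    return grad, info

def cox_univariate(times, events, x, max_iter=50, tol=1e-8):
    """Return (beta, hazard_ratio, wald_p_value) for a single covariate."""
    beta = 0.0
    for _ in range(max_iter):
        grad, info = score_and_information(beta, times, events, x)
        if info <= 0.0:
            break
        step = grad / info  # Newton-Raphson update
        beta += step
        if abs(step) < tol:
            break
    _, info = score_and_information(beta, times, events, x)
    z = beta * math.sqrt(info)  # Wald statistic: beta / standard error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return beta, math.exp(beta), p_value

# Toy cohort: survival time (months), event flag (1 = death), one gene's expression.
times_demo = [5, 8, 12, 15, 20, 25, 30, 40]
events_demo = [1, 1, 1, 0, 1, 1, 0, 1]
expr_demo = [3.0, 1.0, 2.5, 2.0, 0.5, 1.5, 1.0, 0.2]

beta_demo, hr_demo, p_demo = cox_univariate(times_demo, events_demo, expr_demo)
print(beta_demo > 0, hr_demo > 1)  # True True: higher expression, higher hazard
```

Genes whose p-value falls below a chosen significance level would then be carried forward as prognosis core genes; the threshold itself is not stated in the text.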

Example 2

A triple negative breast cancer prognosis prediction model, constructed by the construction method described in Example 1.

Example 3

A triple negative breast cancer prognosis prediction device is implemented based on the triple negative breast cancer prognosis prediction model described in Example 2 and comprises a data collection module, a model gene type analysis module, a parameter screening module, a prognosis model construction module and a prediction output module;

the data collection module is used for collecting raw gene expression data and the corresponding clinical survival data of triple negative breast cancer samples, and for preprocessing and standardizing the collected data; the model gene type analysis module is used for acquiring the model genes in tumor tissue and calculating the relative ratio of each model gene; the parameter screening module is used for screening, from the model gene types, the parameters for constructing the prognosis prediction model and obtaining the corresponding regression coefficients; the prognosis model construction module is used for calculating a risk score from the relative ratios of the model gene types selected as parameters and their corresponding regression coefficients, and constructing the triple negative breast cancer prognosis prediction model; and the prediction output module is used for determining a cutoff value by maximally selected rank statistics, comparing the obtained risk score with the cutoff value, and outputting the risk value of the tested patient.

Further, the risk value of the tested patient is output as follows: if the risk score is less than or equal to the cutoff value, the tested patient is classified as low risk; if the risk score is greater than the cutoff value, the tested patient is classified as high risk.
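The cutoff determination performed by the prediction output module can be illustrated with a simplified sketch: every observed risk score is tried as a candidate cutoff, the cohort is split into a low group (score ≤ cutoff) and a high group (score > cutoff), and the cutoff with the largest log-rank statistic is kept. This is an assumption-laden simplification of maximally selected rank statistics; it omits the p-value adjustment for multiple candidate cutoffs, and the function names, the `min_group` constraint and the toy data are hypothetical.

```python
def logrank_chi2(times, events, in_high):
    """Two-group log-rank chi-square statistic (high group vs. low group)."""
    observed_minus_expected, variance = 0.0, 0.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        n = sum(1 for t_j in times if t_j >= t)  # at risk overall
        n1 = sum(1 for t_j, h in zip(times, in_high) if t_j >= t and h)  # at risk, high
        d = sum(1 for t_j, e in zip(times, events) if t_j == t and e)  # deaths at t
        d1 = sum(
            1 for t_j, e, h in zip(times, events, in_high) if t_j == t and e and h
        )  # deaths at t in the high group
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return observed_minus_expected ** 2 / variance if variance > 0 else 0.0

def best_cutoff(scores, times, events, min_group=2):
    """Return (cutoff, statistic) maximizing the log-rank statistic."""
    best_c, best_stat = None, -1.0
    for c in sorted(set(scores)):
        in_high = [s > c for s in scores]
        n_high = sum(in_high)
        if n_high < min_group or len(scores) - n_high < min_group:
            continue  # skip splits that leave a group too small
        stat = logrank_chi2(times, events, in_high)
        if stat > best_stat:
            best_c, best_stat = c, stat
    return best_c, best_stat

# Toy cohort: patients with scores above 0.4 die early, the others are censored late.
scores_demo = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
times_demo = [20, 22, 24, 26, 2, 4, 3, 5]
events_demo = [0, 0, 0, 0, 1, 1, 1, 1]

cutoff_demo, stat_demo = best_cutoff(scores_demo, times_demo, events_demo)
print(cutoff_demo)  # 0.4
```

Because the statistic is maximized over many candidate cutoffs, the resulting separation is optimistic on the data used to choose it; validating the cutoff on an independent cohort is the usual safeguard.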

The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications shall also fall within the protection scope of the present invention.
