Method and system for predicting survival rate after lung cancer surgery

Document No.: 117078    Publication date: 2021-10-19

Reading note: This technology, Method and system for predicting survival rate after lung cancer surgery, was designed and created by 何建行, 梁文华 and 李坚福 on 2021-09-13. Its main content is as follows. The embodiments of the disclosure disclose a method and a system for predicting survival rate after lung cancer surgery. The method for predicting postoperative survival of lung cancer patients by measuring clinical data including gene mutation typing comprises: a data acquisition step of acquiring postoperative lung cancer clinical data; a preprocessing step of classifying and grouping the postoperative lung cancer clinical data to obtain modeling group clinical data and verification group clinical data; a risk factor screening step of screening risk factors in the modeling group clinical data to obtain risk factor data and overall survival data; and a regression analysis step of performing regression analysis on the risk factor data and the overall survival data to obtain regression-analyzed data. The postoperative lung cancer clinical data comprise gene mutation typing, age, tumor size, lymph node metastasis and operation mode.

1. A method for predicting postoperative survival of lung cancer patients by measuring clinical data including gene mutation typing, comprising:

a data acquisition step, in which clinical data after lung cancer surgery are acquired;

a preprocessing step of classifying and grouping the postoperative lung cancer clinical data to obtain modeling group clinical data and verification group clinical data;

a risk factor screening step of screening risk factors in the modeling group clinical data to obtain risk factor data and overall survival data; and

a regression analysis step of performing regression analysis on the risk factor data and the overall survival data to obtain regression-analyzed data,

wherein the postoperative lung cancer clinical data comprise gene mutation typing, age, tumor size, lymph node metastasis and operation mode, and

the regression analysis is calculated by the following formula:

$$\ln\left[\frac{h(t,X)}{h_0(t)}\right] = \beta_1 \times \text{age} + \beta_2 \times \text{tumor size} + \beta_3 \times \text{lymph node metastasis} + \beta_4 \times \text{operation mode} + \beta_5 \times \text{gene mutation typing},$$

where $h(t,X)$ is the regression-analyzed data, $h_0(t)$ is the baseline hazard rate, and $\beta_1$, $\beta_2$, $\beta_3$, $\beta_4$ and $\beta_5$ are coefficients with values of

2. The method of claim 1,

wherein the risk factor screening step comprises: screening risk factors in the modeling group clinical data by using LASSO analysis to obtain the risk factor data and the overall survival data.

3. The method of claim 1,

wherein the regression analysis step comprises: performing regression analysis on the risk factor data and the overall survival data by using multivariate Cox regression analysis to obtain the regression-analyzed data.

4. The method of claim 1,

wherein the postoperative lung cancer clinical data further comprise the pathology type; and/or

the regression-analyzed data comprise the postoperative disease-free survival rate; and/or

the risk factor data comprise gene mutation typing and further comprise at least one of: age, tumor size, lymph node metastasis and operation mode.

5. The method of claim 1,

wherein the lung cancer is stage I-IIIA lung cancer; and/or

the gene mutation typing comprises: EGFR mutation, HER2 mutation, MET amplification, ALK fusion, ROS1 fusion, KRAS mutation, RET fusion and BRAF mutation.

6. The method of claim 1,

wherein the preprocessing step comprises: for continuous variables in the postoperative lung cancer clinical data, determining an optimal cut-off point by an optimal approximation method based on the receiver operating characteristic (ROC) curve, and grouping the postoperative lung cancer clinical data into a plurality of categories by using the optimal cut-off point, to obtain the modeling group clinical data and the verification group clinical data.

7. The method of claim 1, further comprising:

a verification step for verifying the risk factor screening step and the regression analysis step.

8. The method of claim 7, wherein the step of verifying comprises:

calculating, by a machine learning method based on the risk factor screening step, the regression analysis step and the verification group clinical data, the area under the receiver operating characteristic curve (AUC), the sensitivity and the specificity; and

judging the processing accuracy of the risk factor screening step and the regression analysis step according to the AUC, the sensitivity and the specificity.

9. The method of claim 8, wherein the machine learning method comprises at least one of:

a logistic regression method, a support vector machine method, a random forest method, a decision tree method, a k-nearest neighbor method, a naive Bayes method and an AdaBoost method.

10. The method of claim 9,

wherein the risk factor screening step and the regression analysis step are judged to be accurate when the AUC is greater than 0.65, the sensitivity is greater than 0.5 and the specificity is greater than 0.5.

11. The method of claim 1, further comprising:

a display step of graphically displaying the relationship between the risk factor data and the regression-analyzed data.

12. The method of claim 11,

wherein the display step comprises: displaying the relationship between the risk factor data and the regression-analyzed data by using a nomogram.

13. A system for predicting postoperative survival of lung cancer patients by measuring clinical data including gene mutation typing, comprising:

a data acquisition module for acquiring postoperative lung cancer clinical data;

a preprocessing module for classifying and grouping the postoperative lung cancer clinical data to obtain modeling group clinical data and verification group clinical data;

a risk factor screening module for screening risk factors in the modeling group clinical data to obtain risk factor data and overall survival data; and

a regression analysis module for performing regression analysis on the risk factor data and the overall survival data to obtain regression-analyzed data,

wherein the postoperative lung cancer clinical data comprise gene mutation typing, age, tumor size, lymph node metastasis and operation mode, and

the regression analysis is calculated by the following formula:

$$\ln\left[\frac{h(t,X)}{h_0(t)}\right] = \beta_1 \times \text{age} + \beta_2 \times \text{tumor size} + \beta_3 \times \text{lymph node metastasis} + \beta_4 \times \text{operation mode} + \beta_5 \times \text{gene mutation typing},$$

where $h(t,X)$ is the regression-analyzed data, $h_0(t)$ is the baseline hazard rate, and $\beta_1$, $\beta_2$, $\beta_3$, $\beta_4$ and $\beta_5$ are coefficients with values of

14. The system of claim 13,

wherein the risk factor screening module is configured to screen risk factors in the modeling group clinical data by using LASSO analysis to obtain the risk factor data and the overall survival data.

15. The system of claim 13,

wherein the regression analysis module is configured to perform regression analysis on the risk factor data and the overall survival data by using multivariate Cox regression analysis to obtain the regression-analyzed data.

16. The system of claim 13,

wherein the postoperative lung cancer clinical data further comprise the pathology type; and/or

the regression-analyzed data comprise the postoperative disease-free survival rate; and/or

the risk factor data comprise gene mutation typing and further comprise at least one of: age, tumor size, lymph node metastasis and operation mode.

17. The system of claim 13,

wherein the lung cancer is stage I-IIIA lung cancer; and/or

the gene mutation typing comprises: EGFR mutation, HER2 mutation, MET amplification, ALK fusion, ROS1 fusion, KRAS mutation, RET fusion and BRAF mutation.

18. The system of claim 13,

wherein the preprocessing module is configured to: for continuous variables in the postoperative lung cancer clinical data, determine an optimal cut-off point by an optimal approximation method based on the receiver operating characteristic (ROC) curve, and group the postoperative lung cancer clinical data into a plurality of categories by using the optimal cut-off point, to obtain the modeling group clinical data and the verification group clinical data.

19. The system of claim 13, further comprising:

a verification module for verifying the risk factor screening module and the regression analysis module.

20. The system of claim 19, wherein the verification module is configured to:

calculate, by a machine learning method based on the risk factor screening module, the regression analysis module and the verification group clinical data, the area under the receiver operating characteristic curve (AUC), the sensitivity and the specificity; and

judge the processing accuracy of the risk factor screening module and the regression analysis module according to the AUC, the sensitivity and the specificity.

21. The system of claim 20, wherein the machine learning method comprises at least one of:

a logistic regression method, a support vector machine method, a random forest method, a decision tree method, a k-nearest neighbor method, a naive Bayes method and an AdaBoost method.

22. The system of claim 21,

wherein the risk factor screening module and the regression analysis module are judged to be accurate when the AUC is greater than 0.65, the sensitivity is greater than 0.5 and the specificity is greater than 0.5.

23. The system of claim 13, further comprising:

a display module for graphically displaying the relationship between the risk factor data and the regression-analyzed data.

24. The system of claim 23,

wherein the display module is configured to display the relationship between the risk factor data and the regression-analyzed data by using a nomogram.

Technical Field

The present disclosure relates to the field of surgery, and in particular to methods and systems for predicting postoperative survival of lung cancer patients by measuring clinical data including gene mutation typing.

Background

Early-stage lung cancer comprises stage I and II disease and a subset of stage III disease. The standard treatment for early-stage non-small cell lung cancer (NSCLC) is radical resection. After surgery, each patient's postoperative survival needs to be predicted.

In the prior art, TNM staging is used to predict disease-free survival (DFS) after lung cancer surgery. The seventh edition of the TNM staging system is the most widely used staging system; it stratifies patients with non-metastatic NSCLC according to tumor size and invasion and the extent of lymph node involvement. However, stage-based prediction of postoperative DFS is inaccurate, because DFS differs widely among patients within the same stage.

Patent document CN111640518A discloses a method for predicting the postoperative disease-free survival rate of cervical cancer patients using a cervical cancer postoperative survival prediction model. Its parameter selection, nomogram and so on are tailored to postoperative DFS in cervical cancer: pregnancy history, HPV typing and FIGO staging are indexes specific to cervical cancer rather than to lung cancer. Because cervical cancer and lung cancer are entirely different diseases, the nomogram of CN111640518A, built on these indexes, is not suitable for predicting postoperative DFS in patients with early-stage lung cancer.

Therefore, a more accurate and effective way of predicting postoperative DFS for patients with early-stage lung cancer is needed. In such prediction, the choice of predictive parameters is critical to the accuracy of the result.

Disclosure of Invention

To solve the problems in the related art, embodiments of the present disclosure provide methods and systems for predicting survival rate after lung cancer surgery by measuring clinical data including gene mutation typing.

In a first aspect, embodiments of the present disclosure provide a method for predicting postoperative survival of lung cancer patients by measuring clinical data including gene mutation typing, comprising:

a data acquisition step, in which clinical data after lung cancer surgery are acquired;

a preprocessing step of classifying and grouping the postoperative lung cancer clinical data to obtain modeling group clinical data and verification group clinical data;

a risk factor screening step of screening risk factors in the modeling group clinical data to obtain risk factor data and overall survival data; and

a regression analysis step of performing regression analysis on the risk factor data and the overall survival data to obtain regression-analyzed data,

wherein the postoperative lung cancer clinical data comprise gene mutation typing, age, tumor size, lymph node metastasis and operation mode, and

the regression analysis is calculated by the following formula:

$$\ln\left[\frac{h(t,X)}{h_0(t)}\right] = \beta_1 \times \text{age} + \beta_2 \times \text{tumor size} + \beta_3 \times \text{lymph node metastasis} + \beta_4 \times \text{operation mode} + \beta_5 \times \text{gene mutation typing},$$

where $h(t,X)$ is the regression-analyzed data, $h_0(t)$ is the baseline hazard rate, and $\beta_1$, $\beta_2$, $\beta_3$, $\beta_4$ and $\beta_5$ are coefficients with values of

With reference to the first aspect, the present disclosure provides, in a first implementation form of the first aspect,

the risk factor screening step comprises: screening risk factors in the modeling group clinical data by using LASSO analysis to obtain the risk factor data and the overall survival data.
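LASSO screening can be illustrated with a short sketch. The disclosure applies LASSO to survival data (typically as a LASSO-penalized Cox model). Purely to show how the L1 penalty shrinks weak coefficients to exactly zero, and so screens out variables, the sketch below swaps in the simpler least-squares LASSO solved by cyclic coordinate descent. The function names, the penalty `lam`, and the assumption of pre-centred inputs are illustrative only and are not taken from the disclosure.

```python
from typing import List


def soft_threshold(z: float, lam: float) -> float:
    """Soft-thresholding operator S(z, lam) used in LASSO coordinate descent."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X: List[List[float]], y: List[float],
                             lam: float, n_iter: int = 200) -> List[float]:
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    Assumes (for illustration) that the columns of X and the response y are
    already centred, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # y - X @ beta, with beta starting at zero
    col_scale = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_scale[j] == 0.0:
                continue
            # Correlation of feature j with the residual that excludes feature j itself.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_scale[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


# Toy data with two orthogonal, centred features; y = 2*x0 + 1*x1 exactly.
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [3.0, 1.0, -1.0, -3.0]
print(lasso_coordinate_descent(X, y, lam=0.0))  # [2.0, 1.0]: no penalty, full fit
print(lasso_coordinate_descent(X, y, lam=1.5))  # [0.5, 0.0]: weaker feature screened out
```

Raising `lam` removes variables one by one. In practice `lam` is usually chosen by cross-validation on the modeling group.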

With reference to the first aspect, the present disclosure provides, in a second implementation form of the first aspect,

the regression analysis step comprises: performing regression analysis on the risk factor data and the overall survival data by using multivariate Cox regression analysis to obtain the regression-analyzed data.
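Once the Cox coefficients have been fitted, applying the formula ln[h(t, X)/h0(t)] = β1 × age + ... + β5 × gene mutation typing to a new patient takes a weighted sum followed by an exponential. The sketch below assumes that each covariate has already been encoded as a number, for example as the group codes produced by preprocessing. The coefficient values are hypothetical placeholders, not the fitted values of the disclosure. The last function applies the standard Cox identity S(t | X) = S0(t)^exp(linear predictor), which turns a baseline disease-free survival probability into a patient-specific one.

```python
import math
from typing import Dict

# The five covariates of ln[h(t, X)/h0(t)] = sum_k beta_k * x_k.
COVARIATES = ("age", "tumor_size", "lymph_node_metastasis",
              "operation_mode", "gene_mutation_typing")


def log_relative_hazard(x: Dict[str, float], beta: Dict[str, float]) -> float:
    """Cox linear predictor: ln[h(t, X) / h0(t)]."""
    return sum(beta[name] * x[name] for name in COVARIATES)


def relative_hazard(x: Dict[str, float], beta: Dict[str, float]) -> float:
    """h(t, X) / h0(t); under proportional hazards it does not depend on t."""
    return math.exp(log_relative_hazard(x, beta))


def survival_probability(baseline_survival: float, x: Dict[str, float],
                         beta: Dict[str, float]) -> float:
    """S(t | X) = S0(t) ** exp(linear predictor), for a given baseline S0(t)."""
    return baseline_survival ** relative_hazard(x, beta)


# Hypothetical coefficients and 0/1 group codes, for illustration only.
beta = {"age": 0.3, "tumor_size": 0.4, "lymph_node_metastasis": 0.7,
        "operation_mode": -0.2, "gene_mutation_typing": 0.5}
patient = {"age": 1, "tumor_size": 1, "lymph_node_metastasis": 0,
           "operation_mode": 0, "gene_mutation_typing": 1}
print(round(log_relative_hazard(patient, beta), 3))  # 1.2
```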

With reference to the first aspect, the present disclosure provides, in a third implementation form of the first aspect,

the postoperative lung cancer clinical data further comprise the pathology type; and/or

the regression-analyzed data comprise the postoperative disease-free survival rate; and/or

the risk factor data comprise gene mutation typing and further comprise at least one of: age, tumor size, lymph node metastasis and operation mode.

With reference to the first aspect, the present disclosure provides, in a fourth implementation form of the first aspect,

the lung cancer is stage I-IIIA lung cancer; and/or

the gene mutation typing comprises: EGFR mutation, HER2 mutation, MET amplification, ALK fusion, ROS1 fusion, KRAS mutation, RET fusion and BRAF mutation.

With reference to the first aspect, the present disclosure provides, in a fifth implementation form of the first aspect,

the preprocessing step comprises: for continuous variables in the postoperative lung cancer clinical data, determining an optimal cut-off point by an optimal approximation method based on the receiver operating characteristic (ROC) curve, and grouping the postoperative lung cancer clinical data into a plurality of categories by using the optimal cut-off point, to obtain the modeling group clinical data and the verification group clinical data.

With reference to the first aspect, the present disclosure provides, in a sixth implementation form of the first aspect,

the method further comprises a verification step for verifying the risk factor screening step and the regression analysis step.

With reference to the sixth implementation manner of the first aspect, in a seventh implementation manner of the first aspect, the verifying step includes:

calculating, by a machine learning method based on the risk factor screening step, the regression analysis step and the verification group clinical data, the area under the receiver operating characteristic curve (AUC), the sensitivity and the specificity; and

judging the processing accuracy of the risk factor screening step and the regression analysis step according to the AUC, the sensitivity and the specificity.

With reference to the seventh implementation manner of the first aspect, in an eighth implementation manner of the first aspect, the machine learning method includes at least one of:

a logistic regression method, a support vector machine method, a random forest method, a decision tree method, a k-nearest neighbor method, a naive Bayes method and an AdaBoost method.

With reference to the eighth implementation manner of the first aspect, in a ninth implementation manner of the first aspect,

the risk factor screening step and the regression analysis step are judged to be accurate when the AUC is greater than 0.65, the sensitivity is greater than 0.5 and the specificity is greater than 0.5.
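The acceptance rule (AUC > 0.65, sensitivity > 0.5, specificity > 0.5) can be sketched as follows. The sketch makes three assumptions. First, the risk scores are the model's predictions for the verification group, for example the Cox linear predictor. Second, each patient is labeled 1 if recurrence or death occurred within a fixed horizon such as 3 years and 0 otherwise; the 3-year inclusion requirement for follow-up makes such a label observable. Third, the 0.5 classification cut-off is arbitrary, and all data shown are hypothetical. The AUC uses the Mann-Whitney formulation, which equals the area under the empirical ROC curve.

```python
from typing import Sequence, Tuple


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC = P(score of an event > score of a non-event), counting ties as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(scores: Sequence[float], labels: Sequence[int],
                            cutoff: float) -> Tuple[float, float]:
    """A score >= cutoff is predicted as an event (e.g. recurrence)."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def passes_validation(auc: float, sensitivity: float, specificity: float,
                      min_auc: float = 0.65, min_sens: float = 0.5,
                      min_spec: float = 0.5) -> bool:
    """Strict thresholds as stated: AUC > 0.65, sensitivity > 0.5, specificity > 0.5."""
    return auc > min_auc and sensitivity > min_sens and specificity > min_spec


# Hypothetical verification-group risk scores and 3-year recurrence labels.
scores = [0.9, 0.8, 0.4, 0.35, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0]
auc = roc_auc(scores, labels)                              # 8/9, about 0.889
sens, spec = sensitivity_specificity(scores, labels, 0.5)  # (2/3, 1.0)
print(passes_validation(auc, sens, spec))                  # True
```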

With reference to the first aspect, in a tenth implementation manner of the first aspect, the present disclosure further includes:

a display step of graphically displaying the relationship between the risk factor data and the regression-analyzed data.

With reference to the tenth implementation manner of the first aspect, in an eleventh implementation manner of the first aspect,

the display step comprises: displaying the relationship between the risk factor data and the regression-analyzed data by using a nomogram.
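A nomogram turns the fitted Cox model into a points chart. The disclosure does not spell out its scaling, so the sketch assumes one common construction. The variable with the widest range of β·x contributions is given a 0-100 point scale, and every other variable is scaled in proportion. Total points then map linearly back to the linear predictor of the Cox formula. The coefficients and the 0/1 group codes below are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def _contribution_bounds(beta: Dict[str, float],
                         levels: Dict[str, Sequence[float]]) -> Dict[str, Tuple[float, float]]:
    """(lowest, highest) value of beta_k * x_k over the levels of each variable."""
    return {k: (min(beta[k] * v for v in levels[k]), max(beta[k] * v for v in levels[k]))
            for k in beta}


def nomogram_points(beta: Dict[str, float], levels: Dict[str, Sequence[float]],
                    patient: Dict[str, float]) -> Dict[str, float]:
    """Per-variable points: least risky level scores 0, widest variable spans 0-100."""
    bounds = _contribution_bounds(beta, levels)
    widest = max(hi - lo for lo, hi in bounds.values())
    if widest == 0:
        raise ValueError("all contributions are constant; nothing to scale")
    return {k: 100.0 * (beta[k] * patient[k] - bounds[k][0]) / widest for k in beta}


def points_to_linear_predictor(total_points: float, beta: Dict[str, float],
                               levels: Dict[str, Sequence[float]]) -> float:
    """Invert the points scale: total points -> Cox linear predictor."""
    bounds = _contribution_bounds(beta, levels)
    widest = max(hi - lo for lo, hi in bounds.values())
    return sum(lo for lo, _ in bounds.values()) + total_points * widest / 100.0


# Hypothetical coefficients; every variable coded 0/1.
beta = {"lymph_node_metastasis": 0.8, "tumor_size": 0.4, "operation_mode": -0.2}
levels = {k: [0, 1] for k in beta}
patient = {"lymph_node_metastasis": 1, "tumor_size": 1, "operation_mode": 0}
pts = nomogram_points(beta, levels, patient)
print({k: round(v, 1) for k, v in pts.items()})
# {'lymph_node_metastasis': 100.0, 'tumor_size': 50.0, 'operation_mode': 25.0}
```

A negative coefficient, as for `operation_mode` here, gives more points to the lower code, so the scale always runs from least to most risk.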

In a second aspect, embodiments of the present disclosure provide a system for predicting postoperative survival of lung cancer patients by measuring clinical data including gene mutation typing, comprising:

a data acquisition module for acquiring postoperative lung cancer clinical data;

a preprocessing module for classifying and grouping the postoperative lung cancer clinical data to obtain modeling group clinical data and verification group clinical data;

a risk factor screening module for screening risk factors in the modeling group clinical data to obtain risk factor data and overall survival data; and

a regression analysis module for performing regression analysis on the risk factor data and the overall survival data to obtain regression-analyzed data,

wherein the postoperative lung cancer clinical data comprise gene mutation typing, age, tumor size, lymph node metastasis and operation mode, and

the regression analysis is calculated by the following formula:

$$\ln\left[\frac{h(t,X)}{h_0(t)}\right] = \beta_1 \times \text{age} + \beta_2 \times \text{tumor size} + \beta_3 \times \text{lymph node metastasis} + \beta_4 \times \text{operation mode} + \beta_5 \times \text{gene mutation typing},$$

where $h(t,X)$ is the regression-analyzed data, $h_0(t)$ is the baseline hazard rate, and $\beta_1$, $\beta_2$, $\beta_3$, $\beta_4$ and $\beta_5$ are coefficients with values of

With reference to the second aspect, the present disclosure provides, in a first implementation form of the second aspect,

the risk factor screening module is configured to screen risk factors in the modeling group clinical data by using LASSO analysis to obtain the risk factor data and the overall survival data.

With reference to the second aspect, the present disclosure provides, in a second implementation form of the second aspect,

the regression analysis module is configured to perform regression analysis on the risk factor data and the overall survival data by using multivariate Cox regression analysis to obtain the regression-analyzed data.

With reference to the second aspect, the present disclosure provides, in a third implementation form of the second aspect,

the postoperative lung cancer clinical data further comprise the pathology type; and/or

the regression-analyzed data comprise the postoperative disease-free survival rate; and/or

the risk factor data comprise gene mutation typing and further comprise at least one of: age, tumor size, lymph node metastasis and operation mode.

With reference to the second aspect, the present disclosure provides, in a fourth implementation form of the second aspect,

the lung cancer is stage I-IIIA lung cancer; and/or

the gene mutation typing comprises: EGFR mutation, HER2 mutation, MET amplification, ALK fusion, ROS1 fusion, KRAS mutation, RET fusion and BRAF mutation.

With reference to the second aspect, the present disclosure provides, in a fifth implementation form of the second aspect,

the preprocessing module is configured to: for continuous variables in the postoperative lung cancer clinical data, determine an optimal cut-off point by an optimal approximation method based on the receiver operating characteristic (ROC) curve, and group the postoperative lung cancer clinical data into a plurality of categories by using the optimal cut-off point, to obtain the modeling group clinical data and the verification group clinical data.

With reference to the second aspect, in a sixth implementation manner of the second aspect, the present disclosure further includes:

a verification module for verifying the risk factor screening module and the regression analysis module.

With reference to the sixth implementation manner of the second aspect, in a seventh implementation manner of the second aspect, the verification module is configured to:

calculate, by a machine learning method based on the risk factor screening module, the regression analysis module and the verification group clinical data, the area under the receiver operating characteristic curve (AUC), the sensitivity and the specificity; and

judge the processing accuracy of the risk factor screening module and the regression analysis module according to the AUC, the sensitivity and the specificity.

With reference to the seventh implementation manner of the second aspect, in an eighth implementation manner of the second aspect, the machine learning method includes at least one of:

a logistic regression method, a support vector machine method, a random forest method, a decision tree method, a k-nearest neighbor method, a naive Bayes method and an AdaBoost method.

With reference to the eighth implementation manner of the second aspect, in a ninth implementation manner of the second aspect,

the risk factor screening module and the regression analysis module are judged to be accurate when the AUC is greater than 0.65, the sensitivity is greater than 0.5 and the specificity is greater than 0.5.

With reference to the second aspect, in a tenth implementation manner of the second aspect, the present disclosure further includes:

a display module for graphically displaying the relationship between the risk factor data and the regression-analyzed data.

With reference to the tenth implementation manner of the second aspect, in an eleventh implementation manner of the second aspect,

the display module is configured to display the relationship between the risk factor data and the regression-analyzed data by using a nomogram.

The technical solutions provided by the embodiments of the present disclosure may have the following beneficial effects.

According to the technical solutions provided by the embodiments of the present disclosure, the method for predicting survival rate after lung cancer surgery comprises: a data acquisition step of acquiring postoperative lung cancer clinical data; a preprocessing step of classifying and grouping the postoperative lung cancer clinical data to obtain modeling group clinical data and verification group clinical data; a risk factor screening step of screening risk factors in the modeling group clinical data to obtain risk factor data and overall survival data; and a regression analysis step of performing regression analysis on the risk factor data and the overall survival data to obtain regression-analyzed data. The postoperative lung cancer clinical data comprise gene mutation typing, age, tumor size, lymph node metastasis and operation mode, and the regression analysis is calculated by the formula $\ln[h(t,X)/h_0(t)] = \beta_1 \times \text{age} + \beta_2 \times \text{tumor size} + \beta_3 \times \text{lymph node metastasis} + \beta_4 \times \text{operation mode} + \beta_5 \times \text{gene mutation typing}$, where $h(t,X)$ is the regression-analyzed data, $h_0(t)$ is the baseline hazard rate, and $\beta_1$, $\beta_2$, $\beta_3$, $\beta_4$ and $\beta_5$ are coefficients with values of

This improves the accuracy of the patient survival prediction model and allows the postoperative disease-free survival rate to be estimated accurately.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

Other features, objects, and advantages of the present disclosure will become more apparent from the following detailed description of non-limiting embodiments when taken in conjunction with the accompanying drawings. The following is a description of the drawings.

Fig. 1a shows an exemplary schematic diagram of an implementation scenario for grouping lung cancer patient data according to an embodiment of the present disclosure.

Fig. 1b illustrates an exemplary schematic diagram of an implementation scenario of a patient survival prediction model according to an embodiment of the present disclosure.

Fig. 1c illustrates an exemplary schematic diagram of an implementation scenario of a validated patient survival prediction model according to an embodiment of the present disclosure.

Fig. 1d shows an exemplary schematic of a nomogram for predicting disease-free survival of a patient, according to an embodiment of the present disclosure.

Fig. 1e shows an exemplary schematic of a receiver operating characteristic (ROC) curve according to an embodiment of the present disclosure.

Fig. 2 illustrates a flowchart of a method for predicting postoperative survival of lung cancer patients by measuring clinical data including gene mutation typing according to an embodiment of the present disclosure.

Fig. 3 shows a flowchart of a method for predicting postoperative survival of lung cancer patients by measuring clinical data including gene mutation typing according to yet another embodiment of the present disclosure.

Fig. 4 illustrates a flowchart of a method for predicting survival rate after lung cancer surgery by measuring clinical data including gene mutation typing according to still another embodiment of the present disclosure.

Fig. 5 illustrates a block diagram of a system for predicting postoperative survival of lung cancer patients by measuring clinical data including gene mutation typing according to another embodiment of the present disclosure.

Detailed Description

Hereinafter, exemplary embodiments of the present disclosure will be described in detail with reference to the accompanying drawings so that those skilled in the art can easily implement them. Also, for the sake of clarity, parts not relevant to the description of the exemplary embodiments are omitted in the drawings.

In the present disclosure, it is to be understood that terms such as "including" or "having" are intended to indicate the presence of the features, numbers, steps, actions, components, parts or combinations thereof disclosed in this specification, and are not intended to preclude the possibility that one or more other features, numbers, steps, actions, components, parts or combinations thereof are present or added.

It should also be noted that the embodiments of the present disclosure and the features in the embodiments may be combined with each other in the absence of conflict. The present disclosure will be described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.

For early-stage lung cancer, including stage I and II disease and a subset of stage III disease, the standard treatment is radical resection. The seventh edition of the TNM staging system is the most widely used tumor staging system; it stratifies patients with non-metastatic NSCLC according to tumor size and invasion and the extent of lymph node involvement. However, TNM staging is not sufficiently accurate, and postoperative disease-free survival differs greatly among patients in the same stage. A more accurate and effective way of predicting postoperative disease-free survival for patients with early-stage lung cancer is therefore needed. In such prediction, the selection of predictive parameters, coefficients and the like is critical to the accuracy of the result.

In order to solve the above problems, the present disclosure provides a method and a system for predicting survival rate after lung cancer surgery.

Fig. 1a shows an exemplary schematic diagram of an implementation scenario for grouping lung cancer patient data according to an embodiment of the present disclosure.

Fig. 1a specifically illustrates how patient data are grouped in the method of predicting survival after lung cancer surgery by measuring clinical data including gene mutation typing.

Those of ordinary skill in the art will appreciate that Fig. 1a illustrates one implementation scenario for grouping lung cancer patient data and does not limit the present disclosure.

As shown in Fig. 1a, in step S101 the acquired lung cancer patient data 101 are randomly grouped at a ratio of 3:1 into modeling group data 102 and verification group data 103, so that the modeling group holds three times as many records as the verification group. The modeling group data 102 are used to train the patient survival prediction model, and the verification group data 103 are used to verify the accuracy of the model's results.
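The 3:1 random grouping of step S101 can be sketched in a few lines. The fixed seed is an assumption, not part of the text, and it makes the grouping reproducible. The split is simple random sampling without stratification, since the text does not say whether stratification is used.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_modeling_verification(records: Sequence[T], ratio: Tuple[int, int] = (3, 1),
                                seed: int = 2021) -> Tuple[List[T], List[T]]:
    """Randomly split records into modeling and verification groups at ratio[0]:ratio[1]."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible grouping
    n_modeling = len(shuffled) * ratio[0] // (ratio[0] + ratio[1])
    return shuffled[:n_modeling], shuffled[n_modeling:]


patient_ids = list(range(1, 601))  # 600 hypothetical patient IDs
modeling, verification = split_modeling_verification(patient_ids)
print(len(modeling), len(verification))  # 450 150
```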

In an embodiment of the present disclosure, the lung cancer patient data 101 include gene mutation typing, age, tumor size, lymph node metastasis and surgical procedure. The gene mutation typing covers all eight of the following alterations: EGFR mutation, HER2 mutation, MET amplification, ALK fusion, ROS1 fusion, KRAS mutation, RET fusion and BRAF mutation.

In embodiments of the disclosure, typing these eight gene alterations allows disease-free survival to be predicted accurately. For example, EGFR plays an important role in the proliferation, growth, repair and survival of tumor cells. EGFR is frequently overexpressed in tumors of epithelial origin, such as non-small cell lung cancer, and EGFR mutations are also closely associated with neovascularization, tumor invasion and metastasis, chemotherapy resistance and prognosis. Tumors with HER2 mutation or high HER2 expression show stronger metastatic and invasive capacity, poorer sensitivity to chemotherapy and a higher tendency to relapse. The c-Met protein encoded by the MET gene is the tyrosine kinase receptor of hepatocyte growth factor (HGF); binding of HGF to c-Met activates downstream signaling pathways that promote cell proliferation, growth, migration and angiogenesis. When the MET gene is amplified, these pathways remain continuously activated, driving sustained proliferation and metastasis of lung cancer cells.

In an embodiment of the present disclosure, lung cancer patient data 101 also includes a pathology type.

In embodiments of the present disclosure, lung cancer patient data 101 may be obtained by measurement in a variety of ways, such as CT examination, chest puncture biopsy, gene testing kits, and the like.

In embodiments of the present disclosure, lung cancer patient data 101 may be stored in a database, so that it can be retrieved at any time for comprehensive analysis.

In an embodiment of the present disclosure, continuous data in the lung cancer patient data 101, such as age, may be handled as follows. An optimal cut-off point is determined from the Receiver Operating Characteristic (ROC) curve, and the data are grouped into multiple categories based on that cut-off point. Categorical data may be used directly as grouped data, for example tumor size, lymph node metastasis, surgical procedure, pathology type, and adjuvant treatment plan.
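The disclosure does not specify which ROC-based criterion defines the optimal cut-off. One common choice, assumed here, is Youden's index J = sensitivity + specificity - 1.

The sketch below makes three further assumptions:

- It scans every observed value of a continuous variable (such as age) and keeps the cut-off with the largest J.
- Values at or above the cut-off are treated as higher risk.
- The binary outcome labels and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def youden_cutoff(values: Sequence[float], events: Sequence[int]) -> Tuple[float, float]:
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A value >= cutoff counts as a positive test; events are 1 (event) or 0.
    """
    n_pos = sum(events)
    n_neg = len(events) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both events and non-events")
    best_cut, best_j = None, -1.0
    for cut in sorted(set(values)):
        tp = sum(1 for v, e in zip(values, events) if v >= cut and e == 1)
        tn = sum(1 for v, e in zip(values, events) if v < cut and e == 0)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:                      # keep the first cut-off with the best J
            best_cut, best_j = cut, j
    return best_cut, best_j


def dichotomize(values: Sequence[float], cutoff: float) -> List[int]:
    """Group a continuous variable into two categories at the cut-off."""
    return [1 if v >= cutoff else 0 for v in values]
```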

In embodiments of the disclosure, the inclusion criteria for lung cancer patients entering the statistical analysis are:

1, patients with early-stage lung cancer of TNM stage I-IIIA;

2, surgery is the first-line treatment, and no neoadjuvant chemotherapy or radiotherapy was given before surgery;

3, the surgical procedure is radical resection of lung cancer plus lymph node dissection;

4, the postoperative follow-up time is at least 3 years.

Criteria for excluding lung cancer patients were:

1, any clinical information is missing;

2, another primary malignant tumor is present at the same time.
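The inclusion and exclusion criteria above can be expressed as a record filter. The sketch below is illustrative only, and rests on these assumptions:

- The `Patient` fields are hypothetical.
- The stage labels assume conventional TNM sub-stage notation.
- Exclusion criterion 2 is read as excluding patients with a concurrent second primary malignancy. This is an interpretation.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed TNM stage labels counted as "I-IIIA" (hypothetical encoding).
ELIGIBLE_STAGES = {"I", "IA", "IA1", "IA2", "IA3", "IB", "II", "IIA", "IIB", "IIIA"}


@dataclass
class Patient:
    # Hypothetical record layout; None marks missing clinical information.
    patient_id: str
    tnm_stage: Optional[str]
    neoadjuvant_therapy: Optional[bool]
    radical_resection_with_lnd: Optional[bool]
    follow_up_years: Optional[float]
    other_primary_malignancy: Optional[bool]


def is_eligible(p: Patient) -> bool:
    """Apply inclusion criteria 1-4 and exclusion criteria 1-2 to one patient."""
    required = (p.tnm_stage, p.neoadjuvant_therapy, p.radical_resection_with_lnd,
                p.follow_up_years, p.other_primary_malignancy)
    if any(value is None for value in required):   # exclusion 1: missing data
        return False
    if p.other_primary_malignancy:                 # exclusion 2 (as interpreted)
        return False
    return bool(
        p.tnm_stage in ELIGIBLE_STAGES              # inclusion 1: stage I-IIIA
        and not p.neoadjuvant_therapy               # inclusion 2: surgery first
        and p.radical_resection_with_lnd            # inclusion 3: resection + LND
        and p.follow_up_years >= 3                  # inclusion 4: >= 3 years follow-up
    )
```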

In embodiments of the present disclosure, the number of lung cancer patients may exceed 500.

One of ordinary skill in the art will appreciate that the number of lung cancer patients may also take other values, such as more than 1000, and the present disclosure is not limited thereto.

Fig. 1b illustrates an exemplary schematic diagram of an implementation scenario of a patient survival prediction model according to an embodiment of the present disclosure.

FIG. 1b specifically illustrates the workflow of a patient survival prediction model in a method for predicting survival after lung cancer surgery by measuring clinical data including gene mutation typing.

It will be understood by those of ordinary skill in the art that fig. 1b illustrates an implementation scenario of a patient survival prediction model, and does not constitute a limitation of the present disclosure.

As shown in fig. 1b, for the modeling group data 102, step S102 performs risk factor screening such as LASSO analysis (LASSO analysis), to obtain risk factor data and total Disease-free Survival (DFS) data 104.

LASSO analysis adds a penalty term to least squares to shrink the estimated coefficients. Coefficients that shrink below a threshold are set to 0. This selects the independent variables with a larger influence on the dependent variable and estimates their regression coefficients. LASSO analysis has clear advantages for sample data that exhibit multicollinearity. The objective function of LASSO analysis is

$$F_{\mathrm{LASSO}} = \lVert y - Xw \rVert_2^2 + \lambda \lVert w \rVert_1$$

where y is the dependent variable, X is the matrix of independent variables, w is the vector of regression coefficients, and λ is the penalty coefficient.
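The objective above can be minimized by cyclic coordinate descent with soft-thresholding, sketched below in plain Python. The sketch assumes the following:

- The columns of X and y are centered, since the formula has no intercept.
- The objective is exactly the least-squares form given above. For survival outcomes, a LASSO-penalized Cox model is a common alternative, so this sketch only illustrates the stated formula.
- The function names are hypothetical.

```python
from typing import List, Sequence


def soft_threshold(x: float, t: float) -> float:
    """Shrink x toward 0 by t; values within [-t, t] become exactly 0."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def lasso_coordinate_descent(X: Sequence[Sequence[float]], y: Sequence[float],
                             lam: float, n_iter: int = 200) -> List[float]:
    """Minimize ||y - Xw||_2^2 + lam * ||w||_1 by cyclic coordinate descent.

    Assumes centered X columns and centered y (no intercept term).
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    residual = list(y)                      # y - Xw while w is all zeros
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n))
            # Minimizer of z*w^2 - 2*rho*w + lam*|w| is soft(rho, lam/2) / z.
            new_wj = soft_threshold(rho, lam / 2) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_wj
    return w
```

With a penalty large enough, weakly related variables receive a coefficient of exactly 0, which is what makes LASSO usable for screening.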

In embodiments of the disclosure, the risk factor data include gene mutation typing, which covers EGFR mutation, HER2 mutation, MET amplification, ALK fusion, ROS1 fusion, Kras mutation, RET fusion, and Braf mutation. The risk factor data further include at least one of: age, tumor size, lymph node metastasis, and surgical procedure. For each lung cancer patient, the risk factor data and the overall survival data correspond to each other.

Overall survival is the time from randomization to death from any cause. For subjects lost to follow-up before death, the time of the last follow-up is used in place of the time of death.
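Below is a minimal sketch of deriving the survival time for one patient from dates. It follows standard survival-analysis practice, which goes beyond the text: a patient lost to follow-up is recorded with an event indicator of 0, meaning censored at the last follow-up, rather than as a death.

The following are illustrative choices:

- The start date parameter, for example the randomization or surgery date.
- The average month length of 30.4375 days (365.25 / 12).
- The function name.

```python
from datetime import date
from typing import Optional, Tuple

DAYS_PER_MONTH = 365.25 / 12  # 30.4375, an averaging convention (assumption)


def survival_time_months(start: date, death: Optional[date],
                         last_follow_up: date) -> Tuple[float, int]:
    """Return (time in months, event indicator) for one patient.

    With a recorded death, time runs to death and event = 1; otherwise time
    runs to the last follow-up and event = 0 (censored).
    """
    end = death if death is not None else last_follow_up
    days = (end - start).days
    if days < 0:
        raise ValueError("end date precedes start date")
    return days / DAYS_PER_MONTH, int(death is not None)
```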

In embodiments of the present disclosure, the regression analysis step, for example the multifactor Cox analysis of step S103, performs regression analysis on the risk factor data and overall survival data 104. This yields post-regression analysis data, such as the postoperative disease-free survival rate. The multifactor Cox analysis calculates the postoperative disease-free survival rate as follows:

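$$\ln\left[\frac{h(t, X)}{h_0(t)}\right] = \beta_1 \cdot \text{age} + \beta_2 \cdot \text{tumor size} + \beta_3 \cdot \text{lymph node metastasis} + \beta_4 \cdot \text{surgical modality} + \beta_5 \cdot \text{gene mutation typing}$$

(see the table below for the β values). Here ln denotes the natural logarithm, h(t, X) is the hazard at time t for a patient with risk factor values X, h0(t) is the baseline hazard, and β1, β2, β3, β4, β5 are the regression coefficients.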

In the embodiment of the present disclosure, the variables tumor size, lymph node metastasis, surgical modality, age, and gene mutation typing in the above formula may all take the value 1.
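The Cox formula can be evaluated as follows. The disclosure's β values are not reproduced in this text, so the coefficients in `BETAS` are hypothetical placeholders.

The sketch uses the standard Cox identity S(t | X) = S0(t)^exp(lp). Here lp is the right-hand side of the formula, and S0(t) is the baseline survival at time t, also an assumed input.

```python
import math
from typing import Dict

# Hypothetical coefficients for illustration only (not the disclosure's values).
BETAS: Dict[str, float] = {
    "age": 0.30,
    "tumor_size": 0.45,
    "lymph_node_metastasis": 0.60,
    "surgical_modality": 0.20,
    "gene_mutation_typing": 0.25,
}


def linear_predictor(x: Dict[str, float], betas: Dict[str, float] = BETAS) -> float:
    """lp = ln[h(t, X) / h0(t)] = sum of beta_k * x_k."""
    return sum(betas[k] * x[k] for k in betas)


def hazard_ratio(x: Dict[str, float], betas: Dict[str, float] = BETAS) -> float:
    """h(t, X) / h0(t) = exp(lp); constant over time under proportional hazards."""
    return math.exp(linear_predictor(x, betas))


def survival_probability(baseline_survival: float, x: Dict[str, float],
                         betas: Dict[str, float] = BETAS) -> float:
    """S(t | X) = S0(t) ** exp(lp)."""
    return baseline_survival ** hazard_ratio(x, betas)
```

With every variable set to 1, as in the example above, lp is simply the sum of the coefficients.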

Based on the risk factor data and the postoperative disease-free survival rate, step S104 establishes a nomogram to predict a patient's survival score and the corresponding probability. The nomogram is shown in FIG. 1d.

In an embodiment of the present disclosure, LASSO analysis S102 and multifactor Cox analysis S103 together comprise a patient survival prediction model 105.

Fig. 1c illustrates an exemplary schematic diagram of an implementation scenario of a validated patient survival prediction model according to an embodiment of the present disclosure.

FIG. 1c specifically illustrates a validation procedure in a method for predicting post-operative survival of lung cancer by measuring clinical data including gene mutation typing.

It will be appreciated by those of ordinary skill in the art that fig. 1c illustrates an implementation scenario for validating the patient survival prediction model, and does not constitute a limitation of the present disclosure.

In the embodiment of the present disclosure, step S105 uses an artificial intelligence model to calculate the area under the ROC curve, the sensitivity, and the specificity 106. This calculation is based on the patient survival prediction model 105 and the validation group data 103. Step S106 then evaluates the accuracy of the patient survival prediction model.

In the embodiment of the disclosure, the artificial intelligence model may use at least one of the following methods: Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), Decision Tree (DT), K-Nearest Neighbor (KNN), Naive Bayes (NB), and AdaBoost (Ada). 10-fold cross-validation is used to obtain the ROC curve and to calculate sensitivity and specificity.

When the area under the ROC curve is greater than 0.65, the patient survival prediction model can be considered to have good discrimination. When sensitivity and specificity are both greater than 0.5, the model can be considered to have a good predictive effect. Taking the three indicators together, the model can be considered to have high accuracy when the area under the ROC curve is greater than 0.65 and both sensitivity and specificity are greater than 0.5.
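The evaluation criterion can be sketched as below:

- **AUC:** computed with the Mann-Whitney formulation, i.e. the probability that a patient with the event scores higher than one without.
- **Sensitivity and specificity:** taken at a chosen score threshold.
- **Acceptance limits:** 0.65 and 0.5, both from the text.

The score threshold and function names are assumptions. In the described method, these metrics would be computed on held-out data, for example within each fold of the 10-fold cross-validation.

```python
from typing import Sequence, Tuple


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need both classes")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(scores: Sequence[float], labels: Sequence[int],
                            threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when score >= threshold predicts the event."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    return tp / (tp + fn), tn / (tn + fp)


def meets_accuracy_criterion(auc: float, sensitivity: float, specificity: float) -> bool:
    """Thresholds from the text: AUC > 0.65, sensitivity > 0.5, specificity > 0.5."""
    return auc > 0.65 and sensitivity > 0.5 and specificity > 0.5
```

The values reported for FIG. 1e (AUC 0.71, sensitivity 0.67, specificity 0.68) satisfy `meets_accuracy_criterion`.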

Fig. 1d shows an exemplary schematic of a nomogram for predicting disease-free survival of a patient, according to an embodiment of the present disclosure.

FIG. 1d specifically shows a nomogram for predicting disease-free survival of a patient in a method for predicting survival after lung cancer surgery by measuring clinical data including gene mutation typing.

It will be appreciated by those of ordinary skill in the art that fig. 1d illustrates a nomogram for predicting disease-free survival of a patient, and does not constitute a limitation of the present disclosure.

A nomogram is constructed from the multifactor Cox regression model in three stages:

1, each value level of each influencing factor is assigned a score according to the factor's influence on the outcome variable in the model, that is, the magnitude of its regression coefficient;

2, the scores are summed to obtain a total score;

3, the predicted probability of the outcome event for the individual is calculated from the functional relationship between the total score and the probability of the outcome event.

With the nomogram shown in fig. 1d, the number of points for each risk factor is read from its value (tumor size, lymph node metastasis, age, type of surgery, and gene mutation typing). The points for all risk factors are added to obtain the total points. The corresponding 1-year, 3-year, and 5-year disease-free survival rates are then read from the total points.
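The construction of point scales described above can be sketched as follows. The sketch assumes the usual convention: the factor with the widest effect range (β times value) spans 0 to 100 points, and the other factors are scaled proportionally.

Because points are a linear function of the Cox linear predictor, the total can be converted back to lp and then to a survival probability via S(t | X) = S0(t)^exp(lp). The coefficients, levels, and baseline survival used with it are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def build_point_scales(betas: Dict[str, float], levels: Dict[str, List[float]]
                       ) -> Tuple[Dict[str, Dict[float, float]], float, float]:
    """Map each factor level to nomogram points.

    Returns (scales, max_range, offset): the widest beta*value range maps to
    0-100 points; offset is lp when every factor is at its 0-point level.
    """
    effects = {k: [betas[k] * v for v in levels[k]] for k in betas}
    max_range = max(max(e) - min(e) for e in effects.values())
    scales = {
        k: {v: 100.0 * (betas[k] * v - min(effects[k])) / max_range for v in levels[k]}
        for k in betas
    }
    offset = sum(min(e) for e in effects.values())
    return scales, max_range, offset


def total_points(scales: Dict[str, Dict[float, float]], x: Dict[str, float]) -> float:
    """Sum the points of each factor at the patient's value."""
    return sum(scales[k][x[k]] for k in scales)


def points_to_survival(points: float, max_range: float, offset: float,
                       baseline_survival: float) -> float:
    """Invert the linear point scale to lp, then apply S(t | X) = S0(t) ** exp(lp)."""
    lp = points * max_range / 100.0 + offset
    return baseline_survival ** math.exp(lp)
```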

In the embodiment of the present disclosure, as shown in the nomogram of fig. 1d, the Cox regression model gives the correspondence between the value of each optimized risk factor and its number of points. For example:

- **Tumor size:** 1 corresponds to 0 points, 2 to 33 points, 3 to 66 points, and 4 to 100 points.
- **Gene mutation typing:** Pure EGFR mutation/AE Function corresponds to 0 points, and any other typing corresponds to 24 points.

With this correspondence, the area under the receiver operating characteristic curve 107 in fig. 1e is greater than 0.65, and the sensitivity and specificity are both greater than 0.5. Specifically, the area under the curve is 0.71, the sensitivity is 0.67, and the specificity is 0.68. Accurate 1-year, 3-year, and 5-year disease-free survival rates are thereby obtained.
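Reading the nomogram of FIG. 1d amounts to a table lookup followed by a sum. The sketch below encodes only the two point scales quoted in the text, tumor size and gene mutation typing.

Two assumptions apply:

- The remaining factors would be read from FIG. 1d, which is not reproduced here.
- Every label other than "Pure EGFR mutation/AE Function" maps to the 24-point "other" category. This is an assumption about how the figure groups them.

```python
# Points quoted in the text for FIG. 1d; other factors are omitted (not given here).
TUMOR_SIZE_POINTS = {1: 0, 2: 33, 3: 66, 4: 100}
MUTATION_POINTS = {"Pure EGFR mutation/AE Function": 0, "other": 24}


def partial_points(tumor_size: int, mutation_typing: str) -> int:
    """Sum the nomogram points for tumor size and gene mutation typing only."""
    if tumor_size not in TUMOR_SIZE_POINTS:
        raise ValueError(f"unsupported tumor size category: {tumor_size}")
    # Assumption: any label other than the pure-EGFR group scores as "other".
    key = mutation_typing if mutation_typing in MUTATION_POINTS else "other"
    return TUMOR_SIZE_POINTS[tumor_size] + MUTATION_POINTS[key]
```

For example, a patient with tumor size 3 and an ALK fusion would receive 66 + 24 = 90 points from these two factors. Points from the other factors would then be added before reading the survival rates.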

Fig. 1e shows an exemplary schematic of a receiver operating characteristic (ROC) curve according to an embodiment of the present disclosure.

FIG. 1e specifically illustrates the receiver operating characteristic curve for the method of predicting post-operative survival of lung cancer by measuring clinical data including gene mutation typing.

It will be understood by those of ordinary skill in the art that figure 1e illustrates a receiver operating characteristic curve and does not limit the present disclosure.

As shown in fig. 1e, the area under the receiver operating characteristic curve 107 is 0.71, greater than 0.65. The corresponding sensitivity is 0.67 and the specificity is 0.68, both greater than 0.5. The patient survival prediction model therefore has good discrimination, a good predictive effect, and high accuracy.

Fig. 2 illustrates a flow chart of a method for predicting post-operative survival of lung cancer by measuring clinical data including gene mutation typing according to an embodiment of the present disclosure.

As shown in fig. 2, the method for predicting post-operative survival of lung cancer by measuring clinical data including gene mutation typing comprises: steps S201, S202, S203, S204.

In step S201, clinical data after lung cancer surgery are acquired.

In step S202, the clinical data after lung cancer surgery are classified and grouped to obtain modeling group clinical data and verification group clinical data.

In step S203, risk factor screening is performed on the modeling group clinical data to obtain risk factor data and overall survival data.

In step S204, regression analysis is performed on the risk factor data and the overall survival data to obtain regression-analyzed data.

Step S201 is a data acquisition step, step S202 is a preprocessing step, step S203 is a risk factor screening step, and step S204 is a regression analysis step.

The clinical data after lung cancer surgery include gene mutation typing, age, tumor size, lymph node metastasis, and surgical modality. The regression analysis is calculated by the following formula:

$$\ln\left[\frac{h(t, X)}{h_0(t)}\right] = \beta_1 \times \text{age} + \beta_2 \times \text{tumor size} + \beta_3 \times \text{lymph node metastasis} + \beta_4 \times \text{surgical modality} + \beta_5 \times \text{gene mutation typing}$$

where h(t, X) is the data after regression analysis, h0(t) is the baseline hazard, and β1, β2, β3, β4 and β5 are coefficients with values of

According to an embodiment of the present disclosure, the method comprises the following steps:

- **Data acquisition step:** clinical data after lung cancer surgery are acquired.
- **Preprocessing step:** the clinical data after lung cancer surgery are classified and grouped to obtain modeling group clinical data and verification group clinical data.
- **Risk factor screening step:** risk factors are screened in the modeling group clinical data to obtain risk factor data and overall survival data.
- **Regression analysis step:** regression analysis is performed on the risk factor data and the overall survival data to obtain data after regression analysis.

The clinical data after lung cancer surgery include gene mutation typing, age, tumor size, lymph node metastasis, and surgical modality. The regression analysis is calculated by the formula ln[h(t, X)/h0(t)] = β1 × age + β2 × tumor size + β3 × lymph node metastasis + β4 × surgical modality + β5 × gene mutation typing. Here h(t, X) is the data after regression analysis, h0(t) is the baseline hazard, and β1, β2, β3, β4 and β5 are coefficients with values of

In this way, the risk factors most relevant to the postoperative disease-free survival rate are screened out. This improves the accuracy of the patient survival prediction model, so that the postoperative disease-free survival rate is estimated accurately.

According to an embodiment of the present disclosure, the risk factor screening step includes screening risk factors in the modeling group clinical data by LASSO analysis to obtain risk factor data and overall survival data. This identifies the risk factors more closely related to the postoperative disease-free survival rate and improves the accuracy with which that rate is estimated.

According to an embodiment of the present disclosure, the regression analysis step includes performing regression analysis on the risk factor data and the overall survival data by multifactor Cox analysis to obtain regression-analyzed data. This produces accurate regression-analyzed data and improves the accuracy of both the patient survival prediction model and the estimated postoperative disease-free survival rate.

According to embodiments of the present disclosure, the clinical data after lung cancer surgery further include a pathology type; and/or the regression-analyzed data include the postoperative disease-free survival rate; and/or the risk factor data include gene mutation typing and further include at least one of: age, tumor size, lymph node metastasis, and surgical modality. Selecting appropriate post-operative clinical data and risk factor data in this way improves the accuracy of the patient survival prediction model and of the estimated postoperative disease-free survival rate.

According to embodiments of the present disclosure, the lung cancer includes stage I-IIIA lung cancer; and/or the gene mutation typing includes EGFR mutation, HER2 mutation, MET amplification, ALK fusion, ROS1 fusion, Kras mutation, RET fusion, and Braf mutation. Selecting a suitable range of applicable lung cancer types in this way improves the accuracy of the patient survival prediction model and of the estimated postoperative disease-free survival rate.

According to an embodiment of the present disclosure, the preprocessing step includes the following. For continuous data in the clinical data after lung cancer surgery, an optimal cut-off point is determined from the receiver operating characteristic curve, and the data are grouped into multiple categories by that cut-off point. The categorized clinical data are then grouped to obtain modeling group clinical data and verification group clinical data. Grouping the data reasonably in this way improves the accuracy of the patient survival prediction model and of the estimated postoperative disease-free survival rate.

Fig. 3 shows a flowchart of a method for predicting post-operative survival of lung cancer by measuring clinical data including gene mutation typing according to yet another embodiment of the present disclosure.

As shown in fig. 3, the method for predicting the survival rate after lung cancer surgery by measuring clinical data including gene mutation typing includes, in addition to the steps S201, S202, S203, S204 identical to fig. 2: step S301.

In step S301, the risk factor screening step and the regression analysis step are verified.

Step S301 is a verification step.

According to an embodiment of the present disclosure, the method further comprises a verification step that verifies the risk factor screening step and the regression analysis step, so as to verify the accuracy of the patient survival prediction model.

According to an embodiment of the present disclosure, the verification step includes two parts:

1, calculating the area under the receiver operating characteristic curve, the sensitivity, and the specificity by a machine learning method, based on the risk factor screening step, the regression analysis step, and the verification group clinical data;

2, judging the processing accuracy of the risk factor screening step and the regression analysis step from these three values, thereby verifying the accuracy of the patient survival prediction model.

According to an embodiment of the present disclosure, the machine learning method includes at least one of: logistic regression, support vector machine, random forest, decision tree, k-nearest neighbor, naive Bayes, and AdaBoost. This allows the area under the receiver operating characteristic curve, the sensitivity, and the specificity to be calculated accurately, so that the patient survival prediction model can be verified accurately.
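The 10-fold cross-validation used with these machine learning methods can be sketched as a generator of train/test index splits. Any of the listed classifiers would then be fitted on each training split and scored on the matching test split.

The shuffling seed and function name are hypothetical. Stratifying folds by outcome, often preferred when events are imbalanced, is not shown.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n: int, k: int = 10, seed: int = 0
                   ) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Every sample lands in exactly one test fold; fold sizes differ by at most one.
    """
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]          # round-robin assignment
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f in folds[:i] + folds[i + 1:] for j in f)
        yield train, test
```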

According to the embodiment of the disclosure, the processing accuracy of the risk factor screening step and the regression analysis step is judged acceptable when three conditions hold: the area under the receiver operating characteristic curve is greater than 0.65, the sensitivity is greater than 0.5, and the specificity is greater than 0.5. This provides a quantitative criterion for the accuracy of the patient survival prediction model.

Fig. 4 illustrates a flowchart of a method for predicting survival rate after lung cancer surgery by measuring clinical data including gene mutation typing according to still another embodiment of the present disclosure.

As shown in fig. 4, the method for predicting the survival rate after lung cancer surgery by measuring clinical data including gene mutation typing includes a step S401 in addition to the steps S201, S202, S203, S204, S301 identical to fig. 3.

In step S401, the relationship between the risk factor data and the regression-analyzed data is graphically displayed.

According to the embodiment of the disclosure, the displaying step graphically displays the relationship between the risk factor data and the regression-analyzed data, such as the postoperative disease-free survival rate. The relationship is thus shown intuitively, which improves ease of use.

According to an embodiment of the present disclosure, the displaying step includes displaying the relationship between the risk factor data and the regression-analyzed data with a nomogram. Regression-analyzed data such as the postoperative disease-free survival rate can then be obtained intuitively and conveniently from the risk factor data.

Fig. 5 illustrates a block diagram of a system for predicting post-operative survival of lung cancer by measuring clinical data including gene mutation typing according to an embodiment of the present disclosure.

As shown in fig. 5, a system 500 for predicting post-operative survival of lung cancer by measuring clinical data including gene mutation typing comprises a data acquisition module 501, a preprocessing module 502, a risk factor screening module 503, and a regression analysis module 504.

In an embodiment of the present disclosure, the modules are configured as follows:

- **Data acquisition module 501:** acquires clinical data after lung cancer surgery.
- **Preprocessing module 502:** classifies and groups the clinical data after lung cancer surgery to obtain modeling group clinical data and verification group clinical data.
- **Risk factor screening module 503:** performs risk factor screening on the modeling group clinical data to obtain risk factor data and overall survival data.
- **Regression analysis module 504:** performs regression analysis on the risk factor data and the overall survival data to obtain post-regression analysis data.

The clinical data after lung cancer surgery include gene mutation typing, age, tumor size, lymph node metastasis, and surgical modality. The regression analysis is calculated by the formula ln[h(t, X)/h0(t)] = β1 × age + β2 × tumor size + β3 × lymph node metastasis + β4 × surgical modality + β5 × gene mutation typing. Here h(t, X) is the post-regression analysis data, h0(t) is the baseline hazard, and β1, β2, β3, β4 and β5 are coefficients with values of

According to an embodiment of the present disclosure, the system works as follows:

- **Data acquisition module:** acquires clinical data after lung cancer surgery.
- **Preprocessing module:** classifies and groups the clinical data after lung cancer surgery to obtain modeling group clinical data and verification group clinical data.
- **Risk factor screening module:** screens risk factors in the modeling group clinical data to obtain risk factor data and overall survival data.
- **Regression analysis module:** performs regression analysis on the risk factor data and the overall survival data to obtain data after regression analysis.

The clinical data after lung cancer surgery include gene mutation typing, age, tumor size, lymph node metastasis, and surgical modality. The regression analysis is calculated by the formula ln[h(t, X)/h0(t)] = β1 × age + β2 × tumor size + β3 × lymph node metastasis + β4 × surgical modality + β5 × gene mutation typing. Here h(t, X) is the data after regression analysis, h0(t) is the baseline hazard, and β1, β2, β3, β4 and β5 are coefficients with values of

In this way, the risk factors most relevant to the postoperative disease-free survival rate are screened out. This improves the accuracy of the patient survival prediction model, so that the postoperative disease-free survival rate is estimated accurately.

According to an embodiment of the present disclosure, the risk factor screening module is configured to screen risk factors in the modeling group clinical data by LASSO analysis to obtain the risk factor data and the overall survival data. This identifies the risk factors more closely related to the postoperative disease-free survival rate and improves the accuracy with which that rate is estimated.

According to an embodiment of the present disclosure, the regression analysis module is configured to perform regression analysis on the risk factor data and the overall survival data by multifactor Cox analysis to obtain regression-analyzed data. This produces accurate regression-analyzed data and improves the accuracy of both the patient survival prediction model and the estimated postoperative disease-free survival rate.

According to embodiments of the present disclosure, the clinical data after lung cancer surgery further include a pathology type; and/or the regression-analyzed data include the postoperative disease-free survival rate; and/or the risk factor data include gene mutation typing and further include at least one of: age, tumor size, lymph node metastasis, and surgical modality. Selecting appropriate post-operative clinical data and risk factor data in this way improves the accuracy of the patient survival prediction model and of the estimated postoperative disease-free survival rate.

According to embodiments of the present disclosure, the lung cancer includes stage I-IIIA lung cancer; and/or the gene mutation typing includes at least one of: EGFR mutation, HER2 mutation, MET amplification, ALK fusion, ROS1 fusion, Kras mutation, RET fusion, and Braf mutation. Selecting a suitable range of applicable lung cancer types in this way improves the accuracy of the patient survival prediction model and of the estimated postoperative disease-free survival rate.

According to an embodiment of the present disclosure, the preprocessing module is configured as follows. For continuous data in the clinical data after lung cancer surgery, it determines an optimal cut-off point from the receiver operating characteristic curve and groups the data into multiple categories by that cut-off point. It then groups the categorized clinical data to obtain modeling group clinical data and verification group clinical data. Grouping the data reasonably in this way improves the accuracy of the patient survival prediction model and of the estimated postoperative disease-free survival rate.

According to an embodiment of the present disclosure, the system for predicting post-operative survival rate of lung cancer by measuring clinical data including gene mutation typing may further include a verification module. This module is in addition to the data acquisition module 501, the preprocessing module 502, the risk factor screening module 503, and the regression analysis module 504 in fig. 5.

The verification module is used to verify the risk factor screening module and the regression analysis module.

According to an embodiment of the present disclosure, the system further comprises a verification module that verifies the risk factor screening module and the regression analysis module, so as to verify the accuracy of the patient survival prediction model.

According to an embodiment of the present disclosure, the verification module is configured to perform two operations:

1, calculate the area under the receiver operating characteristic curve, the sensitivity, and the specificity by a machine learning method, based on the risk factor screening module, the regression analysis module, and the verification group clinical data;

2, judge the processing accuracy of the risk factor screening module and the regression analysis module from these three values, thereby verifying the accuracy of the patient survival prediction model.

According to an embodiment of the present disclosure, the machine learning method includes at least one of: logistic regression, support vector machine, random forest, decision tree, k-nearest neighbor, naive Bayes, and AdaBoost. This allows the area under the receiver operating characteristic curve, the sensitivity, and the specificity to be calculated accurately, so that the patient survival prediction model can be verified accurately.

According to the embodiment of the disclosure, the processing accuracy of the risk factor screening module and the regression analysis module is judged acceptable when three conditions hold: the area under the receiver operating characteristic curve is greater than 0.65, the sensitivity is greater than 0.5, and the specificity is greater than 0.5. This provides a quantitative criterion for the accuracy of the patient survival prediction model.

In the embodiment of the present disclosure, the system for predicting the post-operative survival rate of lung cancer by measuring clinical data including gene mutation typing may further include a display module. This module is in addition to the data acquisition module 501, the preprocessing module 502, the risk factor screening module 503, the regression analysis module 504, and the verification module.

The display module is used to graphically display the relationship between the risk factor data and the regression-analyzed data.

According to an embodiment of the present disclosure, the system further comprises a display module that graphically displays the relationship between the risk factor data and the regression-analyzed data, such as the postoperative disease-free survival rate. The relationship is thus shown intuitively, which improves ease of use.

According to an embodiment of the present disclosure, the display module is configured to display the relationship between the risk factor data and the regression-analyzed data with a nomogram. Regression-analyzed data such as the postoperative disease-free survival rate can then be obtained intuitively and conveniently from the risk factor data.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units or modules described in the embodiments of the present disclosure may be implemented by software or hardware. The units or modules described may also be provided in a processor, and the names of the units or modules do not in some cases constitute a limitation of the units or modules themselves.

The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. Those skilled in the art will appreciate that the scope of the invention in the present disclosure is not limited to technical solutions formed by the specific combination of the above-mentioned features. It also encompasses other technical solutions formed by any combination of these features or their equivalents without departing from the inventive concept. One example is a technical solution formed by replacing the above features with features having similar functions disclosed in (but not limited to) the present disclosure.
