Method for predicting survival rate of cancer patient

Document No.: 453223. Publication date: 2021-12-28.

Abstract: This technology, "Method for predicting survival rate of cancer patient", was devised by R. C. Swanton, D. Biswas, N. McGranahan and N. J. Birkbak on 2020-01-30. The present invention provides a method of providing a prognosis for a subject with lung cancer, the method comprising: (a) contacting a biological sample from a subject with an agent that specifically binds to each member of a set of biomarkers comprising ANLN, ASPM, CDCA4, ERRFI1, FURIN, GOLGA8A, ITGA6, JAG1, LRP12, MAFF, MRPS17, PLK1, PNP, PPP1R13L, PRKCA, PTTG1, PYGB, RPP25, SCPEP1, SLC46A3, SNX7, TPBG, and XBP1; (b) determining a risk score for the subject based on the nucleic acid expression level of the biomarkers in the sample; and (c) providing a prognosis for lung cancer based on the risk score of the subject.

1. A method of providing a prognosis for a subject with lung cancer, the method comprising:

(a) contacting a biological sample of a subject with an agent that specifically binds to each member of a group of biomarkers comprising ANLN, ASPM, CDCA4, ERRFI1, FURIN, GOLGA8A, ITGA6, JAG1, LRP12, MAFF, MRPS17, PLK1, PNP, PPP1R13L, PRKCA, PTTG1, PYGB, RPP25, SCPEP1, SLC46A3, SNX7, TPBG, and XBP1;

(b) determining a risk score for the subject based on the nucleic acid expression level of the biomarker in the sample; and

(c) providing a prognosis for lung cancer based on the risk score of the subject.

2. The method of claim 1, wherein determining a risk score for a subject comprises:

for each biomarker, determining a score indicative of the level of nucleic acid expression in the tissue sample;

calculating a risk score from the determined scores, wherein the risk score is calculated by adding the scores of the weighted biomarkers, wherein the scores of the biomarkers are based on the determined scores and the score of each biomarker has an associated weight; and

comparing the risk score to a threshold.

3. The method of claim 2, wherein the associated weight for each biomarker score in GOLGA8A, SCPEP1, SLC46A3, and XBP1 is a negative value and the associated weight for the biomarker score in ANLN, ASPM, CDCA4, ERRFI1, FURIN, ITGA6, JAG1, LRP12, MAFF, MRPS17, PLK1, PNP, PPP1R13L, PRKCA, PTTG1, PYGB, RPP25, SNX7, and TPBG is a positive value.

4. The method of claim 2 or claim 3, wherein the weighted sum of the risk scores is:

$$\text{risk score}_i = b_1 x_{1i} + b_2 x_{2i} + \dots + b_n x_{ni}$$

wherein $x_{1i}, x_{2i}, \dots, x_{ni}$ are the biomarker scores of the four selected biomarkers for each subject $i$, and $b_1, b_2, \dots, b_n$ are the set of associated weights for each biomarker score.

5. The method of claim 4, further comprising determining weights for the weighted sum using a Cox proportional hazards model trained using training data comprising information about a plurality of biomarkers for a set of subjects.

6. The method of claim 5, further comprising identifying a plurality of biomarkers for use in the Cox proportional hazards model, wherein the plurality of biomarkers is selected from the group comprising ANLN, ASPM, CDCA4, ERRFI1, FURIN, GOLGA8A, ITGA6, JAG1, LRP12, MAFF, MRPS17, PLK1, PNP, PPP1R13L, PRKCA, PTTG1, PYGB, RPP25, SCPEP1, SLC46A3, SNX7, TPBG, and XBP1.

7. The method of any of claims 2 to 6, wherein the threshold is a median risk score of training data.

8. The method of any one of the preceding claims, wherein determining a score indicative of a biomarker level comprises determining a scaled intensity score.

9. The method of claim 8, wherein the biomarker score is based on a scaled intensity score that has been adjusted by subtracting an adjustment factor.

10. The method of any one of the preceding claims, wherein determining a score indicative of a biomarker level comprises assigning a first value when the level is above a threshold and assigning a second value when the level is below a threshold.

11. The method of any one of the preceding claims, wherein determining a score indicative of a biomarker level comprises assigning a first value when the level is above an upper threshold, a second value when the level is below the upper threshold but above a lower threshold, and a third value when the level is below the lower threshold.

12. The method of any one of the preceding claims, wherein the agent is a nucleic acid.

13. The method of any one of the preceding claims, wherein the lung cancer is non-small cell lung cancer (NSCLC).

14. The method of claim 13, wherein the NSCLC is selected from invasive adenocarcinoma (LUAD), squamous cell carcinoma (LUSC), large cell carcinoma, adenosquamous carcinoma, carcinosarcoma, or large cell neuroendocrine carcinoma.

15. The method of claim 13 or 14, wherein the NSCLC is stage I, stage II, stage III, or stage IV.

16. The method of any one of the preceding claims, wherein the sample is from a surgically resected tumor.

17. The method of any one of the preceding claims, wherein the sample is from a lung tissue or lung tumor biopsy.

18. The method of any one of the preceding claims, wherein the prognosis provides a risk assessment.

19. The method of any one of the preceding claims, wherein the method further comprises determining a treatment regimen.

20. The method of claim 19, wherein the treatment regimen is selected from surgical treatment, chemotherapy, surgery, radiation therapy, immunotherapy, or CAR-T therapy.

21. A method for determining a treatment regimen for a subject, the method comprising the method of any one of claims 1 to 18 and further comprising the further step of determining a treatment regimen.

22. The method of claim 21, wherein the treatment regimen is selected from surgical treatment, chemotherapy, surgery, radiation therapy, immunotherapy, or CAR-T therapy.

23. A composition comprising a set of reagents that specifically bind to each member of a set of biomarkers comprising or consisting of: ANLN, ASPM, CDCA4, ERRFI1, FURIN, GOLGA8A, ITGA6, JAG1, LRP12, MAFF, MRPS17, PLK1, PNP, PPP1R13L, PRKCA, PTTG1, PYGB, RPP25, SCPEP1, SLC46A3, SNX7, TPBG, and XBP1.

24. A kit comprising an agent that specifically binds to each member of a group of biomarkers comprising ANLN, ASPM, CDCA4, ERRFI1, FURIN, GOLGA8A, ITGA6, JAG1, LRP12, MAFF, MRPS17, PLK1, PNP, PPP1R13L, PRKCA, PTTG1, PYGB, RPP25, SCPEP1, SLC46A3, SNX7, TPBG, and XBP1.

25. The composition of claim 23 or kit of claim 24, wherein the agent is a nucleic acid.

26. Use of the composition of claim 23 or 25 or the kit of claim 24 or 25 in a method of providing a prognosis for a subject with lung cancer according to any one of claims 1 to 18.

27. Use of the composition of claim 23 or 25 or the kit of claim 24 or 25 in a method of providing a treatment regimen for a lung cancer subject according to claim 19 or 20.

28. Use of ANLN, ASPM, CDCA4, ERRFI1, FURIN, GOLGA8A, ITGA6, JAG1, LRP12, MAFF, MRPS17, PLK1, PNP, PPP1R13L, PRKCA, PTTG1, PYGB, RPP25, SCPEP1, SLC46A3, SNX7, TPBG, and XBP1 in a method of providing a prognosis for a lung cancer subject according to any one of claims 1 to 18.

29. Use of ANLN, ASPM, CDCA4, ERRFI1, FURIN, GOLGA8A, ITGA6, JAG1, LRP12, MAFF, MRPS17, PLK1, PNP, PPP1R13L, PRKCA, PTTG1, PYGB, RPP25, SCPEP1, SLC46A3, SNX7, TPBG, and XBP1 in a method according to claim 19 or 20 for providing a treatment regimen for a lung cancer subject.

30. A method of treating a subject with lung cancer, comprising the step of predicting a level of risk of mortality in a subject with lung cancer, the method comprising:

(a) contacting a biological sample of a subject with an agent that specifically binds to each member of a group of biomarkers comprising ANLN, ASPM, CDCA4, ERRFI1, FURIN, GOLGA8A, ITGA6, JAG1, LRP12, MAFF, MRPS17, PLK1, PNP, PPP1R13L, PRKCA, PTTG1, PYGB, RPP25, SCPEP1, SLC46A3, SNX7, TPBG, and XBP1;

(b) determining a risk score for the subject based on the nucleic acid expression level of the biomarker in the sample;

(c) comparing the risk score to a threshold to predict whether the subject is at high risk of mortality;

(d) selecting a treatment regimen; and

(e) implementing the treatment regimen.

31. A method of generating a biomarker signature for a cancer subject, the method comprising:

generating training data from a plurality of subjects having cancer, the training data comprising gene expression data for a plurality of genes for each of a plurality of subjects;

calculating an intratumoral heterogeneity metric and an intertumoral heterogeneity metric for each gene of the plurality of genes from the gene expression data; and

selecting, using a heterogeneity filter, genes for which the intratumoral heterogeneity metric is below an intratumoral heterogeneity threshold and the intertumoral heterogeneity metric is above an intertumoral heterogeneity threshold;

wherein the biomarker signature comprises at least some of the selected genes.

32. The method of claim 31, further comprising:

calculating a consensus score for each gene; and

selecting genes with a consensus score below a consensus threshold using a consensus filter.

33. The method of claim 32, wherein the consensus score of the selected genes is calculated after applying the heterogeneity filter.

34. The method of any one of claims 31 to 33, wherein the measure of intratumoral heterogeneity of each gene is calculated by:

obtaining the gene expression values of each gene at a plurality of positions in the same tumor,

calculating, for each tumor, a metric indicative of the obtained gene expression values for each gene, and

obtaining the intratumoral heterogeneity metric for each gene as the mean of that per-tumor metric across tumors.

35. The method of claim 34, wherein the measure indicative of gene expression value is selected from the group consisting of standard deviation, median absolute deviation, and coefficient of variation.

36. The method of any one of claims 31 to 35, wherein the measure of inter-tumor heterogeneity for each gene is calculated by:

obtaining a gene expression value for each gene in each subject for one of a plurality of regions of a tumor; and

taking the standard deviation of the obtained values.

37. The method of claim 36, further comprising performing the obtaining and standard deviation steps over a plurality of iterations, and averaging the standard deviations of the iterations to obtain the measure of inter-tumor heterogeneity.
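By way of illustration only, the heterogeneity metrics of claims 34 to 37 and the filter of claim 31 can be sketched as follows. This is a minimal sketch under stated assumptions, not the applicants' implementation: the standard deviation is used as the per-tumor metric (one of the options in claim 35), one region per tumor is drawn at random in each iteration, and the gene names, expression values, thresholds and function names are all hypothetical.

```python
import random
from statistics import mean, stdev


def intratumor_heterogeneity(expr_by_tumor):
    """Mean, across tumors, of the within-tumor standard deviation for one gene.

    expr_by_tumor maps a tumor id to the gene's expression value in each region.
    """
    return mean(stdev(regions) for regions in expr_by_tumor.values())


def intertumor_heterogeneity(expr_by_tumor, iterations=100, seed=0):
    """Average, over repeated draws, of the between-tumor standard deviation.

    Each iteration picks one region per tumor (assumption: uniformly at random)
    and takes the standard deviation of the picked values across tumors.
    """
    rng = random.Random(seed)
    sds = []
    for _ in range(iterations):
        one_region_per_tumor = [rng.choice(r) for r in expr_by_tumor.values()]
        sds.append(stdev(one_region_per_tumor))
    return mean(sds)


def heterogeneity_filter(genes, intra_threshold, inter_threshold):
    """Keep genes with low intratumoral and high intertumoral heterogeneity."""
    return [
        gene
        for gene, expr in genes.items()
        if intratumor_heterogeneity(expr) < intra_threshold
        and intertumor_heterogeneity(expr) > inter_threshold
    ]


# Hypothetical toy data: GENE_A is stable within tumors but varies between
# tumors; GENE_B varies strongly within each tumor.
example_genes = {
    "GENE_A": {"T1": [1.0, 1.1, 0.9], "T2": [5.0, 5.1, 4.9], "T3": [9.0, 9.1, 8.9]},
    "GENE_B": {"T1": [1.0, 5.0, 9.0], "T2": [2.0, 5.0, 8.0], "T3": [1.0, 5.0, 9.0]},
}
selected = heterogeneity_filter(example_genes, intra_threshold=1.0, inter_threshold=2.0)
```

In this toy example only GENE_A passes the filter, since its expression is nearly constant across regions of the same tumor while differing between tumors.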

38. The method of any one of claims 31-37, wherein the biomarker signature is prognostic, and the method further comprises:

generating training data including associated survival data for each of a plurality of subjects;

calculating a prognostic metric for each of the plurality of genes from the survival data; and

selecting genes with a prognostic metric above a prognostic threshold using a prognostic filter.

39. The method of claim 38, wherein the prognostic metric is calculated using Cox univariate regression analysis.

40. The method of any one of claims 31 to 37, wherein the biomarker signature predicts a subject's response to a particular treatment regimen, and the method further comprises:

generating training data comprising associated response data for each of a plurality of subjects;

calculating a predictive metric for each of the plurality of genes from the response data; and

selecting genes with a predictive metric above a predictive threshold using a predictive filter.

41. A method of providing a prognosis for a subject with cancer, the method comprising:

contacting a biological sample from a subject with an agent that specifically binds to each member of a set of biomarkers in a signature generated according to the method of any one of claims 31 to 40;

determining a risk score for the subject based on the nucleic acid expression level of the biomarker in the sample; and

providing a prognosis of the cancer based on the risk score of the subject.

42. A method of determining a treatment regimen for a subject, the method comprising:

the method of providing a prognosis of claim 41; and

further comprising the further step of determining a treatment regimen.

43. A composition comprising a set of reagents that specifically bind to each member of a set of biomarkers in a signature generated according to the method of any one of claims 31 to 40.

44. A kit comprising reagents that specifically bind to each member of a set of biomarkers in a signature generated according to the method of any one of claims 31 to 40.

45. Use of a biomarker in a signature produced according to the method of any one of claims 31 to 39 in a method of providing a prognosis for a subject with cancer.

46. Use of a biomarker in a signature produced according to the method of any one of claims 31 to 40 in a method of providing a treatment regimen for a cancer subject.

47. A method of treating a cancer subject comprising the step of predicting a level of risk of mortality in the cancer subject, the method comprising:

contacting a biological sample from a subject with an agent that specifically binds to each member of a set of biomarkers in a signature generated according to the method of any one of claims 31 to 40;

determining a risk score for the subject based on the nucleic acid expression level of the biomarker in the sample;

comparing the risk score to a threshold to predict whether the subject is at high risk of mortality;

selecting a treatment regimen; and

implementing the treatment regimen.

48. A method of providing a prognosis for a subject with lung cancer, the method comprising:

(a) contacting a biological sample of a subject with an agent that specifically binds to each member of a group of biomarkers comprising at least two biomarkers selected from the group consisting of ANLN, ASPM, CDCA4, ERRFI1, FURIN, GOLGA8A, ITGA6, JAG1, LRP12, MAFF, MRPS17, PLK1, PNP, PPP1R13L, PRKCA, PTTG1, PYGB, RPP25, SCPEP1, SLC46A3, SNX7, TPBG, and XBP1;

(b) determining a risk score for the subject based on the nucleic acid expression level of the biomarker in the sample; and

(c) providing a prognosis for lung cancer based on the risk score of the subject.

49. A composition comprising a set of reagents that specifically bind to each member of a set of biomarkers, said biomarkers comprising or consisting of at least two biomarkers selected from: ANLN, ASPM, CDCA4, ERRFI1, FURIN, GOLGA8A, ITGA6, JAG1, LRP12, MAFF, MRPS17, PLK1, PNP, PPP1R13L, PRKCA, PTTG1, PYGB, RPP25, SCPEP1, SLC46A3, SNX7, TPBG, and XBP1.

50. A kit comprising agents that specifically bind to each member of a group of biomarkers comprising at least two biomarkers selected from the group consisting of ANLN, ASPM, CDCA4, ERRFI1, FURIN, GOLGA8A, ITGA6, JAG1, LRP12, MAFF, MRPS17, PLK1, PNP, PPP1R13L, PRKCA, PTTG1, PYGB, RPP25, SCPEP1, SLC46A3, SNX7, TPBG, and XBP1.

51. The composition of claim 49 or kit of claim 50, wherein the agent is a nucleic acid.

52. Use of the composition of claim 49 or 51 or the kit of claim 50 or 51 in a method of providing a prognosis for a subject with lung cancer according to claim 48.

53. Use of at least two biomarkers selected from the group consisting of ANLN, ASPM, CDCA4, ERRFI1, FURIN, GOLGA8A, ITGA6, JAG1, LRP12, MAFF, MRPS17, PLK1, PNP, PPP1R13L, PRKCA, PTTG1, PYGB, RPP25, SCPEP1, SLC46A3, SNX7, TPBG, and XBP1 in a method of providing a prognosis for a lung cancer subject according to claim 48.

54. A method of treating a subject with lung cancer, comprising the step of predicting a level of risk of mortality in a subject with lung cancer, the method comprising:

(a) contacting a biological sample of a subject with an agent that specifically binds to each member of a group of biomarkers comprising at least two biomarkers selected from the group consisting of ANLN, ASPM, CDCA4, ERRFI1, FURIN, GOLGA8A, ITGA6, JAG1, LRP12, MAFF, MRPS17, PLK1, PNP, PPP1R13L, PRKCA, PTTG1, PYGB, RPP25, SCPEP1, SLC46A3, SNX7, TPBG, and XBP1;

(b) determining a risk score for the subject based on the nucleic acid expression level of the biomarker in the sample;

(c) comparing the risk score to a threshold to predict whether the subject is at high risk of mortality;

(d) selecting a treatment regimen; and

(e) implementing the treatment regimen.

Technical Field

The present invention relates to methods of determining the prognosis and/or predicting the response to treatment of cancer patients to guide the selection of a treatment regimen for a cancer patient and/or to predict the survival rate and/or risk of survival and/or clinical outcome of a cancer patient and/or to methods of determining whether a treatment is appropriate for a particular cancer patient and/or to methods of determining the course of treatment for a cancer patient (e.g., a regimen stratification method), particularly those patients having lung cancer such as non-small cell lung cancer.

Background

Lung cancer is the leading cause of cancer death worldwide, with non-small cell lung cancer (NSCLC) accounting for 85-90% of cases diagnosed worldwide. As described in "Lung Cancer Stage Classification" by Detterbeck et al., Chest 151, 193-203 (2017), tumor staging helps to guide the clinical decision on whether to give adjuvant chemotherapy. However, as described in "Biomarker development in the precision medicine era: lung cancer as a case study" by Vargas et al., Nat Rev Cancer (2016), TNM staging is an imperfect predictor of survival risk because patients with tumors of the same stage may have markedly different clinical outcomes.

It has been suggested that incorporating molecular biomarkers (e.g., gene expression-based correlates of tumor aggressiveness) into current diagnostic criteria can classify cancer patients into more precise disease subtypes. Examples are "Enabling personalized cancer medicine through analysis of gene-expression patterns" by van 't Veer et al., Nature 452, 564-570 (2008); "Biomarker development in the precision medicine era: lung cancer as a case study" by Vargas et al., Nat. Rev. Cancer 16, 525-537 (2016); and "Precision oncology in the age of integrative genomics" by Kumar-Sinha et al., Nat. Biotechnol. 36, 46-60 (2018). Accurate identification of patients at high risk of non-small cell lung cancer (NSCLC) recurrence after surgery may be of considerable clinical value, helping to guide decisions such as whether to administer adjuvant chemotherapy after surgical resection or how intensively to follow up the patient.

Over the past two decades, numerous attempts have been made to derive prognostic gene expression signatures for patients with lung adenocarcinoma (LUAD), the most common histological subtype of non-small cell lung cancer (NSCLC). Examples are described in "Gene-expression profiles predict survival of patients with lung adenocarcinoma" by Beer et al., Nat Med 8, 816-824 (2002); "A robust prognostic gene expression signature for early stage lung adenocarcinoma" by Krzystanek et al., Biomark Res 4, 4 (2016); and "Validation of a proliferation-based expression signature as prognostic marker in early stage lung adenocarcinoma" by Wistuba et al., Clin Cancer Res (2013). However, these efforts have been hampered by poor reproducibility or by limited prognostic power independent of existing clinicopathological risk factors, as described in "Gene expression-based prognostic signatures in lung cancer: ready for clinical use?" by Subramanian et al., JNCI J Natl Cancer Inst 102, 464-474 (2010).

Fig. 1a to 1d illustrate some of the problems associated with known signatures. Fig. 1a shows a lung 10 containing a tumor 12. Biopsies may be taken from multiple regions R1, R2, R3 and R4 of the tumor. However, with a known prognostic biomarker signature, biopsies of regions R1, R2 and R3 will yield a high risk classification, while a biopsy of region R4 will yield a low risk classification, as shown in the red and blue schematic. Typically, in routine clinical practice, a single biopsy 14 is used to make a diagnosis or to develop a prognostic assessment. Thus, the hypothetical prognostic signature shown in fig. 1a will classify the risk of the tumor inconsistently, because the result for a biopsy of region R4 does not match the results for samples taken from the other regions. The readout of the signature is therefore susceptible to tumor sampling bias.

Figure 1b illustrates the effect of tumor sampling bias on a patient population. A plurality of lung tumors 20, 22, 24, 26, 28, 30 are shown, each having a plurality of sampling regions (e.g., R1 through R5). By applying prognostic biomarkers to a biopsy of one of the regions, lung cancer patients 40, 42, 44, 46, 48, 50 are classified into more precise disease subtypes based on estimated survival risk, which may help guide treatment decisions. It is important to correctly distinguish high risk patients who require adjuvant chemotherapy from low risk patients who can be cured by surgery alone.

For lung tumors 20, 22, a biopsy of any region will correctly yield a low risk classification for the relevant patient 40, 42, and these patients will therefore be classified as suitable for treatment by surgical resection alone. Similarly, for lung tumors 28, 30, a biopsy of any region will correctly yield a high risk classification for the relevant patient 48, 50, and these patients will therefore be classified as requiring treatment by surgical resection and adjuvant chemotherapy. However, the third patient 44 has a lung tumor 24 similar to that shown in FIG. 1a. As shown in the figure, a biopsy of region R4 classifies the patient as low risk, which is inconsistent with the classification resulting from biopsies of the other regions of lung tumor 24. This matters because, on the basis of that diagnosis, the patient is unlikely to receive adjuvant therapy and will therefore be undertreated, with suboptimal treatment and follow-up. Similarly, the fourth patient 46 has a lung tumor 26 that will yield different results depending on the biopsy sampling location. In the illustration, the biopsy yields a high risk classification, which may lead to the patient receiving unnecessary treatment and thus suffering the side effects of chemotherapy.

FIG. 1c shows the results of an analysis of a known LUAD signature described by Shukla et al. in "Development of a RNA-Seq Based Prognostic Signature in Lung Adenocarcinoma", JNCI J Natl Cancer Inst 109 (2017). The signature was analyzed using data from the lung TRACERx study, the world's largest multi-region sequencing study, which enables detailed exploration of tumor evolution. The study is described in "Tracking genomic cancer evolution for precision medicine: the lung TRACERx study" by Jamal-Hanjani et al., PLoS Biol 12 (2014). In FIG. 1c, 89 tumor regions from 28 patients in the TRACERx study were analyzed. As shown, patients are ranked by predicted survival "risk score", with the lowest risk patient on the left of the graph. To calculate the risk score in this example, the regression coefficients were re-derived from the supplementary data provided in the original publication by fitting an intercept-free linear model that regresses the published risk scores on the expression values of the four genes in the signature. Each point in FIG. 1c represents a single tumor region, and the vertical lines represent the range of risk scores for each patient. Regardless of the location of the biopsy, 11 patients were classified as low risk and 5 patients as high risk. However, 12 patients had inconsistent classifications: their risk classification depended on the location of the biopsy.
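The per-patient classification underlying FIG. 1c can be sketched in a few lines. The following is an illustrative sketch only: the function names, the toy risk scores and the threshold of 0.5 are hypothetical, and a region is assumed to be high risk when its score is strictly above the threshold.

```python
def classify_patients(region_scores, threshold):
    """Label each patient from the risk scores of all of their tumor regions.

    region_scores maps a patient id to a list of per-region risk scores.
    A patient is "low" or "high" only if every region agrees; otherwise the
    classification depends on which region is biopsied ("discordant").
    """
    labels = {}
    for patient, scores in region_scores.items():
        is_high = [score > threshold for score in scores]
        if all(is_high):
            labels[patient] = "high"
        elif not any(is_high):
            labels[patient] = "low"
        else:
            labels[patient] = "discordant"
    return labels


def percent_discordant(labels):
    """Percentage of patients whose regions disagree on the risk class."""
    discordant = sum(1 for label in labels.values() if label == "discordant")
    return 100.0 * discordant / len(labels)


# Hypothetical toy data: three patients, two regions each.
toy_scores = {"P1": [0.2, 0.3], "P2": [0.9, 0.8], "P3": [0.2, 0.9]}
toy_labels = classify_patients(toy_scores, threshold=0.5)
```

Applied to the counts reported above (11 low risk, 5 high risk and 12 inconsistent patients out of 28), the same calculation gives 12/28, or roughly 43% of patients with a biopsy-dependent classification.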

FIG. 1d presents the data of FIG. 1c as a bar graph showing the percentages of low risk, high risk and inconsistently classified patients. FIG. 1e is a similar bar graph for a different signature, based on immune-related gene pairs, as described in "Development and Validation of an Individualized Immune Prognostic Signature in Early-Stage Nonsquamous Non-Small Cell Lung Cancer" by Li et al., JAMA Oncol (2017). In both cases a large proportion of patients is inconsistently classified (43% and 29%, respectively), meaning that different regions of the same tumor may be classified as having different molecular risk profiles. The high proportion of patients susceptible to tumor sampling bias may limit the clinical utility of such prognostic assays.

To date, most prognostic signatures based on gene expression in LUAD have been derived by microarray expression profiling rather than RNA sequencing. Figure 1f shows the clustering consistency results for the 9 published LUAD prognostic signatures detailed in the following table. The number of patients n in each paper is indicated. Hierarchical clustering is performed for each prognostic signature using Ward's method with the Manhattan distance, as described in "Intratumor Heterogeneity Affects Gene Expression Profile Test Prognostic Risk Stratification in Early Breast Cancer" by Gyanchandani et al., Clin Cancer Res 22, 5362-5369 (2016). For a given number of clusters, cluster consistency is quantified as the percentage of patients for whom all tumor regions fall in the same cluster. Results are plotted as the percentage of patients with all tumor regions in the same cluster against the number of clusters. Vertical dashed lines mark the numbers of clusters considered (2, 3, 14 and 28).

The median cluster inconsistency rate was 50% in 28 clusters (15.5/28 LUAD tumors), indicating that half of the tumor regions were at risk of being misclassified due to sampling bias. The rate ranged from 18% to 82% across signatures, indicating that some signatures perform significantly better than others. In summary, fig. 1a to 1f show that sampling bias can confound the use of molecular biomarkers in several cancer types. Intratumoral heterogeneity (ITH) and chromosomal instability (CIN) are common features of NSCLC and other types of cancer, as described in "Tracking the Evolution of Non-Small-Cell Lung Cancer" by Jamal-Hanjani et al., N Engl J Med 376, 2109-2121 (2017). Furthermore, as described in "The causes and consequences of genetic heterogeneity in cancer evolution" by Burrell et al., Nature 501, 338-345 (2013), genetic intratumoral heterogeneity is prevalent in various types of cancer.

For background information on previous prognostic signatures for lung cancer, see international patent publication WO201/063121 (describing the use of a 16-gene prognostic signature to classify non-small cell lung cancer (NSCLC) patients into risk groups); US patent publication US2010/184063 (describing the use of a 15-gene prognostic and predictive signature to classify NSCLC patients into risk groups); and international patent publication WO2015/138769 (describing the use of a 9-gene prognostic signature to classify NSCLC patients into risk groups).

The applicants have recognized a need for improved gene signatures that help clinicians improve prognostic accuracy and guide treatment decisions, such as whether to perform surgical resection alone or to follow it with chemotherapy or other adjuvant therapy.

Disclosure of Invention

According to the present invention, there are provided apparatus and methods as described in the accompanying claims. Further features of the invention will be apparent from the dependent claims and the following description.

We describe a method of providing a prognosis for a subject with lung cancer, the method comprising: (a) contacting a biological sample from a subject with an agent that specifically binds to each member of a set of biomarkers comprising ANLN, ASPM, CDCA4, ERRFI1, FURIN, GOLGA8A, ITGA6, JAG1, LRP12, MAFF, MRPS17, PLK1, PNP, PPP1R13L, PRKCA, PTTG1, PYGB, RPP25, SCPEP1, SLC46A3, SNX7, TPBG, and XBP1; (b) determining a risk score for the subject based on the nucleic acid expression level of the biomarkers in the sample; and (c) providing a prognosis for lung cancer based on the risk score of the subject.

Determining a risk score for a subject may comprise: for each biomarker, determining a score indicative of the level of nucleic acid expression in the tissue sample; calculating a risk score based on the determined scores, wherein the risk score is calculated by adding the weighted biomarker scores, wherein the biomarker scores are based on the determined scores and each biomarker score has an associated weight; and comparing the risk score to a threshold. In this way, each subject may be assigned, for example, to a high risk group (e.g., a risk score above the threshold) or a low risk group (e.g., a risk score at or below the threshold). For example, when considering all types of lung cancer, survival may be lower in the high risk group and higher in the low risk group. Alternatively, when early stage cancer is considered, the high risk group may be more likely to relapse than the low risk group. The associated weights for the biomarker scores of GOLGA8A, SCPEP1, SLC46A3 and XBP1 may be negative, indicating that these are favorable genes. The associated weights for the biomarker scores of ANLN, ASPM, CDCA4, ERRFI1, FURIN, ITGA6, JAG1, LRP12, MAFF, MRPS17, PLK1, PNP, PPP1R13L, PRKCA, PTTG1, PYGB, RPP25, SNX7, and TPBG may be positive.

The weighted sum of the risk scores may be determined by:

$$\text{risk score}_i = b_1 x_{1i} + b_2 x_{2i} + \dots + b_n x_{ni}$$

where $x_{1i}, x_{2i}, \dots, x_{ni}$ are the biomarker scores of the four selected biomarkers for each subject $i$, and $b_1, b_2, \dots, b_n$ are the set of associated weights for each biomarker score.

The method can further include determining weights for the weighted sum using a Cox proportional hazards model trained using training data including information about a plurality of biomarkers for a group of subjects. The method can include identifying a plurality of biomarkers for use in the Cox proportional hazards model, wherein the plurality of biomarkers is selected from the group consisting of ANLN, ASPM, CDCA4, ERRFI1, FURIN, GOLGA8A, ITGA6, JAG1, LRP12, MAFF, MRPS17, PLK1, PNP, PPP1R13L, PRKCA, PTTG1, PYGB, RPP25, SCPEP1, SLC46A3, SNX7, TPBG, and XBP1.
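To illustrate how a Cox proportional hazards model yields a weight, the sketch below fits a single coefficient by maximizing the Cox partial likelihood with Newton's method. It is a deliberately simplified, hypothetical example: it handles one covariate rather than the full biomarker panel, assumes no tied event times, and uses made-up toy data. In practice a multivariable model would be fitted with an established survival analysis package.

```python
import math


def cox_score_and_information(beta, times, events, x):
    """First derivative and negative second derivative of the Cox partial
    log-likelihood for one covariate (assumes no tied event times)."""
    score, information = 0.0, 0.0
    for i, (t_i, had_event) in enumerate(zip(times, events)):
        if not had_event:
            continue  # censored subjects contribute only through risk sets
        risk_set = [j for j, t_j in enumerate(times) if t_j >= t_i]
        w = [math.exp(beta * x[j]) for j in risk_set]
        s0 = sum(w)
        s1 = sum(w_j * x[j] for w_j, j in zip(w, risk_set))
        s2 = sum(w_j * x[j] ** 2 for w_j, j in zip(w, risk_set))
        score += x[i] - s1 / s0
        information += s2 / s0 - (s1 / s0) ** 2
    return score, information


def fit_cox_univariate(times, events, x, iterations=25):
    """Newton-Raphson estimate of the log hazard ratio for one covariate."""
    beta = 0.0
    for _ in range(iterations):
        score, information = cox_score_and_information(beta, times, events, x)
        if information == 0.0:
            break
        beta += score / information
    return beta


# Hypothetical toy data: subjects with biomarker score 1 tend to die earlier.
toy_times = [1, 2, 3, 4, 5, 6]
toy_events = [1, 1, 1, 1, 1, 1]  # 1 = death observed, 0 = censored
toy_scores = [1, 1, 0, 1, 0, 0]
toy_weight = fit_cox_univariate(toy_times, toy_events, toy_scores)
```

A positive fitted coefficient, as obtained here, corresponds to a biomarker whose higher score is associated with higher risk; a negative coefficient corresponds to a favorable gene.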

The threshold may be a median risk score of the training data.
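The weighted sum and the median threshold described above can be sketched as follows. This is a minimal illustration, not the claimed weights: the four genes shown are a subset of the panel, the numerical weights and biomarker scores are invented for the example (only their signs follow the description), and the function names are hypothetical.

```python
from statistics import median

# Hypothetical weights; only the signs follow the description
# (GOLGA8A and XBP1 negative, ANLN and PLK1 positive).
example_weights = {"ANLN": 0.5, "PLK1": 0.3, "XBP1": -0.4, "GOLGA8A": -0.2}


def risk_score(biomarker_scores, weights):
    """Weighted sum b1*x1 + b2*x2 + ... + bn*xn for one subject."""
    return sum(weights[gene] * biomarker_scores[gene] for gene in weights)


def classify(biomarker_scores, weights, threshold):
    """High risk above the threshold, low risk at or below it."""
    score = risk_score(biomarker_scores, weights)
    return "high" if score > threshold else "low"


# Hypothetical training subjects with binary biomarker scores.
training_subjects = [
    {"ANLN": 1, "PLK1": 1, "XBP1": 0, "GOLGA8A": 0},
    {"ANLN": 0, "PLK1": 0, "XBP1": 1, "GOLGA8A": 1},
    {"ANLN": 1, "PLK1": 0, "XBP1": 1, "GOLGA8A": 0},
]
# The threshold is taken as the median risk score of the training data.
training_threshold = median(risk_score(s, example_weights) for s in training_subjects)

new_subject = {"ANLN": 1, "PLK1": 1, "XBP1": 1, "GOLGA8A": 0}
new_subject_group = classify(new_subject, example_weights, training_threshold)
```

Using the training median as the threshold splits the training cohort into two groups of roughly equal size, so a new subject is classified relative to the typical risk seen in training.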

Determining a score indicative of a biomarker level may include determining a scaled intensity score. The biomarker score may be based on a scaled intensity score that has been adjusted by subtracting an adjustment factor. Determining a score indicative of a biomarker level may include assigning a first value when the level is above a threshold and assigning a second value when the level is below the threshold. Determining a score indicative of a biomarker level may include assigning a first value when the level is above an upper threshold, assigning a second value when the level is below the upper threshold but above a lower threshold, and assigning a third value when the level is below the lower threshold.
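The three scoring options above can be sketched as small helper functions. The sketch is illustrative only: the function names, the numeric values assigned to each category and the example thresholds are hypothetical, and a level exactly equal to a threshold is assumed to fall into the lower category, which the description does not specify.

```python
def adjusted_score(scaled_intensity, adjustment_factor):
    """Scaled intensity score adjusted by subtracting an adjustment factor."""
    return scaled_intensity - adjustment_factor


def binary_score(level, threshold, first_value=1, second_value=0):
    """First value above the threshold, second value otherwise.

    Assumption: a level exactly at the threshold gets the second value.
    """
    return first_value if level > threshold else second_value


def ternary_score(level, lower, upper, first_value=2, second_value=1, third_value=0):
    """First value above the upper threshold, second value between the
    thresholds, third value at or below the lower threshold (assumption
    for levels exactly on a threshold)."""
    if level > upper:
        return first_value
    if level > lower:
        return second_value
    return third_value


# Hypothetical usage with made-up levels and thresholds.
example_binary = binary_score(5.0, threshold=3.0)
example_ternary = [ternary_score(v, lower=2.0, upper=4.0) for v in (1.0, 3.0, 5.0)]
example_adjusted = adjusted_score(7.5, adjustment_factor=2.5)
```

Discretizing expression levels in this way trades resolution for robustness: small measurement differences between platforms or regions change the score only when a level crosses a threshold.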

The agent may be a nucleic acid.

The terms "nucleic acid," "nucleic acid sequence," "nucleotide," "nucleic acid molecule," or "polynucleotide" as used herein are intended to include DNA molecules (e.g., cDNA or genomic DNA), RNA molecules (e.g., mRNA, miRNA, lncRNA), naturally occurring, mutant or synthetic DNA or RNA molecules, and analogs of DNA or RNA generated using nucleotide analogs. The nucleic acid may be single-stranded or double-stranded. Such nucleic acids or polynucleotides include, but are not limited to, coding sequences of structural genes, antisense sequences, and non-coding regulatory sequences that do not encode mRNA or protein products. These terms also include genes. The terms "gene", "allele" or "gene sequence" are used broadly to refer to a DNA nucleic acid associated with a biological function. Thus, a gene may include introns and exons as in genomic sequences, or may include only coding sequences as in cDNA, and/or may include cDNA in combination with regulatory sequences. Accordingly, genomic DNA, cDNA or coding DNA may be used according to various aspects of the invention. In one embodiment, the nucleic acid is a cDNA or coding DNA.

Nucleic acid analysis may be performed using suitable techniques, such as techniques for measuring gene expression, including but not limited to digital PCR, qPCR, microarray, RNA-Seq or NanoString analysis. In certain embodiments described herein, gene expression is determined by quantifying RNA, including by RNA-Seq or NanoString analysis. It is understood that gene expression can be determined using more than one technique.

RNA sequencing (RNA-Seq) is a transcriptome analysis technique based on next-generation sequencing (NGS). In RNA-Seq, transcripts are reverse transcribed into cDNA, and adapters are attached to each end of the cDNA. Sequencing can be performed in one direction (single-end sequencing) or in both directions (paired-end sequencing), and the reads are then aligned to a reference genome database or assembled to obtain de novo transcripts, giving a genome-wide expression profile. RNA-Seq can qualitatively and quantitatively investigate any type of RNA, including messenger RNA (mRNA), microRNAs, small interfering RNAs, and long non-coding RNAs.

RNA can be analyzed using a NanoString nCounter gene expression assay. NanoString is a relatively new molecular profiling technique that can obtain accurate genomic information from small amounts of fixed patient tissue. The NanoString platform quantifies mRNA expression using digital, color-coded barcodes (codesets) attached to sequence-specific probes (Geiss et al., Nat Biotechnol. 2008 Mar; 26(3): 317-25; Das et al., NanoString expression profiling identifies candidate biomarkers of RAD001 response in metastatic gastric cancer, ESMO Open 2016, 1-9). The NanoString system hybridizes two probes to each target transcript: a biotin-labeled capture probe and a fluorescent barcode-labeled reporter probe. The reporter probes hybridize to specific RNAs in the sample, and the capture probes immobilize them on a solid surface via avidin. The NanoString nCounter assay system then counts the immobilized RNAs by their barcodes.

The lung cancer may be non-small cell lung cancer (NSCLC). The NSCLC may be selected from invasive adenocarcinoma (LUAD), squamous cell carcinoma (LUSC), large cell carcinoma, adenosquamous carcinoma, carcinosarcoma, large cell neuroendocrine carcinoma, undifferentiated non-small cell lung carcinoma or bronchioloalveolar carcinoma. LUAD and LUSC account for the majority of NSCLC cases, and other types are often grouped together. NSCLC can be stage I, stage II, stage III, or stage IV.

The sample may be from a surgically excised tumor. The sample may be from a lung tissue or lung tumor biopsy.

Prognosis may provide a risk assessment.

The method may further comprise determining a treatment regimen. Thus, we also describe a method for determining a treatment regimen for a subject, the method comprising the above method and the further step of determining a treatment regimen. The treatment regimen may be selected from surgery, chemotherapy, radiation therapy, immunotherapy, or CAR-T therapy. Such treatment regimens are well known in the art. It will be appreciated that there are various types of immunotherapy, such as immune checkpoint inhibitors, oncolytic virus therapy, T cell therapy and cancer vaccines. An appropriate therapy may be selected.

We also describe a composition comprising a set of reagents that specifically bind to each member of a set of biomarkers comprising or consisting of: ANLN, ASPM, CDCA4, ERRFI1, FURIN, GOLGA8A, ITGA6, JAG1, LRP12, MAFF, MRPS17, PLK1, PNP, PPP1R13L, PRKCA, PTTG1, PYGB, RPP25, SCPEP1, SLC46A3, SNX7, TPBG, and XBP1.

We also describe kits comprising an agent that specifically binds to each member of a group of biomarkers comprising ANLN, ASPM, CDCA4, ERRFI1, FURIN, GOLGA8A, ITGA6, JAG1, LRP12, MAFF, MRPS17, PLK1, PNP, PPP1R13L, PRKCA, PTTG1, PYGB, RPP25, SCPEP1, SLC46A3, SNX7, TPBG and XBP1.

The reagent in the above composition or kit may be a nucleic acid. We also describe the use of the composition or kit in a method of providing a prognosis for a subject with lung cancer as described above. We also describe the use of the composition or kit in a method of providing a treatment regimen for a subject with lung cancer as described above.

We also describe the use of ANLN, ASPM, CDCA4, ERRFI1, FURIN, GOLGA8A, ITGA6, JAG1, LRP12, MAFF, MRPS17, PLK1, PNP, PPP1R13L, PRKCA, PTTG1, PYGB, RPP25, SCPEP1, SLC46A3, SNX7, TPBG and XBP1 in a method of providing a prognosis for a subject with lung cancer as described above. We also describe the use of ANLN, ASPM, CDCA4, ERRFI1, FURIN, GOLGA8A, ITGA6, JAG1, LRP12, MAFF, MRPS17, PLK1, PNP, PPP1R13L, PRKCA, PTTG1, PYGB, RPP25, SCPEP1, SLC46A3, SNX7, TPBG and XBP1 in a method of providing a treatment regimen for a subject with lung cancer as described above.

We also describe a method of treating a lung cancer subject, comprising the step of predicting the level of risk of mortality in the lung cancer subject, the method comprising: (a) contacting a biological sample from the subject with an agent that specifically binds to each member of a group of biomarkers comprising ANLN, ASPM, CDCA4, ERRFI1, FURIN, GOLGA8A, ITGA6, JAG1, LRP12, MAFF, MRPS17, PLK1, PNP, PPP1R13L, PRKCA, PTTG1, PYGB, RPP25, SCPEP1, SLC46A3, SNX7, TPBG and XBP1; (b) determining a risk score for the subject based on the nucleic acid expression level of the biomarkers in the sample; (c) comparing the risk score to a threshold to predict whether the subject is at high risk of mortality; (d) selecting a treatment regimen; and (e) administering the treatment regimen.

We also describe a method of generating a biomarker signature for a cancer subject, the method comprising: generating training data from a plurality of cancer subjects, the training data comprising gene expression data for a plurality of genes for each of the plurality of subjects; calculating an intra-tumor heterogeneity metric and an inter-tumor heterogeneity metric for each gene of the plurality of genes from the gene expression data; and using a heterogeneity filter to select genes for which the intra-tumor heterogeneity is below an intra-tumor heterogeneity threshold and the inter-tumor heterogeneity is above an inter-tumor heterogeneity threshold; wherein the biomarker signature comprises at least some of the selected genes. The method is applicable to a variety of different cancers, particularly cancers associated with intra-tumor heterogeneity (ITH).
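As an illustration of the selection step, the sketch below keeps genes whose expression varies little within a tumor but widely between tumors. It assumes the two heterogeneity metrics have already been computed per gene; the function name, gene labels, metric values and thresholds are all hypothetical.

```python
from typing import Dict, List


def heterogeneity_filter(intra: Dict[str, float], inter: Dict[str, float],
                         intra_threshold: float, inter_threshold: float) -> List[str]:
    """Select genes with intra-tumor metric below and inter-tumor metric above their thresholds."""
    selected = []
    for gene, intra_value in intra.items():
        inter_value = inter.get(gene)
        if inter_value is None:
            continue  # skip genes lacking an inter-tumor metric
        if intra_value < intra_threshold and inter_value > inter_threshold:
            selected.append(gene)
    return selected


# Hypothetical metric values for three placeholder genes.
intra_metrics = {"GENE_A": 0.2, "GENE_B": 0.9, "GENE_C": 0.1}
inter_metrics = {"GENE_A": 1.5, "GENE_B": 1.8, "GENE_C": 0.3}
selected_genes = heterogeneity_filter(intra_metrics, inter_metrics,
                                      intra_threshold=0.5, inter_threshold=1.0)
```

Here only GENE_A passes both conditions: GENE_B varies too much within tumors and GENE_C varies too little between tumors.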

The method may further comprise: calculating a consensus score for each gene; and applying a consensus filter to select genes for which the consensus score is below the consensus threshold. The consensus filter can be considered a further heterogeneity filter that removes confounding genes. The consensus score for the selected genes can be calculated after applying the heterogeneity filter. Alternatively, the consensus filter may be applied prior to calculating the intra-tumor and inter-tumor heterogeneity metrics.

The intra-tumor heterogeneity metric for each gene can be calculated by: obtaining gene expression values for each gene at a plurality of locations within the same tumor; calculating, for each tumor, a measure indicative of the variability of the obtained gene expression values for each gene; and obtaining the intra-tumor heterogeneity metric as an average of that measure over all tumors for each gene. The measure may be selected from the group consisting of standard deviation, median absolute deviation, and coefficient of variation.

The inter-tumor heterogeneity metric can be calculated by: obtaining, for each gene, a gene expression value from one of a plurality of regions of the tumor of each subject; and calculating the standard deviation of the obtained values. The method may further comprise repeating the obtaining and calculating steps over a plurality of iterations, and averaging the standard deviations from the iterations to obtain the inter-tumor heterogeneity metric. It should be understood that measures other than standard deviation may be used, such as coefficient of variation and median absolute deviation.
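The two heterogeneity metrics can be sketched for a single gene as follows. This is a minimal sketch under stated assumptions: the data layout (a mapping from tumor identifier to the expression values measured in its regions), the use of the sample standard deviation, the random choice of one region per tumor, and the default number of iterations and seed are all illustrative and not specified in the text.

```python
import random
import statistics
from typing import Dict, List


def intra_tumor_metric(regions_by_tumor: Dict[str, List[float]]) -> float:
    """Average, over tumors, of the standard deviation across regions of each tumor."""
    per_tumor_sd = [statistics.stdev(values)
                    for values in regions_by_tumor.values()
                    if len(values) >= 2]  # a spread needs at least two regions
    return statistics.mean(per_tumor_sd)


def inter_tumor_metric(regions_by_tumor: Dict[str, List[float]],
                       iterations: int = 100, seed: int = 0) -> float:
    """Average, over iterations, of the standard deviation across tumors
    when one region is drawn at random from each tumor."""
    rng = random.Random(seed)
    sds = []
    for _ in range(iterations):
        one_per_tumor = [rng.choice(values) for values in regions_by_tumor.values()]
        sds.append(statistics.stdev(one_per_tumor))
    return statistics.mean(sds)
```

Both functions raise `statistics.StatisticsError` when too few tumors or regions are supplied, which seems preferable to returning a misleading zero.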

The biomarker signature may be prognostic. The method may further comprise: generating training data including associated survival data for each of a plurality of subjects; calculating a prognostic metric for each of the plurality of genes from the survival data; and applying a prognostic filter to select genes having a prognostic metric above a prognostic threshold. Prognostic metrics can be calculated using Cox univariate regression analysis.

The biomarker signature can predict a subject's response to a particular treatment (e.g., immunotherapy). The method may further comprise: generating training data including relevant response data (e.g., results of a particular treatment) for each of a plurality of subjects; calculating a predictive metric for each of the plurality of genes from the response data; and applying a prediction filter to select genes with a predictive metric above a prediction threshold. The predictive metric may be calculated using regression analysis to correlate gene expression with a therapeutic response or a proxy metric for a therapeutic response. The method can be used to establish predictive signatures of treatment response, help classify patients, and identify the most appropriate treatment regimen. Thus, the biomarker signature generated as described above makes it possible to distinguish between various cancer subtypes and determine a treatment strategy according to the cancer subtype. It is to be understood that the methods of providing a prognosis, methods of determining a treatment regimen for a subject, compositions, kits, methods of treatment and uses described above can be applied to any of the signatures generated as described above.

We also describe a method of providing a prognosis for a subject with cancer, the method comprising: contacting a biological sample from a subject with an agent that specifically binds to each member of a set of biomarkers in a signature generated as described above; determining a risk score for the subject based on the nucleic acid expression level of the biomarker in the sample; and providing a prognosis of the cancer based on the risk score of the subject. We also describe a method for determining a treatment regimen for a subject, the method comprising a method of providing a prognosis and further comprising the step of determining a treatment regimen. We also describe a composition comprising a set of reagents that specifically bind to each member of a set of biomarkers in a signature generated as described above. We also describe a kit comprising reagents that specifically bind to each member of a set of biomarkers in the signature generated as described above.

We also describe the use of biomarkers in the signatures generated as described above in a method of providing a prognosis for a subject with cancer. We also describe the use of biomarkers in the signatures generated as described above in a method of providing a treatment regimen for a subject with cancer. We also describe a method of treating a cancer subject comprising the step of predicting the level of risk of mortality in the cancer subject, the method comprising: contacting a biological sample from the subject with an agent that specifically binds to each member of a panel of biomarkers in a signature generated as described above; determining a risk score for the subject based on the nucleic acid expression level of the biomarkers in the sample; comparing the risk score to a threshold to predict whether the subject is at high risk of mortality; selecting a treatment regimen; and administering the treatment regimen.

It is also possible to provide a computer apparatus comprising at least one processor and instructions which, when executed by the at least one processor, cause the computer apparatus to perform any of the determining, calculating and comparing steps of the above methods. It is also possible to provide a tangible, non-transitory computer-readable storage medium having instructions recorded thereon which, when executed by a computer apparatus, cause the computer apparatus to be configured in the above-described manner and/or to perform any of the steps associated with the methods described above. It is also possible to provide a kit comprising the computer apparatus and a microarray of tissue samples and/or one or more reagents for determining the presence of a biomarker.

So far we have described the use of a set of biomarkers comprising or consisting of 23 specific biomarkers. We now describe embodiments using a panel of biomarkers comprising two or more biomarkers selected from 23 specific biomarkers.

We also describe a method of providing a prognosis for a subject with lung cancer, the method comprising: (a) contacting a biological sample from the subject with an agent that specifically binds to each member of a set of biomarkers comprising at least two biomarkers selected from the group consisting of ANLN, ASPM, CDCA4, ERRFI1, FURIN, GOLGA8A, ITGA6, JAG1, LRP12, MAFF, MRPS17, PLK1, PNP, PPP1R13L, PRKCA, PTTG1, PYGB, RPP25, SCPEP1, SLC46A3, SNX7, TPBG and XBP1; (b) determining a risk score for the subject based on the nucleic acid expression level of the biomarkers in the sample; and (c) providing a prognosis for lung cancer based on the risk score of the subject.

Determining a risk score for a subject may comprise: for each selected biomarker, determining a score indicative of the level of nucleic acid expression in the tissue sample; calculating a risk score based on the determined scores, wherein the risk score is calculated by adding the weighted scores of the biomarkers, wherein the scores of the biomarkers are based on the determined scores and the score of each biomarker has an associated weight; and comparing the risk score to a threshold. In this way, each subject may be assigned, for example, to a high risk group (e.g., a risk score above the threshold) or a low risk group (e.g., a risk score at or below the threshold). For example, when considering all types of lung cancer, the survival rate may be lower in the high risk group and higher in the low risk group. Alternatively, when early-stage cancer is considered, the high risk group may be more likely to relapse than the low risk group. The associated weight for the biomarker score of each of GOLGA8A, SCPEP1, SLC46A3 and XBP1 may be negative, indicating that these are favorable genes. The associated weights for the biomarker scores of ANLN, ASPM, CDCA4, ERRFI1, FURIN, ITGA6, JAG1, LRP12, MAFF, MRPS17, PLK1, PNP, PPP1R13L, PRKCA, PTTG1, PYGB, RPP25, SNX7, and TPBG may be positive.

The risk score may be determined as a weighted sum:

$$\text{risk score}_i = b_1 x_{1i} + b_2 x_{2i} + \dots + b_n x_{ni}$$

where $x_{1i}, x_{2i}, \dots, x_{ni}$ are the biomarker scores of the $n$ selected biomarkers for subject $i$, and $b_1, b_2, \dots, b_n$ are the associated weights for each biomarker score.

The method can further include determining weights for the weighted sum using a Cox proportional hazards model that is trained using training data that includes information about a plurality of biomarkers for a group of subjects. The method can include identifying a plurality of biomarkers for use in the Cox proportional hazards model, wherein the plurality of biomarkers is selected from the group consisting of ANLN, ASPM, CDCA4, ERRFI1, FURIN, GOLGA8A, ITGA6, JAG1, LRP12, MAFF, MRPS17, PLK1, PNP, PPP1R13L, PRKCA, PTTG1, PYGB, RPP25, SCPEP1, SLC46A3, SNX7, TPBG and XBP1.
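For orientation, the standard form of a Cox proportional hazards model with the biomarker scores as covariates is shown below. The symbols $h_i(t)$ (hazard for subject $i$ at time $t$) and $h_0(t)$ (baseline hazard) are notation introduced here for illustration and do not appear in the text above.

$$h_i(t) = h_0(t)\,\exp\left(b_1 x_{1i} + b_2 x_{2i} + \dots + b_n x_{ni}\right)$$

Taking logarithms gives

$$\log\frac{h_i(t)}{h_0(t)} = b_1 x_{1i} + b_2 x_{2i} + \dots + b_n x_{ni},$$

so, under this model, the weighted sum equals the log of the subject's hazard relative to baseline. A positive fitted coefficient $b_k$ corresponds to a higher hazard as the score $x_{ki}$ increases, and a negative coefficient to a lower hazard.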

The threshold may be a median risk score of the training data.
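The weighted sum and the median-based threshold can be sketched as below. This is an illustration under stated assumptions: the function names are hypothetical, the weights and scores are invented numbers chosen only to match the stated signs (positive for ANLN, negative for GOLGA8A), and a score exactly equal to the threshold is assigned to the low risk group.

```python
import statistics
from typing import Dict, Sequence


def risk_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of biomarker scores for one subject."""
    missing = set(weights) - set(scores)
    if missing:
        raise KeyError(f"missing biomarker scores: {sorted(missing)}")
    return sum(weights[gene] * scores[gene] for gene in weights)


def median_threshold(training_risk_scores: Sequence[float]) -> float:
    """Threshold taken as the median risk score of the training data."""
    return statistics.median(training_risk_scores)


def risk_group(score: float, threshold: float) -> str:
    """High risk when the score is above the threshold, otherwise low risk."""
    return "high" if score > threshold else "low"


# Hypothetical weights and scores for a two-biomarker panel.
example_weights = {"ANLN": 0.5, "GOLGA8A": -0.25}
example_scores = {"ANLN": 2.0, "GOLGA8A": 4.0}
example_risk = risk_score(example_scores, example_weights)
example_threshold = median_threshold([-1.0, 0.5, 2.0])
example_group = risk_group(example_risk, example_threshold)
```

Keying weights and scores by gene symbol, rather than by position, avoids silently pairing a weight with the wrong biomarker when the panel changes.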

Determining a score indicative of a biomarker level may include determining a scaled intensity score. The biomarker score may be based on a scaled intensity score that has been adjusted by subtracting an adjustment factor. Determining a score indicative of a biomarker level may include assigning a first value when the level is above a threshold and assigning a second value when the level is below the threshold. Determining a score indicative of a biomarker level may include assigning a first value when the level is above an upper threshold, assigning a second value when the level is below the upper threshold but above a lower threshold, and assigning a third value when the level is below the lower threshold.

The agent may be a nucleic acid.

The lung cancer may be non-small cell lung cancer (NSCLC). The NSCLC may be selected from invasive adenocarcinoma (LUAD), squamous cell carcinoma (LUSC), large cell carcinoma, adenosquamous carcinoma, carcinosarcoma, large cell neuroendocrine carcinoma, undifferentiated non-small cell lung carcinoma or bronchioloalveolar carcinoma. LUAD and LUSC account for the majority of NSCLC cases, and other types are often grouped together. NSCLC can be stage I, stage II, stage III, or stage IV.

The sample may be from a surgically excised tumor. The sample may be from a lung tissue or lung tumor biopsy.

Prognosis may provide a risk assessment.

The method may further comprise determining a treatment regimen. Thus, we also describe a method for determining a treatment regimen for a subject, the method comprising the above method and the further step of determining a treatment regimen. The treatment regimen may be selected from surgery, chemotherapy, radiation therapy, immunotherapy, or CAR-T therapy. Such treatment regimens are well known in the art. It will be appreciated that there are various types of immunotherapy, such as immune checkpoint inhibitors, oncolytic virus therapy, T cell therapy and cancer vaccines. An appropriate treatment may be selected.

We also describe a composition comprising a set of reagents that specifically bind to each member of a set of biomarkers comprising or consisting of at least two biomarkers selected from the group consisting of: ANLN, ASPM, CDCA4, ERRFI1, FURIN, GOLGA8A, ITGA6, JAG1, LRP12, MAFF, MRPS17, PLK1, PNP, PPP1R13L, PRKCA, PTTG1, PYGB, RPP25, SCPEP1, SLC46A3, SNX7, TPBG, and XBP1.

We also describe a kit comprising reagents that specifically bind to each member of a set of biomarkers comprising at least two biomarkers selected from the group consisting of: ANLN, ASPM, CDCA4, ERRFI1, FURIN, GOLGA8A, ITGA6, JAG1, LRP12, MAFF, MRPS17, PLK1, PNP, PPP1R13L, PRKCA, PTTG1, PYGB, RPP25, SCPEP1, SLC46A3, SNX7, TPBG, and XBP1. The reagent in the above composition or kit may be a nucleic acid. We also describe the use of the composition or kit in a method of providing a prognosis for a subject with lung cancer as described above. We also describe the use of the composition or kit in a method of providing a treatment regimen for a subject with lung cancer as described above.

We also describe the use of at least two biomarkers selected from ANLN, ASPM, CDCA4, ERRFI1, FURIN, GOLGA8A, ITGA6, JAG1, LRP12, MAFF, MRPS17, PLK1, PNP, PPP1R13L, PRKCA, PTTG1, PYGB, RPP25, SCPEP1, SLC46A3, SNX7, TPBG and XBP1 in a method of providing a prognosis for a lung cancer subject as described above. We also describe the use of at least two biomarkers selected from ANLN, ASPM, CDCA4, ERRFI1, FURIN, GOLGA8A, ITGA6, JAG1, LRP12, MAFF, MRPS17, PLK1, PNP, PPP1R13L, PRKCA, PTTG1, PYGB, RPP25, SCPEP1, SLC46A3, SNX7, TPBG and XBP1 in a method of providing a treatment regimen for a lung cancer subject as described above.

We also describe a method of treating a subject with lung cancer, comprising the step of predicting the level of risk of mortality in the subject, the method comprising: (a) contacting a biological sample from the subject with an agent that specifically binds to each member of a set of biomarkers comprising at least two biomarkers selected from the group consisting of ANLN, ASPM, CDCA4, ERRFI1, FURIN, GOLGA8A, ITGA6, JAG1, LRP12, MAFF, MRPS17, PLK1, PNP, PPP1R13L, PRKCA, PTTG1, PYGB, RPP25, SCPEP1, SLC46A3, SNX7, TPBG and XBP1; (b) determining a risk score for the subject based on the nucleic acid expression level of the biomarkers in the sample; (c) comparing the risk score to a threshold to predict whether the subject is at high risk of mortality; (d) selecting a treatment regimen; and (e) administering the treatment regimen.

In each embodiment of the invention, the set of biomarkers comprises a selection of biomarkers which, as will be understood by the skilled person, may comprise or consist of at least 3 biomarkers selected from the group consisting of: ANLN, ASPM, CDCA4, ERRFI1, FURIN, GOLGA8A, ITGA6, JAG1, LRP12, MAFF, MRPS17, PLK1, PNP, PPP1R13L, PRKCA, PTTG1, PYGB, RPP25, SCPEP1, SLC46A3, SNX7, TPBG and XBP1. It is also to be understood that the set of biomarkers may comprise or consist of at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, or at least 22 biomarkers selected from the same group.

In each embodiment of the invention, the set of biomarkers comprises a selection of biomarkers which, as will be understood by the skilled person, may comprise or consist of ANLN and at least one of the following biomarkers: ASPM, CDCA4, ERRFI1, FURIN, GOLGA8A, ITGA6, JAG1, LRP12, MAFF, MRPS17, PLK1, PNP, PPP1R13L, PRKCA, PTTG1, PYGB, RPP25, SCPEP1, SLC46A3, SNX7, TPBG and XBP1. It is also to be understood that the same applies to each of the other biomarkers in turn: the set of biomarkers may comprise or consist of any one of ASPM, CDCA4, ERRFI1, FURIN, GOLGA8A, ITGA6, JAG1, LRP12, MAFF, MRPS17, PLK1, PNP, PPP1R13L, PRKCA, PTTG1, PYGB, RPP25, SCPEP1, SLC46A3, SNX7, TPBG or XBP1, together with at least one of the remaining 22 biomarkers of the group.

One of skill in the art will appreciate that any combination of two or more biomarkers of ANLN, ASPM, CDCA4, ERRFI1, FURIN, GOLGA8A, ITGA6, JAG1, LRP12, MAFF, MRPS17, PLK1, PNP, PPP1R13L, PRKCA, PTTG1, PYGB, RPP25, SCPEP1, SLC46A3, SNX7, TPBG, XBP1 is sufficient to provide a prognosis or determine a treatment regimen for a lung cancer subject.

While there have been shown and described what are at present considered to be the preferred embodiments of the invention, it will be understood by those skilled in the art that various changes and modifications may be made herein without departing from the scope of the invention as defined by the appended claims.

Drawings

For a better understanding of the present invention, and to show how embodiments thereof may be carried into effect, reference will now be made, by way of example only, to the accompanying drawings, in which:

FIG. 1a is a schematic diagram of a lung tumor showing sampling points illustrating the tumor sampling bias problem;

FIG. 1b is a schematic representation of the steps of the prediction method and the clinical significance of the tumor sampling bias problem of FIG. 1a;

FIG. 1c is a graph of risk score against patient, plotted using certain known signatures;

FIGS. 1d and 1e are bar graphs showing the proportion of low-risk, high-risk and discordant patients using two known signatures;

FIG. 1f is a graph, for nine known signatures, of the percentage of patients whose tumor regions all belong to the same cluster versus the number of clusters;

FIG. 2a is a flow chart of steps of a method of developing and validating prognostic features;

FIG. 2b is a flow chart of the steps of the prognostic method;

FIG. 2c is a schematic block diagram of system components implementing the method described in FIG. 2 b;

FIGS. 2d and 2e are graphs of the proportion of patients who all belonged to the same cluster versus the number of clusters for two genes, CKMT2 and HOXC11, respectively;

FIG. 2f is a graph of hierarchical clustering identity against each gene;

FIG. 3a illustrates the steps of calculating RNA intratumoral heterogeneity of multiple genes;

FIG. 3b is a graph of the Median Absolute Deviation (MAD) versus standard deviation score for each gene;

FIG. 3c is a graph of Coefficient of Variation (CV) versus standard deviation score for each gene;

FIG. 3d illustrates a random sampling process for calculating a measure of RNA inter-tumor heterogeneity;

FIG. 3e is a graph of RNA intra-tumor heterogeneity values (y-axis) versus RNA inter-tumor heterogeneity values (x-axis) for multiple genes;

FIG. 4a shows the prognostic value of each of three signatures in the validation cohort;

FIG. 4b shows a forest plot of the prognostic value of the signature alongside known risk factors;

FIGS. 4c, 4d and 4e show the prognostic value, in stage I patients, of sub-staging criteria, current clinical guidelines for chemotherapy, and the output signature, respectively, where improved risk prediction may impact clinical decision making;

FIG. 4f illustrates risk scores calculated using the output signature for a plurality of patients;

FIG. 4g shows a graph of prognostic value evaluation using the RNA-Seq dataset and four microarray datasets;

FIG. 4h shows a graph indicating that any subset of ORACLE features may have a prognostic value;

FIG. 5a shows the prognostic relevance of the RNA heterogeneity quadrants of different cancer types;

FIG. 5b compares the expression of genes in each quadrant of multiple cancer types to determine whether the quadrants are enriched or deficient for prognostic genes;

FIG. 6a is a graph of gene expression ITH versus copy number ITH;

FIG. 6b shows the difference in expression for subclonal chromosomal copy number changes (losses and gains, respectively);

FIG. 6c shows clonal copy number gains for the RNA heterogeneity quadrants; and

FIG. 6d shows the Reactome pathways enriched in RNA heterogeneity quadrant Q4.

Detailed Description

As described above, figs. 1a to 1f show that sampling bias can confound the use of molecular biomarkers in several cancer types. This is because intratumoral heterogeneity (i.e., spatial variability of genetic and transcriptomic characteristics within a single tumor), which acts as a substrate for tumor evolution, can affect the outcome of a molecular biomarker, as the outcome may depend on the location of the tumor sample being tested. Various solutions to the tumor sampling bias problem have been proposed, including the use of multi-region sequencing to (i) pool multiple biopsies to obtain an overall molecular risk estimate for a single tumor (as described in "Stability and Heterogeneity of Expression Profiles in Lung Cancer Specimens Harvested Following Surgical Resection" published by Blackhall et al. in Neoplasia (2004)); or (ii) identify a "lethal" subclone with maximal immune evasion (e.g., as described in "Comprehensive Intrametastatic Immune Quantification and Major Impact of Immunoscore on Survival" published by Mlecnik et al. in JNCI J Natl Cancer Inst 110, 97-108 (2018)) or metastatic potential (e.g., as described in "Distant metastasis occurs late during the genetic evolution of pancreatic cancer" published by Yachida et al. in Nature 467, 1114-1117 (2010)). However, in the clinic, multi-region sequencing is currently impractical.

FIG. 2a is a flow chart of the steps for developing a set of biomarkers (or gene expression signatures) that yield reliable prognostic results, suitable for single-region tumor samples routinely collected in clinical practice. As explained in more detail below, the set of biomarkers includes genes with low intra-tumor heterogeneity but high inter-tumor heterogeneity, minimizing the confounding effects of sampling bias, but maximizing the ability to differentiate between patients.

The first step S100 is to collect training data, such as gene expression and survival data for 959 patients with stage I to III NSCLC from The Cancer Genome Atlas (TCGA) (469 patients with LUAD and 490 patients with LUSC). This data forms the training data set from which the signature is derived as described below. The downloaded data can be processed to form the training data according to standard methods in an RNA-seq pre-processing pipeline. For example, alignment to the human genome can be performed, e.g., using the MapSplice package (reference 67). Gene expression can then be quantified, for example, using the GenomicFeatures and GenomicRanges packages from Bioconductor. An expression filter can then be applied to retain genes with at least 0.5 counts per million (CPM) in at least two tumor samples, as shown in step S101. A variance stabilizing transformation from the DESeq2 package, described in "Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2" published by Love et al. in Genome Biol 15, 550 (2014), can then be applied to obtain normalized counts for the filtered genes. It will be appreciated that data may be collected for different patients when developing prognostic signatures for different diseases.
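The expression filter of step S101 can be illustrated with a minimal sketch. It assumes raw counts are held as one dictionary per sample; the gene names, counts, and the helper names `cpm` and `expression_filter` are hypothetical and are not taken from any of the packages named above.

```python
def cpm(counts):
    """Convert one sample's raw counts (gene -> count) to counts per million."""
    total = sum(counts.values())
    return {gene: 1e6 * c / total for gene, c in counts.items()}


def expression_filter(samples, min_cpm=0.5, min_samples=2):
    """Return the genes with CPM >= min_cpm in at least min_samples samples."""
    cpm_tables = [cpm(s) for s in samples]
    genes = samples[0].keys()
    return sorted(
        g for g in genes
        if sum(table[g] >= min_cpm for table in cpm_tables) >= min_samples
    )


# Toy raw counts for three tumor samples (hypothetical values).
samples = [
    {"GENE_A": 500_000, "GENE_B": 0, "GENE_C": 500_000},
    {"GENE_A": 400_000, "GENE_B": 1, "GENE_C": 599_999},
    {"GENE_A": 999_999, "GENE_B": 0, "GENE_C": 1},
]
kept = expression_filter(samples)
print(kept)
```

In this toy input, GENE_B reaches 0.5 CPM in only one sample and is therefore removed.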

The next step S102 is to calculate a prognostic metric for each gene, identifying significant prognostic genes. Then a first filtering step S104 is applied to remove genes according to their prognostic effect (i.e. to select genes with a prognostic measure above a threshold). Each of these genes has an unknown effect on the overall survival of each patient. The prognostic metric can be calculated using any suitable method.

For example, Cox univariate regression analysis can be employed. The Cox model is expressed in terms of a hazard function, denoted h(t). The hazard function may be interpreted as the risk of death at time t. It can be estimated as follows:

$$h(t) = h_0(t) \times \exp(b_1 x_1 + b_2 x_2 + \dots + b_n x_n)$$

where $t$ denotes the survival time; $h(t)$ is the hazard function determined by a set of $n$ covariates $(x_1, x_2, \dots, x_n)$, in this case genes; $(b_1, b_2, \dots, b_n)$ are the weights (or coefficients) of the covariates; and the term $h_0$ is referred to as the baseline hazard, which corresponds to the value of the hazard when all $x_i$ are equal to 0 (the quantity $\exp(0)$ is equal to 1). The "t" in $h(t)$ is a reminder that the hazard may change over time. However, the time dependence can be eliminated, so that the model can be rewritten in a linear form by taking the logarithm of the hazard ratio of patient $i$ to the reference group, which can be written as:

$$\log\left(\frac{h_i(t)}{h_0(t)}\right) = b_1 x_{1i} + b_2 x_{2i} + \dots + b_n x_{ni}$$

This linear equation is called the Cox proportional hazards model. It contains a set of $n$ covariates (i.e., genes) $(x_{1i}, x_{2i}, \dots, x_{ni})$ for each patient $i$ and a set of weights $(b_1, b_2, \dots, b_n)$ that optimize the model across all patients. Univariate analysis refers to considering each variable individually. Typically, for each variable, the coefficient and the lower and upper limits of the 95% confidence interval around the coefficient (CI95L and CI95U, respectively) are calculated. The P value is a measure of the statistical significance of the variable, calculated using either the Wald test or the log-rank test. The Q value is the P value adjusted by the Benjamini & Hochberg method.
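As an illustration of the adjustment that turns P values into Q values, the following is a minimal sketch of the Benjamini & Hochberg procedure. The function name `benjamini_hochberg` and the example P values are hypothetical; in practice the P values would come from the per-gene univariate Cox analyses.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values (Q values) in input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that Q values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# Hypothetical P values for four genes.
p = [0.01, 0.04, 0.03, 0.20]
q = benjamini_hochberg(p)
print(q)
```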

More than one prognostic filter may be applied, as shown in step S104. For example, a first filter may filter all genes according to a prognostic significance threshold, e.g., P < 0.05, which may reduce the number of genes from 19026 to 4240 in this embodiment. A second filter may be employed to filter the genes according to a median threshold, e.g., all genes for which the prognostic measure is below the median may be removed. In this example, this can reduce the number of genes from 19026 to 9512. These two thresholds can be considered together as a prognostic threshold, so that overall the first filtering step can reduce the number of genes from 19026 to 2023.

A second filtering step S106 may then be employed. This filter may be referred to as a clonal expression filter or a heterogeneity filter. As explained in more detail below, the clonal expression filter can remove genes that do not have both low intra-tumor heterogeneity and high inter-tumor heterogeneity (i.e., it selects genes that have both low intra-tumor and high inter-tumor heterogeneity). In this example, this may reduce the number of genes from 2023 to 176.

A third filtering step S108 may then be employed. This filter, which may be referred to as a concordance filter, may screen the remaining genes based on a per-gene clustering concordance score. The concordance score may be calculated using any suitable method. For example, concordance can be determined by hierarchical cluster analysis of cancer expression data in which multiple samples are obtained from each tumor, e.g., using the Ward method with the Manhattan metric, as described in "Intratumor Heterogeneity Affects Gene Expression Profile Test Prognostic Risk Stratification in Early Breast Cancer" published by Gyanchandani et al., Clin. Concordance is determined for each gene as the percentage of tumors for which all samples are assigned to the same cluster. The cluster analysis can be iterated for numbers of clusters from 2 up to the total number of patients (e.g., 28 in this TRACERx LUAD cohort). For each gene, the proportion of patients with all regions belonging to the same cluster can be plotted against the number of clusters to obtain a curve. For example, as shown in figs. 2d and 2e, for the two genes CKMT2 and HOXC11, the proportion of patients with all samples belonging to the same cluster was plotted against the number of clusters. The concordance score for each gene is then summarized as the area under the curve. The concordance scores can be plotted as shown in fig. 2f, which shows the hierarchical clustering concordance for each gene. Once the concordance score for each gene is calculated, all genes whose concordance scores are below a concordance threshold may be removed. The concordance threshold (i.e., cutoff) may be determined using ten-fold cross validation. In this example, this may reduce the number of genes from 176 to 90.
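The summary of the per-gene curve as an area under the curve can be sketched as follows. The sketch assumes the cluster assignments have already been produced by a separate hierarchical clustering step (not shown), and it uses the trapezoidal rule for the area, which is an assumption because the integration rule is not stated here. The tumor identifiers, cluster labels, and function names are hypothetical.

```python
def concordance(labels_by_tumor):
    """Fraction of tumors whose regions are all assigned to one cluster."""
    same = sum(len(set(labels)) == 1 for labels in labels_by_tumor.values())
    return same / len(labels_by_tumor)


def area_under_curve(xs, ys):
    """Trapezoidal area under the points (xs[i], ys[i])."""
    return sum(
        (xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2
        for i in range(len(xs) - 1)
    )


# Hypothetical cluster labels for one gene: number of clusters -> tumor -> labels
# of that tumor's regions.
assignments = {
    2: {"T1": [0, 0], "T2": [1, 1], "T3": [0, 0], "T4": [1, 1]},
    3: {"T1": [0, 0], "T2": [1, 2], "T3": [0, 0], "T4": [1, 1]},
    4: {"T1": [0, 3], "T2": [1, 2], "T3": [0, 0], "T4": [1, 1]},
}
cluster_numbers = sorted(assignments)
curve = [concordance(assignments[k]) for k in cluster_numbers]
score = area_under_curve(cluster_numbers, curve)
print(curve, score)
```

A gene whose regions stay together as the number of clusters grows keeps a high curve and therefore a high score.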

For a practical prognostic kit, this number of genes may still be too large, and thus the number of genes may optionally be further reduced using standard methods, such as lasso regression (S110). Lasso regression can be applied in the R software environment using the glmnet package described in "Regularization Paths for Generalized Linear Models via Coordinate Descent" published by Friedman et al. in J Stat Softw 33, 1-22 (2010), fitting a Cox proportional hazards model with a lasso penalty (α = 1) (e.g., as described in "Regularization Paths for Cox's Proportional Hazards Model via Coordinate Descent" published by Simon et al. in J Stat Softw 39, 1-13 (2011)). In this example, this may reduce the number of genes from 90 to 23. The resulting set of 23 genes (i.e., the signature) is then output (S112). The resulting signature may be referred to as the ORACLE signature (Outcome Risk Associated Clonal Lung Expression). The prognostic accuracy of the output signature may be evaluated using validation data (S114).
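For orientation, the objective that a lasso-penalized Cox fit optimizes can be written as below. This is the general textbook form, given as a sketch rather than as the exact parameterization used by any particular package: $\ell(b)$ is the Cox partial log-likelihood (shown without tie handling), $R(t_i)$ is the set of patients still at risk at the event time of patient $i$, and $\lambda \ge 0$ is a tuning parameter, which is introduced here for illustration.

$$\hat{b} = \arg\max_{b} \left[ \ell(b) - \lambda \sum_{j=1}^{n} |b_j| \right], \qquad \ell(b) = \sum_{i:\,\text{event}} \left[ \sum_{j=1}^{n} b_j x_{ji} - \log \sum_{k \in R(t_i)} \exp\left( \sum_{j=1}^{n} b_j x_{jk} \right) \right]$$

Because the penalty is the sum of absolute values of the weights, increasing $\lambda$ drives some weights exactly to zero, which is how the gene set can shrink from 90 genes to a smaller signature.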

It should be understood that each of the filtering steps in fig. 2a may be applied in any order. The order shown is merely exemplary and not limiting.

The prognostic biomarker signature includes the following genes: ANLN, ASPM, CDCA4, ERRFI1, FURIN, GOLGA8A, ITGA6, JAG1, LRP12, MAFF, MRPS17, PLK1, PNP, PPP1R13L, PRKCA, PTTG1, PYGB, RPP25, SCPEP1, SLC46A3, SNX7, TPBG, XBP1. Five genes are involved in cell proliferation (ANLN, ASPM, CDCA4, PLK1, PRKCA), and six genes (ERRFI1, FURIN, ITGA6, JAG1, PPP1R13L, PTTG1) are involved in oncogenic signaling pathways. Only seven of these genes have previously been used in a LUAD prognostic signature, namely ASPM, FURIN, PLK1, PNP, PRKCA, PTTG1 and TPBG. The prognostic biomarkers predict survival risk independently of treatment.

A method of providing a prognosis or a predicted risk level for a lung cancer subject, the method comprising:

(a) contacting a biological sample from a subject with an agent that specifically binds to each member of a set of biomarkers comprising or consisting of: ANLN, ASPM, CDCA4, ERRFI1, FURIN, GOLGA8A, ITGA6, JAG1, LRP12, MAFF, MRPS17, PLK1, PNP, PPP1R13L, PRKCA, PTTG1, PYGB, RPP25, SCPEP1, SLC46A3, SNX7, TPBG, XBP1;

(b) determining a risk score for the subject based on the nucleic acid expression level of the biomarker in the sample; and

(c) providing a prognosis for lung cancer based on the risk score of the subject.

The method may also include collecting a sample from the patient. The sample may be a tumor sample. The reagents used in the methods, kits, and compositions provided herein can be nucleic acids, such as oligonucleotides or primers.

The prognosis as described herein relates to clinical outcome, such as overall survival, intermediate or long term mortality (e.g., 1, 2, 3, 4 or 5 years), or disease-free survival.

It should be understood that fig. 2a illustrates one method for generating prognostic signatures, however, the method can be readily adapted for prediction by replacing the prognostic metric with a predictive metric.

FIG. 2b provides an example of a method of prognosis. Steps that may be performed in a prognostic method using output features are shown. The first step is to contact the biological sample (step S200). The biological sample may be a tumor sample obtained using any suitable method, such as a donor sample obtained from a biopsy. The value of each of the 23 genes in the tumor sample is then determined using standard techniques (step S202).

The next step (S204) is to determine a risk score based on the weighted sum of the values of each of the 23 genes. Thus, the risk score may be calculated by:

$$\text{risk score}_i = b_1 x_{1i} + b_2 x_{2i} + \dots + b_n x_{ni}$$

In the formula, $x_{1i}, x_{2i}, \dots, x_{ni}$ are the expression values of the 23 selected genes for patient $i$, and $b_1, b_2, \dots, b_n$ are the associated weights for each gene. The weights may be determined using lasso regression as described above.
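The weighted sum can be sketched in a few lines. The gene names, weights, and expression values below are placeholders chosen for illustration only; they are not the weights of the signature.

```python
def risk_score(expression, weights):
    """Weighted sum of expression values over the genes that carry a weight."""
    return sum(weights[gene] * expression[gene] for gene in weights)


# Placeholder weights (hypothetical): positive = unfavorable, negative = favorable.
weights = {"GENE_A": 0.5, "GENE_B": -0.25, "GENE_C": 1.0}
# Placeholder normalized expression values for one patient.
patient = {"GENE_A": 2.0, "GENE_B": 4.0, "GENE_C": 1.5}

score = risk_score(patient, weights)
print(score)
```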

For example, suitable weights for each gene in the signature are shown below. Genes with a positive beta coefficient are associated with a hazard ratio > 1 (i.e., "unfavorable genes", predicting poor survival), and genes with a negative coefficient ("favorable genes") are the opposite. It should be understood that these weights are indicative of suitable values and are not limiting.

Returning to fig. 2b, once the risk score is determined, it is compared to a risk score threshold (step S206). If the risk score is equal to or above the threshold, the risk of the patient failing to survive is considered high, and the patient is classified as a high risk patient (step S210). Conversely, if the risk score is below the threshold, the risk of the patient failing to survive is considered low, and the patient is classified as a low risk patient (step S212). The threshold may be, for example, the median risk score of the data used to derive the signature and/or the risk score that gives the most significant split (log-rank test P < 0.01). In other words, the threshold may be the risk score that most significantly separates the training cohort into relapsed and non-relapsed (i.e., cured) patients.

As an alternative to step S208, the risk score may be compared to an upper threshold and a lower threshold. If the risk score is equal to or above the upper threshold, the patient is classified as a high risk patient. If the risk score is below the lower threshold, the patient is classified as a low risk patient. If the risk score lies between these two thresholds, the patient is classified as an intermediate risk patient. As explained below, the upper and lower thresholds may be determined as the tertiles of the risk scores determined for the training cohort.
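Both classification schemes can be sketched as follows, assuming the thresholds are taken from a set of training-cohort risk scores. The training scores are hypothetical, and the use of `statistics.quantiles` with its default interpolation to obtain tertiles is an assumption, since the exact tertile definition is not given here.

```python
from statistics import median, quantiles


def classify(score, threshold):
    """Single-threshold rule: at or above the threshold is high risk."""
    return "high" if score >= threshold else "low"


def classify_three(score, lower, upper):
    """Two-threshold rule with a middle group between the thresholds."""
    if score >= upper:
        return "high"
    if score < lower:
        return "low"
    return "intermediate"


# Hypothetical training-cohort risk scores.
training_scores = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
threshold = median(training_scores)              # single cutoff
lower, upper = quantiles(training_scores, n=3)   # two tertile cut points

print(classify(5.0, threshold))
print([classify_three(s, lower, upper) for s in (3.0, 5.0, 7.0)])
```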

Once the risk score is determined, it can optionally be used to decide the most appropriate treatment. For example, for high risk patients, adjuvant chemotherapy is recommended to supplement surgery. This treatment can increase overall survival compared to surgery alone. This is particularly important for stage I patients, for whom there is a lack of clinical indicators identifying high risk patients. Currently, stage I patients tend not to receive chemotherapy, and approximately 25% of stage I patients relapse within 5 years. In contrast, for low risk patients, treatment may be selected from surgery alone or the combination of surgery and chemotherapy specified above. In this case, both options are equally effective.

A schematic diagram of a system for performing the method is shown in fig. 2c. The system includes a computing device 210, which may be a hand-held portable device that a clinician can carry between patients and on which an application to calculate a risk score may be installed. Computing device 210 includes standard components, e.g., a processing unit or processor 220, a user interface unit 222 for allowing a user to input information (e.g., the determined scores), and a memory 224 for storing code to perform the calculations and/or the thresholds against which the calculated risk scores are compared. The user interface may display information or, alternatively, there may be a display 224 for displaying information to the user, such as calculated risk scores and/or treatment recommendations as described above. There may also be a communication module 228 for communicating with other devices and/or accessing the cloud 240, e.g., for processing the risk scores. A tissue sample 230 is also schematically shown.

The illustrative system may be constructed, in part or in whole, using specialized hardware. Terms such as "module" or "unit" as used herein may include, but are not limited to, a hardware device that performs certain tasks or provides related functions, such as a circuit in discrete or integrated component form, a Field Programmable Gate Array (FPGA), or an Application Specific Integrated Circuit (ASIC). In some embodiments, the described elements may be configured to reside on a tangible, persistent, addressable storage medium and may be configured to execute on one or more processors. In some embodiments, these functional elements may include, for example, components (e.g., software components, object-oriented software components, class components, and task components), processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables. Although example embodiments have been described with reference to the components discussed herein, these functional elements may be combined into fewer elements or separated into additional elements.

Figures 3a to 3e illustrate how the clonal expression filter is created. For each gene, an RNA intra-tumor heterogeneity value and an RNA inter-tumor heterogeneity value are calculated. These per-gene metrics quantify variability as the standard deviation between regions within the same tumor, to generate the intra-tumor heterogeneity value, and as the standard deviation between tumor regions taken from different tumors, to generate the inter-tumor heterogeneity value. They can be calculated using multi-region RNAseq data (normalized count values).

Figures 3a to 3e plot tumor sample data from a dataset of 100 NSCLC patients from the TRACERx lung cancer study sponsored by University College London. Multi-region sampling was performed to obtain DNA and RNA from the same tissue. Whole exome sequencing was performed on the DNA samples. In the cohort of 100 tumors, RNA samples of sufficient quality were obtained from 174 regions of 68 tumors. Of these, 48 tumors provided at least two samples.

Further processing may be performed as desired. For example, alignment can be performed, e.g., using the STAR package described in "STAR: ultrafast universal RNA-seq aligner" published by Dobin et al. in Bioinformatics 29, 15-21 (2013), to map reads to the human genome. Transcript expression can be quantified, for example, using the RSEM package described in "RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome" published by Li et al. in BMC Bioinformatics 12, 323 (2011), to generate counts and transcripts per million (TPM) expression values. An expression filter was used to retain genes with at least 1 TPM in at least 20% (30/156) of the tumor samples. Finally, a variance stabilizing transformation was applied to the counts of the filtered genes using the DESeq2 package described above (assuming the counts follow a negative binomial distribution). The output variance-stabilized, library size-normalized count values are used as follows. In this example, there may be 19206 genes to consider.

As shown in fig. 3a, for each patient (e.g., CRUK0003), the expression of each gene at multiple locations (e.g., R1 through R8) was determined. For example, the graph on the left of fig. 3a shows gene expression at multiple locations for EDC4, CALM2, and PROM1. For a given tumor, the standard deviation of the expression values of a particular gene across the tumor regions can be calculated, yielding a gene-specific, patient-specific measure of RNA intra-tumor heterogeneity ($\sigma_{g,p}$). For the example patient, these are shown in the table at the center of fig. 3a, the values for the three genes being 0.075, 0.552, and 2.248, respectively. Thus, EDC4 changed little throughout the tumor, but PROM1 changed significantly. This operation can then be repeated for all genes, and then for all tumors, generating a matrix of $\sigma_{g,p}$ values, which in this example is shown as a table with patients ($p$) in the columns and genes ($g$) in the rows.

The gene-by-gene RNA intra-tumor heterogeneity values can be summarized as the average (median) value for each gene across all tumors in the cohort ($\sigma_g$). These values may be determined, for example, by plotting a graph such as that shown on the right side of fig. 3a. For the three exemplary genes, the $\sigma_g$ values are 0.096, 0.246 and 1.380, respectively. Alternatively, patient-by-patient RNA intra-tumor heterogeneity values can be summarized as the average (median) value, for each tumor, across all expressed genes in the cohort ($\sigma_p$).
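The two levels of summary described above can be sketched as follows. The sketch assumes the sample standard deviation (n - 1 denominator) and the median as the summary statistic; both are assumptions where the text leaves the choice open. Gene names, tumor identifiers, and expression values are hypothetical.

```python
from statistics import median, stdev

# Hypothetical normalized expression: gene -> tumor -> values per region.
expression = {
    "GENE_A": {"T1": [5.0, 5.1, 4.9], "T2": [6.0, 6.2]},
    "GENE_B": {"T1": [2.0, 4.0, 6.0], "T2": [1.0, 3.0]},
}

# sigma_gp: standard deviation across regions, per gene and per tumor.
sigma_gp = {
    gene: {tumor: stdev(values) for tumor, values in tumors.items()}
    for gene, tumors in expression.items()
}

# sigma_g: per-gene summary across all tumors in the cohort.
sigma_g = {gene: median(per_tumor.values()) for gene, per_tumor in sigma_gp.items()}

print(sigma_gp)
print(sigma_g)
```

Here GENE_A varies little between regions of the same tumor and GENE_B varies a lot, so GENE_A receives the lower intra-tumor heterogeneity value.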

FIG. 3b is a plot of the Median Absolute Deviation (MAD) versus the standard deviation score for each gene. Similarly, FIG. 3c is a plot of the Coefficient of Variation (CV) for each gene against the standard deviation score. FIGS. 3b and 3c show that MAD and CV are surrogate markers for quantifying the RNA-ITH of a gene, and they show good agreement with the standard deviation score.

As shown in fig. 3d, the measure of inter-tumor heterogeneity for each gene can be derived by randomly sampling one region for each patient (e.g., region R1 for patient CRUK001, region R2 for patient CRUK002, etc.). The standard deviation across the resulting single-biopsy cohort can then be derived. The random sampling and calculation of the standard deviation may be repeated multiple times (e.g., 10 times) and the scores averaged over the iterations. As a check, the same approach can be applied to the TCGA NSCLC data set, which is a true single-biopsy cohort. This check found good agreement with the scores calculated in the TRACERx cohort (PMCC = 0.94, P < 0.001), indicating that the calculation of the inter-tumor heterogeneity score was reproducible.
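A minimal sketch of this random sampling procedure for a single gene is given below. The function name `inter_tumor_sd`, the fixed random seed, and the toy expression values are hypothetical; the sample standard deviation is assumed.

```python
import random
from statistics import mean, stdev


def inter_tumor_sd(regions_by_tumor, n_iter=10, seed=0):
    """Average, over n_iter random draws, of the standard deviation across a
    pseudo single-biopsy cohort built by picking one region per tumor."""
    rng = random.Random(seed)
    sds = []
    for _ in range(n_iter):
        pseudo_cohort = [rng.choice(regions) for regions in regions_by_tumor.values()]
        sds.append(stdev(pseudo_cohort))
    return mean(sds)


# Hypothetical expression of one gene: tumor -> values per region.
gene_expression = {
    "T1": [1.0, 1.2, 0.9],
    "T2": [3.1, 2.8],
    "T3": [5.0, 5.3, 4.7],
}
score = inter_tumor_sd(gene_expression)
print(score)
```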

FIG. 3e plots the RNA intra-tumor heterogeneity value (also called a score; these terms are used interchangeably) for each gene (y-axis) against the RNA inter-tumor heterogeneity value (x-axis). The plot in fig. 3e is divided into quadrants by the mean intra-tumor heterogeneity value (horizontal dashed line) and the mean inter-tumor heterogeneity value (vertical dashed line). The quadrants are numbered Q1, Q2, Q3 and Q4, and the number of genes in each quadrant is indicated. Q1 represents genes with low inter-tumor and high intra-tumor heterogeneity values, comprising 798 genes. Q2 represents genes with low inter-tumor and low intra-tumor heterogeneity values, comprising 9642 genes. Q3 represents genes with high inter-tumor and high intra-tumor heterogeneity values, comprising 4766 genes. Q4 represents genes with high inter-tumor and low intra-tumor heterogeneity values, comprising 1080 genes. The genes in Q2 and Q4 show homogeneous expression within a tumor (i.e., low intra-tumor heterogeneity), which may limit sampling bias. However, the genes in Q2 also have low inter-tumor heterogeneity, which means that they show homogeneous expression between different tumors and thus provide no information for classifying patients into high/low risk groups. The genes in Q4 are therefore more useful, so the clonal expression filter can filter out all genes outside the Q4 quadrant, i.e., it retains only genes with an inter-tumor heterogeneity value above an inter-tumor threshold (e.g., median) and an intra-tumor heterogeneity value below an intra-tumor threshold (e.g., median).
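The quadrant assignment and the resulting clonal expression filter can be sketched as follows. The sketch uses the mean of each score as the cutoff and treats values strictly above the cutoff as "high"; the handling of values exactly at the cutoff is an assumption. Gene names and scores are hypothetical.

```python
from statistics import mean


def assign_quadrants(intra, inter):
    """Map each gene to Q1..Q4 using the mean of each score as the cutoff."""
    intra_cut = mean(intra.values())
    inter_cut = mean(inter.values())
    quadrants = {}
    for gene in intra:
        high_intra = intra[gene] > intra_cut
        high_inter = inter[gene] > inter_cut
        if high_inter:
            quadrants[gene] = "Q3" if high_intra else "Q4"
        else:
            quadrants[gene] = "Q1" if high_intra else "Q2"
    return quadrants


# Hypothetical per-gene heterogeneity scores.
intra = {"GENE_A": 2.0, "GENE_B": 0.2, "GENE_C": 2.2, "GENE_D": 0.1}
inter = {"GENE_A": 0.3, "GENE_B": 0.2, "GENE_C": 3.0, "GENE_D": 2.5}

quadrants = assign_quadrants(intra, inter)
# Clonal expression filter: keep only high inter-, low intra-tumor genes.
selected = sorted(g for g, q in quadrants.items() if q == "Q4")
print(quadrants, selected)
```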

Figures 4a to 4e show example results of the last, optional step of the method of fig. 2a, in which validation data is used to assess the prognostic accuracy of the output (ORACLE) signature. In this example, validation data was taken from the "Uppsala II" dataset, which is an independent cohort of early stage LUAD patients (UII, n = 103, stages I to III). The validation data included pre-processed Uppsala RNAseq and clinical data for 170 NSCLC patients (103 LUAD + 67 LUSC) in the Uppsala NSCLC II cohort, downloaded from the Gene Expression Omnibus. The cohort is described in "Profiling cancer testis antigens in non-small-cell lung cancer" published by Djureinovic et al. in JCI Insight 1 (2016).

Genetic information is extracted from the data set using known procedures. For example, alignment to the human genome can be performed, e.g., using the TopHat package described in "TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions" published by Kim et al. in Genome Biol 14, R36 (2013). Raw read counts are then calculated, for example, using the Subread package described in "The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote" published by Liao et al. in Nucleic Acids Res 41, e108 (2013). Gene IDs are converted to HGNC IDs using the biomaRt package described in "Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt" published by Durinck et al. in Nat Protoc 4, 1184-1191 (2009). The maximum value is then selected for multi-mapping probes. The lowly expressed genes identified in the training dataset are filtered from the validation dataset, and a variance stabilizing transformation is applied using the DESeq2 package described above to output normalized count values. Additional clinical information (e.g., treatment status and tumor size) is also obtained.

FIG. 4a compares the performance of the output 23-gene signature with the performance of similar signatures based on known papers. Signature A was constructed using a pipeline based on the signature described in "Development of a RNA-Seq Based Prognostic Signature in Lung Adenocarcinoma" published by Shukla et al. in JNCI J Natl Cancer Inst 109 (2017). The signature was derived from the genes identified in the Shukla paper, and the genes for the signature were selected using standard techniques. For example, univariate Cox regression analysis was performed using a training dataset from the TCGA database, specifically LUAD patients, and a primary prognostic filter (univariate Cox analysis P < 0.00025) was applied to reduce the number of genes identified in the Shukla paper to 108. A further prognostic filter, this time univariate Cox analysis FDR < 0.02, reduced the 108 genes to 15. Finally, forward conditional stepwise regression was applied to generate a 6-gene signature. The procedure outlined in the Shukla paper was therefore followed, but the different training data yielded a 6-gene signature rather than the 4-gene prognostic model described on page 4 of the Shukla paper.

Signature B was constructed using a pipeline based on the signature described in "A practical molecular assay to predict survival in resected non-squamous, non-small-cell lung cancer: development and international validation studies" published by Kratz et al. in Lancet 379, 823-832 (2012). In the development of signature B, all genes identified in the papers listed in the table in the background section were first collated into a list. For example, using a training dataset from the TCGA database, specifically LUAD patients, a univariate Cox regression analysis was performed and a primary prognostic filter (univariate Cox analysis P < 0.00025) was applied to reduce the number of identified genes to 249. Using a secondary prognostic filter, the number of genes was reduced from 249 to 56 by retaining only the genes associated with cancer. Finally, lasso regression was applied to obtain a 24-gene prognostic signature. Like signature A, signature B was derived using the method described in the Kratz paper, but resulted in a different gene selection because of the different training cohort. Both of these signatures are compared with the 23-gene signature described above.

Fig. 4a shows the prognostic value of each of the three signatures. The prognostic accuracy of the three signatures was tested using validation data from the Uppsala dataset. As shown, the process of fig. 2a produces a signature that significantly predicts survival risk (log-rank test P = 0.006) and outperforms signatures A and B. In other words, fig. 4a shows that a risk score calculated using the signature derived from the process of fig. 2a divides the patients in the validation cohort into subgroups with significantly different survival times more successfully than the risk scores of signatures A and B.

Fig. 4b is a forest plot showing the prognostic value of the new signature when combined with other known risk factors. In fig. 4b, multivariate (rather than the previous univariate) analysis was performed to demonstrate that the calculated risk score (input as a continuous variable) remains significant even when clinical information is integrated to predict survival. The relative risk of death (hazard ratio) is shown as a solid block for tumor stage (e.g., stages I to III), treatment status (with or without adjuvant treatment), and the risk score calculated using the output (ORACLE) signature. The 95% confidence intervals are indicated by bars. The higher the hazard ratio, the greater the risk of death; not unexpectedly, the highest value is for stage III patients. Figure 4b shows that the output signature remains significant in a multivariate analysis with tumor stage and treatment status (Cox multivariate analysis P = 0.0247), i.e., the signature provides additional prognostic information.

Fig. 4c to 4e show clinically actionable information for stage I patients. There were approximately 60 such patients in the validation dataset. Fig. 4c shows stage I patients divided into two groups, IA (n = 42) and IB (n = 18), according to the substaging criteria (log-rank P = 0.52). Classifying patients in this manner does not effectively separate those with high overall survival from those with low overall survival. Similarly, fig. 4d shows the classification of stage I patients into high-risk and low-risk patients according to tumor size. Current clinical guidelines suggest that stage I LUAD patients with a stage IB tumor greater than 4 cm are high-risk patients, while other patients (i.e., those with stage IA tumors, or stage IB tumors less than 4 cm in size) are low-risk patients. Of the 60 patients, only 5 were classified as high risk. As shown in fig. 4d, this classification also fails to separate patients well into high and low overall survival.

Fig. 4e shows the use of the ORACLE feature to classify stage I patients into two groups: a high-risk group (red) and a low-risk group (blue). As shown, this partitioning is much more effective in predicting patient survival.
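
A risk score of the kind recited in claim 2 (a weighted sum of per-biomarker scores) and its use to split patients into two groups can be sketched as follows. The weights, expression values and cutoff below are made-up placeholders for illustration; the actual weights and cutoff are those defined elsewhere in this specification.

```python
def risk_score(expression, weights):
    """Weighted sum of per-gene expression scores (one weight per biomarker)."""
    return sum(weight * expression[gene] for gene, weight in weights.items())


def classify(score, cutoff):
    """Label a patient as high or low risk relative to a fixed cutoff."""
    return "high" if score >= cutoff else "low"


# Hypothetical weights for two of the biomarkers (illustrative values only).
toy_weights = {"ANLN": 0.5, "XBP1": -0.3}
toy_cutoff = 0.0  # assumed cutoff, not taken from the specification

patient_1 = {"ANLN": 2.0, "XBP1": 1.0}
patient_2 = {"ANLN": 0.2, "XBP1": 3.0}

score_1 = risk_score(patient_1, toy_weights)
score_2 = risk_score(patient_2, toy_weights)
print(classify(score_1, toy_cutoff), classify(score_2, toy_cutoff))
```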

Fig. 4f illustrates the effect of tumor sampling bias on the ORACLE feature. The calculated risk score is used to classify each tumor region as "high risk" or "low risk". Discordant classification within individual patients, whereby different regions of the same tumor are classified as having different molecular risk profiles, is then evaluated. As shown in the figure, only 3/28 patients (i.e. 11%) were discordant, which is much lower than the discordance rates shown in fig. 1c and 1d.
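
The discordance check can be sketched as below: each region of a tumor is classified from its own risk score, and a patient counts as discordant if the regions do not all receive the same label. The region scores, patient identifiers and cutoff are hypothetical.

```python
def discordant_patients(region_scores, cutoff):
    """Return the patients whose tumor regions fall on both sides of the cutoff.

    region_scores: dict mapping patient id -> list of per-region risk scores.
    """
    discordant = []
    for patient, scores in region_scores.items():
        labels = {"high" if score >= cutoff else "low" for score in scores}
        if len(labels) > 1:
            discordant.append(patient)
    return discordant


# Toy multi-region data (made-up scores for three patients).
toy_regions = {
    "P1": [0.9, 1.2, 1.1],    # all regions high risk
    "P2": [-0.4, -0.1],       # all regions low risk
    "P3": [0.3, -0.2, 0.5],   # mixed, so discordant
}
found = discordant_patients(toy_regions, cutoff=0.0)
rate = len(found) / len(toy_regions)
print(found, rate)

# The rate quoted in the text: 3 discordant patients out of 28.
quoted_percent = round(100 * 3 / 28)
```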

FIG. 4g shows an evaluation of prognostic value using one RNA-Seq dataset and four microarray datasets. To study consistency across multiple cohorts, the ORACLE feature was applied to four microarray datasets. Specifically, the prognostic value of the ORACLE feature was evaluated in a meta-analysis of five validation cohorts of LUAD patients (n = 904 patients with stage I-III LUAD). Univariate Cox analysis was performed in the one RNA-Seq dataset and the four microarray datasets. In the microarray cohorts, 19 of the 23 genes were available for analysis (ASPM, CDCA4, FURIN, GOLGA8A, ITGA6, JAG1, LRP12, MAFF, MRPS17, PLK1, PNP, PPP1R13L, PRKCA, PYGB, SCPEP1, SLC46A3, SNX7, TPBG, XBP1). The hazard ratio with its 95% confidence interval is shown for each cohort, plotted on a natural logarithmic scale. The ORACLE feature would be expected to perform worse in the microarray cohorts, because only 19 of the 23 genes matched microarray probe sets and because feature weights trained on RNA-Seq data were used. Nevertheless, ORACLE is significantly associated with survival in three of the four microarray datasets. The meta-analysis takes all validation cohorts into account (the diamond represents the hazard ratio of the meta-analysis of the five validation cohorts) and indicates that ORACLE is significantly associated with outcome, with an overall hazard ratio of 3.57. These data indicate that, by controlling for RNA-ITH in biomarker design, a survival association can be obtained that is robust to differences in expression profiling technology. For more information on this analysis, see "A clonal expression biomarker associates with lung cancer mortality", D. Biswas et al., Nature Medicine 25, 1540-1548 (2019).
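
One common way to combine per-cohort hazard ratios into a single estimate is inverse-variance pooling on the log scale. The text does not state which meta-analysis model was used, so the fixed-effect sketch below is an assumption, and the cohort values are made up rather than taken from fig. 4g.

```python
import math


def pool_hazard_ratios(cohorts, z=1.96):
    """Fixed-effect, inverse-variance pooling of hazard ratios.

    cohorts: list of (hazard ratio, lower 95% bound, upper 95% bound).
    Returns the pooled (hazard ratio, lower bound, upper bound).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for hr, lower, upper in cohorts:
        log_hr = math.log(hr)
        # Recover the standard error from the width of the interval on the log scale.
        se = (math.log(upper) - math.log(lower)) / (2 * z)
        weight = 1.0 / se ** 2
        weighted_sum += weight * log_hr
        total_weight += weight
    pooled = weighted_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return (math.exp(pooled),
            math.exp(pooled - z * pooled_se),
            math.exp(pooled + z * pooled_se))


# Two hypothetical cohorts with identical estimates: pooling keeps the
# hazard ratio and narrows the confidence interval.
pooled_hr, pooled_lower, pooled_upper = pool_hazard_ratios(
    [(2.0, 1.0, 4.0), (2.0, 1.0, 4.0)]
)
print(round(pooled_hr, 2), round(pooled_lower, 2), round(pooled_upper, 2))
```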

FIG. 4h plots the prognostic value of combinations of 1 to 23 genes selected from the ORACLE feature. Two procedures for selecting gene combinations from the complete ORACLE feature were considered as computationally efficient alternatives to an exhaustive search of every combination of the 23 genes. Backward subset construction starts with the complete model containing all 23 genes, evaluates all 22-gene combinations, and selects the combination with the highest prognostic significance. This procedure is repeated, removing one gene at a time, until a single gene remains. Forward subset construction starts from a model that contains no genes and adds, one gene at a time, the gene that gives the model the highest prognostic significance, until all 23 genes are included. Importantly, the weight of each gene is not retrained, so each combination is evaluated as a subset of the complete ORACLE feature defined above. These data indicate that any combination of two or more of the 23 genes of the ORACLE feature may have prognostic value. The data for both procedures are shown in Appendix A.
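
The two greedy procedures can be sketched as follows. The scoring function is left as a parameter: in practice it would measure the prognostic significance of the risk score computed from the chosen subset with the fixed, non-retrained weights. Here a made-up additive score stands in for it, and all function names, gene names and values are hypothetical.

```python
def subset_risk_score(expression, weights, subset):
    """Risk score restricted to a subset of genes, using the fixed full-model weights."""
    return sum(weights[gene] * expression[gene] for gene in subset)


def backward_selection(genes, score_fn):
    """Start from all genes; repeatedly drop the gene whose removal leaves the best subset."""
    current = list(genes)
    path = [(tuple(current), score_fn(current))]
    while len(current) > 1:
        candidates = [[g for g in current if g != drop] for drop in current]
        current = max(candidates, key=score_fn)
        path.append((tuple(current), score_fn(current)))
    return path


def forward_selection(genes, score_fn):
    """Start from no genes; repeatedly add the gene that gives the best subset."""
    selected, remaining, path = [], list(genes), []
    while remaining:
        best = max(remaining, key=lambda g: score_fn(selected + [g]))
        selected = selected + [best]
        remaining.remove(best)
        path.append((tuple(selected), score_fn(selected)))
    return path


# Stand-in scoring function: each gene contributes a fixed, made-up amount.
toy_contribution = {"G1": 3, "G2": 1, "G3": 2}


def toy_score(subset):
    return sum(toy_contribution[g] for g in subset)


forward_path = forward_selection(["G1", "G2", "G3"], toy_score)
backward_path = backward_selection(["G1", "G2", "G3"], toy_score)
print(forward_path)
print(backward_path)
```

For n genes, each procedure evaluates on the order of n squared subsets, compared with 2 to the power n for an exhaustive search, which is why it is tractable for 23 genes.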

FIGS. 5a and 5b illustrate that the method described above with reference to fig. 2a may have prognostic relevance for other cancer types. The clonal expression filter described in figures 3a to 3f was generated using the complete multi-region RNAseq dataset from the TRACERx lung cohort, containing data from multi-region lucc tumors and other NSCLC histologies. These data were then used to calculate the intratumoral and intertumoral heterogeneity scores for each gene, using the same gene-by-gene metrics described above. As described above, the genes are divided into four quadrants.

The proportion of genes in each quadrant having significant pan-cancer prognostic value was then evaluated and is shown in fig. 5a. For example, pan-cancer gene-by-gene prognostic values can be downloaded from the PRECOG resource described in "The prognostic landscape of genes and infiltrating immune cells across human cancers", published by Gentles et al. in Nat Med 21, 938-945 (2015). The PRECOG resource is a meta-dataset that aggregates 166 microarray datasets covering 39 different malignant histologies. The dataset includes Z-scores previously calculated using univariate Cox regression analysis. Genes with |Z-score| > 1.96 (corresponding to two-sided P < 0.05) were selected. Consistent with the analysis in figures 3a to 3f, the genes in the Q4 quadrant (i.e., genes with high inter-tumor and low intra-tumor heterogeneity values) exhibited significantly higher pan-cancer Z-scores (reflecting significant prognostic power) than the genes in all other quadrants.
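
The correspondence between the Z-score threshold of 1.96 and a two-sided P value of 0.05 follows from the standard normal distribution, and can be checked with the short sketch below. The gene names and Z-scores in the example are made up, and `select_prognostic` is a hypothetical helper.

```python
import math


def two_sided_p(z):
    """Two-sided P value for a standard normal Z-score."""
    return math.erfc(abs(z) / math.sqrt(2))


def select_prognostic(z_scores, threshold=1.96):
    """Return the genes whose absolute Z-score exceeds the threshold, sorted by name."""
    return sorted(gene for gene, z in z_scores.items() if abs(z) > threshold)


# Toy Z-scores (hypothetical genes and values).
toy_z = {"GENE_A": 2.5, "GENE_B": -3.0, "GENE_C": 1.0}
selected = select_prognostic(toy_z)
print(selected, round(two_sided_p(1.96), 4))
```

Negative Z-scores are kept as well, since a gene can be prognostic in either direction.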

FIG. 5b also examines the genes in each quadrant to determine whether they are enriched or depleted for prognostic genes. Each point in fig. 5b corresponds to one of the 39 cancer types from the PRECOG database. For each cancer type, the number of genes with significant prognostic value (|Z-score| > 1.96) in each NSCLC RNA heterogeneity quadrant is shown as not significant (grey), significantly enriched (red), or significantly depleted (blue). As shown in fig. 5b, the genes in Q4 were significantly enriched for prognostic genes in 49% (19/39) of the cancer types, and were significantly depleted in only 3% (1/39), namely head and neck cancer. In contrast, the genes in Q1 (genes with low inter-tumor and high intra-tumor heterogeneity values) were not significantly enriched in any cancer type and were depleted in 56% (22/39) of the cancer types. Genes in Q2 (genes with low inter-tumor and low intra-tumor heterogeneity values) and Q3 (genes with high inter-tumor and high intra-tumor heterogeneity values) showed similar numbers of depleted and enriched cancer types.
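
The quadrant definitions used above (Q1 to Q4) can be written as a small lookup on a gene's intra-tumor and inter-tumor heterogeneity scores. The cutoffs separating "low" from "high" are those defined earlier in this specification; the values used below are placeholders for illustration only.

```python
def assign_quadrant(intra, inter, intra_cutoff, inter_cutoff):
    """Map a gene's heterogeneity scores to one of the four quadrants.

    Q1: low inter-tumor, high intra-tumor heterogeneity
    Q2: low inter-tumor, low intra-tumor heterogeneity
    Q3: high inter-tumor, high intra-tumor heterogeneity
    Q4: high inter-tumor, low intra-tumor heterogeneity
    """
    high_intra = intra >= intra_cutoff
    high_inter = inter >= inter_cutoff
    if not high_inter:
        return "Q1" if high_intra else "Q2"
    return "Q3" if high_intra else "Q4"


# Placeholder cutoffs and scores (not values from the specification).
INTRA_CUTOFF, INTER_CUTOFF = 0.5, 0.5
quadrants = [
    assign_quadrant(0.9, 0.1, INTRA_CUTOFF, INTER_CUTOFF),
    assign_quadrant(0.1, 0.1, INTRA_CUTOFF, INTER_CUTOFF),
    assign_quadrant(0.9, 0.9, INTRA_CUTOFF, INTER_CUTOFF),
    assign_quadrant(0.1, 0.9, INTRA_CUTOFF, INTER_CUTOFF),
]
print(quadrants)
```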

FIGS. 6a to 6c explore the genomic mechanisms underlying RNA-ITH. First, consider the relationship between the RNA-ITH scores calculated using multi-region RNAseq data as described above and the copy number heterogeneity quantified using the multi-region WES data described in "Tracking the Evolution of Non-Small Cell Lung Cancer", published by Jamal-Hanjani et al., N Engl J Med 376, 2109-. FIG. 6a is a graph showing the relationship between gene expression ITH and copy number ITH. For the TRACERx LUAD cohort, per-patient RNA-ITH scores were plotted against per-patient SCNA-ITH scores. Figure 6a shows that there is a significant correlation between the median RNA-ITH score per patient and the percentage of subclonal SCNA events per patient (Rs = 0.48, P = 0.0162). This suggests that SCNA-ITH may contribute to transcriptome heterogeneity.
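
Assuming that Rs denotes Spearman's rank correlation coefficient, it can be computed as the Pearson correlation of the ranks of the two variables. The sketch below uses average ranks for ties; the per-patient values are made up.

```python
import math


def average_ranks(values):
    """Rank values from 1 upward, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Toy per-patient values (hypothetical): RNA-ITH score vs. % subclonal SCNA events.
toy_rna_ith = [0.10, 0.20, 0.30, 0.40]
toy_scna_ith = [10.0, 20.0, 25.0, 100.0]
rho = spearman_rho(toy_rna_ith, toy_scna_ith)
print(rho)
```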

Figure 6b shows that there is a highly significant association between subclonal copy number gain and increased expression, and between subclonal copy number loss and decreased expression (P < 0.001). These data indicate that chromosomal copy number gains and losses at the subclonal level are associated with gene transcription, and that RNA-ITH reflects ongoing CIN and the possible selection of heterogeneous DNA copy number events.

Figure 6c shows the odds ratio of clonal copy number gain for each quadrant. FIG. 6c assesses the relative enrichment, in each quadrant, of the genes that most frequently (upper quartile) and least frequently (lower quartile) showed clonal copy number gain in the TRACERx cohort. Fig. 6c shows that the Q4 genes were highly significantly enriched for clonal copy number gain events in TRACERx (P = 1.18e-05, Fisher's exact test), while the Q3 genes were enriched to a lesser extent (P = 0.000109, Fisher's exact test). In contrast, the Q2 genes were depleted (P = 6.86e-08, Fisher's exact test). These data suggest that homogeneous expression within tumors may result from clonal DNA copy number changes selected early in tumor evolution.
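
An odds ratio and Fisher's exact test P value of this kind are computed from a 2x2 table, for example genes inside versus outside a quadrant against genes in the upper versus lower quartile of clonal copy number gain frequency. The sketch below is a plain-Python two-sided test; the table layout is an assumption and the counts are made up.

```python
from math import comb


def fisher_exact(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns (sample odds ratio, P value). The P value sums the probabilities of
    all tables with the same margins that are no more likely than the observed one.
    """
    row1, row2, col1, total = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_value = sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(p_value, 1.0)


# Toy table (made-up counts):
#                     upper quartile   lower quartile
# in quadrant               8                2
# not in quadrant           1                5
odds, p_value = fisher_exact(8, 2, 1, 5)
print(odds, round(p_value, 4))
```

An odds ratio above 1 corresponds to enrichment and below 1 to depletion, matching the direction of the statements for Q4 and Q2 above.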

Figure 6d shows the Reactome pathways enriched in Q4, which are involved in cell proliferation, including mitosis, nucleosome assembly, and epigenetic regulation. In contrast, the same pathway analysis of the genes in the other quadrants showed no significant enrichment for the Q1 genes, while the Q2 genes were shown to be involved in RNA splicing and the Q3 genes in GPCR ligand binding and extracellular matrix organization. This analysis suggests that the Q4 genes may be associated with specific biological characteristics of tumor invasiveness, which may explain their prognostic value.

Various combinations of optional features have been described herein, and it should be understood that the described features may be combined in any suitable combination. In particular, features of any one example embodiment may be combined with features of any other embodiment as appropriate, unless such combinations are mutually exclusive. In this specification, the term "comprising" means including the specified component or components, but not excluding the presence of other components.

Attention is directed to all papers and documents which are filed concurrently with or previous to this specification in connection with this application and which are open to public inspection with this specification, and the contents of all such papers and documents are incorporated herein by reference.

All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and/or all of the steps of any method or process so disclosed, may be combined in any combination, except combinations where at least some of such features and/or steps are mutually exclusive.

Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise. Thus, unless expressly stated otherwise, each feature disclosed is one example only of a generic series of equivalent or similar features.

The invention is not restricted to the details of the foregoing embodiments. The invention extends to any novel one, or any novel combination, of the features disclosed in this specification (including any accompanying claims, abstract and drawings), or to any novel one, or any novel combination, of the steps of any method or process disclosed herein.

Appendix A: data for specific combinations of biomarkers with prognostic value, as obtained using the forward and backward subset construction procedures of fig. 4h.

Forward analysis

Reverse analysis
