Method for studying the relationship between colon adenocarcinoma genomic variation and tumor evolution

Document No.: 1075099  Publication date: 2020-10-16

Reading note: This technology, a method for studying the relationship between colon adenocarcinoma genomic variation and tumor evolution, was designed by 祝让飞 on 2019-12-30. The invention relates to the technical field of genomic variation and discloses a method for studying the relationship between colon adenocarcinoma genomic variation and tumor evolution, comprising the following steps: S1, downloading data; S2, preprocessing data: preprocessing of TCGA data: A. the clinical sample data of TCGA are preprocessed as follows: a. removing samples without clinical information or with a survival time of less than 30 days; b. removing normal tissue sample data; B. the mutation data are preprocessed as follows: a. removing silent and intron mutation sites. The method collects genomic data of colon adenocarcinoma (COAD) from TCGA, infers the clonal (subclonal) composition of each tumor sample from its repertoire of somatic mutations (SSNV) and copy number variations (SCNV), and uses these tumor clone phylogenetic data to further evaluate the relationship between intratumoral heterogeneity and genomic instability, so that these features can serve as markers of underlying tumor complexity.

1. A method for studying the relationship between colon adenocarcinoma genomic variation and tumor evolution, characterized in that the method comprises the following steps:

S1, downloading data;

S2, preprocessing data:

preprocessing of TCGA data:

A. the clinical sample data of TCGA are preprocessed as follows:

a. removing samples without clinical information or with a survival time of less than 30 days;

b. removing normal tissue sample data;

B. the mutation data are preprocessed as follows:

a. removing silent and intron mutation sites;

b. removing hypermutated samples;

C. the CNV data are preprocessed as follows:

a. removing data with a CNV interval > 1 Mb;

b. matching the CNV intervals to the corresponding genes using GENCODE v22 (GRCh38); 319 samples meeting the conditions are finally obtained;

S3, genomic features of COAD:

a. evaluating the purity and chromosome ploidy of the samples:

firstly, calculating the purity, ploidy and absolute DNA copy number of each sample using the ABSOLUTE algorithm; secondly, integrating all possible mutation multiplicities (m: 1 to the local absolute copy number) and p(AF) to evaluate the probability of the CCF;

b. analyzing genomic mutational signatures:

SNV signatures are identified using the Brunet algorithm in NMF; the cophenetic coefficient and RSS are evaluated for k = 2 to 10 (that is, 2 to 10 SNV signatures), and k = 3 (that is, 3 SNV signatures) is selected as the optimal number according to these two indexes; 3 SNV signatures are obtained from the trinucleotide mutation patterns and defined as signatures A-C; the SSNVs are divided into clonal events and sub-clonal events based on the CCF, and the contributions of the two types of SSNV to the 3 mutational signatures are compared; the contribution of signatures A-C in each sample is calculated, and the similarity between the 3 signatures and the COSMIC signatures is calculated (expressed as correlation coefficients);

S4, analysis of clonal and sub-clonal variation:

the coordinates in GENCODE v22 are used to map each CNV to specific genes, the clonal/sub-clonal event data of SCNV and SSNV are integrated, and the clonal and sub-clonal structure of the COAD samples is analyzed; the genes in which SCNV and SSNV occur in more than 5% of all samples are taken, giving the 47 most frequent SCNV genes and the 76 most frequent SSNV genes respectively;

S5, analysis of the temporal relationship between mutations and tumor evolution:

firstly, the 47 SCNAs and 76 SSNVs with the highest mutation frequency are ranked by CCF value, and the temporal order in which mutations may occur during tumor evolution is constructed from the clonal events and sub-clonal events of the same sample: when a clonal event and a sub-clonal event occur in the same sample, a connection (edge) is established between the two; the same analysis is carried out on all samples, finally yielding a directed gene network in which the nodes represent genes and an edge indicates a clonal to sub-clonal relationship between two genes; enrichment analysis is carried out on the numbers of in-edges and out-edges of each node (gene), using Fisher's exact test for significance and the BH method to calculate the FDR; for SSNV and SCNV, nodes (genes) with FDR < 0.05 and out-edges > in-edges are defined as genes that occur early (Early); likewise, nodes with FDR < 0.05 and in-edges > out-edges are defined as genes that occur late (Late); genes in all other cases are defined as genes that occur at the intermediate stage (Intermediate); the temporal orders of SCNA and SSNV are inferred separately and some conflicting edges are removed, finally giving 115 SCNA calls and 2201 SSNV calls;

S6, relationship of clonal or sub-clonal events to prognosis:

the Kaplan-Meier method is used to analyze the relationship between the clonal status of the 47 high-frequency SCNA genes and 76 high-frequency SSNV genes (mutation frequency > 5%) and overall survival; taking a log-rank test threshold of p < 0.1, 1 early gene, 12 intermediate genes and 1 late gene with a relatively significant relationship to overall survival are obtained;

S7, relationship of clonal or sub-clonal events to clinical characteristics:

the clonal events of SCNA and SSNV obtained by the above method are combined with the clinical information provided by TCGA to analyze the relationship between clonal and sub-clonal events and clinical characteristics, and the differences in clonal/sub-clonal events across TNM, stage, age, gender and tissue type are analyzed;

S8, relationship of clonal or sub-clonal events to tumor mutational burden (TMB) and neoantigens: the relationship between clonal/sub-clonal events and TMB and neoantigens is analyzed, the correlation between them is evaluated using the Spearman method, and the difference in clonal/sub-clonal events between samples with MMR gene mutations (YES) and samples without them (NO) is further evaluated;

S9, summary:

A. mutational signature analysis divides the mutations of the TCGA COAD samples into 3 significantly different signatures, which are compared with the 30 known signatures in COSMIC;

B. the clonal/sub-clonal status of COAD mutations is identified by analyzing the clonal/sub-clonal events of SCNA and SSNV;

C. taking the genes mutated in more than 5% of samples gives 47 SCNA and 76 SSNV high-frequency mutated genes respectively;

D. the relationship between mutations and tumor evolution is analyzed using clonal/sub-clonal events, giving groups of genes mutated at the early, intermediate and late stages of tumor evolution;

E. univariate Cox regression analysis is used to study the relationship between clonal/sub-clonal status and prognosis, giving 1 early gene, 12 intermediate genes and 1 late gene with a relatively significant relationship to overall survival;

F. the relationship between clinical characteristics and clonal/sub-clonal events is complex: the number of clonal events differs significantly across N stage and Stage, and also differs significantly between the recurrent and non-recurrent groups;

G. there is a significant correlation between clonal events and TMB/neoantigens.

2. The method according to claim 1, characterized in that: the data in S1 are derived from The Cancer Genome Atlas (TCGA) and comprise transcriptome sequencing (RNA-Seq) data, copy number variation data and clinical follow-up information.

3. The method according to claim 1, characterized in that: a hypermutated sample in S2 is defined as a sample with more than 11.4 mutations per Mb.

4. The method according to claim 1, characterized in that: in step a of S3, each SCNV and SSNV is classified as a clonal event if P(CCF > 0.85) is greater than 0.5 and as a sub-clonal event otherwise, where CCF is the cancer cell fraction.

5. The method according to claim 1, characterized in that: in S4, the copy number variation of COAD is evaluated using the ABSOLUTE algorithm; the CNVs obtained by ABSOLUTE are first screened, and the CNV intervals satisfying the following conditions are retained:

1) modal CN < 2 (Loss) or modal CN > 2 (Gain);

2) CNV interval < 1 Mb.

Technical Field

The invention relates to the technical field of genomic variation, and in particular to a method for studying the relationship between colon adenocarcinoma genomic variation and tumor evolution.

Background

The development of cancer is driven by the gradual accumulation of somatic alterations, and mutations acquired at different stages of tumor evolution may be associated with different clinical outcomes. Cancer cells exhibit a large number of such alterations, often driven by defects in DNA repair pathways or by external mutagens (e.g., smoking or ultraviolet radiation). This highly altered state is called genomic instability. Its main consequence is that a single tumor often consists of cell populations (subclones) that have accumulated different alterations, a diversity known as intratumoral heterogeneity.

Since each tumor is a complex of multiple clones that may have a large impact on tumor metastasis and therapeutic response, there is a need to study the relationship between genomic instability and intratumoral heterogeneity.

Disclosure of Invention

(I) Technical problem to be solved

In view of the shortcomings of the prior art, the invention provides a method for studying the relationship between colon adenocarcinoma genomic variation and tumor evolution. The method uses tumor clone phylogenetic data to further evaluate the relationship between intratumoral heterogeneity and genomic instability, and thereby addresses the problem described in the Background.

(II) Technical solution

To achieve the aim of using tumor clone phylogenetic data to further evaluate the relationship between intratumoral heterogeneity and genomic instability, the invention provides the following technical solution: a method for studying the relationship between colon adenocarcinoma genomic variation and tumor evolution, comprising the following steps:

S1, downloading data;

S2, preprocessing data:

preprocessing of TCGA data:

A. the clinical sample data of TCGA are preprocessed as follows:

a. removing samples without clinical information or with a survival time of less than 30 days;

b. removing normal tissue sample data;

B. the mutation data are preprocessed as follows:

a. removing silent and intron mutation sites;

b. removing hypermutated samples;

C. the CNV data are preprocessed as follows:

a. removing data with a CNV interval > 1 Mb;

b. matching the CNV intervals to the corresponding genes using GENCODE v22 (GRCh38); 319 samples meeting the conditions are finally obtained;

S3, genomic features of COAD:

a. evaluating the purity and chromosome ploidy of the samples:

firstly, calculating the purity, ploidy and absolute DNA copy number of each sample using the ABSOLUTE algorithm; secondly, integrating all possible mutation multiplicities (m: 1 to the local absolute copy number) and p(AF) to evaluate the probability of the CCF;

b. analyzing genomic mutational signatures:

SNV signatures are identified using the Brunet algorithm in NMF; the cophenetic coefficient and RSS are evaluated for k = 2 to 10 (that is, 2 to 10 SNV signatures), and k = 3 (that is, 3 SNV signatures) is selected as the optimal number according to these two indexes; 3 SNV signatures are obtained from the trinucleotide mutation patterns and defined as signatures A-C; the SSNVs are divided into clonal events and sub-clonal events based on the CCF, and the contributions of the two types of SSNV to the 3 mutational signatures are compared; the contribution of signatures A-C in each sample is calculated, and the similarity between the 3 signatures and the COSMIC signatures is calculated (expressed as correlation coefficients);

S4, analysis of clonal and sub-clonal variation:

the coordinates in GENCODE v22 are used to map each CNV to specific genes, the clonal/sub-clonal event data of SCNV and SSNV are integrated, and the clonal and sub-clonal structure of the COAD samples is analyzed; the genes in which SCNV and SSNV occur in more than 5% of all samples are taken, giving the 47 most frequent SCNV genes and the 76 most frequent SSNV genes respectively;

S5, analysis of the temporal relationship between mutations and tumor evolution:

firstly, the 47 SCNAs and 76 SSNVs with the highest mutation frequency are ranked by CCF value, and the temporal order in which mutations may occur during tumor evolution is constructed from the clonal events and sub-clonal events of the same sample: when a clonal event and a sub-clonal event occur in the same sample, a connection (edge) is established between the two; the same analysis is carried out on all samples, finally yielding a directed gene network in which the nodes represent genes and an edge indicates a clonal to sub-clonal relationship between two genes; enrichment analysis is carried out on the numbers of in-edges and out-edges of each node (gene), using Fisher's exact test for significance and the BH method to calculate the FDR; for SSNV and SCNV, nodes (genes) with FDR < 0.05 and out-edges > in-edges are defined as genes that occur early (Early); likewise, nodes with FDR < 0.05 and in-edges > out-edges are defined as genes that occur late (Late); genes in all other cases are defined as genes that occur at the intermediate stage (Intermediate); the temporal orders of SCNA and SSNV are inferred separately and some conflicting edges are removed, finally giving 115 SCNA calls and 2201 SSNV calls;

S6, relationship of clonal or sub-clonal events to prognosis:

the Kaplan-Meier method is used to analyze the relationship between the clonal status of the 47 high-frequency SCNA genes and 76 high-frequency SSNV genes (mutation frequency > 5%) and overall survival; taking a log-rank test threshold of p < 0.1, 1 early gene, 12 intermediate genes and 1 late gene with a relatively significant relationship to overall survival are obtained;

S7, relationship of clonal or sub-clonal events to clinical characteristics:

the clonal events of SCNA and SSNV obtained by the above method are combined with the clinical information provided by TCGA to analyze the relationship between clonal and sub-clonal events and clinical characteristics, and the differences in clonal/sub-clonal events across TNM, stage, age, gender and tissue type are analyzed;

S8, relationship of clonal or sub-clonal events to tumor mutational burden (TMB) and neoantigens:

the relationship between clonal/sub-clonal events and TMB and neoantigens is analyzed, the correlation between them is evaluated using the Spearman method, and the difference in clonal/sub-clonal events between samples with MMR gene mutations (YES) and samples without them (NO) is further evaluated;

S9, summary:

A. mutational signature analysis divides the mutations of the TCGA COAD samples into 3 significantly different signatures, which are compared with the 30 known signatures in COSMIC;

B. the clonal/sub-clonal status of COAD mutations is identified by analyzing the clonal/sub-clonal events of SCNA and SSNV;

C. taking the genes mutated in more than 5% of samples gives 47 SCNA and 76 SSNV high-frequency mutated genes respectively;

D. the relationship between mutations and tumor evolution is analyzed using clonal/sub-clonal events, giving groups of genes mutated at the early, intermediate and late stages of tumor evolution;

E. univariate Cox regression analysis is used to study the relationship between clonal/sub-clonal status and prognosis, giving 1 early gene, 12 intermediate genes and 1 late gene with a relatively significant relationship to overall survival;

F. the relationship between clinical characteristics and clonal/sub-clonal events is complex: the number of clonal events differs significantly across N stage and Stage, and also differs significantly between the recurrent and non-recurrent groups;

G. there is a significant correlation between clonal events and TMB/neoantigens.

Preferably, the data in S1 are derived from The Cancer Genome Atlas (TCGA) and comprise transcriptome sequencing (RNA-Seq) data, copy number variation data and clinical follow-up information.

Preferably, a hypermutated sample in S2 is defined as a sample with more than 11.4 mutations per Mb.

Preferably, in step a of S3, each SCNV and SSNV is classified as a clonal event if P(CCF > 0.85) is greater than 0.5 and as a sub-clonal event otherwise, where CCF is the cancer cell fraction.

Preferably, in S4, the copy number variation of COAD is evaluated using the ABSOLUTE algorithm; the CNVs obtained by ABSOLUTE are first screened, and the CNV intervals satisfying the following conditions are retained:

1) modal CN < 2 (Loss) or modal CN > 2 (Gain);

2) CNV interval < 1 Mb.

(III) advantageous effects

Compared with the prior art, the invention has the following beneficial effects:

The method collects genomic data of colon adenocarcinoma (COAD) from TCGA, infers the clonal (subclonal) composition of each tumor sample from its repertoire of somatic mutations (SSNV) and copy number variations (SCNV), and uses these tumor clone phylogenetic data to further evaluate the relationship between intratumoral heterogeneity and genomic instability, so that these features can serve as markers of underlying tumor complexity.

Drawings

FIG. 1 is a flow chart of the research method of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Referring to FIG. 1, a method for studying the relationship between colon adenocarcinoma genomic variation and tumor evolution includes the following steps:

S1, downloading data;

further, the data in S1 are derived from The Cancer Genome Atlas (TCGA) and comprise transcriptome sequencing (RNA-Seq) data, copy number variation data and clinical follow-up information;

S2, preprocessing data:

preprocessing of TCGA data:

A. the clinical sample data of TCGA are preprocessed as follows:

a. removing samples without clinical information or with a survival time of less than 30 days;

b. removing normal tissue sample data;

B. the mutation data are preprocessed as follows:

a. removing silent and intron mutation sites;

b. removing hypermutated samples;

C. the CNV data are preprocessed as follows:

a. removing data with a CNV interval > 1 Mb;

b. matching the CNV intervals to the corresponding genes using GENCODE v22 (GRCh38);

319 samples meeting the conditions are finally obtained;

further, a hypermutated sample in S2 is defined as a sample with more than 11.4 mutations per Mb;
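The sample-level filters of S2 can be sketched in Python as follows. This is a minimal illustration rather than the inventors' implementation: the `Sample` record, the `covered_mb` field, the MAF-style class labels `"Silent"` and `"Intron"`, and the choice to count mutations per Mb after removing silent and intron sites are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

HYPERMUTATION_THRESHOLD = 11.4   # mutations per Mb, as stated in the text
MIN_SURVIVAL_DAYS = 30           # as stated in the text
EXCLUDED_CLASSES = {"Silent", "Intron"}  # assumed MAF-style labels


@dataclass
class Sample:
    """Hypothetical per-sample record used only for this sketch."""
    sample_id: str
    is_tumor: bool
    survival_days: Optional[float]   # None means no clinical information
    mutation_classes: List[str]      # one class label per called mutation
    covered_mb: float                # assumed size of the sequenced region in Mb


def kept_mutations(sample: Sample) -> List[str]:
    """Drop silent and intron mutation sites."""
    return [c for c in sample.mutation_classes if c not in EXCLUDED_CLASSES]


def mutations_per_mb(sample: Sample) -> float:
    """Mutation rate after site filtering (the order of filtering is an assumption)."""
    return len(kept_mutations(sample)) / sample.covered_mb


def passes_filters(sample: Sample) -> bool:
    """Apply the clinical, tissue and hypermutation filters of S2."""
    if not sample.is_tumor:
        return False
    if sample.survival_days is None or sample.survival_days < MIN_SURVIVAL_DAYS:
        return False
    return mutations_per_mb(sample) <= HYPERMUTATION_THRESHOLD


def filter_samples(samples: List[Sample]) -> List[Sample]:
    return [s for s in samples if passes_filters(s)]
```

In this sketch a sample at exactly 11.4 mutations per Mb is kept, because the text defines hypermutation as exceeding that value.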

S3, genomic features of COAD:

a. evaluating the purity and chromosome ploidy of the samples:

The purity, ploidy and absolute DNA copy number of each sample are calculated using the ABSOLUTE algorithm. For each mutation site (including CNV and SNV), the probability density of the CCF is evaluated from the number of mutant reads, the number of non-mutant reads, the tumor purity and the local CNV. Firstly, the tumor DNA proportion is calculated, and the probability of the allele fraction (AF) is then calculated from the binomial probability distribution, with the influence of the normal component removed, to obtain p(AF). Secondly, all possible mutation multiplicities (m: 1 to the local absolute copy number) and p(AF) are integrated to evaluate the probability of the CCF. This finally gives 24486/6046 clonal/sub-clonal events (80.2% clonal) for SSNV and 96836/3223 clonal/sub-clonal events (96.8% clonal) for SCNV. Comparing the distributions of clonal and sub-clonal counts for SSNV and SCNV shows that SCNV has significantly fewer sub-clonal events (Fisher's exact test, p < 1e-5);

further, in step a of S3, each SCNV and SSNV is classified as a clonal event if P(CCF > 0.85) is greater than 0.5 and as a sub-clonal event otherwise, where CCF is the cancer cell fraction;
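The CCF estimate and the clonal call described above can be sketched as a grid computation. The text does not give the exact formulas, so this sketch assumes a common model: the expected allele fraction is `purity * m * ccf / (purity * local_cn + 2 * (1 - purity))`, the read counts are binomial, and multiplicities and CCF values get a uniform prior. The function names and the grid size are hypothetical.

```python
from math import exp, lgamma, log


def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability computed in log space to stay stable at high coverage."""
    p = min(max(p, 1e-12), 1 - 1e-12)
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1 - p))
    return exp(log_pmf)


def ccf_posterior(alt_reads, ref_reads, purity, local_cn, grid_size=100):
    """Posterior over CCF on a grid, summing over multiplicities m = 1..local_cn."""
    if local_cn < 1:
        raise ValueError("local_cn must be at least 1 for this sketch")
    depth = alt_reads + ref_reads
    grid = [(i + 1) / grid_size for i in range(grid_size)]   # 0.01 .. 1.00
    # Average number of allele copies per cell in the tumor/normal mixture.
    denom = purity * local_cn + 2 * (1 - purity)
    weights = []
    for ccf in grid:
        likelihood = 0.0
        for m in range(1, local_cn + 1):
            expected_af = purity * m * ccf / denom
            likelihood += binom_pmf(alt_reads, depth, expected_af)
        weights.append(likelihood)
    total = sum(weights)
    return grid, [w / total for w in weights]


def prob_ccf_above(grid, posterior, threshold=0.85):
    return sum(p for ccf, p in zip(grid, posterior) if ccf > threshold)


def classify_event(alt_reads, ref_reads, purity, local_cn):
    """Clonal if P(CCF > 0.85) > 0.5, otherwise sub-clonal (rule from the text)."""
    grid, posterior = ccf_posterior(alt_reads, ref_reads, purity, local_cn)
    return "clonal" if prob_ccf_above(grid, posterior) > 0.5 else "sub-clonal"
```

For example, `classify_event(43, 57, purity=0.6, local_cn=1)` evaluates a mutation whose observed allele fraction sits near the maximum the model allows at full clonality. With a uniform prior over multiplicities, events at higher copy number can have a bimodal posterior, which is one reason a dedicated tool such as ABSOLUTE is used in practice.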

b. analyzing genomic mutational signatures:

Mutational signatures can reflect the potential influence of previous exposure to different carcinogens, as well as characteristic changes related to DNA damage and repair processes in COAD tumors. SNV signatures are identified using the Brunet algorithm in NMF. To ensure that the optimal number of SNV signatures is identified, the cophenetic coefficient and RSS are evaluated for k = 2 to 10 (that is, 2 to 10 SNV signatures), and k = 3 (3 SNV signatures) is selected according to these two indexes. The 3 SNV signatures obtained from the trinucleotide mutation patterns are defined as signatures A-C.

In signature A the six base substitution types differ little in their proportions and no trinucleotide mutation pattern stands out, but T > A, T > C and T > G account for a significantly higher proportion in signature A than in signatures B and C. Signatures B and C are composed mainly of C > T; however, the 16 trinucleotide patterns of C > T are generally high in signature B, whereas only some C > T trinucleotide patterns are prominent in signature C. In addition, C > G mutations are almost absent from signature C.

The SSNVs are divided into clonal events and sub-clonal events based on the CCF, and the contributions of the two types of SSNV to the 3 mutational signatures are compared. A significant difference is observed: for example, the proportion of sub-clonal events in signature A is significantly higher than that of clonal events, which indicates that clonal and sub-clonal events differ in their preferred mutation patterns. To evaluate the heterogeneity of the mutational signatures, the contribution of signatures A-C in each sample is calculated (a larger value means that the signature accounts for a larger share of the sample). Signature C is generally high in most samples, whereas signature A and signature B account for a high proportion only in specific samples.

Using the 30 known mutational signatures provided by COSMIC, the similarity between the 3 signatures and the COSMIC mutational signatures is calculated (expressed as correlation coefficients). Signature A is most similar to Signature 3 and Signature 22, signature B is most similar to Signature 2, Signature 7 and Signature 11, and signature C is highly similar to Signature 1. Signature 2 has been found in 22 cancer types but is most common in cervical and bladder cancers. Its occurrence is related to the activity of the AID/APOBEC family of cytidine deaminases, and it is commonly found in the same samples as Signature 13. Consistent with the observation that Signatures 2 and 13 are common in cancers with localized hypermutation, some hypermutated samples were also observed in the TCGA COAD cohort (80 samples with more than 11.4 mutations per Mb). Signature 1 is found in all cancers; it arises from endogenous mutations triggered by spontaneous deamination of 5-methylcytosine, and the number of such mutations correlates with the age at cancer diagnosis;
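For readers unfamiliar with the Brunet variant of NMF, the following pure-Python sketch shows the multiplicative updates that minimize the Kullback-Leibler divergence between a mutation count matrix V (trinucleotide channels by samples) and the product W H (signatures times exposures). It is an illustration under stated assumptions: the function names, random initialization, fixed iteration count and the small `eps` guard are choices made here, and the selection of k via cophenetic coefficient and RSS is not reproduced.

```python
import random
from math import log


def matmul(W, H):
    """Plain matrix product of W (n x k) and H (k x m)."""
    k, m = len(H), len(H[0])
    return [[sum(W[i][a] * H[a][j] for a in range(k)) for j in range(m)]
            for i in range(len(W))]


def kl_divergence(V, WH, eps=1e-12):
    """Generalized KL divergence D(V || WH), the objective of Brunet's NMF."""
    total = 0.0
    for v_row, wh_row in zip(V, WH):
        for v, wh in zip(v_row, wh_row):
            if v > 0:
                total += v * log(v / (wh + eps))
            total += wh - v
    return total


def nmf_brunet(V, k, iterations=200, seed=0, eps=1e-12):
    """Factor V (n x m) into W (n x k) and H (k x m) with multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        # Update H with W held fixed.
        WH = matmul(W, H)
        for a in range(k):
            col_sum = sum(W[i][a] for i in range(n)) + eps
            for j in range(m):
                num = sum(W[i][a] * V[i][j] / (WH[i][j] + eps) for i in range(n))
                H[a][j] *= num / col_sum
        # Update W with the new H held fixed.
        WH = matmul(W, H)
        for a in range(k):
            row_sum = sum(H[a][j] for j in range(m)) + eps
            for i in range(n):
                num = sum(H[a][j] * V[i][j] / (WH[i][j] + eps) for j in range(m))
                W[i][a] *= num / row_sum
    return W, H
```

In the setting of the text, V would have 96 rows (trinucleotide mutation patterns) and one column per sample, and each column of W, once normalized to sum to 1, would be read as one signature. Results depend on the random start, which is why consensus measures over many runs are used to pick k.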

S4, analysis of clonal and sub-clonal variation:

For the subsequent analysis, the coordinates in GENCODE v22 are used to map each CNV to specific genes, the clonal/sub-clonal event data of SCNV and SSNV are integrated, and the clonal and sub-clonal structure of the COAD samples is analyzed. The genes in which SCNV and SSNV occur in more than 5% of all samples are taken, giving the 47 most frequent SCNV genes and the 76 most frequent SSNV genes respectively. The results show that genes such as TP53, TTN, APC, KRAS and PIK3CA have the highest mutation frequency among samples (20%) and are mainly clonal events, which indicates that they are more likely to occur as early mutation events; the RBL1 gene has the highest CNV (gain) frequency among samples (29%) and is also mainly clonal;

further, in S4, the copy number variation of COAD is evaluated using the ABSOLUTE algorithm; the CNVs obtained by ABSOLUTE are first screened, and the CNV intervals satisfying the following conditions are retained:

1) modal CN < 2 (Loss) or modal CN > 2 (Gain);

2) CNV interval < 1 Mb;
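The two retention conditions and the coordinate-based gene mapping can be sketched as below. The `Segment` and `Gene` records, the simple overlap rule and the placeholder gene names are assumptions for illustration; a real pipeline would read gene coordinates from the GENCODE v22 annotation.

```python
from typing import List, NamedTuple, Optional, Tuple

MAX_INTERVAL_BP = 1_000_000  # CNV intervals must be shorter than 1 Mb


class Segment(NamedTuple):
    chrom: str
    start: int
    end: int
    modal_cn: int


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int
    end: int


def segment_state(seg: Segment) -> Optional[str]:
    """Return 'Gain' or 'Loss' for retained segments, None for discarded ones."""
    if seg.end - seg.start >= MAX_INTERVAL_BP:
        return None
    if seg.modal_cn > 2:
        return "Gain"
    if seg.modal_cn < 2:
        return "Loss"
    return None  # modal CN == 2 is copy-neutral and not retained


def genes_hit(seg: Segment, genes: List[Gene]) -> List[str]:
    """Genes whose coordinates overlap the segment (any overlap counts here)."""
    return [g.name for g in genes
            if g.chrom == seg.chrom and g.start <= seg.end and seg.start <= g.end]


def annotate(segments: List[Segment], genes: List[Gene]) -> List[Tuple[str, str]]:
    """List (gene, state) pairs for all retained segments."""
    calls = []
    for seg in segments:
        state = segment_state(seg)
        if state is not None:
            calls.extend((name, state) for name in genes_hit(seg, genes))
    return calls
```

Counting the samples in which each gene appears in the output of `annotate` would then give the per-gene frequency used for the 5% cutoff.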

S5, analysis of the temporal relationship between mutations and tumor evolution:

To analyze the mutations involved in the occurrence and development of COAD, the 47 SCNAs and 76 SSNVs with the highest mutation frequency are first ranked by CCF value. The CCF of SCNV is significantly higher than that of SNV (rank test p < 1e-5, mean CCF: 0.9326/0.9154), and the SCNVs are mainly gains with very few losses (Gain/Loss: 1068/71).

The temporal order in which mutations may occur during tumor evolution is constructed from the clonal events and sub-clonal events of the same sample: when a clonal event and a sub-clonal event occur in the same sample, a connection (edge) is established between the two. The same analysis is carried out on all samples, finally yielding a directed gene network in which the nodes represent genes and an edge indicates a clonal to sub-clonal relationship between two genes. Enrichment analysis is carried out on the numbers of in-edges and out-edges of each node (gene), using Fisher's exact test for significance and the BH method to calculate the FDR. For SSNV and SCNV, nodes (genes) with FDR < 0.05 and out-edges > in-edges are defined as genes that occur early (Early); likewise, nodes with FDR < 0.05 and in-edges > out-edges are defined as genes that occur late (Late); genes in all other cases are defined as genes that occur at the intermediate stage (Intermediate).

Because the genes affected by SCNA are derived from the CNV intervals of the chip (array) data and the GFF gene intervals, some gene-level SCNAs may be false positives and could affect the SSNV results. The temporal orders of SCNA and SSNV are therefore inferred separately and some conflicting edges are removed, finally giving 115 SCNA calls and 2201 SSNV calls.

In the temporal order of SSNV, TP53, KRAS, APC and others occur earliest in COAD and may act as driver events of COAD, while CSMD3, TTN, ERBB4 and others occur latest and are presumably related to the progression of COAD. In the temporal order of SCNA, no gene is defined as early, possibly because SCNA has significantly fewer sub-clonal events (clonal/sub-clonal for the 47 SCNAs: 1038/101, a significantly smaller sub-clonal share than the 2219/383 of SSNV); 10 SCNA genes occur at the intermediate stage and 3 at the late stage, including the RBL1 gene, which has a high mutation rate in COAD;
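The edge counting and Early/Late/Intermediate labeling can be sketched with the standard library alone. The text does not specify the contingency table behind the Fisher test, so this sketch assumes a 2x2 table comparing a gene's out-edges and in-edges with those of all other genes; the function names are hypothetical, and the removal of conflicting edges is not shown.

```python
from collections import Counter
from math import comb


def count_edges(samples):
    """samples: list of (clonal_genes, subclonal_genes); edge = clonal -> sub-clonal."""
    out_edges, in_edges = Counter(), Counter()
    for clonal, subclonal in samples:
        for early in clonal:
            for late in subclonal:
                if early != late:
                    out_edges[early] += 1
                    in_edges[late] += 1
    return out_edges, in_edges


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    denom = comb(r1 + r2, c1)

    def prob(x):
        return comb(r1, x) * comb(r2, c1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def benjamini_hochberg(pvalues):
    """BH-adjusted p-values (FDR), returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def temporal_order(samples, alpha=0.05):
    out_edges, in_edges = count_edges(samples)
    genes = sorted(set(out_edges) | set(in_edges))
    total = sum(out_edges.values())  # every edge has one source and one target
    pvalues = [fisher_two_sided(out_edges[g], in_edges[g],
                                total - out_edges[g], total - in_edges[g])
               for g in genes]
    labels = {}
    for gene, fdr in zip(genes, benjamini_hochberg(pvalues)):
        if fdr < alpha and out_edges[gene] > in_edges[gene]:
            labels[gene] = "Early"
        elif fdr < alpha and in_edges[gene] > out_edges[gene]:
            labels[gene] = "Late"
        else:
            labels[gene] = "Intermediate"
    return labels
```

A gene that is clonal in many samples where other genes are sub-clonal accumulates out-edges and tends toward the Early label, while a gene with balanced in-edges and out-edges stays Intermediate regardless of how often it is mutated.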

S6, relationship of clonal or sub-clonal events to prognosis:

To study the influence of clonal or sub-clonal events on patient survival, the Kaplan-Meier method is used to analyze the relationship between the clonal status of the 47 high-frequency SCNA genes and 76 high-frequency SSNV genes (mutation frequency > 5%) and overall survival. Taking a log-rank test threshold of p < 0.1, 1 early gene, 12 intermediate genes and 1 late gene with a relatively significant relationship to overall survival are obtained. For the early gene APC, clonal events have a more significant influence on prognosis than sub-clonal events; for the intermediate genes, both clonal and sub-clonal events show a significant influence on OS prognosis; and for the late gene ERBB4, the overall survival KM curve shows that samples with clonal/sub-clonal events have a poorer OS prognosis;
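The survival comparison can be sketched as a Kaplan-Meier estimator plus a two-group log-rank test. This is a simplified stdlib-only illustration, assuming right-censored data given as parallel lists of times and event indicators (1 for death, 0 for censoring) and a chi-square approximation with one degree of freedom; the function names are hypothetical.

```python
from math import erfc, sqrt


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with at least one death."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def logrank_p(times_a, events_a, times_b, events_b):
    """Two-group log-rank p-value (chi-square approximation, 1 degree of freedom)."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)      # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)      # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    return erfc(sqrt(chi2 / 2))   # survival function of chi-square with 1 df
```

Under the threshold used in the text, a gene would be reported when `logrank_p(...)` for its clonal versus sub-clonal (or mutated versus unmutated) groups falls below 0.1; how the groups were defined per gene is not detailed in the source.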

S7, relationship of clonal or sub-clonal events to clinical characteristics:

The clonal events of SCNA and SSNV obtained by the above method are combined with the clinical information provided by TCGA to analyze the relationship between clonal and sub-clonal events and clinical characteristics, and the differences in clonal/sub-clonal events across TNM, stage, age, gender and tissue type are analyzed. The results show that the number of clonal events differs significantly across N stage and Stage, and that recurrence status is also significantly associated with the number of clonal events, which seems to indicate that clonal events correspond to a high risk of COAD recurrence. No significant differences in clonal/sub-clonal events are observed for T and M stage, age or gender;

S8, relationship of clonal or sub-clonal events to tumor mutational burden (TMB) and neoantigens:

TMB and neoantigens are important biomarkers for immune checkpoint therapy, and the occurrence of clonal/sub-clonal events likewise has an important influence on the occurrence and progression of tumors. The relationship between clonal/sub-clonal events and TMB and neoantigens is therefore analyzed. Because the distributions of TMB, neoantigens and clonal/sub-clonal events do not follow a normal distribution (Shapiro test, p < 1e-5), the correlation between them is evaluated using the Spearman method. The significance test shows that clonal events have a highly significant relationship with TMB and neoantigens, whereas the correlation between sub-clonal events and neoantigens is weak, which seems to indicate that clonal events make an important contribution to tumor mutational burden and neoantigens.

Mutations in the key genes of the mismatch repair (MMR) system have an important influence on the occurrence of genomic mutations, so the difference in clonal/sub-clonal events between samples with MMR gene mutations (YES) and samples without them (NO) is further evaluated. Clonal events are higher in the NO group than in the YES group, but no significant difference is observed for sub-clonal events; TMB and neoantigens are also markedly higher in the NO group than in the YES group. Although no significant difference is observed, the prognosis of the YES group is better than that of the NO group, which is consistent with the better prognosis of MMR-deficient patients reported in existing research;
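Spearman's rank correlation, chosen above because the variables are not normally distributed, can be computed as the Pearson correlation of average ranks. The sketch below uses only the standard library; the function names are hypothetical and no p-value is computed.

```python
from math import sqrt


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1   # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have the same length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))
```

Here `x` could hold the number of clonal events per sample and `y` the TMB or neoantigen count of the same samples. Because only ranks enter the calculation, a monotone but non-linear relationship still yields a coefficient of 1 or -1.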

S9, summary:

Mutational signature analysis divides the mutations of the TCGA COAD samples into 3 significantly different signatures. Comparison with the 30 known signatures in COSMIC shows that signature A is most similar to Signature 3 and Signature 22, signature B is highly similar to Signature 2, Signature 7 and Signature 11, and signature C is highly similar to Signature 1;

the clonal/sub-clonal status of COAD mutations is identified by analyzing the clonal/sub-clonal events of SCNA and SSNV, and the 3 signatures differ significantly in clonal/sub-clonal status;

taking the genes mutated in more than 5% of samples gives 47 SCNA and 76 SSNV high-frequency mutated genes respectively; genes such as TP53, KRAS and APC have the highest mutation frequency among samples (20%) and are mainly clonal events, and the RBL1 gene has the highest CNV (gain) frequency among samples (29%) and is also mainly clonal;

the relationship between mutations and tumor evolution is analyzed using clonal/sub-clonal events, giving groups of genes mutated at the early, intermediate and late stages of tumor evolution; the timing of these gene mutations may have an important influence on the occurrence and progression of tumors;

univariate Cox regression analysis is used to study the relationship between clonal/sub-clonal status and prognosis, giving 1 early gene, 12 intermediate genes and 1 late gene with a relatively significant relationship to overall survival; the clonal/sub-clonal status of these genes affects prognosis in different ways;

the relationship between clinical characteristics and clonal/sub-clonal events is complex: the number of clonal events differs significantly across N stage and Stage, and also differs significantly between the recurrent and non-recurrent groups, indicating that these factors are closely related to the occurrence of mutations. No significant differences in clonal/sub-clonal events are observed for T and M stage, age or gender;

there is a significant correlation between clonal events and TMB/neoantigens, indicating that the occurrence of clonal events contributes significantly to TMB and neoantigens. Regarding defects in the mismatch repair system, MMR-deficient samples have significantly fewer clonal/sub-clonal events, lower TMB and fewer neoantigens than MMR-normal samples, which may be related to MMR deficiency being a favorable prognostic factor for COAD.

Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
