Method and system for detecting tumor heterogeneity degree

Document No.: 1312732 | Publication date: 2020-07-10

Reading note: "Method and system for detecting tumor heterogeneity degree" (一种检测肿瘤异质性程度的方法及系统) was created by 金皓玄, 方文峰, 陈龙昀, 苏小凡 and 廖裕威 on 2020-03-27. The invention provides a method and a system for detecting the degree of tumor heterogeneity. The method comprises: obtaining sequencing data of a tumor tissue sample and a control sample, and performing somatic variant detection to obtain somatic variant sites; clustering the somatic variant sites into clusters; determining the main clone mutation cluster and the subclone mutation clusters from the clustering result; and calculating the ratio of the number of somatic variants in the subclone mutation clusters of the tumor tissue sample to the number of all somatic variants, this ratio being the tumor heterogeneity value. The invention makes full use of the mutation frequency detected at each mutation site of the sample to estimate the cell proportion of the corresponding tumor clone, thereby deriving a specific numerical value for the degree of tumor heterogeneity and providing a numerical reference for subsequently predicting the efficacy of immunotherapy.

1. A method for detecting the degree of tumor heterogeneity, comprising the steps of:

obtaining sequencing data of the same tumor tissue sample and a control sample from each subject, and carrying out somatic mutation detection on the tumor tissue sample and the control sample to obtain a somatic mutation site;

clustering the somatic variant sites into clusters;

according to the clustering result, if the cluster with the highest tumor cell cluster proportion contains two or more variants, judging that cluster to be the main clone mutation cluster and the remaining mutation clusters to be subclone mutation clusters; if the cluster with the highest tumor cell cluster proportion contains only one variant, judging both the cluster with the highest and the cluster with the second-highest tumor cell cluster proportion to be main clone mutation clusters, and the remaining mutation clusters to be subclone mutation clusters;

and calculating the ratio of the number of somatic variations in the subclone mutation cluster in the tumor tissue sample to the number of all somatic variations, wherein the ratio is a tumor heterogeneity value.

2. The method of claim 1, wherein the somatic mutation sites are clustered by calculating mutation allele sequencing depth, mutation allele frequency, mutation site copy number and tumor purity value according to the somatic mutation sites, and performing cluster analysis according to the somatic mutation sites, the mutation allele sequencing depth, the mutation allele frequency, the mutation site copy number and the tumor purity value.

3. The method of claim 2, wherein the variant allele sequencing depth $V_i$ refers to the number of variant reads carrying the somatic variation at the corresponding site in the sequencing data;

the variant allele frequency is $VAF_i = \dfrac{V_i}{V_i + R_i}$, wherein $R_i$ refers to the reference allele sequencing depth, i.e. the number of normal reads without the somatic variation at the corresponding site in the sequencing data;

the tumor purity value $Pur$ refers to the ratio of the number of tumor cells to the total number of cells in the tumor tissue sample, with a value range of (0, 1), wherein the tumor cells are the sum of all cells carrying somatic variations;

the variant-site copy number is calculated as follows: according to the copy number variation $CNV_i$ of the region where somatic variant site $var_i$ is located, the reference copy number $NCN_i$ and the actual total copy number $TCN_i$ of that region are calculated, wherein:

and the allele-specific copy number variations $CNV_{i,major}$ and $CNV_{i,minor}$ of the two chromosomes on which somatic variant site $var_i$ is located are obtained, wherein $CNV_{i,major} \ge CNV_{i,minor}$;

the actual allele-specific copy numbers $CN_{i,major}$ and $CN_{i,minor}$ are thereby calculated.

4. The method of claim 1, wherein, during cluster analysis, for any type of somatic variation the cells in the tumor tissue sample of the subject are divided into three categories: normal cells $N$, tumor cells $T_{wt}$ not carrying the somatic variation, and tumor cells $T_{mut}$ carrying the somatic variation; the proportion of tumor cells $T_{mut}$ carrying the somatic variation among all tumor cells ($T_{mut} + T_{wt}$) is called the variant tumor cell proportion of that somatic variation; if the variant tumor cell proportions of two or more somatic variations fit the same distribution model, those variations are assigned the same cluster label and clustered into one cluster, namely a clone;

each cluster label $C_j$ ($j = 1, \dots, c$) of each subject has a corresponding tumor cell cluster proportion;

if the cluster with the highest tumor cell cluster proportion contains two or more variants, that cluster is judged to be the main clone mutation cluster $C_{main}$; if the cluster with the highest tumor cell cluster proportion contains only one variant, the clusters with the highest and second-highest tumor cell cluster proportions are both judged to be main clone mutation clusters $C_{main}$,

wherein $j \in \{1, \dots, c\}$, $k \in \{1, \dots, c\}$ and $k \ne j$;

at the same time, the remaining clusters are judged to be subclone mutation clusters $C_{sub}$:

$C_{sub} = C_l,\ l \in \{1, \dots, c\},\ l \ne j,\ l \ne k$;

the number $n_{main}$ of somatic variant sites $var_i$ in the main clone mutation clusters $C_{main}$ and the number $n_{sub}$ of somatic variant sites $var_i$ in the subclone mutation clusters $C_{sub}$ are counted, and the tumor heterogeneity value ITH, i.e. the ratio of the number of variations in the subclone clusters to the number of all variations, is calculated:

$ITH = \dfrac{n_{sub}}{n_{main} + n_{sub}}$

5. The method of claim 1, further comprising: setting a threshold for the tumor heterogeneity value ITH, judging the subjects corresponding to samples less than or equal to the threshold to be low-risk subjects, and judging the subjects corresponding to samples greater than the threshold to be high-risk subjects.

6. The method of claim 5, wherein the median of the tumor heterogeneity values of all subjects is used as the threshold for judging high or low tumor heterogeneity of each subject: subjects whose values are below the threshold have low tumor heterogeneity, and the other subjects have high tumor heterogeneity.

7. The method of claim 1, wherein the subject is a solid tumor patient, preferably a lung cancer, nasopharyngeal cancer, or melanoma patient;

and/or, the somatic variation is selected from at least one of point mutation, insertion/deletion, structural variation, copy number variation;

and/or the sequencing method of the tumor tissue sample and the control sample is whole genome sequencing, whole exome sequencing or probe capture sequencing, preferably whole exome sequencing.

8. A system for detecting the degree of tumor heterogeneity, the system comprising:

a data acquisition module for acquiring sequencing data of the same tumor tissue sample and control sample from each subject;

the somatic mutation detection module is used for carrying out somatic mutation detection on the tumor tissue sample and the control sample to obtain a somatic mutation site;

the clustering module is used for clustering the somatic variant sites into clusters;

the main clone and subclone judging module is used for judging, according to the clustering result, the cluster with the highest tumor cell cluster proportion to be the main clone mutation cluster and the remaining mutation clusters to be subclone mutation clusters if the cluster with the highest tumor cell cluster proportion contains two or more variants; and, if the cluster with the highest tumor cell cluster proportion contains only one variant, judging both the cluster with the highest and the cluster with the second-highest tumor cell cluster proportion to be main clone mutation clusters, and the remaining mutation clusters to be subclone mutation clusters;

the tumor heterogeneity degree calculation module is used for calculating the ratio of the number of somatic variations in the subclone mutation clusters in the tumor tissue sample to the number of all somatic variations, wherein the ratio is the tumor heterogeneity value.

9. An apparatus for detecting the degree of tumor heterogeneity, the apparatus comprising:

a memory for storing a program;

a processor for implementing the method of any one of claims 1 to 7 by executing a program stored by the memory.

10. A computer-readable storage medium, characterized by comprising a program which is executable by a processor to implement the method of any one of claims 1-7.

Technical Field

The invention relates to the technical field of bioinformatics, in particular to a method and a system for detecting tumor heterogeneity degree.

Background

Cancer is one of the leading non-communicable diseases worldwide and carries a high mortality rate; in China, nearly 4.3 million people are diagnosed with cancer and over 2.8 million die from it every year.

Anti-tumor targeted drugs and immune checkpoint inhibitors are currently effective means of treating cancer. However, the more widely accepted candidate indicators for evaluating the efficacy of anti-PD-(L)1 immune checkpoint inhibitors, such as TMB (tumor mutation burden) and MSI (microsatellite instability), cannot fully identify the patients who benefit from immune checkpoint inhibitors.

Currently, there is no authoritative method for quantifying and calculating tumor heterogeneity (ITH), and the ITH indices calculated by existing methods either have not been validated, or have been validated only in small data sets, for evaluating the efficacy of anti-PD-(L)1 immune checkpoint inhibitors.

For example, Chinese patent application publication No. CN106676178A discloses a method and apparatus for assessing tumor heterogeneity. The method comprises: 1) sequencing the cfDNA of a patient (preferably by high-throughput sequencing) to obtain sequencing information; 2) using the sequencing information to determine ctDNA variations, determining the number of mutations in the region based on the sequencing information and the determined ctDNA variations, calculating the variant allele frequency, determining the actual total copy number of the region where each variation is located, and calculating the ratio of ctDNA to cfDNA; 3) clustering the ctDNA variations based on the ratio determined in step 2) together with the sequencing information and copy number information of the ctDNA variations, each resulting cluster being treated as a molecular clone, thereby obtaining the clone level; and 4) assessing the patient's tumor heterogeneity according to the clone level, a patient with more clone levels having higher tumor heterogeneity. The main drawback of this method is that the amount of ctDNA in blood is low, about 1% or even only 0.01% of total cfDNA [1].

Thus, the prior art cannot reliably detect the degree of tumor heterogeneity.

Disclosure of Invention

The main technical problems addressed by the invention are that efficacy evaluation of anti-PD-(L)1 immune checkpoint inhibitors lacks sufficient candidate indicators, and how to make an ITH index usable for evaluating the efficacy of PD-1 immune checkpoint inhibitors on a larger scale and across a wider range.

According to a first aspect, there is provided in one embodiment a method of detecting the degree of tumor heterogeneity, comprising the steps of:

obtaining sequencing data of the same tumor tissue sample and a control sample from each subject, and carrying out somatic mutation detection on the tumor tissue sample and the control sample to obtain a somatic mutation site;

clustering the somatic variant sites into clusters;

according to the clustering result, if the cluster with the highest tumor cell cluster proportion contains two or more variants, judging that cluster to be the main clone mutation cluster and the remaining mutation clusters to be subclone mutation clusters; if the cluster with the highest tumor cell cluster proportion contains only one variant, judging both the cluster with the highest and the cluster with the second-highest tumor cell cluster proportion to be main clone mutation clusters, and the remaining mutation clusters to be subclone mutation clusters;

and calculating the ratio of the number of somatic variations in the subclone mutation cluster in the tumor tissue sample to the number of all somatic variations, wherein the ratio is a tumor heterogeneity value.

As will be appreciated by those skilled in the art, a somatic variation may also be referred to as a somatic mutation, and a variant site may also be referred to as a mutation site.

It will be understood by those skilled in the art that the same tumor tissue sample and control sample refer to a tumor tissue sample and a control sample derived from the same subject.

In some embodiments, a variant allele sequencing depth, a variant allele frequency, a variant-site copy number and a tumor purity value are calculated from the somatic variant sites, and the somatic variant sites are clustered by cluster analysis based on the somatic variant sites together with the variant allele sequencing depth, the variant allele frequency, the variant-site copy number and the tumor purity value.

In some embodiments, the variant allele sequencing depth $V_i$ refers to the number of variant reads carrying the somatic variation at the corresponding site in the sequencing data;

the variant allele frequency is $VAF_i = \dfrac{V_i}{V_i + R_i}$, wherein $R_i$ refers to the reference allele sequencing depth, i.e. the number of normal reads in the sequencing data in which the somatic variation did not occur at the corresponding site;

the tumor purity value $Pur$ refers to the ratio of the number of tumor cells to the total number of cells in the tumor tissue sample, with a value range of (0, 1), wherein the tumor cells are the sum of all cells carrying somatic variations;

the variant-site copy number is calculated as follows: according to the copy number variation $CNV_i$ of the region where somatic variant site $var_i$ is located, the reference copy number $NCN_i$ and the actual total copy number $TCN_i$ of that region are calculated, wherein:

and the allele-specific copy number variations $CNV_{i,major}$ and $CNV_{i,minor}$ of the two chromosomes on which somatic variant site $var_i$ is located are obtained, wherein $CNV_{i,major} \ge CNV_{i,minor}$;

the actual allele-specific copy numbers $CN_{i,major}$ and $CN_{i,minor}$ are thereby calculated.
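As an illustration of how the quantities defined above fit together, the following Python sketch computes the variant allele frequency from the variant depth $V_i$ and reference depth $R_i$, and then a simple point estimate of the fraction of tumor cells carrying a variant. The second function is an assumption: it uses a commonly used purity and copy-number mixture relation rather than a formula stated in this document, treats the total copy number as the sum of the two allele-specific copy numbers, and takes the number of mutated copies (`multiplicity`) as a hypothetical parameter. All class, field and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Per-site inputs named after the symbols in the text (field names are illustrative)."""
    var_depth: int  # V_i: reads supporting the variant allele
    ref_depth: int  # R_i: reads supporting the reference allele
    ncn: int        # NCN_i: reference copy number of the region
    cn_major: int   # CN_{i,major}: actual major-allele copy number
    cn_minor: int   # CN_{i,minor}: actual minor-allele copy number


def vaf(v: Variant) -> float:
    """Variant allele frequency V_i / (V_i + R_i)."""
    total = v.var_depth + v.ref_depth
    if total == 0:
        raise ValueError("site has no coverage")
    return v.var_depth / total


def cell_fraction_estimate(v: Variant, purity: float, multiplicity: int = 1) -> float:
    """ASSUMPTION (not from the source): point estimate of the fraction f of
    tumor cells carrying the variant, obtained by solving
        VAF = purity * multiplicity * f / (purity * TCN + (1 - purity) * NCN)
    for f, with TCN taken as CN_major + CN_minor. The result is capped at 1.
    """
    if not 0.0 < purity < 1.0:  # the text gives (0, 1) as the purity range
        raise ValueError("purity must lie in (0, 1)")
    if multiplicity < 1:
        raise ValueError("multiplicity must be at least 1")
    tcn = v.cn_major + v.cn_minor
    f = vaf(v) * (purity * tcn + (1.0 - purity) * v.ncn) / (purity * multiplicity)
    return min(f, 1.0)
```

Clustering software fits a statistical model over all sites rather than relying on such a per-site point estimate, so the sketch is only meant to show why depth, copy number and purity are all needed as inputs.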

In some embodiments, during cluster analysis, for any type of somatic variation the cells in the subject's tumor tissue sample are divided into three categories: normal cells ($N$), tumor cells that do not carry the variation ($T_{wt}$), and tumor cells that carry the variation ($T_{mut}$); the proportion of tumor cells carrying the somatic variation ($T_{mut}$) among all tumor cells ($T_{mut} + T_{wt}$) is called the variant tumor cell proportion of that somatic variation; if the variant tumor cell proportions of two or more variant sites fit the same distribution model, those variations are assigned the same cluster label and clustered into one cluster, which is called a clone;

each cluster label $C_j$ ($j = 1, \dots, c$) of each subject has a corresponding tumor cell cluster proportion;

if the cluster with the highest tumor cell cluster proportion contains two or more variants, that cluster is judged to be the main clone mutation cluster $C_{main}$; if the cluster with the highest tumor cell cluster proportion contains only one variant, the clusters with the highest and second-highest tumor cell cluster proportions are both judged to be main clone mutation clusters $C_{main}$,

wherein $j \in \{1, \dots, c\}$, $k \in \{1, \dots, c\}$ and $k \ne j$;

at the same time, the remaining clusters are judged to be subclone mutation clusters $C_{sub}$:

$C_{sub} = C_l,\ l \in \{1, \dots, c\},\ l \ne j,\ l \ne k$;

the number $n_{main}$ of somatic variant sites $var_i$ in the main clone mutation clusters $C_{main}$ and the number $n_{sub}$ of somatic variant sites $var_i$ in the subclone mutation clusters $C_{sub}$ are counted, and the tumor heterogeneity value ITH, i.e. the ratio of the number of somatic variant sites in the subclone mutation clusters to the number of all somatic variant sites, is calculated:

$ITH = \dfrac{n_{sub}}{n_{main} + n_{sub}}$

As will be appreciated by those skilled in the art, the somatic variant site $var_i$ may also be referred to as variant $var_i$.
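The main clone and subclone rule and the resulting ratio can be sketched in Python as follows. This is a minimal illustration, assuming that cluster labels and per-cluster tumor cell cluster proportions are already available from the clustering step; the function and argument names are hypothetical, and the handling of ties and of a sample with only one cluster is an assumption that the text does not specify.

```python
from collections import Counter
from typing import Dict, Hashable, Set


def main_clusters(cluster_of: Dict[str, Hashable],
                  proportion: Dict[Hashable, float]) -> Set[Hashable]:
    """Return the label(s) judged to be main clone mutation clusters.

    cluster_of maps each variant id to its cluster label;
    proportion maps each cluster label to its tumor cell cluster proportion.
    """
    counts = Counter(cluster_of.values())  # variants per cluster
    ranked = sorted(counts, key=lambda c: proportion[c], reverse=True)
    if not ranked:
        return set()
    top = ranked[0]
    # Two or more variants in the top cluster: it alone is the main clone.
    # ASSUMPTION: if there is only one cluster, it is also the main clone.
    if counts[top] >= 2 or len(ranked) == 1:
        return {top}
    # Only one variant in the top cluster: top two clusters are both main.
    return {top, ranked[1]}


def ith(cluster_of: Dict[str, Hashable],
        proportion: Dict[Hashable, float]) -> float:
    """ITH = n_sub / (n_main + n_sub), counted over variant sites."""
    if not cluster_of:
        raise ValueError("no somatic variants to score")
    main = main_clusters(cluster_of, proportion)
    n_sub = sum(1 for label in cluster_of.values() if label not in main)
    return n_sub / len(cluster_of)


# Example: cluster A (highest proportion) holds two variants, so it is the
# only main clone; the variants in B and C are subclonal, so ITH = 2 / 4 = 0.5.
example = ith({"v1": "A", "v2": "A", "v3": "B", "v4": "C"},
              {"A": 0.9, "B": 0.4, "C": 0.1})
```

Counting variant sites rather than clusters means that a sample with many variants in low-proportion clusters scores close to 1, while a sample dominated by its main clone scores close to 0.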

In some embodiments, the method further comprises: setting a threshold for the ratio of the number of somatic variations in the subclone mutation clusters in the tumor tissue sample to the number of all somatic variations, judging the subjects corresponding to samples less than or equal to the threshold to be low-risk subjects, and judging the subjects corresponding to samples greater than the threshold to be high-risk subjects.

In some embodiments, the median of the tumor heterogeneity values of all subjects is used as the threshold for judging high or low tumor heterogeneity of each subject: subjects whose values are below the threshold have lower tumor heterogeneity, and the other subjects have higher tumor heterogeneity.
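The median-based grouping can be sketched in Python as follows. This is a minimal illustration with hypothetical names; it assigns values equal to the threshold to the low group, following the "less than or equal to" wording used for the low-risk group, which is an assumption where the two descriptions of the threshold differ.

```python
from statistics import median
from typing import Dict


def classify_by_median(ith_by_subject: Dict[str, float]) -> Dict[str, str]:
    """Label each subject 'low' or 'high' relative to the cohort median ITH."""
    if not ith_by_subject:
        raise ValueError("cohort is empty")
    threshold = median(ith_by_subject.values())
    return {
        subject: ("low" if value <= threshold else "high")
        for subject, value in ith_by_subject.items()
    }


# Hypothetical cohort of three subjects; the median ITH value is 0.3.
groups = classify_by_median({"s1": 0.1, "s2": 0.3, "s3": 0.8})
```

Because the threshold is derived from the cohort itself, the same subject may fall into a different group when the cohort changes.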

In some embodiments, the subject is a solid tumor patient, preferably a lung cancer, nasopharyngeal cancer, or melanoma patient.

In some embodiments, the somatic variation is selected from at least one of a point mutation (SNV), an insertion/deletion (indel), a structural variation (SV) and a copy number variation (CNV). For example, in some embodiments the somatic variations may specifically be SNVs and indels; in other embodiments, SNVs, indels and SVs; and in still other embodiments, SNVs, indels, SVs and CNVs.

In some embodiments, the sequencing method of the tumor tissue sample and the control sample is whole genome sequencing, whole exome sequencing or probe capture sequencing, preferably whole exome sequencing.

According to a second aspect, there is provided a system for detecting the degree of tumor heterogeneity, the system comprising:

a data acquisition module for acquiring sequencing data of the same tumor tissue sample and control sample from each subject;

the somatic mutation detection module is used for carrying out somatic mutation detection on the tumor tissue sample and the control sample to obtain a somatic mutation site;

the clustering module is used for clustering the somatic variant sites into clusters;

the main clone and subclone judging module is used for judging, according to the clustering result, the cluster with the highest tumor cell cluster proportion to be the main clone mutation cluster and the remaining mutation clusters to be subclone mutation clusters if the cluster with the highest tumor cell cluster proportion contains two or more variants; and, if the cluster with the highest tumor cell cluster proportion contains only one variant, judging both the cluster with the highest and the cluster with the second-highest tumor cell cluster proportion to be main clone mutation clusters, and the remaining mutation clusters to be subclone mutation clusters;

the tumor heterogeneity degree calculation module is used for calculating the ratio of the number of somatic variations in the subclone mutation clusters in the tumor tissue sample to the number of all somatic variations, wherein the ratio is the tumor heterogeneity value.
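One way to picture how the five modules hand data to one another is the following Python sketch. It is an illustration only: the class name, the field names and the callable signatures are hypothetical, each module is represented by a plain function supplied by the caller, and no particular variant caller or clustering tool is assumed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, List, Set, Tuple

Labels = Dict[str, Hashable]         # variant id -> cluster label
Proportions = Dict[Hashable, float]  # cluster label -> tumor cell cluster proportion


@dataclass
class HeterogeneityPipeline:
    # Data acquisition module: subject id -> (tumor data, control data)
    acquire: Callable[[str], Tuple[object, object]]
    # Somatic variant detection module: (tumor, control) -> variant ids
    detect: Callable[[object, object], List[str]]
    # Clustering module: variant ids -> (labels, proportions)
    cluster: Callable[[List[str]], Tuple[Labels, Proportions]]
    # Main clone and subclone judging module: -> labels of main clone clusters
    judge: Callable[[Labels, Proportions], Set[Hashable]]

    def run(self, subject_id: str) -> float:
        """Chain the modules; the final step plays the role of the
        tumor heterogeneity degree calculation module."""
        tumor, control = self.acquire(subject_id)
        variants = self.detect(tumor, control)
        if not variants:
            raise ValueError("no somatic variants detected")
        labels, proportions = self.cluster(variants)
        main = self.judge(labels, proportions)
        n_sub = sum(1 for v in variants if labels[v] not in main)
        return n_sub / len(variants)
```

Keeping each module behind a simple callable makes it possible to swap the variant caller or the clustering software without touching the final ratio calculation.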

According to a third aspect, there is provided an apparatus for detecting the degree of tumor heterogeneity, the apparatus comprising:

a memory for storing a program;

a processor for implementing the method as described in the first aspect by executing the program stored by the memory.

According to a fourth aspect, there is provided a computer readable storage medium comprising a program executable by a processor to implement the method of the first aspect.

The mutation detection software used includes, but is not limited to, VarScan and MuTect; specifically, VarScan (v2.4.1), MuTect (v4.0.12.0) and the like may be used.

In some embodiments, the copy number detection software used includes, but is not limited to, CNVkit and ascatNgs; specifically, CNVkit (v0.8.1) or ascatNgs (v3.1.0) may be used.

In some embodiments, the clustering software is selected from PyClone software.

In some embodiments, other versions of PyClone software or other variant cluster analysis software, such as CloneSig (v0.1), may be employed.
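To show how the per-site quantities described earlier could be handed to clustering software, the following Python sketch writes them as a tab-separated table. The column names follow the input layout commonly documented for PyClone (`mutation_id`, `ref_counts`, `var_counts`, `normal_cn`, `minor_cn`, `major_cn`); this layout is an assumption here and should be checked against the documentation of the PyClone version actually used. The function name and the example record are hypothetical.

```python
import csv
import io
from typing import Dict, Iterable

# ASSUMPTION: column layout expected by the clustering tool.
PYCLONE_COLUMNS = [
    "mutation_id",  # identifier of the somatic variant site
    "ref_counts",   # R_i: reference allele sequencing depth
    "var_counts",   # V_i: variant allele sequencing depth
    "normal_cn",    # NCN_i: reference copy number
    "minor_cn",     # CN_{i,minor}
    "major_cn",     # CN_{i,major}
]


def to_pyclone_tsv(rows: Iterable[Dict[str, object]]) -> str:
    """Serialize per-variant records into a tab-separated table."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=PYCLONE_COLUMNS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Keep only the expected columns, in the expected order;
        # a missing column raises KeyError.
        writer.writerow({name: row[name] for name in PYCLONE_COLUMNS})
    return buffer.getvalue()


# Hypothetical single-variant example.
table = to_pyclone_tsv([{
    "mutation_id": "chr1:100:A>T", "ref_counts": 70, "var_counts": 30,
    "normal_cn": 2, "minor_cn": 1, "major_cn": 1,
}])
```

The tumor purity value is not a per-site column in this layout, so it would be supplied to the clustering software separately.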

According to the method and the system of the embodiment, the invention provides a specific method for calculating the tumor heterogeneity degree of a sample, which fully utilizes the mutation frequency detected at each mutation site of the sample to estimate the cell proportion of corresponding tumor clones, thereby calculating a specific numerical value of the tumor heterogeneity degree and providing a numerically referable basis for the subsequent prediction of the immunotherapy curative effect.

Drawings

FIG. 1 is a flow chart of the detection of the degree of tumor heterogeneity in an embodiment of the present invention.

FIG. 2 shows the results of the measurement of tumor heterogeneity levels of the lung cancer cohort tissue samples in example 1 of the present invention.

FIG. 3 is a graph showing the prediction of the outcome of immunotherapy efficacy as a function of the degree of tumor heterogeneity in a lung cancer cohort tissue sample in example 1 of the present invention.

FIG. 4 shows the results of the detection of tumor heterogeneity levels of the nasopharyngeal carcinoma cohort tissue samples in example 2 of the present invention.

FIG. 5 shows the prediction of the outcome of immunotherapy efficacy as a function of tumor heterogeneity in nasopharyngeal carcinoma cohort tissue samples according to example 2 of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the following detailed description and accompanying drawings, in which like elements in different embodiments are given like associated reference numerals. In the following description, numerous details are set forth in order to provide a better understanding of the present application. However, those skilled in the art will readily recognize that some of the features may be omitted, or replaced with other elements, materials or methods, in different instances. In some instances, certain operations related to the present application are not shown or described in detail, so that the core of the present application is not obscured by excessive description; a detailed description of these operations is not necessary for those skilled in the art, who can fully understand them from the description in the specification and the general knowledge in the art.

Furthermore, the features, operations, or characteristics described in the specification may be combined in any suitable manner to form various embodiments. Also, the steps or actions in the method descriptions may be reordered or adjusted in sequence, as will be apparent to one of ordinary skill in the art. Thus, the various sequences in the specification and drawings serve only to describe particular embodiments and do not imply a required order unless it is otherwise stated that a certain order must be followed.

The numbering of the components as such, e.g., "first", "second", etc., is used herein only to distinguish the objects as described, and does not have any sequential or technical meaning.

It must be noted that, as used in this specification and the appended claims, the singular forms "a," "an," and "the" include plural referents unless the content clearly dictates otherwise.

As used herein, the terms "comprises," "comprising," "includes," "including," "contains," "containing," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or composition of matter that comprises, or contains an element or list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or composition of matter.

As used herein, the term "providing," as used in the context of a sample, is intended to encompass any and all means of obtaining the sample. The term encompasses all direct or indirect means of causing the presence of the sample in the practice of the claimed method.

As used herein, the term "patient" preferably refers to a human, but also encompasses other mammals. The terms "organism," "individual," "subject" or "patient" are used as synonyms for interchangeable use.

The invention is applicable to all cancer patients. The cancer may be a respiratory system cancer, or subtypes and stages thereof (phase), the respiratory system including the respiratory tract (nasal cavity, pharynx, larynx, trachea, bronchi) and lungs, and in some embodiments, the cancer includes, but is not limited to, lung cancer, nasopharyngeal cancer, laryngeal cancer, pharyngeal cancer, tracheal cancer, and the like. In some embodiments, the cancer may also include, but is not limited to, breast cancer, lung cancer, prostate cancer, colorectal cancer, brain cancer, esophageal cancer, gastric cancer, bladder cancer, pancreatic cancer, cervical cancer, head and neck cancer, ovarian cancer, melanoma, and multidrug resistant cancers; or its subtype and stage (phase).

In some embodiments, the subject may also be a solid tumor patient, including but not limited to a lung cancer, nasopharyngeal carcinoma, or melanoma patient.

As used herein, the term "tumor" refers to all tumor cell growth and proliferation, whether malignant or benign, as well as all precancerous and cancerous cells and tissues. Such cancers include, but are not limited to, cancers of the respiratory tract, including but not limited to lung cancer, nasopharyngeal cancer, laryngeal cancer, pharyngeal cancer, tracheal cancer, and the like; the cancer may also be a lymphoproliferative cancer, such as precursor B lymphoblastic leukemia/lymphoblastic lymphoma, follicular B cell non-Hodgkin's lymphoma, Hodgkin's lymphoma, precursor T cell lymphoblastic leukemia/lymphoblastic lymphoma, neoplasms of immature T cells, neoplasms of peripheral post-thymic T cells, T cell prolymphocytic leukemia, peripheral T cell lymphoma (unspecified), anaplastic large cell lymphoma, adult T cell leukemia/lymphoma, chronic lymphocytic leukemia, mantle cell lymphoma, follicular lymphoma, marginal zone lymphoma, hairy cell leukemia, diffuse large B cell lymphoma, Burkitt's lymphoma, lymphoplasmacytic lymphoma, precursor T lymphoblastic leukemia/lymphoblastic lymphoma, T cell prolymphocytic leukemia, angioimmunoblastic lymphoma, or nodular lymphocyte-predominant Hodgkin's lymphoma.

As used herein, the variant allele sequencing depth $V_i$ is also called the mutation depth, and the variant allele frequency $VAF_i$ is also called the mutation frequency.
