System for ranking immunogenic cancer-specific epitopes

Document No.: 1538816. Publication date: 2020-02-14.

Summary: "System for ranking immunogenic cancer-specific epitopes" (致免疫性的癌症特异抗原决定位的排名系统) was designed by 杨沛佳, 郑人豪, 陈映嘉, 陈淑贞, and 陈华键 on 2018-03-31. The present invention relates to systems and methods for determining, predicting and ranking immunogenic T cell epitopes, and more particularly to determining epitopes generated by disease-associated mutations, wherein the epitopes are predicted to elicit a T cell immune response. In particular, the present invention considers both peptide-level information (including the effect of the peptide chain on presentation by major histocompatibility complex (MHC) class I and class II and on the immune responses of helper and cytotoxic T cells) and sample-level information (including mutation clonality and MHC allele dosage). In some embodiments, the systems and methods are used in personalized cancer medicine.

1. A method of screening for at least one immunogenic and mutation-informative peptide chain, comprising the steps of:

(a) obtaining a plurality of sequences with mutation information;

(b) determining at least one epitope from the disease-associated mutation;

(d) determining the importance of the plurality of features;

(e) determining an immunogenic value of the at least one epitope by the importance of the plurality of features;

(f) ranking the at least one epitope; and

(g) selecting said immunogenic and mutation-informative peptide chains based on said ranking in (f), wherein said immunogenic and mutation-informative peptide chains comprise at least one epitope and are likely to elicit a T cell immune response.

2. The method of claim 1, wherein steps (c) to (e) are performed using a machine learning model.

3. The method of claim 1, wherein the number of selected epitopes is ≤ 100.

4. The method of claim 3, wherein the number of selected epitopes is ≤ 50.

5. The method of claim 4, wherein the number of selected epitopes is ≤ 30.

6. The method of claim 5, wherein the number of selected epitopes is ≤ 10.

7. The method of claim 5, wherein the number of selected epitopes is 10 to 30.

8. The method of claim 1, wherein the plurality of features is associated with presentation of the epitope on major histocompatibility complex (MHC) class I and class II.

9. The method of claim 8, wherein the half-maximal inhibitory concentration (IC50) selected for the binding affinity of the epitope to MHC class I is < 1500 nM.

10. The method of claim 8 or 9, wherein the plurality of features comprises the binding stability of the epitope to MHC class I.

11. The method of claim 8, wherein the plurality of features comprises protein abundance, gene expression level, or a combination thereof.

12. The method of claim 1, wherein the plurality of features is associated with the ability of the epitope to elicit a cytotoxic T cell immune response.

13. The method of claim 1, wherein the plurality of features is associated with the ability of the epitope to elicit a helper T cell immune response.

14. The method of claim 12 or 13, wherein the plurality of features comprises the similarity of the epitope to its wild-type peptide chain.

15. The method of claim 12 or 13, wherein the plurality of features comprises the homology of the epitope to a known antigen.

16. The method of claim 1, wherein the mutation has a variant frequency of at least 10%.

17. The method of claim 16, wherein the mutation has a variant frequency of at least 30%.

18. The method of claim 1, wherein the mutation has a copy number of at least 2.

19. The method of claim 1, wherein the plurality of features comprises loss of heterozygosity.

20. The method of claim 1, wherein the plurality of features comprises allele dosage.

21. The method of claim 1, wherein the plurality of features comprises the clonality of the disease-associated mutation.

22. The method of claim 1, wherein the immunogenic value is calculated by integrating the plurality of features, the plurality of features including features for calculating a peptide-level score and features for calculating a sample-level score.

23. The method of claim 1, wherein said immunogenic value is calculated by integrating said plurality of features, including the presentation of said epitope on MHC class I and class II, the ability of said epitope to elicit helper and cytotoxic T cell immune responses, and the clonality of said disease-associated mutation.

24. A system for screening at least one immunogenic and mutationally informative peptide chain, wherein the system uses a screening step comprising:

(a) obtaining a plurality of sequences with mutation information;

(b) determining at least one epitope from the disease-associated mutation;

(d) determining the importance of the plurality of features;

(e) determining an immunogenic value of the at least one epitope by the importance of the plurality of features;

(f) ranking the at least one epitope; and

(g) selecting said immunogenic and mutationally informative peptide chains based on said ranking in (f), wherein said immunogenic and mutationally informative peptide chains comprise at least one epitope and are likely to elicit a T cell immune response.

Technical Field

The present invention relates to immunogenic epitopes, and more particularly to a system and method for determining, predicting and ranking immunogenic cancer-specific epitopes.

Background

The use of tumor-specific antigens to elicit an immune response against tumor cells has opened a new front in the fight against cancer. These antigens are regarded as a link between tumor genomics and the clinical utility of immunotherapy. In general, genes carrying oncogenic mutations produce mutated peptide chains. These peptides bind to major histocompatibility complex (MHC) class I and class II molecules and are presented on the surface of tumor cells as antigens. Cytotoxic T cells and helper T cells recognize these antigens as foreign and elicit an immune response. Many of these antigens are tumor-specific and have not previously been recognized by the immune system. They are therefore suitable targets for immunotherapy, because treatment can be directed at tumor cells without damaging normal cells.

Attempts to elicit T cell responses with tumor-specific antigens have produced mixed results. Two obstacles arise with these antigens. First, immune cells must recognize the antigens as foreign and mount an immune response without attacking normal cells. Second, even when T cells recognize an antigen as foreign, cells in the body, including mutated tumor cells, have a safety mechanism (the immune checkpoint) that prevents sustained, high-intensity T cell attack, so the success of immunotherapy depends on disabling these immune checkpoints. The second obstacle has drawn much attention in recent years. Immune checkpoint inhibitors, such as antibodies that inhibit PD1, PDL1, and CTLA4, have been developed, and the number of drugs, clinical trials, and target cancer types continues to grow. However, there is still much room for improvement, because the response rate of immune checkpoint inhibitor therapy is only about 20% or less. It is therefore desirable to screen patients for likely response before treatment. Having high-quality or abundant tumor-specific antigens is believed to correlate strongly with the response rate and survival rate of the therapy. To speed up the screening of patients for immune checkpoint inhibitor therapy, there is a strong need to identify these antigens accurately.

In contrast, progress in developing antigens that can be directly selected for recognition by T cells has remained relatively stalled. T cells attack after recognizing a foreign antigen; however, there is no effective method for making T cells recognize tumor-specific antigens. Treatment methods can be broadly divided into cancer vaccines and cell infusion therapies. A therapeutic cancer vaccine targets the naive T cell repertoire and activates naive T cells to retard tumor growth and shrink the tumor. The vaccine consists of tumor-specific antigens selected for their ability to elicit an immune response. However, poor screening for immunogenic antigens has hindered vaccine development, and vaccines have not performed well. Cell infusion therapy focuses directly on training immune cells to attack tumor cells. Immune cells, usually T cells or dendritic cells, are collected from the patient and cultured in the laboratory. T cells that successfully eliminate tumor cells by recognizing tumor-specific antigens are then selected and returned to the patient. However, this method has a low success rate because the antigen selection method is inefficient. Both approaches show that selecting the optimal immunogenic tumor-specific antigens is necessary for the clinical efficacy of immunotherapy.

A method that can reliably determine immunogenic tumor-specific antigens has wide application and key utility for a variety of immunotherapy strategies. Current methods for determining tumor-specific antigens generally involve identifying mutations and predicting epitopes (i.e., the sites within an antigen that determine its ability to elicit an immune response) together with their binding affinity to the major histocompatibility complex. Several tools exist for predicting epitopes, but their predictions do not agree with each other, and only about 55% of predicted epitopes can be verified experimentally (Rajasagi M et al., Blood. 2014 Jul 17; 124(3): 453-62). Typical methods are based on peptide chain sequences and do not consider both classes of major histocompatibility complex and their corresponding immune cells at the same time. Moreover, each patient or sample has specific properties that affect the prediction, and current neoantigen ranking methods do not take these sample-specific properties into account. These properties can be summarized as allele dosage. When the number of mutant alleles and major histocompatibility complex alleles is high, the immune system has a higher chance of recognizing tumor-specific antigens, which in turn affects epitope prediction.

The present invention discloses systems and methods for determining, predicting and ranking immunogenic T cell epitopes using peptide-level and sample-level information. Peptide-level information includes presentation on major histocompatibility complex class I and class II, CD4 activation, and CD8 activation, while sample-level information includes allele dosage, i.e., the clonality of the mutant allele and the number of major histocompatibility complex alleles. Moreover, the system and method integrate a complete list of factors, each based on cellular biochemical processes, tumor-specific properties, antigen presentation processes, and immune activation processes. The invention discloses optimal screening of epitopes by weighting each factor. The invention also discloses a method for ranking epitopes that can be used to develop personalized treatments such as cancer vaccines, cell infusion therapy (adoptive cell transfer), and immune checkpoint inhibitors.

Disclosure of Invention

A system and method are disclosed for determining epitopes from the tumor tissue of a patient and for predicting and ranking whether the epitopes will elicit an immune response against the disease. The system and method consider both peptide-level information about the epitope and sample-level information about the tumor tissue. Peptide-level features are features of the peptide chain sequence related to presentation on Major Histocompatibility Complex (MHC) class I and class II, activation of helper T cells, and activation of cytotoxic T cells. Sample-level information is tumor-specific and includes the clonality of mutant alleles and the number of major histocompatibility complex alleles. The system and method integrate these factors and calculate a weight for each, representing the extent to which it contributes to inducing an immune response. The system and method give each epitope an immunogenicity score and prioritize the epitopes to provide a reference for subsequent personalized medicine.

The system requires as input the mutation and copy number variation information obtained from next generation sequencing analysis, along with other sequencing-related information, including the raw sequencing reads and the major histocompatibility complex (MHC) type. In some embodiments, the MHC type may come from the same individual as the mutations or from a different individual. The system outputs a set of mutation-associated epitopes, comprising: (a) the peptide chain sequence carrying the mutation; (b) a peptide-level score, which represents the ability of the peptide chain to be presented and to activate an immune response; (c) a sample-level score, which represents the clonality of the mutation in a heterogeneous tumor; (d) a ranking of the epitopes, which represents the priority of their predicted efficacy when used in immunotherapy.

The system and method of the present disclosure encompass some or all of the following steps: (1) identifying mutations by next generation sequencing analysis, comprising variant calling, variant annotation, copy number analysis, loss of heterozygosity analysis, and tumor purity analysis; (2) analyzing the characteristics of the gene carrying the mutation; (3) determining gene expression level from tissue-specific and disease-specific data in public databases; (4) determining protein abundance from tissue-specific and disease-specific data in public databases; (5) obtaining peptide chains containing the mutation site, where peptide chains associated with MHC class I are 8-15 amino acids long, preferably 8-11 amino acids, and peptide chains associated with MHC class II are 9-23 amino acids long; (6) predicting binding of the peptide chain to MHC class I and class II; (7) predicting the ability of the peptide chain to activate CD8+ T cell and CD4+ T cell immune responses; (8) predicting whether the peptide chain will be presented on the cell surface via the antigen presentation process; (9) comparing the mutated and non-mutated peptide chains; (10) comparing the peptide chain with known antigens; (11) determining the allele dosage of MHC class I and adding it to the MHC class I analysis; (12) integrating steps 1-11, calculating the weights of the peptide-level factors, and predicting peptide-level immunogenicity; (13) calculating the clonal mutation frequency and using it as the sample-level value; (14) integrating the peptide-level value and the sample-level value into the immunogenic value; (15) determining whether gene copies have been lost, and setting the immunogenic value to zero when the gene has lost all copies; (16) ranking the immunogenic values.
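Steps (14) to (16) can be sketched as follows. This is a minimal illustration rather than the disclosed implementation: the `Epitope` record, the function names, and in particular the choice to combine the two values by multiplication are assumptions, since the text does not state the integration formula.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Epitope:
    peptide: str           # mutated peptide sequence (placeholder strings below)
    peptide_score: float   # peptide-level value from step (12)
    sample_score: float    # sample-level value from step (13)
    gene_copies: int       # remaining copies of the source gene, step (15)


def immunogenic_value(e: Epitope) -> float:
    """Steps (14)-(15): integrate both values; zero if every gene copy is lost."""
    if e.gene_copies == 0:
        return 0.0
    # Assumption: the two values are combined as a product.
    return e.peptide_score * e.sample_score


def rank_epitopes(epitopes: List[Epitope], top_n: Optional[int] = None) -> List[Epitope]:
    """Step (16): sort by immunogenic value, highest first, optionally keeping top_n."""
    ranked = sorted(epitopes, key=immunogenic_value, reverse=True)
    return ranked if top_n is None else ranked[:top_n]


candidates = [
    Epitope("PEPTIDEA", 0.9, 0.5, 2),
    Epitope("PEPTIDEB", 0.8, 1.0, 2),
    Epitope("PEPTIDEC", 0.95, 1.0, 0),  # gene fully deleted, so value is zero
]
for rank, e in enumerate(rank_epitopes(candidates), start=1):
    print(rank, e.peptide, immunogenic_value(e))
```

The `top_n` argument mirrors the claims that cap the number of selected epitopes (for example at 100, 50, 30, or 10).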

Factors determining epitope immunogenicity include, but are not limited to, one, more, or any combination of: (i) variant frequency of the mutation; (ii) copy number variation; (iii) loss of heterozygosity; (iv) tumor purity; (v) clonality of the mutant allele; (vi) homology to known antigen sequences (antigenic homology); (vii) similarity to the wild type in MHC class I interaction (self-similarity); (viii) similarity to the wild type in MHC class II interaction (self-similarity); (ix) gene expression level; (x) protein abundance; (xi) proteasome cleavage site propensity; (xii) TAP transport efficiency; (xiii) MHC class I binding affinity; (xiv) MHC class II binding affinity; (xv) MHC class I binding stability; (xvi) MHC class I allele dosage; (xvii) similarity between the peptide chain sequence and the consensus sequence matrix of immunogenic T cell epitope sequences.

In some embodiments, some individuals have a higher allele dosage, for example when they are homozygous for a major histocompatibility complex allele. A higher allele dosage may increase antigen presentation. The model incorporates the effect of increased allele dosage into the MHC class I calculation.

For the peptide-level score, we constructed four machine learning models using factors (vi) to (xv) described above. Model one predicts presentation on MHC class I; its features include gene expression, protein abundance, proteasome cleavage site propensity, TAP transport efficiency, MHC class I binding affinity, MHC class I binding stability, and MHC class I allele dosage. Model two predicts presentation on MHC class II; its features include MHC class II binding affinity. Model three predicts activation of helper T cells; its features include self-similarity and antigenic homology. Model four predicts activation of cytotoxic T cells; its features include self-similarity, antigenic homology, and MHC class I. Training data for models three and four were derived from the results of in vitro T cell immune response experiments. We integrate these four models and their combinations using machine learning regressors and data analysis methods. The final model comprises weighted factors, feature selection, and an iteratively tuned, optimized machine learning model. Finally, the model was validated using epitopes known to be immunogenic.
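The integration of the four sub-models can be illustrated with a small stacking sketch. Everything here is an assumption made for illustration: the text does not name the regressor, so a plain logistic regression fitted by gradient descent stands in for it, and the four input columns (outputs of models one to four) and the toy training rows are invented.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_meta_model(rows, labels, lr=0.5, epochs=2000):
    """Fit logistic-regression weights over the four sub-model scores."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    m = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the log loss with respect to the logit
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias


def peptide_level_score(x, weights, bias):
    """Combine the four sub-model outputs into one peptide-level score in (0, 1)."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Columns: [MHC I presentation, MHC II presentation, CD4 activation, CD8 activation]
ROWS = [
    [0.9, 0.8, 0.7, 0.9],
    [0.8, 0.6, 0.9, 0.7],
    [0.2, 0.3, 0.1, 0.2],
    [0.3, 0.1, 0.2, 0.1],
]
LABELS = [1, 1, 0, 0]  # toy labels: 1 = immunogenic in a T cell assay

weights, bias = fit_meta_model(ROWS, LABELS)
print([round(w, 2) for w in weights])
```

The fitted weights play the role of the per-factor weights described above: a larger weight means that sub-model contributes more to the predicted immunogenicity.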

For the sample-level score, we determine whether a mutation is clonal using factors (i) to (v) above. A tumor may contain several clones, each with a unique genetic makeup. If a mutation occurs in most clones, it is defined as a clonal mutation, indicating that it arose on the "trunk" early in cancer evolution. Tumor-specific antigens derived from clonal mutations are present in most tumor cells and are therefore highly likely to be subject to immune attack. In contrast, tumor-specific antigens derived from subclones are "branch" mutations present in only a few tumor cells; attacking them leaves the other clones unaffected. Determining whether a mutation is clonal requires estimating the number of mutant alleles by maximum likelihood from the expected variant frequency, and then estimating the subclonal purity. The sample-level score can be calculated once the subclonal purity and the tumor purity are obtained.
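One common way to carry out such a calculation is sketched below. The formulas are assumptions, not taken from the text: the expected variant frequency for m mutant copies is modeled as purity × m / (purity × tumor copy number + (1 − purity) × 2), m is chosen by binomial maximum likelihood, and the resulting fraction of tumor cells carrying the mutation is used as a stand-in for the sample-level value. All function names are hypothetical.

```python
import math


def expected_vaf(mutant_copies, purity, tumor_copy_number, normal_copy_number=2):
    """Expected variant frequency if every tumor cell carries `mutant_copies` copies."""
    total = purity * tumor_copy_number + (1 - purity) * normal_copy_number
    return purity * mutant_copies / total


def binomial_log_likelihood(alt_reads, depth, p):
    p = min(max(p, 1e-9), 1 - 1e-9)  # keep log() finite
    log_choose = (math.lgamma(depth + 1) - math.lgamma(alt_reads + 1)
                  - math.lgamma(depth - alt_reads + 1))
    return log_choose + alt_reads * math.log(p) + (depth - alt_reads) * math.log(1 - p)


def most_likely_mutant_copies(alt_reads, depth, purity, tumor_copy_number):
    """Maximum-likelihood number of mutant allele copies, from 1 to the copy number."""
    candidates = range(1, max(tumor_copy_number, 1) + 1)
    return max(
        candidates,
        key=lambda m: binomial_log_likelihood(
            alt_reads, depth, expected_vaf(m, purity, tumor_copy_number)),
    )


def clonal_fraction(alt_reads, depth, purity, tumor_copy_number):
    """Estimated fraction of tumor cells carrying the mutation, capped at 1."""
    m = most_likely_mutant_copies(alt_reads, depth, purity, tumor_copy_number)
    observed_vaf = alt_reads / depth
    return min(observed_vaf / expected_vaf(m, purity, tumor_copy_number), 1.0)


# 40 of 100 reads at 80% purity and copy number 2 is consistent with a clonal
# mutation; 10 of 100 reads suggests a subclonal ("branch") mutation.
print(clonal_fraction(40, 100, 0.8, 2))
print(clonal_fraction(10, 100, 0.8, 2))
```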

The immunogenic value combines the peptide-level value and the sample-level value. The system ranks each epitope by its immunogenic value and finally outputs the epitopes, their immunogenic values, and their ranking.

Drawings

FIG. 1 is a system flow diagram showing the overall flow and main steps of the system in operation.

FIG. 2 is an input and output flow diagram. The system takes as input the variant and sample information generated by next generation sequencing analysis, together with major histocompatibility complex information. The system outputs peptide chain sequences, peptide-level scores, sample-level scores, and an immunogenicity ranking.

FIG. 3 compares the score distributions of immunoreactive and non-reactive peptide chains based on T cell features: A) the features of example 5 include CD4+ cell-related self-similarity and antigenic homology; B) the features of example 6 include CD8+ immunity. The p-values in the figure were calculated with an independent two-sample test of the difference in medians.

FIG. 4 compares the score distributions of immunoreactive and non-reactive peptide chains based on antigen presentation and T cell features: A) the features of example 7 include CD4+ cell-related self-similarity, antigenic homology, and the features of example 4; B) the features of example 8 include CD8+ cell-related CD8+ immunity and the features of example 1. The p-values in the figure were calculated with an independent two-sample test of the difference in medians.

FIG. 5 compares the score distributions of immunoreactive and non-reactive peptide chains based on antigen presentation prediction values and T cell features: A) the features of example 9 include CD4+ cell-related self-similarity, antigenic homology, and the values predicted by the model of example 4; B) the features of example 10 include CD8+ cell-related CD8+ immunity and the values predicted by the model of example 1. The p-values in the figure were calculated with an independent two-sample test of the difference in medians.

FIG. 6 compares the score distributions of immunoreactive and non-reactive peptide chains based on antigen presentation information and both T cell features: A) example 11 includes example 5 and example 6; B) example 8 includes example 9 and example 10.

FIG. 7 shows the reactive peptide chains among the top 50 peptide chains ranked by peptide-level score. The bar graphs show the number of peptide chains experimentally confirmed to elicit a CD8+ response that fall within the top 50 peptide-level scores. Each graph represents one patient. The dashed line represents the total number of immunoreactive peptide chains in that patient.

Detailed Description

In some embodiments, the present invention discloses an integrated system and method for determining disease-specific epitopes, predicting their immunogenicity, and ranking them for subsequent patient-specific personalized treatment, a key step in precision medicine. The system and method combine sequencing-based variant calling, sequencing-based copy number determination, sequence alignment, similarity matrices, machine learning, optimization methods, and mathematical modeling to determine immunogenic epitopes accurately (as disclosed in FIG. 1). The system and method consider the components of cellular biochemical processes, tumor-specific properties, the antigen presentation process, and the immune activation process. The components of each process are computed as factors in the system according to their actual function in the cell, and each factor is then weighted according to its contribution to the immunogenicity of the epitope. Weighted factors help explain why an epitope is immunogenic and can advance clinical practice and research. The system considers factors representing peptide-level and sample-level information, uses them to score the immunogenicity of each epitope, and ranks the determined epitopes by their predicted immunogenic values.

The terms used in this document should be construed as describing the embodiments and claims. Grammatical variations of any term, such as changes of tense or word form, should not be construed as limiting the effect of this document, nor should other conventional synonyms of any term.

While various alternatives are possible, the invention is not limited to the specific methods or procedures described herein, and the specific embodiments described herein are exemplary only, and should not be construed as limiting the scope of the disclosure.

As used herein, the singular forms "a", "an", and "the" include plural referents unless the context clearly dictates otherwise.

The term "component" means a characteristic of a mutation or a gene or a specific step in a biochemical reaction of a cell or a specific property of a sample.

The term "factor" means the representation of a component in a calculation, wherein the factor may be calculated by a mathematical formula, predicted by a computational tool, or assigned to a category.

The term "peptide" refers to amino acid sequences of various lengths, which may or may not be immunogenic and may or may not be tumor-associated. The term "antigen" refers to a peptide chain that is immunogenic and recognized by the immune system. The term "epitope" means a short fragment of an antigen that can be presented on the cell surface, which epitope may be generated by cleavage of a long fragment of the antigen by the "proteasome".

The term "cancer vaccine" means a therapeutic vaccine intended to treat cancer by enhancing the human immune system. It should not be confused with a prophylactic vaccine, which is administered before the onset of disease.

The term "Major Histocompatibility Complex (MHC)" is intended to cover any of its variants and names, including but not limited to its classes, its alternative names such as "human leukocyte antigen (HLA)", and its loci such as A, B, C, DRB1, DPA1, DPB1, DQA1 and DQB1.

The term "mutation" means, unless otherwise specified, a nonsynonymous somatic mutation, including missense mutations, frameshift mutations and splice site mutations. The term "variant" includes mutations but also covers structural variations, including copy number variations, chromosomal rearrangements, fusions, translocations, and inversions. A somatic mutation is defined as a mutation that does not occur in germ cells but arises later in life, particularly during cancer development, and that may lead to tumor formation or to the development of an associated cancer.

The term "sequencing depth (total depth)" means the total number of sequencing reads covering a particular position in the genome.

In some embodiments, the present systems and methods may receive next generation sequencing data (as disclosed in FIG. 2). The data may be VCF files, SAM files, BAM files, FASTQ files, or any other unprocessed or processed files. A VCF file contains information on all mutations in the genome, including but not limited to mutant alleles, reference alleles, chromosomes, positions on chromosomes, variant frequencies, and sequencing depths. In some embodiments, the user must provide information on large-segment variation, including copy number variation, tumor purity, and loss of heterozygosity. In some embodiments, the system may receive a SAM file or a BAM file, from which this information can be obtained. In some embodiments, the system may receive FASTQ files, from which all of this information can be obtained after aligning the reads to a reference genome.
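As an illustration of how variant frequency and sequencing depth might be read from such an input, the sketch below parses one tab-separated VCF data line. It assumes a single-sample record whose FORMAT column carries the `AD` (allelic depths) and `DP` (read depth) keys, as many variant callers emit; the function name and the example record are hypothetical.

```python
def parse_vcf_record(line):
    """Extract position, alleles, depth, and variant frequency from one VCF data line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt = fields[:5]
    # FORMAT keys (column 9) describe the colon-separated sample values (column 10).
    sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
    ref_reads, alt_reads = (int(x) for x in sample["AD"].split(",")[:2])
    depth = int(sample["DP"]) if "DP" in sample else ref_reads + alt_reads
    counted = ref_reads + alt_reads
    return {
        "chrom": chrom,
        "pos": int(pos),
        "ref": ref,
        "alt": alt,
        "depth": depth,
        "variant_frequency": alt_reads / counted if counted else 0.0,
    }


record = "chr1\t12345\t.\tA\tT\t.\tPASS\t.\tGT:AD:DP\t0/1:60,40:100"
print(parse_vcf_record(record))
```

A production pipeline would more likely rely on an established VCF library and handle multi-allelic sites and multiple samples, which this sketch ignores.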

The present system may receive the major histocompatibility complex (MHC) type (as disclosed in FIG. 2). In some embodiments, the MHC comprises class I subtypes, including but not limited to A, B, and C, typed at four-digit resolution. In some embodiments, the MHC comprises class II subtypes, including but not limited to DRB1, DPA1, DPB1, DQA1, and DQB1, typed at four-digit resolution. In some embodiments, the MHC type may be derived from next generation sequencing data.
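A four-digit (two-field) type such as `HLA-A*02:01` could be checked before it enters the pipeline. The sketch below is illustrative only: the pattern, the function name, and the decision to accept exactly the loci listed above are assumptions.

```python
import re

# Optional "HLA-" prefix, a locus, then two colon-separated numeric fields.
HLA_PATTERN = re.compile(
    r"^(?:HLA-)?(A|B|C|DRB1|DPA1|DPB1|DQA1|DQB1)\*(\d{2,3}):(\d{2,3})$"
)
CLASS_I_LOCI = {"A", "B", "C"}


def parse_hla_type(name):
    """Validate a two-field HLA allele name and report its locus and class."""
    match = HLA_PATTERN.match(name.strip())
    if match is None:
        raise ValueError(f"not a four-digit HLA type: {name!r}")
    locus, field1, field2 = match.groups()
    return {
        "gene": locus,
        "class": "I" if locus in CLASS_I_LOCI else "II",
        "allele": f"{locus}*{field1}:{field2}",
    }


print(parse_hla_type("HLA-A*02:01"))
print(parse_hla_type("DRB1*15:01"))
```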

Factors used by the systems and methods for determining epitopes and predicting immunogenicity include but are not limited to: (i) variant frequency of the mutation, determined by variant calling; (ii) copy number variation; (iii) loss of heterozygosity at the mutation; (iv) tumor purity; (v) clonality of the mutant allele; (vi) homology to known antigen sequences, determined by sequence alignment (antigenic homology); (vii) similarity of the mutant peptide chain to the wild-type peptide chain, determined by calculating the binding affinity of each to MHC class I and taking the ratio of the two binding affinities; (viii) similarity of the mutant peptide chain to the wild-type peptide chain, determined by calculating the binding affinity of each to MHC class II and taking the ratio of the two binding affinities; (ix) gene expression level, determined from tissue-specific and disease-specific experimental data in public databases; (x) protein abundance, determined from tissue-specific and disease-specific experimental data in public databases; (xi) proteasome cleavage site propensity, determined from protein degradation data; (xii) TAP transport efficiency, determined from TAP transport rate data; (xiii) MHC class I binding affinity, determined from in vitro assay results; (xiv) MHC class II binding affinity, determined from in vitro assay results; (xv) MHC class I binding stability; (xvi) MHC class I allele dosage; (xvii) immunogenicity of the peptide chain sequence, determined from the results of in vitro and ex vivo T cell expansion experiments.
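Factors (vii) and (viii) reduce to a ratio of two predicted binding affinities. A minimal sketch follows, assuming affinities are expressed as IC50 values in nM (lower means stronger binding) and that the ratio is taken as wild type over mutant; the text does not fix the direction of the ratio, and the function names are hypothetical. The 1500 nM cutoff is the one stated in claim 9.

```python
def self_similarity(ic50_mutant_nm, ic50_wildtype_nm):
    """Ratio of wild-type to mutant IC50 (assumed direction).

    A value near 1 means the mutant binds much like the wild type; a value
    above 1 means the mutant binds more strongly than the wild type.
    """
    if ic50_mutant_nm <= 0 or ic50_wildtype_nm <= 0:
        raise ValueError("IC50 values must be positive")
    return ic50_wildtype_nm / ic50_mutant_nm


def passes_affinity_cutoff(ic50_mutant_nm, cutoff_nm=1500.0):
    """Keep epitopes whose MHC class I IC50 is below the cutoff from claim 9."""
    return ic50_mutant_nm < cutoff_nm


print(self_similarity(50.0, 500.0))
print(passes_affinity_cutoff(50.0), passes_affinity_cutoff(2000.0))
```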

The epitope is displayed on the cell surface via the antigen presentation process. Presentation of a cancer-specific epitope begins when a mutated gene is expressed to produce a mutant peptide. The proteasome cleaves the peptide into small fragments, which then enter the endoplasmic reticulum via TAP. In the endoplasmic reticulum, the peptide chains bind to the major histocompatibility complex, and the two are presented together on the cell surface for recognition by immune cells. Each step in this antigen presentation process affects the immunogenicity of the epitope.

A given tumor mutation is not necessarily found in all tumor cells. If an immunogenic epitope derived from a mutation is present in most tumor cells, immune cells have a better chance of recognizing and attacking most of the tumor, thereby destroying it. The proportion of cells containing the mutation (represented by a variant frequency of 0-100%) is therefore an important criterion for determining epitope immunogenicity. A higher variant frequency indicates that the mutation is present in most tumor cells, which affects the effectiveness of the immune attack. Other variant information, including copy number variation, loss of heterozygosity, tumor purity, and the clonality of mutant alleles, also reflects whether tumor cells will produce mutated epitopes that lead to immune attack on the tumor.

One prerequisite for the generation of an epitope is that the gene is expressed. Gene expression in tumor samples can be detected by next-generation sequencing (e.g., RNA sequencing), microarray, quantitative real-time polymerase chain reaction (PCR), Northern blot, or other assays. Tissue-specific and cancer-specific gene expression data can also be obtained from public databases. Although transcription is governed by a complex set of regulatory mechanisms, essentially any position in the genome can be transcribed, and low-expression genes can still be detected experimentally, which produces excessive noise. Public database data help avoid this noise and identify genes that are genuinely expressed: a gene that is expressed in the same diseased tissue in most people is likely to be commonly expressed in diseased cells. After a gene is expressed, its transcript is translated to generate epitopes. In some data sets, gene expression is reported qualitatively, for example as low, medium, or high. In these data sets, the qualitative data can be converted into numerical values such as 0, 1, 2, and 3. In other data sets, gene expression can be a value in any unit, such as a ratio or an arbitrary unit. In some embodiments, the machine learning model of the present system receives either raw or converted values. An unexpressed gene may be indicated as low, 0, or no expression. In other embodiments, genes that are not expressed are screened out. Conversely, genes that the above experimental data show to be highly expressed help determine the amount of epitope. An abundantly expressed epitope has a higher chance of contacting the major histocompatibility complex and is more readily presented on the cell surface.
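The conversion of qualitative expression calls into numbers can be sketched as follows. This is a minimal illustration, not the patented implementation: the label-to-number mapping, the function names, and the treatment of missing values as "no expression" are all assumptions made for the example.

```python
from typing import Dict, Optional, Union

# Hypothetical mapping from qualitative labels to numbers (an assumption).
QUALITATIVE_LEVELS = {"none": 0, "low": 1, "medium": 2, "high": 3}

ExpressionCall = Optional[Union[str, int, float]]


def expression_to_numeric(value: ExpressionCall) -> float:
    """Convert a qualitative label or a numeric reading to a float."""
    if value is None:
        # Assumption: a missing measurement is treated as no expression.
        return 0.0
    if isinstance(value, str):
        key = value.strip().lower()
        if key not in QUALITATIVE_LEVELS:
            raise ValueError(f"unknown expression label: {value!r}")
        return float(QUALITATIVE_LEVELS[key])
    return float(value)


def drop_unexpressed(genes: Dict[str, ExpressionCall]) -> Dict[str, float]:
    """Alternative embodiment: screen out genes with no expression."""
    numeric = {gene: expression_to_numeric(call) for gene, call in genes.items()}
    return {gene: level for gene, level in numeric.items() if level > 0}
```

Either the converted values or the filtered dictionary could then be passed to a downstream model as the gene expression feature.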

Protein abundance can be measured by mass spectrometry, immunofluorescence, immunohistochemistry, or Western blot. Protein abundance data can also be obtained from public databases. The abundance of the protein carrying the mutated epitope helps determine how readily the epitope binds to the major histocompatibility complex. An epitope, while potentially very immunogenic, may be present in very small amounts and thus be unable to elicit an immune response. In some data sets, protein abundance is reported qualitatively, for example as low, medium, or high. In these data sets, the qualitative data can be converted into numerical values such as 0, 1, 2, and 3. In other data sets, protein abundance may be a number in various units, such as a ratio or an arbitrary unit. In some embodiments, the machine learning model of the present system receives either raw or converted values. An undetected protein is represented as low, 0, or no expression. In other embodiments, proteins that are not detected are screened out. Conversely, proteins that the above experimental data show to be abundant help determine the amount of epitope. An epitope from an abundant protein has a higher chance of contacting the major histocompatibility complex and is more readily presented on the cell surface.

The present systems and methods determine the similarity between a mutated peptide chain and the corresponding non-mutated wild-type peptide chain. If a mutated peptide chain is very similar to the wild-type peptide chain, immune cells may treat the mutated peptide chain as self and tolerate its presence. The similarity of the mutant and wild-type peptide chains is assessed by calculating the binding affinity of each to the major histocompatibility complex and then taking the ratio of the two binding affinities. The calculation is performed for both major histocompatibility complex type one and type two.
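A minimal sketch of the affinity ratio follows. The text does not state which affinity goes in the numerator or which unit is used, so the direction (wild-type IC50 divided by mutant IC50) and the use of IC50 in nM are assumptions, as are the function names.

```python
from typing import Dict


def affinity_ratio(mutant_ic50_nm: float, wildtype_ic50_nm: float) -> float:
    """Ratio of wild-type to mutant IC50 (assumed direction).

    With IC50, a lower value means tighter binding, so a ratio above 1
    means the mutant peptide binds more tightly than the wild type.
    """
    if mutant_ic50_nm <= 0 or wildtype_ic50_nm <= 0:
        raise ValueError("IC50 values must be positive")
    return wildtype_ic50_nm / mutant_ic50_nm


def self_similarity(
    mutant_type_one: float,
    wildtype_type_one: float,
    mutant_type_two: float,
    wildtype_type_two: float,
) -> Dict[str, float]:
    """Compute the ratio separately for MHC type one and type two."""
    return {
        "type_one": affinity_ratio(mutant_type_one, wildtype_type_one),
        "type_two": affinity_ratio(mutant_type_two, wildtype_type_two),
    }
```

A ratio close to 1 would indicate that the mutation barely changes binding, which is the situation the paragraph above associates with tolerance.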

The present systems and methods determine the homology of a mutant peptide chain to known antigens. Known antigens are derived from bacteria, viruses, or other pathogens, and in most cases elicit a T cell immune response. If a mutant peptide chain is very similar to a known antigen, it is more likely to elicit an immune response. We use sequence alignment to determine the identity and the length of the identical region between the mutated peptide chain and the known antigen, and from these determine antigen homology. Homology refers to the proportion of the mutated peptide chain that is identical to the antigen sequence.
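The homology definition above (the proportion of the mutant peptide that is identical to a known antigen) can be illustrated with a deliberately simplified stand-in for sequence alignment. The sketch below uses the longest exact shared substring found by Python's `difflib`, not a real alignment tool; the function names and the choice to take the best score over several antigens are assumptions.

```python
from difflib import SequenceMatcher
from typing import Iterable


def antigen_homology(mutant_peptide: str, known_antigen: str) -> float:
    """Fraction of the mutant peptide covered by its longest exact match.

    Simplification: a real pipeline would use a sequence alignment tool;
    here only the longest identical contiguous stretch is considered.
    """
    if not mutant_peptide:
        raise ValueError("mutant_peptide must not be empty")
    matcher = SequenceMatcher(None, mutant_peptide, known_antigen, autojunk=False)
    match = matcher.find_longest_match(0, len(mutant_peptide), 0, len(known_antigen))
    return match.size / len(mutant_peptide)


def best_antigen_homology(mutant_peptide: str, antigens: Iterable[str]) -> float:
    """Highest homology of the peptide against a collection of known antigens."""
    return max((antigen_homology(mutant_peptide, a) for a in antigens), default=0.0)
```

For example, a peptide of eight residues that shares a five-residue stretch with an antigen would score 5/8 under this simplified measure.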

The epitope encounters the major histocompatibility complex in the endoplasmic reticulum. Before entering the endoplasmic reticulum, the mutant peptide chain must first be cleaved by the proteasome into epitopes of appropriate size. Proteasome cleavage site propensity is predicted as a value from 0 to 1. In the best case, the epitope does not itself contain a site that is likely to be cleaved by the proteasome, so it is less likely to be broken down before it is presented. The epitope then needs to be transported into the endoplasmic reticulum by the TAP protein. TAP transport efficiency can be expressed as an IC50 value, where a lower IC50 value represents more efficient transport. Efficiently transported epitopes have a high probability of contacting the major histocompatibility complex.

The epitope must bind to the major histocompatibility complex in order to be presented on the cell surface. Major histocompatibility complex type one can bind epitopes of 8-15 amino acids in length, but 8-11 amino acids are preferred. Major histocompatibility complex type two binds epitopes of 9-23 amino acids in length, but lengths of 15 and 16 amino acids are preferred. The anchor positions at which the epitope contacts the major histocompatibility complex vary with the type of major histocompatibility complex. The ability of particular amino acids of an epitope to bind at these anchor positions is important for antigen presentation and for the prediction of binding affinity. An IC50 value of less than 1500 nM or 1000 nM indicates better binding affinity to major histocompatibility complex type one or type two, and a value of less than 500 nM is more preferred, indicating that the epitope is most likely to bind to the major histocompatibility complex and be presented on the cell surface.
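The length ranges and IC50 cutoffs above can be expressed as a small screening helper. This is only a sketch: the numeric ranges come from the text, but the tier labels ("strong", "moderate", "weak", "non_binder") and the function names are hypothetical.

```python
# Length ranges taken from the description; keys "one"/"two" are MHC types.
ALLOWED_LENGTHS = {"one": range(8, 16), "two": range(9, 24)}
PREFERRED_LENGTHS = {"one": range(8, 12), "two": range(15, 17)}


def length_status(peptide: str, mhc_type: str) -> str:
    """Classify peptide length as preferred, allowed, or out_of_range."""
    if mhc_type not in ALLOWED_LENGTHS:
        raise ValueError(f"unknown MHC type: {mhc_type!r}")
    n = len(peptide)
    if n in PREFERRED_LENGTHS[mhc_type]:
        return "preferred"
    if n in ALLOWED_LENGTHS[mhc_type]:
        return "allowed"
    return "out_of_range"


def affinity_tier(ic50_nm: float) -> str:
    """Map an IC50 (nM) to a hypothetical tier using the 500/1000/1500 cutoffs."""
    if ic50_nm < 0:
        raise ValueError("IC50 must be non-negative")
    if ic50_nm < 500:
        return "strong"
    if ic50_nm < 1000:
        return "moderate"
    if ic50_nm < 1500:
        return "weak"
    return "non_binder"
```

In a learned model the raw IC50 would more likely be used as a continuous feature; the tiers here only make the thresholds concrete.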

In addition to binding affinity, binding stability is also an important factor in the antigen presentation pathway. An epitope may bind very strongly to the major histocompatibility complex and still fail to be presented if it does not stay bound for an adequate length of time. That is, if an epitope dissociates from the major histocompatibility complex before presentation, the epitope cannot be presented on the cell surface. The binding stability of the major histocompatibility complex (expressed as a half-life value from 0 to 1) represents how long the epitope stays bound to the major histocompatibility complex. The longer the binding time, the greater the chance that the epitope is presented on the cell surface.

Major histocompatibility complex type one immunogenicity represents the ability of the epitope's amino acid composition to elicit an immune response. Specific epitope sequences may interact biochemically with the T-cell receptor in a way that activates cytotoxic T cells. Triggering the T-cell receptor is the first step in immune attack. Major histocompatibility complex type one immunogenicity is a number from -1 to 1. A higher value represents a higher probability that the epitope causes T cell expansion.

Some individuals carry two copies of the same major histocompatibility complex allele, meaning that the alleles inherited from both parents are of the same type; this is called a homozygous pair. Homozygous major histocompatibility complex alleles may have an additive effect because of allele dosage. An individual who is homozygous for a major histocompatibility complex allele has more copies of the allele that can bind a given epitope, which increases the likelihood that the epitope will be presented on the cell surface. In addition, a higher amount of major histocompatibility complex on the cell surface also increases the probability that T cells recognize the epitope. Therefore, this dosage effect is also incorporated into the calculation.
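Allele dosage can be illustrated by counting how many copies of each allele appear in an individual's genotype. The sketch below is an assumption about the representation (a flat list of allele names); how the dosage then enters the model is not specified in the text and is not shown.

```python
from collections import Counter
from typing import Dict, List


def allele_dosage(genotype: List[str]) -> Dict[str, int]:
    """Count copies of each MHC allele; a count of 2 marks a homozygous pair."""
    return dict(Counter(genotype))


def homozygous_alleles(genotype: List[str]) -> List[str]:
    """Return the alleles carried in two or more copies, sorted by name."""
    return sorted(a for a, n in allele_dosage(genotype).items() if n >= 2)


# Hypothetical genotype used only for illustration.
example_genotype = ["HLA-A*02:01", "HLA-A*02:01", "HLA-B*07:02", "HLA-B*08:01"]
example_dosage = allele_dosage(example_genotype)
```

An epitope predicted to bind `HLA-A*02:01` in this hypothetical genotype would be associated with a dosage of 2, while one binding `HLA-B*07:02` would have a dosage of 1.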

In any of the above methods that use immune response data from individuals receiving immunotherapy, the absence of an immune response may be due to a defect in one of the factors responsible for the antigen presentation mechanism. Such defects disable the antigen presentation process, so that an epitope is not presented on the cell surface even if it is immunogenic. These conditions are confounding factors in the calculation of immunotherapy efficacy, and therefore individuals with defects in the antigen presentation mechanism are excluded from the calculation.

In any of the above methods, the weight of each factor is determined by the present decision system. The decision system comprises feature selection, machine learning, validation, and iterative model tuning and optimization. The features retained after feature selection are as follows:

Peptide chain level features

Major histocompatibility complex type one presentation: gene expression, protein abundance, proteasome cleavage site propensity, TAP transport efficiency, major histocompatibility complex type one binding affinity, major histocompatibility complex type one binding stability.

Major histocompatibility complex type two presentation: major histocompatibility complex type two binding affinity.

Helper T cell activity: self-similarity, antigen homology.

Cytotoxic T cell activity: self-similarity, antigen homology, major histocompatibility complex type one immunogenicity.

Sample level features

Clonality of the mutant allele, allele dosage of major histocompatibility complex type one.

Using the corresponding features, peptide chain level values can be calculated for four models: major histocompatibility complex type one presentation, major histocompatibility complex type two presentation, helper T cell activation, and cytotoxic T cell activation. In addition, the major histocompatibility complex type one presentation model includes one sample level feature, the allele dosage of major histocompatibility complex type one. We use the outputs of machine learning regressors to integrate these four models and their combinations. We also calculate peptide chain level values using iterative model tuning and optimization, and validate the models using known immunogenic epitopes. We then integrate any two or more models mathematically or analytically (e.g., by taking their product) to arrive at a final model.
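The product-based integration mentioned above can be sketched as follows. The model names, the assumption that each model outputs a single score per epitope, and the option to combine only a subset of models are illustrative choices, not details confirmed by the text.

```python
from math import prod
from typing import Dict, Sequence

# Hypothetical identifiers for the four peptide chain level models.
MODEL_NAMES = (
    "mhc1_presentation",
    "mhc2_presentation",
    "helper_t_activation",
    "cytotoxic_t_activation",
)


def peptide_level_value(
    model_scores: Dict[str, float],
    use: Sequence[str] = MODEL_NAMES,
) -> float:
    """Integrate two or more model scores by taking their product."""
    if len(use) < 2:
        raise ValueError("integrate at least two models")
    missing = [name for name in use if name not in model_scores]
    if missing:
        raise KeyError(f"missing model scores: {missing}")
    return prod(model_scores[name] for name in use)
```

Passing a shorter `use` tuple, such as the two type one related models, corresponds to integrating "any two or more models" rather than all four.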

We obtain sample-level values by calculating the clonality of the mutant alleles. Mutations that arise during the early stages of cancer progression are clonal mutations, or "trunk" mutations, so called because they occur in the trunk rather than in the branches of the tumor's evolutionary tree. Clonal mutations are present in most cancer cells. Determining clonality requires finding the expected variant frequency, assessing the statistical significance of the observed variant frequency, calculating the expected number of mutant alleles, and calculating the subclonal purity from the number of mutant alleles. The sample-level value is the ratio of the subclonal purity to the tumor purity.
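The final ratio described above can be sketched in a few lines. The text does not give the formula for the mutation-bearing purity, so the sketch swaps in a commonly used relation between variant frequency, purity, and copy number; that relation, the default copy numbers, the cap at 1.0, and all names are assumptions, and the statistical significance test is omitted.

```python
def mutation_bearing_purity(
    variant_frequency: float,
    tumor_purity: float,
    tumor_copy_number: float = 2.0,
    normal_copy_number: float = 2.0,
    mutant_copies: float = 1.0,
) -> float:
    """Estimate the fraction of the sample made up of mutation-bearing cells.

    Assumed relation (not from the source text):
    VAF = purity_of_mutant_cells * mutant_copies / average_copies_per_cell.
    """
    if not 0.0 < tumor_purity <= 1.0:
        raise ValueError("tumor_purity must be in (0, 1]")
    if mutant_copies <= 0:
        raise ValueError("mutant_copies must be positive")
    average_copies = (
        tumor_purity * tumor_copy_number
        + (1.0 - tumor_purity) * normal_copy_number
    )
    return variant_frequency * average_copies / mutant_copies


def sample_level_value(bearing_purity: float, tumor_purity: float) -> float:
    """Ratio of mutation-bearing purity to tumor purity, capped at 1.0."""
    if tumor_purity <= 0:
        raise ValueError("tumor_purity must be positive")
    return min(1.0, bearing_purity / tumor_purity)
```

Under these assumptions, a value near 1 corresponds to a mutation carried by essentially all tumor cells, and lower values correspond to mutations confined to a subset of them.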

The immunogenic value is obtained by integrating the peptide chain level value and the sample-level value. The model is tuned iteratively: it is rebuilt and retrained at every iteration, and the best-performing model is taken as the final model. An immunogenic value is calculated for each epitope, and the magnitude of the immunogenic value determines the rank the system assigns to each cancer-specific epitope.
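The ranking step can be sketched as below. The text does not say how the two values are integrated, so multiplying them is an assumption, as are the `Epitope` class and its field names; the sequences in the usage note are placeholders, not real epitopes.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Epitope:
    sequence: str
    peptide_value: float  # peptide chain level value
    sample_value: float   # sample-level value

    @property
    def immunogenic_value(self) -> float:
        # Assumption: the two values are integrated by multiplication.
        return self.peptide_value * self.sample_value


def rank_epitopes(epitopes: Iterable[Epitope]) -> List[Epitope]:
    """Order epitopes from highest to lowest immunogenic value."""
    return sorted(epitopes, key=lambda e: e.immunogenic_value, reverse=True)
```

For example, `rank_epitopes([Epitope("PLACEHOLDERA", 0.5, 0.5), Epitope("PLACEHOLDERB", 0.5, 1.0)])` places the second entry first, because a clonal mutation with the same peptide-level value yields a larger product.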

The models are constructed by machine learning and comprise the major histocompatibility complex type one presentation model, the major histocompatibility complex type two presentation model, the helper T cell activation model, the cytotoxic T cell activation model, and the final immunogenic value obtained by integrating these models with the sample-level values. Various types of machine learning models can be used for training, such as regression-based models, tree-based models, Bayesian models, support vector machines, boosting models, and neural network models.

The systems and methods of the present disclosure can be useful in cancer immunology. The system provides a set of procedures that can assist in patient therapy. The immunogenic epitopes identified by the present system can be used in various personalized medical treatments and various immunotherapies, for example immune checkpoint inhibitors, cancer vaccines, or adoptive cell transfer. In cancer vaccines and adoptive cell transfer therapies, the ranked epitopes provide a selection of high-potential peptide chains for vaccine production or for immune cell training. In immune checkpoint inhibitor therapy, the number of immunogenic epitopes can be used to predict the efficacy of the drug after administration. The system is suitable both for precision medicine aimed at individuals and for therapies aimed at broader populations.

Experimental examples
