Methods and compositions relating to the detection of recombination and rearrangement events

Document No.: 1431858. Publication date: 2020-03-17.

Reading note: This technology, "Methods and compositions relating to the detection of recombination and rearrangement events", was designed and created by Frederick W. Alt, Jiazhi Hu, Sherry Lin, Zhou Du, Yu Zhang, and Huan Chen on 2018-02-13. Main content: Described herein are methods and assays for detecting recombination events and/or rearrangement events in a cell. In some embodiments, the methods and/or assays involve linear amplification-mediated PCR (LAM-PCR). In some embodiments, the recombination event is a V(D)J recombination event.

1. A high-throughput genome-wide translocation sequencing (HTGTS)-based method for detecting recombination events and/or rearrangement events in a cell, the method comprising the steps of:

a. extracting genomic DNA and/or mRNA from the cells;

b. optionally, generating a fragmented DNA and/or mRNA sample;

c. generating a single-stranded PCR product from genomic DNA by linear amplification-mediated PCR (LAM-PCR) using at least one first locus-specific primer; and/or

generating cDNA from the mRNA by reverse transcription using at least one first locus-specific primer;

d. generating ligated DNA and/or cDNA products by ligating the single-stranded PCR products or cDNA generated in step (c) to an adaptor, wherein the adaptor comprises:

a distal portion of a known DNA sequence that can be used to design PCR primers for nested PCR amplification;

a proximal portion of random nucleotides; and

a 3' overhang;

e. amplifying a nucleic acid sequence comprising said recombination event and/or rearrangement event by nested PCR with adaptor-specific primers and at least one second locus-specific primer to produce nested PCR products using said ligation products of step (d);

f. optionally, digesting the PCR product of step (e) with a restriction enzyme to block unrearranged bait-containing fragments;

g. generating a sequenced nested PCR product by sequencing the nested PCR product; and

h. aligning the sequenced nested PCR products to a reference sequence or antigen receptor database.

2. The method of claim 1, wherein the recombination event is a V(D)J recombination event.

3. A method for high throughput repertoire sequencing-based detection of Ig repertoire sequences in a cell, the method comprising the steps of:

a. extracting genomic DNA and/or mRNA from the cells;

b. optionally, generating a fragmented DNA and/or mRNA sample;

c. generating a single-stranded PCR product from genomic DNA by linear amplification-mediated PCR (LAM-PCR) using at least one first locus-specific primer; and/or

generating cDNA from the mRNA by reverse transcription using at least one first locus-specific primer;

d. generating ligated DNA and/or cDNA products by ligating the single-stranded PCR products or cDNA generated in step (c) to an adaptor, wherein the adaptor comprises:

a distal portion of a known DNA sequence that can be used to design PCR primers for nested PCR amplification;

a proximal portion of random nucleotides; and

a 3' overhang;

e. using the ligation products of step (d) to amplify nucleic acid sequences comprising the Ig repertoire sequences by performing nested PCR with adaptor-specific primers and at least one second locus-specific primer to generate nested PCR products;

f. optionally, digesting the PCR product of step (e) with a restriction enzyme to block unrearranged bait-containing fragments;

g. generating a sequenced nested PCR product by sequencing the nested PCR product; and

h. aligning the sequenced nested PCR products to a reference sequence or antigen receptor database.

4. The method of claim 3, wherein the detected repertoire comprises V(D)J recombination events and/or somatic hypermutations (SHMs).

5. The method of claim 3 or 4, wherein the repertoires detected comprise an Ig heavy chain repertoire, an Ig light chain repertoire, a V usage repertoire, and a CDR3 repertoire.

6. The method of any one of claims 1-5, wherein the cell is selected from the group consisting of:

mature B lymphocytes, developing B lymphocytes, mature T lymphocytes, developing T lymphocytes, cells obtained from germinal centers, and cells obtained from Peyer's patches.

7. The method of any one of claims 1-6, wherein the method further comprises providing the cell, wherein the cell is obtained from an animal immunized with an antigen.

8. The method of any one of claims 1-7, wherein the method further comprises providing the cell, wherein the cell comprises a V(D)J exon that has undergone somatic hypermutation.

9. The method of claim 8, wherein the cell is a germinal center B lymphocyte or a Peyer's patch B lymphocyte.

10. The method of any one of claims 1-9, further comprising, prior to performing step (a), the steps of:

immunizing an animal with an antigen; and

obtaining cells from the animal.

11. The method of any one of claims 1-10, wherein the at least one first locus-specific primer specifically anneals to a J gene segment.

12. The method of any one of claims 1-11, wherein the method further comprises using a plurality of first locus-specific primers and/or a plurality of second locus-specific primers.

13. The method of claim 12, wherein each of the plurality of primers specifically anneals to a different V gene segment, D gene segment, and/or J gene segment.

14. The method of claim 13, wherein each of the plurality of primers specifically anneals to a different J gene segment present in the genome of the cell or organism prior to V(D)J recombination.

15. The method of claim 14, wherein the plurality of primers collectively specifically anneal to JH1, JH2, JH3, or JH4, respectively.

16. The method of claim 14, wherein the plurality of primers collectively specifically anneal to at least one sequence of each of a JH gene segment, a Jκ gene segment, and a JL gene segment, said JH gene segment, Jκ gene segment, and JL gene segment being present in the genome of the cell or organism prior to V(D)J recombination.

17. The method of any one of claims 1-16, wherein the at least one first locus-specific primer specifically anneals to a degenerate region of the targeted gene segment.

18. The method of any one of claims 1-17, further comprising, prior to performing step (a), the step of differentiating the source cell or source tissue to initiate V(D)J recombination.

19. The method of claim 18, wherein the source cell is an induced pluripotent stem cell.

20. The method of claim 18, wherein the source cells are primary stem cells.

21. The method of any one of claims 1-20, wherein prior to performing step (a), the cell or source is transduced with a RAG1/2 endonuclease to initiate V(D)J recombination.

22. The method of any one of claims 1-21, further comprising the step of contacting the cell with one or more agents that initiate V(D)J recombination or SHM.

23. The method of claim 22, wherein the agent that initiates V(D)J recombination is imatinib.

24. The method of claim 23, wherein the cell is a v-abl virus-transformed B-cell.

25. The method of any one of claims 1-24, wherein the rearrangement event involves an oncogene and/or a RAG off-target cleavage site.

26. The method of any one of claims 1-25, wherein the cell is selected from the group consisting of:

a cell expressing AID, a cancer cell, a cell expressing a RAG endonuclease, or a nervous system cell.

27. The method of any one of claims 1-26, wherein the first locus-specific primer comprises an affinity tag.

28. The method of claim 27, wherein the method further comprises: isolating the product of step (c) by affinity purification.

29. The method of any one of claims 27-28, wherein the affinity tag is biotin.

30. The method of claim 29, wherein the affinity purification comprises binding biotin with streptavidin.

31. The method of any one of claims 28-30, wherein the affinity purification comprises binding the product of step (c) to a substrate.

32. The method of claim 31, wherein the substrate is a bead.

33. The method of any one of claims 1-32, wherein the primers used in the nested PCR step comprise barcode sequences.

34. The method of any one of claims 1-33, wherein the fragmenting is performed by sonication or restriction enzyme digestion.

35. The method of any one of claims 1-34, wherein the fragmenting is performed by randomly shearing genomic DNA or by digestion with a frequently cutting restriction enzyme.

36. The method of any one of claims 1-35, wherein ligating the product of step (c) to an adaptor comprises contacting the product with a population of adaptors having the same distal portion sequence and random proximal portion sequence.

37. The method of any one of claims 1-36, wherein the proximal portion of the adaptor is 3-10 nucleotides in length.

38. The method of any one of claims 1-37, wherein the proximal portion of the adaptor is 5-6 nucleotides in length.

39. The method of any one of claims 1-38, wherein the adaptor comprises a barcode sequence between the distal portion and the proximal portion.

40. The method of any one of claims 1 to 39, wherein the PCR product produced in step (e) is size selected prior to sequencing.

41. The method of any one of claims 1-40, wherein the cells are present in the tissue prior to step (a).

42. The method of any one of claims 1-41, wherein the sequencing is performed using a next generation sequencing method.

43. The method of any one of claims 1-42, wherein the aligning step is performed by a non-human machine.

44. The method of claim 43, wherein the non-human machine comprises computer executable software.

45. The method of claim 43, further comprising providing a display module for displaying the results of the aligning step.

46. The method of any one of claims 1-45, wherein the result of the aligning step is a mutation profile across a set of V(D)J rearranged nucleotide or amino acid sequences.

47. The method of any one of claims 1-46, wherein the cell is a mammalian cell.

48. The method of any one of claims 1-47, wherein the blocking digestion step (f) is omitted.

49. The method of any one of claims 1-48, wherein no end repair is performed prior to step (c).

50. The method of any one of claims 1-49, wherein one or more of said primers comprises a sequence selected from the group consisting of SEQ ID NOs: 1-32 and SEQ ID NOs: 43-65.

51. The method of any one of claims 1-50, wherein one or more of the primers is selected from the group consisting of SEQ ID NOs: 1-32 and SEQ ID NOs: 43-65.

Technical Field

The technology described herein relates to the detection of recombination events and/or rearrangement events (e.g., V(D)J recombination) in cells by high-throughput genome-wide translocation sequencing (HTGTS)-based methods.

Background

Identification and characterization of V(D)J recombination events are of interest for understanding the immune system and for developing and optimizing antibody-based therapeutics. Existing DNA-based methods of detecting V(D)J recombination rely on the use of upstream and downstream degenerate V primers, which cover most (but not all) V(D)J exons and provide unequal coverage of the possible exons. Furthermore, such methods detect only rearranged sequences lying between the two primers, and thus miss RAG-generated junctions (joins) to most off-target sequences. RNA-based methods severely underestimate non-productive rearrangements, because these are transcribed at reduced levels, and miss many off-target rearrangements within a locus because those rearrangements are not expressed.

Disclosure of Invention

Described herein are enhanced HTGTS methods for detecting recombination events and/or rearrangement events, e.g., at Ig loci. The assays and methods described herein allow for the detection and characterization of any such event with greater sensitivity and less bias than existing methods.

In one aspect of any embodiment, described herein is a high-throughput genome-wide translocation sequencing (HTGTS)-based method for detecting recombination events and/or rearrangement events in a cell, the method comprising the steps of:

a. extracting genomic DNA and/or mRNA from the cells;

b. optionally, generating a fragmented DNA and/or mRNA sample;

c. generating a single-stranded PCR product from genomic DNA by linear amplification-mediated PCR (LAM-PCR) with at least one first locus-specific primer; and/or

generating cDNA from the mRNA by reverse transcription using at least one first locus-specific primer;

d. generating ligated DNA and/or cDNA products by ligating the single-stranded PCR products or cDNA generated in step (c) to an adaptor, wherein the adaptor comprises:

a distal portion of a known DNA sequence that can be used to design PCR primers for nested PCR amplification;

a proximal portion of random nucleotides; and

a 3' overhang;

e. amplifying a nucleic acid sequence comprising a recombination event and/or a rearrangement event by nested PCR with the adaptor-specific primer and the at least one second locus-specific primer to produce nested PCR products using the ligation products of step (d);

f. optionally, digesting the PCR product of step (e) with a restriction enzyme to block unrearranged bait-containing fragments;

g. generating a sequenced nested PCR product by sequencing the nested PCR product; and

h. aligning the sequenced nested PCR products to a reference sequence or antigen receptor database.

In some embodiments of any aspect, the recombination event is a V(D)J recombination event. In some embodiments of any aspect, the cell is selected from the group consisting of: mature B lymphocytes, developing B lymphocytes, mature T lymphocytes, or developing T lymphocytes. In some embodiments of any aspect, the method further comprises providing a cell, wherein the cell is obtained from an animal immunized with an antigen. In some embodiments of any aspect, the method further comprises providing a cell, wherein the cell comprises a V(D)J exon that has undergone somatic hypermutation. In some embodiments of any aspect, the cell is a germinal center B lymphocyte.

In some embodiments of any aspect, prior to performing step (a), the method further comprises the steps of: immunizing an animal with an antigen; and obtaining cells from the animal.

In some embodiments of any aspect, the method further comprises using a plurality of first locus-specific primers and/or a plurality of second locus-specific primers. In some embodiments of any aspect, the plurality of primers specifically anneal to different V gene segments, D gene segments, or J gene segments.

In some embodiments of any aspect, prior to performing step (a), the method further comprises the step of differentiating the source cell or source tissue to initiate V(D)J recombination. In some embodiments of any aspect, the source cell is an induced pluripotent stem cell. In some embodiments of any aspect, the source cell is a primary stem cell.

In some embodiments of any aspect, prior to performing step (a), the cell or source is transduced with a RAG1/2 endonuclease to initiate V(D)J recombination. In some embodiments of any aspect, the method further comprises the step of contacting the cell with one or more agents that initiate V(D)J recombination. In some embodiments of any aspect, the agent that initiates V(D)J recombination is imatinib.

In some embodiments of any aspect, the cell is a v-abl virus-transformed B cell.

In some embodiments of any aspect, the rearrangement event involves an oncogene and/or a RAG off-target cleavage site. In some embodiments of any aspect, the cell is selected from the group consisting of: cells expressing AID; cancer cells; a cell expressing a RAG endonuclease; or a nervous system cell.

In some embodiments of any aspect, the first locus-specific primer comprises an affinity tag. In some embodiments of any aspect, the method further comprises isolating the product of step (c) by affinity purification. In some embodiments of any aspect, the affinity tag is biotin. In some embodiments of any aspect, affinity purification comprises binding biotin with streptavidin. In some embodiments of any aspect, affinity purification comprises binding the product of step (c) to a substrate. In some embodiments of any aspect, the substrate is a bead.

In some embodiments of any aspect, the primers used in the nested PCR step comprise barcode sequences.
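As an illustration of how barcoded nested-PCR primers could be used downstream, the following minimal sketch sorts reads into per-sample bins by a barcode at the 5' end of each read. The barcode sequences, the sample names, and the `demultiplex` helper are hypothetical and chosen only for this example; exact prefix matching is assumed, with no tolerance for sequencing errors.

```python
from collections import defaultdict

# Hypothetical barcode-to-sample map; real barcodes depend on the primer design.
BARCODES = {"ACGT": "sample_1", "TGCA": "sample_2"}


def demultiplex(reads, barcodes=BARCODES):
    """Group reads by the barcode found at their 5' end and trim the barcode off."""
    bins = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                bins[sample].append(read[len(barcode):])
                break
        else:
            # No barcode matched exactly; keep the read for inspection.
            bins["unassigned"].append(read)
    return dict(bins)
```

Reads whose prefix matches no barcode are kept under `"unassigned"` rather than discarded, so that the unassigned fraction can be checked.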

In some embodiments of any aspect, the fragmenting is performed by sonication or restriction enzyme digestion. In some embodiments of any aspect, the fragmenting is performed by randomly shearing genomic DNA or by digestion with a frequently cutting restriction enzyme. In some embodiments of any aspect, ligating the product of step (c) to an adaptor comprises contacting the product with a population of adaptors having the same distal portion sequence and random proximal portion sequence.

In some embodiments of any aspect, the proximal portion of the adaptor is 3-10 nucleotides in length. In some embodiments of any aspect, the proximal portion of the adaptor is 5-6 nucleotides in length.

In some embodiments of any aspect, the adaptor comprises a barcode sequence between the distal portion and the proximal portion.
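The adaptor layout described above (a shared distal portion, an optional barcode, and a random proximal portion) can be sketched in silico as follows. This is only an illustrative model: the `DISTAL` sequence is a placeholder rather than a sequence from this disclosure, the function names are hypothetical, and strand orientation and the double-stranded structure of the adaptor are not modeled.

```python
import random

# Placeholder distal sequence for illustration only (not from the disclosure).
DISTAL = "ACGTACGTACGTACGTACGT"


def make_adaptor(distal, proximal_len=6, barcode="", rng=random):
    """Return one adaptor written as distal portion + barcode + random proximal portion."""
    if not 3 <= proximal_len <= 10:
        raise ValueError("proximal portion is described as 3-10 nucleotides long")
    proximal = "".join(rng.choice("ACGT") for _ in range(proximal_len))
    return distal + barcode + proximal


def make_population(distal, n, proximal_len=6, barcode="", rng=random):
    """Return n adaptors sharing the distal portion but differing in the proximal portion."""
    return [make_adaptor(distal, proximal_len, barcode, rng) for _ in range(n)]
```

Passing a seeded `random.Random` instance as `rng` makes the simulated population reproducible.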

In some embodiments of any aspect, the PCR product produced in step (e) is size selected prior to sequencing. In some embodiments of any aspect, prior to step (a), the cells are present in a tissue. In some embodiments of any aspect, the sequencing is performed using a next generation sequencing method. In some embodiments of any aspect, the step of aligning is performed by a non-human machine. In some embodiments of any aspect, the non-human machine comprises computer executable software.

In some embodiments of any aspect, the method further comprises providing a display module for displaying the results of the step of aligning.

In some embodiments of any aspect, the result of the aligning step is a mutation profile across a set of V(D)J rearranged nucleotide or amino acid sequences.
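One simple way to picture a mutation profile is as the per-position fraction of reads that differ from a germline reference. The sketch below assumes that the reads are already aligned to the germline sequence without gaps and share its coordinates; the function name and the handling of `N` as an uncalled base are illustrative choices, not part of the described method.

```python
def mutation_profile(germline, reads):
    """Return, for each germline position, the fraction of reads that differ from it.

    Assumes gap-free reads that start at germline position 0.
    """
    mismatches = [0] * len(germline)
    covered = [0] * len(germline)
    for read in reads:
        for i, (ref, base) in enumerate(zip(germline, read)):
            if base == "N":  # skip uncalled bases
                continue
            covered[i] += 1
            if base != ref:
                mismatches[i] += 1
    return [m / c if c else 0.0 for m, c in zip(mismatches, covered)]
```

Positions with no coverage are reported as 0.0 in this sketch; a real analysis would likely mask them instead.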

In some embodiments of any aspect, the cell is a mammalian cell. In some embodiments of any aspect, the blocking digestion step (f) is omitted. In some embodiments of any aspect, no end repair is performed prior to step (c).

Drawings

FIG. 1 depicts linear amplification-mediated high-throughput genome-wide translocation sequencing adapted for repertoire sequencing (HTGTS-Rep-seq), showing VH and DH usage in progenitor (pro-) B cells and splenic B cells. The VH repertoire from VDJH junctions, with productive (in-frame) or non-productive information, is shown on the left; D usage in DJH junctions is shown on the right. The libraries were generated using a JH4 coding-end primer, as indicated in the schematic at the top. Libraries were prepared from wild-type 129SVE DNA from purified pro-B cells and splenic B cells.

FIG. 2 depicts a schematic diagram of HTGTS-Rep-seq. A simplified IgH locus is shown at the top as an example. V(D)J sequences, along with DJ sequences and J germline sequences, were linearly amplified from reverse-transcribed total messenger RNA (mRNA) or from fragmented whole genomic DNA with JH-specific biotinylated primers. The amplified products were then enriched and prepared as HTGTS libraries (Frock et al, Nat Biotech, 2015; Hu et al, Nat Protoc, 2016) for paired-end sequencing on Illumina MiSeq or by other high-throughput sequencing methods. The sequencing data were then passed through custom pipelines for genomic alignment and IgBlast.

FIGS. 3A-3D. FIG. 3A depicts locus-wide views of V-DJ junctions or D-J junctions identified in the IgH locus of splenic B cells or bone marrow pro-B cells from two representative libraries. The white boxes represent JH segments and the shaded triangles represent the recombination signal sequences (RSSs). Arrows indicate primer sites and orientation. The black lines above the line graph indicate the positions of the V, D and J segments. Reading VH sequences from the upstream leader sequence to the downstream RSS is defined as the (+) orientation; the opposite direction is defined as (-). FIG. 3B depicts the proportion of pseudo-VH or functional VH used in the splenic B cell repertoire or pro-B cell repertoire. FIG. 3C depicts a locus-wide view of the V-J junctions identified in the Igκ locus of a v-Abl virus-transformed B cell line from a representative library. Labels are as in FIG. 3A. The gray box represents the pseudo-Jκ. RS refers to a bona fide RSS without an adjacent V or J segment; it is often used during V(D)J recombination to delete part of the Igκ locus on an allele that has produced a non-productive VJκ. FIG. 3D depicts the proportion of pseudo-Vκ or functional Vκ used in the v-Abl-transformed B cell line repertoire.

FIGS. 4A-4B show that this analysis can also be used to distinguish and quantify in-frame and out-of-frame V(D)J exons. FIG. 4A depicts a frequency plot of functional VH usage in splenic B cells or pro-B cells. The Y-axis shows the combined number of in-frame and out-of-frame reads for each VH from the representative libraries in FIGS. 3A-3D. Data were extracted after IgBlast analysis. FIG. 4B is as in FIG. 4A, but depicts Vκ in a v-Abl-transformed B cell line.
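The in-frame versus out-of-frame distinction can be illustrated with a deliberately simplified classifier. The sketch below assumes that the input is the coding sequence of a V(D)J exon beginning at the first base of a V-segment codon and ending on the J-segment reading frame, so that a length divisible by three implies an in-frame junction; in the analysis described here this information is extracted from IgBlast output, not computed this way. The function name and the labels are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def classify_exon(coding_seq):
    """Label a V(D)J coding sequence by reading frame and presence of stop codons."""
    seq = coding_seq.upper()
    if len(seq) % 3 != 0:
        return "out-of-frame"
    codons = (seq[i:i + 3] for i in range(0, len(seq), 3))
    if any(codon in STOP_CODONS for codon in codons):
        return "in-frame, stop codon"
    return "in-frame, no stop codon"
```

Under this simplification, only the last category would count as productive; out-of-frame exons and in-frame exons containing a stop codon would be non-productive.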

FIGS. 5A-5C depict two examples of stitched paired-end Illumina MiSeq sequences extracted from IgH or Igκ libraries. FIG. 5A depicts the distribution of VH lengths captured from a representative pro-B cell library. Approximately 33% of the recovered VDJ exons had a VH alignment longer than or equal to 285bp (3353/11431). This percentage could be greatly increased by using high-throughput sequencing methods that yield longer reads. FIG. 5B depicts an exemplary stitched paired-end read extracted from a representative JH4 library. The locus-specific primer downstream of JH4 is shown. JH segments and DH segments are shown. VH CDR1, VH CDR2 and VH CDR3 are shown. Adaptor sequences (ligated to the linearly amplified PCR fragments) are shown. FIG. 5B discloses SEQ ID NO: 33. FIG. 5C is as shown in FIG. 5A, but shows an exemplary stitched VJκ read. FIG. 5C discloses SEQ ID NO: 34.

FIG. 6 depicts a computer device or system 1000 that includes one or more processors 1030 and memory 1040, the memory 1040 storing one or more programs 1050 for execution by the one or more processors 1030.

FIGS. 7A-7F depict HTGTS-Rep-seq of the VHDJH and DJH repertoires in purified splenic B cells and partially enriched pro-B cells of C57BL/6 mice. FIG. 7A depicts a schematic of the murine IgH locus, showing the VH region, DH region, JH region, and CH region. The arrow denotes the JH4 coding-end bait primer. FIG. 7B depicts the VH repertoire, with both productive and non-productive information, from VHDJH junctions in pro-B cells (top) and IgM+ splenic B cells (bottom). As shown, some of the most commonly used VH segments are highlighted by arrows. FIG. 7C depicts the usage of functional VH and pseudo-VH across the 16 families in the HTGTS-Rep-seq libraries depicted in FIG. 7B. FIG. 7D depicts pie charts showing the average overall percentage of productive and non-productive VHDJH junctions from the libraries depicted in FIG. 7B. FIG. 7E depicts D usage in VHDJH junctions and DJH junctions in pro-B cells and IgM+ splenic B cells, as indicated. FIG. 7F depicts the DJH:VHDJH ratio in pro-B cells and IgM+ splenic B cells, as indicated. All data are shown as mean ± SEM, N = 3.
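For readers who want to reproduce summary statistics of the kind reported in these figures, the following sketch computes a DJH:VHDJH ratio per library and the mean ± SEM across replicates. The helper names are hypothetical, and the SEM is taken as the sample standard deviation divided by the square root of N, which is an assumption about how the values were computed.

```python
from math import sqrt
from statistics import mean, stdev


def dj_to_vdj_ratio(dj_junctions, vdj_junctions):
    """Ratio of DJH junction count to VHDJH junction count for one library."""
    if vdj_junctions == 0:
        raise ValueError("VHDJH junction count must be non-zero")
    return dj_junctions / vdj_junctions


def mean_sem(values):
    """Return (mean, SEM), with SEM = sample standard deviation / sqrt(N)."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two replicates are needed to estimate the SEM")
    return mean(values), stdev(values) / sqrt(n)
```

With N = 3 replicate libraries, `mean_sem` would be called on the three per-library ratios or percentages.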

FIGS. 8A-8C depict a comparison of the VHDJH and DJH repertoires in IgM+ splenic B cells obtained with the four JH baits. FIG. 8A depicts the VH repertoire, with productive and non-productive information, from VHDJH junctions in IgM+ splenic B cells using each JH coding-end bait primer, as shown (left), and pie charts showing the average overall percentage of productive and non-productive VHDJH junctions (right). FIG. 8B depicts a comparison of D usage in DJH junctions in IgM+ splenic B cells using each JH coding-end bait primer. FIG. 8C depicts a comparison of DJH:VHDJH ratios in IgM+ splenic B cells using each JH coding-end bait primer. Mean ± SEM, N = 3 for all data. Other analytical details are as described with respect to FIGS. 7A-7F.

FIGS. 9A-9C show HTGTS-Rep-seq of the VJκ repertoire in IgM+ splenic B cells of C57BL/6 mice using Jκ5 bait primers. FIG. 9A depicts a schematic of the murine Igκ locus showing Vκ and Jκ segments. The gray bars indicate functional Vκ segments, in convergent and tandem transcriptional orientations, respectively, relative to the downstream Jκ segments. The black bars represent pseudo-Vκ segments. The arrow indicates the Jκ5 coding-end bait primer. In FIG. 9B, left panels: the Vκ repertoire, with productive and non-productive information, from VJκ junctions in IgM+ splenic B cells, using the Jκ5 bait primer alone (top) or the Jκ5 bait primer from the combined Jκ bait primers (bottom). As shown, some Vκ segments utilized by the 4 different Jκ segments are highlighted by arrows. Right panels: pie charts showing the overall percentage of productive and non-productive VJκ junctions. Representative results from two replicates are shown. FIG. 9C depicts the usage of functional Vκ and pseudo-Vκ across the 20 families in the repertoire depicted in FIG. 9B.

FIGS. 10A-10B show that representative VHDJH repertoires can be generated from small amounts of starting genomic DNA. FIG. 10A depicts the VH repertoire, with productive and non-productive information, from VHDJH junctions in IgM+ splenic B cells (left), and pie charts showing the average overall percentage of productive and non-productive VHDJH junctions (right), where the libraries were generated with the JH4 bait from the indicated amounts of genomic DNA. Mean ± SEM, N = 3. FIG. 10B depicts VH usage by family, as in FIG. 7C.

FIGS. 11A-11B depict schematic diagrams of HTGTS-Rep-seq. FIG. 11A depicts a schematic of the generation of DJ rearrangements and VDJ rearrangements via V(D)J recombination, wherein V (dark gray), D (black) and J (light gray) segments are shown. Representative DJ and VDJ joining events are shown. FIG. 11B depicts a schematic overview of the HTGTS-Rep-seq method. Briefly, genomic DNA from a B cell population was sonicated and linearly amplified with a biotinylated primer that anneals downstream of one specific J segment. The biotin-labeled single-stranded DNA products were enriched with streptavidin beads, and their 3' ends were ligated in an unbiased manner to a bridge adaptor containing 6 random nucleotides (highlighted in the rectangular box). The products were then prepared for 2 × 300bp sequencing on Illumina MiSeq. The resulting reads were analyzed using the Ig/TCR repertoire analysis pipeline described herein.

FIGS. 12A-12F depict HTGTS-Rep-seq of the VHDJH and DJH repertoires in pro-B cells and IgM+ splenic B cells from 129SVE mice. FIG. 12A depicts a schematic of the murine IgH locus showing VH (functional, gray; pseudo, black), DH, and JH segments. The arrow denotes the JH4 coding-end bait primer. FIG. 12B shows the VH repertoire, with productive and non-productive information, from VHDJH junctions in pro-B cells (top) and IgM+ splenic B cells (bottom). As shown, some of the most commonly used VH segments are highlighted with arrows. Mean ± SEM, N = 3. FIG. 12C depicts the usage of functional VH or pseudo-VH across the 16 families in the HTGTS-Rep-seq libraries depicted in FIG. 12B. FIG. 12D depicts pie charts of the average overall percentage ± SEM of productive and non-productive VHDJH junctions in pro-B cells (top) and IgM+ splenic B cells (bottom). FIG. 12E depicts D usage in DJH junctions in pro-B cells and IgM+ splenic B cells, as indicated. Mean ± SEM, N = 3. FIG. 12F depicts a comparison of DJH:VHDJH ratios in pro-B cells and IgM+ splenic B cells, as indicated. Mean ± SEM, N = 3. The details of the analysis are as described with respect to FIGS. 7A-7F.

FIGS. 13A-13C depict a comparison of the VHDJH and DJH repertoires in IgM+ splenic B cells of 129SVE mice using four different JH baits. FIG. 13A depicts the VH repertoire, with productive and non-productive information, from VHDJH junctions in IgM+ splenic B cells using individual JH coding-end bait primers (left), and pie charts showing the average overall percentage ± SEM of productive and non-productive VHDJH junctions (right). FIG. 13B depicts a comparison of D usage in DJH junctions in IgM+ splenic B cells using each JH coding-end bait primer. FIG. 13C depicts a comparison of DJH:VHDJH ratios in IgM+ splenic B cells using each JH coding-end bait primer. Mean ± SEM, N = 3 for all groups. Other analytical details are as described with respect to FIGS. 7A-7F.

FIGS. 14A-14B illustrate the proportion of in-frame VHDJH junctions with respect to the length of the entire JH coding end for JH1-JH4. FIG. 14A depicts the sequences of JH1-JH4. The sequences were extracted from the mm9 genome and are highly conserved between 129SVE and C57BL/6. The sequence encoding WGXG is shown in red. JH length is marked with arrows, with 1 denoting the nucleotide closest to the bait primer. FIG. 14A discloses SEQ ID NOs: 35-38. In FIG. 14B, the line graph shows, for each JH bait, the number of V(D)J junctions per 10,000 total junctions that retain the designated JH length (right axis). The histogram shows the percentage of in-frame V(D)J exons at each retained JH length (left axis). Mean ± SEM, N = 3.

FIGS. 15A-15C depict VHDJH usage profiles of IgM+ splenic B cells from 129SVE mice using the four JH baits in combination. FIG. 15A depicts a schematic of the IgH locus as in FIGS. 7A-7F. Red arrows indicate the mixed primers that bind downstream of each JH. FIG. 15B depicts VH usage profiles separated by JH bait. A representative profile from two replicates of the combined-primer HTGTS-Rep-seq libraries is shown. FIG. 15C depicts D usage in DJH junctions in IgM+ splenic B cells using each JH coding-end bait primer.

FIGS. 16A-16B depict the Igκ repertoire in IgM+ splenic B cells of C57BL/6 mice using different Jκ baits. FIG. 16A depicts a schematic of the Igκ locus, as in FIG. 3. Arrows indicate the positions of the Jκ bait primers used. FIG. 16B depicts the Vκ usage profiles and overall productive/non-productive ratios of VJκ junctions, separated by Jκ bait, in IgM+ splenic B cells. In each group, a representative Vκ repertoire with productive and non-productive information from VJκ junctions is shown for an individual Jκ bait primer used alone (top) or for the same Jκ bait primer from the combined Jκ primers (bottom). As shown, some Vκ segments utilized by the 4 different Jκ segments are highlighted by arrows (see also FIG. 9). Representative results from two replicates are shown.

FIGS. 17A-17D depict CDR3 length distributions and consensus motifs for productive VHDJH and VJκ exons. FIG. 17A depicts the CDR3 length distribution of productive VHDJH exons in the C57BL/6 partially enriched pro-B cell libraries made with the JH4 bait primer. A consensus CDR3 motif plot was made for the subset of CDR3 sequences 11-13 aa in length, flanked at the two ends by a consensus cysteine and a consensus tryptophan. FIG. 17B: as in FIG. 17A, for the C57BL/6 splenic B cell libraries made with the JH4 bait primer. FIG. 17C: as in FIG. 17A, for the C57BL/6 splenic B cell libraries made with the four JH bait primers. Mean ± SEM, N = 3 (FIGS. 17A-17C). FIG. 17D: as in FIG. 17A, for the C57BL/6 splenic B cell libraries made with the Jκ5 primer. Note that we found some errors in our CDR3 sequence analysis, owing to the baseline level of sequencing errors in current high-throughput sequencing methods (including Illumina MiSeq) and to read lengths (600 bp maximum) that are insufficient to cover the entire sequence of long DNA fragments containing V(D)J exons. However, we eliminate such potential ambiguities by including only overlapping stitched reads in our analysis and/or by raising the threshold for read quality.
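A CDR3 length distribution and the motif subset described for these figures can be tabulated with a few lines of code. The sketch below assumes that CDR3 amino acid sequences have already been extracted (for example from IgBlast output) and that each sequence string includes its flanking cysteine and tryptophan; whether the 11-13 aa length in the figures counts those flanking residues is not stated, so that choice is an assumption. The function names are hypothetical.

```python
from collections import Counter


def cdr3_length_distribution(cdr3_aa_seqs):
    """Count CDR3 amino acid sequences by length."""
    return Counter(len(seq) for seq in cdr3_aa_seqs)


def motif_subset(cdr3_aa_seqs, min_len=11, max_len=13):
    """Keep CDR3s of min_len..max_len residues that start with C and end with W."""
    return [
        seq for seq in cdr3_aa_seqs
        if min_len <= len(seq) <= max_len
        and seq.startswith("C")
        and seq.endswith("W")
    ]
```

The filtered subset could then be passed to a sequence-logo tool to draw the consensus motif.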

FIGS. 18A-18B depict features of unique CDR3 reads. FIG. 18A depicts the proportion of unique CDR3 sequences from each technical replicate library of FIG. 10. Mean ± SEM, N = 3. FIG. 18B depicts the number of identical CDR3 sequences between technical replicate libraries at different amounts of starting material.

FIGS. 19A-19D depict VH usage, clonotype (CDR3) selection, and VH1-72 SHM patterns in splenic GC cells and naive B cells from three NP-CGG immunized (10 d) C57BL/6 mice.

FIGS. 20A-20B depict a comparison of VH usage between splenic naive B cells and PP naive B cells and between splenic GC B cells and PP GC B cells from three NP-CGG immunized (10 d) C57BL/6 mice.

FIGS. 21A-21D depict VH usage and clonotype (CDR3) selection of PP GC B cells versus PP naive B cells from WT C57BL/6 mice and AID-/- C57BL/6 mice.

FIGS. 22A-22B depict VH usage of PP GC B cells and PP naive B cells in different individual PPs from the same mouse.

FIG. 23 shows that the most highly enriched VHs in PP GC, VH1-47 and VH11-2, did accumulate mutations but did not show any recurrent selection of mutations in the CDR regions.

FIG. 24 depicts the experimental procedure used in Example 5 for the detection of IgH repertoires.

FIG. 25A depicts the positions of the JH1-JH4 primers, which were selected from highly degenerate regions. FIG. 25A discloses SEQ ID NOs: 39-42. FIG. 25B depicts a comparison of the usage of each JH in hVH1-2 DJH junctions between libraries made with the hVH1-2 bait and libraries made with mixed JH1-JH4 baits.

FIG. 26A depicts VH usage in spleen and PP GC in the indicated individual NP-CGG immunized mice. FIG. 26B depicts the SHM pattern of VH1-72 in PP GC.

FIGS. 27A-27B depict VH usage and VH11-2 clonotype selection in PP GC B cells and PP naive B cells from individual non-immunized mice.

FIGS. 28A-28B depict VH usage of PP GC B cells and PP naive B cells from individual AID-/- mice, and a comparison of the average VH repertoires of PP naive B cells between WT mice and AID-/- mice. FIG. 28C depicts VL usage of PP GC B cells and PP naive B cells from the indicated AID-/- mice.

FIG. 29A shows that the most commonly used VH in PP GC, VH1-72 (although not significantly enriched), accumulated intrinsic SHMs. FIG. 29B depicts the SHM patterns of the top VH1-47 clonotypes.

FIG. 30 depicts HTGTS-Rep-seq IGH, IGK and IGL repertoires from purified human peripheral blood B cells. The diagram shows IGH, IGL and IGK V usage obtained with primers for the JH4, Jλ2/3 and Jκ1 coding ends, respectively. In-frame and non-productive rearrangements are shown. Functional Vs are listed from most D/J-distal to most D/J-proximal (left to right), followed by the utilized pseudo Vs (#) and the unlocalized/orphan Vs (#).

Detailed Description

Described herein are robust linear amplification-mediated high throughput whole genome translocation sequencing (HTGTS) methods to identify recombination events and/or rearrangement events in cells. In some embodiments of any aspect, the recombination event is a V(D)J recombination event. The method is particularly useful for identifying recombinations and/or rearrangements at Ig loci.

Thus, the method is useful to anyone who wishes to, for example, identify and/or characterize V(D)J recombination. The same method can also be used to screen agents for their effect on V(D)J recombination.

In one aspect of any embodiment, described herein is a high throughput genome-wide translocation sequencing (HTGTS)-based detection method for recombination events and/or rearrangement events in a cell, the method comprising the steps of: (a) extracting genomic DNA and/or mRNA from the cells; (b) optionally, generating a fragmented DNA and/or mRNA sample; (c) i) generating single-stranded Polymerase Chain Reaction (PCR) products from genomic DNA by linear amplification mediated PCR (LAM-PCR) with at least one first locus-specific primer; and/or ii) generating complementary DNA (cDNA) from mRNA by reverse transcription using at least one first locus-specific primer; (d) generating ligated DNA and/or cDNA products by ligating the single-stranded PCR products or cDNAs generated in step (c) to an adaptor, wherein the adaptor comprises: a distal portion of a known DNA sequence that can be used to design PCR primers for nested PCR amplification; a proximal portion of random nucleotides; and a 3' overhang; (e) amplifying a nucleic acid sequence comprising the recombination event and/or rearrangement event by nested PCR with adaptor-specific primers and at least one second locus-specific primer using the ligation products of step (d) to generate nested PCR products; (f) optionally, digesting the PCR product of step (e) with a restriction enzyme to block unrearranged decoy-containing fragments; (g) generating a sequenced nested PCR product by sequencing the nested PCR product; and (h) aligning the sequenced nested PCR products to a reference sequence or antigen receptor database.

In one aspect of any embodiment, described herein is a method for high-throughput repertoire sequencing-based detection of Ig repertoire sequences in a cell, comprising the steps of:

a. extracting genomic DNA and/or mRNA from the cells;

b. optionally, generating a fragmented DNA and/or mRNA sample;

c. generating a single-stranded PCR product from genomic DNA by linear amplification-mediated PCR (LAM-PCR) using at least one first locus-specific primer; and/or

Generating cDNA from the mRNA by reverse transcription using at least one first locus-specific primer;

d. generating ligated DNA and/or cDNA products by ligating the single-stranded PCR products or cDNAs generated in step (c) to an adaptor, wherein the adaptor comprises:

a distal portion of a known DNA sequence that can be used to design PCR primers for nested PCR amplification;

a proximal portion of random nucleotides; and

a 3' overhang;

e. using the ligation products of step (d) to amplify nucleic acid sequences comprising the Ig repertoire sequences by performing nested PCR with adaptor-specific primers and at least one second locus-specific primer to generate nested PCR products;

f. optionally, digesting the PCR product of step (e) with a restriction enzyme to block unrearranged decoy-containing fragments;

g. generating a sequenced nested PCR product by sequencing the nested PCR product; and

h. aligning the sequenced nested PCR products to a reference sequence or antigen receptor database.

As used herein, "Ig repertoire" refers to a group of Ig gene sequences (or a portion of these sequences) produced in a cell or organism following at least one of V(D)J recombination, somatic hypermutation, activation, selection, and the like. The Ig repertoire of individual cells obtained from a single organism can vary. In detecting an Ig repertoire, all Ig sequences in a sample (e.g., a cell or a group of cells) can be detected, or portions of those sequences can be detected (e.g., the J gene segment used but not the V gene segment used; or the J gene segment used but not the SHM). The methods described herein are applicable to the detection of all portions of an Ig repertoire.

In some embodiments of any aspect, detecting the Ig repertoire comprises detecting at least a V(D)J recombination event and/or a somatic hypermutation (SHM). In some embodiments of any aspect, detecting the Ig repertoire comprises detecting one or more of an Ig heavy chain, an Ig light chain, V usage, D usage, J usage, and CDR repertoire.

Methods for extracting genomic DNA or mRNA are well known in the art, see, e.g., Tan and Yiap, J Biomed Biotechnol 2009; and Varma et al., Biotechnol J 2007 2:386-392; each of which is incorporated by reference herein in its entirety. In some embodiments of any aspect, the genomic DNA or mRNA extraction can be performed using commercially available kits, e.g., the WIZARD Genomic DNA Purification Kit (Cat. No. A1120; Promega, Madison, WI) or the ReliaPrep™ RNA Cell and Tissue Miniprep System (Cat. No. Z6010; Promega, Madison, WI).

The DNA and/or mRNA sample may be fragmented by any method known in the art, including but not limited to sonication, restriction enzyme digestion, random shearing, restriction with frequent cutting restriction enzymes, nebulization, acoustic shearing, point-sink shearing, needle shearing, and a French press. In some embodiments of any aspect, the fragmenting of the nucleic acid sample can be performed by restriction enzyme digestion. Frequent cutter enzymes are well known to those skilled in the art; they typically recognize 4 bp sites, and their effect on a target genome can be screened in silico using the target genomic sequence as a template. For example, MspI is a suitable frequent cutter enzyme for human cells, but the skilled person can easily substitute another enzyme according to the needs of any given genome. As used herein, the term "fragmented DNA sample" or "fragmented mRNA sample" refers to a nucleic acid sample that has undergone a fragmentation process such that a statistically significantly greater number of double-strand breaks (DSBs) are present in the sample than prior to the fragmentation process. In some embodiments of any aspect, the fragmented nucleic acid sample no longer comprises intact chromosomes. The skilled person can readily select a fragmentation process (including its intensity and duration) that will provide a desired degree of fragmentation, e.g., one that will produce a population of nucleic acid molecules of a desired size.
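By way of non-limiting illustration, the in silico screening of a frequent cutter enzyme mentioned above might be sketched as follows. The sketch assumes the MspI recognition site CCGG with the top strand cut after the first C; the function name and the toy sequence are hypothetical and are not taken from this disclosure.

```python
import re


def digest_fragment_lengths(sequence: str, site: str = "CCGG", cut_offset: int = 1) -> list[int]:
    """Return fragment lengths after cutting `sequence` at every occurrence of `site`.

    `cut_offset` is the position within the site at which the top strand is cut
    (1 for MspI, which cuts C^CGG). Only the top strand is modeled.
    """
    seq = sequence.upper()
    # A zero-width lookahead also finds overlapping occurrences of the site.
    cut_positions = [m.start() + cut_offset
                     for m in re.finditer(f"(?={re.escape(site)})", seq)]
    boundaries = [0] + cut_positions + [len(seq)]
    return [end - start for start, end in zip(boundaries, boundaries[1:]) if end > start]


# Hypothetical toy sequence used only to exercise the function.
toy_sequence = "ATGCCGGTTAACCGGATCGATCCGGAAT"
fragment_lengths = digest_fragment_lengths(toy_sequence)
print(fragment_lengths)
```

Applied to a real target genome, the resulting length distribution may help the skilled person judge whether a given enzyme yields fragments of a desired size.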

In some embodiments of any aspect, the fragmenting of the nucleic acid sample can be performed by sonication. Sonication provides random, unbiased fragmentation, as opposed to specific fragmentation by restriction digestion (e.g., as described in U.S. patent publication 20140234847; incorporated herein by reference in its entirety). In some embodiments of any aspect, the end repair is performed after fragmentation prior to LAM-PCR. In some embodiments of any aspect, the end repair is not performed after fragmentation but before LAM-PCR.

In some embodiments of the various aspects described herein, the genomic DNA and/or mRNA is sheared, rather than digested by specific frequent cutter enzymes. Enzymatic digestion may bias which junctions are enriched across the genome.

In some embodiments of any aspect, the methods and compositions described herein involve performing PCR. PCR refers to the process of specifically amplifying (i.e., increasing the abundance of) a nucleic acid sequence of interest, and in some embodiments of any aspect, exponential amplification occurs when the product of a previous polymerase extension serves as a template for successive rounds of extension. The PCR amplification protocol according to the present invention comprises at least one (e.g. at least 1, at least 2, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35 or more) iterative cycle, wherein each cycle comprises the steps of: 1) strand separation (e.g., heat denaturation); 2) annealing the oligonucleotide primer to the template molecule; and 3) nucleic acid polymerase extension of the annealed primers. The conditions and time required for each of these steps can be designed by one of ordinary skill in the art. The amplification protocols according to the methods described herein are preferably performed in thermal cyclers, many of which are commercially available.

Linear amplification mediated PCR (LAM-PCR) is a PCR in which a primer to a known sequence (bait) is used to generate single-stranded DNA (ssDNA) from a target nucleic acid sequence, wherein the PCR product includes the sequence downstream of the site of primer annealing. The sequence of the PCR product may be unknown, for example, if a recombination event and/or a rearrangement event has occurred in the vicinity of the bait sequence. The ssDNA is then converted to double-stranded DNA (dsDNA), and further PCR amplification reactions can be performed. LAM-PCR is described in further detail in, e.g., Schmidt et al., Nature Methods 2007 4:1051-7; U.S. Patent No. 6,514,706; U.S. Patent Application US 2007/0037139; and Harkey et al., (2007) Stem Cells Dev, June; 16(3):381-392; each of which is incorporated by reference herein in its entirety. In some embodiments of any aspect, the LAM-PCR step can produce a single-stranded PCR product from genomic DNA.
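The difference between linear amplification with a single bait primer and exponential amplification with a primer pair can be illustrated with an idealized calculation. The following is a simplified sketch only: the per-cycle efficiency model, the function names, and the template and cycle numbers are illustrative assumptions rather than parameters of the methods described herein.

```python
def linear_copies(templates: float, cycles: int, efficiency: float = 1.0) -> float:
    """Single-primer (linear) amplification: only the original templates are
    copied in each cycle, so product accumulates in proportion to cycle number."""
    return templates * cycles * efficiency


def exponential_copies(templates: float, cycles: int, efficiency: float = 1.0) -> float:
    """Two-primer PCR: products of earlier cycles also serve as templates,
    so the number of molecules grows geometrically."""
    return templates * (1.0 + efficiency) ** cycles


# Hypothetical numbers chosen only for illustration.
print(linear_copies(1000, 50))       # ssDNA products after 50 linear cycles
print(exponential_copies(1000, 10))  # molecules after 10 exponential cycles
```

Under this idealized model, ten exponential cycles yield more molecules than fifty linear cycles, which is consistent with the use of a subsequent nested PCR to amplify the ligated LAM-PCR products.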

In some embodiments of any aspect, the methods and compositions described herein involve performing a reverse transcriptase reaction, for example, by generating cDNA using an RNA template, a primer, and an RNA-dependent DNA polymerase. Protocols and reagents for performing reverse transcription are well known in the art and are commercially available. In some embodiments of any aspect, the reverse transcription step can produce a cDNA product from the mRNA.

In some embodiments of any aspect, the LAM-PCR step is performed using a first locus-specific primer. In some embodiments of any aspect, the reverse transcription step is performed using a first locus-specific primer.

The first locus-specific primer is a primer capable of specifically annealing to: a known sequence of at least one V, D or J segment; sequences flanking the V, D or J segments; or sequences flanking sequences known/suspected to be involved in rearrangement. In some embodiments of any aspect, the first locus-specific primer is a primer capable of specifically annealing to a known sequence of at least one V segment, D segment, or J segment. In some embodiments of any aspect, the first locus-specific primer is a primer capable of specifically annealing to: sequences flanking the V, D or J segment, e.g., sequences within 10bp, 20bp, 30bp, 50bp, 100bp, 200bp, 300bp, 400bp, 500bp, or 1kb of the V, D or J segment. In some embodiments of any aspect, the first locus-specific primer is a primer capable of specifically annealing to: sequences flanking the V segment, D segment, or J segment, e.g., sequences within 10bp, 20bp, 30bp, 50bp, 100bp, 200bp, 300bp, or 400bp of a V segment, D segment, or J segment. In some embodiments of any aspect, the first locus-specific primer is a primer capable of specifically annealing to: sequences flanking sequences known to be involved in or suspected of being involved in rearrangement, for example, sequences within 10bp, 20bp, 30bp, 50bp, 100bp, 200bp, 300bp, 400bp, 500bp, or 1kb of sequences known to be involved in rearrangement or suspected of being involved in rearrangement. In some embodiments of any aspect, the first locus-specific primer is a primer capable of specifically annealing to: sequences flanking sequences known to be involved in or suspected of being involved in rearrangement, for example, sequences within 10bp, 20bp, 30bp, 50bp, 100bp, 200bp, 300bp, or 400bp of sequences known to be involved in rearrangement or suspected of being involved in rearrangement.

In some embodiments of any aspect, a plurality of first locus-specific primers and/or a plurality of second locus-specific primers can be used, e.g., to detect recombination and/or rearrangement at multiple loci, and/or to detect multiple individual recombination events and/or rearrangement events at the same locus. In some embodiments of any aspect, a plurality of first locus-specific primers and/or a plurality of second locus-specific primers can be used, e.g., to detect a plurality of possible recombination events and/or rearrangement events, e.g., to screen for which of a plurality of possible events has occurred. In some embodiments of any aspect, the plurality of first locus-specific primers or the plurality of second locus-specific primers specifically anneal to: a different V gene segment, D gene segment, or J gene segment; sequences flanking different V, D or J segments; different portions of the same V gene segment, D gene segment, or J gene segment; and/or different sequences flanking the same V, D or J segment. In some embodiments of any aspect, one or more of the LAM-PCR, reverse transcription, and/or nested PCR steps can be performed in a multiplexed fashion, e.g., multiple primers are present in the same reaction mixture. In some embodiments of any aspect, the multiple primers are present in separate reaction mixtures, e.g., they are used in parallel.

In some embodiments of any aspect, the at least one first locus-specific primer specifically anneals to a J gene segment. In some embodiments of any aspect, a plurality of first locus-specific primers are used, and each first locus-specific primer specifically anneals to a different J gene segment. In some embodiments of any aspect, a plurality of first locus-specific primers are used, and the first locus-specific primers collectively specifically anneal to each of the different J gene segments present in the genome of the cell or organism, as those gene segments exist prior to V(D)J recombination. In some embodiments of any aspect, a plurality of first locus-specific primers are used, and the first locus-specific primers collectively specifically anneal to JH1, JH2, JH3 and JH4. In some embodiments of any aspect, a plurality of first locus-specific primers are used, and the first locus-specific primers collectively specifically anneal to each of the different JH gene segments, Jκ gene segments, and Jλ gene segments present in the genome of the cell or organism prior to V(D)J recombination.

In some embodiments of any aspect, a plurality of first locus-specific primers are used, and each first locus-specific primer specifically anneals to a different V gene segment, D gene segment, and/or J gene segment. In some embodiments of any aspect, a plurality of first locus-specific primers are used, and the first locus-specific primers collectively specifically anneal to each of the different V gene segments, D gene segments, and/or J gene segments present in the genome of the cell or organism, as those gene segments exist prior to V(D)J recombination.

In some embodiments of any aspect, the first locus-specific primer specifically anneals to a degenerate region of a target gene segment. In some embodiments of any aspect, the first locus-specific primer specifically anneals to the most degenerate region of the target gene segment.

In some embodiments of any aspect, the first locus-specific primer can comprise an affinity tag, e.g., to perform affinity purification using a substrate with an appropriate affinity domain. The affinity domain and tag pair can complex the two molecules by non-covalent means. In some embodiments of any aspect, the first locus-specific primer can comprise an affinity tag that can specifically bind to an affinity domain. Many affinity tags and domains are well known in the art and are described in, for example, Lichty et al., Protein Expr Purif 2005 41:98-105; Zhao et al., J Analytical Methods in Chemistry 2013; Kimple et al., Current Protocols in Protein Science 2004 36: 939: 9.1-9.9.19; and Giannone et al., Methods and Protocols "Protein Affinity Tags" Humana Press 2014; each incorporated herein by reference in its entirety. Non-limiting examples of compatible affinity domain and affinity tag pairings can include: an antibody or antigen-binding fragment thereof and an epitope; an anti-His antibody or antigen-binding fragment thereof and a His tag; an anti-HA antibody or antigen-binding fragment thereof and an HA tag; an anti-FLAG antibody or antigen-binding fragment thereof and a FLAG tag; an anti-myc antibody or antigen-binding fragment thereof and a myc tag; an anti-V5 antibody or antigen-binding fragment thereof and a V5 tag; an anti-GST antibody or antigen-binding fragment thereof and a GST tag; an anti-MBP antibody or antigen-binding fragment thereof and an MBP tag; an aptamer and a target molecule recognized by the aptamer; and streptavidin and biotin. In some embodiments of any aspect, the affinity tag and/or affinity domain is located at or near one end of the molecule, e.g., within 10 nucleotides of the end.
The affinity tag and/or affinity domain may be, but is not limited to: antibodies, antigens, lectins, proteins, peptides, nucleic acids (DNA, RNA, PNA, and nucleic acids that are mixtures thereof or that comprise nucleotide derivatives or analogs); receptor molecules, such as the insulin receptor; ligands for receptors (e.g., insulin for the insulin receptor); and biological, chemical or other molecules having affinity for another molecule. In some embodiments of any aspect, the affinity domain can be an aptamer.

One example of using an affinity domain and an affinity tag to complex two molecules is biotin-avidin conjugation or biotin-streptavidin conjugation. In this method, one of the molecules to be conjugated together (e.g., a nuclease or a template nucleic acid) is biotinylated and the other molecule is conjugated to avidin or streptavidin. A number of commercial kits are available for biotinylating molecules such as proteins. For example, aminooxy-biotin (AOB) can be used to covalently attach biotin to molecules having aldehyde or ketone groups. In addition, primers can be coupled to biotin acceptor peptides, such as AviTag or Acceptor Peptide (referred to as AP; Chen et al., 2 Nat. Methods 99 (2005)). The acceptor peptide sequence allows for site-specific biotinylation by the E. coli enzyme biotin ligase (BirA; Id.). Another non-limiting example of conjugation using an affinity domain/affinity tag is the biotin sandwich method. See, e.g., Davis et al., 103 PNAS 8155 (2006). In this method, two molecules to be conjugated together are biotinylated and then conjugated together using tetravalent streptavidin. In some embodiments of any aspect, the affinity tag can be biotin.

In some embodiments of any aspect, the method may further comprise isolating the PCR product (LAM-PCR product or reverse transcription product) produced in step (c) by affinity purification. In some embodiments of any aspect, affinity purification can comprise binding the PCR products and/or reverse transcription products produced in step (c) to a substrate (e.g., beads and/or columns). In some embodiments of any aspect, the substrate can be a bead. In some embodiments of any aspect, affinity purification can include binding biotin with streptavidin, e.g., binding biotin-labeled PCR products to streptavidin-containing beads, substrates, and/or columns.

The product resulting from reverse transcription and/or PCR with the first locus specific primer (optionally after isolation (e.g. affinity purification)) may be ligated to an adapter molecule. In the ligation step, one typically uses nucleic acid (e.g., DNA) at a concentration of less than 1.5 ng/microliter. Different concentrations from about 1.0 ng/microliter to about 2.5 ng/microliter can be used, and the skilled person will be able to optimize the nucleic acid concentration using routine methods.

Adaptor molecules are double-stranded oligonucleotides, e.g., dsDNA molecules comprising a distal portion of known DNA sequence that can be used to design PCR primers for nested PCR amplification; and a proximal portion comprising random nucleotides and a 3' overhang. In some embodiments of any aspect, the 3' end of the distal portion and the 3' end of the proximal portion of the adaptor are modified to prevent self-ligation, e.g., by providing a 3' dideoxynucleotide (e.g., 3' ddC). In some embodiments of any aspect, the end of the adaptor that does not contain the 3' overhang (e.g., the end that comprises the distal portion) is a blunt end. In some embodiments of any aspect, the 3' overhang can anneal to the ssDNA PCR products and/or reverse transcription products.

In some embodiments of any aspect, the proximal portion of the adaptor can be 3-10 nucleotides in length. In some embodiments of any aspect, the proximal portion of the adaptor can be 5-6 nucleotides in length. In some embodiments, some of the nucleotides of the proximal portion may be fixed rather than random.

In some embodiments of any aspect, the proximal portion of the adaptor molecule may consist of the 3' overhang.

In some embodiments of any aspect, the adapter can further comprise a barcode sequence, for example, between the distal portion and the proximal portion. In some embodiments of any aspect, the distal portion of the adapter comprises a sequence complementary to the adapter-specific primer used in the nested PCR step.

In some embodiments of any aspect, ligating the single stranded PCR product to an adaptor can comprise: the PCR products are contacted with a population of adaptors having the same distal portion and different random proximal portion sequences.
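As a non-limiting sketch of such an adaptor population, the strands carrying the random proximal portion can be enumerated computationally. The distal sequence below is a hypothetical placeholder and not a sequence of this disclosure; the sketch assumes that the random proximal portion forms the 3' end of the strand that carries the distal portion, and that every position of the proximal portion is fully random.

```python
from itertools import product

# Hypothetical placeholder for the known distal portion (not from this disclosure).
DISTAL_TOP = "GACTGACTGACTGACTGACT"


def random_overhangs(length: int) -> list[str]:
    """Enumerate every possible random proximal portion of the given length."""
    return ["".join(bases) for bases in product("ACGT", repeat=length)]


def adaptor_top_strands(distal: str, overhang_length: int) -> list[str]:
    """Build strands sharing one distal portion and differing only in the
    random proximal portion at the 3' end."""
    return [distal + overhang for overhang in random_overhangs(overhang_length)]


population = adaptor_top_strands(DISTAL_TOP, 5)
print(len(population))  # number of distinct adaptors for a 5-nt random portion
```

A random proximal portion of n nucleotides gives 4^n distinct adaptors, so that, under these assumptions, some member of the population can anneal to the 3' end of essentially any single-stranded product.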

In the nested PCR step, a PCR reaction is performed using primers that anneal to amplified sequences resulting from the first reaction (e.g., LAM-PCR reaction and/or reverse transcription reaction) to increase the specificity of the end product. Thus, nested PCR performed on the ligated DNA products with the adaptor-specific primers and the at least one second locus-specific primer will amplify and/or replicate the nucleic acid sequence around the recombination sites and/or the rearrangement sites. Theoretically, there is no minimum or maximum as to how many rounds of nested PCR can be used. In some embodiments of any aspect, the nested PCR comprises at least 1 round, at least 2 rounds, or at least 3 rounds. In some embodiments of any aspect, the nested PCR comprises 1, 2, or 3 rounds. In some embodiments of any aspect, the nested PCR comprises 1 round, 2 rounds, 3 rounds, 1 round-2 rounds, 1 round-3 rounds, or 1 round-5 rounds. More rounds may be less useful because they would only further amplify sequences that are already in excess; nested PCR (typically 2 rounds) is used to increase the specificity of the amplification reaction by using separate primer sets for the same locus. In some embodiments of any aspect, the third round or third reaction may add a barcode required for sequencing, such as 454 sequencing. Such a third round or reaction can be skipped if barcode primers are used in round 2 (or the nested PCR step) or if other sequencing methods are used that do not require additional barcodes. In some aspects of all embodiments of the invention, 1 round of nested PCR and an additional 1 round are performed to introduce tags or labels into the PCR products, allowing the application of specific sequencing protocols to analyze the sequence of recombination sites and/or rearrangement sites.
In some aspects of all embodiments of the invention, 2 rounds of nested PCR and an additional 1 round are performed to introduce tags or labels into the PCR products, allowing the application of specific sequencing protocols to analyze the sequence of recombination sites and/or rearrangement sites.

In some embodiments of any aspect, the second locus-specific primer used in the nested PCR step can overlap with the first locus-specific primer used in the LAM-PCR or reverse transcription step. In some embodiments of any aspect, the primers are designed such that the 3' end of the second locus-specific primer anneals closer to the recombination site and/or rearrangement site (e.g., closer by at least 1 nucleotide, by 1-2 nucleotides, by 1-3 nucleotides, by 1-5 nucleotides, etc.) than the 3' end of the first locus-specific primer. In some embodiments of any aspect, the sequence of the second locus-specific primer can comprise a portion of the sequence of the first locus-specific primer. In some embodiments of any aspect, the sequence of the second locus-specific primer can comprise a 3' portion of the sequence of the first locus-specific primer. In some embodiments of any aspect, the sequence of the second locus-specific primer can comprise the sequence of the first locus-specific primer.
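The relative placement of the first and second locus-specific primers can be checked computationally. The following minimal sketch assumes that both primers are written 5' to 3' on the same strand of a known bait region and extend toward the junction; the reference sequence, primer sequences, junction coordinate, and function names are hypothetical examples only.

```python
def three_prime_end(reference: str, primer: str) -> int:
    """Return the index just past the 3' base of `primer` on `reference`."""
    start = reference.find(primer)
    if start < 0:
        raise ValueError("primer does not anneal to the reference strand")
    return start + len(primer)


def is_properly_nested(reference: str, first_primer: str,
                       second_primer: str, junction_pos: int) -> bool:
    """True if the 3' end of the second primer lies closer to the junction
    than the 3' end of the first primer, without extending past the junction."""
    first_end = three_prime_end(reference, first_primer)
    second_end = three_prime_end(reference, second_primer)
    return first_end < second_end <= junction_pos


# Hypothetical bait region; the junction is assumed to lie at index 20.
bait_region = "AAACCCGGGTTTACGTACGTTTGACCA"
first = "CCCGGGTTT"   # 3' end at index 12
second = "GGGTTTACG"  # overlaps the first primer; 3' end at index 15
print(is_properly_nested(bait_region, first, second, junction_pos=20))
```

In this example the second primer overlaps the first and ends 3 nucleotides closer to the assumed junction, which corresponds to one of the overlapping, nested arrangements described above.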

In some embodiments of any aspect, one or more of the primers for the nested PCR step can comprise a barcode sequence. As used herein, "barcode" refers to a DNA sequence used as a tag for identifying a target molecule. In some embodiments of any aspect, the DNA sequence is exogenous and/or foreign with respect to the genome of the organism to be analyzed.
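Barcoded reads can subsequently be assigned to their sample of origin. The following is a minimal sketch which assumes that the barcode lies at the 5' end of each read, that barcodes are matched exactly, and that no barcode is a prefix of another; the sample names, barcode sequences, and reads are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads: list[str], barcodes: dict[str, str]) -> dict[str, list[str]]:
    """Group reads by the barcode found at their 5' end and trim the barcode.

    `barcodes` maps a sample name to its barcode sequence. Reads that carry
    no known barcode are collected under the key "unassigned".
    """
    bins = defaultdict(list)
    for read in reads:
        for sample, barcode in barcodes.items():
            if read.startswith(barcode):
                bins[sample].append(read[len(barcode):])
                break
        else:
            bins["unassigned"].append(read)
    return dict(bins)


# Hypothetical barcodes and reads used only for illustration.
sample_barcodes = {"sampleA": "ACGT", "sampleB": "TTAG"}
example_reads = ["ACGTGGGCCC", "TTAGAAATTT", "CCCCGGGG"]
assigned = demultiplex(example_reads, sample_barcodes)
print(assigned)
```

In practice, tolerance for sequencing errors within the barcode may be desirable; exact matching is used here only to keep the sketch short.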

In some embodiments of any aspect, the ligated DNA may be digested with a blocking enzyme, e.g., 1) after nested PCR but before sequencing, or 2) before nested PCR. Blocking enzyme digestion can block amplification of target alleles that are not recombined and/or rearranged in subsequent steps (e.g., during nested PCR or sequencing). A blocking enzyme generally needs to be selected in each individual case based on the DNA sequence of the locus where recombination or rearrangement takes place. Any common restriction enzyme that cuts in the non-recombined/non-rearranged product at a restriction site (e.g., an I-SceI restriction site) that should therefore not be present in the recombined/rearranged product can be used as a blocking enzyme. The selection is routine and is based on each individual sequence. Thus, the skilled person can easily find suitable blocking enzymes for the analysis. In some embodiments of any aspect, the blocking digestion is omitted.

As used herein, the term "blocking enzyme" refers to a restriction enzyme that cleaves in the non-recombinant product and/or the non-rearranged product distal (relative to the first locus-specific primer) to the recombination site and/or the rearrangement site. The blocking enzyme does not cleave in the non-recombined/non-rearranged product proximal (relative to the first locus-specific primer) to the recombination site and/or the rearrangement site. Thus, the blocking enzyme and its sequence specificity are determined by the specific sequence of the DNA and/or mRNA used in the method, the sequence of the first locus-specific primer and the recombination and/or rearrangement. Any restriction enzyme with the appropriate specificity may be used. Given such parameters, the skilled person is readily able to select a restriction enzyme with the necessary specificity.
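The selection criteria for a blocking enzyme can be expressed as a simple check against the unrearranged (germline) sequence. In the following sketch, the germline sequence, coordinates, and function names are hypothetical; the example uses the EcoRI recognition site GAATTC only as a familiar restriction site. A site is assigned to the proximal or distal side by its start position, so a site that straddles the junction is conservatively treated as proximal.

```python
def site_positions(sequence: str, site: str) -> list[int]:
    """Return the start index of every occurrence of `site` in `sequence`."""
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions


def is_candidate_blocking_enzyme(germline: str, site: str,
                                 primer_end: int, junction_pos: int) -> bool:
    """True if `site` occurs distal to the junction in the unrearranged
    sequence but not between the primer's 3' end and the junction."""
    positions = site_positions(germline.upper(), site.upper())
    cuts_proximal = any(primer_end <= p < junction_pos for p in positions)
    cuts_distal = any(p >= junction_pos for p in positions)
    return cuts_distal and not cuts_proximal


# Hypothetical germline sequence: primer 3' end at index 10, junction at index 20.
germline = "ACGTACGTAC" + "TTTTCCCCAA" + "GGGAATTCGG"
print(is_candidate_blocking_enzyme(germline, "GAATTC", primer_end=10, junction_pos=20))
```

A complete selection would additionally confirm that the site is absent from the expected recombined and/or rearranged products, which is not modeled in this sketch.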

DNA sequencing of the nested PCR products can be performed by any method known in the art. In some embodiments of any aspect, the sequencing can be performed by a next-generation sequencing method. As used herein, "next-generation sequencing" refers to an oligonucleotide sequencing technique that has the ability to sequence oligonucleotides at a higher rate than is possible with conventional sequencing methods (e.g., Sanger sequencing) due to the fact that thousands to millions of sequencing reactions are performed and read out in parallel. Non-limiting examples of next-generation sequencing methods/platforms include: massively parallel signature sequencing (Lynx Therapeutics); 454 pyrosequencing (454 Life Sciences/Roche Diagnostics); solid-phase, reversible dye-terminator sequencing (Solexa/Illumina); SOLiD technology (Applied Biosystems); ion semiconductor sequencing (Ion Torrent); DNA nanoball sequencing (Complete Genomics); and the technologies available from Pacific Biosciences, Intelligent Bio-Systems, Oxford Nanopore Technologies and Helicos Biosciences. In some embodiments of any aspect, the sequencing primer can comprise a portion that is compatible with the selected next-generation sequencing method. Next-generation sequencing technologies and the constraints and design parameters of the associated sequencing primers are well known in the art (see, e.g., Shendure et al., "Next-generation DNA sequencing," Nature Biotechnology, 2008, Vol. 26, No. 10, 1135-1145; Mardis, "The impact of next-generation sequencing technology on genetics," Trends in Genetics, 2007, Vol. 24, No. 3, pp. 133-141; Su et al., "Next-generation sequencing and its applications in molecular diagnostics," Expert Rev Mol Diagn, 2011, 11(3):333-43; Zhang et al., "The impact of next-generation sequencing on genomics," J Genet Genomics, 2011, 38(3):95-109; U.S. Patent Nos. 6,818,395 and 6,911,345; and U.S. Publication Nos. 2006/0252077 and 2007/0070349); each of which is incorporated herein by reference in its entirety.

In some embodiments of any aspect, the nested PCR products can be size-selected prior to sequencing. Any reasonable size can be selected, for example, to exclude non-specific amplification products (e.g., multi-primer amplification products). In some embodiments of any aspect, nested PCR products of about 400bp to about 1kb can be selected, e.g., to exclude non-specific multi-primer amplification products. In some embodiments of any aspect, nested PCR products of about 200bp to about 1kb can be selected, e.g., to exclude non-specific multi-primer amplification products.

In some embodiments of any aspect, the sequence of the nested PCR product can be aligned with a reference sequence and/or an antigen receptor database, for example to identify: the presence of sequences produced by recombination and/or rearrangement; the V segments, D segments, and/or J segments involved in recombination events; or variants, mutations, and/or hypermutations associated with recombination and/or rearrangement. In some embodiments of any aspect, the sequence of the nested PCR product can be aligned to a reference sequence. The reference sequence may be a sequence comprising a DNA sequence involved in recombination and/or rearrangement. Alternatively, the reference sequence may be a sequence comprising known recombination and/or rearrangement products that occur at the locus of interest. The reference sequence may be, for example, a genomic sequence from the cell type to be analyzed.

In some embodiments of any aspect, the sequence of the nested PCR products can be aligned to an antigen receptor database. The antigen receptor database comprises sequences encoding antigen receptors or sequences that can be recombined to encode antigen receptors, such as Ig genes, V gene segments, D gene segments, and/or J gene segments. Antigen receptor databases are known in the art or can be assembled from data. An exemplary database is IgBLAST, which is available free of charge on the world wide web at ncbi.nlm.nih.gov/IgBLAST/, and which allows the user to enter recombined sequences and obtain matches from a database of germline gene sequences.

In some embodiments of any aspect, the step of aligning can be performed by a non-human machine. In some embodiments of any aspect, the non-human machine may include computer executable software. In some embodiments of any aspect, the method may further comprise a display module for displaying the results of the step of aligning.

Fig. 6 depicts a computer device or system 1000 comprising one or more processors 1030 and memory 1040, the memory 1040 storing one or more programs 1050 for execution by the one or more processors 1030.

In some embodiments of any aspect, device or computer system 1000 may further include a non-transitory computer-readable storage medium 1060, the non-transitory computer-readable storage medium 1060 storing one or more programs 1050 for execution by the one or more processors 1030 of the device or computer system 1000.

In some embodiments of any aspect, the device or computer system 1000 may further comprise one or more input devices 1010 that may be configured to send information to or receive information from any one of the group consisting of: an external device (not shown), one or more processors 1030, memory 1040, non-transitory computer-readable storage medium 1060, and one or more output devices 1070. One or more input devices 1010 may be configured to wirelessly transmit information to or receive information from an external device via means for wireless communication, such as antenna 1020, a transceiver (not shown), or the like.

In some embodiments of any aspect, the device or computer system 1000 may further comprise one or more output devices 1070 that may be configured to send information to or receive information from any of the group consisting of: an external device (not shown), one or more input devices 1010, one or more processors 1030, memory 1040, and a non-transitory computer-readable storage medium 1060. One or more output devices 1070 may be configured to wirelessly transmit information to or receive information from an external device via means for wireless communication, such as antenna 1080, a transceiver (not shown), or the like.

In one aspect, described herein is a computer-implemented method for high-throughput whole genome translocation sequencing (HTGTS) and detecting recombination events and/or rearrangement events, the method comprising: on a device having one or more processors and memory storing one or more programs for execution by the one or more processors, the one or more programs including instructions for: the sequenced nested PCR products are aligned with a reference sequence to identify the sites of recombination events and/or rearrangement events and the parent sequences involved in the event.

In some embodiments of any aspect, the step of aligning is performed by an alignment program. In some embodiments of any aspect, the alignment program is Bowtie2. In some embodiments of any aspect, the step of aligning comprises a best path search algorithm for determining the alignment. In some embodiments of any aspect, the step of aligning comprises demultiplexing the sequence reads. In some embodiments of any aspect, demultiplexing the sequence reads comprises using the fastq-multx tool. In some embodiments of any aspect, the step of aligning comprises trimming the adaptor sequences. In some embodiments of any aspect, trimming the adaptor sequences comprises using the SeqPrep utility. In some embodiments of any aspect, the step of aligning comprises mapping the reads to a reference sequence or database using Bowtie2 and reporting the top fifty alignments with an alignment score above 50, a score that represents a perfect 25 nt local alignment.

In some embodiments of any aspect, the step of aligning comprises a best path search algorithm to select the optimal sequence of alignments that describes the composition of the read. In some embodiments of any aspect, the step of aligning comprises filtering. In some embodiments of any aspect, the filtering comprises bait alignment and prey alignment. As used herein, "bait" refers to the sequence to which the first locus-specific primer will anneal, or a sequence adjacent to it. A "prey" sequence is a sequence that is not contiguous with the bait sequence prior to the recombination event and/or rearrangement event, but is contiguous with the bait sequence after the recombination event and/or rearrangement event. In some embodiments of any aspect, the bait alignment extends no more than 10 nucleotides beyond the target site (e.g., the site to which the primer anneals). In some embodiments of any aspect, for vector controls and for offset nicking with multiple sites, the step of aligning uses the distal target site. In some embodiments of any aspect, the step of aligning comprises comparing the discarded alignments to the selected prey alignment. In some embodiments of any aspect, if any discarded alignment exceeds both a coverage threshold and a score threshold relative to the prey alignment, the read is filtered out due to low mapping quality. In some embodiments of any aspect, the step of aligning comprises requiring the bait alignment to extend more than 10 nucleotides beyond the primer, in order to remove possible mispriming events and other artifacts. In some embodiments of any aspect, the step of aligning comprises removing potential duplicates by comparing the coordinates of the end of the bait alignment and the coordinates of the start of the prey alignment across all reads. In some embodiments of any aspect, the step of aligning comprises marking a read as a duplicate if it has a bait alignment offset within 2 nt and a prey alignment offset within 2 nt of the bait and prey alignments of another read. In some embodiments of any aspect, the step of aligning comprises applying post-filter stringency to remove junctions with gaps greater than 30 nt and bait sequences shorter than 50 nt. In some embodiments of any aspect, the step of aligning comprises removing reads whose prey aligns to telomeric repeat sequences.

In some embodiments of any aspect, the computer-implemented method is used with a method for high throughput whole genome translocation sequencing (HTGTS)-based detection of recombination events and/or rearrangement events in a cell, the method comprising the steps of: (a) extracting genomic DNA and/or mRNA from the cells; (b) optionally, generating a fragmented DNA and/or mRNA sample; (c) generating a single-stranded PCR product from genomic DNA by linear amplification-mediated PCR (LAM-PCR) using at least one first locus-specific primer; and/or producing cDNA from mRNA by reverse transcription using at least one first locus-specific primer; (d) generating ligated DNA and/or cDNA products by ligating the single-stranded PCR products or cDNAs generated in step (c) to adaptors, wherein the adaptors comprise: a distal portion of a known DNA sequence that can be used to design PCR primers for nested PCR amplification; a proximal portion of random nucleotides; and a 3' overhang; (e) using the ligation products of step (d) to generate nested PCR products by nested PCR with the adaptor-specific primer and the at least one second locus-specific primer, thereby amplifying the nucleic acid sequence comprising the recombination event and/or the rearrangement event; (f) optionally, digesting the PCR product of step (e) with a restriction enzyme to block unrearranged bait-containing fragments; (g) generating a sequenced nested PCR product by sequencing the nested PCR product; and (h) aligning the sequenced nested PCR products to a reference sequence or antigen receptor database.

In one aspect, described herein is a computer system for high throughput whole genome translocation sequencing (HTGTS) -based detection of recombination events and/or rearrangement events in a cell, the computer system comprising: one or more processors and memory to store one or more programs, the one or more programs comprising instructions for: the sequenced nested PCR products are aligned with a reference sequence and/or database to identify and/or characterize recombination events and/or rearrangement events.

In one aspect, described herein is a non-transitory computer readable storage medium storing one or more programs for high throughput whole genome translocation sequencing (HTGTS)-based detection of a recombination event and/or a rearrangement event in a cell, the one or more programs being executable by one or more processors of a computer system, the one or more programs comprising instructions for: aligning the sequenced nested PCR products with a reference sequence and/or database to identify and/or characterize recombination events and/or rearrangement events.

In some embodiments of any aspect, a modern alignment program (e.g., BOWTIE2™) can be used for alignment with a reference sequence. In some embodiments of any aspect, a best path search algorithm may be used to determine the alignment. Using such algorithms allows further characterization of the breakpoint at the junction and/or the use of paired-end reads.

In an exemplary embodiment, sequence reads can be demultiplexed and adaptor sequences trimmed using the FASTQ-MULTX™ tool from ea-utils (available on the world wide web at code.google.com/p/ea-utils) and the SEQPREP™ utility (available on the world wide web at github.com/jstjohn/SeqPrep), respectively. BOWTIE2™ (available on the world wide web at bowtie-bio.sourceforge.net/bowtie2/manual.shtml) may be used to map reads to reference sequences. The top alignments can be used, e.g., the top ten, twenty, thirty, forty, fifty or more alignments. In some embodiments of any aspect, the alignments (or top alignments) with an alignment score above a threshold alignment score may be used. In some embodiments of any aspect, the threshold alignment score may be 50, representing a perfect 25 nt local alignment.
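The score threshold and the "top alignments" rule can be illustrated with a short Python sketch. It assumes a match bonus of 2 per aligned base, which is the Bowtie2 default in local alignment mode and is what makes a score of 50 correspond to 25 perfectly matched nucleotides. The `Alignment` record and the function name are hypothetical, and the threshold is treated as inclusive here so that a perfect 25 nt alignment is retained; that choice is an assumption.

```python
from typing import List, NamedTuple

MATCH_BONUS = 2  # Bowtie2 default match bonus in --local mode


class Alignment(NamedTuple):
    """Hypothetical minimal record for one reported local alignment."""
    ref: str
    start: int
    score: int


def top_alignments(alignments: List[Alignment],
                   min_score: int = 50,
                   top_n: int = 50) -> List[Alignment]:
    """Keep alignments scoring at least `min_score`, best first, at most `top_n`."""
    kept = [a for a in alignments if a.score >= min_score]
    kept.sort(key=lambda a: a.score, reverse=True)
    return kept[:top_n]


# A perfect 25 nt local alignment scores 25 * 2 = 50.
perfect_25nt_score = 25 * MATCH_BONUS

hits = [
    Alignment("chr12", 1000, 48),
    Alignment("chr12", 2000, 120),
    Alignment("chr6", 500, 50),
]
selected = top_alignments(hits, min_score=perfect_25nt_score, top_n=2)
```

The values of `min_score` and `top_n` are parameters so that the other cut-offs mentioned above (top ten, twenty, and so on) can be substituted.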

In some embodiments of any aspect, a best path search algorithm can be used to select the optimal sequence of alignments that describes the composition of the read, typically finding the bait and prey alignments. Aligned reads can be filtered, for example, under the following conditions: (1) reads must include both a bait alignment and a prey alignment, and (2) the bait alignment cannot extend more than 10 nucleotides beyond the target site. In some embodiments of any aspect, for vector controls and for offset nicking with multiple sites, the distal target site can be used. The discarded alignments can be compared to the selected prey alignment; if any discarded alignment exceeds both the coverage threshold and the score threshold relative to the prey alignment, the read can be filtered out due to low mapping quality.
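The filtering conditions in the preceding paragraph can be sketched as follows. This is an illustrative sketch, not the pipeline itself: the `Aln` and `Read` records, the label strings, and the fractional coverage and score thresholds (0.9 each) are assumptions, as is the convention that the bait lies on the forward strand so that "beyond the target site" means a larger reference coordinate.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Aln:
    """One local alignment of part of a read (half-open coordinates)."""
    ref_start: int
    ref_end: int
    read_start: int
    read_end: int
    score: int


@dataclass
class Read:
    bait: Optional[Aln]
    prey: Optional[Aln]
    discarded: List[Aln] = field(default_factory=list)


def read_overlap(a: Aln, b: Aln) -> int:
    """Number of read positions covered by both alignments."""
    return max(0, min(a.read_end, b.read_end) - max(a.read_start, b.read_start))


def classify(read: Read, target_site: int, max_overrun: int = 10,
             cov_frac: float = 0.9, score_frac: float = 0.9) -> str:
    # Condition (1): both a bait and a prey alignment are required.
    if read.bait is None or read.prey is None:
        return "no_junction"
    # Condition (2): the bait may not run more than 10 nt past the target site.
    if read.bait.ref_end - target_site > max_overrun:
        return "bait_overrun"
    # Mapping quality: a discarded alignment that rivals the prey in both
    # coverage and score makes the prey assignment ambiguous.
    prey_len = read.prey.read_end - read.prey.read_start
    for alt in read.discarded:
        covers = read_overlap(alt, read.prey) > cov_frac * prey_len
        scores = alt.score > score_frac * read.prey.score
        if covers and scores:
            return "low_mapping_quality"
    return "pass"


bait = Aln(100, 160, 0, 60, 120)
prey = Aln(5000, 5080, 60, 140, 160)
rival = Aln(9000, 9078, 62, 140, 158)
weak = Aln(9000, 9078, 62, 140, 100)
labels = [
    classify(Read(bait, prey), target_site=155),
    classify(Read(bait, None), target_site=155),
    classify(Read(Aln(100, 180, 0, 80, 160), prey), target_site=155),
    classify(Read(bait, prey, [rival]), target_site=155),
    classify(Read(bait, prey, [weak]), target_site=155),
]
```

Returning a label instead of a boolean keeps a record of why each read was removed, which is useful when tuning the thresholds.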

In some embodiments of any aspect, the bait alignment can be required to extend more than 10 nucleotides beyond the primer in order to remove possible mispriming events and other potential artifacts. Potential duplicates can be removed by comparing the coordinates of the end of the bait alignment and the coordinates of the start of the prey alignment across all reads. A read may be marked as a duplicate if it has a bait alignment offset within 2 nt and a prey alignment offset within 2 nt of the bait and prey alignments of another read. Post-filter stringency can be applied to remove junctions with gaps greater than a predetermined nucleotide length (e.g., 10 nt, 20 nt, 30 nt, 40 nt, 50 nt, etc.) and with bait sequences shorter than a predetermined length (e.g., 70 nt, 60 nt, 50 nt, 40 nt, 30 nt, etc.). Reads whose prey aligns to telomeric repeat sequences can also be removed.
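Duplicate marking and the post-filter stringency step can be sketched in Python as shown below. The sketch reads the duplicate rule as a comparison of the bait-end and prey-start coordinates with a 2 nt tolerance on each. The `Junction` record, the greedy comparison against previously kept reads, and the requirement that the prey lie on the same reference sequence are assumptions made for illustration.

```python
from typing import List, NamedTuple, Set


class Junction(NamedTuple):
    """Hypothetical summary of one read's bait/prey junction."""
    read_id: str
    bait_end: int    # reference coordinate where the bait alignment ends
    prey_ref: str    # reference sequence (e.g., chromosome) of the prey
    prey_start: int  # reference coordinate where the prey alignment starts
    gap: int         # unaligned nucleotides between bait and prey in the read
    bait_len: int    # length of the bait alignment


def mark_duplicates(junctions: List[Junction], tol: int = 2) -> Set[str]:
    """Return IDs of reads whose junction lies within `tol` nt of a kept read."""
    kept: List[Junction] = []
    duplicates: Set[str] = set()
    for j in junctions:
        is_dup = any(
            k.prey_ref == j.prey_ref
            and abs(k.bait_end - j.bait_end) <= tol
            and abs(k.prey_start - j.prey_start) <= tol
            for k in kept
        )
        if is_dup:
            duplicates.add(j.read_id)
        else:
            kept.append(j)
    return duplicates


def post_filter(junctions: List[Junction], max_gap: int = 30,
                min_bait_len: int = 50) -> List[Junction]:
    """Drop junctions with large gaps or short bait alignments."""
    return [j for j in junctions
            if j.gap <= max_gap and j.bait_len >= min_bait_len]


reads = [
    Junction("r1", 160, "chr12", 5000, 0, 60),
    Junction("r2", 161, "chr12", 5002, 0, 61),  # within 2 nt of r1
    Junction("r3", 160, "chr12", 5010, 0, 60),  # prey start too far from r1
    Junction("r4", 160, "chr6", 5000, 35, 60),  # gap above 30 nt
    Junction("r5", 150, "chr1", 100, 0, 40),    # bait shorter than 50 nt
]
dups = mark_duplicates(reads)
passing = [j.read_id for j in post_filter(reads)]
```

The pairwise comparison is quadratic in the number of reads; for large libraries, sorting or binning by coordinate first would be the usual optimization.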

Each of the above identified modules or programs corresponds to a set of instructions for performing a function described above. These modules and programs (i.e., sets of instructions) need not be implemented as separate software programs, procedures or modules, and thus various subsets of these modules may be combined or otherwise rearranged in various embodiments. In some implementations of any aspect, the memory can store a subset of the modules and data structures identified above. In addition, the memory may store additional modules and data structures not described above.

The illustrated aspects of the disclosure may also be practiced in distributed computing environments where certain tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote memory storage devices.

Further, it should be understood that the various components described herein may comprise circuitry that may include circuit elements and components having suitable values in order to implement embodiments of the present innovations. Further, it is understood that many different components may be implemented on one or more Integrated Circuit (IC) chips. For example, in one embodiment, a set of components may be implemented in a single IC chip. In other embodiments, one or more of the respective components are fabricated or implemented on separate IC chips.

What has been described above includes examples of embodiments of the invention. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the claimed subject matter, but it is to be understood that many further combinations and permutations of the innovation are possible. Accordingly, the claimed subject matter is intended to embrace all such alterations, modifications and variations that fall within the spirit and scope of the appended claims. Furthermore, the above description of illustrated embodiments of the present disclosure, including what is described in the abstract, is not intended to be exhaustive or to limit the disclosed embodiments to the precise forms disclosed. Although specific embodiments and examples are described herein for illustrative purposes, various modifications may be made within the scope of such embodiments and examples, as those skilled in the relevant art will recognize.

In particular and in regard to the various functions performed by the above described components, devices, circuits, systems and the like, the terms used to describe such components are intended to correspond, unless otherwise indicated, to any component which performs the specified function of the described component (e.g., a functional equivalent), even though not structurally equivalent to the disclosed structure, which performs the function in the herein illustrated exemplary aspects of the claimed subject matter. In this regard, it will also be recognized that the innovation includes a system as well as a computer-readable storage medium having computer-executable instructions for performing the acts and/or events of the various methods of the claimed subject matter.

The above-described systems/circuits/modules have been described with respect to interaction between several components/blocks. It will be appreciated that such systems/circuits and components/blocks may include those components or specified sub-components, some of the specified components or sub-components, and/or additional components, according to various permutations and combinations of the foregoing. Sub-components may also be implemented as components communicatively coupled to other components rather than included within parent components (hierarchical). Additionally, it should be noted that one or more components may be combined into a single component providing aggregate functionality or divided into several separate sub-components, and any one or more middle layers (e.g., a management layer) may be provided to communicatively couple to such sub-components in order to provide integrated functionality. Any components described herein may also interact with one or more other components not specifically described herein but known to those of skill in the art.

In addition, while a particular feature of the subject innovation may have been disclosed with respect to only one of several implementations, such feature may be combined with one or more other features of the other implementations as may be desired and advantageous for any given or particular application. Furthermore, to the extent that the terms "includes," "including," "has," "contains," variants thereof, and other similar words are used in either the detailed description or the claims, these terms are intended to be inclusive in a manner similar to the term "comprising" as an open transition word without precluding any additional or other elements.

As used in this application, the terms "component," "module," "system," and the like are generally intended to refer to a computer-related entity, either hardware (e.g., a circuit), a combination of hardware and software, or an entity associated with an operating machine having one or more specific functionalities. For example, a component may be, but is not limited to being, a process running on a processor (e.g., a digital signal processor), a processor, an object, an executable, a thread of execution, a program, and/or a computer. For example, both an application running on a controller and the controller can be a component. One or more components may reside within a process and/or thread of execution and a component may be localized on one computer and/or distributed between two or more computers. Further, "device" may take the form of: specially designed hardware; general purpose hardware that is specially constructed for the purposes of executing software thereon that enables the hardware to perform its particular functions; software stored on a computer readable medium; or a combination thereof.

Moreover, the word "example" or "exemplary" is used herein to mean serving as an example, instance, or illustration. Any aspect or design described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other aspects or designs. Rather, use of the word "example" or "exemplary" is intended to present concepts in a concrete fashion. As used in this application, the term "or" is intended to mean an inclusive "or" rather than an exclusive "or". That is, unless specified otherwise or clear from context, "X employs A or B" is intended to mean any of the natural inclusive permutations. That is, if X employs A; X employs B; or X employs both A and B, then "X employs A or B" is satisfied under any of the foregoing instances. In addition, the articles "a" and "an" as used in this application and the appended claims should generally be construed to mean "one or more" unless specified otherwise or clear from context to be directed to a singular form.

Computing devices typically include a variety of media that may include computer-readable storage media and/or communication media, where these two terms are used differently from one another herein, as described below. Computer-readable storage media can be any available storage media that can be accessed by the computer and typically is non-transitory in nature and can include both volatile and nonvolatile media, removable media, and non-removable media. By way of example, and not limitation, computer-readable storage media may be implemented in connection with any method or technology for storage of information (e.g., computer-readable instructions, program modules, structured data, or unstructured data). Computer-readable storage media may include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or other tangible and/or non-transitory media, which can be used to store the desired information. The computer-readable storage media may be accessed by one or more local or remote computing devices, e.g., through access requests, queries, or other data retrieval protocols, for various operations with respect to information stored by the media.

Communication media, on the other hand, typically embodies computer readable instructions, data structures, program modules, or other structured or unstructured data in a data signal that may be transitory (e.g., a modulated data signal such as a carrier wave or other transport mechanism) and includes any information delivery or transmission media. The term "modulated data signal" or signal refers to a signal that has: one or more of its features arranged or changed in such a manner as to encode information in the one or more signals. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media.

In view of the exemplary systems described above, methodologies that may be implemented in accordance with the described subject matter will be better appreciated with reference to the flow charts of the various figures. The methodologies are depicted and described as a series of acts for simplicity of explanation. However, acts in accordance with the present disclosure may occur in various orders and/or concurrently, and with other acts not presented and described herein. Moreover, not all illustrated acts may be required to implement a methodology in accordance with the disclosed subject matter. Additionally, those skilled in the art will understand and appreciate that the methodology could alternatively be represented as a series of interrelated states via a state diagram or events. Additionally, it should be appreciated that the methodologies disclosed herein are capable of being stored on an article of manufacture to facilitate transporting and transferring such methodologies to computing devices. The term "article of manufacture" as used herein is intended to encompass a computer program accessible from any computer-readable device or storage medium.

In some embodiments of any aspect, the results of the step of aligning are displayed on a display module. In some embodiments of any aspect, the results of the aligning step are displayed on a computer monitor. In some embodiments of any aspect, the results of the aligning step are displayed on printable media. The display module may be any suitable device configured to receive computer-readable information from a computer and display it to a user. Non-limiting examples include: general purpose computers, such as those based on Intel PENTIUM-type processors, Motorola PowerPC, Sun UltraSPARC, or Hewlett-Packard PA-RISC processors, any of a variety of processors commercially available from Advanced Micro Devices (AMD) of Sunnyvale, Calif., or any other type of processor; visual display devices (e.g., flat panel displays, cathode ray tubes, etc.); and various types of computer printers.

In some embodiments of any aspect, a web browser is used to provide a user interface for displaying content based on the alignment results. It should be understood that other modules of the present invention may be adapted to have a web browser interface. Through a web browser, the user can create a request to retrieve data from the alignment. To do so, a user typically points to and clicks on user interface elements (e.g., buttons, drop-down menus, scroll bars, etc.) of the kind conventionally used in graphical user interfaces.

In some embodiments of any aspect, the result of the aligning step is a mutation profile across a set of V(D)J rearranged nucleotide or amino acid sequences. In some embodiments of any aspect, the result of the aligning step is displayed as a mutation profile across a set of V(D)J rearranged nucleotide or amino acid sequences. Detecting a number of recombination events and/or rearrangement events (in parallel or multiplex reactions) and aligning the events to a reference sequence/database may lead to the identification of point mutations, insertions/deletions (indels), and/or alterations of recombination/rearrangement junctions, and optionally the relative frequency of such events.
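One simple form of mutation profile is a per-position mismatch frequency across a set of rearranged sequences. The Python sketch below illustrates the idea under strong simplifying assumptions that are not taken from this document: the sequences have already been aligned to the reference, are gap-free and of equal length, and uncalled bases are written as "N" and excluded from the denominator. The function name is hypothetical.

```python
from typing import List


def mutation_profile(reference: str, sequences: List[str]) -> List[float]:
    """Fraction of called bases that differ from the reference, per position."""
    mismatches = [0] * len(reference)
    covered = [0] * len(reference)
    for seq in sequences:
        for i, (ref_base, base) in enumerate(zip(reference, seq)):
            if base == "N":  # uncalled base: not counted as coverage
                continue
            covered[i] += 1
            if base != ref_base:
                mismatches[i] += 1
    return [m / n if n else 0.0 for m, n in zip(mismatches, covered)]


germline = "ACGT"
rearranged = ["ACGT", "ACTT", "ANTT", "TCGT"]
profile = mutation_profile(germline, rearranged)
```

Handling insertions and deletions would require working from the alignment itself (for example, per-read edit operations) rather than from equal-length strings.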

The cells for the methods and assays described herein can be any type of cell, including but not limited to: eukaryotic cells, mammalian cells, human cells, plant cells, neuronal cells, fibroblasts, in vitro cells or in vivo cells. The cell may be of any type as long as it contains DNA. In some embodiments of any aspect, the cell may be a cell capable of being maintained in culture. The cell may be a primary cell or an immortalized cell. Differentiated cells as well as partially differentiated cells, pluripotent cells and stem cells (including embryonic stem cells) may also be used. In some embodiments of any aspect, the cell is a mammalian cell. In some embodiments of any aspect, the cell is a human cell.

In some embodiments of any aspect, the cell can be a cell containing somatically hypermutated V(D)J exons, e.g., the cell can be a germinal center B lymphocyte. In some embodiments of any aspect, the cell is a mature B lymphocyte, a developing B lymphocyte, a mature T lymphocyte, or a developing T lymphocyte. In some embodiments of any aspect, the cell is a mature B lymphocyte, a developing B lymphocyte, a mature T lymphocyte, a developing T lymphocyte, a cell obtained from a germinal center, and/or a cell obtained from a Peyer's patch. In some embodiments of any aspect, the cell is a germinal center B lymphocyte or a Peyer's patch B lymphocyte. In some embodiments of any aspect, the cells can be activated using activation conditions well known to those skilled in the art to induce cell division and recombination events.

In some embodiments of any aspect, prior to step (a), the cells may be present in a tissue (e.g., in vivo). In some embodiments of any aspect, the cell may be present in an animal prior to step (a). In some embodiments of any aspect, prior to step (a), the cell may be present in an animal immunized with the antigen. In some embodiments of any aspect, the method further comprises providing a cell, wherein the cell is obtained from an animal immunized with the antigen. In some embodiments of any aspect, prior to step (a), the method further comprises immunizing the animal with the antigen and isolating the cells from the animal.

Prior to performing step (a), V(D)J recombination may be induced in the cell or source of cells. As a non-limiting example, V(D)J recombination may be induced in a cell, tissue or animal by transduction and/or ectopic expression of a RAG1/2 endonuclease. Another non-limiting example of an agent that can induce V(D)J recombination is imatinib (i.e., GLEEVEC, imatinib mesylate, or STI-571). In some embodiments of any aspect, the cell is a v-abl transformed B cell.

The term "agent" generally refers to any entity that is normally not present, or not present at the levels being administered, in a cell, tissue, or subject. The agent may be selected from the group including, but not limited to: polynucleotides, polypeptides, small molecules, and antibodies or antigen-binding fragments thereof. The polynucleotide may be RNA or DNA, may be single-stranded or double-stranded, and may be selected from the group comprising, for example, nucleic acids and nucleic acid analogues encoding polypeptides. The polypeptide may be, but is not limited to, a naturally occurring polypeptide, a mutated polypeptide, or a fragment that retains the function of interest of the polypeptide. Other examples of agents include, but are not limited to: aptamers; peptide nucleic acids (PNA); locked nucleic acids (LNA); small organic or inorganic molecules; sugars; oligosaccharides; polysaccharides; biological macromolecules; peptidomimetics; nucleic acid analogs and derivatives; extracts made from biological materials (e.g., bacteria, plants, fungi, or mammalian cells or tissues); and naturally occurring or synthetic compositions. The agent may be applied to the medium, where it contacts the cell and induces its effects. Alternatively, the agent may be intracellular as a result of introducing into the cell a nucleic acid sequence encoding the agent, whose transcription results in the production of the nucleic acid and/or protein environmental stimulus within the cell. In some embodiments of any aspect, the agent is any chemical, entity, or moiety, including but not limited to synthetic and naturally occurring non-proteinaceous entities. In certain embodiments, the agent is a small molecule having a chemical moiety selected from, for example, unsubstituted or substituted alkyl, aromatic, or heterocyclyl moieties, including macrolides, leptomycins, and related natural products or analogs thereof. The agent may be known to have a desired activity and/or property, or may be selected from a library of different compounds. As used herein, the term "small molecule" may refer to a "natural product-like" compound; however, the term "small molecule" is not limited to "natural product-like" compounds. Rather, a small molecule is typically characterized as containing several carbon-carbon bonds and having a molecular weight greater than about 50 daltons but less than about 5000 daltons (5 kD). Preferably, the small molecule has a molecular weight of less than 3 kD, more preferably less than 2 kD, most preferably less than 1 kD. In some cases, it is preferred that the small molecule have a molecular mass equal to or less than 700 daltons.

In some embodiments of any aspect, prior to performing step (a), the method may further comprise a step of differentiating the source cell or the source tissue to initiate V(D)J recombination. In some embodiments of any aspect, the source cell is a primary stem cell. In some embodiments of any aspect, the source cell is an induced pluripotent stem cell (iPSC). Methods of differentiating particular cells and/or tissues to initiate V(D)J recombination are known in the art, for example, methods of differentiating cells into B-lymphocyte or T-lymphocyte lineages.

In some embodiments of any aspect, the rearrangement event involves an oncogene and/or a RAG off-target cleavage site.

In some embodiments of any aspect, the cell may be a cell expressing AID; a cancer cell; a cell expressing a RAG endonuclease; or a nervous system cell.

In one aspect, described herein is a kit comprising at least one first locus-specific primer that will anneal specifically within 400 bp of a V segment, D segment, or J segment. In some embodiments of any aspect, the kit may further comprise an adaptor comprising: a distal portion of a known DNA sequence that can be used to design PCR primers for nested PCR amplification; a proximal portion of random nucleotides; and a 3' overhang. In some embodiments of any aspect, the kit can further comprise at least one second locus-specific primer. In some embodiments of any aspect, the kit can further comprise at least one nested PCR primer. In some embodiments of any aspect, the kit can further comprise a substrate comprising an affinity domain, wherein the first locus-specific primer or the second locus-specific primer comprises an affinity tag. In some embodiments of any aspect, the kit can further comprise a cell.

A kit is any article of manufacture (e.g., a package or container) comprising at least one reagent (e.g., a first locus-specific primer and/or a second locus-specific primer) that is promoted, distributed, or sold as a unit for performing the methods described herein. The kits described herein may optionally comprise other components for carrying out the methods described herein. For example, the kit may comprise fluids and compositions suitable for carrying out one or more reactions according to the methods described herein (e.g., buffers, dNTPs, etc.), instructional materials describing how to perform the methods described herein, and the like. In addition, the kit may comprise an instruction manual and/or may provide information regarding the relevance of the results obtained.

For convenience, the meanings of some of the terms and phrases used in the specification, examples, and appended claims are provided below. Unless otherwise indicated, or implied from the context, the following terms and phrases include the meanings provided below. These definitions are provided to aid in the description of the specific embodiments and are not intended to limit the claimed invention, as the scope of the invention is defined only by the claims. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. If there is an apparent discrepancy between the usage of a term in the art and the definition provided herein, the definition provided herein shall govern.

For convenience, certain terms used herein in the specification, examples, and appended claims are collected here.

As used herein, "contacting" refers to any suitable means for delivering or exposing an agent to at least one cell. Exemplary delivery methods include, but are not limited to, direct delivery to cell culture medium, perfusion, injection, or other delivery methods known to those skilled in the art.

In various embodiments, the methods described herein involve performing a PCR amplification protocol with at least one primer (e.g., an oligonucleotide primer). As used herein, a "primer" refers to a DNA or RNA polynucleotide molecule, or an analog thereof, that is capable of sequence-specifically annealing to a polynucleotide template and providing a 3' end that serves as a substrate for a template-dependent polymerase to produce an extension product complementary to the polynucleotide template. The conditions for initiation and extension typically include the presence of at least one, but more preferably all four, different deoxyribonucleoside triphosphates and a polymerization-inducing agent (e.g., a DNA polymerase or reverse transcriptase) in a suitable buffer and at a suitable temperature (in this context, "buffer" includes solvents, typically aqueous, together with the necessary cofactors and reagents that affect pH, ionic strength, etc.). Primers useful in the methods described herein are typically single-stranded, and a primer and its complementary strand can anneal to form a double-stranded polynucleotide. The length of the primers according to the methods and compositions described herein may be less than or equal to 300 nucleotides, for example, less than or equal to 300, or 250, or 200, or 150, or 100, or 90, or 80, or 70, or 60, or 50, or 40, and preferably 30 or less, or 20 or less, or 15 or less, but at least 10 nucleotides in length.

In some embodiments of any aspect, the PCR reactions described herein involve the use of a primer set. As used herein, the term "primer set" refers to a group of at least two primers, including a forward primer and a reverse primer, one of which anneals to a first strand of a target nucleic acid sequence and the other of which anneals to the complement of the first strand. In some embodiments of any aspect, the first primer of a primer pair subset can anneal to a first strand of the target nucleic acid sequence, and the second primer of the primer pair subset (e.g., the reverse primer) can anneal to the complement of that strand. When annealed to the target and/or its complement, the primers can be oriented such that nucleic acid synthesis proceeding from primer extension of one primer of the primer pair subset produces a nucleic acid sequence that is complementary to at least one region of the second primer of the primer pair subset. The "first strand" of a nucleic acid target and/or sequence can be either strand of a double-stranded nucleic acid comprising the target nucleotide sequence and/or target site locus sequence, but once chosen, its complement is defined as the second strand. Thus, as used herein, a "forward primer" is a primer that anneals to the first strand of a nucleic acid target, while a "reverse primer" of the same set is a primer that anneals to the complement of the first strand of the nucleic acid target. As used herein, "specificity," when used in the context of a primer specific for a target nucleic acid, refers to a level of complementarity between the primer and the target such that there exists an annealing temperature at which the primer will anneal to and mediate amplification of the target nucleic acid, and will not anneal to or mediate amplification of non-target sequences present in the sample.
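The forward/reverse orientation defined above can be illustrated with a minimal in silico sketch. It assumes exact, unambiguous matching of uppercase A/C/G/T sequences; the function names and example sequences are hypothetical and are not taken from the sequence listing. Under the definition above, the forward primer anneals to the first strand (so the first strand contains its reverse complement), while the reverse primer anneals to the complement of the first strand (so its sequence appears directly on the first strand).

```python
# Hypothetical helper names; exact string matching only (an assumption).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def amplicon(first_strand, forward_primer, reverse_primer):
    """Return the first-strand region bounded by a primer pair, or None.

    forward_primer anneals to the first strand, so the first strand
    contains the reverse complement of the forward primer.
    reverse_primer anneals to the complement of the first strand, so its
    sequence is found directly on the first strand.
    """
    first_strand = first_strand.upper()
    rev_start = first_strand.find(reverse_primer.upper())
    fwd_site = reverse_complement(forward_primer)
    fwd_start = first_strand.find(fwd_site)
    # Both sites must exist, and the primers must point toward each other.
    if rev_start == -1 or fwd_start == -1 or rev_start > fwd_start:
        return None
    return first_strand[rev_start:fwd_start + len(fwd_site)]


# Hypothetical target and primers, for illustration only.
target = "GGG" + "ATGCCGTA" + "TTTTTT" + "CAGTCAGG" + "CCC"
product = amplicon(target, forward_primer="CCTGACTG", reverse_primer="ATGCCGTA")
print(product)
```

In this sketch, swapping the two primers returns None, because neither primer then finds a binding site in the required orientation.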

Methods for preparing primers are well known in the art, and a variety of commercial sources provide oligonucleotide synthesis services suitable for providing primers according to the methods and compositions described herein, e.g., INVITROGEN™ Custom DNA Oligos (Life Technologies; Grand Island, NY) or custom DNA oligos from IDT (Coralville, IA).

In some embodiments of any aspect, one or more of the primers can be selected from SEQ ID No. 1-SEQ ID No. 32 or SEQ ID No. 43-SEQ ID No. 65. In some embodiments of any aspect, one or more of the primers can comprise a sequence selected from SEQ ID No. 1-SEQ ID No. 32 or SEQ ID No. 43-SEQ ID No. 65.

TABLE 4

[Table 4 is reproduced as images in the original publication (BDA0002230788130000341, BDA0002230788130000351, BDA0002230788130000361).]

PCR requires the use of a nucleic acid polymerase. As used herein, the phrase "nucleic acid polymerase" refers to an enzyme that catalyzes the template-dependent polymerization of nucleoside triphosphates to form a primer extension product that is complementary to the template nucleic acid sequence. The nucleic acid polymerase initiates synthesis at the 3' end of the annealed primer and proceeds in the direction toward the 5' end of the template. Various nucleic acid polymerases are known in the art and are commercially available. One preferred group of nucleic acid polymerases is thermostable, i.e., the enzymes retain function after being subjected to temperatures sufficient to denature annealed strands of complementary nucleic acids (e.g., 94 °C or sometimes higher). As understood in the art, PCR may require cycling that includes a strand separation step, which typically involves heating the reaction mixture. As used herein, the term "strand separation" or "separating strands" means treating a nucleic acid sample such that complementary double-stranded molecules are separated into two single strands that are available for annealing to oligonucleotide primers. More specifically, strand separation according to the methods described herein is achieved by heating the nucleic acid sample above its Tm. Generally, for a sample containing nucleic acid molecules in a buffer suitable for a nucleic acid polymerase, heating to 94 °C is sufficient to achieve strand separation. An exemplary buffer contains 50 mM KCl, 10 mM Tris-HCl (pH 8.8 at 25 °C), 0.5 mM to 3 mM MgCl2, and 0.1% BSA.

As is also understood in the art, PCR requires annealing of primers to a template nucleic acid. As used herein, "annealing" refers to permitting two complementary or substantially complementary nucleic acid strands to hybridize and, more specifically, when used in the context of PCR, to hybridize such that a primer extension substrate for a template-dependent polymerase is formed. Conditions for primer-target nucleic acid annealing vary with the length and sequence of the primer and are based on the calculated Tm of the primer. In general, the annealing step in an amplification protocol involves reducing the temperature, following the strand separation step, to a temperature based on the calculated Tm of the primer sequence, for a time sufficient to permit such annealing. Tm can be readily predicted by one skilled in the art using any of a number of widely available algorithms, such as OLIGO™ primer design software (Molecular Biology Insights Inc., Colorado), VECTOR NTI™ primer design software (Invitrogen, Inc., California), and programs available on the Internet (including Primer3 and Oligo Calculator). For example, Tm can be calculated using NetPrimer software (Premier Biosoft; Palo Alto, Calif.; freely available on the World Wide Web at http://www.premierbiosoft.com/NetPrimer/netprlaunch/Help/xnetprlaunch.html). The Tm of a primer can also be calculated using the following formula, which is used by NetPrimer software and is described in more detail in Freier et al., PNAS 1986 83:9373-9377, which is incorporated herein by reference in its entirety:

$$T_m = \frac{\Delta H}{\Delta S + R \ln(C/4)} + 16.6 \log_{10}\left(\frac{[K^+]}{1 + 0.7\,[K^+]}\right) - 273.15$$

where ΔH is the enthalpy of helix formation; ΔS is the entropy of helix formation; R is the molar gas constant (1.987 cal/(°C·mol)); C is the nucleic acid concentration; and [K+] is the salt concentration. For most amplification protocols, the annealing temperature is selected to be about 5 °C below the predicted Tm; however, temperatures closer to and above the Tm (e.g., between 1 °C and 5 °C below the predicted Tm, or between 1 °C and 5 °C above the predicted Tm) can be used, as can temperatures more than 5 °C below the predicted Tm (e.g., 6 °C below, 8 °C below, 10 °C below, or lower). In general, the closer the annealing temperature is to the Tm, the higher the specificity of annealing. The time allowed for primer annealing in a PCR amplification protocol depends largely on the volume of the reaction (larger volumes require longer times), but also on the concentrations of primer and template (a higher relative concentration of primer to template requires less time than a lower relative concentration). Depending on the volume and the relative primer/template concentration, the primer annealing step in an amplification protocol may last from about 1 second to 5 minutes, but is typically between 10 seconds and 2 minutes, preferably between about 30 seconds and 2 minutes. As used herein, "substantially anneal" refers to a degree of annealing during a PCR amplification protocol that is sufficient to produce a detectable level of a specific amplification product.
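By way of non-limiting illustration, the Tm formula and the "about 5 °C below the predicted Tm" rule described above can be evaluated as in the following sketch. The function names, the unit conventions (ΔH in cal/mol, ΔS in cal/(K·mol), concentrations in mol/L), and the example thermodynamic values are assumptions made for illustration; they are not taken from NetPrimer or from any particular primer disclosed herein.

```python
import math

# Molar gas constant as given in the text, in cal/(K*mol).
R_CAL_PER_K_MOL = 1.987


def melting_temperature(delta_h, delta_s, conc, k_conc):
    """Return the predicted Tm in degrees Celsius.

    Assumed units (not stated in the text): delta_h in cal/mol,
    delta_s in cal/(K*mol), conc and k_conc in mol/L.
    """
    helix_term = delta_h / (delta_s + R_CAL_PER_K_MOL * math.log(conc / 4.0))
    salt_term = 16.6 * math.log10(k_conc / (1.0 + 0.7 * k_conc))
    return helix_term + salt_term - 273.15


def annealing_temperature(tm, offset=5.0):
    """Return an annealing temperature `offset` degrees below the predicted Tm."""
    return tm - offset


# Hypothetical values: dH = -150 kcal/mol, dS = -400 cal/(K*mol),
# 250 nM nucleic acid, 50 mM K+.
tm = melting_temperature(-150_000.0, -400.0, 250e-9, 0.05)
print(f"Tm = {tm:.1f} C, annealing at {annealing_temperature(tm):.1f} C")
```

The default offset of 5 °C mirrors the typical choice described above; a smaller or negative offset corresponds to annealing closer to or above the predicted Tm.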

PCR also relies on polymerase extension of the annealed primers at each cycle. As used herein, the term "polymerase extension" means the template-dependent incorporation of at least one complementary nucleotide at the 3' end of an annealed primer by a nucleic acid polymerase. Polymerase extension preferably adds more than one nucleotide, preferably up to and including the nucleotides corresponding to the full length of the template. Conditions for polymerase extension vary with the identity of the polymerase. The temperature used for polymerase extension is generally based on the known activity properties of the enzyme. Where the annealing temperature is required to be, for example, below the optimal temperature for the enzyme, it is generally acceptable to use a lower extension temperature. In general, although the enzymes retain at least partial activity below their optimal extension temperatures, polymerase extension by the most commonly used thermostable polymerases (e.g., Taq polymerase and variants thereof) is performed at 65 °C to 75 °C (e.g., 68 °C to 72 °C).

Primer extension is performed under conditions that permit the extension of annealed oligonucleotide primers. As used herein, the term "conditions that permit the extension of an annealed oligonucleotide such that extension products are generated" refers to the set of conditions including, for example, temperature, salt and cofactor concentrations, pH, and enzyme concentration under which a nucleic acid polymerase catalyzes primer extension. Such conditions will vary with the identity of the nucleic acid polymerase being used; however, the conditions for a large number of useful polymerases are well known to those skilled in the art. One exemplary set of conditions is 50 mM KCl, 10 mM Tris-HCl (pH 8.8 at 25 °C), 0.5 mM to 3 mM MgCl2, 200 μM each dNTP, and 0.1% BSA at 72 °C, under which conditions Taq polymerase catalyzes primer extension.

As used herein, "amplification product" or "PCR product" refers to a polynucleotide produced by a PCR reaction that is a copy of a portion of a particular target nucleic acid sequence and/or its complement, whose nucleotide sequence corresponds to the template nucleic acid sequence and/or its complement. The amplification product may be double-stranded or single-stranded.

As used herein, the terms "protein" and "polypeptide" are used interchangeably to refer to a series of amino acid residues connected to each other by peptide bonds between the α-amino and α-carboxyl groups of adjacent residues. The terms "protein" and "polypeptide" refer to a polymer of amino acids, including modified amino acids (e.g., phosphorylated, glycated, glycosylated, etc.) and amino acid analogs, regardless of its size or function. "Protein" and "polypeptide" are often used in reference to relatively large polypeptides, whereas the term "peptide" is often used in reference to small polypeptides; however, usage of these terms in the art overlaps.

As used herein, the term "nucleic acid" or "nucleic acid sequence" refers to any molecule, preferably a polymeric molecule, comprising units of ribonucleic acid, deoxyribonucleic acid, or analogs thereof. The nucleic acid may be single-stranded or double-stranded. The single-stranded nucleic acid may be one nucleic acid strand of denatured double-stranded DNA. Alternatively, the single-stranded nucleic acid may be a single-stranded nucleic acid not derived from any double-stranded DNA. In one aspect, the nucleic acid can be DNA. In another aspect, the nucleic acid can be RNA. Suitable nucleic acid molecules are DNA, including genomic DNA or cDNA. Other suitable nucleic acid molecules are RNA, including mRNA.

The term "statistically significant" or "significant" refers to statistical significance, and generally means a difference of 2 standard deviations (2SD) or greater.
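As a simple illustration of the "2 standard deviations or greater" criterion above, the following sketch compares a measured value against a set of reference measurements. The function name, the use of the sample standard deviation of the reference set, and the example numbers are assumptions made for illustration only; the text does not specify how the standard deviation is to be estimated.

```python
import statistics


def is_significant(value, reference, n_sd=2.0):
    """Return True if `value` differs from the reference mean by at
    least `n_sd` sample standard deviations of the reference set."""
    mean = statistics.mean(reference)
    sd = statistics.stdev(reference)  # sample SD; an assumption
    return abs(value - mean) >= n_sd * sd


# Hypothetical reference measurements (mean 10, sample SD about 1.41).
reference = [10.0, 10.0, 10.0, 12.0, 8.0]
print(is_significant(13.0, reference))  # differs from the mean by 3.0
print(is_significant(12.0, reference))  # differs from the mean by 2.0
```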

Other than in the operating examples, or where otherwise indicated, all numbers expressing quantities of ingredients or reaction conditions used herein are to be understood as being modified in all instances by the term "about". The term "about" when used in connection with a percentage may mean ± 1%.

As used herein, the term "comprising" or "comprises" is used in reference to compositions, methods, and respective component(s) thereof that are essential to the method or composition, yet open to the inclusion of unspecified elements, whether essential or not.

The term "consisting of …" relates to the compositions, methods and respective components thereof described herein, excluding any elements not listed in the description of the embodiments.

As used herein, the term "consisting essentially of …" refers to those elements required for a given embodiment. The term permits the presence of elements that do not materially affect the basic and novel or functional characteristics of that embodiment.

The singular terms "a," "an," and "the" include plural referents unless the context clearly indicates otherwise. Similarly, the word "or" is intended to include "and" unless the context clearly indicates otherwise. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present disclosure, suitable methods and materials are described below. The abbreviation "e.g." is derived from the Latin exempli gratia and is used herein to indicate a non-limiting example. Thus, the abbreviation "e.g." is synonymous with the term "for example."

Definitions of terms commonly used in cell biology and molecular biology can be found in the following works: The Merck Manual of Diagnosis and Therapy, 19th edition, published by Merck Research Laboratories, 2006 (ISBN 0-911910-19-0); Robert S. Porter et al. (eds.), The Encyclopedia of Molecular Biology, published by Blackwell Science Ltd., 1994 (ISBN 0-632-02182-9); Benjamin Lewin, Genes X, published by Jones & Bartlett Publishing, 2009 (ISBN-10: 0763766321); Kendrew et al. (eds.), Molecular Biology and Biotechnology: a Comprehensive Desk Reference, published by VCH Publishers, Inc., 1995 (ISBN 1-56081-; and Current Protocols in Protein Sciences 2009, Wiley Interscience, Coligan et al. (eds.).

Unless otherwise indicated, the invention is carried out using standard procedures such as those described in the following works: Sambrook et al., Molecular Cloning: A Laboratory Manual (4th edition), Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., USA (2012); Davis et al., Basic Methods in Molecular Biology, Elsevier Science Publishing, Inc., New York, USA (1995); Methods in Enzymology: Guide to Molecular Cloning Techniques, Vol. 152, S. L. Berger and A. R. Kimmel (eds.), Academic Press Inc., San Diego, USA (1987); Current Protocols in Protein Science (CPPS) (John E. Coligan et al., eds., John Wiley and Sons, Inc.); Current Protocols in Cell Biology (CPCB) (Juan S. Bonifacino et al., eds., John Wiley and Sons, Inc.); Culture of Animal Cells: A Manual of Basic Technique by R. Ian Freshney, Wiley-Liss, 5th edition (2005); and Animal Cell Culture Methods (Methods in Cell Biology, Vol. 57, Jennie P. Mather and David Barnes (eds.), Academic Press, 1st edition, 1998); all of which are incorporated by reference herein in their entirety.

Other terms are defined herein in the description of the various aspects of the invention.

All patents and other publications (including literature references, issued patents, published patent applications, and co-pending patent applications) cited throughout this application are expressly incorporated herein by reference for the purpose of describing and disclosing, for example, the methodologies described in such publications that might be used in connection with the technology described herein. These publications are provided solely for their disclosure prior to the filing date of the present application. Nothing in this regard should be construed as an admission that the inventors are not entitled to antedate such disclosure by virtue of prior invention or for any other reason. All statements as to the date or representations as to the contents of these documents are based on the information available to the applicants and do not constitute any admission as to the correctness of the dates or contents of these documents.

The description of the embodiments of the present disclosure is not intended to be exhaustive or to limit the disclosure to the precise forms disclosed. While specific embodiments of, and examples for, the disclosure are described herein for illustrative purposes, various equivalent modifications are possible within the scope of the disclosure, as those skilled in the relevant art will recognize. For example, while method steps or functions are presented in a given order, other embodiments may perform the functions in a different order, or may perform the functions substantially simultaneously. The teachings of the disclosure provided herein may be applied to other procedures or methods in an appropriate manner. The various embodiments described herein can be combined to provide further embodiments. Aspects of the disclosure can be modified, if necessary, to employ compositions, functions and concepts of the foregoing references and applications to provide yet further embodiments of the disclosure. Furthermore, for reasons of biological functional equivalence, some changes can be made in protein structure without affecting biological or chemical effects in kind or quantity. These and other changes can be made to the disclosure in light of the detailed description. All such modifications are intended to fall within the scope of the appended claims.

Particular elements of any of the preceding embodiments may be combined or substituted for elements of other embodiments. Moreover, while advantages associated with certain embodiments of the disclosure have been described in the context of those embodiments, other embodiments may also exhibit such advantages, but not all embodiments need necessarily exhibit such advantages to fall within the scope of the disclosure.

The techniques described herein are further illustrated by the following examples, but in no way should be construed as further limiting the techniques described herein.

Some embodiments of the technology described herein may be defined according to any of the following numbered paragraphs:

1. A high-throughput genome-wide translocation sequencing (HTGTS)-based method for detecting recombination events and/or rearrangement events in a cell, the method comprising the steps of:

a. extracting genomic DNA and/or mRNA from the cells;

b. optionally, generating a fragmented DNA and/or mRNA sample;

c. generating a single-stranded PCR product from genomic DNA by linear amplification-mediated PCR (LAM-PCR) using at least one first locus-specific primer; and/or

Generating cDNA from the mRNA by reverse transcription using at least one first locus-specific primer;

d. generating ligated DNA and/or cDNA products by ligating the single-stranded PCR products or cDNA generated in step (c) to an adaptor, wherein the adaptor comprises:

a distal portion of a known DNA sequence that can be used to design PCR primers for nested PCR amplification;

a proximal portion of random nucleotides; and

a 3' overhang;

e. using the ligation products of step (d) to generate nested PCR products by nested PCR with the adaptor-specific primer and the at least one second locus-specific primer, thereby amplifying the nucleic acid sequence comprising the recombination event and/or the rearrangement event;

f. optionally, digesting the PCR product of step (e) with a restriction enzyme to block unrearranged bait-containing fragments;

g. generating a sequenced nested PCR product by sequencing the nested PCR product; and

h. aligning the sequenced nested PCR products to a reference sequence or antigen receptor database.

2. The method of paragraph 1, wherein the recombination event is a V(D)J recombination event.

3. The method of paragraph 2 wherein the cells are selected from the group consisting of:

mature B lymphocytes, developing B lymphocytes, mature T lymphocytes, or developing T lymphocytes.

4. The method of any of paragraphs 2-3, wherein the method further comprises providing the cell, wherein the cell is obtained from an animal immunized with an antigen.

5. The method of any of paragraphs 2-4, wherein the method further comprises providing the cell, wherein the cell comprises a V(D)J exon that has undergone somatic hypermutation.

6. The method of paragraph 5 wherein said cells are germinal center B lymphocytes.

7. The method of any of paragraphs 2-6, further comprising, prior to performing step (a), the step of:

immunizing an animal with an antigen; and

obtaining cells from said animal.

8. The method of any of paragraphs 1-7, wherein the method further comprises using a plurality of first locus-specific primers and/or a plurality of second locus-specific primers.

9. The method of paragraph 8, wherein the plurality of primers specifically anneal to different V gene segments, D gene segments, or J gene segments.

10. The method of any of paragraphs 1-9, further comprising the step of differentiating the source cell or source tissue to initiate V(D)J recombination prior to performing step (a).

11. The method of paragraph 10 wherein the source cell is an induced pluripotent stem cell.

12. The method of paragraph 10 wherein the source cells are primary stem cells.

13. The method of any of paragraphs 1-12, wherein, prior to performing step (a), the cell or source is transduced with a RAG1/2 endonuclease to initiate V(D)J recombination.

14. The method of any one of paragraphs 1-13, further comprising the step of contacting the cell with one or more agents that initiate V(D)J recombination.

15. The method of paragraph 14, wherein said agent that initiates V(D)J recombination is imatinib.

16. The method of paragraph 15 wherein the cell is a v-abl virus transformed B cell.

17. The method of paragraph 1, wherein the rearrangement event involves an oncogene and/or a RAG off-target cleavage site.

18. The method of paragraph 1 or 17, wherein the cell is selected from the group consisting of:

a cell expressing AID, a cancer cell, a cell expressing a RAG endonuclease, or a nervous system cell.

19. The method of any of paragraphs 1-18, wherein the first locus-specific primer comprises an affinity tag.

20. The method of paragraph 19 wherein the method further comprises isolating the product of step (c) by affinity purification.

21. The method of any of paragraphs 19-20, wherein the affinity tag is biotin.

22. The method of paragraph 21 wherein said affinity purification comprises binding biotin with streptavidin.

23. The method of any of paragraphs 20-22, wherein the affinity purification comprises binding the product of step (c) to a substrate.

24. The method of paragraph 23 wherein the substrate is a bead.

25. The method of any of paragraphs 1-24, wherein the primers used in the nested PCR step comprise barcode sequences.

26. The method of any of paragraphs 1-25, wherein said fragmenting is performed by sonication or restriction enzyme digestion.

27. The method of any of paragraphs 1-26, wherein the fragmenting is performed by randomly shearing genomic DNA or by digestion with a frequently cutting restriction enzyme.

28. The method of any of paragraphs 1-27, wherein ligating the product of step (c) to an adaptor comprises contacting the product with a population of adaptors having the same distal portion sequence and random proximal portion sequence.

29. The method of any of paragraphs 1-28, wherein the proximal portion of the adapter is 3-10 nucleotides in length.

30. The method of any of paragraphs 1-29, wherein the proximal portion of the adapter is 5-6 nucleotides in length.

31. The method of any of paragraphs 1-30, wherein said adapter comprises a barcode sequence between a distal portion and a proximal portion.

32. The method of any of paragraphs 1-31, wherein the PCR product produced in step (e) is size-selected prior to sequencing.

33. The method of any of paragraphs 1-32, wherein, prior to step (a), said cells are present in a tissue.

34. The method of any of paragraphs 1-33, wherein the sequencing is performed using a next generation sequencing method.

35. The method of any of paragraphs 1-34, wherein the aligning step is performed by a non-human machine.

36. The method of paragraph 35 wherein the non-human machine contains computer executable software.

37. The method of paragraph 35 further comprising a display module for displaying the results of said aligning step.

38. The method of any of paragraphs 34-37, wherein the result of said aligning step is a mutation profile across a set of V(D)J rearranged nucleotide or amino acid sequences.

39. The method of any of paragraphs 1-38, wherein said cell is a mammalian cell.

40. The method of any of paragraphs 1-39, wherein the blocking digestion step (f) is omitted.

41. The method of any of paragraphs 1-40, wherein no end repair is performed prior to step (c).

42. The method of any of paragraphs 1-41, wherein one or more of said primers comprises a sequence selected from the group consisting of SEQ ID No. 1-SEQ ID No. 32.

43. The method of any of paragraphs 1-41, wherein one or more of said primers is selected from the group consisting of SEQ ID No:1 to SEQ ID No: 32.
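As a non-limiting illustration of the adapter population described in paragraphs 28-31 above (identical distal portion, optional barcode, random proximal portion of 3-10 nucleotides), the following sketch generates such a population in silico. The class and function names, the representation of an adapter as three separate strings, and the example distal sequence are hypothetical placeholders chosen for illustration; strand orientation and the 3' overhang geometry are not modeled.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Adapter:
    distal: str    # known sequence used to design nested PCR primers
    barcode: str   # optional barcode between distal and proximal portions
    proximal: str  # random nucleotides adjacent to the ligation junction


def make_adapter_population(distal, n, proximal_len=6, barcode="", seed=None):
    """Return n adapters sharing one distal portion, each with a random
    proximal portion of proximal_len nucleotides (3-10 per the text)."""
    if not 3 <= proximal_len <= 10:
        raise ValueError("proximal_len must be between 3 and 10")
    rng = random.Random(seed)
    return [
        Adapter(
            distal=distal,
            barcode=barcode,
            proximal="".join(rng.choice("ACGT") for _ in range(proximal_len)),
        )
        for _ in range(n)
    ]


# Hypothetical placeholder distal sequence, for illustration only.
population = make_adapter_population("ACGTACGTACGTACGTAC", n=100, seed=0)
print(len({a.proximal for a in population}), "distinct proximal sequences")
```

Seeding the generator makes the simulated population reproducible; a physical adapter pool would instead be synthesized with degenerate positions.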

Some embodiments of the technology described herein may be defined according to any of the following numbered paragraphs:

1. A high-throughput genome-wide translocation sequencing (HTGTS)-based method for detecting recombination events and/or rearrangement events in a cell, the method comprising the steps of:

a. extracting genomic DNA and/or mRNA from the cells;

b. optionally, generating a fragmented DNA and/or mRNA sample;

c. generating a single-stranded PCR product from genomic DNA by linear amplification-mediated PCR (LAM-PCR) using at least one first locus-specific primer; and/or

Generating cDNA from the mRNA by reverse transcription using at least one first locus-specific primer;

d. generating ligated DNA and/or cDNA products by ligating the single-stranded PCR products or cDNA generated in step (c) to an adaptor, wherein the adaptor comprises:

a distal portion of a known DNA sequence that can be used to design PCR primers for nested PCR amplification;

a proximal portion of random nucleotides; and

a 3' overhang;

e. using the ligation products of step (d) to generate nested PCR products by nested PCR with the adaptor-specific primer and the at least one second locus-specific primer, thereby amplifying the nucleic acid sequence comprising the recombination event and/or the rearrangement event;

f. optionally, digesting the PCR product of step (e) with a restriction enzyme to block unrearranged bait-containing fragments;

g. generating a sequenced nested PCR product by sequencing the nested PCR product; and

h. aligning the sequenced nested PCR products to a reference sequence or antigen receptor database.

2. The method of paragraph 1, wherein the recombination event is a V(D)J recombination event.

3. A method for high throughput repertoire sequencing-based detection of Ig repertoire sequences in a cell, the method comprising the steps of:

a. extracting genomic DNA and/or mRNA from the cells;

b. optionally, generating a fragmented DNA and/or mRNA sample;

c. generating a single-stranded PCR product from genomic DNA by linear amplification-mediated PCR (LAM-PCR) using at least one first locus-specific primer; and/or

generating cDNA from the mRNA by reverse transcription using at least one first locus-specific primer;

d. generating ligated DNA and/or cDNA products by ligating the single-stranded PCR products or cDNA generated in step (c) to an adaptor, wherein the adaptor comprises:

a distal portion of a known DNA sequence that can be used to design PCR primers for nested PCR amplification;

a proximal portion of random nucleotides; and

a 3' overhang;

e. using the ligation products of step (d) to generate nested PCR products by nested PCR with adaptor-specific primers and at least one second locus-specific primer, thereby amplifying nucleic acid sequences comprising Ig repertoire sequences;

f. optionally, digesting the PCR product of step (e) with a restriction enzyme to block unrearranged bait-containing fragments;

g. generating a sequenced nested PCR product by sequencing the nested PCR product; and

h. aligning the sequenced nested PCR products to a reference sequence or antigen receptor database.

4. The method of paragraph 3, wherein the repertoire detected comprises a V(D)J recombination event and/or a somatic hypermutation (SHM).

5. The method of any of paragraphs 3-4, wherein the repertoires detected comprise Ig heavy chain, Ig light chain, V usage and CDR3 repertoires.

6. The method of any of paragraphs 1-5, wherein said cells are selected from the group consisting of:

mature B lymphocytes, developing B lymphocytes, mature T lymphocytes, developing T lymphocytes, cells obtained from germinal centers, and cells obtained from Peyer's patches.

7. The method of any of paragraphs 1-6, wherein the method further comprises providing the cell, wherein the cell is obtained from an animal immunized with an antigen.

8. The method of any of paragraphs 1-7, wherein the method further comprises providing the cell, wherein the cell comprises a V(D)J exon that has undergone somatic hypermutation.

9. The method of paragraph 8, wherein said cell is a germinal center B lymphocyte or a Peyer's patch B lymphocyte.

10. The method of any of paragraphs 1-9, further comprising, prior to performing step (a), the step of:

immunizing an animal with an antigen; and

obtaining cells from said animal.

11. The method of any of paragraphs 1-10, wherein the at least one first locus-specific primer specifically anneals to a J gene segment.

12. The method of any of paragraphs 1-11, wherein the method further comprises using a plurality of first locus-specific primers and/or a plurality of second locus-specific primers.

13. The method of paragraph 12, wherein each primer of the plurality of primers specifically anneals to a different V gene segment, D gene segment, and/or J gene segment.

14. The method of paragraph 13, wherein each of the plurality of primers specifically anneals to a different J gene segment present in the genome of the cell or organism prior to V(D)J recombination.

15. The method of paragraph 14, wherein the plurality of primers collectively anneal specifically to each of JH1, JH2, JH3, and JH4.

16. The method of paragraph 14, wherein the plurality of primers collectively anneal specifically to at least one sequence of each of the JH gene segments, JK gene segments, and JL gene segments present in the genome of the cell or organism prior to V(D)J recombination.

17. The method of any of paragraphs 1-16, wherein the at least one first locus specific primer specifically anneals to a degenerate region of the target gene segment.

18. The method of any of paragraphs 1-17, further comprising the step of differentiating the source cell or source tissue to initiate V(D)J recombination prior to performing step (a).

19. The method of paragraph 18 wherein the source cell is an induced pluripotent stem cell.

20. The method of paragraph 18 wherein the source cells are primary stem cells.

21. The method of any of paragraphs 1-20, wherein, prior to performing step (a), the cell or source is transduced with a RAG1/2 endonuclease to initiate V(D)J recombination.

22. The method of any of paragraphs 1-21, further comprising the step of contacting the cell with one or more agents that initiate V(D)J recombination or SHM.

23. The method of paragraph 22, wherein the agent that initiates V(D)J recombination is imatinib.

24. The method of paragraph 23 wherein the cell is a v-abl virus transformed B cell.

25. The method of any of paragraphs 1-24, wherein the rearrangement event involves an oncogene and/or a RAG off-target cleavage site.

26. The method of any of paragraphs 1-25, wherein said cells are selected from the group consisting of:

a cell expressing AID, a cancer cell, a cell expressing a RAG endonuclease, or a nervous system cell.

27. The method of any of paragraphs 1-26, wherein the first locus-specific primer comprises an affinity tag.

28. The method of paragraph 27 wherein the method further comprises isolating the product of step (c) by affinity purification.

29. The method of any of paragraphs 27-28, wherein the affinity tag is biotin.

30. The method of paragraph 29 wherein said affinity purification comprises binding biotin with streptavidin.

31. The method of any of paragraphs 28-30 wherein said affinity purification comprises binding the product of step (c) to a substrate.

32. The method of paragraph 31 wherein the substrate is a bead.

33. The method of any of paragraphs 1-32, wherein the primers used in the nested PCR step comprise a barcode sequence.
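As a non-limiting illustration of how barcode sequences carried by the nested PCR primers might be used downstream, the following minimal Python sketch groups sequenced reads by a leading barcode. It assumes, for illustration only, that the barcode sits at the very start of each read; the barcode sequences, sample names, reads, and the function name `demultiplex` are hypothetical and are not taken from the described methods.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Group reads by the barcode found at the start of each read.

    `barcodes` maps a barcode sequence to a sample name (both hypothetical).
    The barcode is trimmed from each assigned read; reads that match no
    barcode are collected under the key "unassigned".
    """
    by_sample = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                by_sample[sample].append(read[len(barcode):])
                break
        else:
            # No barcode matched this read.
            by_sample["unassigned"].append(read)
    return dict(by_sample)


# Hypothetical barcodes and reads, for illustration only.
barcodes = {"ACGT": "sample_A", "TGCA": "sample_B"}
reads = ["ACGTGGGTTT", "TGCACCCAAA", "ACGTAAAA", "NNNNGGGG"]
result = demultiplex(reads, barcodes)
```

Exact prefix matching is used here for simplicity; a practical pipeline would typically tolerate sequencing errors in the barcode.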

34. The method of any of paragraphs 1-33, wherein said fragmenting is performed by sonication or restriction enzyme digestion.

35. The method of any of paragraphs 1-34, wherein the fragmenting is performed by randomly shearing genomic DNA or with frequent cutting restriction enzymes.

36. The method of any of paragraphs 1-35, wherein ligating the product of step (c) to an adaptor comprises contacting the product with a population of adaptors having the same distal portion sequence and a random proximal portion sequence.

37. The method of any of paragraphs 1-36, wherein the proximal portion of the adapter is 3-10 nucleotides in length.

38. The method of any of paragraphs 1-37, wherein the proximal portion of the adapter is 5-6 nucleotides in length.

39. The method of any of paragraphs 1-38, wherein said adapter comprises a barcode sequence between a distal portion and a proximal portion.
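The adaptor population of paragraphs 36-39 (a shared distal portion, an optional barcode, and a random proximal portion of 3-10 nucleotides) can be sketched in silico as follows. This is a minimal illustration only: the distal sequence, the barcode, the function name `make_adaptor_population`, and the choice to write each adaptor as a single distal-to-proximal string are assumptions and do not represent the actual adaptor sequences or strand structure.

```python
import random

BASES = "ACGT"


def make_adaptor_population(distal, proximal_len, n, barcode="", seed=0):
    """Return n adaptor sequences sharing one distal portion.

    Each sequence is distal + barcode + a random proximal portion of
    `proximal_len` nucleotides. The 3-10 nt bound follows paragraph 37.
    """
    if not 3 <= proximal_len <= 10:
        raise ValueError("proximal portion must be 3-10 nucleotides long")
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    population = []
    for _ in range(n):
        proximal = "".join(rng.choice(BASES) for _ in range(proximal_len))
        population.append(distal + barcode + proximal)
    return population


# Hypothetical placeholder sequences, for illustration only.
DISTAL = "GATTACAGATTACA"
BARCODE = "CCGG"
adaptors = make_adaptor_population(DISTAL, proximal_len=6, n=100, barcode=BARCODE)
```

Because the distal portion is identical across the population, a single adaptor-specific primer can be designed against it, while the random proximal portion varies from adaptor to adaptor.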

40. The method of any of paragraphs 1-39, wherein the PCR product produced in step (e) is size-selected prior to sequencing.

41. The method of any of paragraphs 1-40, wherein, prior to step (a), said cells are present in a tissue.

42. The method of any of paragraphs 1-41, wherein said sequencing is performed using a next generation sequencing method.

43. The method of any of paragraphs 1-42, wherein said aligning step is performed by a non-human machine.

44. The method of paragraph 43 wherein the non-human machine contains computer executable software.

45. The method of paragraph 43 further comprising a display module for displaying the results of the step of aligning.

46. The method of any of paragraphs 1-45, wherein the result of said aligning step is a mutation profile across a set of V(D)J rearranged nucleotide or amino acid sequences.
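One simple way to picture such a mutation profile is as a per-position mismatch frequency across a set of sequences that have already been aligned to a reference. The Python sketch below is illustrative only and assumes that every aligned sequence has the same length as the reference, that `-` marks a gap and `N` an uncalled base, and that such positions are excluded from coverage; the function name `mutation_profile` and the example sequences are hypothetical.

```python
def mutation_profile(reference, aligned_reads):
    """Return the mismatch frequency at each reference position.

    Positions where a read has a gap ('-') or an uncalled base ('N')
    do not count toward coverage. Positions with no coverage report 0.0.
    """
    mismatches = [0] * len(reference)
    covered = [0] * len(reference)
    for read in aligned_reads:
        if len(read) != len(reference):
            raise ValueError("aligned reads must match the reference length")
        for i, (ref_base, read_base) in enumerate(zip(reference, read)):
            if read_base in "-N":
                continue
            covered[i] += 1
            if read_base != ref_base:
                mismatches[i] += 1
    return [m / c if c else 0.0 for m, c in zip(mismatches, covered)]


# Hypothetical reference and aligned reads, for illustration only.
reference = "ACGT"
aligned = ["ACGT", "ATGT", "A-GA"]
profile = mutation_profile(reference, aligned)
```

The same approach applies to amino acid sequences, since the function only compares characters position by position.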

47. The method of any of paragraphs 1-46, wherein said cells are mammalian cells.

48. The method of any of paragraphs 1-47, wherein the blocking digestion step (f) is omitted.

49. The method of any of paragraphs 1-48, wherein no end repair is performed prior to step (c).

50. The method of any of paragraphs 1-49, wherein one or more of said primers comprises a sequence selected from the group consisting of SEQ ID No. 1-SEQ ID No. 32 or SEQ ID No. 43-SEQ ID No. 65.

51. The method of any of paragraphs 1-50, wherein one or more of said primers is selected from the group consisting of SEQ ID No:1-SEQ ID No:32 and SEQ ID No:43-SEQ ID No: 65.
