Compositions and methods for digital polymerase chain reaction

Document No.: 704492   Published: 2021-04-13

This technology, "Compositions and methods for digital polymerase chain reaction", was designed and created by 翁莉 (Li Weng), 马利克·法哈姆 (Malek Faham), 邓凌锋 (Lingfeng Deng), and 林盛榕 (Shengrong Lin) on 2019-07-03. Abstract: In some aspects, the present disclosure provides methods for identifying sequence variants in a nucleic acid sample. In some embodiments, the method comprises distinguishing true mutations in the polynucleotide from random errors introduced during the amplification step. In some embodiments, the method reduces the number of false positives reported by a digital PCR assay. In some embodiments, the method improves the accuracy of digital PCR assays.

1. A method of identifying sequence variants in a nucleic acid sample comprising a plurality of polynucleotides, the method comprising:

(a) circularizing said plurality of polynucleotides to form a plurality of circularized polynucleotides;

(b) amplifying the plurality of circularized polynucleotides to generate a plurality of concatemers, each concatemer comprising a plurality of sequence repeats;

(c) partitioning the plurality of concatemers into a plurality of partitions such that on average no more than one concatemer comprising a target sequence is present in an individual partition,

wherein individual partitions of said plurality of partitions contain at least one of a first probe and a second probe, wherein said first probe binds to said target sequence lacking said sequence variant and generates a first signal, and said second probe binds to said target sequence containing said sequence variant and generates a second signal;

(d) detecting the first signal and the second signal from the individual partitions; and

(e) identifying the sequence variant as present in the target sequence only if the level of the second signal exceeds a threshold level indicative of one copy of the target sequence and the level of the first signal is below a threshold level indicative of one copy of the target sequence.

2. The method of claim 1, further comprising identifying the sequence variant as absent when the level of the first signal exceeds a threshold level indicative of one copy of the target sequence and the level of the second signal is below a threshold level indicative of one copy of the target sequence.

3. The method of claim 1 or 2, further comprising identifying a false positive when the level of the first signal exceeds a threshold level indicative of one copy of the target sequence and the level of the second signal exceeds a threshold level indicative of one copy of the target sequence.

4. The method of any one of claims 1-3, further comprising outputting a result based on the identifying.

5. The method of claim 4, wherein false positives are omitted from the results.

6. The method of any one of claims 1-5, wherein the plurality of polynucleotides comprises single stranded polynucleotides.

7. The method of any one of claims 1-6, wherein the plurality of polynucleotides comprises cell-free DNA.

8. The method of any one of claims 1-7, wherein said circularizing comprises ligating the 5' end and 3' end of at least one polynucleotide of said plurality of polynucleotides.

9. The method of any one of claims 1-8, wherein the circularizing comprises ligating an adaptor to the 5' end, the 3' end, or both the 5' end and the 3' end of at least one polynucleotide of the plurality of polynucleotides.

10. The method of any one of claims 1-9, wherein the amplifying comprises amplifying using a polymerase having strand displacement activity.

11. The method of any one of claims 1-10, wherein the amplifying comprises amplifying the plurality of circularized polynucleotides using rolling circle amplification.

12. The method of any one of claims 1-11, wherein the amplifying comprises subjecting the plurality of circular polynucleotides to an amplification reaction mixture comprising random primers.

13. The method of any one of claims 1-11, wherein the amplifying comprises subjecting the plurality of circular polynucleotides to an amplification reaction mixture comprising one or more primers, each primer specifically hybridizing to a different target sequence by sequence complementarity.

14. The method of any one of claims 1-13, wherein the plurality of concatemers is not enriched prior to the partitioning.

15. The method of any one of claims 1-14, further comprising, prior to the partitioning, fragmenting the plurality of concatemers to generate a plurality of fragmented concatemers.

16. The method of claim 15, further comprising, after the fragmenting and before the partitioning, selecting a plurality of the fragmented concatemers based on size.

17. The method of any one of claims 1-16, wherein the plurality of partitions comprise emulsion-based droplets.

18. The method of claim 17, wherein the emulsion-based droplets comprise picoliter- or nanoliter-sized droplets.

19. The method of any one of claims 1-16, wherein the plurality of partitions comprise wells or tubes.

20. The method of any one of claims 1-19, wherein the first probe comprises a first detectable label and the second probe comprises a second detectable label.

21. The method of claim 20, wherein the first detectable label comprises a first fluorescent label and the second detectable label comprises a second fluorescent label.

22. The method of claim 21, wherein the first fluorescent label and the second fluorescent label differ in emission spectra.

23. The method of any one of claims 1-22, wherein the detecting further comprises measuring the intensity of the first signal and the second signal.

24. The method of any one of claims 1-23, wherein the sequence variant is a single nucleotide polymorphism.

25. The method of any one of claims 1-24, wherein the first probe and the second probe are TaqMan assay-based probes.

26. The method of claim 25, further comprising, after the partitioning and before the detecting, performing a polymerase chain reaction on the concatemer to amplify a region of the plurality of sequence repeats.

27. A method of reducing errors in a digital polymerase chain reaction on a nucleic acid sample comprising less than 50 ng of a polynucleotide, the method comprising:

(a) circularizing individual polynucleotides in said nucleic acid sample to generate a plurality of circularized polynucleotides;

(b) amplifying the plurality of circularized polynucleotides to form a plurality of concatemers, each concatemer comprising a plurality of sequence repeats;

(c) partitioning the plurality of concatemers into a plurality of partitions such that on average no more than one concatemer comprising a target sequence is present in an individual partition,

wherein individual partitions of the plurality of partitions contain at least one of a first probe and a second probe, wherein the first probe binds to the plurality of sequence repeats lacking the sequence variant and generates a first signal, and the second probe binds to the plurality of sequence repeats containing the sequence variant and generates a second signal;

(d) detecting the first signal and the second signal from the individual partitions; and

(e) identifying a false positive when the level of the first signal exceeds a threshold level indicative of one copy of the target sequence and the level of the second signal exceeds a threshold level indicative of one copy of the target sequence.

28. The method of claim 27, further comprising outputting a result.

29. The method of claim 28, wherein the result excludes the false positives.

30. The method of any one of claims 27-29, wherein the method reduces false positives by at least 20%.

31. The method of any one of claims 27-30, wherein the nucleic acid sample comprises cell-free polynucleotides.

32. The method of claim 31, wherein the cell-free polynucleotides comprise circulating tumor DNA.

33. The method of any one of claims 27-32, wherein the nucleic acid sample is from a subject.

34. The method of claim 33, wherein the nucleic acid sample is urine, blood, stool, saliva, tissue, or body fluid.

35. A system for detecting sequence variants, the system comprising:

(a) a computer configured to receive a user request for a detection reaction on a sample;

(b) an amplification system that performs a nucleic acid amplification reaction on the sample or portion thereof in response to the user request, wherein the amplification reaction comprises: (i) circularizing individual polynucleotides of said sample to form a plurality of circularized polynucleotides; and (ii) amplifying the plurality of circularized polynucleotides to form a plurality of concatemers, each concatemer comprising a plurality of sequence repeats;

(c) a partitioning system that partitions the plurality of concatemers into a plurality of partitions such that on average no more than one concatemer comprising a target sequence is present in an individual partition;

(d) a detection system that detects a level of a first signal and a level of a second signal from the individual partitions,

wherein the first signal is generated when a first probe binds to the plurality of sequence repeats lacking the sequence variant and the second signal is generated when a second probe binds to the plurality of sequence repeats containing the sequence variant; and

(e) a report generator that sends a report to a recipient, wherein the report comprises a detection result of the sequence variant.

36. The system of claim 35, wherein the report identifies the presence of the sequence variant when the level of the second signal exceeds a threshold level indicative of one copy of the target sequence and the level of the first signal is below a threshold level indicative of one copy of the target sequence.

37. The system of claim 35 or 36, wherein the report identifies the absence of the sequence variant when the level of the first signal exceeds a threshold level indicative of one copy of the target sequence and the level of the second signal is below a threshold level indicative of one copy of the target sequence.

38. The system of any one of claims 35-37, wherein the report identifies a false positive when the level of the first signal exceeds a threshold level indicative of one copy of the target sequence and the level of the second signal exceeds a threshold level indicative of one copy of the target sequence.

39. A computer-readable medium containing code which, when executed by one or more processors, performs a method of detecting sequence variants, the method comprising:

(a) receiving a user request for a detection reaction on a sample;

(b) performing a nucleic acid amplification reaction on the sample or portion thereof in response to the user request, wherein the amplification reaction comprises: (i) circularizing individual polynucleotides of said sample to form a plurality of circularized polynucleotides; and (ii) amplifying the plurality of circularized polynucleotides to form a plurality of concatemers, each concatemer comprising a plurality of sequence repeats;

(c) partitioning the plurality of concatemers into a plurality of partitions such that on average no more than one concatemer comprising a target sequence is present in an individual partition,

wherein individual partitions of the plurality of partitions contain at least one of a first probe and a second probe, wherein the first probe binds to the plurality of sequence repeats lacking the sequence variant and generates a first signal, and the second probe binds to the plurality of sequence repeats containing the sequence variant and generates a second signal;

(d) detecting the first signal and the second signal from the individual partitions;

(e) identifying the sequence variant as present only if the level of the second signal exceeds a threshold level indicative of one copy of the target sequence and the level of the first signal is below a threshold level indicative of one copy of the target sequence; and

(f) generating a report comprising the results of the detection of the sequence variant.

40. The computer-readable medium of claim 39, wherein the method further comprises identifying the sequence variant as absent when the level of the first signal exceeds a threshold level indicative of one copy of the target sequence and the level of the second signal is below a threshold level indicative of one copy of the target sequence.

41. The computer-readable medium of claim 39 or 40, wherein the method further comprises identifying a false positive when the level of the first signal exceeds a threshold level indicative of one copy of the target sequence and the level of the second signal exceeds a threshold level indicative of one copy of the target sequence.

Background

Digital polymerase chain reaction (PCR) is an improvement over traditional PCR methods that allows a user to directly quantify nucleic acids in a sample. Digital PCR methods typically involve distributing the sample into a plurality of discrete partitions so that each partition can be interrogated individually. Digital PCR is very sensitive but can be difficult to scale up because of the limited number of targets (plex) that can be measured in one reaction. For liquid biopsies using cell-free DNA (cfDNA) as input, this problem can be more acute, because starting material is typically scarce. One approach to this problem is to amplify cfDNA before performing digital PCR, in order to provide enough starting material to divide among different assays. However, errors introduced during the amplification step can produce false positives in digital PCR, which is particularly challenging for detecting variants at low allele frequency. Accordingly, provided herein are compositions and methods for performing digital polymerase chain reaction on samples with small amounts of nucleic acids. The compositions and methods can improve digital PCR techniques by reducing the number of false positive determinations.
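Digital PCR quantification is commonly based on Poisson statistics: if target copies are distributed randomly across equal-volume partitions, the fraction of negative partitions equals e^(-λ), where λ is the mean number of copies per partition. The sketch below is a minimal illustration, not the authors' procedure. It estimates λ from partition counts and shows how many partitions are expected to hold more than one copy at a given loading. The function names are hypothetical, and the sketch assumes that every partition holding at least one copy reads as positive.

```python
import math


def mean_copies_per_partition(n_positive: int, n_total: int) -> float:
    """Poisson estimate of lambda, the mean number of target copies per partition.

    Assumes copies are distributed independently across equal-volume partitions
    and that any partition holding at least one copy reads as positive.
    """
    if n_total <= 0 or not 0 <= n_positive < n_total:
        raise ValueError("need n_total > 0 and 0 <= n_positive < n_total")
    fraction_negative = 1.0 - n_positive / n_total
    # P(0 copies) = exp(-lambda), so lambda = -ln(fraction of negative partitions)
    return -math.log(fraction_negative)


def estimated_total_copies(n_positive: int, n_total: int) -> float:
    """Estimated number of target copies loaded across all partitions."""
    return mean_copies_per_partition(n_positive, n_total) * n_total


def fraction_multiply_occupied(lam: float) -> float:
    """Expected fraction of partitions holding two or more copies at loading lambda."""
    # P(k >= 2) = 1 - P(0) - P(1) = 1 - e^-lambda * (1 + lambda)
    return 1.0 - math.exp(-lam) * (1.0 + lam)


if __name__ == "__main__":
    lam = mean_copies_per_partition(632, 1000)
    print(f"lambda ~ {lam:.3f}, total copies ~ {estimated_total_copies(632, 1000):.0f}")
    print(f"fraction with >= 2 copies at lambda = 1: {fraction_multiply_occupied(1.0):.3f}")
```

Under these assumptions, even a loading of λ = 1, i.e., on average one copy per partition, leaves roughly 26% of partitions with two or more copies.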

Disclosure of Invention

In one aspect, there is provided a method of identifying sequence variants in a nucleic acid sample comprising a plurality of polynucleotides, the method comprising: (a) circularizing said plurality of polynucleotides to form a plurality of circularized polynucleotides; (b) amplifying the plurality of circularized polynucleotides to generate a plurality of concatemers, each concatemer comprising a plurality of sequence repeats; (c) partitioning the plurality of concatemers into a plurality of partitions such that on average no more than one concatemer comprising a target sequence is present in an individual partition, wherein an individual partition in the plurality of partitions contains at least one of a first probe and a second probe, wherein the first probe binds to the target sequence lacking the sequence variant and generates a first signal and the second probe binds to the target sequence containing the sequence variant and generates a second signal; (d) detecting the first signal and the second signal from the individual partitions; and (e) identifying the sequence variant as present in the target sequence only if the level of the second signal exceeds a threshold level indicative of one copy of the target sequence and the level of the first signal is below a threshold level indicative of one copy of the target sequence. In some cases, the method further comprises identifying the sequence variant as absent when the level of the first signal exceeds a threshold level indicative of one copy of the target sequence and the level of the second signal is below a threshold level indicative of one copy of the target sequence. In some cases, the method further comprises identifying a false positive when the level of the first signal exceeds a threshold level indicative of one copy of the target sequence and the level of the second signal exceeds a threshold level indicative of one copy of the target sequence.
In some cases, the method further comprises outputting a result based on the identifying. In some cases, false positives are omitted from the results. In some cases, the plurality of polynucleotides comprises single-stranded polynucleotides. In some cases, the plurality of polynucleotides comprises cell-free DNA. In some cases, the circularizing comprises ligating the 5' end and the 3' end of at least one polynucleotide of the plurality of polynucleotides. In some cases, the circularizing comprises ligating an adaptor to the 5' end, the 3' end, or both the 5' end and the 3' end of at least one polynucleotide of the plurality of polynucleotides. In some cases, the amplifying comprises amplifying using a polymerase having strand displacement activity. In some cases, the amplifying comprises amplifying the plurality of circularized polynucleotides using rolling circle amplification. In some cases, the amplifying comprises subjecting the plurality of circular polynucleotides to an amplification reaction mixture comprising random primers. In some cases, the amplifying comprises subjecting the plurality of circular polynucleotides to an amplification reaction mixture comprising one or more primers, each primer specifically hybridizing to a different target sequence by sequence complementarity. In some cases, the plurality of concatemers is not enriched prior to the partitioning. In some cases, the method further comprises, prior to the partitioning, fragmenting the plurality of concatemers to generate a plurality of fragmented concatemers. In some cases, the method further comprises, after the fragmenting and before the partitioning, selecting a plurality of the fragmented concatemers based on size. In some cases, the plurality of partitions comprise emulsion-based droplets. In some cases, the emulsion-based droplets comprise picoliter- or nanoliter-sized droplets. In some cases, the plurality of partitions comprises wells or tubes.
In some cases, the first probe comprises a first detectable label and the second probe comprises a second detectable label. In some cases, the first detectable label comprises a first fluorescent label and the second detectable label comprises a second fluorescent label. In some cases, the emission spectra of the first fluorescent label and the second fluorescent label are different. In some cases, the detecting further comprises measuring the intensity of the first signal and the second signal. In some cases, the sequence variant is a single nucleotide polymorphism. In some cases, the first probe and the second probe are TaqMan assay-based probes. In some cases, the method further comprises, after the partitioning and before the detecting, performing a polymerase chain reaction on the concatemer to amplify a region of the plurality of sequence repeats.
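The decision rule in step (e), together with the absent and false-positive cases that follow it, can be read as a per-partition classifier. Below is a minimal sketch in Python. The enum and function names are hypothetical; separate thresholds for the two fluorescent channels are an assumption (the text only requires a level indicative of one copy); and the "no call" outcome for a partition in which neither signal exceeds its threshold is an assumption the text does not address.

```python
from enum import Enum


class PartitionCall(Enum):
    VARIANT_PRESENT = "variant present"
    VARIANT_ABSENT = "variant absent"
    FALSE_POSITIVE = "false positive"
    NO_CALL = "no call"  # assumption: neither signal above threshold


def classify_partition(first_signal: float, second_signal: float,
                       first_threshold: float, second_threshold: float) -> PartitionCall:
    """Apply the two-probe decision rule to a single partition.

    first_signal:  signal from the probe binding the target lacking the variant
    second_signal: signal from the probe binding the target containing the variant
    *_threshold:   hypothetical per-channel level indicative of one target copy
    A signal that does not exceed its threshold is treated as below it.
    """
    first_positive = first_signal > first_threshold
    second_positive = second_signal > second_threshold
    if second_positive and not first_positive:
        return PartitionCall.VARIANT_PRESENT
    if first_positive and not second_positive:
        return PartitionCall.VARIANT_ABSENT
    if first_positive and second_positive:
        # both probes bound: consistent with an amplification error, not a true variant
        return PartitionCall.FALSE_POSITIVE
    return PartitionCall.NO_CALL


if __name__ == "__main__":
    for first, second in [(0.1, 5.0), (5.0, 0.1), (5.0, 5.0), (0.1, 0.1)]:
        print(first, second, classify_partition(first, second, 1.0, 1.0).value)
```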

In another aspect, there is provided a method of reducing errors in a digital polymerase chain reaction on a nucleic acid sample comprising less than 50 ng of a polynucleotide, the method comprising: (a) circularizing individual polynucleotides in said nucleic acid sample to generate a plurality of circularized polynucleotides; (b) amplifying the plurality of circularized polynucleotides to form a plurality of concatemers, each concatemer comprising a plurality of sequence repeats; (c) partitioning the plurality of concatemers into a plurality of partitions such that on average no more than one concatemer comprising a target sequence is present in an individual partition, wherein individual partitions in the plurality of partitions contain at least one of a first probe and a second probe, wherein the first probe binds to the plurality of sequence repeats lacking the sequence variant and generates a first signal and the second probe binds to the plurality of sequence repeats containing the sequence variant and generates a second signal; (d) detecting the first signal and the second signal from the individual partitions; and (e) identifying a false positive when the level of the first signal exceeds a threshold level indicative of one copy of the target sequence and the level of the second signal exceeds a threshold level indicative of one copy of the target sequence. In some cases, the method further comprises outputting the result. In some cases, the result excludes the false positive. In some cases, the method reduces false positives by at least 20%. In some cases, the nucleic acid sample comprises cell-free polynucleotides. In some cases, the cell-free polynucleotides comprise circulating tumor DNA. In some cases, the nucleic acid sample is from a subject. In some cases, the nucleic acid sample is urine, blood, stool, saliva, tissue, or body fluid.
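One way to express the false-positive reduction is to compare a naive caller with the two-probe rule. The naive caller counts every partition whose variant (second) signal exceeds the threshold. The two-probe rule flags partitions in which both signals exceed it. The sketch below assumes a control sample known to contain no true variant molecules, so that every naive call is a false positive and the flagged fraction equals the reduction. The function name, the single threshold shared by both channels, and the signal values are made up for illustration.

```python
from typing import Iterable, Tuple


def false_positive_reduction(signal_pairs: Iterable[Tuple[float, float]],
                             threshold: float) -> dict:
    """Compare naive variant calling with the two-probe rule on a control sample.

    signal_pairs: (first_signal, second_signal) per partition, i.e. (wild-type, variant)
    threshold:    hypothetical level indicative of one copy, shared by both channels
    Assumes the sample carries no true variant molecules, so every naive
    variant call is a false positive.
    """
    naive_calls = 0  # second signal above threshold, first signal ignored
    flagged = 0      # both signals above threshold: identified as a false positive
    for first, second in signal_pairs:
        if second > threshold:
            naive_calls += 1
            if first > threshold:
                flagged += 1
    remaining = naive_calls - flagged  # variant calls that survive the two-probe rule
    reduction = flagged / naive_calls if naive_calls else 0.0
    return {"naive_calls": naive_calls, "flagged": flagged,
            "remaining": remaining, "reduction": reduction}


if __name__ == "__main__":
    # made-up (wild-type, variant) signal pairs for five partitions
    pairs = [(9.0, 1.2), (8.5, 1.1), (0.1, 7.9), (9.3, 0.2), (0.1, 0.1)]
    result = false_positive_reduction(pairs, threshold=0.5)
    print(result)
    print("meets the 20% figure:", result["reduction"] >= 0.20)
```

On these illustrative values, two of the three naive calls are flagged, a reduction of about 67%; a real assay would need measured signals and a verified negative control.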

In another aspect, there is provided a system for detecting sequence variants, the system comprising: (a) a computer configured to receive a user request for a detection reaction on a sample; (b) an amplification system that performs a nucleic acid amplification reaction on the sample or portion thereof in response to the user request, wherein the amplification reaction comprises: (i) circularizing individual polynucleotides of said sample to form a plurality of circularized polynucleotides; and (ii) amplifying the plurality of circularized polynucleotides to form a plurality of concatemers, each concatemer comprising a plurality of sequence repeats; (c) a partitioning system that partitions the plurality of concatemers into a plurality of partitions such that on average no more than one concatemer comprising a target sequence is present in an individual partition; (d) a detection system that detects the level of a first signal and the level of a second signal from individual partitions, wherein the first signal is generated when a first probe binds to the plurality of sequence repeats lacking the sequence variant and the second signal is generated when a second probe binds to the plurality of sequence repeats containing the sequence variant; and (e) a report generator that sends a report to a recipient, wherein the report comprises a detection result for the sequence variant. In some cases, the report identifies the sequence variant as present when the level of the second signal exceeds a threshold level indicative of one copy of the target sequence and the level of the first signal is below a threshold level indicative of one copy of the target sequence. In some cases, the report identifies the sequence variant as absent when the level of the first signal exceeds a threshold level indicative of one copy of the target sequence and the level of the second signal is below a threshold level indicative of one copy of the target sequence.
In some cases, the report identifies a false positive when the level of the first signal exceeds a threshold level indicative of one copy of the target sequence and the level of the second signal exceeds a threshold level indicative of one copy of the target sequence.
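The report generator in element (e) could aggregate per-partition calls into a summary for the recipient. This is a hypothetical sketch: the `VariantReport` fields, the JSON output, the single threshold shared by both channels, and the `min_variant_partitions` cutoff are illustrative assumptions, not details from the disclosure. False-positive partitions are counted but are not treated as variant detections, consistent with results that exclude false positives.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterable, Tuple


@dataclass
class VariantReport:
    variant: str
    variant_partitions: int         # second signal only: variant present
    wild_type_partitions: int       # first signal only: variant absent
    false_positive_partitions: int  # both signals: flagged and excluded from the call
    variant_detected: bool


def build_report(variant: str, signal_pairs: Iterable[Tuple[float, float]],
                 threshold: float, min_variant_partitions: int = 1) -> VariantReport:
    """Summarize per-partition (first, second) signals into a hypothetical report.

    min_variant_partitions is an illustrative reporting cutoff, not from the source.
    """
    variant_n = wild_n = fp_n = 0
    for first, second in signal_pairs:
        first_pos, second_pos = first > threshold, second > threshold
        if second_pos and not first_pos:
            variant_n += 1
        elif first_pos and not second_pos:
            wild_n += 1
        elif first_pos and second_pos:
            fp_n += 1
    return VariantReport(variant, variant_n, wild_n, fp_n,
                         variant_detected=variant_n >= min_variant_partitions)


if __name__ == "__main__":
    # made-up signals for four partitions
    pairs = [(0.1, 6.0), (7.2, 0.3), (6.8, 1.4), (0.2, 0.1)]
    report = build_report("EGFR L858R", pairs, threshold=0.5)
    print(json.dumps(asdict(report), indent=2))
```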

In another aspect, a computer-readable medium is provided that contains code which, when executed by one or more processors, performs a method of detecting sequence variants, the method comprising: (a) receiving a user request for a detection reaction on a sample; (b) performing a nucleic acid amplification reaction on the sample or portion thereof in response to the user request, wherein the amplification reaction comprises: (i) circularizing individual polynucleotides of said sample to form a plurality of circularized polynucleotides; and (ii) amplifying the plurality of circularized polynucleotides to form a plurality of concatemers, each concatemer comprising a plurality of sequence repeats; (c) partitioning the plurality of concatemers into a plurality of partitions such that on average no more than one concatemer comprising a target sequence is present in an individual partition, wherein individual partitions in the plurality of partitions contain at least one of a first probe and a second probe, wherein the first probe binds to the plurality of sequence repeats lacking the sequence variant and generates a first signal and the second probe binds to the plurality of sequence repeats containing the sequence variant and generates a second signal; (d) detecting the first signal and the second signal from the individual partitions; (e) identifying the sequence variant as present only if the level of the second signal exceeds a threshold level indicative of one copy of the target sequence and the level of the first signal is below a threshold level indicative of one copy of the target sequence; and (f) generating a report comprising the results of the detection of the sequence variant.
In some cases, the method further comprises identifying the sequence variant as absent when the level of the first signal exceeds a threshold level indicative of one copy of the target sequence and the level of the second signal is below a threshold level indicative of one copy of the target sequence. In some cases, the method further comprises identifying a false positive when the level of the first signal exceeds a threshold level indicative of one copy of the target sequence and the level of the second signal exceeds a threshold level indicative of one copy of the target sequence.

Incorporation by Reference

All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference.

Drawings

The novel features believed characteristic of the invention are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings of which:

FIG. 1 depicts an exemplary method for performing digital PCR, according to an embodiment of the present disclosure.

FIG. 2 depicts three embodiments relating to the formation of circularized cDNA. In the top scheme, single-stranded DNA (ssDNA) is circularized in the absence of adaptors; the middle scheme depicts the use of adaptors; and the bottom scheme uses two adaptor oligonucleotides (yielding different sequences at each end) and may further include a splint oligonucleotide that hybridizes to the two adaptors to bring the two ends into proximity.

FIGS. 3A and 3B depict two protocols for adding adapters using blocked ends of nucleic acids.

Figure 4 depicts an embodiment of circularizing a particular target by using "molecular tweezers" to bring the two ends of a single stranded DNA into spatial proximity for ligation.

FIGS. 5A, 5B, and 5C depict three different ways of initiating a rolling circle amplification (RCA) reaction. FIG. 5A shows the use of target-specific primers, for example primers directed to specific target genes or target sequences of interest; this usually results in amplification of only the target sequence. FIG. 5B depicts whole genome amplification (WGA) using random primers, which would typically amplify all sequences in the sample; these are then sorted bioinformatically during processing. FIG. 5C depicts the use of adaptor primers when adaptors are used, which also results in non-target-specific amplification overall.

FIG. 6 shows a PCR method, according to an embodiment, that facilitates sequencing of a circular polynucleotide, or of a strand containing at least two copies of a target nucleic acid sequence, using a pair of primers that are oriented away from each other when annealed within a monomer of the target sequence (also referred to as "back-to-back" primers: they point in opposite directions rather than toward each other from the ends of the region to be amplified). In some embodiments, these primer sets are used after concatemer formation to favor amplicons comprising higher-order multimers of the target sequence, e.g., dimers, trimers, etc. Optionally, the method may further comprise size selection to remove amplicons smaller than dimers.
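The length arithmetic behind back-to-back primers and size selection can be made concrete. In the sketch below (hypothetical function names; 0-based coordinates on one monomer are an assumption of the sketch), primers that face away from each other cannot produce a product within a single monomer. On a concatemer, however, a product can run from the forward site in one repeat to the reverse site k repeats downstream, so amplicon length grows by one monomer length per additional repeat. Treating the k = 1 product as the "monomer" amplicon is also an assumption.

```python
from typing import List


def b2b_amplicon_lengths(monomer_len: int, fwd_start: int, rev_end: int,
                         max_copies: int) -> List[int]:
    """Lengths of amplicons formed by back-to-back primers on a concatemer.

    Coordinates are 0-based within one monomer of the repeat:
      fwd_start: first base of the forward primer site (extends downstream)
      rev_end:   one past the last base of the reverse primer site (extends upstream)
    Back-to-back orientation means rev_end <= fwd_start, so no product forms
    inside a single monomer. A product from the forward site in repeat i to the
    reverse site in repeat i + k has length k * monomer_len - (fwd_start - rev_end).
    """
    if not 0 <= rev_end <= fwd_start < monomer_len:
        raise ValueError("expected 0 <= rev_end <= fwd_start < monomer_len")
    gap = fwd_start - rev_end  # region between the primers, skipped by the k = 1 product
    return [k * monomer_len - gap for k in range(1, max_copies + 1)]


def size_select(lengths: List[int], min_len: int) -> List[int]:
    """Keep only amplicons at least min_len long (e.g., drop monomer products)."""
    return [n for n in lengths if n >= min_len]


if __name__ == "__main__":
    lengths = b2b_amplicon_lengths(monomer_len=100, fwd_start=60, rev_end=40, max_copies=3)
    dimer_len = lengths[1]
    print(lengths, size_select(lengths, dimer_len))
```

With a 100-nt monomer and primer sites 20 nt apart, the products are 80, 180, and 280 nt, and a cutoff at the dimer length keeps only the two longer amplicons.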

FIGS. 7A, 7B, 7C, and 7D depict embodiments in which back-to-back (B2B) primers are used with a "touch-up" PCR step, making amplification of short products (such as monomers) less favorable. In this case, each primer has two domains: the first domain hybridizes to the target sequence (grey or black arrows), and the second domain is a "universal primer" binding domain (curved rectangles; sometimes also referred to as adapters) that does not hybridize to the original target sequence. In some embodiments, the first rounds of PCR are performed with a low-temperature annealing step, allowing the gene-specific sequence to bind. These low-temperature cycles produce PCR products of various lengths, including short products. After a few rounds, the annealing temperature is raised to favor hybridization of the entire primer (both domains); as shown, such full-length hybridization occurs at the ends of the template, while internal binding is less stable. Thus, a higher annealing temperature combined with two-domain binding disfavors the production of shorter products compared with a lower temperature or binding of a single domain.

Fig. 8 is a diagram of a system according to an embodiment.

FIGS. 9A, 9B, 9C, and 9D depict examples of results obtained in a digital PCR assay to detect the sequence variant EGFR L858R using a method according to the present disclosure.

FIGS. 10A, 10B, 10C, and 10D depict examples of results obtained in a digital PCR assay to detect the sequence variant EGFR G719S using a method according to the present disclosure.

FIGS. 11A, 11B, 11C, and 11D depict examples of results obtained in a digital PCR assay to detect the sequence variant EGFR T790M using a method according to the present disclosure.

Detailed Description

The systems and methods provided herein relate generally to digital PCR techniques and improvements thereof. In some cases, the systems and methods may be applicable to nucleic acid samples containing small amounts of starting material (e.g., cell-free DNA). In some cases, the systems and methods can improve on traditional digital PCR techniques by reducing the number of false positive determinations in digital PCR assays. In some cases, the systems and methods can improve on traditional digital PCR techniques by increasing the accuracy of sequence variant determination in digital PCR assays. FIG. 1 depicts an exemplary method of a digital PCR assay, according to an embodiment of the present disclosure. In general, the method can include circularizing individual polynucleotides in a nucleic acid sample and amplifying the circularized polynucleotides to generate a plurality of concatemers. In some cases, the concatemers each contain multiple sequence repeats. In some cases, at least one of the plurality of concatemers may comprise a target sequence, and the target sequence may be repeated multiple times in the concatemer. In some cases, the target sequence may comprise a sequence variant. In some cases, the target sequence may contain errors introduced by the amplification step. In some cases, the method can be used to distinguish between random errors and true mutations in the target sequence. As shown in FIG. 1, the plurality of concatemers may be allocated into a plurality of partitions. In some cases, the plurality of concatemers may be partitioned such that on average no more than one concatemer comprising the target sequence is present in an individual partition. The method may further comprise hybridizing probes to the target sequence.
In some cases, the probes can include a wild-type probe that is capable of binding to a wild-type sequence in the target sequence and generating a first signal (wild-type signal). In some cases, the probes can include a mutation probe that is capable of binding to a mutated sequence (e.g., comprising a sequence variant) in the target sequence and generating a second signal (mutation signal). Without wishing to be bound by theory, if the starting polynucleotide comprises a true mutation, each target sequence in the plurality of sequence repeats is expected to comprise the sequence variant. Conversely, if the mutation is due to a random error during the amplification step, most of the target sequences in the plurality of sequence repeats are expected to comprise wild-type sequence, with a small number (one or more) of target sequences comprising the error. Individual partitions can therefore be interrogated: a partition with a true mutation would be expected to generate a mutant signal but not a wild-type signal, whereas a partition with a random error would be expected to generate both a mutant signal and a wild-type signal.
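The reasoning above can be illustrated with an idealized model in which each sequence repeat in a partition's concatemer contributes one unit of signal to whichever probe it matches. This is a simplified sketch, not the disclosed implementation. It reduces each repeat to the single base at the variant position, assumes signal scales linearly with the number of matching repeats, and uses a hypothetical threshold of half of one copy's signal so that a single matching repeat registers. The function names are illustrative.

```python
from typing import Sequence, Tuple


def probe_signals(repeat_bases: Sequence[str], wild_type_base: str, variant_base: str,
                  signal_per_copy: float = 1.0) -> Tuple[float, float]:
    """Idealized (first, second) signals for a partition holding one concatemer.

    repeat_bases lists the base at the variant position in each sequence repeat.
    Each repeat matching a probe is assumed to add signal_per_copy to that channel.
    """
    first = sum(b == wild_type_base for b in repeat_bases) * signal_per_copy
    second = sum(b == variant_base for b in repeat_bases) * signal_per_copy
    return first, second


def interpret(repeat_bases: Sequence[str], wild_type_base: str, variant_base: str,
              signal_per_copy: float = 1.0) -> str:
    # Hypothetical threshold: half of one copy's signal, so one matching repeat counts.
    threshold = 0.5 * signal_per_copy
    first, second = probe_signals(repeat_bases, wild_type_base, variant_base, signal_per_copy)
    first_pos, second_pos = first > threshold, second > threshold
    if second_pos and not first_pos:
        return "variant present"
    if first_pos and not second_pos:
        return "variant absent"
    if first_pos and second_pos:
        return "false positive"
    return "no call"


if __name__ == "__main__":
    print(interpret(["T"] * 10, "C", "T"))          # true mutation: every repeat carries T
    print(interpret(["C"] * 9 + ["T"], "C", "T"))   # amplification error in one repeat
    print(interpret(["C"] * 10, "C", "T"))          # wild-type molecule
```

In this model, a molecule that carried the mutation before circularization yields a variant-only signal, whereas an error confined to one repeat produces both signals and is classified as a false positive.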

The practice of some of the embodiments disclosed herein employs, unless otherwise indicated, conventional techniques of immunology, biochemistry, chemistry, molecular biology, microbiology, cell biology, genomics, and recombinant DNA that are within the skill of the art. See, e.g., Sambrook and Green, Molecular Cloning: A Laboratory Manual, 4th edition (2012); Current Protocols in Molecular Biology series (F.M. Ausubel et al., eds.); Methods in Enzymology series (Academic Press, Inc.); PCR 2: A Practical Approach (M.J. MacPherson, B.D. Hames and G.R. Taylor, eds. (1995)); Harlow and Lane, eds. (1988) Antibodies, A Laboratory Manual; and Culture of Animal Cells: A Manual of Basic Technique and Specialized Applications, 6th edition (R.I. Freshney, ed. (2010)).

The term "about" or "approximately" means within an acceptable error range for the particular value as determined by one of ordinary skill in the art, which error range will depend in part on how the value is measured or determined, i.e., the limitations of the measurement system. For example, "about" can mean within one or more standard deviations, according to the practice in the art. Alternatively, "about" may refer to a range of up to 20%, up to 10%, up to 5%, or up to 1% of a given value. Alternatively, particularly with respect to biological systems or processes, the term may mean within an order of magnitude, preferably within 5-fold, and more preferably within 2-fold, of a value. Where a particular value is described in the application and claims, unless otherwise stated, the term "about" is to be assumed to mean within an acceptable error range for the particular value.

The terms "polynucleotide", "nucleotide sequence", "nucleic acid" and "oligonucleotide" may be used interchangeably. They refer to polymeric forms of nucleotides of any length (whether deoxyribonucleotides or ribonucleotides) or analogs thereof. The polynucleotide may have any three-dimensional structure and may perform any function, known or unknown. The following are non-limiting examples of polynucleotides: coding or non-coding regions of a gene or gene fragment, loci defined from linkage analysis, exons, introns, messenger RNA (mRNA), transfer RNA (tRNA), ribosomal RNA (rRNA), short interfering RNA (siRNA), short hairpin RNA (shRNA), micro RNA (miRNA), ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probes, and primers. A polynucleotide may comprise one or more modified nucleotides, such as methylated nucleotides and nucleotide analogs. Modifications to the nucleotide structure, if present, may be imparted before or after assembly of the polymer. The sequence of nucleotides may be interrupted by non-nucleotide components. The polynucleotide may be further modified after polymerization, for example by conjugation with a labeling component.

Generally, the term "target polynucleotide" refers to a nucleic acid molecule or polynucleotide in a starting population of nucleic acid molecules having a target sequence, the presence, amount, and/or nucleotide sequence of which, or a change in one or more thereof, is to be determined. Generally, the term "target sequence" refers to a nucleic acid sequence on a single strand of nucleic acid. The target sequence may be a portion of a gene, regulatory sequence, genomic DNA, cDNA, RNA (including mRNA, miRNA, rRNA), or others. The target sequence may be a target sequence from a sample or a secondary target such as an amplification reaction product.

Generally, a "nucleotide probe", "probe" or "labeled oligonucleotide" refers to a polynucleotide that is used to detect or identify its corresponding target polynucleotide by hybridizing to the corresponding target sequence in a hybridization reaction. An oligonucleotide probe can hybridize to one or more target polynucleotides. The labeled oligonucleotide may be fully complementary to one or more target polynucleotides in the sample, or may contain one or more nucleotides that are not complementary to the corresponding nucleotides in one or more target polynucleotides in the sample.

"Hybridization" refers to a reaction in which one or more polynucleotides react to form a complex that is stabilized by hydrogen bonding between the bases of the nucleotide residues. This hydrogen bonding can occur by Watson-Crick base pairing, Hoogsteen binding, or in any other sequence-specific manner depending on base complementarity. The complex may comprise two strands forming a duplex structure, three or more strands forming a multi-stranded complex, a single strand that self-hybridizes, or any combination thereof. Hybridization reactions can constitute a step in a broader process, such as the initiation of PCR or the cleavage of a polynucleotide by an endonuclease. A second sequence that is complementary to a first sequence is referred to as the "complement" of the first sequence. The term "hybridizable" as applied to a polynucleotide refers to the ability of the polynucleotide to form a complex that is stabilized by hydrogen bonding between the bases of nucleotide residues in a hybridization reaction.

"Complementarity" refers to the ability of a nucleic acid to form hydrogen bonds with another nucleic acid sequence by traditional Watson-Crick base pairing or by other non-traditional types of pairing. Percent complementarity refers to the percentage of residues in a nucleic acid molecule that can form hydrogen bonds (e.g., Watson-Crick base pairing) with a second nucleic acid sequence (e.g., 5, 6, 7, 8, 9, and 10 out of 10 residues correspond to 50%, 60%, 70%, 80%, 90%, and 100% complementarity, respectively). "Fully complementary" means that all contiguous residues of a nucleic acid sequence hydrogen bond with the same number of contiguous residues in a second nucleic acid sequence. As used herein, "substantially complementary" refers to a degree of complementarity of at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, 99%, or 100% over a region of 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50 or more nucleotides, or to two nucleic acids that hybridize under stringent conditions. For example, to assess percent complementarity, sequence identity may be measured by any suitable alignment algorithm, including, but not limited to, the Needleman-Wunsch algorithm (see, e.g., the EMBOSS Needle aligner available at www.ebi.ac.uk/Tools/psa/emboss_needle/nucleotide.html, optionally with default settings), the BLAST algorithm (see, e.g., the BLAST alignment tool available at blast.ncbi.nlm.nih.gov/Blast.cgi, optionally with default settings), or the Smith-Waterman algorithm (see, e.g., the EMBOSS Water aligner available at www.ebi.ac.uk/Tools/psa/emboss_water/nucleotide.html, optionally with default settings). Any suitable parameter of the selected algorithm (including default parameters) may be used to evaluate the optimal alignment.
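The percent-complementarity arithmetic above can be sketched for the simplest case. This is a minimal illustration, assuming two DNA sequences of equal length that are already aligned end-to-end without gaps, both written 5' to 3', with only Watson-Crick pairs counted; the function name `percent_complementarity` is hypothetical, and real assessments would use an aligner such as those named above.

```python
# Watson-Crick pairing partners for DNA bases (assumption: DNA only, no gaps).
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def percent_complementarity(probe: str, target: str) -> float:
    """Percent of probe residues that Watson-Crick pair with an equal-length target.

    Both sequences are given 5'->3'. Because the strands pair antiparallel,
    probe position i is compared with target position len - 1 - i.
    """
    if len(probe) != len(target):
        raise ValueError("probe and target must be the same length (pre-aligned)")
    probe, target = probe.upper(), target.upper()
    n = len(probe)
    paired = sum((probe[i], target[n - 1 - i]) in PAIRS for i in range(n))
    return 100.0 * paired / n


target = "ACGTACGTAC"
print(percent_complementarity("GTACGTACGT", target))  # 100.0 (exact reverse complement)
print(percent_complementarity("GTACGATGCA", target))  # 50.0 (5 of 10 residues pair)
```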

Generally, "stringent conditions" for hybridization refer to conditions under which a nucleic acid having complementarity to a target sequence predominantly hybridizes to the target sequence but does not substantially hybridize to non-target sequences. Stringent conditions are generally sequence-dependent and vary according to a number of factors. Generally, the longer the sequence, the higher the temperature at which the sequence specifically hybridizes to its target sequence. Non-limiting examples of stringent conditions are described in detail in Tijssen (1993), Laboratory Techniques in Biochemistry and Molecular Biology: Hybridization with Nucleic Acid Probes, Part I, Chapter "Overview of principles of hybridization and the strategy of nucleic acid probe assay", Elsevier, N.Y.
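The statement that longer sequences hybridize specifically at higher temperatures can be illustrated with the Wallace rule, Tm = 2(A+T) + 4(G+C) in degrees C. This is a swapped-in rule of thumb, not a method of the disclosure: it is only a rough estimate for short oligonucleotides and ignores salt, probe concentration, and nearest-neighbor effects. The function name `wallace_tm` is hypothetical.

```python
def wallace_tm(seq: str) -> int:
    """Rough melting temperature (deg C) of a short oligonucleotide by the
    Wallace rule: Tm = 2 * (A + T) + 4 * (G + C).

    A rule of thumb only; it ignores salt, probe concentration and
    nearest-neighbor stacking effects.
    """
    s = seq.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


print(wallace_tm("ACGTACGT"))      # 24
print(wallace_tm("ACGTACGTACGT"))  # 36: the longer oligo melts at a higher temperature
```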

In one aspect, the present disclosure provides a method of identifying sequence variants, for example, in a nucleic acid sample comprising a plurality of polynucleotides. In some embodiments, the method comprises: (a) circularizing the plurality of polynucleotides to form a plurality of circularized polynucleotides; (b) amplifying the plurality of circularized polynucleotides to generate a plurality of concatemers, each concatemer comprising a plurality of sequence repeats; (c) partitioning the plurality of concatemers into a plurality of partitions such that on average no more than one concatemer comprising a target sequence is present in an individual partition, wherein an individual partition in the plurality of partitions contains at least one of a first probe and a second probe, wherein the first probe binds to the target sequence lacking the sequence variant and generates a first signal, and the second probe binds to the target sequence containing the sequence variant and generates a second signal; (d) detecting the first signal and the second signal from individual partitions; and (e) identifying the sequence variant as present in the target sequence only if the level of the second signal exceeds a threshold level indicative of one copy of the target sequence and the level of the first signal is below a threshold level indicative of one copy of the target sequence. In some cases, the method further comprises identifying the sequence variant as absent when the level of the first signal exceeds a threshold level indicative of one copy of the target sequence and the level of the second signal is not greater than a threshold level indicative of one copy of the target sequence.
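The per-partition decision rule in step (e), together with the "absent" rule, can be written as a small function. This is a sketch, not the disclosed implementation: the names `Call` and `call_partition` are hypothetical, and how each one-copy threshold is calibrated is assumed to happen elsewhere. Partitions matching neither rule (for example, both signals high, the pattern expected from an amplification error) are left uncalled.

```python
from collections import Counter
from enum import Enum


class Call(Enum):
    VARIANT_PRESENT = "variant present"
    VARIANT_ABSENT = "variant absent"
    NOT_CALLED = "not called"  # e.g. both signals high (likely amplification error) or neither


def call_partition(first_signal: float, second_signal: float,
                   first_threshold: float, second_threshold: float) -> Call:
    """Apply the two-probe decision rule to one partition.

    first_signal  -- wild-type probe signal (target lacking the variant)
    second_signal -- variant probe signal (target containing the variant)
    Each threshold is the signal level indicative of one copy of the target
    sequence (calibration is assumed, not specified here).
    """
    if second_signal > second_threshold and first_signal < first_threshold:
        return Call.VARIANT_PRESENT
    if first_signal > first_threshold and second_signal <= second_threshold:
        return Call.VARIANT_ABSENT
    return Call.NOT_CALLED


# Hypothetical (first_signal, second_signal) readings from three partitions.
partitions = [(0.1, 5.0), (5.0, 0.2), (5.0, 5.0)]
print(Counter(call_partition(wt, mut, 1.0, 1.0) for wt, mut in partitions))
```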

In general, the term "sequence variant" refers to any variation in a sequence relative to one or more reference sequences. In general, sequence variants occur less frequently than a reference sequence for a given population of individuals for which the reference sequence is known. For example, a particular bacterial genus may have a common reference sequence for the 16S rRNA gene, but individual species within that genus may have one or more sequence variants within the gene (or portions thereof) that can be used to identify the species in a bacterial population. As a further example, sequences of multiple individuals of the same species (or multiple sequencing reads of the same individual) can, when optimally aligned, produce a consensus sequence, and sequence variants relative to the consensus sequence can be used to identify mutants in populations indicative of at-risk contamination. In general, a "consensus sequence" refers to a nucleotide sequence that reflects the most common base at each position, derived from a series of related nucleic acids that have been subjected to mathematical and/or sequence analysis, such as optimal sequence alignment according to any of a variety of sequence alignment algorithms. A variety of alignment algorithms can be used, some of which are described herein. In some embodiments, the reference sequence is a single known reference sequence, such as a genomic sequence of a single individual. In some embodiments, the reference sequence is a consensus sequence formed by aligning multiple known sequences, such as genomic sequences of multiple individuals serving as a reference population, or multiple sequencing reads of polynucleotides from the same individual. In some embodiments, the reference sequence is a consensus sequence formed by optimally aligning sequences from the samples under analysis, such that sequence variants represent variations relative to the corresponding sequences in the same sample.
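The consensus-sequence definition above, and calling variants relative to that consensus, can be sketched for sequences that are already optimally aligned. This minimal illustration assumes equal-length, gap-free alignments and breaks ties by whichever base appears first in a column; the function names `consensus` and `variants_vs_reference` are hypothetical.

```python
from collections import Counter


def consensus(aligned: list[str]) -> str:
    """Most common base at each column of equal-length, pre-aligned sequences.

    Ties are broken by whichever base appears first in the column
    (Counter.most_common keeps first-encountered order for equal counts).
    """
    if not aligned or len({len(s) for s in aligned}) != 1:
        raise ValueError("need one or more sequences of equal (aligned) length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*aligned))


def variants_vs_reference(seq: str, reference: str) -> list[tuple[int, str, str]]:
    """(0-based position, reference base, observed base) for each mismatch."""
    return [(i, r, s) for i, (r, s) in enumerate(zip(reference, seq)) if r != s]


reads = ["ACGTAC", "ACGTAC", "ACCTAC"]  # hypothetical aligned reads
ref = consensus(reads)
print(ref)                                    # ACGTAC
print(variants_vs_reference("ACCTAC", ref))   # [(2, 'G', 'C')]
```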
In some embodiments, the sequence variants occur at a low frequency in the population (also referred to as "rare" sequence variants). For example, sequence variants can occur at a frequency of about or less than about 5%, 4%, 3%, 2%, 1.5%, 1%, 0.75%, 0.5%, 0.25%, 0.1%, 0.075%, 0.05%, 0.04%, 0.03%, 0.02%, 0.01%, 0.005%, 0.001%, or less. In some embodiments, the sequence variant occurs at a frequency of about or less than about 0.1%.
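At such low frequencies, the number of input molecules interrogated limits what can be detected at all. As a hedged back-of-the-envelope sketch (not part of the disclosure), if variant-bearing molecules are sampled independently, their count is approximately Poisson with mean N·f, so the chance of capturing at least one is 1 − exp(−N·f). The function name `molecules_needed` is hypothetical, and the sketch ignores losses during circularization, amplification, and partitioning.

```python
import math


def molecules_needed(variant_frequency: float, detection_probability: float = 0.95) -> int:
    """Smallest number of input molecules N such that at least one
    variant-bearing molecule is present with the given probability.

    Assumes independent sampling, so the number of variant molecules is
    approximately Poisson(N * f) and P(at least one) = 1 - exp(-N * f).
    Ignores assay losses (circularization, amplification, partitioning).
    """
    if not (0 < variant_frequency < 1 and 0 < detection_probability < 1):
        raise ValueError("frequency and probability must lie strictly between 0 and 1")
    return math.ceil(-math.log(1 - detection_probability) / variant_frequency)


print(molecules_needed(0.001))  # 2996 molecules for a 0.1% variant at 95% probability
print(molecules_needed(0.01))   # 300 molecules for a 1% variant
```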

A sequence variant may be any variation relative to a reference sequence. A sequence variation may consist of a substitution, insertion, or deletion of one or more nucleotides (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10 or more nucleotides). When a sequence variant comprises two or more nucleotide differences, the differing nucleotides may be contiguous with each other or non-contiguous. Non-limiting examples of types of sequence variants include Single Nucleotide Polymorphisms (SNPs), deletion/insertion polymorphisms (DIPs), Copy Number Variants (CNVs), Short Tandem Repeats (STRs), Simple Sequence Repeats (SSRs), Variable Number Tandem Repeats (VNTRs), Amplified Fragment Length Polymorphisms (AFLPs), retrotransposon-based insertion polymorphisms, sequence-specific amplification polymorphisms, and differences in epigenetic markers (e.g., methylation differences) that can be detected as sequence variants.
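The basic substitution/insertion/deletion distinction above can be expressed as a coarse classifier of a reference allele and an alternate allele. This is a simplified sketch: the function name `classify_variant` is hypothetical, it assumes alleles are written without shared padding bases, and it does not distinguish complex events or the higher-level variant types (CNVs, STRs, and so on) listed above.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Coarse label for a ref -> alt allele pair.

    Simplification (assumption): equal-length alleles are substitutions,
    a longer alt is an insertion, a shorter alt is a deletion; complex
    events are not distinguished.
    """
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        return "no variant"
    if len(ref) == len(alt):
        return "single-nucleotide substitution" if len(ref) == 1 else "multi-nucleotide substitution"
    return "insertion" if len(alt) > len(ref) else "deletion"


for ref, alt in [("A", "G"), ("AC", "GT"), ("A", "AT"), ("AT", "A")]:
    print(ref, alt, classify_variant(ref, alt))
```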

Nucleic acid samples that can be subjected to the methods described herein can be derived from any suitable source. In some embodiments, the sample used is an environmental sample. The environmental sample may be from any environmental source, such as a naturally occurring or man-made atmosphere, water system, soil, or any other sample of interest. In some embodiments, environmental samples can be obtained from, for example, atmospheric pathogen collection systems, underground sediments, ground water, ancient water deep underground, plant root-soil interfaces of grasslands, coastal water, and sewage treatment plants.

The polynucleotide from the sample can be any of a number of polynucleotides, including but not limited to DNA, RNA, ribosomal RNA (rRNA), transfer RNA (tRNA), microRNA (miRNA), messenger RNA (mRNA), cell-free DNA (cfDNA), circulating tumor DNA (ctDNA), fragments of any of these, or a combination of any two or more of these. In some embodiments, the sample comprises DNA. In some embodiments, the sample comprises genomic DNA. In some embodiments, the sample may comprise a small amount of polynucleotide (<50 ng). In some embodiments, the sample comprises mitochondrial DNA, chloroplast DNA, plasmid DNA, bacterial artificial chromosomes, yeast artificial chromosomes, oligonucleotide tags, or combinations thereof. In some embodiments, the sample comprises DNA generated by amplification, for example, by a primer extension reaction using any suitable combination of primers and DNA polymerase, including but not limited to Polymerase Chain Reaction (PCR), reverse transcription, and combinations thereof. When the template for the primer extension reaction is RNA, the product of reverse transcription is referred to as complementary DNA (cDNA). Primers useful for primer extension reactions can comprise sequences specific to one or more targets, random sequences, partially random sequences, and combinations thereof. Generally, a sample polynucleotide comprises any polynucleotide present in a sample, which may or may not include a target polynucleotide. The polynucleotide may be single-stranded, double-stranded, or a combination thereof. In some embodiments, the polynucleotide subjected to the methods of the present disclosure is a single-stranded polynucleotide, which may or may not be present as a double-stranded polynucleotide. In some embodiments, the polynucleotide is a single-stranded DNA.
Single-stranded DNA (ssDNA) may be ssDNA isolated in single-stranded form, or DNA isolated in double-stranded form and subsequently made single-stranded for one or more steps in the methods of the present disclosure.

In some embodiments, the polynucleotide is subjected to subsequent steps (e.g., circularization and amplification) without performing an extraction step and/or without performing a purification step. For example, a fluid sample can be processed to remove cells, without performing an extraction step, to produce a purified fluid sample and a cell sample, followed by isolation of DNA from the purified fluid sample. A variety of procedures for isolating polynucleotides are available, such as precipitation, or non-specific binding to a substrate followed by washing of the substrate to release the bound polynucleotides. Where the polynucleotide is isolated from a sample without a cell extraction step, the polynucleotide will be predominantly extracellular or "cell-free" polynucleotide, which may originate from dead or damaged cells. The characteristics of these polynucleotides can be used to characterize the cells or cell populations from which they are derived, such as tumor cells (e.g., in the detection of cancer), fetal cells (e.g., in prenatal diagnosis), cells from transplanted tissue (e.g., in the early detection of transplant failure), or members of a microbial community.

If the sample is treated to extract polynucleotides, for example, from cells in the sample, a variety of extraction methods may be used. For example, nucleic acids can be purified by organic extraction with phenol, phenol/chloroform/isoamyl alcohol, or similar preparations, including TRIzol and TriReagent. Other non-limiting examples of extraction techniques include: (1) ethanol precipitation following organic extraction, e.g., using phenol/chloroform organic reagents (Ausubel et al., 1993), with or without an automated nucleic acid extractor, e.g., the Model 341 DNA Extractor available from Applied Biosystems (Foster City, Calif.); (2) stationary phase adsorption (U.S. Pat. No. 5,234,809; Walsh et al., 1991); and (3) salt-induced nucleic acid precipitation (Miller et al., (1988)), commonly referred to as "salting-out". Another example of nucleic acid isolation and/or purification includes the use of magnetic particles to which nucleic acids can bind specifically or non-specifically, followed by the use of a magnet to separate the beads, and washing and eluting the nucleic acids from the beads (see, e.g., U.S. Pat. No. 5,705,628). In some embodiments, the above-described isolation methods may be preceded by an enzymatic digestion step, such as digestion with proteinase K or other similar proteases, to aid in the removal of unwanted proteins from the sample. See, for example, U.S. Pat. No. 7,001,724. If desired, an RNase inhibitor may be added to the lysis buffer. For certain cell or sample types, it may be desirable to add a protein denaturation/digestion step to the protocol. The purification method may involve isolating DNA, RNA, or both. When DNA and RNA are isolated together during or after extraction, further steps may be employed to purify one or both separately from the other. Sub-fractions of the extracted nucleic acids may also be generated, for example, by purification based on size, sequence, or other physical or chemical characteristics.
In addition to the initial nucleic acid isolation step, purification of nucleic acids can also be performed after any step of the disclosed methods, e.g., to remove excess or unwanted reagents, reactants, or products. A variety of methods are available for determining the amount and/or purity of nucleic acid in a sample, for example by absorbance (e.g., light absorbance at 260nm, 280nm, and ratios thereof) and detection of labels (e.g., fluorescent dyes and intercalators such as SYBR green, SYBR blue, DAPI, propidium iodide, Hoechst stain, SYBR gold, ethidium bromide).
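The absorbance-based quantification mentioned above can be sketched with the widely used approximate conversion factors: an A260 of 1.0 in a 1 cm path corresponds to roughly 50 ng/µL dsDNA, 33 ng/µL ssDNA, or 40 ng/µL RNA, and an A260/A280 ratio near 1.8 is typical of pure DNA (near 2.0 for RNA). These factors are general laboratory conventions rather than values from the disclosure, and the function names are hypothetical.

```python
# Approximate ng/uL per A260 unit (1 cm path length); common lab conventions.
A260_FACTORS_NG_PER_UL = {"dsDNA": 50.0, "ssDNA": 33.0, "RNA": 40.0}


def concentration_ng_per_ul(a260: float, kind: str = "dsDNA", dilution: float = 1.0) -> float:
    """Nucleic acid concentration of the undiluted sample from its A260 reading."""
    return a260 * A260_FACTORS_NG_PER_UL[kind] * dilution


def a260_a280_ratio(a260: float, a280: float) -> float:
    """Purity ratio; about 1.8 is typical of pure DNA and about 2.0 of pure RNA."""
    return a260 / a280


print(concentration_ng_per_ul(0.5, dilution=10))  # 250.0 ng/uL dsDNA
print(a260_a280_ratio(0.9, 0.5))                  # 1.8
```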

If desired, polynucleotides from the sample may be fragmented prior to further processing. Fragmentation can be accomplished by any of a variety of methods, including chemical, enzymatic, and mechanical fragmentation. In some embodiments, fragments have an average or median length of about 10 to about 1,000 nucleotides, such as 10-800, 10-500, 50-500, 90-200, or 50-150 nucleotides. In some embodiments, the average or median length of fragments is about or less than about 100, 200, 300, 500, 600, 800, 1000, or 1500 nucleotides. In some embodiments, fragments are about 90-200 nucleotides and/or have an average length of about 150 nucleotides. In some embodiments, the fragmenting is accomplished mechanically, comprising subjecting the sample polynucleotides to acoustic sonication. In some embodiments, fragmenting comprises treating the sample polynucleotides with one or more enzymes under conditions suitable for the one or more enzymes to generate double-stranded nucleic acid breaks. Examples of enzymes that can be used to generate polynucleotide fragments include sequence-specific nucleases and non-sequence-specific nucleases. Non-limiting examples of nucleases include DNase I, fragmenting enzymes, restriction endonucleases, variants thereof, and combinations thereof. For example, digestion with DNase I in the absence of Mg2+ and in the presence of Mn2+ can induce random double-strand breaks in DNA. In some embodiments, fragmenting comprises treating the sample polynucleotides with one or more restriction endonucleases. Fragmentation can result in fragments with 5' overhangs, 3' overhangs, blunt ends, or a combination thereof. In some embodiments, such as when fragmentation includes the use of one or more restriction endonucleases, cleavage of the sample polynucleotides leaves overhangs with predictable sequences.
The fragmented polynucleotides may be subjected to a step of size selection of the fragments by standard methods, such as column purification or separation from agarose gels.
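The effect of size selection on a fragment-length distribution can be sketched in silico. This is an illustration only, assuming a list of fragment lengths (for example, from a fragment analyzer) and the 90-200 nucleotide window mentioned above as the default; the function names `size_select` and `length_summary` are hypothetical stand-ins for a physical column or gel step.

```python
import statistics


def size_select(lengths: list[int], low: int = 90, high: int = 200) -> list[int]:
    """Keep fragments whose length lies in [low, high] nucleotides
    (an in-silico stand-in for column- or gel-based size selection)."""
    return [n for n in lengths if low <= n <= high]


def length_summary(lengths: list[int]) -> dict[str, float]:
    """Count, mean and median length of a set of fragments."""
    return {
        "count": len(lengths),
        "mean": statistics.mean(lengths),
        "median": statistics.median(lengths),
    }


fragments = [80, 100, 150, 180, 250]  # hypothetical fragment lengths (nt)
kept = size_select(fragments)
print(kept)                  # [100, 150, 180]
print(length_summary(kept))  # count 3, median 150
```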

According to some embodiments, polynucleotides of the plurality of polynucleotides from the sample are circularized. Circularization can include ligating the 5' end of a polynucleotide to the 3' end of the same polynucleotide, to the 3' end of another polynucleotide in the sample, or to the 3' end of a polynucleotide from a different source (e.g., an artificial polynucleotide such as an oligonucleotide adaptor). In some embodiments, the 5' end of a polynucleotide is ligated to the 3' end of the same polynucleotide (also referred to as "self-ligation"). In some embodiments, the conditions of the circularization reaction are selected to favor self-ligation of polynucleotides within a particular length range, so as to produce a population of circularized polynucleotides having a particular average length. For example, the circularization reaction conditions can be selected to favor self-ligation of polynucleotides shorter than about 5000, 2500, 1000, 750, 500, 400, 300, 200, 150, 100, 50 or fewer nucleotides in length. In some embodiments, circularization is biased toward fragments of 50-5000 nucleotides, 100-2500 nucleotides, or 150-500 nucleotides in length, such that the average length of the circularized polynucleotides falls within the respective range. In some embodiments, 80% or more of the circularized fragments are from 50 to 500 nucleotides in length, e.g., from 50 to 200 nucleotides in length. Reaction conditions that may be optimized include the length of time allotted for the ligation reaction, the concentrations of the various reagents, and the concentration of the polynucleotides to be ligated. In some embodiments, the circularization reaction maintains the distribution of fragment lengths present in the sample prior to circularization. For example, one or more of the mean, median, mode, and standard deviation of the fragment lengths in the sample prior to circularization and of the circularized polynucleotides are within 75%, 80%, 90%, 95% or more of each other.
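Whether circularization has preserved the length distribution can be checked by comparing summary statistics before and after, as described above. This sketch assumes that "within X% of each other" means the smaller value is at least X% of the larger one (an interpretation, not a definition from the disclosure); it covers mean, median, and standard deviation but omits the mode, and the function name `distribution_preserved` is hypothetical.

```python
import statistics


def distribution_preserved(before: list[int], after: list[int],
                           within: float = 0.9) -> dict[str, bool]:
    """For mean, median and standard deviation, report whether the smaller
    value is at least `within` times the larger (e.g. 0.9 = 'within 90%').

    Interpretation of 'within X% of each other' is an assumption; the mode is
    omitted because length data are often multimodal or mode-less.
    """
    checks = {}
    for name, stat in (("mean", statistics.mean),
                       ("median", statistics.median),
                       ("stdev", statistics.stdev)):
        a, b = stat(before), stat(after)
        checks[name] = min(a, b) >= within * max(a, b)
    return checks


before = [100, 150, 200]  # hypothetical lengths before circularization (nt)
after = [110, 150, 190]   # hypothetical lengths of circularized products (nt)
print(distribution_preserved(before, after, within=0.75))  # all True
print(distribution_preserved(before, after, within=0.9))   # stdev 40 vs 50 fails
```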

One or more adapter oligonucleotides can be used such that the 5' end and the 3' end of the polynucleotides in the sample are joined by one or more intervening adapter oligonucleotides to form a circular polynucleotide, rather than preferentially forming a self-ligated circularized product. For example, the 5' end of a polynucleotide may be ligated to the 3' end of an adapter, and the 5' end of the same adapter may be ligated to the 3' end of the same polynucleotide. Adapter oligonucleotides include any oligonucleotide having a sequence, at least a portion of which is known, that is capable of binding to a sample polynucleotide. The adaptor oligonucleotide may comprise DNA, RNA, nucleotide analogs, non-canonical nucleotides, labeled nucleotides, modified nucleotides, or combinations thereof. The adaptor oligonucleotide may be single-stranded, double-stranded or partially duplex. Typically, partially duplex adaptors comprise one or more single-stranded regions and one or more double-stranded regions. A double-stranded adaptor may comprise two separate oligonucleotides (also referred to as "oligonucleotide duplexes") that hybridize to each other, and the hybridization may leave one or more blunt ends, one or more 3' overhangs, one or more 5' overhangs, one or more bulges caused by mismatched and/or unpaired nucleotides, or any combination thereof. When two hybridizing regions of an adapter are separated from each other by a non-hybridizing region, a "bubble" structure is created. Different kinds of adapters, for example, adapters having different sequences, may be used in combination. Different adapters may be ligated to the sample polynucleotides in sequential reactions or simultaneously. In some embodiments, the same adapters are added to both ends of the target polynucleotide. For example, the first and second adapters may be added to the same reaction. The adapters can be manipulated prior to combination with the sample polynucleotides.
For example, terminal phosphates may be added or removed.

In the case where an adaptor oligonucleotide is used, the adaptor oligonucleotide may comprise one or more of a variety of sequence elements, including, but not limited to, one or more amplification primer annealing sequences or complements thereof, one or more sequencing primer annealing sequences or complements thereof, one or more barcode sequences, one or more common sequences shared between a plurality of different adaptors or subsets of different adaptors, one or more restriction enzyme recognition sites, one or more overhangs complementary to one or more target polynucleotide overhangs, one or more probe binding sites (e.g., for attachment to a sequencing platform, such as a flow cell for massively parallel sequencing, such as developed by Illumina, Inc.), one or more random or near random sequences (e.g., one or more nucleotides randomly selected at one or more positions from a set of two or more different nucleotides, wherein each of the different nucleotides selected at the one or more positions is present in a set of adaptors comprising a random sequence), and combinations thereof. In some cases, adapters may be used to purify these adapter-containing loops, for example by using beads (for ease of handling, in particular magnetic beads) coated with oligonucleotides comprising complementary sequences of the adapters, which beads can "capture" closed loops with the correct adapter by hybridizing thereto, washing away those loops that do not comprise the adapter and any unligated components, and then releasing the captured loops from the beads. In addition, in some cases, the complex of the hybridized capture probe and the target loop can be used directly to generate a concatemer, for example, by direct Rolling Circle Amplification (RCA). In some embodiments, the adapters in the loop may also serve as sequencing primers. 
Two or more sequence elements may be non-adjacent to each other (e.g., separated by one or more nucleotides), adjacent to each other, partially overlapping, or fully overlapping. For example, the amplification primer annealing sequence can also serve as a sequencing primer annealing sequence. The sequence element may be located at or near the 3' end, at or near the 5' end, or internal to the adaptor oligonucleotide. The sequence element can be any suitable length, for example, about or less than about 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50 or more nucleotides in length. The adaptor oligonucleotide may be of any suitable length, at least sufficient to accommodate the sequence element or elements it comprises. In some embodiments, the adapter is about or less than about 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 90, 100, 200 or more nucleotides in length. In some embodiments, the adaptor oligonucleotide is in the range of about 12 to 40 nucleotides in length, for example about 15 to 35 nucleotides in length.
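The "random or near random sequence" element described above, in which every nucleotide chosen at a random position is represented somewhere in the adapter set, can be illustrated by expanding a degenerate template written with IUPAC codes. This is a sketch: the function name `expand_degenerate` and the template `ACGTNN` are hypothetical, and only a subset of IUPAC codes is handled.

```python
from itertools import product

# Subset of IUPAC nucleotide codes: N = any base, R = A/G, Y = C/T.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "R": "AG", "Y": "CT"}


def expand_degenerate(template: str) -> list[str]:
    """Every concrete adapter sequence encoded by a template with degenerate positions."""
    choices = [IUPAC[base] for base in template.upper()]
    return ["".join(combo) for combo in product(*choices)]


adapters = expand_degenerate("ACGTNN")  # hypothetical adapter with a 2-nt random region
print(len(adapters))  # 16 = 4 x 4 possible random dinucleotides
print(adapters[:4])   # ['ACGTAA', 'ACGTAC', 'ACGTAG', 'ACGTAT']
```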

In some embodiments, the adapter oligonucleotides that are ligated to fragmented polynucleotides from one sample comprise a sequence that is common to some or all adapter oligonucleotides and a barcode that is unique to the adapters ligated to the polynucleotides of that particular sample, such that the barcode sequence can be used to distinguish polynucleotides derived from one sample or adapter ligation reaction from polynucleotides derived from another sample or adapter ligation reaction. In some embodiments, the adapter oligonucleotide comprises a 5' overhang, a 3' overhang, or both, that are complementary to one or more target polynucleotide overhangs. The complementary overhang may be one or more nucleotides in length, including but not limited to 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 or more nucleotides in length. The complementary overhangs may comprise a fixed sequence. The complementary overhang of the adapter oligonucleotide may comprise a random sequence of one or more nucleotides, such that one or more nucleotides are randomly selected from a set of two or more different nucleotides at one or more positions, wherein each of the different nucleotides selected at one or more positions is present in a set of adapters with complementary overhangs comprising the random sequence. In some embodiments, the adapter overhang is complementary to a target polynucleotide overhang generated by restriction endonuclease digestion. In some embodiments, the adapter overhang consists of adenine or thymine.
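Using the sample-specific barcode to assign molecules back to their sample of origin can be sketched as a simple demultiplexing step. This is a minimal sketch under an assumed read layout of `<common adapter sequence><barcode><insert>` with exact-match barcodes only; the function name `demultiplex`, the layout, and the example sequences are all hypothetical.

```python
from collections import defaultdict


def demultiplex(reads: list[str], barcode_to_sample: dict[str, str],
                common: str, barcode_len: int) -> dict[str, list[str]]:
    """Group reads by sample using the barcode that follows a shared adapter sequence.

    Hypothetical layout: <common adapter sequence><barcode><insert>.
    Only exact barcode matches are assigned; everything else is "unassigned".
    """
    bins: dict[str, list[str]] = defaultdict(list)
    for read in reads:
        sample = "unassigned"
        if read.startswith(common):
            barcode = read[len(common):len(common) + barcode_len]
            sample = barcode_to_sample.get(barcode, "unassigned")
        bins[sample].append(read)
    return dict(bins)


barcodes = {"AAA": "sample_1", "CCC": "sample_2"}
reads = ["ACGTAAAGG", "ACGTCCCTT", "ACGTGGGTT", "TTTTAAAGG"]
print(demultiplex(reads, barcodes, common="ACGT", barcode_len=3))
```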

Various methods of circularizing polynucleotides are available. In some embodiments, circularization comprises an enzymatic reaction, such as one using a ligase (e.g., an RNA or DNA ligase). A variety of ligases are available, including but not limited to CircLigase™ (Epicentre; Madison, Wis.), RNA ligase, and T4 RNA ligase 1 (an ssRNA ligase that acts on both DNA and RNA). In addition, T4 DNA ligase can also ligate ssDNA if no dsDNA template is present, although this is usually a slow reaction. Other non-limiting examples of ligases include: NAD-dependent ligases, including Taq DNA ligase, Thermus filiformis DNA ligase, Escherichia coli (E. coli) DNA ligase, Tth DNA ligase, Thermus scotoductus DNA ligase (I and II), thermostable ligase, Ampligase thermostable DNA ligase, VanC-type ligase, 9°N DNA ligase, Tsp DNA ligase, and novel ligases discovered by bioprospecting; ATP-dependent ligases, including T4 RNA ligase, T4 DNA ligase, T3 DNA ligase, T7 DNA ligase, Pfu DNA ligase, DNA ligase 1, DNA ligase III, DNA ligase IV, and novel ligases discovered by bioprospecting; as well as wild-type, mutant isoforms, and genetically engineered variants thereof. When self-ligation is desired, the concentrations of the polynucleotide and enzyme can be adjusted to favor the formation of intramolecular circles rather than intermolecular structures. The reaction temperature and time can also be adjusted. In some embodiments, a reaction temperature of 60 °C is used to favor the formation of intramolecular circles. In some embodiments, the reaction time is 12 to 16 hours. The reaction conditions may be those specified by the manufacturer of the selected enzyme. In some embodiments, an exonuclease step may be included after the circularization reaction to digest any unligated nucleic acid. Because a closed circle contains no free 5' or 3' end, a 5' or 3' exonuclease does not digest it but does digest unligated components.
This is particularly useful in multiplex systems.

Figure 2 shows three non-limiting examples of methods of circularizing polynucleotides. In the top scheme, the polynucleotides are circularized in the absence of adapters; the middle scheme uses one adapter; and the bottom scheme uses two adapters. When two adapters are used, one adapter may be ligated to the 5' end of the polynucleotide and the second adapter may be ligated to the 3' end of the same polynucleotide. In some embodiments, adaptor ligation may include the use of two different adaptors and a "splint" nucleic acid that is complementary to the two adaptors to facilitate ligation. Forked or "Y"-shaped adapters may also be used. Where two adapters are used, polynucleotides having the same adapter at both ends can be removed in a subsequent step due to self-annealing.

FIGS. 3A and 3B illustrate other non-limiting exemplary methods of circularizing a polynucleotide (e.g., single-stranded DNA). The adapter may be added asymmetrically to the 5' or 3' end of the polynucleotide. As shown in FIG. 3A, the single-stranded DNA (ssDNA) has a free hydroxyl group at its 3' end and the adapter has a blocked 3' end, so that in the presence of a ligase the preferred reaction joins the 3' end of the ssDNA to the 5' end of the adapter. In this embodiment, a reagent such as polyethylene glycol (PEG) may be useful to drive intermolecular ligation of a single ssDNA fragment to a single adapter before intramolecular ligation forms a circle. The reverse arrangement of blocked and free ends can also be used. Once linear ligation is complete, the ligated fragments may be treated to remove the blocking moiety, for example with a kinase or another suitable enzyme or chemical method. Once the blocking moiety is removed, adding a circularizing ligase such as CircLigase allows an intramolecular reaction to form the circularized polynucleotide. As shown in FIG. 3B, a double-stranded adapter in which one strand has a blocked 5' or 3' end can be used to form a double-stranded structure that, upon ligation, yields a double-stranded fragment containing a nick. The two strands can then be separated, the blocking moiety removed, and the single-stranded fragments circularized to form circularized polynucleotides.

In some embodiments, a molecular clamp is used to bring the two ends of a polynucleotide (e.g., single-stranded DNA) together to increase the rate of intramolecular circularization. FIG. 4 shows an example of one such process, which can be performed with or without adapters. A molecular clamp may be particularly useful when the average polynucleotide fragment length is greater than about 100 nucleotides. In some embodiments, the molecular clamp probe comprises three domains: a first domain, an intervening domain, and a second domain. The first and second domains hybridize by sequence complementarity to corresponding sequences in the target polynucleotide, whereas the intervening domain does not significantly hybridize to the target sequence. Hybridization of the molecular clamp to the target polynucleotide therefore brings the two ends of the target sequence closer together, which facilitates intramolecular circularization of the target sequence in the presence of a circularizing ligase. In some embodiments, this has additional utility, as the molecular clamp can also serve as an amplification primer.

Following circularization, the reaction products can be purified before amplification or sequencing to increase the relative concentration or purity of circularized polynucleotides available for subsequent steps (e.g., by isolating the circular polynucleotides or removing one or more other molecules in the reaction). For example, the circularization reaction or components thereof can be treated to remove single-stranded, non-circularized polynucleotides, such as by treatment with an exonuclease. As a further example, the circularization reaction or a portion thereof can be subjected to size exclusion chromatography, whereby small reagents (e.g., unreacted adapters) are retained and discarded, or circularization products are retained and released in a separate volume. A variety of kits for cleaning up ligation reactions are available, such as the Zymo oligonucleotide purification kit from Zymo Research. In some embodiments, purification includes a treatment to remove or degrade the ligase used in the circularization reaction and/or to purify the circularized polynucleotide away from the ligase. In some embodiments, the treatment to degrade the ligase comprises treatment with a protease, such as proteinase K. Proteinase K treatment may follow the manufacturer's protocol or a standard protocol (e.g., as provided in Sambrook and Green, Molecular Cloning: A Laboratory Manual, 4th edition (2012)). The protease treatment may be followed by extraction and precipitation. In one example, the circularized polynucleotide is purified as follows: treatment with proteinase K (Qiagen) in the presence of 0.1% SDS and 20 mM EDTA, extraction with 1:1 phenol/chloroform and then chloroform, and precipitation with ethanol or isopropanol. In some embodiments, the precipitation is performed in ethanol.

In some cases, the circular polynucleotide can be subjected to an amplification reaction (e.g., pre-amplification) prior to performing a digital polymerase chain reaction (dPCR) according to the methods provided herein. Generally, "amplification" refers to the process of forming one or more copies from a target polynucleotide or portion thereof. There are a variety of methods available for amplifying polynucleotides (e.g., DNA and/or RNA). Amplification may be linear, exponential, or involve both linear and exponential stages in a multi-stage amplification process. The amplification method may involve a change in temperature, such as a thermal denaturation step, or may be an isothermal process that does not require thermal denaturation. Polymerase Chain Reaction (PCR) employs multiple cycles of denaturation, annealing of primer pairs to opposite strands, and primer extension to exponentially increase the copy number of the target sequence. Denaturation of annealed nucleic acid strands can be achieved by applying heat, increasing local metal ion concentration (e.g., U.S. patent 6,277,605), ultrasonic radiation (e.g., WO/2000/049176), applying a voltage (e.g., U.S. patent 5,527,670, U.S. patent 6,033,850, U.S. patent 5,939,291, and U.S. patent 6,333,157), and applying an electromagnetic field (e.g., U.S. patent 5,545,540) in combination with primers that bind to magnetically responsive materials. In a variation known as RT-PCR, complementary DNA (cDNA) is prepared from RNA using Reverse Transcriptase (RT), and the cDNA is amplified by PCR to produce multiple copies of the DNA (e.g., U.S. Pat. No. 5,322,770 and U.S. Pat. No. 5,310,652). 
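The exponential increase in copy number during PCR is often summarized by the idealized relation N = N0(1 + E)^n, where n is the number of cycles and E is the per-cycle amplification efficiency (1.0 for perfect doubling). The sketch below assumes a constant efficiency in every cycle, which real reactions only approximate; the function name `pcr_copies` is illustrative and not part of the disclosure.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Expected copy number after `cycles` rounds of PCR, N = N0 * (1 + E) ** n.

    Assumes the per-cycle efficiency E is constant (an idealization);
    E = 1.0 corresponds to perfect doubling in every cycle.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial_copies * (1.0 + efficiency) ** cycles


# 10 starting copies, 20 cycles of perfect doubling
print(pcr_copies(10, 20))  # 10485760.0
# The same reaction at an assumed 90% per-cycle efficiency
print(round(pcr_copies(10, 20, efficiency=0.9)))
```

Small drops in efficiency compound over many cycles, which is why the efficiency term, rather than the cycle count alone, dominates the final yield.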
One example of an isothermal amplification method is strand displacement amplification (SDA), which uses cycles of the following process: annealing of a pair of primer sequences to opposite strands of the target sequence; primer extension in the presence of dNTPs to produce duplex hemiphosphorothioated primer extension products; endonuclease-mediated nicking of the hemimodified restriction endonuclease recognition site; and polymerase-mediated primer extension from the 3' end of the nick to displace the existing strand and produce a strand for the next round of primer annealing, nicking, and strand displacement, resulting in geometric amplification of the product (e.g., U.S. Pat. No. 5,270,184 and U.S. Pat. No. 5,455,166). Thermophilic SDA (tSDA) uses thermophilic endonucleases and polymerases at higher temperatures in essentially the same process (European Patent 0684315). Other amplification methods include rolling circle amplification (RCA) (e.g., Lizardi, "Rolling Circle Replication Reporter Systems," U.S. Pat. No. 5,854,033); helicase-dependent amplification (HDA) (e.g., Kong et al., "Helicase Dependent Amplification of Nucleic Acids," U.S. Patent Application Publication No. US 2004-0058378 A1); and loop-mediated isothermal amplification (LAMP) (e.g., Notomi et al., "Process for Synthesizing Nucleic Acid," U.S. Pat. No. 6,410,278). In some cases, isothermal amplification employs transcription by an RNA polymerase from a promoter sequence, such as one incorporated into an oligonucleotide primer. Transcription-based amplification methods include nucleic acid sequence-based amplification, also known as NASBA (e.g., U.S. Pat. No. 5,130,238); methods that rely on an RNA replicase (commonly referred to as Qβ replicase) to amplify the probe molecules themselves (e.g., Lizardi, P. et al. (1988) Bio/Technology 6, 1197-1202); self-sustained sequence replication (e.g., Guatelli, J. et al. (1990) Proc. Natl. Acad. Sci. USA 87, 1874-); and methods that generate additional transcription templates (e.g., U.S. Pat. No. 5,480,784 and U.S. Pat. No. 5,399,491). Additional isothermal nucleic acid amplification methods use primers containing non-canonical nucleotides (e.g., uracil or RNA nucleotides) in combination with enzymes that cleave nucleic acids at the non-canonical nucleotides (e.g., DNA glycosylase or RNase H) to expose binding sites for additional primers (e.g., U.S. Pat. No. 6,251,639, U.S. Pat. No. 6,946,251, and U.S. Pat. No. 7,824,890). Isothermal amplification can be linear or exponential.

In some embodiments, the amplification comprises rolling circle amplification (RCA). A typical RCA reaction mixture comprises one or more primers, a polymerase, and dNTPs, and produces concatemers. In general, the polymerase used in an RCA reaction has strand displacement activity. A variety of such polymerases are available, non-limiting examples of which include the exonuclease-deficient DNA polymerase I large (Klenow) fragment, Phi29 DNA polymerase, Taq DNA polymerase, and the like. Typically, a concatemer is a polynucleotide amplification product that comprises two or more copies of a target sequence from a template polynucleotide (e.g., about or more than about 2, 3, 4, 5, 6, 7, 8, 9, 10, or more copies of the target sequence; in some embodiments, about or more than about 2 copies). Amplification primers can be of any suitable length, such as about or at least about 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 90, 100, or more nucleotides, any portion or all of which can be complementary to the corresponding target sequence to which the primer hybridizes (e.g., about or at least about 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, or more nucleotides). Three non-limiting examples of suitable primers are depicted in FIG. 5A, FIG. 5B, and FIG. 5C. FIG. 5A shows amplification without adapters using target-specific primers, which can be used to detect the presence or absence of sequence variants within a particular target sequence. In some embodiments, multiple target-specific primers for multiple targets are used in the same reaction. For example, target-specific primers directed against about or at least about 10, 50, 100, 150, 200, 250, 300, 400, 500, 1000, 2500, 5000, 10000, 15000, or more different target sequences can be used in one amplification reaction to amplify a corresponding number of target sequences (if present) in parallel.
Multiple target sequences may correspond to different portions of the same gene, different genes, or nongenic sequences. When multiple primers target multiple target sequences in a single gene, the primers can be spaced along the gene sequence (e.g., about or at least about 50 nucleotides apart, every 50-150 nucleotides, or every 50-100 nucleotides) so as to cover all or a designated portion of the target gene. FIG. 5C shows the use of primers (which in some cases may be the adaptor oligonucleotides themselves) that hybridize to the adaptor sequences.
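Spacing target-specific primers along a gene amounts to choosing primer start positions at a fixed interval. The sketch below is a hypothetical helper, `tile_primer_sites`, that places primers every `spacing` nucleotides; the 20-nt primer length and 100-nt default spacing are assumptions picked from within the ranges mentioned above, and a real design would also check melting temperature and specificity.

```python
def tile_primer_sites(gene_length: int, spacing: int = 100, primer_length: int = 20) -> list[int]:
    """Return 0-based start positions for primers tiled along a gene.

    Primers start every `spacing` nt; a primer is only placed if it fits
    entirely within the gene sequence.
    """
    if spacing <= 0 or primer_length <= 0:
        raise ValueError("spacing and primer_length must be positive")
    last_start = gene_length - primer_length
    return list(range(0, last_start + 1, spacing))


sites = tile_primer_sites(gene_length=500, spacing=100, primer_length=20)
print(sites)  # [0, 100, 200, 300, 400]
```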

FIG. 5B shows an example of amplification with random primers. Typically, a random primer comprises one or more random or near-random sequences (e.g., one or more positions at which a nucleotide is randomly selected from a set of two or more different nucleotides, such that each of the different nucleotides at those positions is represented in the set of primers comprising the random sequence). By this method, polynucleotides (e.g., all or substantially all of the circularized polynucleotides) can be amplified in a sequence-nonspecific manner. Such a procedure may be referred to as "whole genome amplification" (WGA); however, typical WGA protocols (which do not involve a circularization step) are not efficient at amplifying short polynucleotides, such as the polynucleotide fragments contemplated by the present disclosure. For further illustrative discussion of WGA procedures, see, e.g., Li et al. (2006) J. Mol. Diagn. 8(1): 22-30.

Where the circularized polynucleotides are amplified before dPCR, the amplification product may be subjected to dPCR directly, without enrichment, or after one or more enrichment steps. Enrichment can include purification of one or more reaction components, for example, by retaining the amplification products or removing one or more reagents. For example, amplification products can be purified by hybridization to a plurality of probes attached to a substrate, followed by release of the captured polynucleotides, e.g., by a washing step. Alternatively, the amplification products may be labeled with one member of a binding pair, bound to the other member of the binding pair attached to a substrate, and then washed to release the amplification products. Possible substrates include, but are not limited to, glass and modified or functionalized glass, plastics (including acrylic, polystyrene and copolymers of styrene with other materials, polypropylene, polyethylene, polybutylene, polyurethane, Teflon™, etc.), polysaccharides, nylon or nitrocellulose, ceramics, resins, silica or silica-based materials (including silicon and modified silicon), carbon, metals, inorganic glasses, fiber optic strands, and various other polymers. In some embodiments, the substrate is in the form of beads or other small discrete particles, which may be magnetic or paramagnetic to facilitate separation by application of a magnetic field. Generally, a "binding pair" refers to a first and a second moiety that have specific binding affinity for each other. Suitable binding pairs include, but are not limited to, antigens/antibodies (e.g., digoxigenin/anti-digoxigenin, dinitrophenyl (DNP)/anti-DNP, dansyl-X/anti-dansyl, fluorescein/anti-fluorescein, and rhodamine/anti-rhodamine); biotin/avidin (or biotin/streptavidin); calmodulin-binding protein (CBP)/calmodulin; hormone/hormone receptor; lectin/carbohydrate; peptide/cell membrane receptor; protein A/antibody; hapten/anti-hapten; enzyme/cofactor; and enzyme/substrate.

In some embodiments, enrichment following amplification of the circularized polynucleotides comprises one or more additional amplification reactions. In some embodiments, enrichment comprises amplifying a target sequence comprising sequence A and sequence B (oriented 5' to 3') in an amplification reaction mixture comprising: (a) the amplified polynucleotides; (b) a first primer comprising sequence A', wherein the first primer specifically hybridizes to sequence A of the target sequence by sequence complementarity between sequence A and sequence A'; (c) a second primer comprising sequence B, wherein the second primer specifically hybridizes, by sequence complementarity between B and B', to sequence B' present in a complementary polynucleotide comprising the complement of the target sequence; and (d) a polymerase that extends the first primer and the second primer to produce amplified polynucleotides; wherein the distance between the 5' end of sequence A and the 3' end of sequence B in the target sequence is 75 nt or less. FIG. 6 shows an exemplary arrangement of the first and second primers relative to a target sequence, both for a single repeat (which would not normally be amplified unless circular) and for a concatemer comprising multiple copies of the target sequence. Given the orientation of the primers relative to a monomer of the target sequence, this arrangement may be referred to as "back-to-back" (B2B) or "inverted" primers. Amplification with B2B primers helps enrich for circular and/or concatemeric amplification products. Furthermore, this orientation, combined with a relatively small footprint (the total distance spanned by a pair of primers), allows amplification across a wider variety of fragmentation events around the target sequence, because the ligation junction is less likely to fall between the primers than with the typical arrangement in which the primers face each other across the target sequence.
In some embodiments, the distance between the 5' end of sequence A and the 3' end of sequence B is about or less than about 200, 150, 100, 75, 50, 40, 30, 25, 20, 15, or fewer nucleotides. In some embodiments, sequence A is the complement of sequence B. In some embodiments, pairs of B2B primers for multiple different target sequences are used in the same reaction to amplify multiple different target sequences in parallel (e.g., about or at least about 10, 50, 100, 150, 200, 250, 300, 400, 500, 1000, 2500, 5000, 10000, 15000, or more different target sequences). The primers may be of any suitable length, as described elsewhere herein. Amplification may include any suitable amplification reaction under appropriate conditions, such as the amplification reactions described herein. In some embodiments, the amplification is a polymerase chain reaction.
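The back-to-back geometry can be checked with simple coordinate arithmetic. In the idealized model below (an illustration, not part of the disclosure), a linear concatemer consists of `n_repeats` copies of a monomer of length `monomer_len`; sequence A occupies `[a_start, a_start + a_len)` and sequence B occupies `[b_start, b_start + b_len)` within each monomer, with A 5' of B. The first primer (A') extends toward the 5' end of the target strand and the second primer (B) toward its 3' end, so a product can only span from B in one repeat to A in a later repeat. All function names and coordinates are hypothetical.

```python
def b2b_footprint(a_start: int, b_start: int, b_len: int) -> int:
    """Distance from the 5' end of sequence A to the 3' end of sequence B."""
    return b_start + b_len - a_start


def b2b_amplicon_lengths(monomer_len: int, a_start: int, a_len: int,
                         b_start: int, b_len: int, n_repeats: int) -> list[int]:
    """Lengths of all products that B2B primers can define on a linear concatemer.

    Coordinates are 0-based positions on the target strand within one monomer.
    The second primer (sequence B) primes rightward from repeat j; the first
    primer (sequence A') primes leftward from repeat k; a product exists only
    when k > j, because the primers face away from each other within a repeat.
    """
    if a_start + a_len > b_start:
        raise ValueError("sequence A must lie 5' of sequence B")
    if b_start + b_len > monomer_len:
        raise ValueError("both primer sites must fit within one monomer")
    lengths = []
    for j in range(n_repeats):              # repeat carrying the B site
        for k in range(j + 1, n_repeats):   # later repeat carrying the A site
            # product runs from the start of B (repeat j) to the end of A (repeat k)
            lengths.append((k - j) * monomer_len + a_start + a_len - b_start)
    return sorted(lengths)


# Hypothetical 100-nt monomer: A at 10-29, B at 40-59 (footprint 50 nt, within 75 nt)
print(b2b_footprint(10, 40, 20))                     # 50
print(b2b_amplicon_lengths(100, 10, 20, 40, 20, 1))  # []
print(b2b_amplicon_lengths(100, 10, 20, 40, 20, 3))  # [90, 90, 190]
```

A single linear monomer yields no product, while every additional repeat adds candidate amplicons, which is the enrichment effect for concatemers described above.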

In some embodiments, a B2B primer comprises at least two sequence elements: a first element that hybridizes to the target sequence by sequence complementarity, and a 5' "tail" that does not hybridize to the target sequence during a first amplification stage at a first hybridization temperature at which the first element hybridizes (e.g., because the tail lacks sequence complementarity to the portion of the target sequence immediately 3' of the location to which the first element binds). For example, the first primer comprises sequence C 5' of sequence A', the second primer comprises sequence D 5' of sequence B, and neither sequence C nor sequence D hybridizes to the plurality of concatemers during the first amplification stage at the first hybridization temperature. In some embodiments using such tailed primers, amplification may include a first stage and a second stage. The first stage comprises a hybridization step at a first temperature, during which the first and second primers hybridize to the concatemer (or circularized polynucleotide), and primer extension. The second stage comprises a hybridization step at a second temperature higher than the first, during which the first and second primers hybridize to amplification products comprising the extended first or second primer or its complement, and primer extension. The higher temperature favors hybridization of the full primer (first element plus tail) to primer extension products that contain the tail sequence, while disfavoring the shorter products that would form from hybridization of the first element alone to target sequences within the concatemer. Two-stage amplification can thus reduce the extent to which short amplification products are favored, maintaining a relatively high proportion of amplification products having two or more copies of the target sequence.
For example, after 5 cycles (e.g., at least 5, 6, 7, 8, 9, 10, or more cycles) of hybridization and primer extension at the second temperature, at least 5% (e.g., at least 5%, 6%, 7%, 8%, 9%, 10%, 15%, 20%, 25%, 30%, or more) of the amplified polynucleotides in the reaction mixture comprise two or more copies of the target sequence. Schematic representations of embodiments of this two-stage tailed B2B primer amplification process are presented in FIGS. 7A, 7B, 7C, and 7D.

In some embodiments, enrichment comprises amplification under conditions that tend to increase amplicon length from the concatemer. For example, the primer concentration can be reduced such that not every priming site hybridizes to a primer, thereby elongating the PCR product. Similarly, shortening the primer hybridization time during cycling also results in less primer hybridization, thereby increasing the average PCR amplicon size. Furthermore, increasing the temperature and/or extension time of the cycle will similarly increase the average length of the PCR amplicons. Any combination of these techniques may be used.

In some embodiments, the amplification products are size-filtered to reduce or eliminate monomers in the mixture comprising the concatemers. This can be accomplished using a variety of available techniques, including, but not limited to, excising fragments from a gel and gel filtration (e.g., to enrich for fragments greater than about 300, 400, 500, or more nucleotides in length), and size selection with SPRI beads (Agencourt AMPure XP) by fine-tuning the binding buffer concentration. For example, using 0.6x binding buffer when mixing with the DNA fragments can preferentially bind DNA fragments greater than about 500 base pairs (bp).
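Size selection can be approximated as a length cutoff applied to the amplicon population. The sketch below assumes an idealized sharp cutoff at 500 bp (real SPRI bead selection has a gradual size response) and a hypothetical 150-bp monomer, and reports how many whole monomer repeats each retained fragment can span. The helper names are illustrative.

```python
def size_select(amplicon_lengths: list[int], min_length: int = 500) -> list[int]:
    """Keep fragments longer than `min_length` (an idealized sharp cutoff)."""
    return [n for n in amplicon_lengths if n > min_length]


def whole_repeats(amplicon_length: int, monomer_length: int) -> int:
    """Number of complete monomer-length repeats a fragment can span."""
    return amplicon_length // monomer_length


monomer = 150                          # hypothetical monomer length (bp)
pool = [150, 300, 450, 600, 750]       # monomer through 5-mer products
kept = size_select(pool)
print(kept)                                        # [600, 750]
print([whole_repeats(n, monomer) for n in kept])   # [4, 5]
```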

In some aspects, the method may further comprise partitioning the plurality of concatemers into a plurality of partitions. "Partitioning" generally refers to the process of spatially separating a mixture comprising a plurality of molecules into at least one partition. As used herein, a "partition" may refer to any container or vessel used to spatially separate a plurality of molecules. In some cases, the partitions may be wells, such as wells on a microplate. In other cases, the partitions may be droplets, such as those used in droplet digital PCR (ddPCR) methods. The droplets may comprise water-in-oil or oil-in-water emulsion droplets. Non-limiting examples of droplet-based PCR systems that can be used according to the methods provided herein include those commercially available from Bio-Rad, RainDance Technologies, and others. In general, the number of concatemers present in an individual partition after partitioning depends on the concentration of concatemers in the mixture and on the number of partitions into which the mixture is divided. In some cases, the method comprises partitioning the plurality of concatemers into a plurality of partitions such that any individual partition contains, on average, no more than one concatemer having the target sequence. In such cases, an individual partition may contain a concatemer comprising a plurality of sequence repeats, each of which comprises the target sequence. Thus, a single partition may contain multiple copies of the target sequence arranged as tandem repeats on the same concatemer molecule. In some cases, an individual partition may also contain one or more concatemers that do not comprise the target sequence. In some cases, individual partitions contain on average no more than 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1 concatemer. In some cases, some individual partitions may contain no concatemers.
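Digital PCR analyses commonly assume that molecules are loaded into partitions at random, so the number per partition follows a Poisson distribution with mean λ. Under that assumption (standard for dPCR, though not stated explicitly above), the sketch below computes the expected fractions of empty, singly occupied, and multiply occupied partitions for a given λ, and estimates λ back from the fraction of negative partitions. The function names and the partition counts in the example are hypothetical.

```python
import math


def poisson_occupancy(mean_per_partition: float) -> dict[str, float]:
    """Expected fractions of partitions holding 0, 1, or >= 2 concatemers."""
    lam = mean_per_partition
    p_empty = math.exp(-lam)
    p_single = lam * math.exp(-lam)
    return {"empty": p_empty, "single": p_single,
            "multiple": 1.0 - p_empty - p_single}


def mean_from_negatives(n_negative: int, n_total: int) -> float:
    """Estimate lambda from the fraction of negative partitions: -ln(neg / total)."""
    if not 0 < n_negative <= n_total:
        raise ValueError("need 0 < n_negative <= n_total")
    return -math.log(n_negative / n_total)


for lam in (1.0, 0.1):
    occ = poisson_occupancy(lam)
    print(lam, {k: round(v, 4) for k, v in occ.items()})
print(round(mean_from_negatives(18097, 20000), 3))  # 0.1
```

Even at an average of one concatemer per partition, about 26% of partitions (1 - 2/e) are expected to receive two or more, so lower loading reduces the chance that a single partition holds more than one concatemer.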

In some aspects, individual partitions can comprise one or more probes for detecting the presence or absence of sequence variants. In some cases, the one or more probes comprise a wild-type probe that binds to a wild-type target sequence (i.e., a target sequence that lacks the sequence variant). The wild-type probe may comprise an oligonucleotide sequence that is complementary to, and capable of hybridizing to, the wild-type target sequence. In some cases, a wild-type probe may comprise an oligonucleotide sequence that hybridizes to a region of the target sequence that comprises the wild-type nucleotide at the nucleotide position of interest. In some cases, the one or more probes comprise a mutant probe that binds to a mutant target sequence (i.e., a target sequence that contains the sequence variant). The mutant probe may comprise an oligonucleotide sequence that is complementary to, and capable of hybridizing to, the mutant target sequence. In some cases, the wild-type probe and the mutant probe can hybridize to the target sequence under stringent conditions, such that the wild-type probe binds only to the wild-type target sequence and the mutant probe binds only to the mutant target sequence. In some cases, individual partitions may comprise wild-type probes, mutant probes, or both.

In some aspects, the wild-type probe comprises a first detectable label that generates a first signal when the wild-type target sequence is present. In some aspects, the mutant probe comprises a second detectable label that generates a second signal when the mutant target sequence is present. In some cases, the first and second detectable labels are different, so that they produce distinguishable signals. The first detectable label, the second detectable label, or both may be any type of detectable label, including, but not limited to, a fluorophore, an enzyme, a quencher, an enzyme inhibitor, a radiolabel, one member of a binding pair, or any combination thereof. In some cases, the first and/or second detectable label is a fluorescent molecule, e.g., a fluorophore. Non-limiting examples of fluorophores include: fluorescein (FITC) and fluorescein derivatives such as FAM, VIC, and JOE; 5-(2'-aminoethyl)aminonaphthalene-1-sulfonic acid (EDANS); coumarin and coumarin derivatives; NED; Texas Red; tetramethylrhodamine (TRITC); tetrachloro-6-carboxyfluorescein; 5-carboxyrhodamine; cyanine dyes; Alexa Fluor 350, Alexa Fluor 405, Alexa Fluor 488, Alexa Fluor 532, Alexa Fluor 546, Alexa Fluor 555, Alexa Fluor 568, Alexa Fluor 647, Alexa Fluor 680, and Alexa Fluor 750; Oregon Green; Cy3; Pacific Blue; Pacific Orange; Pacific Green; BODIPY FL; Qdot 525, Qdot 565, Qdot 605, Qdot 655, Qdot 705, and Qdot 800; R-phycoerythrin (R-PE); allophycocyanin (APC); cyan fluorescent protein (CFP) and its derivatives; green fluorescent protein (GFP) and its derivatives; red fluorescent protein (RFP) and its derivatives; and the like. Any fluorophore having an excitation wavelength between about 300 nm and about 900 nm is contemplated herein.

In some aspects, the method may further comprise performing a reaction on or within the plurality of partitions. In some cases, the method may further comprise performing a TaqMan PCR assay on the plurality of partitions. In such cases, the wild-type probe and the mutant probe may be TaqMan probes. TaqMan PCR assays and probes are known in the art. The 5' ends of the wild-type and mutant probes may be conjugated to different fluorescent labels (e.g., VIC, FAM), and their 3' ends may be conjugated to quenchers. The quencher can quench the signal from the fluorescent label when the two are in close proximity (e.g., when they are conjugated to opposite ends of the same intact probe). Individual partitions may further comprise forward and reverse primers that hybridize to sequences on the concatemer flanking the target sequence. In some cases, the forward and reverse primers may be unlabeled. The plurality of partitions may be incubated under conditions that allow the forward primer, the reverse primer, and the mutant and/or wild-type probes to hybridize to their complementary sequences (if present on the concatemer). In some cases, the method further comprises incubating the plurality of partitions in the presence of a polymerase under conditions in which the polymerase synthesizes a new strand by extending the forward and reverse primers along the template molecule. The polymerase can have endogenous 5' nuclease activity, such that when it reaches a hybridized labeled probe, it cleaves the probe, separating the fluorescent label from the quencher. The fluorescent label can then generate a detectable signal. In some cases, multiple cycles of TaqMan PCR are performed on the plurality of partitions, such that with each cycle the intensity of the fluorescent signal increases in proportion to the amount of amplicon synthesized.

In some cases, the method may further include performing assays other than TaqMan PCR assays on the plurality of partitions. Non-TaqMan assays may include, but are not limited to, detection based on other chemistries, FAM-based detection, and the like.

In a further aspect, the method may include detecting the levels of the first signal and the second signal from individual partitions. Detection may use any method suited to the type of detectable label present on the probes. When fluorescent probes are used, the method may include illuminating the plurality of partitions with a light source (e.g., a light-emitting diode (LED)) and measuring the resulting optical signals. The wavelength of light provided by the light source should be selected based on the excitation wavelength of the detectable label, which one skilled in the art can readily do.

In a further aspect, the method can comprise identifying the presence or absence of a sequence variant. In some cases, this can include measuring the intensity level of the first signal, corresponding to the presence of the wild-type sequence, and the intensity level of the second signal, corresponding to the mutant sequence. The method may further include comparing the intensity levels of the first and second signals to threshold levels. In some cases, a threshold level represents a cutoff value: signals above it are determined to be present or positive, and signals below it are determined to be absent or negative. In some cases, the threshold level is determined by the user of the assay. In some cases, the threshold level indicates the presence of one copy of the target sequence. In other words, a partition whose signal exceeds the threshold level can be determined to contain at least one copy of the target sequence, while a partition whose signal is below the threshold level can be determined to contain less than one copy of the target sequence.
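One common way to set a per-channel threshold, offered here as an assumption rather than the method prescribed by the disclosure, is to place it several standard deviations above the signal of partitions known to be negative (e.g., from a no-template control). The sketch below uses a hypothetical multiplier `k = 5` and invented background intensities.

```python
from statistics import mean, stdev


def threshold_from_negatives(negative_signals: list[float], k: float = 5.0) -> float:
    """Threshold = mean + k * SD of known-negative partition signals (assumed rule)."""
    if len(negative_signals) < 2:
        raise ValueError("need at least two negative partitions to estimate SD")
    return mean(negative_signals) + k * stdev(negative_signals)


def is_positive(signal: float, threshold: float) -> bool:
    """A partition is called positive when its signal exceeds the threshold."""
    return signal > threshold


negatives = [100.0, 102.0, 98.0, 101.0, 99.0]   # hypothetical background intensities
thr = threshold_from_negatives(negatives)
print(round(thr, 2))                                      # 107.91
print(is_positive(500.0, thr), is_positive(105.0, thr))   # True False
```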

In some cases, a sequence variant is identified as present in the target sequence only when the level of the second (mutant) signal exceeds a threshold level and the level of the first (wild-type) signal is below a threshold level. For example, if a sequence variant is present in an original polynucleotide molecule, it will be present in the multiple sequence repeats of the single concatemer generated from that molecule. In such cases, the mutant probe may bind to the target sequence containing the sequence variant, but the wild-type probe may not. Thus, an individual partition containing the sequence variant may generate a signal from the mutant probe but not from the wild-type probe.

In some cases, a sequence variant is identified as absent (i.e., the target sequence is a wild-type sequence) when the level of the wild-type signal exceeds a threshold level and the level of the mutant signal is below a threshold level. For example, if a sequence variant is not present in the original polynucleotide molecule, it will generally not be present in the sequence repeats of the single concatemer generated from that molecule. In such cases, the wild-type probe may bind to the target sequence lacking the sequence variant, but the mutant probe may not. Thus, an individual partition comprising the wild-type sequence may generate a signal from the wild-type probe but not from the mutant probe.

In some cases, the methods can be used to identify false positives. In one such embodiment, a false positive is identified when the level of the wild-type signal exceeds a threshold level and the level of the mutant signal also exceeds a threshold level. For example, random errors may be introduced into the target sequence during, for example, amplification. In some cases, the target sequence may be a wild-type sequence, but errors may be introduced during rolling circle amplification that produce mutations in at least one of the tandem repeats of the concatemer. In such cases, individual partitions may comprise concatemer molecules comprising tandem repeats of the target sequence, wherein most of the repeats comprise wild-type sequence, but at least one of the repeats comprises a sequence variant (e.g., due to random error). In such cases, the wild-type probe may bind to the wild-type target sequence and the mutant probe may bind to the mutant target sequence, thereby producing both the wild-type signal and the mutant signal in the same partition. In some aspects, the method can identify such partitions as comprising false positives when both the wild-type signal and the mutant signal are present.
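The decision rules above can be combined into a single partition classifier. The sketch below pairs it with a deliberately simplified model of rolling circle amplification (an assumption for illustration): each repeat of a concatemer copies the circular template's base at the position of interest and, with probability `error_rate`, is miscopied as the other allele. Each probe's signal is modeled as the number of repeats it can bind, so a threshold of 0.5 stands in for "one copy of the target sequence". The alleles `C`/`T`, the error rate, and the function names are hypothetical.

```python
import random

WT_BASE, MUT_BASE = "C", "T"   # hypothetical wild-type and variant alleles


def rca_concatemer(template_base: str, n_repeats: int, error_rate: float,
                   rng: random.Random) -> list[str]:
    """Simulate the allele carried by each repeat of one RCA concatemer.

    Every repeat is copied from the same circular template, so a true variant
    appears in all repeats, while a polymerase error affects only the repeat
    in which it occurs.
    """
    other = MUT_BASE if template_base == WT_BASE else WT_BASE
    return [other if rng.random() < error_rate else template_base
            for _ in range(n_repeats)]


def classify_partition(wt_signal: float, mut_signal: float,
                       wt_threshold: float, mut_threshold: float) -> str:
    """Variant only if mutant-positive and wild-type-negative; both positive = false positive."""
    wt_pos = wt_signal > wt_threshold
    mut_pos = mut_signal > mut_threshold
    if mut_pos and not wt_pos:
        return "variant"
    if wt_pos and not mut_pos:
        return "wild_type"
    if wt_pos and mut_pos:
        return "false_positive"
    return "empty"


rng = random.Random(0)
for template in (WT_BASE, WT_BASE, MUT_BASE):
    repeats = rca_concatemer(template, n_repeats=10, error_rate=0.05, rng=rng)
    # Modeled signal: number of repeats each probe can bind
    call = classify_partition(repeats.count(WT_BASE), repeats.count(MUT_BASE), 0.5, 0.5)
    print(template, "".join(repeats), call)
```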

In a further aspect, the method may include outputting a result based on the identifying step. For example, the method may include generating a report that displays or reports the results of the identifying step. In some cases, partitions identified as containing false positives may be excluded from the report. In other cases, partitions identified as containing false positives can be flagged or reported as containing false positives.
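A report that either excludes or flags false positives could be assembled from per-partition calls as sketched below. The label strings, the `false_positive_mode` parameter, and the choice to express results as the fraction of informative (variant or wild-type) partitions are hypothetical illustrations, not the disclosed report format.

```python
from collections import Counter


def build_report(calls: list[str], false_positive_mode: str = "flag") -> dict:
    """Summarize per-partition calls into a simple report.

    calls: labels such as "variant", "wild_type", "false_positive", "empty"
    false_positive_mode: "flag" keeps a count of false positives in the report;
                         "exclude" omits them from the report entirely.
    """
    if false_positive_mode not in ("flag", "exclude"):
        raise ValueError("false_positive_mode must be 'flag' or 'exclude'")
    counts = Counter(calls)  # missing labels count as 0
    variant = counts["variant"]
    wild_type = counts["wild_type"]
    informative = variant + wild_type
    report = {
        "variant_partitions": variant,
        "wild_type_partitions": wild_type,
        # Fraction of informative partitions carrying the variant; None if there are none.
        "variant_fraction": variant / informative if informative else None,
    }
    if false_positive_mode == "flag":
        report["flagged_false_positives"] = counts["false_positive"]
    return report


calls = ["variant"] * 2 + ["wild_type"] * 98 + ["false_positive"] * 3 + ["empty"] * 897
print(build_report(calls))
print(build_report(calls, false_positive_mode="exclude"))
```

In both modes, false-positive partitions are kept out of the variant fraction; the modes differ only in whether their count appears in the report.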

In another aspect, the present disclosure provides a method of reducing errors in a digital polymerase chain reaction. In some cases, the method is performed on a nucleic acid sample comprising less than about 50 ng of polynucleotides and comprises: a) circularizing individual polynucleotides in said nucleic acid sample to generate a plurality of circularized polynucleotides; b) amplifying the plurality of circularized polynucleotides to form a plurality of concatemers, each concatemer comprising a plurality of sequence repeats; c) partitioning the plurality of concatemers into a plurality of partitions such that on average no more than one concatemer comprising a target sequence is present in an individual partition, wherein an individual partition in the plurality of partitions contains at least one of a first probe and a second probe, wherein the first probe binds to a target sequence lacking a sequence variant and generates a first signal and the second probe binds to a target sequence containing the sequence variant and generates a second signal; d) detecting the first signal and the second signal from the individual partitions; and e) identifying a false positive when the level of the first signal exceeds a threshold level indicative of one copy of the target sequence and the level of the second signal exceeds a threshold level indicative of one copy of the target sequence.

In some cases, the methods may be applicable to samples with low starting amounts of polynucleotides. In such cases, the initial amount of polynucleotide may generally be too low to be used in a digital PCR assay, and one or more amplification steps may be required before performing the digital PCR assay. However, such amplification steps can be prone to error, thereby increasing the number of false positives reported by digital PCR assays. In some cases, the methods can reduce the number of false positives reported by digital PCR assays. For example, the method can reduce the number of false positives reported by a digital PCR assay by at least 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, or greater than about 50%.
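The following minimal Python sketch, which is not part of the disclosure, estimates how much the two-signal rule could reduce false positives under a deliberately simple model. Assumptions (all hypothetical): a partition holds one concatemer derived from a wild-type molecule; the concatemer has `n_repeats` repeats; each repeat independently carries the same erroneous base at the interrogated position with probability `p_error` (treating repeats as independent copies of the circular template); and a probe's signal crosses its one-copy threshold whenever at least one repeat binds that probe. A rule that looks only at the mutant signal then reports a false positive with probability 1 - (1 - p)^n, whereas the two-signal rule does so only when every repeat carries the error, with probability p^n.

```python
def false_positive_rates(p_error: float, n_repeats: int) -> tuple[float, float]:
    """Per-partition false-positive probability for a wild-type concatemer (toy model).

    Returns (naive_rate, two_signal_rate):
      naive_rate:      P(at least one repeat carries the error), i.e. mutant signal present
      two_signal_rate: P(every repeat carries the error), i.e. mutant signal with no wild-type signal
    """
    if not 0.0 <= p_error <= 1.0 or n_repeats < 1:
        raise ValueError("need 0 <= p_error <= 1 and n_repeats >= 1")
    naive = 1.0 - (1.0 - p_error) ** n_repeats
    two_signal = p_error ** n_repeats
    return naive, two_signal


def percent_reduction(before: float, after: float) -> float:
    """Percentage by which 'after' is lower than 'before' (0 if 'before' is 0)."""
    return 100.0 * (before - after) / before if before else 0.0


naive, dual = false_positive_rates(p_error=1e-3, n_repeats=10)  # hypothetical values
print(f"naive: {naive:.3e}, two-signal: {dual:.3e}, "
      f"reduction: {percent_reduction(naive, dual):.2f}%")
```

With a single repeat the two rules coincide, which is why the concatemer structure matters to this approach. Real error profiles, probe cross-reactivity, and partitions holding more than one concatemer are not modeled.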

The initial amount of polynucleotide in the sample may be small. In some embodiments, the amount of starting polynucleotide is less than 50 ng, such as less than 45 ng, 40 ng, 35 ng, 30 ng, 25 ng, 20 ng, 15 ng, 10 ng, 5 ng, 4 ng, 3 ng, 2 ng, 1 ng, 0.5 ng, 0.1 ng, or less. In some embodiments, the amount of starting polynucleotide is in the range of 0.1-100 ng, such as 1-75 ng, 5-50 ng, or 10-20 ng.
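As a rough sketch of why such inputs are limiting, the helper below converts an input mass of DNA into approximate haploid genome equivalents. It assumes human genomic DNA and a haploid genome mass of about 3.3 pg; the function and constant names are hypothetical. Each genome equivalent contributes at most one copy of a given single-copy locus, so 1 ng of input carries only about 300 copies of any one target.

```python
HAPLOID_HUMAN_GENOME_PG = 3.3  # approximate mass of one haploid human genome, in picograms


def haploid_genome_equivalents(input_ng: float,
                               pg_per_genome: float = HAPLOID_HUMAN_GENOME_PG) -> float:
    """Convert a DNA input mass (ng) into approximate haploid genome copies."""
    if input_ng < 0 or pg_per_genome <= 0:
        raise ValueError("input must be non-negative and genome mass positive")
    return input_ng * 1000.0 / pg_per_genome  # 1 ng = 1000 pg


for ng in (50, 10, 1, 0.1):
    print(f"{ng:>5} ng ~ {haploid_genome_equivalents(ng):,.0f} haploid genome copies")
```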

The polynucleotides may be from any suitable sample, such as the samples described herein with respect to various aspects of the present disclosure. The polynucleotides from the sample can be any of a number of polynucleotides, including but not limited to DNA, RNA, ribosomal RNA (rRNA), transfer RNA (tRNA), microRNA (miRNA), messenger RNA (mRNA), cell-free DNA (cfDNA), circulating tumor DNA (ctDNA), fragments of any of these, or a combination of any two or more of these. In some embodiments, the sample comprises DNA. In some embodiments, the polynucleotide is single-stranded as obtained, or is rendered single-stranded by treatment (e.g., denaturation). Other examples of suitable polynucleotides are described herein with respect to any of the various aspects of the disclosure. In some embodiments, the polynucleotide is subjected to subsequent steps (e.g., circularization and amplification) without performing an extraction step and/or without performing a purification step. For example, a fluid sample can be processed to remove cells without performing an extraction step, producing a purified fluid sample and a cell sample, followed by isolation of DNA from the purified fluid sample. A variety of procedures for isolating polynucleotides are available, such as precipitation, or non-specific binding to a substrate followed by washing of the substrate and release of the bound polynucleotides. Where the polynucleotide is isolated from a sample without a cell extraction step, the polynucleotide will be predominantly extracellular, or "cell-free", polynucleotide, which may originate from dead or damaged cells. The characteristics of such polynucleotides can be used to characterize the cell or population of cells from which they are derived, such as in a microbial community.
If the sample is treated to extract polynucleotides, e.g., to extract polynucleotides from cells in the sample, a variety of extraction methods are available, examples of which are provided herein (e.g., as described with respect to any of the various aspects of the disclosure).

In one aspect, the present disclosure provides a system for detecting sequence variants. In some embodiments, a system may comprise: a) a computer configured to receive a user request for a detection reaction on a sample; b) an amplification system that performs a nucleic acid amplification reaction on the sample or a portion thereof in response to the user request, wherein the amplification reaction comprises: (i) circularizing individual polynucleotides of said sample to form a plurality of circularized polynucleotides; and (ii) amplifying the plurality of circularized polynucleotides to form a plurality of concatemers, each concatemer comprising a plurality of sequence repeats; c) a partitioning system that partitions the plurality of concatemers into a plurality of partitions such that, on average, no more than one concatemer comprising a target sequence is present in an individual partition; d) a detection system that detects the level of a first signal and the level of a second signal from individual partitions, wherein the first signal is generated when a first probe binds to a target sequence lacking the sequence variant and the second signal is generated when a second probe binds to a target sequence containing the sequence variant; and e) a report generator that sends a report to a recipient, wherein the report comprises the results of detecting the sequence variant. In some embodiments, the recipient is a user. FIG. 8 illustrates a non-limiting example of a system that can be used with the methods of the present disclosure.

The computer used in the system may comprise one or more processors. The processor may be associated with one or more controllers, computing units, and/or other units of the computer system, or embedded in firmware as desired. If implemented in software, the routines may be stored in any computer-readable memory, such as RAM, ROM, flash memory, a magnetic disk, an optical disk, or another suitable storage medium. Likewise, the software may be delivered to a computing device by any known delivery method, including, for example, over a communication channel such as a telephone line, the internet, or a wireless connection, or via a removable medium such as a computer-readable disk or flash drive. The various steps may be implemented as various blocks, operations, tools, modules, or techniques, which may be implemented in hardware, firmware, software, or any combination thereof. When implemented in hardware, some or all of the blocks, operations, techniques, etc., may be implemented in, for example, a custom integrated circuit (IC), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a programmable logic array (PLA), etc. A client-server relational database architecture may be used in embodiments of the system. A client-server architecture is a network architecture in which each computer or process on the network is either a client or a server. Server computers are typically powerful computers dedicated to managing disk drives (file servers), printers (print servers), or network traffic (network servers). Client computers include PCs (personal computers) or workstations on which users run applications, as well as the example output devices disclosed herein. Client computers rely on server computers for resources such as files, devices, and even processing power. In some embodiments, the server computer handles all database functions.
The client computer may have software that handles all front-end data management and may also receive data input from a user.

The system may be configured to receive a user request for a detection reaction on a sample. The user request may be direct or indirect. Examples of direct requests include requests transmitted through an input device such as a keyboard, mouse, or touch screen. Examples of indirect requests include transmission over a communications medium, such as transmission over the internet (wired or wireless).

The system may further comprise an amplification system that performs a nucleic acid amplification reaction on the sample or a portion thereof in response to a user request. A variety of methods are available for amplifying polynucleotides (e.g., DNA and/or RNA). Amplification may be linear, exponential, or involve both linear and exponential stages in a multi-stage amplification process. The amplification method may involve changes in temperature, such as a thermal denaturation step, or may be an isothermal process that does not require thermal denaturation. Non-limiting examples of suitable amplification processes are described herein with respect to any of the various aspects of the present disclosure. In some embodiments, the amplification comprises rolling circle amplification (RCA). A variety of systems for amplifying polynucleotides are available, and they may vary based on the type of amplification reaction to be performed. For example, for an amplification method comprising temperature cycling, the amplification system can comprise a thermal cycler. Amplification systems may include real-time amplification and detection instruments, such as those manufactured by Applied Biosystems, Roche, and Stratagene. In some embodiments, the amplification reaction comprises the steps of: (i) circularizing individual polynucleotides of said sample to form a plurality of circularized polynucleotides; and (ii) amplifying the plurality of circularized polynucleotides to form a plurality of concatemers, each concatemer comprising a plurality of sequence repeats. The sample, polynucleotides, primers, polymerase, and other reagents may be those described herein with respect to any of the various aspects.
Provided herein are non-limiting examples of circularization processes (e.g., with and without the use of adaptor oligonucleotides), reagents (e.g., type of adaptor, use of ligase), reaction conditions (e.g., to facilitate self-ligation), and optionally additional processing (e.g., post-reaction purification), as described with respect to any of the various aspects of the present disclosure. The system may be selected and/or designed to perform any such method.

The system may further comprise a partitioning system that partitions the plurality of concatemers into a plurality of partitions such that, on average, no more than one concatemer comprising a target sequence is present in an individual partition. The partitioning system can include any of a number of systems that can separate a mixture comprising a plurality of polynucleotides into separate partitions. In some cases, the partitioning system is a droplet-based partitioning system, including a microfluidic droplet system, such as the systems commercially available from Bio-Rad, RainDance Technologies, 10X Genomics, and the like. In some cases, the partitioning system is a microplate-based partitioning system, such as those commercially available from Becton, Dickinson and Company (Cellular Research), Session Bio, Takara (WaferGen), and the like.
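If concatemers are distributed into partitions at random, the number of target-containing concatemers per partition is commonly modeled as Poisson with mean λ, as in conventional digital PCR. The Python sketch below (function names hypothetical) gives the expected fractions of empty, singly occupied, and multiply occupied partitions, and recovers λ from the observed fraction of empty partitions. At λ = 1, roughly 26% of partitions hold two or more concatemers; lowering λ reduces that co-occupancy, which matters here because a partition holding both a wild-type and a variant-bearing concatemer would also show both signals.

```python
import math


def occupancy(lam: float) -> dict[str, float]:
    """Poisson fractions of partitions holding 0, 1, or >= 2 target concatemers.

    lam is the mean number of target-containing concatemers per partition.
    """
    if lam < 0:
        raise ValueError("lam must be non-negative")
    p0 = math.exp(-lam)
    p1 = lam * p0
    return {"empty": p0, "single": p1, "multiple": 1.0 - p0 - p1}


def mean_occupancy_from_empty(fraction_empty: float) -> float:
    """Estimate lam from the observed fraction of empty partitions (lam = -ln f0)."""
    if not 0.0 < fraction_empty <= 1.0:
        raise ValueError("fraction_empty must be in (0, 1]")
    return -math.log(fraction_empty)


for lam in (0.1, 0.5, 1.0):
    print(lam, {k: round(v, 3) for k, v in occupancy(lam).items()})
```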

The system may further include a detection system that detects a level of the first signal and a level of the second signal from individual partitions. In some cases, the first signal is generated when the first probe binds to a target sequence lacking a sequence variant, and the second signal is generated when the second probe binds to a target sequence containing the sequence variant. The detection system may include any of a number of optical configurations including, for example, a light source (e.g., light-emitting diodes (LEDs) for illuminating individual partitions), lenses, filters, dichroic mirrors, or any combination thereof. The detection system may further comprise a light detector for detecting light signals from the plurality of partitions.
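The text does not specify how the threshold indicative of one copy of the target sequence is set. One common, hypothetical approach, sketched below in Python, places the threshold a fixed number of standard deviations above the signal of partitions known to contain no target (e.g., from a no-template control), computed separately for the first and second signals. The multiplier `k` and the example values are illustrative assumptions.

```python
import statistics


def one_copy_threshold(negative_levels: list[float], k: float = 6.0) -> float:
    """Hypothetical threshold: mean of negative-control partitions plus k standard deviations."""
    if len(negative_levels) < 2:
        raise ValueError("need at least two negative-control measurements")
    return statistics.mean(negative_levels) + k * statistics.stdev(negative_levels)


# Hypothetical no-template-control levels, one list per probe signal.
first_negatives = [0.9, 1.1, 1.0, 1.2, 0.8]
second_negatives = [0.5, 0.6, 0.4, 0.5, 0.5]
print(round(one_copy_threshold(first_negatives), 3),
      round(one_copy_threshold(second_negatives), 3))
```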

The system may further comprise a report generator that sends a report to the recipient, wherein the report comprises the results of detecting the sequence variant. For example, the report generator can generate a report identifying the presence of sequence variants in the sample. Additionally or alternatively, the report can identify the absence of sequence variants in the sample. Additionally or alternatively, the report can identify false positives generated by digital PCR assays. In some cases, false positives can be excluded from the report. In other cases, false positives may be flagged or identified on the report as false positives. Reports may be generated in real time and updated periodically as the process progresses. Additionally or alternatively, a report may be generated at the end of the analysis. The report can be generated automatically, for example, when the system completes the step of identifying the presence or absence of a sequence variant. In some embodiments, the report is generated in response to an instruction from a user. In addition to the results of detecting sequence variants, the report may also comprise an analysis based on one or more sequence variants. For example, where one or more sequence variants are associated with a particular contaminant or phenotype, the report can contain information about the association, such as the likelihood of the presence of the contaminant or phenotype, its level of presence, and optionally recommendations based on such information (e.g., additional tests, monitoring, or remedial measures). The report may take a variety of forms. It is contemplated that data associated with the present disclosure may be transmitted over a network or connection (e.g., the internet), or by any other suitable means of transmitting information, including but not limited to mailing a physical report such as a printed report, for receipt and/or viewing by a recipient.
The recipient may be, but is not limited to, an individual or an electronic system (e.g., one or more computers, and/or one or more servers).

In another aspect, the present disclosure provides a computer-readable medium containing code that, when executed by one or more processors, performs a method of detecting sequence variants. In some embodiments, the implemented method comprises: a) receiving a user request for a detection reaction on a sample; b) performing a nucleic acid amplification reaction on the sample or a portion thereof in response to the user request, wherein the amplification reaction comprises: (i) circularizing individual polynucleotides of said sample to form a plurality of circularized polynucleotides; and (ii) amplifying the plurality of circularized polynucleotides to form a plurality of concatemers, each concatemer comprising a plurality of sequence repeats; c) partitioning the plurality of concatemers into a plurality of partitions such that on average no more than one concatemer comprising a target sequence is present in individual partitions, wherein individual partitions in the plurality of partitions contain at least one of a first probe and a second probe, wherein the first probe binds to the plurality of sequence repeats lacking the sequence variant and generates a first signal and the second probe binds to the plurality of sequence repeats containing the sequence variant and generates a second signal; d) detecting the first signal and the second signal from the individual partitions; e) identifying the sequence variant as present only if the level of the second signal exceeds a threshold level indicative of one copy of the target sequence and the level of the first signal is below a threshold level indicative of one copy of the target sequence; and f) generating a report comprising the results of the detection of the sequence variant.

In some embodiments, the implemented methods further comprise identifying the sequence variant as absent when the level of the first signal exceeds a threshold level indicative of one copy of the target sequence and the level of the second signal is below a threshold level indicative of one copy of the target sequence. In some embodiments, the implemented methods further comprise identifying a false positive when the level of the first signal exceeds a threshold level indicative of one copy of the target sequence and the level of the second signal exceeds a threshold level indicative of one copy of the target sequence.

A machine-readable medium containing computer-executable code may take many forms, including but not limited to a tangible storage medium, a carrier-wave medium, or a physical transmission medium. Non-volatile storage media include, for example, optical or magnetic disks, such as any of the storage devices in any computer or the like, as may be used to implement a database. Volatile storage media include dynamic memory, such as the main memory of such a computer platform. Tangible transmission media include coaxial cables, copper wire, and fiber optics, including the wires that comprise a bus within a computer system. Carrier-wave transmission media can take the form of electrical or electromagnetic signals, or acoustic or light waves, such as those generated during radio frequency (RF) and infrared (IR) data communications. Thus, common forms of computer-readable media include, for example: a floppy disk, a flexible disk, a hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD or DVD-ROM, any other optical medium, punch cards, paper tape, any other physical storage medium with patterns of holes, a RAM, a ROM, a PROM, an EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave transporting data or instructions, cables or links transporting such a carrier wave, or any other medium from which a computer can read programming code and/or data. Many of these forms of computer-readable media may be involved in carrying one or more sequences of one or more instructions to a processor for execution.

The present computer-executable code may be executed on any suitable device comprising a processor, including a server, a PC, or a mobile device such as a smartphone or tablet computer. Any controller or computer optionally includes a monitor, which may be a cathode ray tube ("CRT") display, a flat panel display (e.g., an active matrix liquid crystal display), or another monitor. The computer circuitry is typically housed in a box that contains a number of integrated circuit chips, such as microprocessors, memory, interface circuits, and the like. The box also optionally contains a hard disk drive, a floppy disk drive, a high-capacity removable drive such as a writable CD-ROM, and other common peripheral components. An input device, such as a keyboard, mouse, or touch-sensitive screen, optionally provides input from a user. The computer may include suitable software for receiving user instructions, either in the form of user input into a set of parameter fields (e.g., in a GUI) or in the form of pre-programmed instructions (e.g., instructions pre-programmed for a variety of different specific operations).

In some embodiments of any of the various aspects of the present disclosure, the methods, compositions, and systems have therapeutic applications, such as characterizing a patient sample and optionally diagnosing a condition of a subject. Therapeutic applications may also include informing the selection of a therapy that is likely to produce the best response in a patient (also referred to as "theranostics") based on the results of the methods described herein, as well as the actual treatment of a subject in need thereof. In particular, the methods and compositions disclosed herein can be used to diagnose the presence, progression, and/or metastasis of tumors, particularly when the polynucleotides analyzed comprise or consist of cfDNA, ctDNA, or fragmented tumor DNA. In some embodiments, the efficacy of treatment of the subject is monitored. For example, when ctDNA is monitored over time, a decrease in ctDNA may be used as an indication of effective treatment, while an increase in ctDNA may prompt selection of a different treatment or a different dose. Other uses include the assessment of organ rejection in transplant recipients (where an increase in the amount of circulating DNA corresponding to the transplant donor genome is used as an early indicator of transplant rejection), and genotyping/isotyping of pathogen infections, such as viral or bacterial infections. Detection of sequence variants in circulating fetal DNA can be used to diagnose a fetal condition.
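Monitoring ctDNA over time can be summarized as a comparison between a baseline measurement and the most recent one. The Python sketch below is hypothetical: the function name, the use of the first time point as baseline, and the two-fold cutoff are illustrative assumptions rather than clinical criteria, and the levels may be any consistent ctDNA measure (e.g., variant-positive partitions per mL of plasma).

```python
def ctdna_trend(levels: list[float], fold_change: float = 2.0) -> str:
    """Hypothetical rule comparing the latest ctDNA level with the first (baseline) level.

    Returns "decreasing", "increasing", or "stable". The fold-change cutoff is an
    illustrative assumption, not a clinically validated value.
    """
    if len(levels) < 2:
        raise ValueError("need at least two time points")
    if fold_change <= 1.0:
        raise ValueError("fold_change must be greater than 1")
    baseline, latest = levels[0], levels[-1]
    if baseline <= 0 and latest <= 0:
        return "stable"                # nothing detected at either time point
    if latest * fold_change <= baseline:
        return "decreasing"            # may suggest an effective treatment
    if latest >= baseline * fold_change:
        return "increasing"            # may prompt review of the treatment or dose
    return "stable"


print(ctdna_trend([120.0, 30.0, 12.0]))  # decreasing
```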

As used herein, "treat," "treating," "alleviating," and "improving" are used interchangeably. These terms refer to an approach for obtaining beneficial or desired results, including but not limited to a therapeutic benefit and/or a prophylactic benefit. By therapeutic benefit is meant any therapeutically relevant improvement in, or effect on, one or more diseases, conditions, or symptoms under treatment. For prophylactic benefit, the composition may be administered to a subject at risk of developing a particular disease, condition, or symptom, or to a subject reporting one or more physiological symptoms of a disease, even though the disease, condition, or symptom may not yet have manifested. Generally, a prophylactic benefit includes reducing the incidence and/or slowing the progression of one or more diseases, conditions, or symptoms under treatment (e.g., between a treated and an untreated population, or between treated and untreated states of a subject). Improving the therapeutic outcome may include diagnosing a condition of the subject in order to identify whether the subject would benefit from treatment with one or more therapeutic agents or another therapeutic intervention (e.g., surgery). In such diagnostic applications, the overall rate of successful treatment with one or more therapeutic agents may be improved (e.g., a measure of treatment efficacy improved by at least about 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95% or more) relative to the effectiveness in patients not diagnosed according to the methods of the present disclosure.

The terms "subject", "individual" and "patient" are used interchangeably herein to refer to a vertebrate, preferably a mammal, more preferably a human. Mammals include, but are not limited to, rats, monkeys, humans, farm animals, sports animals, and pets. Also included are tissues, cells and progeny of the biological entities obtained in vivo or cultured in vitro.

The terms "therapeutic agent" and "treatment agent" are used interchangeably and refer to a molecule or compound that confers a beneficial effect upon administration to a subject. The beneficial effects include enabling diagnostic determinations; ameliorating a disease, symptom, disorder, or pathological condition; reducing or preventing the onset of a disease, symptom, disorder, or condition; and generally counteracting a disease, symptom, disorder, or pathological condition.

In some embodiments of the various methods described herein, the sample is from a subject. The subject can be any organism, non-limiting examples of which include plants, animals, fungi, protists, anucleate protists, viruses, mitochondria, and chloroplasts. The sample polynucleotides may be isolated from a sample from the subject, such as a cell sample, tissue sample, bodily fluid sample, or organ sample (or a cell culture derived from any of these), including, for example, a cultured cell line, biopsy, blood sample, buccal swab, or a fluid sample containing cells (e.g., saliva). In some cases, the sample does not contain intact cells or is treated to remove cells, or the polynucleotides are isolated without performing a cell extraction step (e.g., isolation of cell-free polynucleotides, such as cell-free DNA). Other examples of sample sources include blood, urine, feces, nasal passages, lungs, intestines, other bodily fluids or excretions, substances derived therefrom, or combinations thereof. The subject can be an animal, including but not limited to cattle, pigs, mice, rats, chickens, cats, dogs, etc., and is typically a mammal, such as a human. In some embodiments, the sample comprises tumor cells, such as a tumor tissue sample from a subject. In some embodiments, the sample is a blood sample or a portion thereof (e.g., plasma or serum). Serum and plasma may be of particular interest because of their relative enrichment in tumor DNA, which is associated with the higher rate of malignant cell death. The sample may be a fresh sample, or a sample that has undergone one or more storage procedures (e.g., a paraffin-embedded sample, particularly a formalin-fixed, paraffin-embedded (FFPE) sample).
In some embodiments, a sample from a single individual is divided into a plurality of individual samples (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10 or more individual samples) that are independently subjected to the methods of the present disclosure, such as in duplicate, triplicate, quadruplicate, or higher-order replicate analysis. When the sample is from a subject, the reference sequence may also be derived from the subject, such as a consensus sequence from the sample being analyzed or the sequence of a polynucleotide from another sample or tissue of the same subject. For example, a blood sample may be analyzed for ctDNA mutations while cellular DNA from another sample (e.g., a buccal or skin sample) is analyzed to determine a reference sequence.
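When a sample is split into replicates, per-replicate partition counts can be compared and pooled. A minimal Python sketch, assuming each replicate reports hypothetical (variant partitions, wild-type partitions) counts with false positives already excluded; the function names and counts are illustrative:

```python
def per_replicate_fractions(replicates: list[tuple[int, int]]) -> list:
    """Variant fraction for each replicate; None where a replicate has no informative partitions."""
    return [v / (v + w) if v + w else None for v, w in replicates]


def pooled_variant_fraction(replicates: list[tuple[int, int]]) -> float:
    """Pool (variant_partitions, wild_type_partitions) counts across replicate analyses."""
    variant = sum(v for v, _ in replicates)
    wild_type = sum(w for _, w in replicates)
    total = variant + wild_type
    if total == 0:
        raise ValueError("no informative partitions across replicates")
    return variant / total


replicates = [(1, 99), (3, 97), (2, 98)]  # hypothetical triplicate counts
print(per_replicate_fractions(replicates))
print(pooled_variant_fraction(replicates))
```

Pooling counts weights each replicate by its number of informative partitions, rather than averaging per-replicate fractions equally.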

Polynucleotides may be extracted from a sample according to any suitable method, with or without extraction from cells in the sample. A variety of kits are available for extracting polynucleotides, and the choice may depend on the type of sample or the type of nucleic acid to be isolated. Examples of extraction methods are provided herein with respect to any of the various aspects disclosed herein. In one example, the sample may be a blood sample, such as a sample collected in an EDTA tube (e.g., BD Vacutainer). Plasma can be separated from peripheral blood cells by centrifugation (e.g., at 1900 × g for 10 minutes at 4 °C). Plasma separation from a 6 mL blood sample in this manner will typically yield 2.5 to 3 mL of plasma. Circulating cell-free DNA can be extracted from plasma samples, such as by using the QIAamp Circulating Nucleic Acid Kit (Qiagen), according to the manufacturer's protocol. The DNA can then be quantified (e.g., on an Agilent 2100 Bioanalyzer using a High Sensitivity DNA Kit (Agilent)). For example, the yield of circulating DNA from such plasma samples from healthy individuals may be 1 ng to 10 ng per mL of plasma, and is significantly higher in samples from cancer patients.
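The plasma and yield figures above bound the total cfDNA mass expected from one blood draw. The short Python calculation below simply multiplies the ranges given in the text (2.5 to 3 mL of plasma from 6 mL of blood; 1 to 10 ng of cfDNA per mL for healthy individuals); the function name is hypothetical. The resulting 2.5 to 30 ng falls within the low-input range (less than about 50 ng) discussed above.

```python
def cfdna_mass_range_ng(plasma_ml: tuple[float, float],
                        yield_ng_per_ml: tuple[float, float]) -> tuple[float, float]:
    """Bound total cfDNA mass (ng) by multiplying the low ends and the high ends of two ranges."""
    return plasma_ml[0] * yield_ng_per_ml[0], plasma_ml[1] * yield_ng_per_ml[1]


# Ranges taken from the text: plasma volume from a 6 mL draw, and healthy-donor yield.
print(cfdna_mass_range_ng((2.5, 3.0), (1.0, 10.0)))  # (2.5, 30.0)
```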

Polynucleotides may also be derived from stored samples, such as frozen or archived samples. One common method for storing samples is formalin fixation and paraffin embedding. However, this process is also associated with degradation of nucleic acids. Polynucleotides processed and analyzed from an FFPE sample may include short polynucleotides, such as fragments of 50-200 base pairs or less. Many techniques are available for purifying nucleic acids from fixed, paraffin-embedded samples, such as the method described in WO2007133703 and the methods described by Foss et al., Diagnostic Molecular Pathology (1994) 3:148-. Commercial kits are available for purifying polynucleotides from FFPE samples, such as the RecoverAll Total Nucleic Acid Isolation Kit from Ambion. A typical process begins by removing paraffin from the tissue through extraction with xylene or another organic solvent, followed by treatment with heat and a protease such as proteinase K, which digests tissue and proteins and helps release genomic material from the tissue. The released nucleic acid can then be captured on a membrane or precipitated from solution and washed to remove impurities; in the case of mRNA isolation, a DNase treatment step is sometimes added to degrade unwanted DNA. Other methods of extracting DNA from FFPE samples are available and may be used in the methods of the present disclosure.

In some embodiments, the plurality of polynucleotides comprises cell-free polynucleotides, such as cell-free DNA (cfDNA) or circulating tumor DNA (ctDNA). Cell-free DNA circulates in both healthy and diseased individuals. cfDNA from tumors (ctDNA) is not limited to any particular cancer type, but appears to be a common finding across different malignancies. According to some measurements, the concentration of free circulating DNA in plasma is about 14-18 ng/mL in control subjects and about 180-318 ng/mL in patients with neoplasia. Apoptotic and necrotic cell death give rise to cell-free circulating DNA in body fluids. For example, significant increases in circulating DNA levels have been observed in the plasma of patients with prostate cancer and with other prostate diseases, such as benign prostatic hyperplasia and prostatitis. In addition, circulating tumor DNA is present in body fluids derived from the organ in which the primary tumor occurs. Thus, breast cancer detection can be achieved in ductal lavage, colorectal cancer detection in stool, lung cancer detection in sputum, and prostate cancer detection in urine or semen. Cell-free DNA can be obtained from a variety of sources. One common source is a blood sample from a subject. However, cfDNA or other fragmented DNA can come from a variety of other sources. For example, urine and fecal samples can be a source of cfDNA, including ctDNA.

In some embodiments, the polynucleotide is subjected to subsequent steps (e.g., circularization and amplification) without performing an extraction step and/or without performing a purification step. For example, a fluid sample can be processed to remove cells without performing an extraction step, producing a purified fluid sample and a cell sample, followed by isolation of DNA from the purified fluid sample. A variety of procedures for isolating polynucleotides are available, such as precipitation, or non-specific binding to a substrate followed by washing of the substrate and release of the bound polynucleotides. Where the polynucleotide is isolated from a sample without a cell extraction step, the polynucleotide will be predominantly extracellular, or "cell-free", polynucleotide. For example, cell-free polynucleotides can include cell-free DNA (also referred to as "circulating" DNA). In some embodiments, the circulating DNA is circulating tumor DNA (ctDNA) from tumor cells, e.g., obtained from a bodily fluid or an excretion (e.g., a blood sample). Tumors often exhibit apoptosis or necrosis, allowing tumor nucleic acids to be released into the body, including the subject's bloodstream, through a variety of mechanisms, in different forms, and at different levels. In general, ctDNA can range in size from relatively small fragments (typically 70 to 200 nucleotides in length) at relatively high concentrations to large fragments of up to thousands of kilobases at relatively low concentrations.

In some embodiments of any of the various aspects described herein, detecting the sequence variant comprises detecting a mutation (e.g., a rare somatic mutation) against a background of polynucleotides that carry no mutation relative to the reference sequence, wherein the sequence variant is associated with a disease. In general, sequence variants for which there is statistical, biological, and/or functional evidence of association with a disease or trait are referred to as "causal genetic variants". A single causal genetic variant may be associated with more than one disease or trait. In some embodiments, the causal genetic variant may be associated with a Mendelian trait, a non-Mendelian trait, or both. A causal genetic variant may be represented by a variation in a polynucleotide, such as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50 or more sequence differences (e.g., between a polynucleotide comprising the causal genetic variant and a polynucleotide lacking the causal genetic variant at the same relative genomic position). Non-limiting examples of causal genetic variant types include single nucleotide polymorphisms (SNPs), deletion/insertion polymorphisms (DIPs), copy number variants (CNVs), short tandem repeats (STRs), restriction fragment length polymorphisms (RFLPs), simple sequence repeats (SSRs), variable number tandem repeats (VNTRs), randomly amplified polymorphic DNA (RAPDs), amplified fragment length polymorphisms (AFLPs), inter-retrotransposon amplified polymorphisms (IRAPs), long and short interspersed nuclear elements (LINEs/SINEs), long terminal repeats (LTRs), mobile elements, retrotransposon-microsatellite amplified polymorphisms, retrotransposon-based insertion polymorphisms, sequence-specific amplified polymorphisms, and heritable epigenetic modifications (e.g., DNA methylation). A causal genetic variant may also be a group of closely related causal genetic variants. Some causal genetic variants may manifest as sequence variations in RNA polynucleotides.
At this level, some causal genetic variants are also indicated by the presence or absence of a certain RNA polynucleotide. In addition, some causal genetic variants result in sequence variations of the protein polypeptide. A number of causal genetic variants have been reported. Hb S variants of hemoglobin that cause sickle cell anemia are one example of causal genetic variants of SNPs. The delta508 mutation in the CFTR gene responsible for cystic fibrosis is an example of a causal genetic variant of DIP. Trisomy 21, which causes down syndrome, is an example of a causal genetic variant of CNV. Tandem repeats responsible for huntington's disease are one example of causal genetic variants of STRs. Table 1 provides non-limiting examples of causal genetic variants and diseases associated therewith. Other non-limiting examples of causal genetic variants are described in WO 2014015084. Table 2 provides additional examples of genes in which mutations are associated with disease and in which sequence variants can be detected according to the methods of the present disclosure.
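The variant classes listed above differ in how the reference and variant alleles compare. As a rough illustration only, the sketch below classifies a simple variant from VCF-style reference and alternate alleles (SNP, multi-nucleotide change, insertion, or deletion) and counts the longest run of a repeat unit, as one might for an STR such as a CAG repeat. The function names and the VCF-style, already-normalized allele convention are assumptions for illustration; real variant annotation also handles left-alignment, complex alleles, and structural variants that this sketch ignores.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Classify a simple variant from VCF-style REF/ALT alleles.

    Assumes alleles are already normalized and that indels carry a shared
    leading anchor base, e.g. REF="ATCT", ALT="A" for a 3-base deletion.
    """
    ref, alt = ref.upper(), alt.upper()
    if not ref or not alt:
        raise ValueError("alleles must be non-empty")
    if ref == alt:
        raise ValueError("reference and alternate alleles are identical")
    if len(ref) == len(alt):
        # Same length: one base changed (SNP) or several (multi-nucleotide)
        return "SNP" if len(ref) == 1 else "MNP"
    # Length change: a deletion/insertion polymorphism (DIP)
    return "insertion" if len(alt) > len(ref) else "deletion"


def longest_tandem_run(seq: str, unit: str) -> int:
    """Return the largest number of consecutive copies of `unit` in `seq`."""
    seq, unit = seq.upper(), unit.upper()
    k = len(unit)
    if k == 0:
        raise ValueError("repeat unit must be non-empty")
    best = 0
    for offset in range(k):  # scan every phase of the repeat unit
        run = 0
        for i in range(offset, len(seq) - k + 1, k):
            if seq[i:i + k] == unit:
                run += 1
                best = max(best, run)
            else:
                run = 0
    return best


print(classify_variant("A", "G"))                  # SNP
print(classify_variant("ATCT", "A"))               # deletion
print(longest_tandem_run("AACAGCAGCAGTT", "CAG"))  # 3
```

Counting repeat units in every phase keeps the repeat count independent of where the unit happens to start in the read.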

TABLE 1 causal genetic variants and diseases associated therewith

TABLE 2 genes whose mutations may be associated with disease

In some embodiments, the method further comprises the step of diagnosing the subject based on identifying the sequence variant, e.g., diagnosing a subject with a disease associated with the detected causal genetic variant, or reporting the likelihood that the patient has suffered or will suffer from such a disease. Examples of diseases, related genes, and related sequence variants are provided herein. In some embodiments, the results are reported by a report generator, as described herein.

In some embodiments, the one or more causal genetic variants are sequence variants associated with a particular type or stage of cancer, or with cancers having particular characteristics (e.g., metastatic potential, drug resistance, drug responsiveness). In some embodiments, the present disclosure provides methods of determining prognosis, for example where certain mutations are known to correlate with patient outcome. For example, ctDNA has been shown to be a better prognostic biomarker for breast cancer than the traditional cancer antigen 15-3 (CA 15-3) and circulating tumor cell counts (see, e.g., Dawson et al, N Engl J Med 368:1199 (2013)). In addition, the methods of the present disclosure can be used for treatment decision-making, guidance, and monitoring, as well as in the development of cancer therapies and in clinical trials. For example, treatment efficacy can be monitored by comparing patient ctDNA samples obtained before, during, and after treatment with a particular therapy, such as molecularly targeted therapies (e.g., monoclonal antibodies), chemotherapeutic drugs, radiation therapy protocols, and the like, or combinations thereof. For example, ctDNA can be monitored to determine whether certain mutations increase or decrease after treatment, whether new mutations appear, etc., which can allow a physician to adjust treatment (e.g., continue, stop, or change treatment) sooner than monitoring methods that track patient symptoms. In some embodiments, the method further comprises diagnosing the subject based on the identifying step, e.g., diagnosing a subject with a particular stage or type of cancer associated with the detected sequence variant, or reporting the likelihood that the patient has or will have such cancer.
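The monitoring idea above, comparing ctDNA before, during, and after treatment, can be sketched as a comparison of mutant fractions across timepoints. This is a minimal illustration and not the disclosed method: the `Timepoint` class, the `summarize_response` function, and the two-fold `fold_threshold` are hypothetical, and the mutant and wild-type copy counts are assumed to come from an upstream digital PCR measurement.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Timepoint:
    label: str              # e.g. "pre", "during", "post"
    mutant_copies: float    # copies carrying the tracked variant
    wildtype_copies: float  # copies of the reference (wild-type) sequence

    @property
    def mutant_fraction(self) -> float:
        total = self.mutant_copies + self.wildtype_copies
        if total <= 0:
            raise ValueError(f"no copies measured at {self.label!r}")
        return self.mutant_copies / total


def summarize_response(series: List[Timepoint],
                       fold_threshold: float = 2.0) -> List[Tuple[str, float, str]]:
    """Label each later timepoint relative to the first (pre-treatment) one.

    fold_threshold is an illustrative cut-off, not a validated clinical value.
    """
    if fold_threshold <= 1:
        raise ValueError("fold_threshold must be greater than 1")
    baseline = series[0].mutant_fraction
    calls = []
    for tp in series[1:]:
        frac = tp.mutant_fraction
        if baseline == 0:
            # Variant absent before treatment: any signal is a newly seen mutation
            trend = "new" if frac > 0 else "stable"
        elif frac >= baseline * fold_threshold:
            trend = "increase"
        elif frac <= baseline / fold_threshold:
            trend = "decrease"
        else:
            trend = "stable"
        calls.append((tp.label, frac, trend))
    return calls


series = [
    Timepoint("pre", 50, 950),
    Timepoint("during", 10, 990),
    Timepoint("post", 120, 880),
]
for label, frac, trend in summarize_response(series):
    print(f"{label}: mutant fraction {frac:.3f} -> {trend}")
```

Using a fold change rather than an absolute difference treats a drop from 5% to 1% and a drop from 0.5% to 0.1% alike, which may or may not suit a given assay's noise floor.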

For example, for therapies that specifically target patients based on molecular markers, such as trastuzumab (Herceptin) and HER2/neu status, patients are tested to determine whether certain mutations are present in their tumors; these mutations can be used to predict response or resistance to treatment and to guide the decision whether to use the therapy. Therefore, detecting and monitoring ctDNA during treatment may be very useful for guiding treatment options. Some primary (pre-treatment) or secondary (post-treatment) cancer mutations have been found to be responsible for the resistance of cancers to certain therapies (Misale et al, Nature 486(7404):532 (2012)).

A variety of sequence variants are known that are associated with one or more cancers and can be used for diagnosis, prognosis, or treatment decisions. Suitable target sequences of oncological significance that may be used in the methods of the present disclosure include, but are not limited to, alterations of the TP53 gene, the ALK gene, the KRAS gene, the PIK3CA gene, the BRAF gene, the EGFR gene, and the KIT gene. The target sequence that is specifically amplified and/or specifically analyzed for sequence variants can be all or part of a cancer-associated gene. In some embodiments, one or more sequence variants are identified in the TP53 gene. TP53 is one of the most commonly mutated genes in human cancers; for example, TP53 mutations are found in 45% of ovarian cancers, 43% of colorectal (large bowel) cancers, and 42% of cancers of the upper aerodigestive tract (see, e.g., M. Olivier et al, TP53 Mutations in Human Cancers: Origins, Consequences, and Clinical Use, Cold Spring Harb Perspect Biol. 2010 Jan; 2(1)). Characterization of the mutational status of TP53 may aid clinical diagnosis, provide prognostic value, and affect the treatment of cancer patients. For example, TP53 mutation may be used as a predictor of poor prognosis in patients with CNS tumors derived from glial cells, and as a predictor of rapid disease progression in patients with chronic lymphocytic leukemia (see, e.g., McLendon RE et al, Cancer. 2005 Oct 15; 104(8):1693-9; Dicker F et al, Leukemia. 2009 Jan; 23(1):117-24). Sequence variations can occur anywhere within a gene. Thus, all or part of the TP53 gene can be evaluated here. That is, as described elsewhere herein, when target-specific components (e.g., target-specific primers) are used, multiple TP53-specific sequences can be used, for example, to amplify and detect fragments spanning the gene, rather than only one or more selected subsequences (e.g., mutation "hot spots").
Alternatively, target-specific primers can be designed that hybridize upstream or downstream of one or more selected subsequences (e.g., nucleotides or nucleotide regions associated with increased mutation rates in a class of subjects, also encompassed by the term "hot spots"). Standard primers can be designed across such subsequences, and/or B2B primers can be designed that hybridize upstream or downstream of such subsequences.
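The difference between spanning a whole gene and targeting only selected hot spots can be made concrete with a simple tiling calculation. The sketch below is an illustration, not a primer-design method from this disclosure: it splits a region into overlapping fixed-length windows, each of which could correspond to one target-specific amplicon. The helper names, coordinates, `amplicon_len`, and `overlap` values are hypothetical, and real design would also account for primer melting temperature, specificity, and the short fragment sizes typical of cfDNA.

```python
from typing import List, Tuple


def tile_amplicons(start: int, end: int, amplicon_len: int,
                   overlap: int) -> List[Tuple[int, int]]:
    """Return half-open (start, end) windows that together cover [start, end).

    Consecutive windows overlap by at least `overlap` bases; the last window
    is right-aligned to `end` so that it keeps the full amplicon length
    (unless the region itself is shorter than one amplicon).
    """
    if end <= start:
        raise ValueError("region must be non-empty")
    if not 0 <= overlap < amplicon_len:
        raise ValueError("need 0 <= overlap < amplicon_len")
    step = amplicon_len - overlap
    windows = []
    pos = start
    while pos + amplicon_len < end:
        windows.append((pos, pos + amplicon_len))
        pos += step
    windows.append((max(start, end - amplicon_len), end))
    return windows


def covers(windows: List[Tuple[int, int]], start: int, end: int) -> bool:
    """Check that the windows leave no gap inside [start, end)."""
    reached = start
    for w_start, w_end in sorted(windows):
        if w_start > reached:
            return False
        reached = max(reached, w_end)
    return reached >= end


windows = tile_amplicons(0, 500, amplicon_len=200, overlap=50)
print(windows)                  # [(0, 200), (150, 350), (300, 500)]
print(covers(windows, 0, 500))  # True
```

A hot-spot design, by contrast, would place a single window around each selected subsequence instead of tiling the full region.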

In some embodiments, one or more sequence variants are identified in all or part of the ALK gene. ALK fusions have been reported in up to 7% of lung tumors, some of which are associated with EGFR tyrosine kinase inhibitor (TKI) resistance (see, e.g., Shaw et al, J Clin Oncol. 2009 Sep 10; 27(26):4247-4253). By 2013, several different point mutations spanning the entire ALK tyrosine kinase domain had been found in patients with secondary resistance to ALK TKIs (Katayama R et al, Sci Transl Med. 2012 Feb 8; 4(120)). Therefore, detection of mutations in the ALK gene can be used to aid cancer therapy decisions.

In some embodiments, one or more sequence variants are identified in all or part of the KRAS gene. It has been reported that approximately 15-25% of lung adenocarcinoma patients and 40% of colorectal cancer patients carry tumor-associated KRAS mutations (see, e.g., Neumann et al, Pathol Res Pract. 2009; 205(12):858-62). Most mutations are located at codons 12, 13, and 61 of the KRAS gene. These mutations activate the KRAS signaling pathway, triggering the growth and proliferation of tumor cells. Some studies indicate that patients with tumors bearing KRAS mutations are unlikely to benefit from anti-EGFR antibody therapy alone or in combination with chemotherapy (see, e.g., Amado et al, J Clin Oncol. 2008 Apr 1; 26(10):1626-34; Bokemeyer et al, J Clin Oncol. 2009 Feb 10; 27(5):663-71). A particular sequence variation "hot spot" that can be targeted for the purpose of identifying sequence variants is at position 35 of the gene. The identification of KRAS sequence variants is useful in therapy selection, e.g., in colorectal cancer subjects.

In some embodiments, one or more sequence variants are identified in all or part of the PIK3CA gene. Somatic mutations in PIK3CA occur frequently in various types of cancer, for example, in 10-30% of colorectal cancers (see, e.g., Samuels et al, Science. 2004 Apr 23; 304(5670):554). These mutations are most often located in two "hot spot" regions, within exon 9 (helical domain) and exon 20 (kinase domain), that can be specifically targeted for amplification and/or analysis for sequence variant detection. Position 3140 may also be specifically targeted.
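The text gives both codon-level hot spots (KRAS codons 12, 13, and 61) and nucleotide-level positions (KRAS position 35, PIK3CA position 3140). Assuming, as the text does not state explicitly, that these positions are 1-based coding-sequence coordinates counted from the first base of the start codon, converting between the two is simple integer arithmetic, sketched below with hypothetical helper names. Under that assumption, KRAS position 35 is the second base of codon 12 and PIK3CA position 3140 is the second base of codon 1047.

```python
from typing import Tuple


def cds_position_to_codon(cds_pos: int) -> Tuple[int, int]:
    """Map a 1-based coding-sequence position to (codon number, base in codon).

    Assumes position 1 is the first base of the start codon.
    """
    if cds_pos < 1:
        raise ValueError("coding positions are 1-based")
    return (cds_pos - 1) // 3 + 1, (cds_pos - 1) % 3 + 1


def codon_to_cds_range(codon: int) -> Tuple[int, int]:
    """Return the first and last coding-sequence positions of a codon."""
    if codon < 1:
        raise ValueError("codons are 1-based")
    return 3 * codon - 2, 3 * codon


# KRAS hot-spot codons named in the text, expressed as coding positions
for codon in (12, 13, 61):
    print(codon, codon_to_cds_range(codon))
print(cds_position_to_codon(35))    # (12, 2)
print(cds_position_to_codon(3140))  # (1047, 2)
```

The same arithmetic does not apply to genomic coordinates, which include introns and may run on the reverse strand.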

In some embodiments, one or more sequence variants are identified in all or part of a BRAF gene. Nearly 50% of all malignant melanomas have been reported to carry somatic mutations in BRAF (see, e.g., Maldonado et al, J Natl Cancer Inst. 2003 Dec 17; 95(24):1878-90). BRAF mutations can be found in all melanoma subtypes but are most common in melanomas arising from skin without chronic sun-induced damage. The most common BRAF mutation in melanoma is the missense mutation V600E, which replaces the valine at position 600 with glutamic acid. The BRAF V600E mutation is associated with clinical benefit from BRAF inhibitor therapy. Detection of BRAF mutations can be used in selecting melanoma treatments and in studying resistance to targeted therapies.
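Protein-level variant names such as V600E, or the EGFR L858R mutation, use one-letter amino-acid codes in the order reference residue, position, variant residue. A small parser, sketched here with hypothetical names, makes the decoding explicit and shows that E denotes glutamic acid. It handles only simple one-letter missense changes (optionally prefixed with "p."), not deletions, insertions, or three-letter HGVS notation.

```python
import re

# Standard one-letter amino-acid codes
AMINO_ACIDS = {
    "A": "alanine", "R": "arginine", "N": "asparagine", "D": "aspartic acid",
    "C": "cysteine", "Q": "glutamine", "E": "glutamic acid", "G": "glycine",
    "H": "histidine", "I": "isoleucine", "L": "leucine", "K": "lysine",
    "M": "methionine", "F": "phenylalanine", "P": "proline", "S": "serine",
    "T": "threonine", "W": "tryptophan", "Y": "tyrosine", "V": "valine",
}

# Reference residue, position, variant residue, e.g. "V600E" or "p.L858R"
_MISSENSE = re.compile(r"^(?:p\.)?([A-Z])(\d+)([A-Z])$")


def describe_missense(change: str) -> str:
    """Spell out a one-letter missense change in words."""
    m = _MISSENSE.match(change.strip())
    if not m:
        raise ValueError(f"not a one-letter missense change: {change!r}")
    ref, pos, alt = m.group(1), int(m.group(2)), m.group(3)
    try:
        ref_name, alt_name = AMINO_ACIDS[ref], AMINO_ACIDS[alt]
    except KeyError as exc:
        raise ValueError(f"unknown amino-acid code in {change!r}") from exc
    return f"{ref_name} at position {pos} replaced by {alt_name}"


print(describe_missense("V600E"))    # valine at position 600 replaced by glutamic acid
print(describe_missense("p.L858R"))  # leucine at position 858 replaced by arginine
```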

In some embodiments, one or more sequence variants are identified in all or part of the EGFR gene. EGFR mutations are commonly associated with non-small cell lung cancer (about 10% in the United States and about 35% in East Asia; see, e.g., Pao et al, Proc Natl Acad Sci USA. 2004 Sep 7; 101(36):13306-11). These mutations typically occur within exons 18-21 of EGFR and are often heterozygous. Approximately 90% of these mutations are exon 19 deletions or exon 21 L858R point mutations.

In some embodiments, one or more sequence variants are identified in all or part of the KIT gene. It has been reported that nearly 85% of gastrointestinal stromal tumors (GISTs) carry KIT mutations (see, e.g., Heinrich et al, J Clin Oncol. 2003 Dec 1; 21(23):4342-9). Most KIT mutations are found in the juxtamembrane domain (exon 11, 70%), the extracellular dimerization motif (exon 9, 10-15%), the tyrosine kinase 1 (TK1) domain (exon 13, 1-3%), and the tyrosine kinase 2 (TK2) domain and activation loop (exon 17, 1-3%). Secondary KIT mutations are often identified after targeted therapy with imatinib, once patients develop resistance to the therapy.

Other non-limiting examples of cancer-associated genes for which sequence variants of all or a portion thereof can be analyzed according to the methods described herein include, but are not limited to, PTEN; ATM; ATR; EGFR; ERBB2; ERBB3; ERBB4; Notch1; Notch2; Notch3; Notch4; AKT; AKT2; AKT3; HIF; HIF1α; HIF3α; Met; HRG; Bcl2; PPARα; PPARγ; WT1 (Wilms' tumor); FGF receptor family members (5 members: 1, 2, 3, 4, 5); CDKN2a; APC; RB (retinoblastoma); MEN1; VHL; BRCA1; BRCA2; AR (androgen receptor); TSG101; IGF; IGF receptor; Igf1 (4 variants); Igf2 (3 variants); Igf1 receptor; Igf2 receptor; Bax; Bcl2; caspase family (9 members: 1, 2, 3, 4, 6, 7, 8, 9, 12); Kras; and Apc. Other examples are provided elsewhere herein. Examples of cancers that can be diagnosed based on identifying one or more sequence variants according to the methods disclosed herein include, but are not limited to, acanthoma, acinic cell carcinoma, acoustic neuroma, acral lentiginous melanoma, acrospiroma, acute eosinophilic leukemia, acute lymphoblastic leukemia, acute megakaryoblastic leukemia, acute monocytic leukemia, acute myeloblastic leukemia with maturation, acute myeloid dendritic cell leukemia, acute myeloid leukemia, acute promyelocytic leukemia, adamantinoma, adenocarcinoma, adenoid cystic carcinoma, adenoma, adenomatoid odontogenic tumor, adrenocortical carcinoma, adult T-cell leukemia, aggressive NK-cell leukemia, AIDS-related cancer, AIDS-related lymphoma, alveolar soft part sarcoma, ameloblastic fibroma, anal cancer, anaplastic large cell lymphoma, anaplastic thyroid cancer, angioimmunoblastic T-cell lymphoma, angiomyolipoma, angiosarcoma, appendix cancer, astrocytoma, atypical teratoid rhabdoid tumor, basal cell carcinoma, basal-like carcinoma, B-cell leukemia, B-cell lymphoma, Bellini duct carcinoma, biliary tract cancer, bladder cancer, blastoma, bone cancer, bone tumor, brain stem glioma, brain tumor, breast cancer, Brenner tumor, bronchial tumor, bronchioloalveolar carcinoma, brown tumor, Burkitt's lymphoma, cancer of unknown primary site, carcinoid tumor, carcinoma in situ, carcinoma of the penis, carcinoma of unknown primary site, carcinosarcoma, Castleman's disease, central nervous system embryonal tumor, cerebellar astrocytoma, cerebral astrocytoma, cervical cancer, cholangiocarcinoma, chondrosarcoma, chordoma, choriocarcinoma, choroid plexus papilloma, chronic lymphocytic leukemia, chronic monocytic leukemia, chronic myelogenous leukemia, chronic myeloproliferative disorder, chronic neutrophilic leukemia, clear cell tumor, colon cancer, colorectal cancer, craniopharyngioma, cutaneous T-cell lymphoma, Degos disease, dermatofibrosarcoma protuberans, dermoid cyst, desmoplastic small round cell tumor, diffuse large B-cell lymphoma, dysembryoplastic neuroepithelial tumor, embryonal carcinoma, endodermal sinus tumor, endometrial cancer, endometrial uterine cancer, endometrioid tumor, enteropathy-associated T-cell lymphoma, ependymoma, epithelioid sarcoma, erythroleukemia, esophageal cancer, nasal glioma, Ewing family of tumors, Ewing family sarcoma, Ewing's sarcoma, extracranial germ cell tumor, extrahepatic bile duct cancer, extramammary Paget's disease, fallopian tube cancer, fetus in fetu, fibroma, melanoma, neuroblastoma, fibrosarcoma, follicular lymphoma, follicular thyroid cancer, gallbladder cancer, ganglioglioma, ganglioneuroma, gastric cancer, gastric lymphoma, gastrointestinal cancer, gastrointestinal carcinoid tumor, gastrointestinal stromal tumor, germ cell tumor, gestational choriocarcinoma, gestational trophoblastic tumor, giant cell tumor of bone, glioblastoma multiforme, glioma, gliomatosis cerebri, hemangioblastoma, glucagonoma, gonadoblastoma, granulosa cell tumor, hairy cell leukemia, head and neck cancer, heart cancer, hemangioepithelioma, hematologic malignancy, hepatocellular carcinoma, hepatosplenic T-cell lymphoma, hereditary breast-ovarian cancer syndrome, Hodgkin's lymphoma, hypopharyngeal cancer, hypothalamic glioma, inflammatory breast cancer, intraocular melanoma, islet cell carcinoma, juvenile myelomonocytic leukemia, Kaposi's sarcoma, kidney cancer, Klatskin tumor, Krukenberg tumor, laryngeal cancer, lentigo maligna melanoma, leukemia, lip and oral cavity cancer, liposarcoma, lung cancer, luteoma, lymphangioma, lymphangiosarcoma, lymphoepithelioma, lymphoid leukemia, lymphoma, macroglobulinemia, malignant fibrous histiocytoma of bone, malignant glioma, malignant mesothelioma, malignant peripheral nerve sheath tumor, malignant rhabdoid tumor, malignant triton tumor, MALT lymphoma, mantle cell lymphoma, mast cell leukemia, mediastinal germ cell tumor, mediastinal tumor, medullary thyroid cancer, medulloblastoma, medulloepithelioma, meningioma, Merkel cell carcinoma, mesothelioma, metastatic squamous neck cancer with occult primary, metastatic urothelial carcinoma, mixed Mullerian tumor, monocytic leukemia, oral cancer, mucinous tumor, multiple endocrine neoplasia syndrome, multiple myeloma, mycosis fungoides, myelodysplastic disease, myelodysplastic syndromes, myeloid leukemia, myeloid sarcoma, myeloproliferative disease, myxoma, nasal cavity cancer, nasopharyngeal cancer, neoplasm, schwannoma, neurofibroma, neuroma, nodular melanoma, non-Hodgkin lymphoma, non-melanoma skin cancer, non-small cell lung cancer, ocular tumor, oligodendroglioma, oncocytoma, optic nerve sheath meningioma, oropharyngeal cancer, osteosarcoma, ovarian cancer, ovarian epithelial cancer, ovarian germ cell tumor, ovarian low malignant potential tumor, Paget's disease of the breast, Pancoast tumor, pancreatic cancer, papillary thyroid cancer, papillomatosis, paraganglioma, paranasal sinus cancer, parathyroid cancer, penile cancer, perivascular epithelioid cell tumor, pharyngeal cancer, pheochromocytoma, pineal parenchymal tumor of intermediate differentiation, pineoblastoma, pituitary adenoma, pituitary tumor, plasmacytoma, pleuropulmonary blastoma, polyembryoma, precursor T-lymphoblastic lymphoma, primary central nervous system lymphoma, primary effusion lymphoma, primary hepatocellular carcinoma, primary liver cancer, primary peritoneal cancer, primitive neuroectodermal tumor, prostate cancer, pseudomyxoma peritonei, ovarian carcinoma, penile carcinoma, perivascular epithelioma, pharyngeal cell tumor, rectal cancer, renal cell carcinoma, respiratory tract carcinoma involving the NUT gene on chromosome 15, retinoblastoma, rhabdomyoma, rhabdomyosarcoma, Richter's transformation, sacrococcygeal teratoma, salivary gland cancer, sarcoma, schwannomatosis, sebaceous gland carcinoma, secondary neoplasm, seminoma, serous tumor, Sertoli-Leydig cell tumor, sex cord-stromal tumor, Sézary syndrome, signet ring cell carcinoma, skin cancer, small blue round cell tumor, small cell carcinoma, small cell lung cancer, small cell lymphoma, small intestine cancer, soft tissue sarcoma, somatostatinoma, soot wart, myeloma, spinal cord tumor, splenic marginal zone lymphoma, squamous cell carcinoma, stomach cancer, superficial spreading melanoma, supratentorial primitive neuroectodermal tumor, surface epithelial-stromal tumor, synovial sarcoma, T-cell acute lymphoblastic leukemia, T-cell large granular lymphocytic leukemia, T-cell leukemia, T-cell lymphoma, T-cell prolymphocytic leukemia, teratoma, advanced lymphoma, testicular cancer, alveolar cell tumor, thymic carcinoma, thymoma, thyroid cancer, transitional cell carcinoma of the renal pelvis and ureter, transitional cell carcinoma, urachal cancer, urinary tract cancer, genitourinary tumor, uterine sarcoma, uveal melanoma, vaginal cancer, Verner-Morrison syndrome, verrucous carcinoma, vulvar cancer, Waldenström's macroglobulinemia, Warthin's tumor, Wilms' tumor, and combinations thereof.
Non-limiting examples of specific sequence variants associated with cancer are provided in table 3.

TABLE 3 specific sequence variants that may be associated with cancer

In addition, the methods and compositions disclosed herein can be used to discover novel rare mutations associated with one or more cancer types, stages, or cancer characteristics. For example, a population of individuals sharing an analyzed characteristic (e.g., a particular disease, cancer type, or cancer stage) can be subjected to a method of detecting sequence variants according to the present disclosure to identify sequence variants or types of sequence variants (e.g., mutations of a particular gene or portion of a gene). Sequence variants found at a statistically significantly higher frequency in the group of individuals sharing the characteristic, as compared to individuals without the characteristic, can be assigned a degree of correlation with that characteristic. The sequence variants or types of sequence variants so identified can then be used to diagnose or treat individuals found to carry them.
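One common way to decide whether a variant occurs at a statistically significantly higher frequency in individuals sharing a characteristic is a one-sided Fisher's exact test on a 2x2 table of carriers and non-carriers. The text does not name a specific test, so this is one reasonable choice, sketched with the Python standard library only; `fisher_enrichment_p` is a hypothetical name, and a real study screening many variants would also need a multiple-testing correction, which is not shown.

```python
from math import comb


def fisher_enrichment_p(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for enrichment of carriers in group 1.

    a: carriers with the characteristic      b: non-carriers with it
    c: carriers without the characteristic   d: non-carriers without it
    Returns P(X >= a) under the hypergeometric null of no association.
    """
    if min(a, b, c, d) < 0:
        raise ValueError("counts must be non-negative")
    n_total = a + b + c + d
    carriers = a + c   # carriers across both groups
    group = a + b      # size of the group sharing the characteristic
    upper = min(carriers, group)
    tail = sum(comb(carriers, x) * comb(n_total - carriers, group - x)
               for x in range(a, upper + 1))
    return tail / comb(n_total, group)


print(round(fisher_enrichment_p(3, 1, 1, 3), 4))  # 0.2429
```

The one-sided form matches the question asked in the text (higher frequency in the group with the characteristic); a two-sided test would also flag depletion.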

Other clinical applications include non-invasive fetal diagnosis. Fetal DNA can be found in the blood of pregnant women. The methods and compositions described herein can be used to identify sequence variants in circulating fetal DNA, and thus to diagnose one or more genetic diseases in the fetus, such as genetic diseases associated with one or more causal genetic variants. Non-limiting examples of such diseases are described herein, including trisomies, cystic fibrosis, sickle cell anemia, and Tay-Sachs disease. In this embodiment, the mother may provide a control sample in addition to the blood sample for comparison. The control sample may be any suitable tissue and is typically processed to extract cellular DNA, which can then be sequenced to provide a reference sequence. Sequences in cfDNA corresponding to fetal genomic DNA can then be identified as sequence variants relative to the maternal reference. The father may also provide a reference sample to aid in the identification of fetal sequences and sequence variants.
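The maternal and paternal comparison described above reduces to set operations once variant calls are available. The sketch assumes a hypothetical encoding of each call as a (chromosome, position, alternate allele) tuple and a hypothetical `fetal_candidates` function. It also reflects a real limitation of this design: a fetal allele inherited from the mother is already present in the maternal reference, so only non-maternal alleles (paternally inherited or de novo) can be flagged this way.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical encoding of one variant call: (chromosome, position, alternate allele)
Variant = Tuple[str, int, str]


def fetal_candidates(cfdna: Set[Variant], maternal: Set[Variant],
                     paternal: Optional[Set[Variant]] = None) -> Dict[str, Set[Variant]]:
    """Return cfDNA variants absent from the maternal reference,
    optionally split by whether the father's sample supports them."""
    non_maternal = cfdna - maternal  # candidates attributable to fetal DNA
    result = {"non_maternal": non_maternal}
    if paternal is not None:
        # Present in the father's sample, i.e. likely paternally inherited
        result["paternally_supported"] = non_maternal & paternal
        # Absent from both parents: possible de novo variant or assay artifact
        result["not_in_either_parent"] = non_maternal - paternal
    return result


cfdna = {("chr7", 100, "T"), ("chr7", 200, "G"), ("chr1", 5, "A")}
maternal = {("chr7", 100, "T")}
paternal = {("chr7", 200, "G")}
for group, calls in fetal_candidates(cfdna, maternal, paternal).items():
    print(group, sorted(calls))
```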

Still further clinical applications include the detection of exogenous polynucleotides, such as those from pathogens (e.g., bacteria, viruses, fungi, and other microorganisms), which information may inform diagnosis and treatment options. For example, some HIV subtypes are associated with drug resistance (see, e.g., hivdb.). Similarly, HCV typing, subtyping, and isotype mutation analysis can also be achieved using the methods and compositions of the present disclosure. Furthermore, where HPV subtypes are associated with risk of cervical cancer, such diagnosis may further inform the assessment of cancer risk. Further non-limiting examples of detectable viruses include hepatitis B virus (HBV), woodchuck hepatitis virus, ground squirrel hepatitis virus (Hepadnaviridae), duck hepatitis B virus, heron hepatitis B virus, herpes simplex virus (HSV) types 1 and 2, varicella-zoster virus, cytomegalovirus (CMV), human cytomegalovirus (HCMV), mouse cytomegalovirus (MCMV), guinea pig cytomegalovirus (GPCMV), Epstein-Barr virus (EBV), human herpes virus 6 (HHV variants A and B), human herpes virus 7 (HHV-7), human herpes virus 8 (HHV-8), Kaposi's sarcoma-associated herpes virus (KSHV), B virus, poxvirus vaccinia virus, variola virus, smallpox virus, monkeypox virus, cowpox virus, camelpox virus, ectromelia virus, mousepox virus, rabbitpox virus, raccoonpox virus, molluscum contagiosum virus, orf virus, milker's nodes virus, bovine papular stomatitis virus, sheeppox virus, goatpox virus, lumpy skin disease virus, fowlpox virus, canarypox virus, pigeonpox virus, sparrowpox virus, myxoma virus, hare fibroma virus, rabbit fibroma virus, squirrel fibroma virus, swinepox virus, tanapox virus, yabapox virus, flavivirus dengue virus, hepatitis C virus (HCV), GB hepatitis viruses (GBV-A, GBV-B, and GBV-C), West Nile virus, yellow fever virus, St. Louis encephalitis virus, Japanese encephalitis virus, Powassan virus, tick-borne encephalitis virus, Kyasanur Forest disease virus, togavirus, Venezuelan equine encephalitis (VEE) virus, chikungunya virus, Ross River virus, Mayaro virus, Sindbis virus, rubella virus, retrovirus human immunodeficiency virus (HIV) types 1 and 2, human T cell leukemia virus (HTLV) types 1, 2, and 5, mouse mammary tumor virus (MMTV), Rous sarcoma virus (RSV), lentiviruses, coronaviruses, severe acute respiratory syndrome (SARS) virus, filovirus Ebola virus, Marburg virus, metapneumoviruses (MPV) such as human metapneumovirus (HMPV), rhabdovirus rabies virus, vesicular stomatitis virus, bunyavirus, Crimean-Congo hemorrhagic fever virus, Rift Valley fever virus, La Crosse virus, Hantaan virus, orthomyxovirus, influenza virus (types A, B, and C), paramyxovirus, parainfluenza virus (PIV types 1, 2, and 3), respiratory syncytial virus (types A and B), measles virus, mumps virus, arenavirus, lymphocytic choriomeningitis virus, Junin virus, Machupo virus, Guanarito virus, Lassa virus, Ampari virus, Flexal virus, Ippy virus, Mobala virus, Mopeia virus, Latino virus, Parana virus, Pichinde virus, Punta Toro virus (PTV), Tacaribe virus, and Tamiami virus.

Specific examples of bacterial pathogens that can be detected by the methods of the present disclosure include, but are not limited to, any one or more (or any combination) of the following: Acinetobacter baumannii, Actinobacillus sp., Actinomycetes, Actinomyces sp. (such as Actinomyces israelii and Actinomyces naeslundii), Aeromonas sp. (such as Aeromonas hydrophila, Aeromonas sobria, and Aeromonas caviae), Anaplasma phagocytophilum, Alcaligenes xylosoxidans, Bacillus sp. (such as Bacillus subtilis and Bacillus stearothermophilus), Bacteroides sp. (such as Bacteroides fragilis), Bartonella sp. (such as Bartonella bacilliformis and Bartonella henselae), Bifidobacterium sp., Bordetella sp. (such as Bordetella pertussis, Bordetella parapertussis, and Bordetella bronchiseptica), Borrelia sp. (such as Borrelia burgdorferi), Brucella sp. (such as Brucella abortus), Burkholderia sp. (such as Burkholderia cepacia), Campylobacter sp. (such as Campylobacter jejuni, Campylobacter coli, Campylobacter lari, and Campylobacter fetus), Capnocytophaga sp., Cardiobacterium hominis, Chlamydia trachomatis, Chlamydophila pneumoniae, Chlamydophila psittaci, Citrobacter sp., Clostridium sp. (such as Clostridium difficile), Eikenella corrodens, Enterobacter sp. (such as Enterobacter aerogenes, Enterobacter agglomerans, and Enterobacter cloacae), Escherichia coli (including opportunistic Escherichia coli, such as enterotoxigenic, enteroinvasive, enteropathogenic, enterohemorrhagic, enteroaggregative, and uropathogenic Escherichia coli), Enterococcus sp. (such as Enterococcus faecalis and Enterococcus faecium), Gardnerella vaginalis, Gemella morbillorum, Haemophilus sp. (such as Haemophilus influenzae, Haemophilus ducreyi, Haemophilus aegyptius, Haemophilus parainfluenzae, Haemophilus haemolyticus, and Haemophilus parahaemolyticus), Helicobacter sp. (such as Helicobacter pylori), Klebsiella sp. (such as Klebsiella pneumoniae), Legionella pneumophila, Leptospira interrogans, Peptostreptococcus sp., Moraxella catarrhalis, Morganella sp., Mobiluncus sp., Micrococcus sp., Mycobacterium sp. (such as Mycobacterium leprae, Mycobacterium tuberculosis, Mycobacterium intracellulare, Mycobacterium avium, Mycobacterium bovis, and Mycobacterium scrofulaceum), Nocardia sp., Mycoplasma sp., Neisseria sp. (such as Neisseria gonorrhoeae and Neisseria meningitidis), Pasteurella multocida, Plesiomonas shigelloides, Prevotella sp., Porphyromonas sp., Prevotella melaninogenica, Proteus sp. (such as Proteus mirabilis), Rickettsia sp. (such as Rickettsia akari and Rickettsia prowazekii), Orientia tsutsugamushi (formerly Rickettsia tsutsugamushi), Rickettsia typhi, Rhodococcus sp., Serratia marcescens, Stenotrophomonas maltophilia, Salmonella sp. (such as Salmonella enterica, Salmonella typhi, Salmonella paratyphi, Salmonella enteritidis, Salmonella choleraesuis, and Salmonella typhimurium), Serratia sp. (such as Serratia liquefaciens), Shigella sp. (such as Shigella dysenteriae and Shigella flexneri), Staphylococcus sp. (such as Staphylococcus aureus, Staphylococcus epidermidis, Staphylococcus haemolyticus, and Staphylococcus saprophyticus), Streptococcus sp. (such as Streptococcus pneumoniae, for example chloramphenicol-resistant serotype 4, spectinomycin-resistant serotype 6B, streptomycin-resistant serotype 9V, erythromycin-resistant serotype 14, optochin-resistant serotype 14, rifampicin-resistant serotype 18C, penicillin-resistant serotype 19F, or trimethoprim-resistant serotype 23F Streptococcus pneumoniae; Streptococcus agalactiae; Streptococcus mutans; Streptococcus pyogenes; group A streptococci; group B streptococci; group C streptococci; Streptococcus anginosus; Streptococcus equisimilis; group D streptococci; Streptococcus bovis; group F streptococci; and Streptococcus anginosus group G streptococci), Spirillum minus, Streptobacillus moniliformis, Treponema sp. (such as Treponema pallidum and Treponema endemicum), Tropheryma whipplei, Ureaplasma urealyticum, Veillonella sp., Vibrio sp. (such as Vibrio cholerae, Vibrio parahaemolyticus, Vibrio vulnificus, Vibrio alginolyticus, Vibrio mimicus, Vibrio hollisae, Vibrio fluvialis, Vibrio metchnikovii, Vibrio damsela, and Vibrio furnissii), Yersinia sp. (such as Yersinia enterocolitica, Yersinia pestis, and Yersinia pseudotuberculosis), and Xanthomonas maltophilia.

In some embodiments, the methods and compositions of the present disclosure are used to monitor organ transplant recipients. In general, polynucleotides from donor cells circulate against a background of polynucleotides from recipient cells. If the organ is well accepted, the level of circulating donor DNA is generally stable, and a rapid increase in donor DNA (e.g., as a frequency in a given sample) can serve as an early indication of transplant rejection; treatment may be given at this stage to prevent graft failure. Rejection of donor organs has been shown to result in an increase in donor DNA in the blood; see Snyder et al., PNAS 108(15):6629 (2011). The present disclosure provides significant improvements in sensitivity over prior methods in this field. In this embodiment, a recipient control sample (e.g., a buccal swab) and a donor control sample can be used for comparison. The recipient sample can provide a reference sequence, and sequences corresponding to the donor genome can be identified as sequence variants relative to that reference. Monitoring may include obtaining samples (e.g., blood samples) from the recipient over a period of time. Early samples (e.g., samples taken within the first few weeks) can be used to establish a baseline for the fraction of donor cfDNA, and subsequent samples can be compared to that baseline. In some embodiments, an increase in the fraction of donor cfDNA of about or at least about 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 100%, 250%, 500%, 1000%, or more can indicate that the recipient is rejecting the donor tissue.
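The baseline-and-comparison logic described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the donor cfDNA fraction of each sample is taken as already measured, the baseline is assumed to be the mean of the early samples, the listed percentages are read as relative increases over baseline, and the function names and the 50% default threshold are hypothetical choices, not values prescribed by the disclosure.

```python
from statistics import mean


def baseline_fraction(early_fractions: list[float]) -> float:
    """Baseline donor cfDNA fraction, assumed here to be the mean of early samples."""
    if not early_fractions:
        raise ValueError("at least one early sample is required")
    return mean(early_fractions)


def percent_increase(baseline: float, current: float) -> float:
    """Relative increase of the current donor fraction over the baseline, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (current - baseline) / baseline * 100.0


def flag_possible_rejection(baseline: float, current: float,
                            threshold_pct: float = 50.0) -> bool:
    """True if the relative increase meets the chosen (hypothetical) threshold."""
    return percent_increase(baseline, current) >= threshold_pct


# Hypothetical donor fractions (donor cfDNA / total cfDNA) from early post-transplant samples.
early = [0.010, 0.012, 0.011]
base = baseline_fraction(early)
print(round(percent_increase(base, 0.022), 1))
print(flag_possible_rejection(base, 0.022))
```

Any of the thresholds listed in the text (10% through 1000%) can be passed as `threshold_pct`; choosing one is a clinical decision outside the scope of this sketch.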

Examples

The following examples are given to illustrate various embodiments of the invention and are not meant to limit the invention in any way. These examples, along with the methods described herein as presently representative of preferred embodiments, are illustrative and are not intended to limit the scope of the invention. Variations thereof and other uses, encompassed within the spirit of the invention as defined by the scope of the claims, will occur to those skilled in the art.

Example 1: ddPCR analysis of cancer variants using WGA-amplified short-fragmented cfDNA reference standard samples

cfDNA reference standards were made by mixing short fragmented DNA (approximately 150 bp) from different cancer cell lines with NA12878 DNA at different ratios. Four cfDNA reference standards were used in this study: 5 ng of the 0.25% reference standard; 10 ng of the 0.25% reference standard; 20 ng of the 0.1% reference standard; and 20 ng of the 0% reference standard.
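To see how few mutant molecules these inputs contain, the expected copy number can be estimated from input mass. The sketch below assumes roughly 3.3 pg per haploid human genome, one copy of the target locus per haploid genome, no loss during fragmentation, and that the percentages are variant allele fractions; none of these assumptions is stated in the example itself.

```python
# Assumed approximate mass of one haploid human genome, in picograms.
PG_PER_HAPLOID_GENOME = 3.3


def haploid_genome_equivalents(input_ng: float) -> float:
    """Number of haploid genome copies (and thus target-locus copies) in the input."""
    return input_ng * 1000.0 / PG_PER_HAPLOID_GENOME


def expected_mutant_copies(input_ng: float, allele_fraction: float) -> float:
    """Expected mutant copies, assuming the percentage is a variant allele fraction."""
    return haploid_genome_equivalents(input_ng) * allele_fraction


# The four reference standards from this example: (input in ng, allele fraction).
standards = [(5, 0.0025), (10, 0.0025), (20, 0.001), (20, 0.0)]
for ng, af in standards:
    ge = haploid_genome_equivalents(ng)
    print(f"{ng} ng at {af:.2%}: ~{ge:.0f} genome equivalents, "
          f"~{expected_mutant_copies(ng, af):.1f} mutant copies")
```

Under these assumptions the 5 ng, 0.25% standard holds about 1,515 genome equivalents but only about 3.8 mutant copies, the 10 ng, 0.25% standard about 7.6, and the 20 ng, 0.1% standard about 6.1.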

Each sample was run in triplicate. The DNA sample was denatured at 96 °C for 30 seconds and cooled on ice for 2 minutes. A ligation mix (2 µL of 10X CircLigase buffer, 4 µL of 5 M betaine, 1 µL of 50 mM MnCl2, and 1 µL of CircLigase II (Epicentre #CL9025K)) was added on a cold block, and ligation was performed at 60 °C for 3 hours. The ligated DNA mixture was incubated at 80 °C for 45 seconds on a PCR instrument, followed by exonuclease treatment: an exonuclease cocktail (ExoI at 20 U/µL and ExoIII at 100 U/µL, mixed at a 1:2 ratio) was added to each tube, and the reaction was incubated at 37 °C for 30 minutes. The mixture was then denatured at 95 °C for 2 minutes and cooled to 4 °C on ice before adding a Ready-To-Go GenomiPhi V3 cake for whole-genome amplification (WGA). The WGA reaction was incubated at 30 °C for 4.5 hours, followed by heat inactivation at 65 °C for 10 minutes.
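The reagent volumes above imply final working concentrations that can be checked with C1·V1 = C2·V2. This is a sketch under two assumptions not stated in the protocol: that 2 µL of 10X buffer is meant to yield 1X (implying a 20 µL ligation reaction), and that the 1:2 exonuclease ratio is by volume.

```python
def final_concentration(stock: float, volume_added_ul: float, total_volume_ul: float) -> float:
    """Solve C1*V1 = C2*V2 for the final concentration C2."""
    return stock * volume_added_ul / total_volume_ul


# Ligation mix components from the protocol: (name, stock concentration, unit, uL added).
LIGATION_MIX = [
    ("CircLigase buffer", 10.0, "X", 2.0),
    ("betaine", 5.0, "M", 4.0),
    ("MnCl2", 50.0, "mM", 1.0),
]
ENZYME_UL = 1.0  # CircLigase II

# Assumption: 2 uL of 10X buffer gives 1X, implying a 20 uL reaction.
TOTAL_UL = 20.0
mix_ul = sum(entry[3] for entry in LIGATION_MIX) + ENZYME_UL
sample_ul = TOTAL_UL - mix_ul  # implied volume of denatured DNA sample

for name, stock, unit, vol in LIGATION_MIX:
    print(f"{name}: {final_concentration(stock, vol, TOTAL_UL):g} {unit} final")
print(f"implied DNA sample volume: {sample_ul:g} uL")


def units_per_ul_of_cocktail(exo1_stock: float = 20.0, exo3_stock: float = 100.0,
                             ratio: tuple[int, int] = (1, 2)) -> tuple[float, float]:
    """Units of ExoI and ExoIII per uL of cocktail, assuming a volumetric ExoI:ExoIII ratio."""
    parts = ratio[0] + ratio[1]
    return exo1_stock * ratio[0] / parts, exo3_stock * ratio[1] / parts


exo1_u, exo3_u = units_per_ul_of_cocktail()
print(f"per uL of cocktail: ExoI {exo1_u:.2f} U, ExoIII {exo3_u:.2f} U")
```

Under these assumptions the ligation contains 1X buffer, 1 M betaine and 2.5 mM MnCl2, leaving 12 µL for the denatured DNA, and each microliter of cocktail delivers about 6.67 U of ExoI and 66.67 U of ExoIII.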

The WGA product was purified using AMPure XP magnetic beads and sonicated to an average fragment size of 800 bp. Aliquots of the sonicated DNA samples were then used as inputs to ddPCR analysis of the following variants: EGFR L858R, EGFR G719S, and EGFR T790M. The TaqMan primer and probe sequences used for this assay are provided in Table 4. Droplet digital PCR reactions were performed according to the manufacturer's instructions (QX200™ Droplet Digital™ PCR System, Bio-Rad Laboratories).

Table 4. TaqMan primer and probe sequences used to detect EGFR sequence variants according to the methods provided herein.

FIGS. 9A-9D, 10A-10D, and 11A-11D depict results obtained from the digital PCR assays: FIGS. 9A-9D for the sequence variant EGFR L858R, FIGS. 10A-10D for EGFR G719S, and FIGS. 11A-11D for EGFR T790M. Each point in the figures corresponds to an individual droplet partition, each droplet partition containing on average one concatemer comprising the target sequence. The Y-axis corresponds to the signal level measured in channel 1 (FAM), which is proportional to the amount of mutant amplicon generated in the individual partition. The X-axis corresponds to the signal level measured in channel 2 (HEX), which is proportional to the amount of wild-type amplicon generated in the individual partition. The user set the threshold level for each channel according to the manufacturer's instructions (QX200™ Droplet Digital™ PCR System, Bio-Rad Laboratories).
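What "on average one concatemer per droplet" means in terms of droplet occupancy can be illustrated with a Poisson model, which is commonly used for digital PCR loading but is not stated in the text, so treat it as an assumption. The sketch computes, for a hypothetical mean of λ target-bearing concatemers per droplet, the expected fractions of droplets that are empty, hold exactly one, or hold two or more.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability that a droplet holds exactly k concatemers at mean loading lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def occupancy(lam: float) -> dict[str, float]:
    """Expected fractions of empty, singly occupied, and multiply occupied droplets."""
    p0 = poisson_pmf(0, lam)
    p1 = poisson_pmf(1, lam)
    return {"empty": p0, "single": p1, "multiple": 1.0 - p0 - p1}


for lam in (0.1, 0.5, 1.0):
    occ = occupancy(lam)
    print(lam, {name: round(frac, 3) for name, frac in occ.items()})
```

At a mean of one per droplet, roughly 37% of droplets are expected to be empty, 37% to hold exactly one concatemer, and about 26% to hold two or more; lowering the mean reduces multiple occupancy at the cost of more empty droplets.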

Droplets that produce a signal in channel 2 (wild-type probe) that exceeds the threshold level (i.e., are positive) and fail to produce a signal in channel 1 (mutant probe) that exceeds the threshold level (i.e., are negative) are considered to contain wild-type copies. Droplets that produce a signal in channel 1 (mutant probe) that exceeds the threshold level (i.e., are positive) and fail to produce a signal in channel 2 (wild-type probe) that exceeds the threshold level (i.e., are negative) are considered to contain mutant copies (depicted as squares drawn around the corresponding points in FIGS. 9A-9D, 10A-10D, and 11A-11D).

Droplets that produce a signal above the threshold level in both channel 2 (wild-type probe) and channel 1 (mutant probe) (i.e., are double positive) are considered false positives and are excluded from the analysis (depicted as circles drawn around the corresponding points in FIGS. 9A-9D, 10A-10D, and 11A-11D). The average detection rate was calculated for each input amount and allele frequency, as shown in Table 5. No false-positive calls were detected in any blank sample.
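The droplet-calling rule above amounts to a two-threshold gate applied to each droplet. A minimal Python sketch, assuming per-droplet FAM (channel 1, mutant probe) and HEX (channel 2, wild-type probe) amplitudes have already been exported and the two thresholds chosen; the amplitudes, thresholds, and names below are hypothetical, and the sketch reproduces neither the QX200 software nor the calculation behind the detection rates in Table 5. Droplets positive in neither channel are labeled negative here, a category the text does not discuss.

```python
from collections import Counter
from enum import Enum


class DropletCall(Enum):
    WILD_TYPE = "wild-type"
    MUTANT = "mutant"
    EXCLUDED = "double positive (excluded)"
    NEGATIVE = "negative"


def classify_droplet(fam: float, hex_signal: float,
                     fam_threshold: float, hex_threshold: float) -> DropletCall:
    """Call one droplet; a channel is positive only if its signal exceeds the threshold."""
    mutant_pos = fam > fam_threshold            # channel 1: mutant probe
    wild_type_pos = hex_signal > hex_threshold  # channel 2: wild-type probe
    if mutant_pos and wild_type_pos:
        return DropletCall.EXCLUDED  # both probes fired: treated as a false positive
    if mutant_pos:
        return DropletCall.MUTANT
    if wild_type_pos:
        return DropletCall.WILD_TYPE
    return DropletCall.NEGATIVE


def tally(droplets, fam_threshold: float, hex_threshold: float) -> Counter:
    """Count calls over an iterable of (FAM, HEX) amplitude pairs."""
    return Counter(classify_droplet(f, h, fam_threshold, hex_threshold) for f, h in droplets)


# Hypothetical (FAM, HEX) amplitudes and thresholds, for illustration only.
droplets = [(500, 600), (4200, 700), (600, 3900), (4100, 4000), (550, 3500)]
counts = tally(droplets, fam_threshold=2000, hex_threshold=2000)
for call in DropletCall:
    print(call.value, counts[call])
```

Keeping the double-positive category separate, rather than folding it into the mutant count, mirrors the text's choice to exclude those droplets from the analysis instead of reporting them as variants.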

Table 5.

Other sequence variants can be detected using the methods described in Example 1. Non-limiting examples of mutant and wild-type probes and forward and reverse primers that can be used to detect other sequence variants are provided in Table 6.

Table 6. TaqMan primer and probe sequences that can be used to detect sequence variants according to the methods provided herein.

While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.
