Formamide-free target enrichment composition for next generation sequencing applications

Document No.: 883840; Publication date: 2021-03-19

Summary: The technology "Formamide-free target enrichment composition for next generation sequencing applications" was designed by J·S·蔡, G·库什瓦哈, C·莫伦坎普, D·阮, and D·拉特曼 on 2019-07-26. Main content: The present invention is a novel composition for use in nucleic acid hybridization solutions in sequencing workflows.

1. A method of enriching a target nucleic acid in a nucleic acid solution, comprising the steps of:

a. isolating nucleic acids in a sample solution;

b. contacting the sample solution with a formamide-free hybridization solution comprising one or more single-stranded hybridization probes attached to a binding moiety and further comprising a solvent selected from the group consisting of dimethyl sulfoxide (DMSO), sulfolane, ethylene carbonate, pyrrolidone, or a primary amide;

c. incubating the sample under conditions that promote hybrid formation between the sample nucleic acid and the probe;

d. separating the hybrid by capturing the binding moiety.

2. The method of claim 1, wherein the pyrrolidone or amide has a structure selected from the group consisting of:

wherein R1 is H, methyl, propyl, or hydroxyethyl; R2 and R3 are independently of each other H or methyl; and R4 is H, propyl, or isobutyl.

3. The method of claim 2, wherein the pyrrolidone or amide is selected from the group consisting of 2-pyrrolidone, N-methylpyrrolidone, N-hydroxyethylpyrrolidone, acetamide, N-methylacetamide, N,N-dimethylacetamide, propionamide, and isobutyramide.

4. A method of enriching a target nucleic acid to be sequenced by single molecule sequencing-by-synthesis, the method comprising the steps of:

a. isolating nucleic acids in a sample solution;

b. conjugating the nucleic acid to an adaptor, wherein the adaptor comprises a universal primer binding site and a sequencing primer binding site;

c. amplifying the adapted target nucleic acid with universal primers to form a target amplicon;

d. contacting the sample with a formamide-free hybridization solution comprising one or more single-stranded hybridization probes attached to a binding moiety and further comprising a solvent selected from the group consisting of dimethyl sulfoxide (DMSO), sulfolane, ethylene carbonate, pyrrolidone, or a primary amide;

e. incubating the sample under conditions that promote hybrid formation between the target amplicon and the probe;

f. separating the hybrid by capturing the binding moiety;

g. releasing amplicons from the hybrid.

5. The method of claim 4, wherein the pyrrolidone or amide has a structure selected from the group consisting of:

wherein R1 is H, methyl, propyl, or hydroxyethyl; R2 and R3 are independently of each other H or methyl; and R4 is H, propyl, or isobutyl.

6. The method of claim 5, wherein the pyrrolidone or amide is selected from 2-pyrrolidone, N-methylpyrrolidone, N-hydroxyethylpyrrolidone, acetamide, N-methylacetamide, N,N-dimethylacetamide, propionamide, and isobutyramide.

7. A method of sequencing a target nucleic acid, comprising the steps of:

a. isolating nucleic acids in a sample solution;

b. conjugating the nucleic acid to an adaptor, wherein the adaptor comprises a universal primer binding site and a sequencing primer binding site;

c. amplifying the adapted target nucleic acid with universal primers to form a target amplicon;

d. contacting the sample with a formamide-free hybridization solution comprising one or more single-stranded hybridization probes attached to a binding moiety and further comprising a solvent selected from the group consisting of dimethyl sulfoxide (DMSO), sulfolane, ethylene carbonate, pyrrolidone, or a primary amide;

e. incubating the sample under conditions that promote hybrid formation between the target amplicon and the probe;

f. separating the hybrid by capturing the binding moiety;

g. releasing amplicons from the hybrid and sequencing the amplicons by extending the sequencing primer bound to the sequencing primer binding site.

8. The method of claim 7, wherein the pyrrolidone or amide has a structure selected from the group consisting of:

wherein R1 is H, methyl, propyl, or hydroxyethyl; R2 and R3 are independently of each other H or methyl; and R4 is H, propyl, or isobutyl.

9. The method of claim 8, wherein the pyrrolidone or amide is selected from the group consisting of 2-pyrrolidone, N-methylpyrrolidone, N-hydroxyethylpyrrolidone, acetamide, N-methylacetamide, N,N-dimethylacetamide, propionamide, and isobutyramide.

10. The method of claim 9, wherein said sequencing is characterized by performance characteristics identical to those of a method utilizing a formamide-containing solution, wherein said characteristics are selected from the group consisting of on-target read length, de-duplicated (deduped) depth, error rate, uniformity, and genome equivalent (GE) recovery.

11. The method of claim 10, wherein the characteristic is the on-target read length, and the on-target read length is about 70% or greater.

12. The method of claim 10, wherein the characteristic is the de-duplicated depth, and the de-duplicated depth is 2500 or higher.

13. A kit for enriching a target nucleic acid to be sequenced by single molecule sequencing-by-synthesis, the kit comprising reagents for:

a. isolating nucleic acids in a sample solution;

b. conjugating the nucleic acid to an adaptor, wherein the adaptor comprises a universal primer binding site and a sequencing primer binding site;

c. amplifying the adapted target nucleic acid with a universal primer;

d. hybridizing the amplified adapted target nucleic acid to one or more single-stranded hybridization probes in a hybridization buffer, wherein the hybridization buffer comprises a solvent selected from the group consisting of dimethyl sulfoxide (DMSO), sulfolane, ethylene carbonate, pyrrolidone, or a primary amide.

14. The kit of claim 13, wherein the pyrrolidone or amide has a structure selected from the group consisting of:

wherein R1 is H, methyl, propyl, or hydroxyethyl; R2 and R3 are independently of each other H or methyl; and R4 is H, propyl, or isobutyl.

15. The kit of claim 14, wherein the pyrrolidone or amide is selected from the group consisting of 2-pyrrolidone, N-methylpyrrolidone, N-hydroxyethylpyrrolidone, acetamide, N-methylacetamide, N,N-dimethylacetamide, propionamide, and isobutyramide.

Technical Field

The present invention relates to the field of nucleic acid analysis, and more specifically to nucleic acid hybridization within a nucleic acid sequencing workflow.

Background

Nucleic acid hybridization experiments use formamide to promote denaturation of double-stranded nucleic acids and to minimize the formation of secondary structures within single nucleic acid strands. Formamide also increases the specificity of hybridization by destabilizing imperfect (partially mismatched) nucleic acid duplexes, and it facilitates the disruption of such duplexes during post-hybridization washes. However, formamide is toxic and is classified as hazardous, so its use in widely used laboratory and clinical products is discouraged. There is therefore a need for a non-toxic alternative to formamide. A suitable substitute must promote denaturation and increase hybridization specificity. Furthermore, the formamide substitute must not interfere with any downstream application (e.g., nucleic acid sequencing). The present invention discloses formamide substitutes suitable for next generation nucleic acid sequencing applications.

Disclosure of Invention

The present invention is a sample preparation and sequencing workflow that includes a target enrichment step that does not use formamide. Alternative solvents selected from dimethyl sulfoxide (DMSO), sulfolane, ethylene carbonate, pyrrolidone or primary amides are used.

In one embodiment, the invention is a method of enriching a target nucleic acid in a nucleic acid solution, the method comprising the steps of: isolating nucleic acids in a sample solution; contacting the sample solution with a formamide-free hybridization solution comprising one or more single-stranded hybridization probes attached to a binding moiety and further comprising a solvent selected from the group consisting of dimethyl sulfoxide (DMSO), sulfolane, ethylene carbonate, pyrrolidone, or a primary amide; incubating the sample under conditions that promote hybrid formation between the sample nucleic acid and the probe; and separating the hybrid by capturing the binding moiety. The pyrrolidone or amide has a structure selected from the group consisting of:

wherein R1 is H, methyl, propyl, or hydroxyethyl; R2 and R3 are independently of each other H or methyl; and R4 is H, propyl, or isobutyl. For example, the pyrrolidone or amide is selected from 2-pyrrolidone, N-methylpyrrolidone, N-hydroxyethylpyrrolidone, acetamide, N-methylacetamide, N,N-dimethylacetamide, propionamide, and isobutyramide.

In one embodiment, the invention is a method of enriching a target nucleic acid to be sequenced by single molecule sequencing-by-synthesis, comprising the steps of: isolating nucleic acids in a sample solution; conjugating the nucleic acid to an adaptor, wherein the adaptor comprises a universal primer binding site and a sequencing primer binding site; amplifying the adapted target nucleic acid with universal primers to form a target amplicon; contacting the sample with a formamide-free hybridization solution comprising one or more single-stranded hybridization probes attached to a binding moiety and further comprising a solvent selected from the group consisting of dimethyl sulfoxide (DMSO), sulfolane, ethylene carbonate, pyrrolidone, or a primary amide; incubating the sample under conditions that promote hybrid formation between the target amplicon and the probe; separating the hybrid by capturing the binding moiety; and releasing the amplicon from the hybrid. In some embodiments, the pyrrolidone or amide has a structure selected from:

wherein R1 is H, methyl, propyl, or hydroxyethyl; R2 and R3 are independently of each other H or methyl; and R4 is H, propyl, or isobutyl. For example, the pyrrolidone or amide is selected from 2-pyrrolidone, N-methylpyrrolidone, N-hydroxyethylpyrrolidone, acetamide, N-methylacetamide, N,N-dimethylacetamide, propionamide, and isobutyramide.

In one embodiment, the invention is a method of sequencing a target nucleic acid, the method comprising the steps of: isolating nucleic acids in a sample solution; conjugating the nucleic acid to an adaptor, wherein the adaptor comprises a universal primer binding site and a sequencing primer binding site; amplifying the adapted target nucleic acid with universal primers to form a target amplicon; contacting the sample with a formamide-free hybridization solution comprising one or more single-stranded hybridization probes attached to a binding moiety and further comprising a solvent selected from the group consisting of dimethyl sulfoxide (DMSO), sulfolane, ethylene carbonate, pyrrolidone, or a primary amide; incubating the sample under conditions that promote hybrid formation between the target amplicon and the probe; separating the hybrid by capturing the binding moiety; and releasing the amplicons from the hybrid and sequencing them by extending the sequencing primer bound to the sequencing primer binding site. In some embodiments, the pyrrolidone or amide has a structure selected from:

wherein R1 is H, methyl, propyl, or hydroxyethyl; R2 and R3 are independently of each other H or methyl; and R4 is H, propyl, or isobutyl. For example, the pyrrolidone or amide is selected from 2-pyrrolidone, N-methylpyrrolidone, N-hydroxyethylpyrrolidone, acetamide, N-methylacetamide, N,N-dimethylacetamide, propionamide, and isobutyramide. In some embodiments, the sequencing is characterized by performance characteristics identical to those of a method utilizing a formamide-containing solution, wherein the characteristics are selected from the group consisting of on-target read length, de-duplicated (deduped) depth, error rate, uniformity, and genome equivalent (GE) recovery. For example, the characteristic is the on-target read length, and the on-target read length is about 70% or greater. In another embodiment, the characteristic is the de-duplicated depth, and the de-duplicated depth is 2500 or higher. In yet another embodiment, the characteristic is the error rate, and the error rate is 0.04 or less. In another embodiment, the characteristic is uniformity, and the uniformity is about 2.5. In another embodiment, the characteristic is genome equivalent recovery, and the genome equivalent recovery is 0.25 or greater.

In one embodiment, the invention is a kit for enriching a target nucleic acid to be sequenced by single molecule sequencing-by-synthesis, the kit comprising reagents for: isolating nucleic acids in a sample solution; conjugating the nucleic acid to an adaptor, wherein the adaptor comprises a universal primer binding site and a sequencing primer binding site; amplifying the adapted target nucleic acid with universal primers to form a target amplicon; and hybridizing the amplified adapted target nucleic acids to one or more single-stranded hybridization probes attached to a binding moiety in a hybridization solution that further comprises a solvent selected from the group consisting of dimethyl sulfoxide (DMSO), sulfolane, ethylene carbonate, pyrrolidone, or a primary amide. In some embodiments, the pyrrolidone or amide has a structure selected from:

wherein R1 is H, methyl, propyl, or hydroxyethyl; R2 and R3 are independently of each other H or methyl; and R4 is H, propyl, or isobutyl. For example, the pyrrolidone or amide is selected from 2-pyrrolidone, N-methylpyrrolidone, N-hydroxyethylpyrrolidone, acetamide, N-methylacetamide, N,N-dimethylacetamide, propionamide, and isobutyramide. In some embodiments, the DMSO concentration in the hybridization buffer is selected from 15%, 18%, 20%, 23%, 25%, 28%, 30%, and 32%.
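As a worked example of the DMSO concentrations listed above, the volume of neat DMSO needed for a given final buffer volume follows from simple proportion. This is a minimal sketch that assumes the percentages are volume/volume (the text does not say so); the function name and the 100 uL reaction volume are hypothetical.

```python
def neat_solvent_volume_ul(percent_v_v: float, final_volume_ul: float) -> float:
    """Volume of neat solvent (uL) giving percent_v_v in final_volume_ul.

    Assumes the concentration is expressed as % (v/v); hypothetical helper.
    """
    if not 0.0 <= percent_v_v <= 100.0:
        raise ValueError("percent_v_v must be between 0 and 100")
    if final_volume_ul <= 0.0:
        raise ValueError("final_volume_ul must be positive")
    # Multiply before dividing so whole-number inputs stay exact.
    return percent_v_v * final_volume_ul / 100.0


# DMSO concentrations listed in the text, for a hypothetical 100 uL hybridization.
for pct in (15, 18, 20, 23, 25, 28, 30, 32):
    print(f"{pct}% DMSO: {neat_solvent_volume_ul(pct, 100.0):.1f} uL neat DMSO")
```

Because DMSO and water do not mix with perfect volume additivity, bringing the mixture to its final volume after combining the components is safer than summing the component volumes.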

Drawings

Figure 1 shows the performance of the novel hybridization buffers containing 20% of a formamide substitute in a sequencing workflow, measured as on-target read length (%).

Figure 2 shows the performance of the novel hybridization buffers containing 20% of a formamide substitute in a sequencing workflow, measured as de-duplicated depth.

Figure 3 shows the performance of the novel hybridization buffers containing 20% of a formamide substitute in a sequencing workflow, measured as uniformity.

Figure 4 shows the performance of the novel hybridization buffers containing 20% of a formamide substitute in a sequencing workflow, measured as error rate.

Figure 5 shows the performance of the novel hybridization buffers containing 20% of a formamide substitute in a sequencing workflow, measured as genome equivalent recovery.

Figure 6 shows the performance of the novel hybridization buffers containing titrated concentrations of formamide substitutes in a sequencing workflow, measured as on-target read length (%).

Figure 7 shows the performance of the novel hybridization buffers containing titrated concentrations of formamide substitutes in a sequencing workflow, measured as de-duplicated depth.

Figure 8 shows the performance of the novel hybridization buffers containing titrated concentrations of formamide substitutes in a sequencing workflow, measured as uniformity.

Figure 9 shows the performance of the novel hybridization buffers containing titrated concentrations of formamide substitutes in a sequencing workflow, measured as genome equivalent recovery.

Figure 10 shows the performance of the novel DMSO-containing hybridization buffer in a sequencing workflow, measured as on-target read length (%).

Figure 11 shows the performance of the novel DMSO-containing hybridization buffer in a sequencing workflow, measured as de-duplicated depth.

Figure 12 shows the performance of the novel DMSO-containing hybridization buffer in a sequencing workflow, measured as uniformity.

Detailed Description

Definitions

The following definitions assist in understanding the present disclosure.

The term "sample" refers to any composition that contains or is presumed to contain a target nucleic acid. This includes samples of tissues or fluids isolated from an individual, e.g., skin, plasma, serum, spinal fluid, lymph fluid, synovial fluid, urine, tears, blood cells, organs, and tumors, as well as samples of in vitro cultures established from cells taken from an individual, including formalin-fixed paraffin-embedded tissue (FFPET), and nucleic acids isolated therefrom. The sample may also comprise cell-free material, such as a cell-free blood fraction containing cell-free DNA (cfDNA) or circulating tumor DNA (ctDNA).

The term "nucleic acid" refers to a polymer of nucleotides (e.g., ribonucleotides and deoxyribonucleotides, both natural and non-natural), including DNA, RNA, and their subcategories such as cDNA, mRNA, and the like. Nucleic acids can be single-stranded or double-stranded and will typically contain 5'-3' phosphodiester linkages, although in some cases nucleotide analogs can have other linkages. Nucleic acids can include naturally occurring bases (adenine, guanine, cytosine, uracil, and thymine) as well as non-natural bases. Some examples of non-natural bases include those described in, for example, Seela et al. (1999) Helv. Chim. Acta 82: 1640. The non-natural base may have a specific function, for example, increasing the stability of the nucleic acid duplex, inhibiting nuclease digestion, or blocking primer extension or strand polymerization.

The terms "polynucleotide" and "oligonucleotide" are used interchangeably. Polynucleotides are nucleic acids, either single-stranded or double-stranded. "Oligonucleotide" is a term sometimes used to describe shorter polynucleotides. Oligonucleotides are prepared by any suitable method known in the art, for example, by methods involving direct chemical synthesis as described in the following references: Narang et al. (1979) Meth. Enzymol. 68: 90-99; Brown et al. (1979) Meth. Enzymol. 68: 109-; Beaucage et al. (1981) Tetrahedron Lett. 22: 1859-; Matteucci et al. (1981) J. Am. Chem. Soc. 103: 3185-3191.

The term "hybridization" refers to the pairing of complementary nucleic acids to form a duplex (a double-stranded nucleic acid). Hybridization and the strength of hybridization (e.g., the stability of the duplex) are affected by a number of factors, including the degree of complementarity between nucleic acids, the GC content of the nucleic acids, and the stringency of the hybridization and wash conditions involved.
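The influence of GC content, salt, duplex length, and denaturant on duplex stability can be illustrated with a widely cited empirical approximation for the melting temperature (Tm) of long DNA:DNA duplexes: Tm = 81.5 + 16.6 log10[Na+] + 0.41 (%GC) - 0.63 (%formamide) - 600/L, where L is the duplex length in base pairs. This formula is general textbook material, not part of the present disclosure, and it is intended for long duplexes rather than short oligonucleotides. The sketch adds a hypothetical per-percent term for a formamide substitute; its coefficient is a placeholder that would have to be determined experimentally, and the probe parameters in the example are made up.

```python
import math


def estimate_tm_celsius(gc_percent: float, length_bp: int, na_molar: float,
                        formamide_percent: float = 0.0,
                        substitute_percent: float = 0.0,
                        substitute_coeff: float = 0.0) -> float:
    """Approximate Tm (deg C) of a long DNA:DNA duplex (empirical formula).

    Tm = 81.5 + 16.6*log10([Na+]) + 0.41*(%GC) - 0.63*(%formamide) - 600/L
    substitute_coeff (deg C per % solvent) is a hypothetical placeholder for a
    formamide substitute; no value for it is given in the source text.
    """
    if na_molar <= 0 or length_bp <= 0:
        raise ValueError("na_molar and length_bp must be positive")
    return (81.5
            + 16.6 * math.log10(na_molar)
            + 0.41 * gc_percent
            - 0.63 * formamide_percent
            - substitute_coeff * substitute_percent
            - 600.0 / length_bp)


# A hypothetical 120 bp duplex with 50% GC at 0.75 M Na+.
tm_no_denaturant = estimate_tm_celsius(50, 120, 0.75)
tm_50pct_formamide = estimate_tm_celsius(50, 120, 0.75, formamide_percent=50)
print(round(tm_no_denaturant - tm_50pct_formamide, 2))  # Tm depression from formamide
```

Lowering Tm in this way is what allows hybridization at moderate temperatures to remain stringent; a formamide substitute has to provide a comparable effect.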

The term "stringent conditions" or "high stringency conditions" refers to hybridization conditions under which only highly stable duplexes form. Typically, high stringency conditions comprise low salt, high temperature, or both. For example, a conventional high stringency hybridization buffer used at 42 °C may contain 50% formamide, 5× SSC (0.75 M NaCl, 0.075 M sodium citrate), 50 mM sodium phosphate (pH 6.8), 0.1% sodium pyrophosphate, 5× Denhardt's solution, sonicated salmon sperm DNA (50 µg/ml), 0.1% SDS, and 10% dextran sulfate.

The term "stringent wash" refers to a post-hybridization wash performed at elevated temperature with a wash buffer containing decreasing concentrations of salt or increasing concentrations of detergent. Stringent wash conditions may include temperatures greater than about 42 °C. Stringent wash buffers typically contain less than about 0.1 M salt. For example, a conventional high stringency post-hybridization wash can use 0.2× SSC (0.03 M NaCl, 0.003 M sodium citrate) and 50% formamide at 42 °C, followed by a wash with 0.1× SSC at 55 °C.
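The SSC concentrations quoted in these definitions all scale linearly from 1× SSC (0.15 M NaCl, 0.015 M sodium citrate). A short sketch makes that scaling explicit, together with the C1V1 = C2V2 dilution from a concentrated stock; the function names and the use of a 20× stock are illustrative assumptions.

```python
# Composition of 1x SSC in mol/L.
ONE_X_SSC_MOLAR = {"NaCl": 0.15, "sodium citrate": 0.015}


def ssc_composition(strength_x: float) -> dict:
    """Molar NaCl and sodium citrate at a given SSC strength (e.g. 5 for 5x)."""
    return {name: conc * strength_x for name, conc in ONE_X_SSC_MOLAR.items()}


def stock_volume_ml(stock_x: float, target_x: float, final_volume_ml: float) -> float:
    """Volume of SSC stock needed for a target strength, from C1*V1 = C2*V2."""
    if target_x > stock_x:
        raise ValueError("target strength cannot exceed stock strength")
    return target_x * final_volume_ml / stock_x


for strength in (5, 0.2, 0.1):
    print(f"{strength}x SSC:", ssc_composition(strength))
print("20x stock for 500 mL of 0.2x SSC (mL):", stock_volume_ml(20, 0.2, 500))
```

The 0.2× and 0.1× wash buffers both fall well below the roughly 0.1 M salt level that the definition associates with stringent washes.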

The term "primer" refers to a single-stranded oligonucleotide that hybridizes to a sequence in a target nucleic acid (the "primer binding site") and is capable of serving as a point at which synthesis is initiated along the complementary strand of the nucleic acid under conditions suitable for such synthesis.

The term "adaptor" means a nucleotide sequence that can be added to another sequence to impart additional properties to the other sequence. The adaptor is typically an oligonucleotide, which may be single-stranded or double-stranded, or may have both a single-stranded portion and a double-stranded portion.

The term "ligation" refers to a condensation reaction joining two nucleic acid strands, in which the 5'-phosphate group of one molecule reacts with the 3'-hydroxyl group of the other molecule. Ligation is typically an enzymatic reaction catalyzed by a ligase or a topoisomerase. Ligation may join two single strands to create one single-stranded molecule. Ligation may also join two strands, each belonging to a double-stranded molecule, thereby joining the two double-stranded molecules. Ligation may also join both strands of one double-stranded molecule to both strands of another double-stranded molecule, thereby joining the two double-stranded molecules. Ligation may also join the ends of one strand within a double-stranded molecule, thereby repairing a nick within the double-stranded molecule.

The term "barcode" refers to a nucleic acid sequence that can be detected and identified. Barcodes can be incorporated into a variety of nucleic acids. Barcodes are sufficiently long (e.g., 2, 5, or 20 nucleotides) that the nucleic acids incorporating them can be distinguished or grouped within a sample according to the barcode.

The term "multiplex identifier" or "MID" refers to a barcode that identifies the source of a target nucleic acid (e.g., a sample from which the nucleic acid is derived). All or substantially all target nucleic acids from the same sample will share the same MID. Target nucleic acids from different sources or samples can be mixed and sequenced simultaneously. Using MID, sequence reads can be assigned to individual samples from which the target nucleic acid originates.

The term "unique molecular identifier" or "UID" is a barcode that identifies a nucleic acid to which it is attached. All or substantially all target nucleic acids from the same sample will have different UIDs. All or substantially all progeny (e.g., amplicons) derived from the same initial target nucleic acid will share the same UID.
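The way MIDs and UIDs work together can be illustrated by computing a de-duplicated depth: reads are first assigned to samples by MID, and reads sharing a UID at the same locus are collapsed into one original molecule. This is a simplified sketch under stated assumptions: each read is a hypothetical (MID, UID, locus) tuple, barcodes are matched exactly, and no correction of sequencing errors within barcodes is attempted, although real pipelines usually need one.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical read representation: (mid, uid, locus).
Read = Tuple[str, str, str]


def deduped_depth(reads: Iterable[Read]) -> Dict[Tuple[str, str], int]:
    """Number of distinct original molecules per (sample MID, locus).

    Reads sharing MID, UID, and locus are treated as progeny (e.g. PCR
    duplicates) of one original molecule and are counted once.
    """
    molecules: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for mid, uid, locus in reads:
        molecules[(mid, locus)].add(uid)
    return {key: len(uids) for key, uids in molecules.items()}


reads = [
    ("MID01", "ACGTAC", "chr1:1000"),  # molecule 1
    ("MID01", "ACGTAC", "chr1:1000"),  # PCR duplicate of molecule 1
    ("MID01", "TTGCAA", "chr1:1000"),  # molecule 2
    ("MID02", "ACGTAC", "chr1:1000"),  # different sample, same UID sequence
]
print(deduped_depth(reads))
```

Grouping by MID first matters: the same UID sequence can occur in two samples without indicating a duplicate.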

The terms "universal primer" and "universal primer binding site" or "universal primer site" refer to primers and primer binding sites that are common to different target nucleic acids (typically because they were added in vitro). The universal primer site is added to a plurality of target nucleic acids using an adaptor or using a target-specific (non-universal) primer carrying a universal primer site at its 5'-end. The universal primer can bind to the universal primer site and prime extension from it.

The term "target sequence", "target nucleic acid" or "target" refers to a portion of a nucleic acid sequence in a sample to be detected or analyzed. The term target includes all variants of the target sequence, e.g., one or more mutant variants and wild-type variants.

The term "amplification" refers to a process of making additional copies of a target nucleic acid. The amplification may have more than one cycle, e.g., multiple cycles of exponential amplification. Amplification can also be performed in only one cycle (preparing a single copy of the target nucleic acid). The copy may have additional sequences, for example, those present in the primers used for amplification.
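Under an idealized model in which each cycle copies every molecule with a constant efficiency E (0 ≤ E ≤ 1), the number of molecules after n cycles is N_n = N_0 (1 + E)^n. A one-cycle reaction with E = 1 therefore yields the original molecule plus a single copy, matching the single-cycle case described above. The constant-efficiency assumption is a simplification; real reactions slow down and plateau, and the function name is hypothetical.

```python
def molecules_after_cycles(n_initial: int, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized amplification: N = N0 * (1 + E) ** n with constant efficiency E."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if n_initial < 0 or cycles < 0:
        raise ValueError("n_initial and cycles must be non-negative")
    return n_initial * (1.0 + efficiency) ** cycles


print(molecules_after_cycles(1, 1))         # one cycle: original plus one copy
print(molecules_after_cycles(10, 10))       # perfect doubling for 10 cycles
print(molecules_after_cycles(10, 10, 0.9))  # 90% efficiency per cycle
```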

The term "sequencing" refers to any method of determining the nucleotide sequence in a target nucleic acid. "Next generation sequencing," "massively parallel sequencing," and "massively parallel single molecule sequencing" are used interchangeably to refer to parallel sequencing of an entire isolated population of individual nucleic acids (or nucleic acid amplicons).

The present invention includes a nucleic acid sequencing workflow in which the sequencing is next generation sequencing, also known as massively parallel single molecule sequencing. In some embodiments, the workflow includes the steps of nucleic acid isolation, sequencing library preparation, target enrichment, and sequencing. The target enrichment step is novel and uses formamide-free compositions and methods involving formamide substitutes that are compatible with (i.e., do not inhibit or interfere with) the sequencing process. The formamide substitute is selected from dimethyl sulfoxide (DMSO), sulfolane, ethylene carbonate, pyrrolidone, or a primary amide. The pyrrolidone or primary amide has a structure selected from the group consisting of:

wherein R1 is H, methyl, propyl, or hydroxyethyl; R2 and R3 are independently of each other H or methyl; and R4 is H, propyl, or isobutyl. In some embodiments, the pyrrolidone or amide is selected from 2-pyrrolidone, N-methylpyrrolidone, N-hydroxyethylpyrrolidone, acetamide, N-methylacetamide, N,N-dimethylacetamide, propionamide, and isobutyramide.
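The solvent choices recited above form a closed list, which lends itself to a simple validation step, for example in software that records buffer recipes. This is a sketch only: the class and field names are hypothetical, the percentage is assumed to be v/v, and only the solvents named in the text are included (the general pyrrolidone and amide structures are not modeled).

```python
from dataclasses import dataclass

# Formamide substitutes named in the text.
ALLOWED_SOLVENTS = frozenset({
    "dimethyl sulfoxide", "sulfolane", "ethylene carbonate",
    "2-pyrrolidone", "N-methylpyrrolidone", "N-hydroxyethylpyrrolidone",
    "acetamide", "N-methylacetamide", "N,N-dimethylacetamide",
    "propionamide", "isobutyramide",
})


@dataclass(frozen=True)
class HybridizationBufferRecipe:
    """Hypothetical record of a formamide-free hybridization buffer."""
    solvent: str
    solvent_percent: float  # assumed % (v/v)

    def validate(self) -> None:
        """Raise ValueError if the recipe is not a formamide-free buffer from the list."""
        if self.solvent.strip().lower() == "formamide":
            raise ValueError("formamide is excluded from this workflow")
        if self.solvent not in ALLOWED_SOLVENTS:
            raise ValueError(f"unrecognized formamide substitute: {self.solvent!r}")
        if not 0.0 < self.solvent_percent < 100.0:
            raise ValueError("solvent_percent must be between 0 and 100")


HybridizationBufferRecipe("dimethyl sulfoxide", 20.0).validate()  # passes silently
```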

The present invention includes detecting a target nucleic acid in a sample. In some embodiments, the sample is obtained from a subject or patient. In some embodiments, the sample may comprise a solid tissue or a fragment of a solid tumor obtained from the subject or patient, e.g., by biopsy. The sample may also include a bodily fluid (e.g., urine, sputum, serum, plasma or lymph, saliva, sweat, tears, cerebrospinal fluid, amniotic fluid, synovial fluid, pericardial fluid, peritoneal fluid, pleural fluid, cystic fluid, bile, gastric fluid, intestinal fluid, and/or a stool sample). The sample may comprise whole blood or a blood fraction in which tumor cells may be present. In some embodiments, the sample, particularly a liquid sample, may comprise cell-free material, such as cell-free DNA or RNA, including cell-free tumor DNA or tumor RNA. In some embodiments, the sample is a cell-free sample, e.g., a cell-free blood-derived sample in which cell-free tumor DNA or tumor RNA is present. In other embodiments, the sample is a culture sample, e.g., a culture or culture supernatant containing or suspected of containing an infectious agent or a nucleic acid derived from the infectious agent. In some embodiments, the infectious agent is a bacterium, protozoan, virus, or mycoplasma.

The target nucleic acid is the nucleic acid of interest that may be present in the sample. In some embodiments, the target nucleic acid is a gene or gene fragment. In other embodiments, the target nucleic acid comprises a genetic variant, such as a polymorphism, including a single nucleotide polymorphism or variant (SNP or SNV), or a gene rearrangement resulting in, for example, a gene fusion. In some embodiments, the target nucleic acid is a biomarker. In other embodiments, the target nucleic acid has a characteristic of a particular organism, e.g., a characteristic that aids in identifying a pathogenic organism or a property of the pathogenic organism, such as drug susceptibility or resistance. In other embodiments, the target nucleic acid has a characteristic of a human subject, e.g., an HLA or KIR sequence that defines the subject's unique HLA or KIR genotype. In other embodiments, as in shotgun genome sequencing, all sequences in the sample are target nucleic acids.

In some embodiments, the sequencing workflow comprises a step of amplifying the target nucleic acid. The amplification may be performed by the polymerase chain reaction (PCR) or by any other method using oligonucleotide primers. Various PCR conditions are described in PCR Strategies (M. A. Innis, D. H. Gelfand, and J. J. Sninsky, eds., 1995, Academic Press, San Diego, Calif.), Chapter 14, and in PCR Protocols: A Guide to Methods and Applications (M. A. Innis, D. H. Gelfand, J. J. Sninsky, and T. J. White, eds., Academic Press, New York, 1990).

This amplification may utilize bipartite amplification primers that include target-specific sequences and the artificial sequences (adaptor sequences) required for subsequent steps. In some embodiments, a specific target or set of target nucleic acids is being analyzed. In such embodiments, target-specific amplification primers may be used. The primer may have a bipartite structure consisting of a target-specific sequence at the 3'-end and an adaptor sequence at the 5'-end. Typically, target-specific primers are used as a pair of different oligonucleotides, e.g., a forward primer and a reverse primer. The 5'-end may also include a universal primer binding site to enable amplification with universal primers. The 5'-end may also include a sequencing primer binding site to enable sequencing with a sequencing primer (e.g., a platform-specific sequencing primer).
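The bipartite primer layout can be sketched in code: the forward primer is a 5' adaptor tail followed by the first bases of the target, and the reverse primer is a second adaptor tail followed by the reverse complement of the last bases of the target, so that both primers anneal through their 3' target-specific portions. All sequences and function names below are hypothetical, and real primer design also weighs Tm, GC content, and secondary structure, which this sketch ignores.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def bipartite_primer_pair(target: str, anneal_len: int,
                          fwd_tail: str, rev_tail: str) -> tuple:
    """Return (forward, reverse) primers, each 5'-[adaptor tail][target-specific]-3'."""
    if anneal_len <= 0 or 2 * anneal_len > len(target):
        raise ValueError("anneal_len must be positive and fit twice in the target")
    forward = fwd_tail + target[:anneal_len]
    reverse = rev_tail + reverse_complement(target[-anneal_len:])
    return forward, reverse


# Hypothetical target (top strand, 5'->3') and adaptor tails.
target = "ACGTTGCATGCAGGCCTTAA"
fwd, rev = bipartite_primer_pair(target, 6, fwd_tail="GATTACA", rev_tail="CATTAGG")
print(fwd)
print(rev)
```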

In some embodiments, the sequencing workflow includes steps for library preparation. Library preparation starts with the addition of adaptors. Adaptors are introduced into the target nucleic acid by primer extension, by PCR amplification, or by ligation. In some embodiments, primer extension is performed in a single round. In other embodiments, primer extension is performed over multiple cycles, as in PCR amplification (as described in the previous section). The resulting target nucleic acid includes the target sequence flanked by adaptor sequences. In some embodiments, the adaptors contain primer binding sites for downstream steps, such as amplification primer binding sites and sequencing primer binding sites.

Adapter ligation may use blunt ends or, more efficiently, cohesive (sticky) ends. In some embodiments, the target nucleic acid or adapter can be blunt-ended by strand filling, i.e., extending the 3'-end with a DNA polymerase to eliminate 5'-overhangs, or by digesting 3'-overhangs with a 3'-5' exonuclease activity. In some embodiments, the blunt-ended adapter and target nucleic acid can be made cohesive by adding a single nucleotide to the 3'-end of the adapter and a single complementary nucleotide to the 3'-end of the target nucleic acid (e.g., with a DNA polymerase or a terminal transferase). In other embodiments, sticky ends (overhangs) can be obtained by digesting the adapter and the target nucleic acid with restriction enzymes. The latter option is advantageous when the target sequences are known to contain restriction enzyme recognition sites. In each of the above examples, the adaptor molecule can be provided with the desired ends (blunt ends, single-base extended ends, or multi-base overhanging ends) by designing synthetic adaptor oligonucleotides as described further below. In some embodiments, the adaptor molecule is an artificial sequence synthesized in vitro. In other embodiments, the adapter molecule is a naturally occurring sequence, synthesized in vitro, that is known to have a desired secondary structure. In other embodiments, the adapter molecule is an isolated naturally occurring molecule or an isolated non-naturally occurring molecule.
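The end-compatibility rules above can be captured in a few lines: two cohesive ends anneal when their overhangs have the same polarity (both 5' or both 3') and one overhang, read 5'->3', is the reverse complement of the other. This also shows why single-base tailing works: an A overhang pairs with a T overhang but not with another A overhang, which discourages target-to-target joining. The string encoding of end types and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def ends_compatible(overhang_a: str, kind_a: str, overhang_b: str, kind_b: str) -> bool:
    """Can two DNA ends be ligated directly?

    kind is "blunt", "5'" or "3'"; overhangs are single-stranded bases written 5'->3'.
    """
    if kind_a == "blunt" or kind_b == "blunt":
        return kind_a == kind_b == "blunt"
    if kind_a != kind_b:
        return False  # a 5' overhang cannot pair with a 3' overhang
    # Antiparallel pairing: one overhang must be the reverse complement of the other.
    return overhang_a == reverse_complement(overhang_b)


print(ends_compatible("A", "3'", "T", "3'"))        # A-tailed insert + T-tailed adapter
print(ends_compatible("A", "3'", "A", "3'"))        # two A-tailed inserts
print(ends_compatible("AATT", "5'", "AATT", "5'"))  # EcoRI-style palindromic overhang
```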

In some embodiments of the sequencing workflow, the adapter comprises one or more barcodes. The barcode may be a multiplex sample ID (MID) used to identify the sample of origin when samples are mixed (multiplexed). The barcode may also serve as a unique molecular ID (UID) used to identify each original molecule and its progeny. The barcode may also be a combination of a UID and an MID. In some embodiments, a single barcode is used as both the UID and the MID. In some embodiments, each barcode includes a predefined sequence. In other embodiments, the barcode comprises a random sequence. Barcodes may be 1-20 nucleotides in length.
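The 1-20 nucleotide length range has a direct consequence for random barcodes: a length-n barcode over A, C, G, and T has 4^n possible sequences, and if m molecules receive barcodes drawn uniformly at random, the expected number of pairs sharing a barcode by chance is m(m-1)/(2·4^n), the birthday-problem estimate. The sketch below evaluates that estimate; uniform random barcodes and error-free reading are assumptions, and in practice UIDs are often interpreted together with other molecule features such as mapping position.

```python
def barcode_space(length: int) -> int:
    """Number of distinct DNA barcodes of the given length."""
    if length < 1:
        raise ValueError("length must be at least 1")
    return 4 ** length


def expected_colliding_pairs(n_molecules: int, length: int) -> float:
    """Birthday-problem estimate of barcode pairs shared by chance.

    Assumes every barcode is drawn independently and uniformly at random.
    """
    pairs = n_molecules * (n_molecules - 1) / 2
    return pairs / barcode_space(length)


for n in (6, 8, 12):
    print(n, barcode_space(n), round(expected_colliding_pairs(10_000, n), 3))
```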

The method of the invention comprises a novel target enrichment step. This enrichment can be achieved by capturing the target sequences by hybridization with one or more target-specific probes. In some embodiments, the hybridization probes comprise one or more sequences that target multiple exons, introns, or regulatory sequences from multiple genetic loci, or the entire sequence of at least one single genetic locus, which is between about 100 kb and about 1 Mb in size.

In some embodiments, the novel hybridization solution of the sequencing workflow includes a solvent as described in Matthiesen, S. et al. (2012) Fast and Non-Toxic In Situ Hybridization without Blocking of Repetitive Sequences. PLoS ONE 7: e40675, or in US 9,303,287. In some embodiments, the formamide substitute is a polar aprotic solvent selected on the basis of the Hansen solubility parameters described in Hansen, C. (2007) Hansen Solubility Parameters: A User's Handbook, Second Edition, Boca Raton, FL: CRC Press. These parameters describe energy characteristics of the solvent and are expressed in MPa^0.5. In particular, the solvent is selected if its dispersion parameter δD is between 17.7 and 22.0 MPa^0.5, its polar parameter δP is between 13 and 23 MPa^0.5, and its hydrogen-bonding parameter δH is between 3 and 13 MPa^0.5.
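The Hansen-parameter selection rule above is a box test on three numbers. The sketch below applies it; the window bounds come from the text (treated here as inclusive, which is an assumption), while the example solvent values are approximate tabulated Hansen parameters included for illustration only and should be checked against the handbook before use.

```python
from typing import NamedTuple


class HansenParameters(NamedTuple):
    """Hansen solubility parameters in MPa^0.5."""
    dispersion: float  # delta_D
    polar: float       # delta_P
    hydrogen: float    # delta_H


# Selection window from the text, in MPa^0.5 (bounds assumed inclusive).
SELECTION_WINDOW = {
    "dispersion": (17.7, 22.0),
    "polar": (13.0, 23.0),
    "hydrogen": (3.0, 13.0),
}


def meets_selection_window(params: HansenParameters) -> bool:
    """True if all three parameters fall inside the selection window."""
    return all(low <= getattr(params, name) <= high
               for name, (low, high) in SELECTION_WINDOW.items())


# Approximate literature values, for illustration only.
dmso = HansenParameters(dispersion=18.4, polar=16.4, hydrogen=10.2)
formamide = HansenParameters(dispersion=17.2, polar=26.2, hydrogen=19.0)
print("DMSO:", meets_selection_window(dmso))
print("formamide:", meets_selection_window(formamide))
```

With these illustrative values, formamide itself falls outside the window, mainly because of its high polar and hydrogen-bonding parameters.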

In some embodiments, the novel hybridization solution of the sequencing workflow includes a solvent as described in Chakrabarti, R. et al. (2001) The enhancement of PCR amplification by low molecular weight amides. Nucleic Acids Res. 29: 2377. In some embodiments, the formamide substitute is a solvent selected from dimethyl sulfoxide (DMSO), sulfolane, ethylene carbonate, pyrrolidone, or a primary amide. In some embodiments, the pyrrolidone or amide has a structure selected from:

wherein R1 is H, methyl, propyl, or hydroxyethyl; R2 and R3 are each independently H or methyl; and R4 is H, propyl, or isobutyl. In some embodiments, the solvent is selected from the group consisting of 2-pyrrolidone, N-methylpyrrolidone, N-hydroxyethyl pyrrolidone, acetamide, N-methylacetamide, N,N-dimethylacetamide, propionamide, and isobutyramide.

The nucleic acids in the sample are single-stranded, or are denatured, and are contacted with single-stranded target-specific probes in a novel hybridization buffer containing a formamide substitute according to the invention. The probe may comprise a ligand for an affinity capture moiety so that, once a hybrid complex has formed, the complex can be captured by providing the affinity capture moiety. In some embodiments, the affinity capture moiety is avidin or streptavidin and the ligand is biotin. Other examples of ligand/capture moiety pairs include digoxigenin/anti-digoxigenin and 6xHis/nickel. In some embodiments, the capture moiety is bound to a solid support. The solid support may comprise suspended particles such as superparamagnetic spherical polymer particles, e.g., DYNABEADS™ magnetic beads, or magnetic glass particles.

In embodiments of the invention, a sample containing denatured (e.g., single-stranded) nucleic acid molecules is exposed to a formamide-free hybridization buffer containing one or more oligonucleotide probes.

Typically, the probes target one or more genomic loci, using one or more capture probes per locus. In some embodiments, the probes target panels of disease-associated genes, such as panels comprising cancer-associated genes and tumor-associated loci. In some embodiments, the panel is an AVENIO ctDNA panel selected from a targeted panel for tumor analysis, an expanded panel for extended tumor analysis, and a monitoring panel for longitudinal tumor burden monitoring (Roche Sequencing Solutions, Pleasanton, Cal.). The probes may be free in solution or bound to a solid support such as a bead or microarray. The probes hybridize to (i.e., capture) the target nucleic acid sequences. Subsequent post-hybridization washes separate non-hybridized nucleic acids, such as excess probes, non-hybridized regions of the genome, or any other non-target sample nucleic acid, from the hybridized target sequences. In some embodiments, the hybridization wash is performed under stringent conditions that include low salt, high temperature, or both. In some embodiments, the wash is performed with standard and stringent wash buffers at 47 ℃ or room temperature.

In some embodiments, the invention includes detecting a target nucleic acid in a sample by nucleic acid sequencing. A plurality of nucleic acids, including all nucleic acids captured according to the methods of the invention, can be sequenced.

In some embodiments, sequencing is performed by high-throughput single-molecule sequencing-by-synthesis methods, such as on the Illumina platforms (Illumina, San Diego, Cal.), including HiSeq, MiSeq, and NextSeq. Other examples of sequencing methods and platforms include sequencing by synthesis from Helicos Biosciences (Cambridge, Mass.), sequencing by ligation (e.g., SOLiD™) and ion semiconductor sequencing (e.g., Ion Torrent™) (both from Thermo Fisher Scientific), the Pacific Biosciences platform using SMRT technology (Pacific Biosciences, Menlo Park, Cal.), platforms using nanopore technology, such as those manufactured by Oxford Nanopore Technologies (Oxford, UK) or Roche Sequencing Solutions (Santa Clara, Cal.), and any other existing or future DNA sequencing technology, whether or not it involves sequencing by synthesis. The sequencing step can utilize platform-specific sequencing primers. Binding sites for these primers can be introduced in the methods of the invention as described herein, i.e., as part of an adaptor or an amplification primer.

In some embodiments, methods utilizing the novel formamide-free solutions are characterized by sequencing-related performance characteristics selected from the group consisting of on-target rate, deduplicated (deduped) depth, error rate, uniformity, and GE recovery. As shown in FIGS. 1, 2, 3 and 4, formamide substitutes present at a concentration of 20% in the hybridization solution yield characteristics similar or superior to those of a formamide-containing hybridization solution. In one aspect, the on-target rate is determined as the percentage of aligned reads in which any portion of the read aligns to the target region (defined by the panel). As shown in FIG. 1, the on-target rate was about 70% or higher for both the formamide-containing hybridization buffer and the novel formamide-free hybridization buffer. In one aspect, the deduped depth is determined as the mean depth of on-target reads measured after deduplication. As shown in FIG. 2, the deduped depth was about 2500 or more for both the formamide-containing hybridization buffer and the novel formamide-free hybridization buffer. In one aspect, uniformity is determined as the 90th/10th percentile ratio, i.e., the ratio of the 90th percentile base coverage to the 10th percentile base coverage. As shown in FIG. 3, the uniformity of both the formamide-containing hybridization buffer and the novel formamide-free hybridization buffer was 2.5 or greater. In one aspect, the error rate is determined as the percentage of all bases within all reads that carry a non-reference allele, where the calculation is limited to 1) positions having a total read depth of at least 200 (after barcode deduplication), and 2) non-reference bases having an allele fraction of at most 5%. As shown in FIG. 4, the error rate of both the formamide-containing hybridization buffer and the novel formamide-free hybridization buffer was 0.005% or less. In one aspect, GE (genome equivalent) recovery is determined as the number of genome equivalents recovered. As shown in FIG. 5, the GE recoveries for both the formamide-containing hybridization buffer and the novel formamide-free hybridization buffer were between 0.2 and 0.3. As shown in FIGS. 6, 7, 8 and 9, when the amount of formamide substitute in the hybridization solution is titrated, an optimum can be determined at which the performance characteristics are similar to or better than those of formamide-containing hybridization solutions.
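Three of these metrics follow directly from their definitions. The sketch below is one possible reading of them, not the pipeline used to generate the figures: reads and targets are half-open intervals on a single contig, the percentile uses the nearest-rank method (other tools interpolate), and non-reference bases above the 5% allele fraction are dropped from the error count but their positions remain in the denominator. All function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one contig (simplification)


def on_target_rate(reads: Sequence[Interval], targets: Sequence[Interval]) -> float:
    """Percentage of aligned reads that overlap any target region."""
    def overlaps(read: Interval) -> bool:
        return any(read[0] < t_end and t_start < read[1] for t_start, t_end in targets)
    return 100.0 * sum(overlaps(r) for r in reads) / len(reads)


def nearest_rank_percentile(values: Sequence[int], q: int) -> int:
    """Nearest-rank percentile for integer q in 1..100."""
    ordered = sorted(values)
    rank = max(1, (q * len(ordered) + 99) // 100)  # integer ceil(q * n / 100)
    return ordered[rank - 1]


def uniformity_90_10(base_coverage: Sequence[int]) -> float:
    """Ratio of 90th to 10th percentile per-base coverage (fails if the 10th is 0)."""
    return nearest_rank_percentile(base_coverage, 90) / nearest_rank_percentile(base_coverage, 10)


def error_rate(positions: List[Dict], min_depth: int = 200, max_af: float = 0.05) -> float:
    """Percentage of bases carrying a non-reference allele.

    Each position is a dict with 'depth' (deduplicated total depth) and
    'alt_counts' (count of each non-reference base). Positions below min_depth
    are skipped; non-reference bases above max_af are treated as real variants.
    """
    errors = 0
    total = 0
    for pos in positions:
        depth = pos["depth"]
        if depth < min_depth:
            continue
        total += depth
        errors += sum(c for c in pos["alt_counts"] if c / depth <= max_af)
    return 100.0 * errors / total
```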

In some embodiments, the invention is a kit for performing the target enrichment and sequencing methods of the invention. The kit comprises a formamide-free hybridization solution comprising a solvent selected from sulfolane, ethylene carbonate, pyrrolidone, or a primary amide, and one or more of: an adaptor; a universal amplification primer; and enzymes, including DNA ligase (T4 DNA ligase, Taq DNA ligase, or Escherichia coli DNA ligase), polynucleotide kinase, and DNA polymerases, such as amplification polymerases or sequencing polymerases. In some embodiments, the kit further comprises an exonuclease having 5'-3' activity, such as T5 exonuclease.

Examples of the invention

Example 1: Sequencing workflow using the formamide substitutes propylene carbonate, sulfolane and 2-pyrrolidone

Briefly, in this example, the sequencing workflow was performed according to the AVENIO ctDNA analysis (Roche Sequencing Solutions, Pleasanton, Cal.), except that in the hybridization step of the target capture protocol, formamide was replaced with one of the aprotic solvents propylene carbonate, sulfolane, or 2-pyrrolidone. The captured libraries were sequenced on an Illumina NextSeq 500 (Illumina, San Diego, Cal.). Sequencing results for libraries captured in the presence of formamide substitutes were comparable to the formamide control and significantly better than the water control. The workflow comprised DNA fragmentation, library preparation (including adaptor ligation and PCR amplification), target enrichment, post-hybridization washing, and sequencing.

The DNA fragmentation step was performed using the KAPA Frag Kit (Kapa Biosystems, Wilmington, Mass.) according to the manufacturer's recommendations. Briefly, 100 ng of NA12878 genomic DNA was enzymatically fragmented using the kit (KR1141) at 37 ℃ and purified using affinity chromatography. The DNA was quantified using Qubit, and fragmentation quality was evaluated with the High Sensitivity Bioanalyzer Kit.

The library preparation procedure utilized 30 ng of fragmented NA12878 DNA. … μl of universal adaptor solution from the AVENIO kit was added to each sample, and ligation was carried out for 16-18 hours at 16 ℃. The ligated DNA was purified by affinity chromatography and used for PCR amplification.

The PCR step used barcode-specific universal primers and the following temperature profile:

The amplified, ligated DNA was purified using affinity chromatography and used for the target enrichment step. Prior to this step, the library was quantified using Qubit and the size of the library fragments was assessed using the High Sensitivity Bioanalyzer Kit.

The target enrichment step utilized 30 μl of adaptor-ligated sample. The reaction included the hybridization additive and enhancer oligonucleotides from the AVENIO kit and was performed according to the manufacturer's protocol, except that the hybridization solution included one of the following:

Reagent                  Final concentration
Formamide (control)      20%
Propylene carbonate      20%
Sulfolane                20%
2-Pyrrolidone            20%
Water (control)          n/a
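Converting the final concentrations in the table above (and the titration series in Examples 2 and 3) into pipetting volumes is a C1 × V1 = C2 × V2 calculation. The sketch below assumes the percentages are v/v and that the additive is added neat (100%) unless a stock concentration is given; neither the basis of the percentages nor the total hybridization volume is stated in this document, so the 50 μl volume in the test is hypothetical.

```python
def additive_volume_ul(final_percent: float, total_volume_ul: float,
                       stock_percent: float = 100.0) -> float:
    """Volume of additive stock to add so the reaction reaches final_percent.

    Solves C1 * V1 = C2 * V2 for V1, assuming v/v percentages (an assumption).
    """
    if total_volume_ul <= 0:
        raise ValueError("total volume must be positive")
    if not 0 <= final_percent <= stock_percent:
        raise ValueError("final concentration must lie between 0 and the stock concentration")
    return final_percent * total_volume_ul / stock_percent
```

For instance, a 20% final concentration in a hypothetical 50 μl reaction requires 10 μl of neat solvent, or 20 μl of a 50% stock.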

The samples were subjected to the following temperature profile:

95℃ 10 minutes
47℃

In the hybridization wash step, standard and stringent AVENIO hybridization wash buffers were used at 47 ℃ and room temperature. The hybridized nucleic acids were captured on streptavidin beads and used for the PCR amplification step.

The PCR amplification step utilized 20 μl of DNA bound to streptavidin beads. After addition of the PCR reaction mixture, the reaction was subjected to the following temperature profile:

The PCR product was purified using affinity chromatography. The amount and quality of the DNA were evaluated using Qubit and the High Sensitivity Bioanalyzer Kit. The PCR products were then sequenced on a NextSeq 500/550 according to the manufacturer's protocol.

The results are shown in fig. 1, 2, 3, 4 and 5.

Example 2: Sequencing workflow using titrated formamide substitutes propylene carbonate, sulfolane and 2-pyrrolidone

In this example, the sequencing workflow was performed as described in Example 1, except that the following concentrations of formamide substitutes were used:

The results are shown in fig. 6, 7, 8 and 9.

Example 3: Sequencing workflow using the formamide substitute DMSO

Briefly, in this example, the sequencing workflow was performed according to the AVENIO Tumor Tissue DNA analysis (Roche Sequencing Solutions, Pleasanton, Cal.), except that formamide was replaced with DMSO in the hybridization step of the target capture protocol. The captured libraries were sequenced on an Illumina NextSeq 500 (Illumina, San Diego, Cal.). Sequencing results for libraries captured in the presence of DMSO were comparable to the formamide control and significantly better than the water control. The workflow comprised DNA polishing, DNA fragmentation, library preparation (including adaptor ligation and PCR amplification), target enrichment, post-hybridization washing, post-hybridization amplification, and sequencing.

In the DNA polishing step, the DNA polishing enzymes included in the AVENIO Tumor Tissue DNA kit were used according to the manufacturer's instructions. The DNA fragmentation step was performed as described in Example 1. Library preparation, including A-tailing and adaptor ligation, was performed essentially as described in Example 1 using 20 ng of genomic DNA from cell line HD789 (Horizon Discovery). The universal adaptors in the AVENIO kit were ligated at 16 ℃ for 16-18 hours. The ligated DNA was purified by affinity chromatography and used for PCR amplification. The amplification utilized the temperature profile set forth in Example 1.

The amplified ligated DNA was purified using affinity chromatography and used for the target enrichment step. Prior to this step, the library was quantified using Qubit and the size of the library fragments was assessed using the High Sensitivity Bioanalyzer Kit.

The target enrichment step utilized 30 μl of adaptor-ligated sample. The reaction included the hybridization additives, cleanup beads, and enhancer oligonucleotides from the AVENIO kit and followed the manufacturer's protocol, except that the hybridization solution included 20% formamide (control) or DMSO at the following concentrations: 15%, 18%, 20%, 23%, 25%, 28%, 30%, 32%, 35%, 38% and 40%.

The samples were subjected to the following temperature profile:

95℃ 10 minutes
55℃

In the hybridization wash step, standard and stringent AVENIO hybridization wash buffers were used at 55 ℃ and room temperature. The hybridized nucleic acids were captured on streptavidin beads and used for the PCR amplification step. Post-capture PCR amplification was performed as described in Example 1.

The PCR product was purified using affinity chromatography. The amount and quality of the DNA were evaluated using Qubit and the High Sensitivity Bioanalyzer Kit. The PCR products were then sequenced on a NextSeq 500/550 according to the manufacturer's protocol.

Results for DMSO concentrations of 15%, 18%, 20%, 23%, 25%, 28%, 30%, 32%, 35%, 38% and 40% are shown in FIGS. 10, 11 and 12 (no sequence data could be retrieved at 35%, 38% and 40% DMSO).
