Methods and compositions for cluster generation by bridge amplification

Document No.: 704497. Published: 2021-04-13.

Abstract: Methods and compositions for cluster generation by bridge amplification, by J.M. Boutell and O. Miller, dated 2019-11-29. The present disclosure relates to compositions and methods for reducing the steps used to generate monoclonal clusters by combining enzymes for linearization and removal of unused surface primers.

1. A composition comprising:

a uracil DNA glycosylase,

an endonuclease, and

an exonuclease having 3' to 5' single stranded DNA exonuclease activity.

2. The composition of claim 1, wherein said exonuclease is exonuclease I.

3. The composition of claim 1 or 2, wherein the endonuclease is DNA glycosylase-lyase endonuclease VIII.

4. The composition of any one of claims 1-3, further comprising a double stranded DNA substrate comprising a uracil cleavage site.

5. The composition of any one of claims 1-4, further comprising a double stranded DNA substrate comprising an abasic site.

6. The composition of any one of claims 4-5, wherein the double stranded DNA substrate comprises a single stranded region, and wherein the uracil cleavage site and the abasic site are present in the double stranded region.

7. The composition of any one of claims 4 to 6, further comprising an array comprising a plurality of amplification sites, wherein each amplification site comprises a plurality of the double stranded DNA substrates attached to the amplification site.

8. A method of preparing a nucleic acid for a sequencing reaction, the method comprising:

(a) providing an array comprising a plurality of amplification sites, wherein an amplification site comprises

(i) a plurality of capture nucleic acids attached to the amplification site,

wherein a first population of the plurality of capture nucleic acids comprises a cleavage site, and

(ii) a plurality of clonal double-stranded modified target nucleic acids,

wherein both strands of each double stranded target nucleic acid are attached at their 5' ends to a capture nucleic acid,

wherein one strand is attached to a capture nucleic acid comprising said cleavage site, and

wherein the cleavage site is located in the double-stranded region of each double-stranded molecule;

(b) contacting the array with a composition comprising at least one enzyme that generates abasic sites at the cleavage sites and an exonuclease comprising 3' to 5' single stranded DNA exonuclease activity,

wherein cleavage occurs at the cleavage site,

wherein cleavage converts one strand of the double-stranded target nucleic acid into a first strand that is attached to the amplification site and a second strand that is not attached to the amplification site; and

wherein the length of the single stranded capture nucleic acid comprising the free 3' end is shortened by the exonuclease.

9. The method of claim 8, wherein the at least one enzyme that creates an abasic site at the cleavage site comprises uracil DNA glycosylase and an endonuclease.

10. The method of claim 8 or 9, wherein the endonuclease is DNA glycosylase-lyase endonuclease VIII.

11. The method of any one of claims 8 to 10, further comprising removing said at least one enzyme that generates an abasic site at a cleavage site and said exonuclease from said array.

12. The method of any one of claims 8-11, further comprising subjecting the cleaved double stranded target nucleic acid to conditions that remove the second strand that is not attached to the amplification site.

13. The method of claim 12, wherein the conditions that remove the second strands comprise a denaturing agent, wherein the denaturing agent results in immobilized single-stranded nucleic acids comprising the target nucleic acid covalently attached to a second population of capture nucleic acids, wherein the second population of capture nucleic acids are attached to the amplification sites.

14. The method of claim 13, wherein the denaturing agent comprises formamide.

15. The method of any one of claims 11-14, further comprising re-annealing the immobilized single-stranded nucleic acids to a member of the first population of capture nucleic acids to produce immobilized partial single-stranded nucleic acids.

16. The method of any one of claims 8-15, wherein the cleavage site is located in the capture nucleic acid region of the double stranded region of each double stranded target nucleic acid.

17. The method of any one of claims 8-16, wherein the cleavage site comprises uracil, wherein the uracil DNA glycosylase produces an abasic site, and wherein the endonuclease cleaves the abasic site.

18. The method of any one of claims 8-17, wherein the exonuclease is exonuclease I.

19. The method of claim 13 or 15, further comprising hybridizing a sequencing primer to the single-stranded region of the immobilized single-stranded nucleic acid of claim 13 or the immobilized partially single-stranded nucleic acid of claim 15, thereby preparing the single-stranded nucleic acid for a sequencing reaction.

20. The method of claim 19, further comprising performing a sequencing reaction to determine the sequence of at least one region of the immobilized single-stranded nucleic acid or the immobilized partially single-stranded nucleic acid.

21. The method of claim 19, wherein the sequencing reaction comprises sequencing-by-synthesis.

22. The method of any one of claims 8 to 21, wherein the array is generated by amplifying a plurality of target nucleic acids using the capture nucleic acids as amplification primers.

23. The method of claim 22, wherein the amplifying comprises exclusion amplification.

Technical Field

The present disclosure relates, inter alia, to amplification of target nucleic acids to generate amplicon clusters for sequencing, particularly where steps involved in obtaining sequence information from a sample are reduced.

Background

Next Generation Sequencing (NGS) techniques rely on highly parallel sequencing of a monoclonal population of amplicons generated from a single target nucleic acid. NGS methods greatly increase sequencing speed and data output, resulting in the high sample throughput of current sequencing platforms. Further reduction in the time needed to sequence a template is highly desirable, but it is necessary to maintain a useful signal-to-noise ratio, intensity, and percentage of clusters passing filter, all of which contribute to data output and data quality. The time needed to sequence a template can be reduced by combining separate steps; however, incompatibilities often make this impossible. For example, the activity of one enzyme may be inhibited by the product of another enzyme, so that the two enzymes must be used in separate steps.

Summary of The Invention

Next Generation Sequencing (NGS) techniques rely on highly parallel sequencing of a monoclonal population of amplicons generated from a single target nucleic acid. Generating a monoclonal population of amplicons and performing sequencing requires multiple steps, and each step increases the total time required before useful sequence data can be obtained for a sample. For example, generating a monoclonal population of amplicons entails multiple steps, including attaching and subsequently amplifying target nucleic acids present at amplification sites of an array. The inventors have found that two separate steps for generating monoclonal amplicons can be combined. In conventional methods, during the generation of monoclonal amplicons, the amplification site is treated with an exonuclease and then, in a separate step, with a glycosylase that selectively generates single-nucleotide gaps at predetermined positions. The steps are performed in this order because generating a single-nucleotide gap also generates a structure that inhibits exonuclease activity, namely a 3'-phosphate. Unexpectedly, the inventors found that the exonuclease and the glycosylase can be combined in one step with little or no adverse effect on primary metrics, read quality, dual indexing, or genome construction metrics. This results in faster sequencing, since both steps can now be performed simultaneously. It also reduces the number and amount of reagents, thereby reducing the overall cost to the consumer.

Compositions are provided herein. In one embodiment, the composition comprises a uracil DNA glycosylase, an endonuclease, and an exonuclease having 3' to 5' single stranded DNA exonuclease activity.

Methods are also provided. In one embodiment, the method is used to prepare nucleic acids for a sequencing reaction. The method includes providing an array having a plurality of amplification sites. The amplification site comprises a plurality of capture nucleic acids attached to the amplification site, wherein a first population of the plurality of capture nucleic acids comprises a cleavage site. The amplification site further comprises a plurality of clonal double-stranded modified target nucleic acids, wherein both strands of each double-stranded target nucleic acid are attached at their 5' ends to a capture nucleic acid, wherein one strand is attached to the capture nucleic acid comprising a cleavage site, and wherein the cleavage site is located in the double-stranded region of each double-stranded molecule. The method further comprises contacting the array with a composition comprising at least one enzyme that generates an abasic site at the cleavage site and an exonuclease having 3' to 5' single-stranded DNA exonuclease activity, wherein cleavage occurs at the cleavage site, wherein the cleavage converts one strand of the double-stranded target nucleic acid into a first strand that is attached to the amplification site and a second strand that is not attached to the amplification site, and wherein the length of the single-stranded capture nucleic acid comprising the free 3' end is shortened by the exonuclease.
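The combined treatment summarized above can be pictured with a toy model. The following Python sketch is illustrative only and is not part of the disclosed method: the `Strand` class, the function names, the sequences, and the `max_bases` parameter are all hypothetical. The rule that a 3'-phosphate blocks the exonuclease simply encodes the inhibition noted in the Summary, and real enzyme kinetics are not modeled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Strand:
    """Hypothetical surface-bound strand, written 5'->3' (5' end attached)."""
    seq: str
    double_stranded: bool      # True if hybridized to a complementary strand
    three_prime: str = "OH"    # "OH" or "phosphate"


def cleave_at_uracil(strand: Strand) -> Tuple[Strand, Optional[Strand]]:
    """Toy uracil DNA glycosylase + endonuclease step.

    The first U is treated as the cleavage site: the base is removed (abasic
    site) and the backbone is cut there. The surface-attached stub is assumed
    to end in a 3'-phosphate. A strand without U is returned unchanged.
    """
    pos = strand.seq.find("U")
    if pos < 0:
        return strand, None
    stub = Strand(strand.seq[:pos], strand.double_stranded, "phosphate")
    released = Strand(strand.seq[pos + 1:], strand.double_stranded,
                      strand.three_prime)
    return stub, released


def exo_trim(strand: Strand, max_bases: int) -> Strand:
    """Toy 3'->5' single-stranded DNA exonuclease step.

    Only single-stranded molecules with a free 3'-OH are shortened, by up to
    max_bases nucleotides (a hypothetical stand-in for incubation time).
    """
    if strand.double_stranded or strand.three_prime != "OH":
        return strand
    keep = max(0, len(strand.seq) - max_bases)
    return Strand(strand.seq[:keep], False, "OH")


if __name__ == "__main__":
    # Hypothetical sequences, not actual surface primer sequences.
    amplicon_strand = Strand("ACGTACGUACGTTTGCA", double_stranded=True)
    unused_primer = Strand("TTGCAAGCAGAAGA", double_stranded=False)

    stub, released = cleave_at_uracil(amplicon_strand)
    print("attached stub:", stub)
    print("released strand:", released)
    print("trimmed unused primer:", exo_trim(unused_primer, max_bases=5))
```

In this model both operations can be applied in either order to the same site, because the exonuclease acts only on unused single-stranded primers while the cleavage acts only on strands carrying the cleavage site.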

Definitions

Unless otherwise defined, terms used herein should be understood to have ordinary meanings in the relevant art. Several terms used herein and their meanings are listed below.

As used herein, the term "amplicon" when used in reference to a nucleic acid refers to a product of replication of the nucleic acid, wherein the product has a nucleotide sequence that is identical or complementary to at least a portion of the nucleotide sequence of the nucleic acid. Amplicons can be generated by any of a variety of amplification methods that use a nucleic acid, e.g., a target nucleic acid or an amplicon thereof, as a template, including, e.g., polymerase extension, Polymerase Chain Reaction (PCR), Rolling Circle Amplification (RCA), ligation extension, or ligation chain reaction. An amplicon may be a nucleic acid molecule having a single copy of a particular nucleotide sequence (e.g., a polymerase extension product) or multiple copies of a nucleotide sequence (e.g., a concatameric product of RCA). The first amplicon of the target nucleic acid is typically a complementary copy. Subsequent amplicons are copies created from the target nucleic acid or from the first amplicon after the first amplicon is generated. Subsequent amplicons can have a sequence that is substantially complementary to the target nucleic acid or substantially identical to the target nucleic acid.
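The "identical or complementary" relationship in this definition can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions: the function names and the example sequence are hypothetical, only the four unmodified DNA bases are handled, and exact matching is used where the definition allows "substantially" identical or complementary sequences.

```python
# Translation table for the four unmodified DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def matches_portion(product: str, template: str) -> bool:
    """True if product is identical to, or the reverse complement of,
    a contiguous portion of template (exact matching only)."""
    return product in template or reverse_complement(product) in template


if __name__ == "__main__":
    target = "ACGTTGCAAG"                          # hypothetical target
    first_amplicon = reverse_complement(target)    # complementary copy
    second_amplicon = reverse_complement(first_amplicon)

    print(first_amplicon)
    print(second_amplicon == target)               # copy of a copy
    print(matches_portion(first_amplicon, target))
```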

As used herein, the term "amplification site" refers to a site in or on an array at which one or more amplicons can be generated. The amplification site may be further configured to contain, hold, or attach at least one amplicon generated at the site.

As used herein, the term "array" refers to a population of sites that can be distinguished from each other by relative position. Different molecules located at different sites of the array can be distinguished from each other according to the position of the site in the array. Individual sites of an array may comprise one or more molecules of a particular type. For example, a site can include a single target nucleic acid molecule having a particular sequence, or a site can comprise several nucleic acid molecules having the same sequence (and/or its complement). The sites of the array may be different features located on the same substrate. Exemplary features include, but are not limited to, wells in the substrate, beads (or other particles) in or on the substrate, protrusions from the substrate, ridges on the substrate, or channels in the substrate. The sites of the array may be separate substrates each carrying a different molecule. Different molecules attached to separate substrates can be identified based on the location of the substrate on a surface associated with the substrate or based on the location of the substrate in a liquid or gel. Exemplary arrays where separate substrates are located on the surface include, but are not limited to, those having beads in the wells.

As used herein, the term "capacity" when used in reference to a site and nucleic acid material refers to the maximum amount of nucleic acid material that can occupy the site, e.g., amplicons derived from a target nucleic acid. For example, the term can refer to the total number of nucleic acid molecules that can occupy the site under particular conditions. Other measures may also be used, including, for example, the total mass of nucleic acid material or the total number of copies of a particular nucleotide sequence that may occupy the site under particular conditions. Typically, the capacity of a site for a target nucleic acid will be substantially equivalent to the capacity of the site for amplicons of the target nucleic acid.

As used herein, the term "capture agent" refers to a material, chemical, molecule, or portion thereof that is capable of attaching, retaining, or binding a target molecule (e.g., a target nucleic acid). Exemplary capture agents include, but are not limited to, a capture nucleic acid complementary to at least a portion of the modified target nucleic acid (e.g., a universal capture binding sequence), a member of a receptor-ligand binding pair (e.g., avidin, streptavidin, biotin, lectin, carbohydrate, nucleic acid binding protein, epitope, antibody, etc.) capable of binding to the modified target nucleic acid (or a linking moiety attached thereto), or a chemical agent capable of forming a covalent bond with the modified target nucleic acid (or a linking moiety attached thereto). In one embodiment, the capture agent is a nucleic acid. Nucleic acid capture agents may also be used as amplification primers.

When referring to a nucleic acid capture agent, the terms "P5" and "P7" may be used. The terms "P5'" (P5 prime) and "P7'" (P7 prime) refer to the complements of P5 and P7, respectively. It will be appreciated that any suitable nucleic acid capture agent may be used in the methods set forth herein, and that the use of P5 and P7 is merely an exemplary embodiment. The use of nucleic acid capture agents such as P5 and P7 on flow cells is known in the art, as disclosed, for example, in WO 2007/010251, WO 2006/064199, WO 2005/065814, WO 2015/106941, WO 1998/044151 and WO 2000/018957. One skilled in the art will recognize that nucleic acid capture agents may also function as amplification primers. For example, any suitable nucleic acid capture agent can function as a forward amplification primer, whether immobilized or in solution, and can be used in the methods presented herein for hybridization to a sequence (e.g., a universal capture binding sequence) and amplification of a sequence. Similarly, any suitable nucleic acid capture agent can function as a reverse amplification primer, whether immobilized or in solution, and can be used in the methods presented herein for hybridization to a sequence (e.g., a universal capture binding sequence) and amplification of a sequence. In view of the available general knowledge and the teachings of the present disclosure, the skilled artisan will understand how to design and use sequences suitable for capturing and amplifying target nucleic acids as set forth herein.

As used herein, the term "universal sequence" refers to a sequence region that is common to two or more target nucleic acids, wherein the molecules also have sequence regions that are different from one another. The presence of a universal sequence in different members of a collection of molecules may allow for the capture of multiple different nucleic acids using a population of capture nucleic acids that are complementary to a portion of the universal sequence (e.g., a universal capture binding sequence). Non-limiting examples of universal capture binding sequences include sequences identical or complementary to the P5 and P7 primers. Similarly, the universal sequences present in different members of a collection of molecules may allow for the replication or amplification of multiple different nucleic acids using a population of universal primers that are complementary to a universal sequence, e.g., a portion of a universal primer binding site. As described herein, a target nucleic acid molecule can be modified to attach universal adaptors (also referred to herein as adaptors) at, for example, one or both ends of different target sequences.
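As a minimal illustration of this definition, the following Python sketch finds the regions shared by every member of a small collection of adaptor-flanked molecules. The function name and all sequences are hypothetical placeholders (they are not P5 or P7 sequences), and the sketch assumes the universal sequences sit at the two ends of each molecule.

```python
import os
from typing import List, Tuple


def shared_flanks(molecules: List[str]) -> Tuple[str, str]:
    """Return the longest prefix and longest suffix common to all molecules.

    os.path.commonprefix compares strings character by character, so it can
    be applied to nucleotide sequences; the suffix is found by reversing.
    If all molecules are identical, the two flanks overlap completely.
    """
    prefix = os.path.commonprefix(molecules)
    suffix = os.path.commonprefix([m[::-1] for m in molecules])[::-1]
    return prefix, suffix


if __name__ == "__main__":
    # Hypothetical universal adaptors flanking three different inserts.
    left_adaptor, right_adaptor = "AAAC", "TTTG"
    inserts = ["GGTCA", "CCATG", "TACGT"]
    library = [left_adaptor + ins + right_adaptor for ins in inserts]

    print(shared_flanks(library))
```

The middle of each molecule differs, which corresponds to the "sequence regions that are different from one another" in the definition, while the shared flanks are what a single population of capture nucleic acids or universal primers could bind.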

As used herein, the term "adaptor" and derivatives thereof, such as universal adaptors, generally refer to any linear oligonucleotide that can be ligated to a target nucleic acid. In some embodiments, the adaptor is substantially non-complementary to the 3' end or the 5' end of any target sequence present in the sample. In some embodiments, suitable adaptors are in the range of about 10-100 nucleotides, about 12-60 nucleotides, or about 15-50 nucleotides in length. In general, the adaptors may include any combination of nucleotides and/or nucleic acids. In some aspects, the adaptor may comprise one or more cleavable groups at one or more positions. In another aspect, the adaptor can comprise a sequence that is substantially identical or substantially complementary to at least a portion of a primer, e.g., a capture nucleic acid. In some embodiments, the adaptors may include barcodes, also known as indexes or tags, to aid in downstream error correction, identification, or sequencing. The terms "adaptor" and "adapter" are used interchangeably.

As defined herein, "sample" and its derivatives are used in their broadest sense and include any sample, culture, etc., suspected of including a target nucleic acid. In some embodiments, the sample comprises DNA, RNA, PNA, LNA, or a chimeric or hybrid form of nucleic acid. The sample may include any biological, clinical, surgical, agricultural, atmospheric or aquatic based sample containing one or more nucleic acids. The term also includes any isolated nucleic acid sample, such as genomic DNA, or fresh-frozen or formalin-fixed paraffin-embedded nucleic acid samples. It is also contemplated that the sample may be from a single individual, from a collection of nucleic acid samples of genetically related members, from nucleic acid samples of genetically unrelated members, from matched samples of a single individual, e.g., a tumor sample and a normal tissue sample, or from a single source containing two distinct forms of genetic material, such as maternal and fetal DNA obtained from a maternal subject, or contaminating bacterial DNA in a sample containing plant or animal DNA. In some embodiments, the source of nucleic acid material may include nucleic acid obtained from a neonate, such as nucleic acid typically used for neonatal screening.

As used herein, the terms "clonal population" and "monoclonal population" are used interchangeably and refer to a population of nucleic acids that is homogeneous with respect to a particular nucleotide sequence. The homogeneous sequence is typically at least 10 nucleotides in length, but may be even longer, including, for example, at least 50, at least 100, at least 250, at least 500, or at least 1000 nucleotides in length. The clonal population can be derived from a single target nucleic acid. Typically, all nucleic acids in a clonal population will have the same nucleotide sequence. It is understood that a few mutations, for example due to amplification artifacts, may occur in a clonal population without departing from clonality. It is also understood that a few different target nucleic acids (e.g., due to target nucleic acids that are not amplified or are amplified only to a limited degree) can occur in a clonal population without departing from clonality.
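The tolerance for "a few" variant molecules can be made concrete with a small Python sketch. This is an illustration under stated assumptions, not a definition from the disclosure: the function names are hypothetical, and the 0.95 default threshold is an arbitrary value chosen for the example, since the text gives no numerical cutoff.

```python
from collections import Counter
from typing import Sequence


def dominant_fraction(sequences: Sequence[str]) -> float:
    """Fraction of molecules that carry the most common sequence."""
    if not sequences:
        raise ValueError("at least one sequence is required")
    counts = Counter(sequences)
    _, top_count = counts.most_common(1)[0]
    return top_count / len(sequences)


def looks_clonal(sequences: Sequence[str],
                 min_dominant_fraction: float = 0.95) -> bool:
    """True if the dominant sequence reaches the (assumed) threshold."""
    return dominant_fraction(sequences) >= min_dominant_fraction


if __name__ == "__main__":
    # Hypothetical site: 98 identical amplicons plus two variant molecules.
    site = ["ACGTACGTAC"] * 98 + ["ACGTACGTAA", "TTTTGGGGCC"]
    print(dominant_fraction(site), looks_clonal(site))

    # Hypothetical mixed site seeded by two different target nucleic acids.
    mixed = ["ACGTACGTAC"] * 60 + ["TTTTGGGGCC"] * 40
    print(dominant_fraction(mixed), looks_clonal(mixed))
```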

As used herein, the term "different" when used in reference to a nucleic acid means that the nucleic acids have different nucleotide sequences from one another. Two or more nucleic acids may have nucleotide sequences that differ along their entire length. Alternatively, two or more nucleic acids may have nucleotide sequences that differ along a substantial portion of their length. For example, two or more nucleic acids may have target nucleotide sequence portions that are different from each other while also having common sequence regions that are identical to each other. As used herein, the term "different" when used in reference to amplification sites means that the amplification sites are present at distinct, separate locations on the same array.

As used herein, the term "fluidic access" when used in reference to a molecule in a fluid and a site in contact with the fluid refers to the ability of the molecule to move in or through the fluid to contact or enter the site. The term may also refer to the ability of the molecule to separate from or leave the site and enter the solution. Fluidic access can occur when there is no barrier that prevents the molecule from entering the site, contacting the site, separating from the site, and/or leaving the site. However, fluidic access is understood to exist even if diffusion is delayed, reduced or altered, so long as access is not absolutely prevented.

As used herein, the term "double-stranded" when used in reference to a nucleic acid molecule means that substantially all of the nucleotides in the nucleic acid molecule are hydrogen bonded to complementary nucleotides. A partially double-stranded nucleic acid can have at least 10%, at least 25%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or at least 95% of its nucleotides that are hydrogen-bonded to a complementary nucleotide.

As used herein, the term "each" when used in reference to a collection of items is intended to identify an individual item in the collection, but does not necessarily refer to each item in the collection, unless the context clearly indicates otherwise.

As used herein, the term "excluded volume" refers to the volume of space occupied by a particular molecule, excluding other such molecules.

As used herein, the term "interstitial region" refers to a region in or on a substrate that separates other regions of the substrate or surface. For example, an interstitial region may separate one feature of an array from another feature of the array. The two regions separated from each other may be discrete, lacking contact with each other. In another example, an interstitial region may separate a first portion of a feature from a second portion of the feature. The separation provided by an interstitial region may be partial or complete. An interstitial region will typically have a surface material that differs from the surface material of the features on the surface. For example, a feature of an array may have an amount or concentration of capture agent that exceeds the amount or concentration present in the interstitial regions. In some embodiments, the capture agent may not be present at the interstitial regions.

As used herein, the term "polymerase" is intended to be consistent with its use in the art and includes, for example, enzymes that use a nucleic acid as a template strand to produce a complementary replicate of a nucleic acid molecule. Typically, a DNA polymerase binds to the template strand and then moves down the template strand, sequentially adding nucleotides to the free hydroxyl group at the 3' end of the growing strand of nucleic acid. DNA polymerases typically synthesize complementary DNA molecules from a DNA template, while RNA polymerases typically synthesize RNA molecules from a DNA template (transcription). Polymerases can use short RNA or DNA strands, called primers, to initiate strand growth. As described in detail herein, a polymerase can be used during amplification to generate clonal clusters, a polymerase can be used during a sequencing reaction to determine the sequence of a nucleic acid, and different polymerases can be used in these different aspects. Some polymerases can displace strands upstream of the site where they are adding bases to the strand. Such polymerases are considered strand-displacing, meaning that they have the activity of removing the complementary strand from the template strand read by the polymerase. Exemplary polymerases with strand displacement activity include, but are not limited to, Bsu (Bacillus subtilis), Bst (Bacillus stearothermophilus) polymerase, exo-Klenow polymerase, or large fragments of sequencing grade T7 exo-polymerase. Some polymerases degrade the strand in front of them, effectively replacing it with the growing strand behind (5' exonuclease activity). Some polymerases have an activity that degrades the strand behind them (3' exonuclease activity). Some useful polymerases have been modified by mutation or other means to reduce or eliminate 3' and/or 5' exonuclease activity.

As used herein, the term "nucleic acid" is intended to be consistent with its use in the art, and includes naturally occurring nucleic acids and functional analogs thereof. Particularly useful functional analogs can hybridize to nucleic acids in a sequence-specific manner or can serve as templates for replicating a particular nucleotide sequence. Naturally occurring nucleic acids typically have a backbone comprising phosphodiester bonds. An analog structure may have alternative backbone linkages, including any of a variety of backbone linkages known in the art. Naturally occurring nucleic acids typically have a deoxyribose sugar (e.g., found in deoxyribonucleic acid (DNA)) or a ribose sugar (e.g., found in ribonucleic acid (RNA)). The nucleic acid may contain any of a variety of analogs of these sugar moieties known in the art. Nucleic acids may include natural or unnatural bases. In this regard, a natural deoxyribonucleic acid may have one or more bases selected from adenine, thymine, cytosine, or guanine, and a ribonucleic acid may have one or more bases selected from uracil, adenine, cytosine, or guanine. Useful non-natural bases that can be included in a nucleic acid are known in the art. The term "target" when used in reference to a nucleic acid is intended, in the context of the methods or compositions described herein, as a semantic identifier for the nucleic acid and does not necessarily limit the structure or function of the nucleic acid beyond what is otherwise explicitly indicated. Target nucleic acids having universal sequences at each end, e.g., universal adaptors at each end, can be referred to as modified target nucleic acids.

As used herein, the term "transport" refers to the movement of molecules through a fluid. The term may include passive transport, such as movement of a molecule along its concentration gradient (e.g., passive diffusion). The term may also include active transport whereby the molecule may move along or against a concentration gradient of the molecule. Thus, transporting may include applying energy to move one or more molecules in a desired direction or to a desired location, such as an amplification site.

As used herein, the term "rate" when used in reference to transport, amplification, capture, or other chemical processes is intended to be consistent with its meaning in chemical kinetics and biochemical kinetics. The rates of the two processes may be compared in terms of maximum rate (e.g., at saturation), pre-steady state rate (e.g., prior to equilibrium), kinetic rate constant, or other metrics known in the art. In particular embodiments, the rate of a particular process may be determined in terms of the total time to complete the process. For example, the rate of amplification can be determined in terms of the time it takes to complete the amplification. However, the rate of a particular process need not be determined in terms of the total time to complete the process.

The term "and/or" refers to one or all of the listed elements or a combination of any two or more of the listed elements.

The words "preferred" and "preferably" refer to embodiments of the invention that may provide certain benefits under certain circumstances. However, other embodiments may also be preferred, under the same or other circumstances. Furthermore, the recitation of one or more preferred embodiments does not imply that other embodiments are not useful, and is not intended to exclude other embodiments from the scope of the invention.

The term "comprising" and its variants are not to be taken in a limiting sense when these terms appear in the description and claims.

It should be understood that wherever embodiments are described herein with the language "comprising" or the like, otherwise analogous embodiments described in terms of "consisting of" and/or "consisting essentially of" are also provided.

Unless otherwise specified, "a", "an", "the" and "at least one" are used interchangeably and mean one or more than one.

Conditions that are "suitable" for an event to occur, such as exonuclease-mediated digestion of nucleic acids, are conditions that do not prevent the event from occurring. Thus, these conditions allow, enhance, facilitate, and/or favor the event.

As used herein, "providing" in the context of a composition, article, or nucleic acid refers to preparing the composition, article, or nucleic acid, purchasing the composition, article, or nucleic acid, or otherwise obtaining the composition, article, or nucleic acid.

Also herein, the recitations of numerical ranges by endpoints include all numbers subsumed within that range (e.g., 1 to 5 includes 1, 1.5, 2, 2.75, 3, 3.80, 4, 5, etc.).

Reference throughout the specification to "one embodiment," "an embodiment," "certain embodiments," or "some embodiments," etc., means that a particular feature, configuration, composition, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Thus, the appearances of such phrases in various places throughout this specification are not necessarily referring to the same embodiment of the present disclosure. Furthermore, the particular features, configurations, compositions, or characteristics may be combined in any suitable manner in one or more embodiments.

In the description herein, specific embodiments may be described separately for clarity. Unless otherwise expressly stated that features of a particular embodiment are incompatible with features of another embodiment, certain embodiments may include a combination of compatible features described herein in connection with one or more embodiments.

For any of the methods disclosed herein that include discrete steps, the steps may be performed in any order that is practicable. Also, any combination of two or more steps may be performed simultaneously, as appropriate.

The above summary of the present invention is not intended to describe each disclosed embodiment or every implementation of the present invention. The following description more particularly exemplifies illustrative embodiments. At various places throughout the application, guidance is provided through lists of examples, which examples can be used in various combinations. In various instances, the enumerated lists merely serve as representative groups and should not be construed as exclusive lists.

Brief Description of Drawings

The following detailed description of exemplary embodiments of the present disclosure can be best understood when read in conjunction with the following drawings.

Figures 1A-1D show schematic diagrams of embodiments of preparing nucleic acids for sequencing according to various aspects of the disclosure presented herein.

FIG. 2 shows the effect of 3'-phosphate and DNA glycosylase on exonuclease I activity. The left panel shows the presence of DNA glycosylase and exonuclease in each lane of the flow cell. The DNA glycosylase was added first, followed by exonuclease. The exception is lane 4, where both a DNA glycosylase and an exonuclease were added. The middle panel shows the flow cell, with lanes 1-8 numbered from top to bottom and lanes 3-5 lacking signal. The right panel shows the fluorescence results for each lane, with lanes 3-5 being essentially free of fluorescence.

Figure 3 shows the results of a sequencing run as described in examples 1 and 3.

The schematic drawings are not necessarily drawn to scale. Like numbers used in the figures refer to like components, steps, etc. It should be understood, however, that the use of a number to refer to a component in a given figure is not intended to limit the component in another figure labeled with the same number. Additionally, the use of different numbers to refer to components is not intended to indicate that the different numbered components cannot be the same or similar to other numbered components.

Detailed Description

Methods and compositions related to sequencing nucleic acids are presented herein. The present disclosure provides methods comprising preparing nucleic acids for a sequencing reaction, generating clonal clusters, and fabricating a nucleic acid array on a surface. In one embodiment, the method comprises providing an array comprising a plurality of amplification sites. Each amplification site comprises a plurality of double-stranded amplicons. For example, in FIG. 1A, an amplification site 10 having a plurality of one member of a double-stranded amplicon 11 is shown.

A plurality of capture nucleic acids are attached to the surface of the amplification sites. There are at least two populations of capture nucleic acids, and in some embodiments, there are three or more populations. At least one population of capture nucleic acids includes a cleavage site. In one embodiment, the cleavage site comprises a uracil residue. Each double-stranded amplicon molecule is in a bridged structure in which each strand is attached to a capture nucleic acid at its 5' end and is not attached to the array at its 3' end. The cleavage sites are located in the double-stranded region of each double-stranded molecule. For example, two populations of capture nucleic acids are shown in FIG. 1A. Members of one population are shown either attached (13) to one end of each amplicon or bound (13') to the surface of the amplification site 10 but not attached to the amplicon 11. Members of a second population of capture nucleic acids are shown either attached (14) at the other end of each amplicon or bound (14') to the surface of the amplification site 10 but not attached to the amplicon 11. The cleavage site (marked with an X on the capture nucleic acid 13) is also shown in FIG. 1A.

The method further comprises contacting the amplification sites of the array with an enzyme that cleaves one strand of DNA at a cleavage site and an exonuclease having 3' to 5' single-stranded DNA exonuclease activity. The exonuclease acts to digest single-stranded capture nucleic acids that contain a free 3'-OH end. For example, as shown in FIG. 1B, the cleavage site X in the amplicon 11 is cleaved, leaving a shortened capture nucleic acid 13". The unattached capture nucleic acids 13' and 14' are no longer present at the amplification site 10.

In one embodiment, the sequence of the attached strand can be determined by using a DNA polymerase with strand displacement activity, wherein the 3' end of the shortened capture nucleic acid (13" in FIG. 1B) is used as a primer to initiate DNA synthesis. In some embodiments, an enzyme that cleaves one DNA strand at a cleavage site will modify the 3' end of the shortened capture nucleic acid to terminate in a 3'-phosphate. The 3'-phosphate can be removed by a phosphatase before starting the sequencing reaction.

Instead of sequencing the attached strands, the method may further comprise subjecting the cleaved double-stranded amplicons to denaturing conditions to remove portions of the cleaved strands (15' in fig. 1B) that are not attached to the array. This results in immobilized single-stranded nucleic acids. For example, in FIG. 1C, the unattached DNA strand is no longer hybridized to the attached strand 16 and has been lost.

In one embodiment, the immobilized single-stranded nucleic acid can be re-annealed to the shortened capture nucleic acid 13". For example, as shown in FIG. 1D, the attached DNA strand 16 is re-annealed to the shortened capture nucleic acid 13".
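The sequence of events in FIGS. 1A-1C can be followed with a small bookkeeping sketch. This is only an illustrative toy model, not part of the disclosed method: the class `CaptureNucleicAcid`, the functions `cleave_and_digest` and `denature`, and the string labels are hypothetical names chosen to mirror the reference numerals, and strand 16 is assumed to be the strand anchored through population 14.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaptureNucleicAcid:
    population: str    # "13" (carries cleavage site X) or "14" (no cleavage site)
    in_amplicon: bool  # True if extended into a bridged amplicon strand


def cleave_and_digest(site):
    """FIG. 1B (toy model): cleave at X and digest unextended capture nucleic acids."""
    remaining = []
    for cna in site:
        if not cna.in_amplicon:
            # Single-stranded with a free 3'-OH end: digested by the exonuclease.
            continue
        if cna.population == "13":
            # Shortened capture nucleic acid plus the cleaved strand, still hybridized.
            remaining += ['13"', "15'"]
        else:
            # Assumption: the strand that stays attached through population 14.
            remaining.append("16")
    return remaining


def denature(species):
    """FIG. 1C (toy model): the cleaved, unattached strand 15' is removed."""
    return [s for s in species if s != "15'"]


site = [
    CaptureNucleicAcid("13", True),   # 13: attached to an amplicon
    CaptureNucleicAcid("13", False),  # 13': on the surface, not in an amplicon
    CaptureNucleicAcid("14", True),   # 14: attached to an amplicon
    CaptureNucleicAcid("14", False),  # 14': on the surface, not in an amplicon
]
after_1b = cleave_and_digest(site)
after_1c = denature(after_1b)
```

In this model, only the shortened capture nucleic acid and the attached strand remain after denaturation, which is the state from which re-annealing (FIG. 1D) can occur.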

Arrays

The array of amplification sites used in the methods described herein can be present on one or more substrates. Example types of substrate materials that may be used for the array include glass, modified glass, functionalized glass, inorganic glass, microspheres (e.g., inert and/or magnetic particles), plastics, polysaccharides, nylon, nitrocellulose, ceramics, resins, silica-based materials, carbon, metals, optical fibers or fiber bundles, polymers, and multi-well (e.g., microtiter) plates. Exemplary plastics include acrylic, polystyrene, copolymers of styrene with other materials, polypropylene, polyethylene, polybutylene, polyurethane, and Teflon™. Exemplary silica-based materials include silicon and various forms of modified silicon.

In particular embodiments, the substrate may be within or part of a container such as a well, tube, channel, dish, culture dish, bottle, and the like. Particularly useful containers are flow cells, for example, as described in U.S. Pat. No.8,241,573 or Bentley et al, Nature 456:53-59 (2008). An exemplary flow cell is one available from Illumina, Inc. Another particularly useful container is a well in a multiwell plate or microtiter plate.

In some embodiments, the amplification sites of the array may be configured as features on the surface. These features may be present in any of a variety of desired forms. For example, the sites may be wells, pits, channels, ridges, raised regions, pegs, posts, and the like. In one embodiment, the amplification sites may comprise beads. However, in particular embodiments, the site need not contain beads or particles. Exemplary sites include wells present in substrates for commercial sequencing platforms sold by 454 Life Sciences (a subsidiary of Roche, Basel, Switzerland) or Ion Torrent (a subsidiary of Life Technologies, Carlsbad, Calif., USA). Other substrates having wells include, for example, etched optical fibers and other substrates described in U.S. Pat. Nos. 6,266,459; 6,355,431; 6,770,441; 6,859,570; 6,210,891; 6,258,568; 6,274,320; 8,262,900; 7,948,015; and 8,349,167; U.S. Patent Publication No. 2010/0137143; or PCT Publication No. WO 00/63437. In several cases, these references exemplify the use of beads in the wells of the substrate. In the methods or compositions of the present disclosure, the substrate containing the wells may be used with or without beads. In some embodiments, the wells of the substrate may comprise a gel material (with or without beads), as described in U.S. Pat. No. 9,512,422.

The amplification sites of the array can be metallic features on a non-metallic surface, such as glass, plastic, or other materials exemplified herein. The metal layer may be deposited on the surface using methods known in the art, for example, wet plasma etching, dry plasma etching, atomic layer deposition, ion beam etching, chemical vapor deposition, vacuum sputtering, and the like. Any of a variety of commercial instruments may be suitably used, including, for example, the Ionfab or Optofab systems (Oxford Instruments, UK). The metal layer may also be deposited by electron beam evaporation or sputtering, as described in Thornton, Ann. Rev. Mater. Sci. 7:239-60 (1977). Metal layer deposition techniques, such as those exemplified herein, may be combined with photolithographic techniques to create metal areas or patches on the surface. Exemplary methods for combining metal layer deposition techniques and photolithography techniques are provided in U.S. Pat. No. 8,778,848 and U.S. Pat. No. 8,895,249.

The array of features may appear as a grid of spots or patches. The features may be positioned in a repeating pattern or an irregular non-repeating pattern. Particularly useful patterns are hexagonal patterns, rectilinear patterns, grid patterns, patterns with reflective symmetry, patterns with rotational symmetry, and the like. Asymmetric patterns may also be useful. The spacing between different pairs of nearest neighbor features may be the same, or it may vary between pairs. In particular embodiments, the features of the array may each have an area greater than about 100 nm², 250 nm², 500 nm², 1 μm², 2.5 μm², 5 μm², 10 μm², 100 μm², or 500 μm². Alternatively, or in addition, the features of the array may each have an area less than about 1 mm², 500 μm², 100 μm², 25 μm², 10 μm², 5 μm², 1 μm², 500 nm², or 100 nm². Indeed, a feature may have a size within a range selected from between the upper and lower limits exemplified above.

For embodiments that include an array of features on the surface, the features may be discrete, separated by interstitial regions. There may be variations in the size of the features and/or the spacing between regions such that the array may be high density, medium density, or lower density. The high density array is characterized by having regions spaced less than about 15 μm apart. The medium density array has regions spaced about 15 to 30 μm apart, while the low density array has regions spaced greater than 30 μm apart. Arrays useful in the present disclosure may have regions less than 100 μm, 50 μm, 10 μm, 5 μm, 1 μm, or 0.5 μm apart.
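The density categories above reduce to a simple threshold rule. The following is a minimal sketch of that rule, assuming the spacing is supplied in micrometers; the function name `classify_array_density` is hypothetical, and the "about" in the text is treated here as an exact boundary purely for illustration.

```python
def classify_array_density(spacing_um: float) -> str:
    """Classify an array by the spacing between regions, in micrometers.

    Thresholds follow the text: high density is less than about 15 um,
    medium density is about 15 to 30 um, low density is greater than 30 um.
    """
    if spacing_um <= 0:
        raise ValueError("spacing must be a positive number of micrometers")
    if spacing_um < 15:
        return "high"
    if spacing_um <= 30:
        return "medium"
    return "low"


# Example spacings taken from the ranges mentioned in the text.
labels = {s: classify_array_density(s) for s in (0.5, 10, 15, 30, 50)}
```

Under this rule, spacings of 0.5 μm and 10 μm are classified as high density, 15 μm and 30 μm as medium density, and 50 μm as low density.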

In particular embodiments, the array may comprise a collection of beads or other particles. The particles may be suspended in a solution or they may be positioned on the surface of a substrate. Examples of bead arrays in solution are those commercialized by Luminex (Austin, TX, USA). Examples of arrays with beads located on the surface include arrays in which the beads are located in wells, such as a BeadChip array (Illumina Inc., San Diego, Calif., USA) or a substrate for use in a sequencing platform from 454 Life Sciences (a subsidiary of Roche, Basel, Switzerland) or Ion Torrent (a subsidiary of Life Technologies, Carlsbad, Calif., USA). Other arrays having beads on a surface are described in U.S. Pat. Nos. 6,266,459; 6,355,431; 6,770,441; 6,859,570; 6,210,891; 6,258,568; and 6,274,320; U.S. Patent Publication Nos. 2009/0026082 A1; 2009/0127589 A1; 2010/0137143 A1; and 2010/0282617 A1; or PCT Publication No. WO 00/63437. Several of the above references describe methods of attaching target nucleic acids to beads prior to loading the beads in or on an array substrate. However, it will be appreciated that the beads can be made to contain amplification primers and then the beads can be used to load an array, thereby forming amplification sites for use in the methods described herein. As previously described herein, the substrate may be used without beads. For example, the amplification primers can be directly attached to the wells or to the gel material in the wells. Thus, the references exemplify materials, compositions, or devices that may be modified for use in the methods and compositions described herein.

The amplification sites of the array can comprise a plurality of capture agents capable of binding to the target nucleic acids. In one embodiment, the capture agent comprises a capture nucleic acid. The nucleotide sequence of the capture nucleic acid is complementary to the universal sequence of the target nucleic acid. In some embodiments, the capture nucleic acid can also serve as a primer for amplification of the target nucleic acid. In some embodiments, one population of capture nucleic acids comprises the P5 primer or its complement. In some embodiments, the amplification site further comprises a plurality of second capture nucleic acids, and this second capture nucleic acid may comprise the P7 primer or its complement. In some embodiments, the capture nucleic acid can include a cleavage site. Cleavage sites in capture nucleic acids are described in more detail herein.

In particular embodiments, a capture agent, such as a capture nucleic acid, can be attached to the amplification site. For example, the capture agent can be attached to the surface of a feature of the array. Attachment may be through an intermediate structure such as a bead, particle, or gel. Examples of attaching capture nucleic acids to an array via a gel are described in U.S. Pat. No. 8,895,249 and are further exemplified by flow cells available from Illumina Inc. (San Diego, CA, USA) or described in WO 2008/093098. Exemplary gels that can be used in the methods and devices described herein include, but are not limited to, those having a colloidal structure, such as agarose; polymer networks, such as gelatin; or a crosslinked polymer structure such as polyacrylamide, SFA (see, e.g., U.S. Patent Publication No. 2011/0059865 A1), or PAZAM (see, e.g., U.S. provisional patent application Ser. No. 61/753,833 and U.S. Pat. No. 9,012,022). Attachment via beads can be achieved as illustrated in the description and cited references previously described herein.

In some embodiments, the features on the surface of the array substrate are discontinuous, separated by interstitial regions of the surface. Interstitial regions having substantially lower amounts or concentrations of capture agents compared to the features of the array are advantageous. Interstitial regions lacking the capture agent are particularly advantageous. For example, the relatively small amount or absence of capture agents at the interstitial regions facilitates localization of the target nucleic acids, and of the subsequently generated clusters, to the desired features. In particular embodiments, the features may be concave features in the surface (e.g., wells), and the features may contain a gel material. The gel-containing features may be separated from each other by interstitial regions on the surface in which the gel is substantially absent or, if present, substantially incapable of supporting localization of the nucleic acid. Methods and compositions for making and using substrates having gel-containing features, such as wells, are set forth in U.S. provisional application No. 61/769,289.

Target nucleic acid

Arrays used in the methods described herein include double-stranded modified target nucleic acids. The terms "target nucleic acid," "target fragment," "target nucleic acid fragment," "target molecule," and "target nucleic acid molecule" are used interchangeably to refer to a nucleic acid molecule for which it is desired to identify its nucleotide sequence. The target nucleic acid can be essentially any nucleic acid of known or unknown sequence. For example, it may be a fragment of genomic DNA or cDNA. Sequencing may result in the determination of the sequence of all or part of the target molecule. The target may be derived from a primary nucleic acid sample that has been randomly fragmented. In one embodiment, the target may be processed into a template suitable for amplification by placing a universal amplification sequence, such as a sequence present in a universal adaptor, at the end of each target fragment. A target nucleic acid having a universal adaptor at each end can be referred to as a "modified target nucleic acid". Universal adaptors are detailed herein.

The primary nucleic acid sample may originate in double-stranded DNA (dsDNA) form (e.g., genomic DNA fragments, PCR and amplification products, etc.) or may originate in single-stranded form (e.g., DNA or RNA) and be converted to dsDNA form. For example, mRNA molecules can be copied into double-stranded cDNA suitable for use in the methods described herein using standard techniques well known in the art. The exact sequence of the polynucleotide molecules from the primary nucleic acid sample is generally not important to the present disclosure and may be known or unknown.

In one embodiment, the primary polynucleotide molecules from the primary nucleic acid sample are DNA molecules. More particularly, the primary polynucleotide molecule represents the entire genetic complement of an organism and is a genomic DNA molecule that includes both intron and exon sequences, as well as non-coding regulatory sequences, such as promoter and enhancer sequences. In one embodiment, a specific subset of polynucleotide sequences or genomic DNA may be used, such as, for example, a specific chromosome. More particularly, the sequence of the primary polynucleotide molecule is unknown. Still more particularly, the primary polynucleotide molecule is a human genomic DNA molecule. The DNA target fragments may be chemically or enzymatically treated before or after any random fragmentation process, and before or after ligation of the universal adaptor sequences.

The nucleic acid sample may comprise high molecular weight material, such as genomic DNA (gDNA). The sample may comprise low molecular weight material, such as nucleic acid molecules obtained from formalin-fixed paraffin-embedded or archived DNA samples. In another embodiment, the low molecular weight material comprises enzymatically or mechanically fragmented DNA. The sample may comprise cell-free circulating DNA. In some embodiments, the sample may include nucleic acid molecules obtained from biopsies, tumors, scrapings, swabs, blood, mucus, urine, plasma, semen, hair, laser capture microdissection, surgical resection, and other clinically or laboratory obtained samples. In some embodiments, the sample may be an epidemiological, agricultural, forensic, or pathogenic sample. In some embodiments, the sample may comprise nucleic acid molecules obtained from an animal, such as a human or other mammalian source. In another embodiment, the sample may comprise nucleic acid molecules obtained from a non-mammalian source, such as a plant, bacterium, virus, or fungus. In some embodiments, the source of the nucleic acid molecule can be an archived or extinct sample or species.

Furthermore, the methods and compositions disclosed herein can be used to amplify nucleic acid samples with low quality nucleic acid molecules, such as degraded and/or fragmented genomic DNA from forensic samples. In one embodiment, a forensic sample may include nucleic acid obtained from a crime scene, a missing persons DNA database, a laboratory associated with a forensic investigation, or a forensic sample obtained from a law enforcement agency, one or more military services, or any such personnel. The nucleic acid sample may be a purified sample or a lysate containing crude DNA, e.g., from a buccal swab, paper, fabric, or other substrate which may be impregnated with saliva, blood, or other body fluids. Thus, in some embodiments, a nucleic acid sample may include a small amount of DNA (e.g., genomic DNA) or a fragmented portion of that DNA. In some embodiments, the target sequence may be present in one or more bodily fluids, including but not limited to blood, sputum, plasma, semen, urine, and serum. In some embodiments, the target sequence may be obtained from hair, skin, tissue samples, autopsies, or remains of a victim. In some embodiments, a nucleic acid comprising one or more target sequences can be obtained from a deceased animal or human. In some embodiments, the target sequence may include nucleic acids obtained from non-human DNA, such as microbial, plant, or entomological DNA. In some embodiments, the target sequence or amplified target sequence is for human identification purposes. In some embodiments, the methods described herein can be used to identify characteristics of a forensic sample. In some embodiments, the methods described herein can be used in human identification methods using one or more target-specific primers designed using known primer design criteria.
In one embodiment, a forensic or human identification sample comprising at least one target sequence may be amplified using any one or more target-specific primers obtained using known primer design criteria.

Other non-limiting examples of sources of biological samples may include whole organisms as well as samples obtained from patients. Biological samples can be obtained from any biological fluid or tissue and can be in a variety of forms, including liquid fluids and tissues, solid tissues, and preserved forms, such as dried, frozen, and fixed forms. The sample may be of any biological tissue, cell or body fluid. Such samples include, but are not limited to, sputum, blood, serum, plasma, blood cells (e.g., leukocytes), ascites, urine, saliva, tears, sputum, vaginal fluid (drainage), wash fluids obtained during medical procedures (e.g., pelvic or other wash obtained during biopsy, endoscopy, or surgery), tissue, nipple aspirate, core or fine needle biopsy samples, cell-containing bodily fluids, free floating nucleic acids, peritoneal fluid, and pleural fluid, or cells therefrom. Biological samples may also include tissue sections, such as frozen or fixed sections taken for histological purposes, or microdissected cells or extracellular portions thereof. In some embodiments, the sample may be a blood sample, such as, for example, a whole blood sample. In another example, the sample is an untreated dried blood spot sample. In another example, the sample is a formalin fixed paraffin embedded sample. In yet another example, the sample is a saliva sample. In yet another example, the sample is a dried saliva stain sample.

Exemplary biological samples from which the target nucleic acid can be derived include, for example, samples from eukaryotes, such as mammals, e.g., rodents, mice, rats, rabbits, guinea pigs, ungulates, horses, sheep, pigs, goats, cows, cats, dogs, primates, human or non-human primates; plants, such as Arabidopsis thaliana, maize, sorghum, oats, wheat, rice, canola, or soybean; algae, such as Chlamydomonas reinhardtii; nematodes, such as Caenorhabditis elegans; insects, such as Drosophila melanogaster, mosquitoes, fruit flies, bees, or spiders; fish, such as zebrafish; a reptile; amphibians, such as frogs or Xenopus laevis; Dictyostelium discoideum; fungi, such as Pneumocystis carinii, Takifugu rubripes, yeasts such as Saccharomyces cerevisiae or Schizosaccharomyces pombe; or Plasmodium falciparum. The target nucleic acid may also be derived from a prokaryote, such as a bacterium, Escherichia coli, Staphylococcus, or Mycoplasma pneumoniae; an archaeon; viruses, such as human hepatitis C virus or human immunodeficiency virus; or a viroid. The target nucleic acid may be derived from a homogeneous culture or population of organisms, or alternatively from a collection of several different organisms, e.g., in a community or ecosystem.

Random fragmentation refers to the fragmentation of polynucleotide molecules from a primary nucleic acid sample in a non-ordered fashion by enzymatic, chemical, or mechanical means. Such fragmentation methods are known in the art and use standard methods (Sambrook and Russell, Molecular Cloning, A Laboratory Manual, third edition). In one embodiment, fragmentation can be achieved using a process commonly referred to as tagmentation. Tagmentation uses transposome complexes and combines fragmentation and ligation into a single step to add universal adaptors (Gunderson et al., WO 2016/130704). For clarity, generating smaller fragments of a larger nucleic acid by specific PCR amplification of those smaller fragments is not equivalent to fragmenting the larger nucleic acid, because the larger nucleic acid sequence remains intact (i.e., it is not fragmented by the PCR amplification). Furthermore, random fragmentation is designed to generate fragments regardless of the sequence identity or position of the nucleotides comprising and/or surrounding the fragment. More particularly, random fragmentation is by mechanical means, such as nebulization or sonication, to produce fragments of about 50 base pairs in length to about 1500 base pairs in length, still more particularly 50-700 base pairs in length, and more particularly 50-400 base pairs in length. Most particularly, the method is used to generate smaller fragments of 50-150 base pairs in length.

Fragmentation of polynucleotide molecules by mechanical means (e.g., nebulization, sonication, and Hydroshear) produces a heterogeneous mix of fragments with blunt ends and 3'- and 5'-overhanging ends. It is therefore desirable to repair the fragment ends using methods or kits known in the art (e.g., the Lucigen DNA terminator end repair kit) to produce ends that are optimal for insertion into, for example, the blunt sites of a cloning vector. In a specific embodiment, the ends of the fragments of the nucleic acid population are blunt-ended. More particularly, the fragment ends are blunt-ended and phosphorylated. The phosphate moiety may be introduced by enzymatic treatment, for example using a polynucleotide kinase.

The target population of nucleic acids can have an average strand length that is desired or suitable for a particular application of the methods or compositions described herein. For example, the average strand length can be less than about 100,000 nucleotides, 50,000 nucleotides, 10,000 nucleotides, 5,000 nucleotides, 1,000 nucleotides, 500 nucleotides, 100 nucleotides, or 50 nucleotides. Alternatively, or in addition, the average strand length can be greater than about 10 nucleotides, 50 nucleotides, 100 nucleotides, 500 nucleotides, 1,000 nucleotides, 5,000 nucleotides, 10,000 nucleotides, 50,000 nucleotides, or 100,000 nucleotides. The average strand length of the target population of nucleic acids can be within a range between the maximum and minimum values described herein. It will be appreciated that amplicons generated at an amplification site (or otherwise prepared or used herein) can have an average strand length within a range selected between the upper and lower limits exemplified above.

In some cases, a target nucleic acid population can be produced or otherwise configured under conditions to have a maximum length of its members. For example, the maximum length of a member used in one or more steps of the methods described herein or present in a particular composition can be less than 100,000 nucleotides, less than 50,000 nucleotides, less than 10,000 nucleotides, less than 5,000 nucleotides, less than 1,000 nucleotides, less than 500 nucleotides, less than 100 nucleotides, or less than 50 nucleotides. Alternatively, or in addition, a target nucleic acid population can be generated or otherwise configured under conditions to have a minimum length of its members. For example, the minimum length of a member for use in one or more steps of the methods described herein or present in a particular composition can be greater than 10 nucleotides, greater than 50 nucleotides, greater than 100 nucleotides, greater than 500 nucleotides, greater than 1,000 nucleotides, greater than 5,000 nucleotides, greater than 10,000 nucleotides, greater than 50,000 nucleotides, or greater than 100,000 nucleotides. The maximum and minimum strand lengths of target nucleic acids in a population can be within a range between the maximum and minimum values set forth above. It will be appreciated that amplicons generated at an amplification site (or otherwise prepared or used herein) can have maximum and/or minimum strand lengths within a range between the upper and lower limits exemplified above.

In a specific embodiment, target fragment sequences having a single overhanging nucleotide are prepared by, for example, the activity of certain types of DNA polymerases, such as Taq polymerase or Klenow exo minus polymerase, which have template-independent terminal transferase activity that adds a single deoxynucleotide, such as deoxyadenosine (A), to the 3' end of a DNA molecule, such as a PCR product. Such enzymes can be used to add a single nucleotide "A" to the blunt-ended 3' end of each strand of a double-stranded target fragment. Thus, an "A" may be added at the 3' end of each end-repaired strand of the double-stranded target fragment by reaction with Taq or Klenow exo minus polymerase, while the universal adaptor polynucleotide construct may be a T construct with a compatible "T" overhang present at the 3' end of each double-stranded nucleic acid region of the universal adaptor. This end modification also prevents self-ligation of both the vector and the target, such that there is a bias toward formation of target nucleic acids having a universal adaptor at each end.

In some cases, target nucleic acids derived from such sources can be amplified prior to use in the methods or compositions herein. Any of a variety of known amplification techniques may be used, including but not limited to polymerase chain reaction (PCR), rolling circle amplification (RCA), multiple displacement amplification (MDA), or random primer amplification (RPA). It will be appreciated that amplification of the target nucleic acid prior to use in the methods or compositions described herein is optional. Thus, the target nucleic acid may not be amplified prior to use in some embodiments of the methods and compositions described herein. The target nucleic acid can optionally be derived from a synthetic library. The synthetic nucleic acid may have a natural DNA or RNA composition or may be an analogue thereof.
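The pairing logic behind A-tailing can be illustrated with a short sketch. It is a simplified model under stated assumptions: strands are plain strings written 5' to 3', only the single-nucleotide 3' overhang is considered, and the names `a_tail` and `overhangs_compatible` are hypothetical helpers, not part of any library or of the disclosed method.

```python
# Watson-Crick partners for the four DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def a_tail(strand: str) -> str:
    """Append one untemplated A to the 3' end of a strand written 5' to 3'."""
    return strand + "A"


def overhangs_compatible(overhang_1: str, overhang_2: str) -> bool:
    """Return True if two single-nucleotide 3' overhangs can base-pair."""
    return COMPLEMENT[overhang_1] == overhang_2


tailed = a_tail("GATTC")               # arbitrary example sequence
target_overhang = tailed[-1]           # "A" added to the target fragment
adaptor_overhang = "T"                 # T overhang on the universal adaptor

target_to_adaptor = overhangs_compatible(target_overhang, adaptor_overhang)
target_to_target = overhangs_compatible(target_overhang, target_overhang)
adaptor_to_adaptor = overhangs_compatible(adaptor_overhang, adaptor_overhang)
```

In this model, only the target-to-adaptor pairing is compatible, while target-to-target and adaptor-to-adaptor pairings are not, which is consistent with the bias against self-ligation described above.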

Universal adaptor

Target nucleic acids for use in the methods or compositions described herein include universal adaptors attached to each end. A target nucleic acid having a universal adaptor at each end can be referred to as a "modified target nucleic acid". Methods for attaching universal adaptors to each end of a target nucleic acid for use in the methods described herein are known to those skilled in the art. Attachment can be performed by standard library preparation techniques using ligation (U.S. Patent Publication No. 2018/0305753), or by tagmentation using transposase complexes (Gunderson et al., WO 2016/130704).

In one embodiment, double stranded target nucleic acids from a sample, e.g., a fragmented sample, are processed by first ligating identical universal adaptor molecules ("mismatched adaptors," general characteristics of which are defined below and further described in Gormley et al, U.S. Pat. No.7,741,463 and Bignell et al, U.S. Pat. No.8,053,192) to the 5 'and 3' ends of the double stranded target nucleic acids. In one embodiment, the universal adaptors comprise universal capture binding sequences necessary to immobilize the target nucleic acids on the array for subsequent sequencing. In another embodiment, the universal adaptors present at each end of the target nucleic acid are further modified using a PCR step prior to immobilization and sequencing. For example, an initial primer extension reaction is performed using a universal primer binding site, wherein extension products complementary to both strands of each individual target nucleic acid are formed and a universal capture binding sequence is added. The resulting primer extension products and optionally amplified copies thereof together provide a library of modified target nucleic acids that can be immobilized and then sequenced. The term "library" refers to a collection of target nucleic acids that contain known common sequences at their 3 'and 5' ends, and may also be referred to as a 3 'and 5' modified library.

The universal adaptors used in the methods of the present disclosure are referred to as "mismatched" adaptors because, as explained in detail herein, the adaptors comprise regions of sequence mismatch, i.e., they are not formed by annealing fully complementary polynucleotide strands.

Mismatched adaptors for use herein are formed by annealing two partially complementary polynucleotide strands to provide at least one double-stranded region (also referred to as a double-stranded nucleic acid region) and at least one mismatched single-stranded region (also referred to as a region of single-stranded non-complementary nucleic acid strand) when the two strands are annealed.

The double-stranded region of the universal adaptor is a short double-stranded region, typically comprising 5 or more consecutive base pairs, formed by annealing two partially complementary polynucleotide strands. The term refers to the region of the nucleic acid duplex where two strands anneal and does not imply any particular structural conformation.

It is generally advantageous that the double-stranded region is as short as possible without loss of function. As used herein, "function" refers to the ability of the double-stranded region to form a stable duplex under standard reaction conditions of an enzyme-catalyzed nucleic acid ligation reaction, which conditions are well known to the skilled reader (e.g., incubation at a temperature in the range of 4 ℃ to 25 ℃ in a ligation buffer suitable for enzymes), such that the two strands forming the universal adaptor remain partially annealed during ligation of the universal adaptor to the target molecule. It is not absolutely necessary that the double-stranded region be stable under the conditions normally used in the primer extension or annealing step of the PCR reaction.

The double-stranded region of the universal adaptor is typically the same in all universal adaptors used in ligation. Because the universal adaptors are ligated to both ends of each target molecule, the modified target nucleic acid may be flanked by complementary sequences derived from the double-stranded region of the universal adaptor. The longer the double-stranded region and the complementary sequence derived therefrom in the modified target nucleic acid construct, the greater the likelihood that the modified target nucleic acid construct will be able to fold back within these regions of internal self-complementarity and base-pair with itself under the annealing conditions used in primer extension and/or PCR. Therefore, it is generally preferred that the double-stranded region be 20 or fewer, 15 or fewer, or 10 or fewer base pairs in length to reduce this effect. The stability of the double-stranded region can be increased, and its length thus potentially reduced, by including unnatural nucleotides that exhibit stronger base pairing than standard Watson-Crick base pairs.

In one embodiment, the two strands of the universal adaptor are 100% complementary in the double-stranded region. It will be appreciated that one or more nucleotide mismatches may be tolerated within the duplex region, provided that the two strands are capable of forming a stable duplex under standard ligation conditions.

Universal adaptors for use herein can generally include a double-stranded region that forms a "ligatable" end of the adaptor, i.e., the end that is ligated to a double-stranded target nucleic acid in a ligation reaction. The ligatable end of the universal adaptor may be blunt ended or, in other embodiments, a short 5' or 3' overhang of one or more nucleotides may be present to facilitate ligation. The 5' terminal nucleotide of the ligatable end of the universal adaptor is typically phosphorylated to allow formation of a phosphodiester linkage to the 3' hydroxyl group on the target polynucleotide.

The term "mismatch region" refers to a region of a universal adaptor, a region of a single-stranded non-complementary nucleic acid strand, wherein the sequence of the two polynucleotide strands forming the universal adaptor exhibit a degree of non-complementarity such that the two strands cannot fully anneal to each other under standard annealing conditions of primer extension or PCR reactions. The mismatch region may exhibit some degree of annealing under standard reaction conditions for enzyme-catalyzed ligation reactions, provided that both strands are converted to single-stranded form under the annealing conditions in the amplification reaction.

It will be appreciated that the "mismatch regions" are provided by different portions of the same two polynucleotide strands forming the duplex region. Mismatches in the adaptor construct may take the form of one strand being longer than the other, such that a single stranded region is present on one strand, or a sequence selected such that the two strands do not hybridize, thereby forming a single stranded region on the two strands. Mismatches can also take the form of "bubbles" in which the two ends of the universal adaptor construct are able to hybridize to each other and form a duplex, while the central region is not. The portion of the strand that forms the mismatch region does not anneal under conditions that anneal other portions of the same two strands to form one or more double-stranded regions. For the avoidance of doubt, it will be understood that in the context of the present disclosure, the single stranded or single base overhang at the 3' end of the polynucleotide duplex which is subsequently subjected to ligation with the target sequence does not constitute a "mismatch region".

The lower limit of the length of the mismatch region is usually determined by function, such as the need to provide a sequence suitable for: i) primer binding for primer extension, PCR and/or sequencing (e.g., binding of primer to universal primer binding site), or ii) binding of universal capture binding sequence to capture nucleic acid to immobilize the modified target nucleic acid to the surface. In theory, there is no upper limit on the length of the mismatch region, but it is generally advantageous to minimize the total length of the universal adaptors, e.g., to facilitate separation of unbound universal adaptors from the modified target nucleic acid construct after the ligation step. Thus, it is generally preferred that the length of the mismatch region should be less than 50, or less than 40, or less than 30, or less than 25 contiguous nucleotides.

The region of the single-stranded non-complementary nucleic acid strand comprises at least one universal capture binding sequence at the 3' end. The 3' end of the universal adaptor comprises a universal capture binding sequence that will hybridize to the capture nucleic acids present at the array amplification sites. Optionally, the 5' end of the universal adaptor comprises a second universal capture binding sequence attached to each end of the target nucleic acid, wherein the second universal capture binding sequence will hybridize to a different capture nucleic acid present at an amplification site of the array.

The region of the single-stranded non-complementary nucleic acid strand typically further comprises at least one universal primer binding site. The universal primer binding site is a universal sequence that can be used to amplify and/or sequence a target nucleic acid that is attached to a universal adaptor.

The region of the single-stranded non-complementary nucleic acid strand can also include at least one index. The index can be used as a marker characteristic of the source of a particular target nucleic acid on the array (U.S. Pat. No.8,053,192). Typically, the index is a synthetic sequence of nucleotides that is part of a universal adaptor and is added to the target nucleic acid as part of the library preparation step. Thus, an index is a nucleic acid sequence that is attached to each of the target molecules of a particular sample, the presence of which is indicative of, or is used to identify, the sample or source from which the target molecule was isolated. In one embodiment, a dual index system may be used. In a dual index system, the universal adaptor attached to the target nucleic acid comprises two different index sequences (U.S. patent publication No.2018/0305750, U.S. patent publication No.2018/0305751, U.S. patent publication No.2018/0305752, and U.S. patent publication No. 2018/0305753).

Preferably, the index may be up to 20 nucleotides in length, more preferably 1-10 nucleotides in length, and most preferably 4-6 nucleotides in length. The 4 nucleotide index gives the possibility to multiplex 256 samples on the same array, and the 6 base index enables the processing of 4096 samples on the same array.
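By way of non-limiting illustration, the sample counts given above follow from the four-letter nucleotide alphabet: an index of length n has at most 4^n distinct sequences. The following is a minimal sketch of that arithmetic, assuming every possible sequence is usable as an index (in practice some sequences may be excluded, so the values are upper bounds). The function name is hypothetical and is not part of the disclosure.

```python
def max_index_count(index_length: int, alphabet_size: int = 4) -> int:
    """Upper bound on the number of distinct index sequences of a given length.

    Assumes every combination of the four natural nucleotides is allowed.
    """
    if index_length < 0:
        raise ValueError("index_length must be non-negative")
    return alphabet_size ** index_length


if __name__ == "__main__":
    # Index lengths named in the text: 4 and 6 nucleotides.
    for n in (4, 6):
        print(f"{n}-nucleotide index: up to {max_index_count(n)} samples")
```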

In one embodiment, the universal capture binding sequence is part of the universal adaptor when ligated to the double stranded target fragment, and in another embodiment, the universal primer extension binding site is added to the universal adaptor after the universal adaptor is ligated to the double stranded target fragment. The addition can be accomplished using conventional methods, including amplification-based methods such as PCR.

The precise nucleotide sequence of the universal adaptor is generally not important to the present disclosure and can be selected by the user such that the desired sequence elements are ultimately included in a common sequence of a plurality of different modified target nucleic acids, e.g., to provide a universal capture binding sequence and binding sites for a specific set of universal amplification primers and/or sequencing primers. Other sequence elements may be included, for example, to provide binding sites for sequencing primers that will ultimately be used for sequencing of target nucleic acids in the library, index sequencing or products derived from amplification of target nucleic acids in the library, e.g., on a solid support.

Although the precise nucleotide sequence of the universal adaptor is generally not limiting to the present disclosure, the sequence of the individual strands in the mismatch region should be such that neither of the individual strands exhibits any internal self-complementarity that could result in self-annealing, hairpin formation, and the like under standard annealing conditions. Self-annealing of strands in the mismatch region should be avoided because it prevents or reduces specific binding of amplification primers to such strands.
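By way of non-limiting illustration, a candidate mismatch-region strand can be screened for internal self-complementarity by looking for a short segment whose reverse complement occurs further along the same strand. The sketch below is a simple string search, not a thermodynamic prediction; the stem length, minimum loop length, function names, and example sequences are all assumptions chosen for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def self_complementary_stems(seq: str, stem_len: int = 4, min_loop: int = 3):
    """Return (i, j) start positions where seq[i:i+stem_len] could base-pair
    with seq[j:j+stem_len] on the same strand, separated by at least
    min_loop unpaired nucleotides (a potential hairpin stem)."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - stem_len + 1):
        partner = reverse_complement(seq[i:i + stem_len])
        search_from = i + stem_len + min_loop
        j = seq.find(partner, search_from)
        while j != -1:
            hits.append((i, j))
            j = seq.find(partner, j + 1)
    return hits


if __name__ == "__main__":
    # Hypothetical sequences: the first can fold back on itself, the second cannot.
    print(self_complementary_stems("GGGGAAAACCCC"))
    print(self_complementary_stems("ACACACACACAC"))
```

An empty result indicates only that no perfect stem of the chosen length was found; shorter or imperfect stems are not detected by this sketch.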

Mismatched adaptors are preferably formed from two strands of DNA, but may include a mixture of natural and non-natural nucleotides (e.g., one or more ribonucleotides) joined by a mixture of phosphodiester and non-phosphodiester backbones.

Ligation and amplification of Universal adaptors

Ligation methods are known in the art, and standard methods are used. Such methods use a ligase, such as a DNA ligase, to effect or catalyze the joining of the ends of two polynucleotide strands, in this case the universal adaptor and the double-stranded target nucleic acid, thereby forming a covalent linkage. The universal adaptor may contain a 5'-phosphate moiety to facilitate ligation to the 3'-OH present on the target fragment. The double-stranded target nucleic acid contains a 5'-phosphate moiety, either remaining from the fragmentation process or added using an enzymatic treatment step, and has been end-repaired and optionally extended by one or more overhanging bases to give a 3'-OH suitable for ligation. In this context, ligation refers to covalent linkage of polynucleotide strands that were not previously covalently linked. In particular aspects of the present disclosure, such ligation occurs by forming a phosphodiester linkage between two polynucleotide strands, although other means of covalent linkage (e.g., non-phosphodiester backbone linkages) may be used.

As discussed herein, in one embodiment, the universal adaptors used for ligation are intact and include universal capture binding sequences and other universal sequences, such as universal primer binding sites and index sequences. The resulting plurality of modified target nucleic acids can be used to prepare an immobilized sample for sequencing.

Also as discussed herein, in one embodiment, the universal adaptor used for ligation comprises a universal primer binding site and an index sequence, and does not comprise a universal capture binding sequence. The resulting plurality of modified target nucleic acids can be further modified to include a specific sequence, such as a universal capture binding sequence. Methods for adding specific sequences, such as a universal capture binding sequence, to universal adaptors ligated to double stranded target fragments include amplification based methods, such as PCR, and are known in the art and described, for example, in Bignell et al (US 8,053,192) and Gunderson et al (WO 2016/130704).

In embodiments where the universal adaptor is modified, an amplification reaction is prepared. The contents of the amplification reaction are known to those skilled in the art and include the appropriate substrates (e.g., dNTPs), enzymes (e.g., DNA polymerase) and buffer components required for the amplification reaction. Typically, an amplification reaction requires at least two amplification primers, often denoted as "forward" and "reverse" primers (primer oligonucleotides), which are capable of specifically annealing to a portion of a polynucleotide sequence to be amplified (e.g., a modified target nucleic acid) under conditions encountered in the primer annealing step of each cycle of the amplification reaction. It will be appreciated that if a primer contains any nucleotide sequence that does not anneal to the modified target nucleic acid in the first amplification cycle, this sequence can be copied into the amplification product. For example, when primers having a universal capture binding sequence, i.e., a sequence that does not anneal to the modified target nucleic acid, are used, the universal capture binding sequence will be incorporated into the resulting amplicon.

Amplification primers are typically single-stranded polynucleotide structures. They may also contain a mixture of natural and non-natural bases and natural and non-natural backbone linkages, provided that any non-natural modifications do not preclude function as a primer, which is defined as the ability to anneal to a template polynucleotide strand under the conditions of the amplification reaction and to serve as a point of initiation for the synthesis of a new polynucleotide strand that is complementary to the template strand. In addition, the primer may contain non-nucleotide chemical modifications, such as phosphorothioates to increase exonuclease resistance, again provided that the modification does not prevent primer function.

Amplification to produce clusters

Methods known to those skilled in the art can be used to generate an array comprising amplification sites, each amplification site comprising a clonal population (also referred to as a cluster) of double-stranded amplicons. In one embodiment, an isothermal amplification method is used and includes generating a clonal population of double-stranded amplicons from individual target nucleic acids that have been seeded to the site. In some embodiments, the amplification reaction proceeds until a sufficient number of amplicons has been produced to fill the capacity of the corresponding amplification site. Filling an already seeded site to capacity in this way excludes subsequent target nucleic acids from landing at the site, thereby generating a clonal population of amplicons at that site. Thus, in some embodiments, it is desirable that the rate at which amplicons are generated to fill the capacity of an amplification site exceeds the rate at which individual target nucleic acids are transported to individual amplification sites.

In some embodiments, the amplification method includes, but is not limited to, solid phase amplification, polony amplification, colony amplification, emulsion PCR, bead RCA, surface RCA, or surface SDA. In some embodiments, amplification methods are used that result in amplification of DNA molecules that are free in solution or that are tethered to a suitable substrate by only one end of the DNA molecule. In some embodiments, bridge-PCR-dependent methods are used in which both PCR primers are attached to a surface (see, e.g., WO 2000/018957, U.S. Pat. No.7,972,820; U.S. Pat. No.7,790,418 and Adessi et al, Nucleic Acids Research (2000):28(20): E87). In some embodiments, the methods of the invention may employ "polymerase colony technology," or "polony," which refers to multiplex amplification that maintains spatial clustering of identical amplicons (see Harvard Molecular Technology Group and Lipper Center for Computational Genetics website). These include, for example, in situ polony (Mitra and Church, Nucleic Acids Research 27, e34, Dec. 15, 1999), in situ rolling circle amplification (RCA) (Lizardi et al, Nature Genetics 19, 225, July 1998), bridge PCR (U.S. Pat. No.5,641,658), picoliter PCR (Leamon et al, Electrophoresis 24, 3769, November 2003) and emulsion PCR (Dressman et al, PNAS 100, 8817, July 22, 2003). In some embodiments, a kinetic exclusion-dependent method is used in which the library is amplified by recombinase-facilitated amplification under isothermal conditions (U.S. patent No.9,309,502, U.S. patent No.8,895,249, U.S. patent No.8,071,308).

In some embodiments, apparent clonality may be achieved even if the amplification site is not filled to capacity before amplification of a second target nucleic acid begins at the site. Under certain conditions, amplification of the first target nucleic acid can proceed to a point where a sufficient number of copies has been produced to effectively outcompete or overwhelm the production of copies of a second target nucleic acid transported to that site. For example, in embodiments using a bridge amplification process on circular features less than 500 nm in diameter, it has been determined that after 14 cycles of exponential amplification of a first target nucleic acid, contamination from a second target nucleic acid at the same site will produce an insufficient number of contaminating amplicons to adversely affect sequencing-by-synthesis analysis on the Illumina sequencing platform.
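By way of non-limiting illustration, the effect of a head start can be estimated with an idealized model that is not stated in the disclosure: if every template is copied with the same per-cycle efficiency and the second target nucleic acid begins amplifying k cycles after the first, the second target accounts for a fraction 1/(1 + g^k) of the amplicons, where g is the per-cycle growth factor (2 for perfect doubling). The sketch below assumes unlimited site capacity and equal efficiency for both templates; the function and parameter names are hypothetical.

```python
def contaminant_fraction(head_start_cycles: int, efficiency: float = 1.0) -> float:
    """Fraction of amplicons derived from a second target that starts
    amplifying head_start_cycles after the first target.

    efficiency is the fraction of templates copied per cycle (1.0 means
    perfect doubling). Site capacity limits are ignored in this sketch.
    """
    if head_start_cycles < 0:
        raise ValueError("head_start_cycles must be non-negative")
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    growth = (1.0 + efficiency) ** head_start_cycles
    return 1.0 / (1.0 + growth)


if __name__ == "__main__":
    # 14 cycles is the head start mentioned in the text; the other
    # values are shown only for comparison.
    for k in (0, 7, 14):
        print(k, f"{contaminant_fraction(k):.6%}")
```

Under perfect doubling, a 14-cycle head start corresponds to one contaminating amplicon per 16,385 amplicons in this model; lower efficiencies give a larger fraction.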

The amplification sites in an array need not be entirely clonal in all embodiments. Rather, for certain applications, an individual amplification site can be predominantly populated with amplicons from a first target nucleic acid, and can also have a low level of contaminating amplicons from a second target nucleic acid. The array may have one or more amplification sites with low levels of contaminating amplicons, as long as the contamination level does not have an unacceptable impact on the subsequent use of the array. For example, when the array is to be used in a detection application, an acceptable contamination level is a level that does not affect the signal-to-noise ratio or resolution of the detection technique in an unacceptable manner. Thus, apparent clonality is generally relative to the particular use or application of the arrays made by the methods described herein. Exemplary levels of contamination acceptable at an individual amplification site for a particular application include, but are not limited to, up to 0.1%, 0.5%, 1%, 5%, 10%, or 25% contaminating amplicons. The array may include one or more amplification sites having these exemplary levels of contaminating amplicons. For example, up to 5%, 10%, 25%, 50%, 75%, or even 100% of the amplification sites in the array may have some contaminating amplicons.

In some embodiments, the methods of making an array useful for the methods described herein can be performed under conditions that transport (e.g., by diffusion) the target nucleic acid to the amplification site while amplification occurs. Thus, some amplification methods may utilize both a relatively slow transport rate and a relatively slow production of the first amplicon relative to subsequent amplicon formation. For example, the amplification reactions described herein can be performed such that the target nucleic acid is transferred from solution to the amplification site concurrently with: (i) generating a first amplicon, and (ii) generating subsequent amplicons at other sites of the array. In particular embodiments, the average rate of subsequent amplicon production at the amplification site can exceed the average rate of target nucleic acid transport from solution to the amplification site. In some cases, a sufficient number of amplicons can be generated from a single target nucleic acid at an individual amplification site to fill the capacity of the corresponding amplification site. The rate at which amplicons are generated to fill the capacity of the respective amplification sites can, for example, exceed the rate at which individual target nucleic acids are transported from solution to the amplification sites.
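By way of non-limiting illustration, the relationship between transport rate and fill rate can be sketched with a simple probabilistic model that is an assumption of this example and is not taken from the disclosure: if target nucleic acids arrive at a given site as a Poisson process with a constant rate, and a seeded site takes a fixed time to fill to capacity, the probability that no second target arrives before the site is full is exp(-rate x fill time). The function name, parameter names, and numerical values below are hypothetical.

```python
import math


def prob_no_second_arrival(transport_rate: float, fill_time: float) -> float:
    """Probability that no further target reaches a seeded site before it
    fills to capacity, assuming Poisson arrivals at a constant rate.

    transport_rate: mean arrivals per site per unit time.
    fill_time: time for a seeded site to fill, in the same time unit.
    """
    if transport_rate < 0 or fill_time < 0:
        raise ValueError("transport_rate and fill_time must be non-negative")
    return math.exp(-transport_rate * fill_time)


if __name__ == "__main__":
    # Illustrative values only: arrivals per minute and fill time in minutes.
    for rate, t_fill in ((0.01, 5.0), (0.01, 50.0), (0.1, 50.0)):
        p = prob_no_second_arrival(rate, t_fill)
        print(f"rate={rate}/min, fill={t_fill} min -> P(no second arrival)={p:.3f}")
```

In this model the probability approaches 1 as the fill time becomes short relative to the mean interval between arrivals, which is consistent with the preference stated above for amplicon generation that outpaces transport.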

Compositions for amplifying a target nucleic acid at an amplification site, referred to herein as "amplification reagents," are generally capable of rapidly preparing copies of the target nucleic acid at the amplification site. Amplification reagents used in the methods of the present disclosure will typically include a polymerase and nucleotide triphosphates (NTPs). Any of a variety of polymerases known in the art may be used, but in some embodiments, exonuclease-negative polymerases may be preferred. Examples of nucleic acid polymerases suitable for use in embodiments of the invention include, but are not limited to, DNA polymerases (e.g., Klenow fragment, T4 DNA polymerase, Bst (Bacillus stearothermophilus) polymerase), thermostable DNA polymerases (e.g., Taq, Vent, Deep Vent, Pfu, Tfl, and 9°N DNA polymerase), and genetically modified derivatives thereof (TaqGold, VENTexo, Pfu exo). In some embodiments, the amplification reagents may also include a recombinase, an accessory protein, and a single-stranded DNA binding (SSB) protein for recombinase-facilitated amplification.

For embodiments in which a copy of DNA is produced, the NTP may be a deoxyribonucleotide triphosphate (dNTP). Generally, four natural species, dATP, dTTP, dGTP and dCTP, will be present in a DNA amplification reagent. However, analogs may be used if desired. For embodiments in which RNA copies are generated, the NTP may be a ribonucleotide triphosphate (rNTP). Generally, four natural species, rATP, rUTP, rGTP and rCTP, are present in RNA amplification reagents. However, analogs may be used if desired. NTPs can be modified with fluorescent or radioactive groups. To increase the detectability and/or functional diversity of nucleic acids, a wide variety of synthetically modified nucleic acids have been developed for use in chemical and biological methods. These functionalized/modified molecules (e.g., nucleotide analogs) can be fully compatible with the native polymerase, thereby maintaining the base pairing and replication properties of the native counterpart.

Other components of the amplification solution are selected according to the choice of polymerase, and they essentially correspond to compounds known in the art to be effective in supporting the activity of each polymerase. The concentration of compounds such as dimethyl sulfoxide (DMSO), bovine serum albumin (BSA), polyethylene glycol (PEG), betaine, Triton X-100, denaturants (e.g., formamide), or MgCl2 is known in the prior art to be important for optimal amplification, and thus such concentrations can be readily adjusted by the operator for the methods of the present disclosure based on the examples given below and generally available knowledge.

The rate at which the amplification reaction occurs can be increased by increasing the concentration or amount of one or more active components of the amplification reaction. For example, the amount or concentration of polymerase, nucleotide triphosphates, or primers can be increased. In some cases, the one or more active components of an amplification reaction that are increased in amount or concentration (or otherwise manipulated in the methods described herein) are non-nucleic acid components of the amplification reaction.

The rate of amplification can also be increased in the methods described herein by adjusting the temperature. For example, the amplification rate at one or more amplification sites can be increased by raising the temperature at one or more sites to a maximum temperature at which the reaction rate is reduced due to denaturation or other adverse events. The optimum or desired temperature may be determined empirically from known characteristics of the amplification components used or for a given amplification reaction mixture. Such adjustments can be made based on a priori prediction of primer melting temperature (Tm) or empirically. In certain embodiments, the temperature of the amplification reaction is at least 35 ℃ to no greater than 70 ℃. For example, the amplification reaction can be at least 35 ℃ to no greater than 42 ℃, or at least 57 ℃ to no greater than 63 ℃.

The rate at which the amplification reaction occurs may be increased by increasing the activity of one or more amplification reagents. For example, a cofactor that increases the extension rate of a polymerase may be added to a reaction using the polymerase. In some embodiments, a metal cofactor, such as magnesium, zinc, or manganese, may be added to the polymerase reaction or betaine may be added.

Preparation of immobilized samples for sequencing

The result of bridge amplification is a clonal population of "bridged" amplification products at an amplification site. Both strands of each amplicon are immobilized at the 5' end on the surface of the amplification site, where this attachment results from the original attachment of the capture nucleic acids (see, e.g., fig. 1A, where the double-stranded amplicon 11 is depicted in a "bridging" orientation). The amplicons within an amplification site will be clonal, i.e., derived from the amplification of a single target nucleic acid, or will include an acceptable level of another amplicon as described herein.

In addition to the free 3' ends of each strand of the bridged double-stranded amplicons, a significant amount of unused capture nucleic acid remains on the surface of the amplification site after amplification to form a clonal cluster of bridged amplification products. The presence of unused capture nucleic acids can contribute to increased noise, and they are therefore typically removed by contacting the array with a nuclease under conditions suitable for nuclease digestion of the unused capture nucleic acids. In one embodiment, the nuclease is an exonuclease, e.g., an exonuclease having 3' to 5' single stranded DNA exonuclease activity. An example of such an exonuclease is exonuclease I. Following nuclease treatment, the array is washed to remove the nuclease and the resulting nucleotides and/or nucleic acids from the amplification sites.

To facilitate sequencing, one of the strands of the double-stranded bridge structure can be selectively removed from the surface to allow efficient hybridization of the sequencing primer to the remaining immobilized strand. The selective removal of a particular strand is referred to herein as "linearization". Examples of suitable linearization methods are described herein and in more detail in application number WO 2007/010251 and U.S. patent application publication 2012/0309634.

In one embodiment, linearization is achieved by cleaving one strand of the bridged double-stranded amplicon and then subjecting the resulting structure to conditions that remove the strand that is no longer attached to the surface of the amplification site. Cleavage can be accomplished by using a capture nucleic acid comprising a cleavage site. The cleavage site is usually located at a position that results in a large portion of one strand of the bridged structure no longer being immobilized on the amplification site surface and being easily lost after the removal step. For example, as shown in FIG. 1B, the cleavage site X in the amplicon 11 is cleaved, leaving a shortened capture nucleic acid 13". One strand 16 of the bridged structure remains immobilized at its 5' end to the amplification site 10, and the other strand 15' is no longer immobilized due to cleavage at X. In one embodiment, the 3' end of the strand 16 remains annealed to the complementary bases of the shortened capture nucleic acid 13", thereby maintaining the bridge structure after linearization. The number of complementary bases between the 3' end of the strand 16 and the shortened capture nucleic acid 13" that is needed to maintain the bridge structure varies with prevailing conditions and can be determined by the skilled person.

In one embodiment, the cleavage site is treated to remove a nucleotide base and form an abasic site. An "abasic site" is a nucleotide position in a nucleic acid from which the base component has been removed. Abasic sites can be formed chemically under artificial conditions or by the action of enzymes. Once formed, the abasic sites can be cleaved (e.g., by treatment with an endonuclease or other single-strand cleaving enzyme, or by exposure to heat or alkali), thereby providing a means for site-specific cleavage of nucleic acids.

In one embodiment, abasic sites can be created at predetermined positions on one strand of the immobilized amplicon. This can be achieved, for example, by incorporating specific nucleotides at predetermined positions.

In one embodiment, deoxyuridine (U) is incorporated into one of the capture nucleic acids attached to the surface of the amplification site. Uracil bases can then be removed using the enzyme uracil DNA glycosylase (UDG), creating abasic sites on one strand. A polynucleotide strand comprising an abasic site can then be cleaved at the abasic site by treatment with an endonuclease (e.g., DNA glycosylase-lyase endonuclease VIII), heat, or alkali. In a specific embodiment, a single nucleotide gap is created at the uracil base in the immobilized amplicon using the USER reagent available from New England Biolabs (NEB # M5505S). In one embodiment, the amplification site is exposed to a mixture containing a suitable glycosylase and one or more suitable endonucleases, typically at a ratio of activities of at least about 2:1. Treatment with an endonuclease generates a 3'-phosphate moiety at the cleavage site, which can be removed with a suitable phosphatase, such as alkaline phosphatase. For example, as shown in FIG. 1B, if the cleavage site X is cleaved using a USER reagent, the shortened capture nucleic acid 13" will terminate with a 3'-phosphate group.
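By way of non-limiting illustration, the outcome of uracil excision followed by cleavage at the abasic site can be represented on a plain sequence string. The sketch below is a bookkeeping model only and does not simulate enzyme kinetics; it assumes a strand written 5' to 3' that is attached to the surface at its 5' end and contains exactly one deoxyuridine. The function name, dictionary keys, and example sequence are hypothetical.

```python
def cleave_at_uracil(strand_5_to_3: str) -> dict:
    """Model removal of a single deoxyuridine ('U') and cleavage at the
    resulting abasic site, leaving a one-nucleotide gap.

    The portion 5' of the U stays attached to the surface and ends in a
    3'-phosphate; the portion 3' of the U is no longer immobilized.
    """
    strand = strand_5_to_3.upper()
    if strand.count("U") != 1:
        raise ValueError("this sketch expects exactly one U in the strand")
    pos = strand.index("U")
    return {
        "anchored": strand[:pos],          # shortened, surface-attached part
        "anchored_3_end": "phosphate",     # removable with a phosphatase
        "released": strand[pos + 1:],      # part freed from the surface
    }


if __name__ == "__main__":
    # Hypothetical strand with one U; not a sequence from the disclosure.
    result = cleave_at_uracil("TTTTACGTUCTAGG")
    print(result["anchored"], "-3'-" + result["anchored_3_end"])
    print(result["released"])
```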

In one embodiment, 8-oxoguanine is incorporated into one of the capture nucleic acids attached to the surface of the amplification site. The 8-oxoguanine base can then be removed using FPG glycosylase to create an abasic site on one strand. In another embodiment, deoxyinosine is incorporated into one of the capture nucleic acids attached to the surface of the amplification site, and then the deoxyinosine base can be removed using the enzyme AlkA glycosylase to create an abasic site on one strand.

Advantages of this method include the option of releasing a free 3' phosphate group on the cleaved strand, which after phosphatase treatment can provide a starting point for sequencing a region of the complementary strand (e.g., sequencing a region of strand 16 of fig. 1B). Because the cleavage reaction requires a residue, such as deoxyuridine, that does not occur naturally in DNA but is otherwise independent of sequence context, there is no possibility that glycosylase-mediated cleavage will occur at unwanted positions elsewhere in the duplex if the duplex contains only one such unnatural base. Another advantage gained by cleavage of abasic sites in the double-stranded region of the immobilized amplicon resulting from the action of UDG on uracil is that the first base incorporated in a sequencing-by-synthesis reaction initiated at the free 3' hydroxyl formed by cleavage will always be T. Thus, for all clonal clusters at different amplification sites of an array that are cleaved in this manner to generate a sequencing template, the first base that is universally incorporated throughout the array will be T. This can provide a sequence-independent assay of the individual cluster intensities at the beginning of a sequencing run.

The steps of exonuclease addition and linearization are known to those skilled in the art to be necessarily separate steps. Treatment with endonucleases produces a 3'-phosphate moiety at the cleavage site, and the presence of the 3' phosphate is known to inhibit exonuclease I activity (Lehman and Nussbaum, 1964, J. Biol. Chem., 239:2628-2636). The inventors have made the unexpected and surprising discovery that both exonuclease and linearization steps can occur simultaneously by combining enzymes. Simplifying these two steps to one results in a faster sequencing run because both steps are now performed simultaneously. Furthermore, combining these two steps does not adversely affect the primary metrics, read quality, double indexing or genome construction metrics.

The abasic site generation and cleavage results in a free 5' end on the strand that is no longer immobilized to the surface (e.g., as shown in FIG. 1B, one strand 16 of the bridging structure remains immobilized at its 5' end to the amplification site 10, and the other strand 15' is no longer immobilized due to cleavage at X). This strand can be completely removed from the surface by exposing the amplification site to suitable conditions. In one embodiment, removal is by denaturation. Denaturation can be carried out thermally or isothermally, for example using chemical denaturation. The chemical denaturant may be urea, hydroxide, formamide, or another similar agent. In another embodiment, removal may be achieved by treatment with an exonuclease having 5'-3' activity, such as lambda or T7 exonuclease. Removal of the unattached strand results in a remaining single strand, which can serve as a template for the polymerase.

Optionally, the 3' end of the nucleic acid at the amplification site is repaired. Exonucleases can remove some nucleotides at the 3' end of a nucleic acid after linearization. Without intending to be limiting, it is possible that the 3' end is slightly "breathing" resulting in a small number of nucleotides becoming single stranded and available for exonuclease digestion. Repair can be achieved by exposing the nucleotides to a DNA polymerase, e.g., a DNA polymerase for bridge amplification.

Removal of unattached strands is optional. In one embodiment, the 3'-phosphate group remaining after the creation of the abasic site is removed to leave a 3'-hydroxyl group at the end of the cleaved capture nucleic acid (FIG. 1B, shortened capture nucleic acid 13"). The capture nucleic acid can then be used as a primer for a polymerase having strand displacement activity. As the polymerase synthesizes a complementary strand using the immobilized strand (FIG. 1B, strand 16) as a template, it displaces the unattached strand (FIG. 1B, strand 15').

Compositions

Different compositions can be obtained during or after the amplification clustering process described herein. In one embodiment, the composition comprises a glycosylase, an endonuclease, and an exonuclease. In one embodiment, the exonuclease has 3' to 5' single stranded DNA exonuclease activity, such as exonuclease I. In one embodiment, the glycosylase present in the composition is uracil DNA glycosylase and the endonuclease is DNA glycosylase-lyase endonuclease VIII. In one embodiment, the glycosylase present in the composition is FPG glycosylase and the endonuclease is DNA glycosylase-lyase endonuclease VIII. In one embodiment, the glycosylase present in the composition is AlkA glycosylase and the endonuclease is DNA glycosylase-lyase endonuclease VIII. The composition can include a double stranded DNA substrate that includes a uracil cleavage site, an 8-oxoguanine cleavage site, or a deoxyinosine cleavage site. The composition can comprise a double stranded DNA substrate comprising abasic sites. In some embodiments, the double stranded DNA substrate of a composition may comprise a single stranded region, wherein a uracil cleavage site and an abasic site are present in the double stranded region. Also provided is an array comprising a plurality of amplification sites, wherein each amplification site comprises a plurality of double stranded DNA substrates attached to the amplification site.

Methods for sequencing

Arrays of the present disclosure that have been produced by the methods described herein and include amplified and linearized amplicons at the amplification sites may be used in any of a variety of applications. A particularly useful application is nucleic acid sequencing. One example is Sequencing By Synthesis (SBS). In SBS, extension of a nucleic acid primer along a nucleic acid template (e.g., a target nucleic acid or amplicon thereof) is monitored to determine the sequence of nucleotides in the template. The potential chemical process may be a polymerization reaction (e.g., as catalyzed by a polymerase). In certain polymerase-based SBS embodiments, fluorescently labeled nucleotides are added to the primer (thereby extending the primer) in a template-dependent manner so that the sequence of the template can be determined using detection of the order and type of nucleotides added to the primer. A plurality of different templates at different sites of the arrays described herein may be subjected to SBS techniques under conditions where events occurring for the different templates can be distinguished due to their location in the array.

Flow cells provide a convenient format for housing arrays produced by the methods of the present disclosure and subjected to SBS or other detection techniques that involve repeated delivery of reagents in cycles. For example, to initiate the first SBS cycle, one or more labeled nucleotides, DNA polymerase, etc., can be flowed into or through a flow cell that houses an array of nucleic acid templates. Those sites of the array where primer extension results in incorporation of labeled nucleotides can be detected. Optionally, the labeled nucleotide may further comprise a reversible termination property that terminates further primer extension once the nucleotide has been added to the primer. For example, a nucleotide analog having a reversible terminator moiety can be added to a primer such that subsequent extension does not occur until a deblocking agent is delivered to remove the moiety. Thus, for embodiments using reversible termination, the deblocking agent can be delivered to the flow cell (either before or after detection occurs). Washing may be performed between each delivery step. The cycle may then be repeated n times to extend the primer by n nucleotides, thereby detecting a sequence of length n. Exemplary SBS procedures, fluidic systems, and detection platforms that can be readily adapted for use with arrays produced by the methods of the present disclosure are described, for example, in Bentley et al., Nature 456:53-59 (2008); WO 04/018497; U.S. Pat. No. 7,057,026; WO 91/06678; WO 07/123744; U.S. Pat. No. 7,329,492; U.S. Pat. No. 7,211,414; U.S. Pat. No. 7,315,019; U.S. Pat. No. 7,405,281; and U.S. Pat. No. 8,343,746.
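The cycle described above (incorporate one reversibly terminated nucleotide, detect it, deblock, repeat n times) can be sketched as a simple loop. This is an idealized illustration under stated assumptions: the function name and example template are hypothetical, every cycle is assumed to incorporate exactly one nucleotide without error, and the template is written 3' to 5' so that primer extension reads left to right.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sequence_by_synthesis(template_3_to_5: str, n_cycles: int) -> str:
    """Return the bases detected over n idealized SBS cycles."""
    detected = []
    for cycle in range(min(n_cycles, len(template_3_to_5))):
        # Incorporation: the terminator allows exactly one nucleotide per cycle.
        incorporated = COMPLEMENT[template_3_to_5[cycle]]
        # Detection: record which labeled nucleotide was added at this site.
        detected.append(incorporated)
        # Deblocking: removing the terminator permits the next cycle (a no-op here).
    return "".join(detected)


# Hypothetical template; four cycles yield a four-nucleotide read.
read = sequence_by_synthesis("TACGGA", 4)
print(read)
```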

Other sequencing procedures using cyclic reactions, such as pyrosequencing, may be used. Pyrosequencing detects the release of inorganic pyrophosphate (PPi) when a particular nucleotide is incorporated into a nascent nucleic acid strand (Ronaghi et al., Analytical Biochemistry 242(1), 84-9 (1996); Ronaghi, Genome Res. 11(1), 3-11 (2001); Ronaghi et al., Science 281(5375), 363 (1998); U.S. Pat. Nos. 6,210,891; 6,258,568 and 6,274,320). In pyrosequencing, the released PPi may be detected by immediate conversion to adenosine triphosphate (ATP) by ATP sulfurylase, and the level of ATP produced may be detected by photons produced by luciferase. Thus, the sequencing reaction can be monitored by a luminescence detection system. The pyrosequencing procedure does not require the excitation radiation source used in fluorescence-based detection systems. Useful fluidic systems, detectors, and procedures that can be used to apply pyrosequencing to arrays of the present disclosure are described, for example, in WIPO published patent application 2012/058096, US 2005/0191698 A1, U.S. Pat. No. 7,595,883, and U.S. Pat. No. 7,244,559.
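As a rough illustration of how a pyrosequencing readout arises, the sketch below flows one nucleotide type at a time and records how many bases are incorporated per flow, which stands in for the amount of PPi (and hence light) produced. The function name, the flow order, and the example strand are assumptions, and the model ignores signal noise and incomplete extension.

```python
from itertools import cycle
from typing import List, Tuple


def flowgram(strand_to_synthesize: str, flow_order: str = "TACG") -> List[Tuple[str, int]]:
    """Return (nucleotide, bases incorporated) for each flow until synthesis ends."""
    if not set(strand_to_synthesize) <= set(flow_order):
        raise ValueError("strand contains a base that is never flowed")
    position = 0
    signals = []
    for nucleotide in cycle(flow_order):
        if position >= len(strand_to_synthesize):
            break
        incorporated = 0
        # A run of identical bases is incorporated within a single flow.
        while (position < len(strand_to_synthesize)
               and strand_to_synthesize[position] == nucleotide):
            incorporated += 1
            position += 1
        signals.append((nucleotide, incorporated))
    return signals


# Hypothetical nascent strand; the TT run gives a double signal in the first flow.
signals = flowgram("TTAG")
print(signals)
```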

Sequencing-by-ligation reactions are also useful, including, for example, those described in Shendure et al., Science 309:1728-1732 (2005); U.S. Pat. No. 5,599,675; and U.S. Pat. No. 5,750,341. Some embodiments may include a sequencing-by-hybridization procedure, such as that described in Bains et al., Journal of Theoretical Biology 135(3), 303-7 (1988); Drmanac et al., Nature Biotechnology 16, 54-58 (1998); Fodor et al., Science 251(4995), 767-773 (1991); and WO 1989/10977. In both sequencing-by-ligation and sequencing-by-hybridization procedures, repeated cycles of oligonucleotide delivery and detection are performed on template nucleic acids (e.g., target nucleic acids or amplicons thereof) present at array sites. Fluidic systems for SBS methods as described herein or in references cited herein can be readily adapted to deliver reagents for use in sequencing-by-ligation or sequencing-by-hybridization procedures. Typically, the oligonucleotides are fluorescently labeled and can be detected using a fluorescence detector similar to that described for the SBS procedure herein or in references cited herein.

Some embodiments may employ methods involving monitoring DNA polymerase activity in real time. For example, nucleotide incorporation can be detected by fluorescence resonance energy transfer (FRET) interaction between a fluorophore-bearing polymerase and a gamma-phosphate labeled nucleotide, or using a zero mode waveguide. Techniques and reagents for FRET-based sequencing are described, for example, in Levene et al., Science 299, 682-686 (2003); Lundquist et al., Opt. Lett. 33, 1026-1028 (2008); and Korlach et al., Proc. Natl. Acad. Sci. USA 105, 1176-1181 (2008).

Some SBS embodiments include detecting protons released upon incorporation of nucleotides into the extension products. For example, sequencing based on detection of released protons may use electrical detectors and related technologies available from Ion Torrent (Guilford, CT, a Life Technologies subsidiary), or the sequencing methods and systems described in US 2009/0026082 A1; US 2009/0127589 A1; US 2010/0137143 A1; or US 2010/0282617 A1. The methods described herein for amplifying target nucleic acids can be readily applied to substrates used for detecting protons. More specifically, the methods described herein can be used to produce clonal populations of amplicons at sites of an array that is used to detect protons.

Another useful application of the arrays of the present disclosure, which have been produced, for example, by the methods described herein, is gene expression analysis. Gene expression can be detected or quantified using RNA sequencing techniques, such as those known as digital RNA sequencing. RNA sequencing techniques can be performed using sequencing methods known in the art, such as those described above. Gene expression can also be detected or quantified using hybridization techniques, either by direct hybridization to an array or by multiplex assays whose products are detected on an array. Arrays of the present disclosure that have been produced by the methods described herein can also be used to determine the genotype of genomic DNA samples from one or more individuals. Exemplary methods for array-based expression and genotyping analysis that can be performed on the arrays of the present disclosure are described in U.S. Pat. Nos. 7,582,420; 6,890,741; 6,913,884 or 6,355,431, or U.S. patent publication Nos. 2005/0053980 A1; 2009/0186349 A1 or 2005/0181440 A1.

Another useful application of arrays that have been produced by the methods described herein is single cell sequencing. When combined with an indexing method, single cell sequencing can be used in chromatin accessibility assays to generate a profile of active regulatory elements in thousands of single cells, and single cell whole genome libraries can be generated. Examples of single cell sequencing that can be performed on the arrays of the present disclosure are described in U.S. published patent application 2018/0023119 A1 and U.S. provisional application Ser. Nos. 62/673,023 and 62/680,259.

An advantage of the methods described herein is that they provide for the rapid and efficient creation of arrays from any of a variety of nucleic acid libraries. Accordingly, the present disclosure provides an integrated system that can use one or more of the methods set forth herein to prepare the array and can also detect nucleic acids on the array using techniques known in the art, such as those exemplified herein. Thus, the integrated systems of the present disclosure may include fluidic components, such as pumps, valves, reservoirs, fluidic lines, etc., capable of delivering amplification reagents to the array of amplification sites. A particularly useful fluidic component is a flow cell. Flow cells can be configured and/or used in an integrated system to create an array of the present disclosure and to detect the array. Exemplary flow cells are described, for example, in US 2010/0111768 A1 and U.S. Pat. No. 8,951,781. As exemplified by a flow cell, one or more fluidic components of an integrated system may be used for both amplification methods and detection methods. Taking the nucleic acid sequencing embodiment as an example, one or more fluidic components of the integrated system may be used for an amplification method and for delivery of sequencing reagents in a sequencing method such as those described herein. Alternatively, the integrated system may comprise separate fluidic systems for performing the amplification method and for performing the detection method. Examples of integrated sequencing systems capable of creating nucleic acid arrays and determining nucleic acid sequences include, but are not limited to, the MiSeq™, HiSeq 2500™, NextSeq™, MiniSeq™, NovaSeq™, and iSeq™ sequencing platforms from Illumina, Inc. and devices described in U.S. Pat. No. 8,951,781. Such devices can be modified according to the teachings set forth herein to prepare arrays.

A system capable of performing the methods described herein need not be integrated with a detection device; rather, it may be a stand-alone system or a system integrated with other devices. Fluidic components similar to those exemplified above in the context of an integrated system may be used in such embodiments.

Whether integrated with detection capabilities or not, a system capable of performing the methods presented herein may include a system controller capable of executing a set of instructions to perform one or more steps of the methods, techniques, or processes described herein. For example, the instructions may direct the performance of steps to create an array under bridge amplification conditions. Optionally, the instructions may further direct the performance of the step of detecting the nucleic acid using the methods previously described herein. Useful system controllers may include any processor-based or microprocessor-based system including systems using microcontrollers, Reduced Instruction Set Computers (RISC), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), logic circuits, and any other circuit or processor capable of executing the functions described herein. The set of instructions for the system controller may be in the form of a software program. As used herein, the terms "software" and "firmware" are interchangeable, and include any computer program stored in memory for execution by a computer, including RAM memory, ROM memory, EPROM memory, EEPROM memory, and non-volatile RAM (NVRAM) memory. The software may take various forms, such as system software or application software. Further, the software may take the form of a collection of separate programs, or a program module or portion of a program module within a larger program. The software may also include modular programming in the form of object-oriented programming.
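One way to picture a set of instructions for such a controller is as an ordered list of steps handed to a hardware callback. The sketch below is purely illustrative: the `Step` fields, the `run_protocol` function, and the step names are hypothetical, and the incubation values are borrowed from Example 1 only as placeholders, not as a prescribed workflow.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Step:
    name: str
    reagent: str
    minutes: Optional[float] = None        # None: duration not specified
    temperature_c: Optional[float] = None  # None: temperature not specified


def run_protocol(steps: List[Step], execute: Callable[[Step], None]) -> List[str]:
    """Pass each step to the hardware callback in order; return completed step names."""
    completed = []
    for step in steps:
        execute(step)
        completed.append(step.name)
    return completed


# Hypothetical instruction set; reagent names follow Example 1.
PROTOCOL = [
    Step("amplify", "ExAmp mix with denatured template", 60, 37),
    Step("linearize_and_digest", "USER and exonuclease I mixture", 15, 37),
    Step("wash", "HT2 wash buffer"),
]


def print_step(step: Step) -> None:
    # Stand-in for pump, valve, and heater commands on a real instrument.
    print(f"{step.name}: {step.reagent} ({step.minutes} min at {step.temperature_c} C)")


completed = run_protocol(PROTOCOL, print_step)
```

Keeping the step list as data separate from the execution callback means the same instructions could, in principle, drive either a simulator or real fluidics.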

It will be appreciated that arrays of the present disclosure, for example, that have been produced by the methods described herein need not be used in detection methods. Instead, the array can be used to store a library of nucleic acids. Thus, the array can be stored in a state in which the nucleic acid is preserved. For example, the array may be stored in a dry state, a frozen state (e.g., in liquid nitrogen), or in a solution that protects the nucleic acids. Alternatively, or in addition, the array may be used to replicate a nucleic acid library. For example, the array may be used to create duplicate amplicons from one or more sites on the array.

Referring now to FIG. 1A, a schematic diagram of an amplification site 10 having one member of a plurality of double-stranded amplicons 11 is shown. The depicted double-stranded amplicon 11 comprises a first strand 15 and a second strand 16. Two populations of capture nucleic acids are also shown. Members of the first population are shown either attached (13) to one end of the first strand 15 or bound (13') to the surface of the amplification site 10. Members of the second population are shown either attached (14) to one end of the second strand 16 or bound (14') to the surface of the amplification site 10. The cleavage site (marked with an X on the capture nucleic acid 13) is also shown in FIG. 1A. In one embodiment, the capture nucleic acid 13 may comprise a P5 capture nucleic acid, while the other capture nucleic acid 14 may comprise a P7 capture nucleic acid.

FIG. 1B shows cleavage at cleavage site X in the capture nucleic acid 13 attached to the first strand. Cleavage of the capture nucleic acid 13 results in (i) a shortened strand 15' and (ii) a shortened capture nucleic acid 13". The shortened strand 15' is no longer attached to the amplification site 10. The shortened capture nucleic acid 13" may terminate with a 3' phosphate. In the depicted embodiment, the nucleotides present at the 3' end of strand 16 remain annealed to the nucleotides present at the 3' end of the shortened capture nucleic acid 13". Due to the action of an exonuclease, e.g., exonuclease I, the unattached capture nucleic acids 13' and 14' are no longer present at the amplification site 10.

FIG. 1C shows the results of exposing the amplicons of FIG. 1B to denaturing conditions. The shortened strand 15' that is not attached to the amplification site 10 is removed from the strand 16, and the nucleotides at the 3' end of the strand 16 are no longer annealed to the nucleotides present at the 3' end of the shortened capture nucleic acid 13".

FIG. 1D shows the results of the re-annealing. The attached strand 16 re-anneals to a shortened capture nucleic acid 13 ".

Exemplary embodiments

Embodiment 1. a composition comprising:

a uracil DNA glycosylase,

an endonuclease, and

an exonuclease having 3' to 5' single stranded DNA exonuclease activity.

Embodiment 2 the composition of embodiment 1, wherein said exonuclease is exonuclease I.

Embodiment 3. the composition of embodiment 1 or 2, wherein the endonuclease is DNA glycosylase-lyase endonuclease VIII.

Embodiment 4 the composition of any one of embodiments 1-3, further comprising a double stranded DNA substrate comprising a uracil cleavage site.

Embodiment 5 the composition of any one of embodiments 1-4, further comprising a double stranded DNA substrate comprising an abasic site.

Embodiment 6 the composition of any one of embodiments 4-5, wherein said double stranded DNA substrate comprises a single stranded region, and wherein said uracil cleavage site and said abasic site are present in said double stranded region.

Embodiment 7 the composition of any one of embodiments 4 to 6, further comprising an array comprising a plurality of amplification sites, wherein each amplification site comprises a plurality of said double stranded DNA substrates attached to said amplification site.

Embodiment 8. a method of preparing a nucleic acid for a sequencing reaction, the method comprising:

(a) providing an array comprising a plurality of amplification sites, wherein an amplification site comprises

(i) A plurality of capture nucleic acids attached to the amplification sites,

wherein a first population of the plurality of capture nucleic acids comprises a cleavage site, and

(ii) a plurality of cloned double-stranded modified target nucleic acids,

wherein both strands of each double stranded target nucleic acid are attached at their 5' ends to a capture nucleic acid,

wherein one strand is attached to a capture nucleic acid comprising said cleavage site, and

wherein the cleavage site is located in the double-stranded region of each double-stranded molecule;

(b) contacting the array with a composition comprising at least one enzyme that generates abasic sites at the cleavage sites and an exonuclease comprising 3' to 5' single stranded DNA exonuclease activity,

wherein cleavage occurs at the cleavage site,

wherein cleavage converts one strand of the double-stranded target nucleic acid into a first strand that is attached to the amplification site and a second strand that is not attached to the amplification site; and

wherein the length of the single stranded capture nucleic acid comprising the free 3' end is shortened by the exonuclease.

Embodiment 9 the method of embodiment 8, wherein the at least one enzyme that creates an abasic site at the cleavage site comprises uracil DNA glycosylase and an endonuclease.

Embodiment 10 the method of embodiment 8 or 9, wherein the endonuclease is DNA glycosylase-lyase endonuclease VIII.

Embodiment 11 the method of any one of embodiments 8 to 10, further comprising removing said at least one enzyme that generates an abasic site at a cleavage site and said exonuclease from said array.

Embodiment 12 the method of any one of embodiments 8-11, further comprising subjecting the cleaved double stranded target nucleic acid to conditions that remove the second strand that is not attached to the amplification site.

Embodiment 13 the method of embodiment 12, wherein the conditions that remove the second strands comprise a denaturing agent, wherein the denaturing agent results in immobilized single-stranded nucleic acids comprising target nucleic acids covalently attached to a second population of capture nucleic acids, wherein the second population of capture nucleic acids are attached to the amplification sites.

Embodiment 14 the method of embodiment 13, wherein the denaturing agent comprises formamide.

Embodiment 15 the method of any one of embodiments 11-14, further comprising re-annealing the immobilized single-stranded nucleic acids to a member of the first population of capture nucleic acids to produce immobilized partial single-stranded nucleic acids.

Embodiment 16 the method of any one of embodiments 8-15, wherein the cleavage site is located in the capture nucleic acid region of the double stranded region of each double stranded target nucleic acid.

Embodiment 17 the method of any one of embodiments 8-16, wherein the cleavage site comprises uracil, wherein the uracil DNA glycosylase produces an abasic site, and wherein the endonuclease cleaves the abasic site.

Embodiment 18 the method of any one of embodiments 8 to 17, wherein the exonuclease is exonuclease I.

Embodiment 19 the method of any one of embodiments 13 or 15, further comprising hybridizing a sequencing primer to the single-stranded region of the immobilized single-stranded nucleic acid of embodiment 13 or the immobilized partial single-stranded nucleic acid of embodiment 15, thereby preparing a single-stranded nucleic acid for a sequencing reaction.

Embodiment 20 the method of embodiment 19, further comprising performing a sequencing reaction to determine the sequence of at least one region of the immobilized single-stranded nucleic acid or the immobilized partially single-stranded nucleic acid.

Embodiment 21 the method of embodiment 19, wherein said sequencing reaction comprises sequencing-by-synthesis.

Embodiment 22 the method of any one of embodiments 8 to 21, wherein the array is generated by amplifying a plurality of target nucleic acids using the capture nucleic acids as amplification primers.

Embodiment 23 the method of embodiment 22, wherein amplifying comprises exclusion amplification.

Examples

The invention is illustrated by the following examples. It is to be understood that the specific examples, materials, amounts, and procedures are to be interpreted broadly in accordance with the scope and spirit of the invention as set forth herein.

Example 1

General assay methods and conditions

This example describes the general assay methods and conditions used in the examples described herein, unless otherwise indicated.

Experiments were run on a cBot (ILMN) using a v2.5 HiSeqX flow cell (ILMN). For FIG. 2, enzyme activity was tested using the P5 and P7 surface primers as substrates, and the flow cell was used without amplified clusters. During the experiment, the various enzyme mixtures were pumped into the flow cell and incubated for 15 minutes at 37 °C. Both exonuclease I and USER enzymes were supplied by New England Biolabs. After incubation, the flow cell lanes were washed with HT2 wash buffer (Illumina), and the presence or absence of surface primers was then determined by hybridization of P5' and P7' oligomers labeled with the fluorophore TET in HT1 hybridization buffer (Illumina). The fluorescent signal was detected by scanning on a Typhoon long platform imager (GE Healthcare Life Sciences).

For FIG. 3, cluster seeding and amplification were achieved by mixing denatured DNA template (human TruSeq Nano library) in ExAmp mix to a final concentration of 300 pM, then pumping this into a flow cell and incubating for 1 hour at 37 °C. As detailed in the figure, different lanes of the flow cell were then treated with different combinations of enzymes and treatments. "Repair" refers to bridge amplification cycles, typically 3, which are performed to fill in strand ends that may have been trimmed back during the exonuclease step. "USERExo" refers to a combined mixture of USER and ExoI. After these steps, the clusters were hybridized to sequencing primers and sequenced on HiSeqX using standard methods and reagents (ILMN).

Example 2

It is possible to combine exonuclease and linearization reactions

Standard methods for generating clusters useful for genome sequencing include hybridizing a target nucleic acid to a cluster, amplifying the target nucleic acid, and then treating the cluster with exonuclease I to remove excess free surface primers. In a separate step, the immobilized strand is then linearized by creating a single nucleotide gap in a specific region of double-stranded DNA (dsDNA) to generate a template for sequencing. We tested whether it was possible to combine the exonuclease and linearization steps. Lanes of the flow cell with amplification sites containing attached surface primers were treated with exonuclease I, with an enzyme mixture that generates single nucleotide gaps in specific regions of dsDNA (a combination of uracil DNA glycosylase and DNA glycosylase-lyase endonuclease VIII), or with both.

FIG. 2 shows that no treatment (lanes 1 and 8) or treatment with uracil DNA glycosylase and DNA glycosylase-lyase endonuclease VIII (lane 2) did not result in loss of surface primers, whereas treatment with exonuclease I (lane 5) resulted in loss of substantially all surface primers. Both sequential treatment with uracil DNA glycosylase and DNA glycosylase-lyase endonuclease VIII followed by exonuclease I (lane 3) and treatment with a mixture of exonuclease I combined with uracil DNA glycosylase and DNA glycosylase-lyase endonuclease VIII (lane 4) resulted in the loss of essentially all surface primers. This was unexpected and surprising because treatment of dsDNA with uracil DNA glycosylase and DNA glycosylase-lyase endonuclease VIII in combination produces 3' phosphates, and the presence of 3' phosphates is known to inhibit exonuclease I activity (Lehman and Nussbaum, 1964, J. Biol. Chem., 239:2628-2636).

Example 3

Simultaneous exonuclease treatment and linearization to generate templates useful for sequencing

To determine whether the simultaneous use of DNA glycosylase and exonuclease has any adverse effect on sequencing metrics, sequencing runs were performed on flow cells containing clusters generated using different combinations of DNA glycosylase and exonuclease. The quality scores for each run were determined in quadruplicate and are shown in FIG. 3. As expected, the standard lane (no exonuclease, no repair) containing the sequencing reaction with the P5 primer site had a Q30 value in the mid-50s (see arrow, FIG. 3). All other lanes were approximately equivalent, and the reactions performed in a single step using DNA glycosylase and exonuclease yielded the best read 2 (R2) result, with a Q30 of about 93%.
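For readers unfamiliar with the Q30 metric, the sketch below shows the standard Phred relationship between a quality score and its error probability, and computes the percentage of base calls at or above Q30. The function names and the example scores are illustrative assumptions and are not data from FIG. 3.

```python
from typing import Sequence


def phred_to_error_probability(q: float) -> float:
    """Phred scale: Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def percent_q30(quality_scores: Sequence[int]) -> float:
    """Percentage of base calls whose quality score is at least 30."""
    if not quality_scores:
        raise ValueError("no quality scores supplied")
    passing = sum(1 for q in quality_scores if q >= 30)
    return 100.0 * passing / len(quality_scores)


# Made-up scores for four base calls; three of the four reach Q30.
example_scores = [35, 30, 12, 40]
print(percent_q30(example_scores))
print(phred_to_error_probability(30))  # roughly 1 error in 1000 calls
```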

The complete disclosures of all patents, patent applications, and publications, and electronically available material (including, for example, nucleotide sequence submissions in, e.g., GenBank and RefSeq, and amino acid sequence submissions in, e.g., SwissProt, PIR, PRF, PDB, and translations from annotated coding regions in GenBank and RefSeq) cited herein are incorporated by reference in their entirety. Supplementary materials (e.g., supplementary tables, supplementary figures, supplementary materials and methods and/or supplementary experimental data) cited in the publications are likewise incorporated by reference in their entirety. In the event of any inconsistency between the disclosure of the present application and the disclosure of any document incorporated herein by reference, the disclosure of the present application shall prevail. The foregoing detailed description and examples have been given for clarity of understanding only. No unnecessary limitations are to be understood therefrom. The invention is not limited to the exact details shown and described, and variations readily apparent to those skilled in the art are intended to be included within the invention defined by the claims.

Unless otherwise indicated, all numbers expressing quantities of ingredients, molecular weights, and so forth used in the specification and claims are to be understood as being modified in all instances by the term "about". Accordingly, unless indicated to the contrary, the numerical parameters set forth in the specification and claims are approximations that may vary depending upon the desired properties sought to be obtained by the present invention. At the very least, and not as an attempt to limit the doctrine of equivalents to the scope of the claims, each numerical parameter should at least be construed in light of the number of reported significant digits and by applying ordinary rounding techniques.

Notwithstanding that the numerical ranges and parameters setting forth the broad scope of the invention are approximations, the numerical values set forth in the specific examples are reported as precisely as possible. All numerical values, however, inherently contain a range of values necessarily resulting from the standard deviation found in their respective testing measurements.

Unless otherwise noted, all headings are for the convenience of the reader and should not be used to limit the meaning of the text following the heading.
