Compositions and methods for improving sample identification in indexed nucleic acid libraries

Document No.: 1525112; publication date: 2020-02-11

Abstract: The present technology, "Compositions and methods for improving sample identification in indexed nucleic acid libraries," was devised by Michael Chesney, V. P. Smith, Claire Bevis-Mott, Jonathan Mark Boutell, and Angela Kalbande on 2018-04-23. The present invention relates to compositions and methods for improving the rate of correct sample identification in indexed nucleic acid library preparations for multiplex next generation sequencing by exonuclease treatment, and optionally blocking of the 3' ends, of pooled indexed polynucleotides from multiple samples prior to amplification and sequencing.

1. A composition, comprising:

a first plurality of adaptor-target-adaptor molecules comprising double-stranded target fragments isolated from a first source; and

an exonuclease,

wherein the adaptors comprise first sample-specific universal adaptors,

wherein the first sample-specific universal adaptor comprises

(i) a double-stranded nucleic acid region, and

(ii) a single-stranded non-complementary nucleic acid strand region comprising at least one universal primer binding site,

wherein the first sample-specific universal adaptor further comprises a first set of sample-specific tag sequences that distinguish the first plurality of adaptor-target-adaptor molecules from adaptor-target-adaptor molecules derived from different sources, the first set of sample-specific tag sequences being present in the single-stranded non-complementary nucleic acid strand.

2. The composition of claim 1, further comprising a first sample-specific universal adaptor that is not attached to a target fragment.

3. The composition of claim 1, wherein said single stranded non-complementary nucleic acid strand region further comprises at least one universal extension primer binding site.

4. The composition of claim 1, wherein the exonuclease comprises a 5' to 3' DNA exonuclease activity that prefers double stranded DNA comprising a 5' phosphate at the 5' end of the double stranded nucleic acid region, or a 5' to 3' DNA exonuclease activity that prefers single stranded DNA comprising a 5' phosphate at the 5' end.

5. The composition of claim 4, wherein the exonuclease is a lambda exonuclease.

6. The composition of claim 1, wherein the exonuclease comprises a 5' to 3' DNA exonuclease activity and a 3' to 5' DNA exonuclease activity.

7. The composition of claim 6, wherein the adaptor-target-adaptor molecule comprises a modification at each 3' terminus that blocks the 3' to 5' DNA exonuclease activity.

8. The composition of claim 7, wherein the modification comprises at least one phosphorothioate linkage.

9. The composition of claim 6, wherein the adaptor-target-adaptor molecule comprises a modification at the 5' end of a strand that is part of a region of the single-stranded non-complementary nucleic acid strand that blocks the 5' to 3' DNA exonuclease activity.

10. The composition of claim 9, wherein the modification comprises at least one phosphorothioate linkage.

11. The composition of claim 1, wherein the exonuclease comprises 3' to 5' DNA exonuclease activity, optionally with preference for double stranded DNA with blunt ends and/or with recessed 3' ends.

12. The composition of claim 1, further comprising a second plurality of adaptor-target-adaptor molecules comprising double stranded target fragments isolated from a second source,

wherein the adaptors comprise second sample-specific universal adaptors comprising a second set of sample-specific tag sequences that distinguish the first plurality of adaptor-target-adaptor molecules from the second plurality of adaptor-target-adaptor molecules.

13. The composition of claim 12, wherein the second sample-specific universal adaptor further comprises (i) a double-stranded nucleic acid region and (ii) a single-stranded non-complementary nucleic acid strand region comprising at least one universal primer binding site.

14. The composition of claim 12, wherein the 3' ends of the first plurality of adaptor-target-adaptor molecules, the second plurality of adaptor-target-adaptor molecules, or a combination thereof, are blocked.

15. The composition of claim 2, wherein the 3' end of the first sample-specific universal adaptor that is not attached to a target fragment is blocked.

16. The composition of claim 1, further comprising a terminal deoxynucleotidyl transferase, ddNTP, DNA polymerase, or a combination thereof.

17. A method, the method comprising:

providing a first solution of a plurality of double-stranded target fragments isolated from a first source;

ligating a first sample-specific universal adaptor to both ends of the double-stranded target fragments from said first source to form a first plurality of adaptor-target-adaptor molecules,

wherein each of said first plurality of adaptor-target-adaptor molecules comprises a target fragment flanked by said first sample-specific universal adaptor,

wherein the first sample-specific universal adaptor comprises (i) a double-stranded nucleic acid region and (ii) a single-stranded non-complementary nucleic acid strand region comprising at least one universal primer binding site,

wherein said first sample-specific universal adaptor further comprises a first set of sample-specific tag sequences that distinguish said first plurality of adaptor-target-adaptor molecules from adaptor-target-adaptor molecules derived from different sources, said first set of sample-specific tag sequences being present in said single-stranded non-complementary nucleic acid strand, and

wherein the ligating covalently attaches the double-stranded nucleic acid region of the first sample-specific universal adaptor to each end of the double-stranded target fragment from the first source; and

contacting the solution with an exonuclease, wherein the exonuclease comprises 5' to 3' DNA exonuclease activity that optionally prefers double stranded DNA,

wherein the exonuclease selectively degrades first sample-specific universal adaptors not ligated to target fragments present in the first solution.

18. The method of claim 17, wherein the single-stranded non-complementary nucleic acid strand region further comprises at least one universal extension primer binding site.

19. The method of claim 17, wherein the exonuclease comprises a 5' to 3' DNA exonuclease activity that prefers double stranded DNA comprising a 5' phosphate at the 5' end of the double stranded nucleic acid region.

20. The method of claim 19, wherein the exonuclease is a lambda exonuclease.

21. The method of claim 17, wherein the exonuclease comprises a 5' to 3' DNA exonuclease activity and a 3' to 5' DNA exonuclease activity.

22. The method of claim 21, wherein a first sample-specific universal adaptor that is not attached to a target fragment and the first plurality of adaptor-target-adaptor molecules comprise a modification at each 3' end that blocks the 3' to 5' DNA exonuclease activity.

23. The method of claim 22, wherein the modification comprises at least one phosphorothioate linkage.

24. The method of claim 21, wherein a first sample-specific universal adaptor that is not attached to a target fragment and the first plurality of adaptor-target-adaptor molecules comprise a modification at the 5' end of the strand that is part of the single-stranded non-complementary nucleic acid strand region, the modification blocking the 5' to 3' DNA exonuclease activity.

25. The method of claim 24, wherein the modification comprises at least one phosphorothioate linkage.

26. The method of claim 17, wherein the double stranded nucleic acid region distal to the single stranded non-complementary nucleic acid strand region terminates in a blunt end structure.

27. The method of claim 26, wherein the double-stranded target fragment comprises a blunt-end structure.

28. The method of claim 17, wherein a region of double-stranded nucleic acid distal to the region of single-stranded non-complementary nucleic acid strands terminates in a 3' overhang structure.

29. The method of claim 28, wherein the 3' overhang structure comprises an overhang structure of 1 to 4 nucleotides.

30. The method of claim 28, wherein the 3' overhang structure comprises a T nucleotide overhang.

31. The method of claim 28, wherein the double stranded target fragment comprises a 3' overhang structure complementary to a 3' overhang structure of the double stranded nucleic acid region.

32. The method of claim 17, further comprising:

providing a surface comprising a plurality of amplification sites,

wherein the amplification sites comprise at least two populations of attached single stranded nucleic acids having free 3' ends, and

contacting said surface comprising amplification sites with said first plurality of adaptor-target-adaptor molecules under conditions suitable to produce a plurality of amplification sites, each of said plurality of amplification sites comprising a clonal population of amplicons from an individual adaptor-target-adaptor molecule.

33. The method of claim 32, wherein the number of the first plurality of adaptor-target-adaptor molecules exceeds the number of amplification sites, wherein the first plurality of adaptor-target-adaptor molecules have fluidic access to the amplification sites, and wherein each of the amplification sites comprises a capacity for a number of adaptor-target-adaptor molecules of the first plurality of adaptor-target-adaptor molecules.

34. The method of claim 32, wherein said contacting comprises simultaneously (i) transporting said first plurality of adaptor-target-adaptor molecules to said amplification sites at an average transport rate, and (ii) amplifying said first plurality of adaptor-target-adaptor molecules at said amplification sites at an average amplification rate, wherein said average amplification rate exceeds said average transport rate.

35. The method of claim 17, further comprising:

providing a second solution of a plurality of double-stranded target fragments isolated from a second source;

ligating a second sample-specific universal adaptor to both ends of the double-stranded target fragments from said second source to form a second plurality of adaptor-target-adaptor molecules,

wherein each of the second plurality of adaptor-target-adaptor molecules comprises a target fragment from the second source flanked by the second sample-specific universal adaptor,

wherein the second sample-specific universal adaptor comprises (i) a double-stranded nucleic acid region and (ii) a single-stranded non-complementary nucleic acid strand region comprising at least one universal primer binding site,

wherein said second sample-specific universal adaptor further comprises a second set of sample-specific tag sequences that distinguish said second plurality of adaptor-target-adaptor molecules from adaptor-target-adaptor molecules derived from a different source, said second set of sample-specific tag sequences being present in said single-stranded non-complementary nucleic acid strand, and

wherein the ligating covalently attaches the double-stranded nucleic acid region of the second sample-specific universal adaptor to each end of the double-stranded target fragment from the second source; and

contacting the solution with an exonuclease, wherein the exonuclease comprises 5' to 3' DNA exonuclease activity that optionally prefers double stranded DNA,

wherein the exonuclease selectively degrades second sample-specific universal adaptors not ligated to target fragments present in the second solution.

36. The method of claim 35, wherein said single stranded non-complementary nucleic acid strand region further comprises at least one universal extension primer binding site.

37. The method of claim 35, further comprising blocking the 3' ends of the first plurality of adaptor-target-adaptor molecules and the second plurality of adaptor-target-adaptor molecules.

38. The method of claim 37, wherein the blocking comprises enzymatically incorporating dideoxynucleotides to the 3' ends of the first plurality of adaptor-target-adaptor molecules and the second plurality of adaptor-target-adaptor molecules and to the 3' ends of the first sample-specific universal adaptors and the second sample-specific universal adaptors that are not attached to target fragments.

39. The method of claim 35, further comprising:

providing a surface comprising a plurality of amplification sites,

wherein the amplification sites comprise at least two populations of attached single stranded nucleic acids having free 3' ends, and

contacting said surface comprising amplification sites with said first plurality of adaptor-target-adaptor molecules and said second plurality of adaptor-target-adaptor molecules under conditions suitable to produce a plurality of amplification sites, each of said plurality of amplification sites comprising a clonal population of amplicons from an individual adaptor-target-adaptor molecule.

40. The method of claim 39, wherein the number of the first plurality of adaptor-target-adaptor molecules and the second plurality of adaptor-target-adaptor molecules exceeds the number of amplification sites, wherein the first plurality of adaptor-target-adaptor molecules and the second plurality of adaptor-target-adaptor molecules have fluidic access to the amplification sites, and wherein each of the amplification sites comprises a capacity for a number of adaptor-target-adaptor molecules of the first plurality of adaptor-target-adaptor molecules and the second plurality of adaptor-target-adaptor molecules.

41. The method of claim 39, wherein said contacting comprises simultaneously (i) transporting said first plurality of adaptor-target-adaptor molecules and said second plurality of adaptor-target-adaptor molecules to said amplification sites at an average transport rate, and (ii) amplifying said first plurality of adaptor-target-adaptor molecules and said second plurality of adaptor-target-adaptor molecules at said amplification sites at an average amplification rate, wherein said average amplification rate exceeds said average transport rate.

42. A method, the method comprising:

providing a first solution of a plurality of double-stranded target fragments isolated from a first source;

ligating a first sample-specific universal adaptor to both ends of the double-stranded target fragments from said first source to form a first plurality of adaptor-target-adaptor molecules,

wherein each of said first plurality of adaptor-target-adaptor molecules comprises a target fragment flanked by said first sample-specific universal adaptor,

wherein the first sample-specific universal adaptor comprises (i) a double-stranded nucleic acid region and (ii) a single-stranded non-complementary nucleic acid strand region comprising at least one universal primer binding site,

wherein the first sample-specific universal adaptor further comprises a first set of sample-specific tag sequences that distinguish the first plurality of adaptor-target-adaptor molecules from adaptor-target-adaptor molecules derived from different sources, the first set of sample-specific tag sequences being present in the single-stranded non-complementary nucleic acid strand, and

wherein the ligating covalently attaches the double-stranded nucleic acid region of the first sample-specific universal adaptor to each end of the double-stranded target fragment from the first source; and

contacting the solution with an exonuclease, wherein the exonuclease comprises 3' to 5' exonuclease activity optionally preferring DNA having blunt or recessed 3' ends,

wherein the ligating further forms a plurality of incomplete products comprising adaptor-target molecules, and wherein the exonuclease selectively degrades adaptor-target molecules and first sample-specific universal adaptors not ligated to target fragments present in the first solution.

43. The method of claim 42, wherein said single stranded non-complementary nucleic acid strand region further comprises at least one universal extension primer binding site.

44. The method of claim 42, wherein the exonuclease is exonuclease III.

45. The method of claim 42, wherein the double stranded nucleic acid region distal to the single stranded non-complementary nucleic acid strand region terminates in a blunt end structure.

46. The method of claim 45, wherein the double-stranded target fragment comprises a blunt-end structure.

47. The method of claim 42, wherein the region of double stranded nucleic acid distal to the region of single stranded non-complementary nucleic acid strands terminates in a 3' overhang structure.

48. The method of claim 47, wherein the 3' overhang structure comprises an overhang structure of no more than 4 nucleotides.

49. The method of claim 47, wherein the 3' overhang structure comprises a T nucleotide overhang.

50. The method of claim 47, wherein the double stranded target fragment comprises a 3' overhang structure complementary to a 3' overhang structure of the double stranded nucleic acid region.

51. The method of claim 42, further comprising:

providing a surface comprising a plurality of amplification sites,

wherein the amplification sites comprise at least two populations of attached single stranded nucleic acids having free 3' ends, and

contacting the surface comprising amplification sites with the first plurality of adaptor-target-adaptor molecules under conditions suitable to produce a plurality of amplification sites, each of the plurality of amplification sites comprising a clonal population of amplicons from an individual adaptor-target-adaptor molecule.

52. The method of claim 51, wherein the number of the first plurality of adaptor-target-adaptor molecules exceeds the number of amplification sites, wherein the first plurality of adaptor-target-adaptor molecules have fluidic access to the amplification sites, and wherein each of the amplification sites comprises a capacity for a number of adaptor-target-adaptor molecules of the first plurality of adaptor-target-adaptor molecules.

53. The method of claim 51, wherein said contacting comprises simultaneously (i) transporting said first plurality of adaptor-target-adaptor molecules to said amplification sites at an average transport rate, and (ii) amplifying said first plurality of adaptor-target-adaptor molecules at said amplification sites at an average amplification rate, wherein said average amplification rate exceeds said average transport rate.

54. The method of claim 42, further comprising:

providing a second solution of a plurality of double-stranded target fragments isolated from a second source;

ligating a second sample-specific universal adaptor to both ends of the double-stranded target fragments from said second source to form a second plurality of adaptor-target-adaptor molecules,

wherein each of the second plurality of adaptor-target-adaptor molecules comprises a target fragment from the second source flanked by the second sample-specific universal adaptor,

wherein the second sample-specific universal adaptor comprises (i) a double-stranded nucleic acid region and (ii) a single-stranded non-complementary nucleic acid strand region comprising at least one universal primer binding site,

wherein said second sample-specific universal adaptor further comprises a second set of sample-specific tag sequences that distinguish said second plurality of adaptor-target-adaptor molecules from adaptor-target-adaptor molecules derived from a different source, said second set of sample-specific tag sequences being present in said single-stranded non-complementary nucleic acid strand, and

wherein the ligating covalently attaches the double-stranded nucleic acid region of the second sample-specific universal adaptor to each end of the double-stranded target fragment from the second source; and

contacting the solution with an exonuclease, wherein the exonuclease comprises 3' to 5' exonuclease activity optionally preferring DNA having blunt or recessed 3' ends,

wherein the ligating further forms a plurality of incomplete products comprising adaptor-target molecules, and wherein the exonuclease selectively degrades adaptor-target molecules and second sample-specific universal adaptors not ligated to target fragments present in the second solution.

55. The method of claim 54, wherein said single stranded non-complementary nucleic acid strand region further comprises at least one universal extension primer binding site.

56. The method of claim 54, further comprising blocking the 3' ends of said first plurality of adaptor-target-adaptor molecules and said second plurality of adaptor-target-adaptor molecules.

57. The method of claim 56, wherein said blocking comprises enzymatically incorporating dideoxynucleotides to the 3' ends of said first and second pluralities of adaptor-target-adaptor molecules and to the 3' ends of said first and second sample-specific universal adaptors that are not attached to a target fragment.

58. The method of claim 54, further comprising:

providing a surface comprising a plurality of amplification sites,

wherein the amplification sites comprise at least two populations of attached single stranded nucleic acids having free 3' ends, and

contacting said surface comprising amplification sites with said first plurality of adaptor-target-adaptor molecules and said second plurality of adaptor-target-adaptor molecules under conditions suitable to produce a plurality of amplification sites, each of said plurality of amplification sites comprising a clonal population of amplicons from an individual adaptor-target-adaptor molecule.

59. The method of claim 58, wherein the number of the first plurality of adaptor-target-adaptor molecules and the second plurality of adaptor-target-adaptor molecules exceeds the number of amplification sites, wherein the first plurality of adaptor-target-adaptor molecules and the second plurality of adaptor-target-adaptor molecules have fluidic access to the amplification sites, and wherein each of the amplification sites comprises a capacity for a number of adaptor-target-adaptor molecules of the first plurality of adaptor-target-adaptor molecules and the second plurality of adaptor-target-adaptor molecules.

60. The method of claim 58, wherein said contacting comprises simultaneously (i) transporting said first plurality of adaptor-target-adaptor molecules and said second plurality of adaptor-target-adaptor molecules to said amplification sites at an average transport rate, and (ii) amplifying said first plurality of adaptor-target-adaptor molecules and said second plurality of adaptor-target-adaptor molecules at said amplification sites at an average amplification rate, wherein said average amplification rate exceeds said average transport rate.

Brief Description of Drawings

The following detailed description of specific embodiments of the present disclosure can be best understood when read in conjunction with the following drawings.

FIGS. 1, 2, 3, and 4 are schematic illustrations of various embodiments of adaptors according to aspects of the disclosure presented herein.

FIGS. 5, 6, 7, and 8 are schematic diagrams of embodiments of template polynucleotides having adaptor-target-adaptor molecules (which may include adaptors or portions thereof substantially as shown in FIGS. 1, 2, 3, or 4, respectively) according to aspects of the disclosure presented herein.

FIGS. 9A and 9B illustrate the nature of the index hopping phenomenon. FIG. 9A shows how reads from a given sample are incorrectly demultiplexed and mixed with a different sample after demultiplexing. FIG. 9B demonstrates index hopping in a dual index system, where it results in unexpected combinations of index tag sequences.

FIGS. 10A and 10B illustrate a general method of measuring the index hopping rate in a given system. FIG. 10A shows an exemplary layout of a dual adaptor daughter plate, where each individual well of a 96-well plate contains a unique pair of index tag sequences. FIG. 10B shows an experimental setup aimed at measuring the index hopping rate, where only unique dual index tag combinations are used.

FIGS. 11A and 11B illustrate the effect of unligated adaptors on index hopping rates. FIG. 11A shows a 6-fold increase in index hopping associated with a 50% spike-in of free adaptors. FIG. 11B shows the approximately linear effect of free forked adaptors on the index hopping rate over the range tested.

FIGS. 12A and 12B illustrate the effect of exonuclease treatment according to the invention on index hopping rates with (FIG. 12B) and without (FIG. 12A) 3' blocking in an Illumina PCR-free library preparation workflow.

FIG. 13 illustrates the effect of combined exonuclease treatment and 3' blocking treatment according to the invention on index hopping rates with and without a free adaptor spike-in in an Illumina PCR-free library preparation workflow.

The schematic drawings are not necessarily to scale. Like numbers used in the figures refer to like components, steps, etc. It will be understood, however, that the use of a number to refer to a component in a given figure is not intended to limit the component to the same number in another figure. Moreover, the use of different numbers to refer to multiple components is not intended to indicate that the differently numbered components cannot be the same or similar to other numbered components.

Detailed Description

Provided herein are compositions and methods, e.g., methods for preparing libraries and methods for mitigating the impact of index hopping on sequencing data quality.

Double stranded target fragment

In one embodiment, the composition comprises more than one double-stranded target fragment. The terms "target fragment," "target nucleic acid fragment," "target molecule," "target nucleic acid molecule," and "target nucleic acid" are used interchangeably to refer to a nucleic acid molecule, such as on an array, for which sequencing is desired. The target nucleic acid can be essentially any nucleic acid of known or unknown sequence. For example, the target nucleic acid can be a fragment of genomic DNA or cDNA. Sequencing may result in the determination of the sequence of all or part of the target molecule. The target may be derived from an initial nucleic acid sample that has been randomly fragmented. In one embodiment, the target may be processed into a template suitable for amplification by placing a universal amplification sequence, such as a sequence present in a universal adaptor, at the end of each target fragment. Targets can also be obtained from initial RNA samples by reverse transcription into cDNA.

The initial nucleic acid sample may be derived from a sample in a double stranded DNA (dsDNA) form (e.g., genomic DNA fragments, PCR and amplification products, etc.) or may originate from a single stranded form, such as DNA or RNA, originating from the sample and converted to a dsDNA form. For example, mRNA molecules can be copied into double-stranded cDNA suitable for use in the methods described herein using standard techniques well known in the art. The exact sequence of the polynucleotide molecules from the initial nucleic acid sample is generally not important to the invention and may be known or unknown.

In one embodiment, the polynucleotide molecules from the initial nucleic acid sample are RNA molecules. In one aspect of this embodiment, RNA isolated from a particular sample is first converted to double-stranded DNA using techniques known in the art. In accordance with the present disclosure, the double-stranded DNA is then tagged or indexed with a sample-specific tag, whether the starting material was isolated in RNA or DNA form. Typically, the sample-specific tag is present as part of a universal adaptor. Different preparations of such double-stranded DNA comprising sample-specific tags may be generated in parallel from RNA isolated from different samples. Subsequently, the different preparations of double-stranded DNA comprising different sample-specific tags can be mixed and sequenced en masse, and the identity of each sequenced target fragment determined with respect to the sample from which it was isolated or derived by virtue of the presence of the sample-specific tag. Under certain conditions, index hopping results in sample-specific tags that label different sources being mixed or combined such that a single target fragment has, for example, a sample-specific tag identifying one source at one end and a sample-specific tag identifying a different source at the other end. This can lead to sample cross-contamination, which can confound the results of the sequencing experiment. The approach described herein reduces index hopping.
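By way of illustration only, the following minimal Python sketch shows how reads carrying a sample-specific tag at each end might be assigned to their source sample, and how reads whose two tags identify different sources might be counted to estimate a hopping rate. It is not part of the disclosed methods. The tag names `i7` and `i5`, the six-base tag sequences, the sample names, and the function names are all hypothetical and are assumed here only for the example; the sketch also assumes that every sample is given a unique pair of tags, so that a mismatched pair of known tags can be recognized.

```python
from collections import Counter

# Hypothetical sample sheet: each source is assigned a unique pair of tags.
# The sequences and sample names are invented for this example.
SAMPLE_SHEET = {
    ("ATCACG", "TTAGGC"): "sample_A",
    ("CGATGT", "TGACCA"): "sample_B",
}


def classify_read(i7, i5, sample_sheet=SAMPLE_SHEET):
    """Return the sample name, 'hopped', or 'unknown' for one tag pair."""
    if (i7, i5) in sample_sheet:
        return sample_sheet[(i7, i5)]
    known_i7 = {pair[0] for pair in sample_sheet}
    known_i5 = {pair[1] for pair in sample_sheet}
    # Both tags are valid, but they identify different sources.
    if i7 in known_i7 and i5 in known_i5:
        return "hopped"
    # At least one tag is not in the sample sheet (e.g., a sequencing error).
    return "unknown"


def index_hopping_rate(reads, sample_sheet=SAMPLE_SHEET):
    """Return (hopped fraction among reads with two known tags, label counts)."""
    counts = Counter(classify_read(i7, i5, sample_sheet) for i7, i5 in reads)
    hopped = counts["hopped"]
    assigned = sum(
        n for label, n in counts.items() if label not in ("hopped", "unknown")
    )
    total = hopped + assigned
    rate = hopped / total if total else 0.0
    return rate, counts


if __name__ == "__main__":
    demo_reads = (
        [("ATCACG", "TTAGGC")] * 3      # correctly tagged, sample_A
        + [("CGATGT", "TGACCA")] * 2    # correctly tagged, sample_B
        + [("ATCACG", "TGACCA")]        # tags from two different sources
        + [("NNNNNN", "TTAGGC")]        # unrecognized tag
    )
    rate, counts = index_hopping_rate(demo_reads)
    print(dict(counts), rate)
```

The sketch only detects hopping events that yield an unexpected pair of tags. If the same tag were shared between samples, a hopped read could be indistinguishable from a correctly tagged one, which is why the unique-pair assumption is stated above.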

In one embodiment, the initial polynucleotide molecules from the initial nucleic acid sample are DNA molecules. More particularly, the initial polynucleotide molecule represents the entire genetic complement of an organism and is a genomic DNA molecule that includes both intron and exon sequences, as well as non-coding regulatory sequences such as promoter and enhancer sequences. In one embodiment, a specific subset of polynucleotide sequences or genomic DNA may be used, such as, for example, a specific chromosome. Still more particularly, the sequence of the original polynucleotide molecule is unknown. Still yet more particularly, the initial polynucleotide molecule is a human genomic DNA molecule. The DNA target fragments may be chemically or enzymatically treated before or after any random fragmentation process, and before or after ligation of universal adaptor sequences.

As defined herein, "sample" and its derivatives are used in their broadest sense and include any specimen, culture, etc. suspected of containing a target. In some embodiments, the sample comprises nucleic acid in DNA, RNA, PNA, LNA, chimeric or hybrid form. The sample may comprise any biological, clinical, surgical, agricultural, atmospheric or aquatic based specimen containing one or more nucleic acids. The term also includes any isolated nucleic acid sample, such as genomic DNA, or fresh-frozen or formalin-fixed paraffin-embedded nucleic acid samples. It is also contemplated that the sample may be a nucleic acid sample from a single individual, a collection of nucleic acid samples from genetically related members, nucleic acid samples from genetically unrelated members, matched nucleic acid samples from a single individual such as a tumor sample and a normal tissue sample, a sample from a single source containing two different forms of genetic material such as maternal and fetal DNA obtained from a maternal subject, or a sample containing plant or animal DNA that is contaminated with bacterial DNA. In some embodiments, the source of nucleic acid material may include nucleic acid obtained from a neonate, such as is commonly used for neonatal screening.

The nucleic acid sample may comprise high molecular weight material such as genomic DNA (gDNA). The sample may comprise low molecular weight material such as nucleic acid molecules obtained from FFPE or archived DNA samples. In another embodiment, the low molecular weight material comprises enzymatically or mechanically fragmented DNA. The sample may comprise cell-free circulating DNA. In some embodiments, the sample may include nucleic acid molecules obtained from biopsies, tumors, scrapings, swabs, blood, mucus, urine, plasma, semen, hair, laser capture microdissection, surgical resection, and other clinically or laboratory obtained samples. In some embodiments, the sample may be an epidemiological, agricultural, forensic, or pathogenic sample. In some embodiments, the sample may comprise nucleic acid molecules obtained from an animal, such as a human or mammalian source. In another embodiment, the sample may comprise nucleic acid molecules obtained from a non-mammalian source such as a plant, bacterium, virus, or fungus. In some embodiments, the source of the nucleic acid molecule may be an archived or extinct sample or species.

Furthermore, the methods and compositions disclosed herein can be used to amplify nucleic acid samples with low quality nucleic acid molecules, such as degraded and/or fragmented genomic DNA from forensic samples. In one embodiment, a forensic sample may comprise nucleic acid obtained from a crime scene, nucleic acid obtained from a missing persons DNA database, nucleic acid obtained from a laboratory associated with a forensic investigation, or a forensic sample obtained by a law enforcement agency, one or more military services, or any such personnel. The nucleic acid sample may be a purified sample or a crude DNA-containing lysate, e.g. derived from a buccal swab, paper, fabric or other substrate that may be impregnated with saliva, blood or other body fluids. Thus, in some embodiments, a nucleic acid sample may comprise a small amount of DNA, or fragmented portions of DNA, such as genomic DNA. In some embodiments, the target sequence may be present in one or more bodily fluids including, but not limited to, blood, sputum, plasma, semen, urine, and serum. In some embodiments, the target sequence may be obtained from hair, skin, tissue samples, necropsy, or remains of a victim. In some embodiments, the nucleic acid comprising one or more target sequences can be obtained from a deceased animal or human. In some embodiments, the target sequence may include nucleic acids obtained from non-human DNA such as microbial, plant, or entomological DNA. In some embodiments, the target sequence or amplified target sequence is used for human identification. In some embodiments, the present disclosure relates generally to methods for identifying characteristics of a forensic sample. In some embodiments, the present disclosure generally relates to human identification methods using one or more target-specific primers disclosed herein or one or more target-specific primers designed using the primer design criteria outlined herein. In one embodiment, a forensic or human identification sample comprising at least one target sequence may be amplified using any one or more of the target-specific primers disclosed herein or using the primer design criteria outlined herein.

Additional non-limiting examples of sources of biological samples may include whole organisms as well as samples obtained from patients. Biological samples can be obtained from any biological fluid or tissue and can be in a variety of forms, including fluid tissue, solid tissue, and preserved forms, such as dried, frozen, and fixed forms. The sample may be any biological tissue, cell or fluid. Such samples include, but are not limited to, sputum, blood, serum, plasma, blood cells (e.g., white blood cells), ascitic fluid, urine, saliva, tears, vaginal fluid (discharge), washings obtained during medical procedures (e.g., pelvic or other washings obtained during biopsy, endoscopy or surgery), tissue, nipple aspirates, core or fine needle biopsy samples, cell-containing body fluids, free floating nucleic acids, peritoneal and pleural fluids, or cells therefrom. Biological samples may also include sections of tissue, such as frozen or fixed sections taken for histological purposes, or microdissected cells or extracellular portions thereof. In some embodiments, the sample may be a blood sample, such as, for example, a whole blood sample. In another example, the sample is an untreated Dried Blood Spot (DBS) sample. In yet another example, the sample is a Formalin Fixed Paraffin Embedded (FFPE) sample. In yet another example, the sample is a saliva sample. In yet another example, the sample is a Dried Saliva Spot (DSS) sample.

Random fragmentation refers to the fragmentation of polynucleotide molecules from an initial nucleic acid sample in a non-ordered fashion by enzymatic, chemical or mechanical means. Such fragmentation methods are known in the art and use standard methods (Sambrook and Russell, Molecular Cloning, A Laboratory Manual, third edition). In one embodiment, fragmentation is by a method disclosed in Gunderson et al. (WO 2016/130704). For clarity, generating smaller fragments of a larger piece of nucleic acid by specific PCR amplification of those smaller fragments is not equivalent to fragmenting the larger piece of nucleic acid, because the larger piece of nucleic acid remains intact (i.e., it is not fragmented by the PCR amplification). In addition, random fragmentation is designed to produce fragments irrespective of the sequence identity or position of the nucleotides comprising and/or surrounding the break. More particularly, random fragmentation is by mechanical means such as nebulization or sonication to produce fragments of about 50 base pairs in length to about 1500 base pairs in length, still more particularly 50-700 base pairs in length, and yet more particularly 50-400 base pairs in length. Most particularly, the method is used to generate smaller fragments of from 50 to 150 base pairs in length.

Fragmentation of polynucleotide molecules by mechanical means (e.g., nebulization, sonication, and HydroShear) yields fragments with a heterogeneous mixture of blunt ends and 3'- and 5'-overhanging ends. It is therefore desirable to repair the ends of the fragments using methods or kits known in the art, such as the Lucigen DNATerminator End Repair Kit, to produce ends that are optimal for insertion into, for example, the blunt sites of cloning vectors. In a particular embodiment, the ends of the fragments of the population of nucleic acids are blunt-ended. More particularly, the fragment ends are blunt-ended and phosphorylated. The phosphate moiety can be introduced by enzymatic treatment, for example using a polynucleotide kinase.

In a particular embodiment, the target fragment sequences are prepared with single overhanging nucleotides by, for example, the activity of certain types of DNA polymerase, such as Taq polymerase or Klenow exo minus polymerase, which have a template-independent terminal transferase activity that adds a single deoxynucleotide, for example deoxyadenosine (A), to the 3' ends of a DNA molecule such as a PCR product. Such enzymes can be used to add a single nucleotide "A" to the blunt-ended 3' terminus of each strand of a double-stranded target fragment. Thus, an "A" can be added to the 3' terminus of each end-repaired strand of the double-stranded target fragment by reaction with Taq or Klenow exo minus polymerase, while the universal adaptor polynucleotide construct can be a T-construct with a compatible "T" overhang present on the 3' terminus of each double-stranded nucleic acid region of the universal adaptor. This end modification also prevents self-ligation of both the vector and the target, so that there is a bias towards formation of the combined ligated adaptor-target-adaptor molecules.

Universal adaptor

The method comprises attaching a universal adaptor to each end of a double stranded target fragment isolated from a source to obtain an adaptor-target-adaptor molecule. Attachment can be by standard library preparation techniques using ligation, or by tagmentation using transposase complexes (Gunderson et al., WO 2016/130704).

In one embodiment, the double stranded target fragments of each particular fragmented sample are processed by first ligating the same universal adaptor molecule ("mismatched adaptor", the general characteristics of which are defined below and further described in co-pending application, Gormley et al, US7,741,463 and Bignell et al, US8,053,192) to the 5 'and 3' ends of the double stranded target fragments (which may be of known, partially known or unknown sequence) to form adaptor-target-adaptor molecules. In one embodiment, the universal adaptors include all sequences necessary for immobilization of adaptor-target-adaptor molecules on the array for subsequent sequencing. In another embodiment, a PCR step is used to further modify the universal adaptors present in each adaptor-target-adaptor molecule prior to immobilization and sequencing. For example, an initial primer extension reaction is performed using universal primer binding sites, where extension products complementary to both strands of each individual adaptor-target-adaptor molecule are formed and universal extension primer sites are added. The resulting primer extension products and optionally amplified copies thereof together provide a library of template polynucleotides which can be immobilized and then sequenced. The terms universal primer binding site and universal extension primer site are described in detail herein. The term library refers to a collection of target fragments comprising known consensus sequences at the 3 'end and the 5' end of the target fragment, and may also be referred to as 3 'and 5' modified libraries.

The universal adaptor polynucleotides used in the methods of the present disclosure are referred to herein as "mismatched" adaptors because, as will be explained in detail herein, adaptors include regions of sequence mismatch, i.e., they are not formed by annealing of perfectly complementary polynucleotide strands.

Mismatched adaptors for use herein are formed from the annealing of two partially complementary polynucleotide strands to provide at least one double-stranded region, also referred to as a double-stranded nucleic acid region, and at least one mismatched single-stranded region, also referred to as a single-stranded non-complementary nucleic acid strand region, when the two strands are annealed.

The "double-stranded region" of the universal adaptor is a short double-stranded region, typically comprising 5 or more contiguous base pairs, formed by annealing of two partially complementary polynucleotide strands. The term refers to the double-stranded region of a nucleic acid in which the two strands anneal and does not imply any particular structural conformation. As used herein, the term "double-stranded," when used in reference to a nucleic acid molecule, means that substantially all of the nucleotides in the nucleic acid molecule are hydrogen-bonded to a complementary nucleotide. A partially double-stranded nucleic acid can have at least 10%, 25%, 50%, 60%, 70%, 80%, 90%, or 95% of its nucleotides hydrogen bonded to a complementary nucleotide.

Generally, it is beneficial that the double stranded region is as short as possible without loss of function. In this context, "functional" refers to the ability of a double-stranded region to form a stable duplex under standard reaction conditions for enzyme-catalyzed nucleic acid ligation reactions, which will be well known to the skilled reader (e.g., incubation at a temperature in the range of 4 ℃ to 25 ℃) such that both strands forming the universal adaptor remain partially annealed during ligation of the universal adaptor to the target molecule. It is not absolutely necessary that the double-stranded region is stable under the conditions normally used in primer extension or the annealing step of the PCR reaction.

Because the same universal adaptor is ligated to both ends of each target molecule, the target sequence in each adaptor-target-adaptor molecule will be flanked by complementary sequences derived from the double stranded region of the universal adaptor. In an adaptor-target-adaptor construct, the longer the double stranded region and thus the longer the complementary sequence derived therefrom, the greater the likelihood that the adaptor-target-adaptor construct will be able to fold back (fold back) in those regions with internal self-complementarity and base pair with itself under the annealing conditions used in primer extension and/or PCR. Therefore, it is generally preferred that the double-stranded region be 20 or less, 15 or less, or 10 or less base pairs in length to reduce this effect. The stability of the double-stranded region may be increased by including an unnatural nucleotide that exhibits stronger base pairing than a standard Watson-Crick base pair, and thus its length may be reduced.

In one embodiment, the two strands of the universal adaptor are 100% complementary in the double-stranded region. It will be understood that one or more nucleotide mismatches may be tolerated within the duplex region, as long as the two strands are capable of forming a stable duplex under standard ligation conditions.

A universal adaptor for use herein will typically comprise a double stranded region that forms the "ligatable" end of the adaptor (i.e. the end that is ligated to a double stranded target fragment in a ligation reaction). The ligatable end of the universal adaptor may be blunt or, in other embodiments, a short 5' or 3' overhang of one or more nucleotides may be present to facilitate ligation. The 5' terminal nucleotide at the ligatable end of the universal adaptor is typically phosphorylated to enable formation of a phosphodiester linkage to a 3' hydroxyl group on the target polynucleotide.

The term "unmatched region" refers to a region of the universal adaptor, i.e., a region of a single-stranded non-complementary nucleic acid strand, wherein the sequence of the two polynucleotide strands forming the universal adaptor exhibit a degree of non-complementarity such that the two strands cannot fully anneal to each other under standard annealing conditions for primer extension or PCR reactions. One or more regions that do not match may exhibit some degree of annealing under standard reaction conditions for enzyme-catalyzed ligation reactions, so long as both strands revert to single-stranded form under annealing conditions in the amplification reaction.

The region of the single stranded non-complementary nucleic acid strand comprises at least one universal primer binding site. The universal primer binding site is a universal sequence that can be used to amplify and/or sequence a target fragment ligated to the universal adaptor.

The region of the single-stranded non-complementary nucleic acid strand further comprises at least one sample-specific tag. The methods of the invention use sample-specific tags as characteristic markers of the source of a particular target fragment on an array. Typically, the sample-specific tag is a synthetic sequence of nucleotides and the sample-specific tag is part of a universal adaptor that is added to the target fragment as part of a template or library preparation step. Thus, a sample-specific tag is a nucleic acid sequence tag attached to each target molecule of a particular sample, the presence of which is indicative of or used to identify the sample or source from which the target molecule was isolated.
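The role of the sample-specific tag as a marker of source can be illustrated with a minimal sketch. It assumes that the tag has already been read separately from the target fragment and that only exact tag matches are accepted; the tag sequences, sample names and the function `assign_reads` are hypothetical and are not taken from the present disclosure.

```python
from collections import defaultdict

# Hypothetical tag-to-sample table; these sequences are invented for illustration.
SAMPLE_TAGS = {"ACGTAC": "sample_1", "TGCATG": "sample_2"}


def assign_reads(tagged_reads, sample_tags):
    """Group target reads by the sample whose tag matches exactly.

    tagged_reads is an iterable of (tag_sequence, target_sequence) pairs.
    Reads whose tag is not in the table are collected under "unassigned".
    """
    by_sample = defaultdict(list)
    for tag, target in tagged_reads:
        sample = sample_tags.get(tag, "unassigned")
        by_sample[sample].append(target)
    return dict(by_sample)


reads = [("ACGTAC", "GGATTC"), ("TGCATG", "CCTAAG"), ("AAAAAA", "TTGACC")]
assignments = assign_reads(reads, SAMPLE_TAGS)
```

Exact matching is the simplest possible rule and is chosen here only for clarity; a practical pipeline might tolerate sequencing errors in the tag, which this sketch does not attempt.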

Preferably, the sample-specific tag may be up to 20 nucleotides in length, more preferably 1-10 nucleotides in length, and most preferably 4-6 nucleotides in length. A four-nucleotide tag gives the possibility of multiplexing 256 samples on the same array, and a six-nucleotide tag enables 4096 samples to be processed on the same array.
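The sample counts quoted above follow from the number of distinct sequences of a given length over the four bases, i.e. 4 raised to the tag length. The short sketch below reproduces the arithmetic; the function name `tag_capacity` is an illustrative assumption, and the result is a theoretical upper bound, since in practice a smaller subset of tags may be chosen.

```python
def tag_capacity(tag_length, alphabet_size=4):
    """Number of distinct tags of the given length over A, C, G and T."""
    return alphabet_size ** tag_length


# Theoretical maximum number of samples per array for tags of 4 to 6 nucleotides.
capacities = {length: tag_capacity(length) for length in (4, 5, 6)}
```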

The region of the single stranded non-complementary nucleic acid strand further comprises at least one universal extension primer binding site. The universal extension primer binding site can be used to capture a plurality of different nucleic acids, such as a plurality of different adaptor-target-adaptor molecules, using a population of universal capture nucleic acids that are complementary to the universal extension primer binding site. In one embodiment, the universal extension primer binding site is part of the universal adaptor when it is ligated to the double stranded target fragment, and in another embodiment, the universal extension primer binding site is added to the universal adaptor after the universal adaptor is ligated to the double stranded target fragment. Addition can be accomplished using conventional methods, including PCR-based methods.

It will be appreciated that the "regions of mismatch" are provided by different portions of the same two polynucleotide strands that form the double stranded region. Mismatches in an adaptor construct may take the form of one strand being longer than the other, such that there is a single stranded region on one strand, or of sequences on both strands selected such that the two strands do not hybridize and thus form a single stranded region. Mismatches may also take the form of "bubbles", in which both ends of the universal adaptor construct are able to hybridize to each other and form a duplex, but the central region is not. The portions of the strands that form a region of mismatch do not anneal under conditions in which other portions of the same two strands anneal to form one or more double-stranded regions. For the avoidance of doubt, it will be understood that a single stranded overhang or single base overhang at the 3' end of the polynucleotide duplex, which subsequently undergoes ligation with the target sequence, does not constitute a "region of mismatch" within the context of the present disclosure.

The lower limit on the length of the region that does not match will generally be determined by function, e.g., the need to provide appropriate sequences for i) primer binding for primer extension, PCR and/or sequencing (e.g., binding of a primer to a universal primer binding site), or for ii) universal capture nucleic acid binding for immobilization of an adaptor-target-adaptor to a surface (e.g., binding of a universal capture nucleic acid to a universal extension primer binding site). Theoretically, there is no upper limit on the length of the regions that do not match, except that it is generally beneficial to minimize the total length of the universal adaptors, for example, to facilitate separation of unbound universal adaptors from the adaptor-target-adaptor constructs after the ligation step. Thus, it is generally preferred that the length of the region of mismatch should be less than 50, or less than 40, or less than 30, or less than 25 contiguous nucleotides.

The precise nucleotide sequence of the universal adaptor is generally not important to the present invention and may be selected by the user such that the desired sequence elements are ultimately included in the consensus sequence of the library of templates derived from the universal adaptor, for example to provide binding sites for a particular set of universal amplification primers and/or sequencing primers and/or universal capture nucleic acids. Additional sequence elements may be included, for example, to provide binding sites for sequencing primers that will ultimately be used, for example, in the sequencing of template molecules in libraries on solid supports or products derived from amplification of template libraries.

While the precise nucleotide sequence of the universal adaptor is generally not limited to the present disclosure, the sequence of the individual strands in the unmatched regions should be such that the individual strands do not exhibit any internal self-complementarity that can lead to self-annealing, formation of hairpin structures, and the like under standard annealing conditions. Self-annealing of one strand in the mismatched region is avoided because it may prevent or reduce specific binding of the amplification primer to that strand.
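A candidate strand for the unmatched region can be screened for internal self-complementarity with a simple sequence search. The sketch below is a crude heuristic under stated assumptions: it flags any short stretch that is followed, after a minimum loop, by its own reverse complement, and it ignores thermodynamics entirely. The function names and the default stem and loop lengths are hypothetical and are not specified by the present disclosure.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def has_self_complementary_stem(strand, stem_length=4, min_loop=3):
    """Return True if a potential hairpin stem is found in the strand.

    A stem is any stem_length-mer that is followed, at least min_loop
    bases downstream, by its own reverse complement.
    """
    for start in range(len(strand) - stem_length + 1):
        stem = strand[start:start + stem_length]
        search_from = start + stem_length + min_loop
        if strand.find(reverse_complement(stem), search_from) != -1:
            return True
    return False


hairpin_prone = has_self_complementary_stem("ACGGTTTTTCCGT")
hairpin_free = has_self_complementary_stem("ACACACACACAC")
```

A strand that passes this screen is not guaranteed to remain single-stranded under annealing conditions; the check only removes the most obvious candidates.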

Mismatched adaptors are preferably formed from two strands of DNA, but may include a mixture of natural and non-natural nucleotides (e.g., one or more ribonucleotides) joined by a mixture of phosphodiester backbone linkages and non-phosphodiester backbone linkages. Other non-nucleotide modifications may be included, such as, for example, biotin moieties, blocking groups, and capture moieties for attachment to a solid surface, as discussed in further detail below.

The universal adaptor may comprise exonuclease-resistant modifications such as phosphorothioate linkages. Such modifications reduce the number of adaptor dimers present in the library, as two adaptors cannot undergo ligation without removing their non-complementary overhangs. In one embodiment, the adapter may be treated with an exonuclease prior to the ligation reaction with the target to ensure that protruding ends of the strands cannot be removed during the ligation process. Treating the adapters in this manner reduces adaptor dimer formation during the ligation step.

Ligation and amplification

The methods of ligation are known in the art and use standard methods. Such methods use a ligase, such as a DNA ligase, to effect or catalyse the joining of the ends of the two polynucleotide strands of, in this case, the universal adaptor and the double stranded target fragment, such that covalent linkages are formed. The universal adaptor may comprise a 5'-phosphate moiety to facilitate ligation to a 3'-OH present on the target fragment. The double stranded target fragment comprises a 5'-phosphate moiety, either remaining from the fragmentation process or added using an enzymatic treatment step, and has been end-repaired and optionally extended by one or more overhanging bases to give a 3'-OH suitable for ligation. Herein, ligation means the covalent linkage of polynucleotide strands that were not previously covalently linked. In a particular aspect of the disclosure, such linking occurs by forming a phosphodiester linkage between two polynucleotide strands, but other means of covalent linkage (e.g., non-phosphodiester backbone linkages) may be used.

As discussed herein, in one embodiment, the universal adaptor used in ligation is complete and includes a universal primer binding site, a sample-specific tag sequence, and a universal extension primer binding site. The resulting plurality of adaptor-target-adaptor molecules can be used to prepare immobilized samples for sequencing.

As also discussed herein, in one embodiment, the universal adaptor used in ligation includes a universal primer binding site and a sample-specific tag sequence, and does not include a universal extension primer binding site. The resulting plurality of adaptor-target-adaptor molecules may be further modified to include specific sequences, such as a universal extension primer binding site. Methods for adding specific sequences, such as a universal extension primer binding site, to a universal adaptor that is ligated to a double-stranded target fragment include PCR-based methods, are known in the art, and are described in, for example, Bignell et al. (US 8,053,192) and Gunderson et al. (WO 2016/130704).

In those embodiments in which the universal adaptor is modified, an amplification reaction is prepared. The contents of the amplification reaction are known to those skilled in the art and include appropriate substrates (such as dNTPs), enzymes (e.g. a DNA polymerase) and buffer components required for the amplification reaction. Typically, an amplification reaction requires at least two amplification primers, often denoted the "forward" primer and the "reverse" primer (primer oligonucleotides), which are capable of annealing specifically to a portion of the polynucleotide sequence to be amplified under the conditions encountered in the primer annealing step of each cycle of the amplification reaction. In certain embodiments, the forward primer and the reverse primer may be identical. Thus, a primer oligonucleotide must include an "adaptor-target specific portion", which is a sequence of nucleotides capable of annealing to a portion (i.e., a primer binding sequence) of the polynucleotide molecule to be amplified (or its complement if the template is considered single-stranded) during the annealing step.

According to one embodiment of the invention, the amplification primers may be universal to all samples, or one of the forward or reverse primers may carry a tag sequence encoding the origin of the sample. The amplification primers can hybridize across the tag region of the ligated adaptor, in which case a unique primer would be required for each sample nucleic acid. The amplification reaction may be performed with more than two amplification primers. To prevent amplification of ligated adaptor-adaptor dimers, the amplification primers can be modified to contain nucleotides that hybridize across the whole of the ligated adaptor and into the ligated template (or the dNTP attached to the 3' end thereof). The first amplification primer may be modified and treated to help prevent exonuclease digestion of the strand, and thus it may be beneficial to have a first amplification primer that is universal and can amplify all samples, rather than modifying and treating each tagged primer individually. The tagged primer can be introduced into the amplification reaction as a sample-specific third primer, which does not require special modification and treatment to reduce exonuclease digestion. In this embodiment, the third amplification primer carrying the tag needs to comprise a sequence identical to at least a portion of the first amplification primer so that it can be used to amplify the duplex resulting from the extension of the first amplification primer.

In the context of the present invention, the term "polynucleotide molecule to be amplified" refers to the original or starting adaptor-target-adaptor molecule added to the amplification reaction. The "adaptor-target specific portion" in the forward and reverse amplification primers refers to the sequence that is capable of annealing to the original or initial adaptor-target-adaptor present at the beginning of the amplification reaction, and reference to the length of the "adaptor-target specific portion" relates to the length of the sequence in the primer that anneals to the initial adaptor-target. It will be appreciated that if the primer comprises any nucleotide sequence that does not anneal to the starting adaptor-target in the first amplification cycle, then that sequence may be copied into the amplification product (assuming the primer does not comprise a portion that prevents a polymerase read-through). Thus, the amplified template strands produced in the first and subsequent amplification cycles may be longer than the starting adaptor-target strands.

Because mismatched adaptors can be of different lengths, the length of adaptor sequence added to the 3' end and the 5' end of each strand can be different. The amplification primers may also be of different lengths from each other, and may hybridize to different lengths of the adaptor, so the length added to the end of each strand can be controlled. In the case of nested PCR, the three or more amplification primers can be designed to be longer than the primers used to amplify the previous amplicon, so the length of the added nucleotides is fully controllable and, if desired, can be hundreds of base pairs. In one embodiment, the first amplification primer adds 13 bases to the ligated adaptor and the third amplification primer adds another 27 bases, such that one end of the amplicon is 40 bases longer than the short arm of the adaptor-target construct. The short arm of the adaptor is 20 bases in length, meaning that the prepared template comprises the genomic region plus 60 bases added at that end. The second amplification primer is 25 bases longer than the long arm of the adaptor, which is 32 bases long plus the additional T that hybridizes across the dATP nucleoside added to the genomic sample. Thus, the prepared template comprises the genomic fragment plus the added dATP plus 57 known bases. Overall, one strand of each template duplex comprises, from the 5' end: 60 known bases, T, genomic fragment, A, 57 known bases-3' end. This strand is fully complementary to the following sequence: 5'-57 known bases, T, genomic fragment, A, 60 known bases-3' end. The lengths 57 and 60 are arbitrary, are shown for illustrative purposes, and should not be considered limiting. The length of the added sequence may be 20-100 bases or more, depending on the desired experimental design.
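The 60-base and 57-base figures in this example can be checked with simple arithmetic. The sketch below uses only the lengths stated above; the variable names and the function `template_length` are illustrative assumptions, as is the 200-base fragment used to exercise it.

```python
# Lengths stated in the embodiment above (in bases).
SHORT_ARM = 20           # short arm of the adaptor
FIRST_PRIMER_ADDS = 13   # added by the first amplification primer
THIRD_PRIMER_ADDS = 27   # added by the third amplification primer
LONG_ARM = 32            # long arm of the adaptor, excluding the T overhang
SECOND_PRIMER_ADDS = 25  # added by the second amplification primer

short_arm_side = SHORT_ARM + FIRST_PRIMER_ADDS + THIRD_PRIMER_ADDS
long_arm_side = LONG_ARM + SECOND_PRIMER_ADDS


def template_length(fragment_length):
    """Length of one template strand: known bases, T, fragment, A, known bases."""
    return short_arm_side + 1 + fragment_length + 1 + long_arm_side


# Hypothetical 200-base genomic fragment, chosen only to exercise the function.
example_length = template_length(200)
```

Changing any primer length changes only the corresponding constant, which reflects the point that the added length at each end is fully controllable.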

The forward and reverse primers may be of sufficient length to hybridize to the whole of the adaptor sequence and at least one base of the target sequence (or the dNTP added as a 3'-overhang on the target strand). The forward and reverse primers may also comprise regions that extend beyond the adaptor construct, and thus the amplification primers may be at least 20-100 bases in length. The forward and reverse primers may be of significantly different lengths; for example, one may be 20-40 bases in length, while the other may be 40-100 bases in length. The nucleotide sequences of the adaptor-target specific portions of the forward and reverse primers are selected to achieve specific hybridization with the adaptor-target sequence to be amplified under the conditions of the annealing step of the amplification reaction, while minimizing non-specific hybridization with any other target sequence present.

The skilled artisan will appreciate that it is not strictly necessary that the adaptor-target specific portions be 100% complementary and that satisfactory levels of specific annealing can be achieved with less than fully complementary sequences. In particular, one or two mismatches in the adaptor-target specific portion are generally tolerated without adversely affecting the specificity of the template. Thus, the term "adaptor-target specific portion" should not be understood as requiring 100% complementarity to the adaptor-target. However, the requirement that the primers do not non-specifically anneal to adaptor-target regions other than their respective primer binding sequences must be met.
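The tolerance for one or two mismatches can be expressed as a simple count against the perfectly complementary sequence. The sketch below assumes that the primer portion and its binding site are the same length, that both are written 5' to 3', and that a fixed threshold of two mismatches stands in for "satisfactory" annealing; real annealing behaviour depends on sequence context and reaction conditions, which are not modelled. The function names and example sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def count_mismatches(primer_portion, binding_site):
    """Count positions where the primer differs from a perfect complement."""
    if len(primer_portion) != len(binding_site):
        raise ValueError("primer portion and binding site must be equal in length")
    perfect = reverse_complement(binding_site)
    return sum(p != q for p, q in zip(primer_portion, perfect))


def is_tolerated(primer_portion, binding_site, max_mismatches=2):
    """Apply the illustrative one-or-two mismatch tolerance."""
    return count_mismatches(primer_portion, binding_site) <= max_mismatches


site = "GATTACAGGC"  # invented primer binding sequence
one_mismatch_ok = is_tolerated("GCCTGTAATA", site)
three_mismatches_ok = is_tolerated("ACCTATAATA", site)
```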

Amplification primers are typically single-stranded polynucleotide structures. They may also comprise a mixture of natural and non-natural bases and also natural backbone linkages and non-natural backbone linkages, as long as any non-natural modification does not preclude the function as a primer, defined as the ability to anneal to a template polynucleotide strand during the conditions of the amplification reaction and serve as an initiation point for the synthesis of a new polynucleotide strand that is complementary to the template strand.

The primer may additionally comprise non-nucleotide chemical modifications such as phosphorothioates to increase exonuclease resistance, again so long as the modification does not prevent primer function. The modification may, for example, facilitate attachment of the primer to a solid support, such as a biotin moiety. Certain modifications may themselves improve the function of the molecule as a primer, or may provide some other useful function, such as providing a site for cleavage to enable cleavage of the primer (or extended polynucleotide strand derived therefrom) from the solid support.

In one embodiment, where the tag is attached to an adaptor, amplification may be performed on pooled or non-pooled samples. In one embodiment where universal adaptors are used, the tags are part of the amplification primers and, thus, each sample is amplified independently prior to pooling. The pooled nucleic acid samples can then be processed for sequencing.

Removal of undesired molecules

The combined ligated polynucleotide sequences (adaptor-target-adaptor molecules), unligated universal adaptor polynucleotide constructs, and/or incomplete products are exposed to conditions that reduce the amount of undesired molecules, such as unligated universal adaptor polynucleotide constructs and/or incomplete products, or eliminate them to undetectable levels. The method for reducing undesired molecules can be performed on each library individually or on pooled samples. In one embodiment, gel purification or Solid Phase Reversible Immobilization (SPRI) methods may be used. Gel purification and SPRI methods for separating out unligated DNA molecules, such as the unligated universal adaptor polynucleotide constructs described herein, are known and routine, and can readily be applied by those skilled in the art to remove incomplete products.

In a preferred embodiment, undesired molecules such as unligated universal adaptor polynucleotide constructs are removed by exonucleases. In one embodiment, the exonuclease useful herein has 5' to 3' DNA exonuclease activity, and optionally, the exonuclease prefers double-stranded DNA. In one embodiment, the exonuclease specifically targets the 5' end of the double-stranded DNA, wherein the 5' end has a 5' phosphate. In another embodiment, the exonuclease specifically targets the 5' end of the double-stranded DNA, wherein the 5' end is free of 5' phosphates. Without intending to be limiting, the use of an exonuclease having 5' to 3' DNA exonuclease activity is designed to remove at least one strand of the unligated universal adaptor by digestion at the 5' end of the double-stranded region of the universal adaptor.

In one embodiment, an exonuclease useful herein has 5' to 3' DNA exonuclease activity that favors double-stranded DNA having a 5' phosphate at the 5' end of the double-stranded nucleic acid region of the universal adaptor. Examples of 5' to 3' exonucleases that favor dsDNA with a 5' phosphate at the 5' end of the double-stranded nucleic acid region include, but are not limited to, lambda exonuclease (New England Biolabs). The presence of a 5' phosphate at the 5' end of the double-stranded region causes an exonuclease, such as lambda exonuclease, to favor the 5' end of the double-stranded region of the unligated universal adaptor. In one embodiment, the 5' end of the strand that forms part of the single-stranded non-complementary region does not include a 5' phosphate. In one embodiment, the 5' end of the strand that is part of the single-stranded region is modified to reduce the ability of the exonuclease to use it as a substrate.

In another embodiment, an exonuclease useful herein has both 5' to 3' DNA exonuclease activity and 3' to 5' DNA exonuclease activity. When such an exonuclease prefers double-stranded DNA but also uses single-stranded DNA as a substrate, the universal adaptor used for ligation may include two types of modification. The first modification blocks 3' to 5' DNA exonuclease activity at the 3' end of the single-stranded region. This modification prevents digestion of the adaptor-target-adaptor molecule from the free 3' end. The second modification is at the 5' end of the strand that forms part of the single-stranded non-complementary region. This modification prevents digestion of the adaptor-target-adaptor molecule from the free 5' end. Examples of modifications include, but are not limited to, phosphorothioate linkages. Examples of exonucleases having 5' to 3' DNA exonuclease activity and 3' to 5' DNA exonuclease activity and a preference for double-stranded DNA include, but are not limited to, truncated exonuclease VIII (New England Biolabs).

In a preferred embodiment, undesired molecules such as incomplete products are removed by exonuclease. In one embodiment, the exonuclease useful herein has 3' to 5' DNA exonuclease activity, and optionally, the exonuclease prefers double-stranded DNA with blunt ends or with recessed 3' ends. In one embodiment, an exonuclease having 3' to 5' DNA exonuclease activity has reduced activity on single-stranded DNA (e.g., it prefers double-stranded DNA) and/or reduced activity on 3' extensions when the single strand is 4 or more bases in length (e.g., it prefers double-stranded DNA having a single-stranded 3' extension of 3 bases or fewer). Without intending to be limiting, the use of an exonuclease having 3' to 5' DNA exonuclease activity is designed to remove at least one strand of the incomplete product by digestion at the 3' end of the double-stranded region of the incomplete product. Examples of incomplete products include adaptor-target molecules and target molecules that do not include adaptors at either end. Examples of 3' to 5' exonucleases that prefer double-stranded DNA with blunt or recessed 3' ends include, but are not limited to, exonuclease III (New England Biolabs).
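The end-preference rules stated above can be restated as a small toy model. The following Python sketch is illustrative only and is not part of the described method: the `End` and `Molecule` classes, the overhang lengths, and the assumption that the ligatable end of an adaptor or target carries a 5' phosphate and a one-base 3' overhang are all hypothetical choices made for the example. It encodes only the two stated preferences (a 5' to 3' exonuclease acting on double-stranded ends bearing a 5' phosphate, and a 3' to 5' exonuclease acting on double-stranded ends whose 3' extension is 3 bases or fewer).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class End:
    double_stranded: bool       # True if the terminus is base-paired
    five_prime_phosphate: bool  # True if the 5' strand end carries a phosphate
    three_prime_overhang: int   # length of any single-stranded 3' extension


@dataclass(frozen=True)
class Molecule:
    name: str
    ends: Tuple[End, End]


def susceptible_5to3(end: End) -> bool:
    """Stated preference of a lambda-exonuclease-like enzyme."""
    return end.double_stranded and end.five_prime_phosphate


def susceptible_3to5(end: End) -> bool:
    """Stated preference of an exonuclease-III-like enzyme."""
    return end.double_stranded and end.three_prime_overhang <= 3


def digested(m: Molecule, use_5to3: bool, use_3to5: bool) -> bool:
    """True if at least one end is a preferred substrate of an enzyme in use."""
    return any(
        (use_5to3 and susceptible_5to3(e)) or (use_3to5 and susceptible_3to5(e))
        for e in m.ends
    )


# Hypothetical end descriptions (assumptions for illustration only).
FORK = End(double_stranded=False, five_prime_phosphate=False, three_prime_overhang=30)
DS_END = End(double_stranded=True, five_prime_phosphate=True, three_prime_overhang=1)

LIBRARY = Molecule("adaptor-target-adaptor", (FORK, FORK))
UNLIGATED = Molecule("unligated universal adaptor", (FORK, DS_END))
HALF = Molecule("adaptor-target", (FORK, DS_END))
BARE = Molecule("target without adaptors", (DS_END, DS_END))

for mol in (LIBRARY, UNLIGATED, HALF, BARE):
    print(mol.name, digested(mol, use_5to3=True, use_3to5=True))
```

Under these assumptions, only the molecule with two forked ends escapes both activities, which matches the stated purpose of the treatment; real enzyme behavior depends on reaction conditions that this sketch does not model.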

Many compounds and compositions are possible during or after exonuclease treatment. For example, a compound or composition can be obtained comprising a polynucleotide having an adaptor-target-adaptor nucleotide sequence, wherein the 3' end of the polynucleotide is blocked due to exonuclease activity. Libraries or compositions comprising more than one such 3' blocked polynucleotide can be obtained. Pooled libraries of such polynucleotides and compositions comprising pooled libraries of such polynucleotides may be obtained. The composition may also comprise universal adaptors and/or incomplete products that are not attached to the target polynucleotide.

By way of further example, a composition comprising a polynucleotide having an adaptor-target-adaptor nucleotide sequence and an exonuclease may be obtained. Similarly, compositions comprising a library of such polynucleotides and an exonuclease can be obtained. Compositions comprising a pooled library of such polynucleotides and an exonuclease can be obtained. The composition may further comprise a universal adaptor that is not attached to the target polynucleotide.

3' blocking

In one embodiment, in addition to reducing or eliminating the amount of undesired molecules such as unligated adaptors and/or incomplete products, the combined ligated polynucleotide sequences (adaptor-target-adaptor molecules) and undesired molecules, e.g., unligated universal adaptor polynucleotides, are optionally 3' blocked, meaning that the polynucleotides are modified to prevent incorporation of nucleotides at the 3' end, so that the polynucleotide or oligonucleotide cannot be extended from the 3' end. 3' blocking can be performed for each library individually or for pooled libraries.

The resulting composition may be subjected to a 3' blocking reaction to block the 3' end of polynucleotides or oligonucleotides in the sample, such as adaptor-target-adaptor polynucleotides or remaining unligated universal adaptors. An oligonucleotide or polynucleotide having a "blocked" 3' end cannot be extended in the 5' to 3' direction by the addition of further nucleotides, because of the blocked 3' end.

The 3' blocking may be accomplished in any suitable manner. For example, a blocking moiety may be covalently attached to the 3' hydroxyl group at the 3' terminus to prevent extension from the 3' terminus.

In some embodiments, the 3'-OH blocking group may be removable, such that the 3' carbon atom has attached to it a group of the structure -O-Z, where Z is any one of -C(R′)₂-O-R″, -C(R′)₂-N(R″)₂, -C(R′)₂-N(H)R″, -C(R′)₂-S-R″ and -C(R′)₂-F, wherein each R″ is a removable protecting group or is part of a removable protecting group; each R′ is independently a hydrogen atom, an alkyl group, a substituted alkyl group, an arylalkyl group, an alkenyl group, an alkynyl group, an aryl group, a heteroaryl group, a heterocycle, an acyl group, a cyano group, an alkoxy group, an aryloxy group, a heteroaryloxy group, or an amide group, or a detectable label attached by a linking group; or (R′)₂ represents a group of formula =C(R‴)₂, wherein each R‴ may be the same or different and is selected from the group comprising hydrogen and halogen atoms and alkyl groups; and wherein the molecule can react to produce an intermediate in which each R″ is exchanged for H or, where Z is -C(R′)₂-F, the F is exchanged for OH, SH or NH₂, preferably OH, which intermediate dissociates under aqueous conditions to provide a molecule with a free 3' OH; with the proviso that when Z is -C(R′)₂-S-R″, the two R′ groups are not both H. Where the blocking group is any one of -C(R′)₂-O-R″, -C(R′)₂-N(R″)₂, -C(R′)₂-N(H)R″, -C(R′)₂-S-R″ and -C(R′)₂-F, i.e., of formula Z, each R′ may independently be H or alkyl. Preferably, Z has the formula -C(R′)₂-O-R″, -C(R′)₂-N(R″)₂, -C(R′)₂-N(H)R″ or -C(R′)₂-S-R″. Particularly preferably, Z has the formula -C(R′)₂-O-R″, -C(R′)₂-N(R″)₂ or -C(R′)₂-S-R″. R″ may be a benzyl group or a substituted benzyl group. Examples of groups of structure -O-Z in which Z is -C(R′)₂-N(R″)₂ are those in which -N(R″)₂ is azido (-N₃). One such example is azidomethyl, where each R′ is H. Alternatively, R′ in Z groups of formula -C(R′)₂-N₃ and in other Z groups may be any other group discussed herein.
Examples of typical R′ groups include C₁₋₆ alkyl groups, in particular methyl and ethyl. Other non-limiting examples of suitable 3' blocking groups are provided in: Greene et al., "Protective Groups in Organic Synthesis," John Wiley & Sons, New York (1991), U.S. Pat. No. 5,990,300, No. 5,872,244, No. 6,232,465, No. 6,214,987, No. 5,808,045, No. 5,763,594, No. 7,057,026, No. 7,566,537, No. 7,785,796, No. 8,148,064, No. 8,394,586, No. 9,388,463, No. 9,410,200, No. 7,427,673, No. 7,772,384, No. 8,158,346, No. 9,121,062, No. 7,541,444, No. 7,771,973, No. 8,071,739, No. 8,597,881, No. 9,121,060, No. 9,388,464, No. 638,399,188, No. 8,808,988, No. 9,051,612, No. 9,469,873 and U.S. publications nos. 2016/0002721 and 2016/0060692, the entire contents of which are incorporated herein by reference.

In some embodiments, the blocking group can remain covalently bound during subsequent processes associated with immobilization of the adaptor-target-adaptor polynucleotide to a solid surface and sequencing.

In some embodiments, a dideoxynucleotide (ddNTP) is incorporated onto the 3' end of the polynucleotide to block the 3' end. ddNTPs may be incorporated in any suitable manner. In some embodiments, the ddNTP is incorporated by terminal deoxynucleotidyl transferase (TdT). TdT is capable of incorporating nucleotides onto the 3' end of single- or double-stranded DNA without a template. In some embodiments, ddNTPs are incorporated onto the 3' end by TdT in the presence of DNA polymerases such as, for example, Pol19, Pol812, or Pol963 polymerase. Non-limiting examples of other suitable polymerases are provided in U.S. patent nos. 8,460,910, 8,852,910, 8,623,628, 9,273,352, 9,447,389 and U.S. publication nos. 2015/0376582, 2016/0032377, 2016/0090579, 2016/0115461, the entire contents of which are incorporated herein by reference.

In some embodiments, digoxigenin-labeled dideoxynucleoside triphosphates are added to the 3' end using a terminal transferase to block the 3' end. Kits for adding digoxigenin-labeled dideoxynucleoside triphosphates to the 3' end of a polynucleotide are available, for example, from Sigma-Aldrich.

Any other suitable method may also be used to modify the 3' end of the polynucleotide.

During or after 3' blocking, many compounds and compositions are possible. For example, a compound or composition can be obtained comprising a polynucleotide having an adaptor-target-adaptor nucleotide sequence, wherein the 3' end of the polynucleotide is blocked. Libraries or compositions comprising more than one such 3' blocked polynucleotide can be obtained. Pooled libraries of such polynucleotides and compositions comprising pooled libraries of such polynucleotides may be obtained. The composition may further comprise a universal adaptor that is not attached to the target polynucleotide.

By way of further example, a composition can be obtained comprising a polynucleotide having an adaptor-target-adaptor nucleotide sequence together with an enzyme and reagents for blocking the 3' end of the polynucleotide. Similarly, compositions comprising a library of such polynucleotides together with the enzyme and reagents can be obtained. Compositions comprising a pooled library of such polynucleotides together with the enzyme and reagents can be obtained. The composition may further comprise an adaptor oligonucleotide that is not attached to the target polynucleotide. In some embodiments, the compositions comprise ddNTPs. The composition may also comprise a DNA polymerase such as, for example, Pol19, Pol812, or Pol963 polymerase.

Additional compositions can comprise polynucleotides having an adaptor-target-adaptor nucleotide sequence, enzymes and reagents for blocking the 3' end of the polynucleotide, and exonucleases. Similarly, compositions comprising libraries of polynucleotides, enzymes and reagents, and exonucleases can be obtained. Compositions comprising pooled libraries of such polynucleotides, enzymes and reagents, and exonucleases can be obtained. The composition may further comprise an adaptor oligonucleotide that is not attached to the target polynucleotide. In some embodiments, the compositions comprise ddNTPs. The composition may also comprise a DNA polymerase such as, for example, Pol19, Pol812, or Pol963 polymerase.

After blocking, a step such as cleaning as described above can be performed, and the polynucleotides then immobilized on a solid surface for sequencing.

The method for reducing or eliminating an amount of unligated universal adaptor polynucleotide construct and the method for 3' blocking a polynucleotide may be performed simultaneously, or sequentially in any order.

If the libraries have not been pooled, they can be pooled before immobilization on the sequencing surface.

Exonuclease treatment described herein to reduce or eliminate unligated universal adaptors can be used immediately after ligation or can be used after PCR-based methods of adding universal extension primer binding sites.

Preparation of immobilized samples for sequencing

More than one adaptor-target-adaptor molecule from one or more sources is then immobilized and amplified prior to sequencing. Methods for attaching adaptor-target-adaptor molecules from one or more sources to a substrate are known in the art. Likewise, methods for amplifying immobilized adaptor-target-adaptor molecules include, but are not limited to, bridge amplification and kinetic exclusion. Methods for immobilization and amplification prior to sequencing are described, for example, in Bignell et al. (US 8,053,192), Gunderson et al. (WO 2016/130704), Shen et al. (US 8,895,249), and Piepenburg et al. (US 9,309,502).

The sample, including pooled samples, may then be immobilized and prepared for sequencing. The sample may be sequenced as an array of single molecules, or may be amplified prior to sequencing. Amplification may be performed using one or more immobilized primers. The one or more immobilized primers may form a lawn on a planar surface or on a pool of beads. The pool of beads can be separated into an emulsion with a single bead in each "compartment" of the emulsion. At a concentration of only one template per "compartment", only a single template is amplified on each bead.
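The dependence of single-template amplification on template concentration can be illustrated numerically. The sketch below assumes, as is common for random partitioning but is not stated in the source, that the number of templates per compartment follows a Poisson distribution with a given mean; the function names are hypothetical.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k templates in a compartment (Poisson assumption)."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def compartment_occupancy(mean_templates: float) -> dict:
    """Fractions of compartments holding zero, one, or several templates."""
    p0 = poisson_pmf(0, mean_templates)
    p1 = poisson_pmf(1, mean_templates)
    return {"empty": p0, "single": p1, "multiple": 1.0 - p0 - p1}


def clonal_fraction_of_occupied(mean_templates: float) -> float:
    """Among compartments with at least one template, the share holding exactly one."""
    occ = compartment_occupancy(mean_templates)
    return occ["single"] / (1.0 - occ["empty"])


for mean in (0.1, 0.5, 1.0):
    print(mean, compartment_occupancy(mean), clonal_fraction_of_occupied(mean))
```

Under this assumption, diluting the templates raises the share of occupied compartments that hold a single template, at the cost of leaving more compartments empty.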

The term "solid phase amplification" as used herein refers to any nucleic acid amplification reaction carried out on or in association with a solid support such that all or a portion of the amplification products are immobilized on the solid support as they are formed. In particular, the term includes solid phase polymerase chain reaction (solid phase PCR) and solid phase isothermal amplification, which are reactions analogous to standard solution phase amplification except that one or both of the forward and reverse amplification primers are immobilized on the solid support. Solid phase PCR covers systems such as emulsions, where one primer is anchored to a bead and the other is in free solution, and colony formation in solid phase gel matrices, where one primer is anchored to the surface and one is in free solution.

In some embodiments, the solid support comprises a patterned surface. A "patterned surface" refers to an arrangement of different regions in or on an exposed layer of a solid support. For example, one or more of the regions may be features where one or more amplification primers are present. Features may be separated by interstitial regions where amplification primers are not present. In some embodiments, the pattern may be an x-y format of features arranged in rows and columns. In some embodiments, the pattern may be a repeating arrangement of features and/or interstitial regions. In some embodiments, the pattern may be a random arrangement of features and/or interstitial regions. Exemplary patterned surfaces that can be used in the methods and compositions set forth herein are described in U.S. patent nos. 8,778,848, 8,778,849 and 9,079,148, and U.S. publication No. 2014/0243224, each of which is incorporated herein by reference.

In some embodiments, the solid support comprises an array of wells or depressions in the surface. These may be fabricated using a variety of techniques generally known in the art, including but not limited to photolithography, stamping techniques, molding techniques, and microetching techniques. As will be appreciated by those skilled in the art, the technique used will depend on the composition and shape of the array substrate.

Features in the patterned surface may be wells (e.g., microwells or nanowells) in an array of wells on a glass, silicon, plastic, or other suitable solid support with a patterned, covalently attached gel, such as poly(N-(5-azidoacetamidopentyl)acrylamide-co-acrylamide) (PAZAM, see, e.g., U.S. publication nos. 2013/184796, WO 2016/066586, and WO 2015/002813, each of which is incorporated herein by reference in its entirety). This process results in gel pads for sequencing that can be stable over sequencing runs with a large number of cycles. Covalent attachment of the polymer to the wells helps maintain the gel in the structured features throughout the lifetime of the structured substrate during a variety of uses. However, in many embodiments, the gel need not be covalently attached to the wells. For example, under some conditions, silane-free acrylamide (SFA, see, e.g., U.S. patent No. 8,563,477, which is incorporated herein by reference in its entirety) that is not covalently attached to any portion of the structured substrate can be used as the gel material.

In particular embodiments, the structured substrate may be made by patterning the solid support material with wells (e.g., microwells or nanowells), coating the patterned support with a gel material (e.g., PAZAM, SFA, or chemically modified variants thereof, such as an azide form of SFA (azide-SFA)), and polishing the gel-coated support, e.g., by chemical or mechanical polishing, so that the gel is retained in the wells while substantially all of the gel in the interstitial regions between the wells on the surface of the structured substrate is removed or inactivated. Primer nucleic acids may be attached to the gel material. A solution of target nucleic acids (e.g., a fragmented human genome) can then be contacted with the polished substrate such that individual target nucleic acids will seed individual wells by interacting with primers attached to the gel material; however, the target nucleic acids will not occupy the interstitial regions, because the gel material there is absent or inactive. Amplification of the target nucleic acids will be confined to the wells because the absence or inactivity of the gel in the interstitial regions prevents outward migration of the growing nucleic acid colony. The process is conveniently manufacturable, scalable, and uses conventional micro- or nano-fabrication methods.

Although the invention encompasses "solid phase" amplification methods in which only one amplification primer is immobilized (the other primer is typically present in free solution), it is preferred for the solid support to be provided with both the forward and the reverse primers immobilized. In practice, because the amplification process requires an excess of primers to sustain amplification, "more than one" of the same forward primer and/or "more than one" of the same reverse primer will be immobilized on the solid support. Unless the context indicates otherwise, reference herein to a forward primer and a reverse primer will be construed accordingly to include "more than one" of such primers.

As the skilled artisan will appreciate, any given amplification reaction requires at least one type of forward primer and at least one type of reverse primer specific for the template to be amplified. However, in certain embodiments, the forward and reverse primers may comprise template-specific portions of the same sequence, and may have identical nucleotide sequences and structures (including any non-nucleotide modifications). In other words, it is possible to perform solid phase amplification using only one type of primer, and such a single primer method is included in the scope of the present invention. Other embodiments may use forward and reverse primers that contain the same template-specific sequence but differ in some other structural features thereof. For example, one type of primer may comprise non-nucleotide modifications that are not present in another type of primer.

In all embodiments of the present disclosure, a primer for solid phase amplification is preferably immobilized by single point covalent attachment to a solid support at or near the 5' end of the primer, leaving the template-specific portion of the primer free to anneal to its cognate template and the 3' hydroxyl group free for primer extension. Any suitable covalent attachment means known in the art may be used for this purpose. The attachment chemistry chosen will depend on the nature of the solid support, as well as any derivatization or functionalization applied to it. The primer itself may include a moiety, which may be a non-nucleotide chemical modification, to facilitate attachment. In a particular embodiment, the primer may include a sulfur-containing nucleophile, such as a phosphorothioate or thiophosphate, at the 5' end. In the case of a solid-supported polyacrylamide hydrogel, this nucleophile will bind to the bromoacetamide groups present in the hydrogel. A more specific means of attaching primers and templates to a solid support is via 5' phosphorothioate attachment to a hydrogel comprising polymerized acrylamide and N-(5-bromoacetamidopentyl)acrylamide (BRAPA), as fully described in WO 05/065814.

Certain embodiments of the invention may use a solid support comprising an inert substrate or matrix (e.g., glass slide, polymer beads, etc.) that has been functionalized, for example, by applying an intermediate layer of material or coating comprising reactive groups that allow covalent attachment to biomolecules such as polynucleotides. Examples of such supports include, but are not limited to, polyacrylamide hydrogels supported on an inert substrate such as glass. In such embodiments, the biomolecule (e.g., polynucleotide) may be covalently attached directly to the intermediate material (e.g., hydrogel), but the intermediate material itself may be non-covalently attached to a substrate or matrix (e.g., glass substrate). The term "covalently attached to a solid support" will accordingly be understood to include this type of arrangement.

The pooled samples can be amplified on beads, where each bead contains a forward amplification primer and a reverse amplification primer. In a particular embodiment, a library of templates prepared according to the first, second or third aspect of the invention is used to prepare a clustered array of nucleic acid colonies by solid phase amplification, and more particularly by solid phase isothermal amplification, similar to those described in U.S. publication No. 2005/0100900, U.S. patent No. 7,115,400, WO 00/18957 and WO 98/44151, the contents of which are incorporated herein by reference in their entirety. The terms "cluster" and "colony" are used interchangeably herein to refer to a discrete site on a solid support comprising more than one identical immobilized nucleic acid strand and more than one identical immobilized complementary nucleic acid strand. The term "clustered array" refers to an array formed from such clusters or colonies. In this context, the term "array" should not be construed as requiring an ordered arrangement of clusters.

The term "solid phase" or "surface" is used to mean a planar array in which the primers are attached to a flat surface (e.g., a glass, silica or plastic microscope slide) or a similar flow cell device; beads, wherein one or both primers are attached to the beads and amplification takes place on the beads; an array of beads on a surface (after amplification on the beads); and the like.

The clustered array may be prepared using a thermal cycling process as described in WO 98/44151, or a process in which the temperature is held constant and the cycles of extension and denaturation are performed by changing the reagents. Such isothermal amplification methods are described in patent application No. WO 02/46456 and US publication No. 2008/0009420, which are incorporated herein by reference in their entirety. The isothermal process is particularly preferred because of the lower temperatures it requires.

It will be understood that any amplification method described herein or generally known in the art may be used with the universal primers or target-specific primers to amplify the immobilized DNA fragments. Suitable methods for amplification include, but are not limited to, Polymerase Chain Reaction (PCR), Strand Displacement Amplification (SDA), transcription-mediated amplification (TMA), and nucleic acid sequence-based amplification (NASBA), as described in U.S. patent No. 8,003,354, which is incorporated herein by reference in its entirety. The amplification methods above may be used to amplify one or more nucleic acids of interest. For example, PCR including multiplex PCR, SDA, TMA, NASBA, etc. may be used to amplify the immobilized DNA fragments. In some embodiments, primers specific for the polynucleic acid of interest are included in the amplification reaction.

Other suitable methods for amplifying polynucleotides may include oligonucleotide extension and ligation, Rolling Circle Amplification (RCA) (Lizardi et al., Nat. Genet. 19:225-232 (1998)) and Oligonucleotide Ligation Assay (OLA) techniques (see generally U.S. Pat. Nos. 7,582,420, 5,185,243, 5,679,524 and 5,573,907; EP 0320308 B1; EP 0336731 B1; EP 0439182 B1; WO 90/01069; WO 89/12696; and WO 89/09835). It will be appreciated that these amplification methods may be designed to amplify immobilized DNA fragments. For example, in some embodiments, the amplification method may comprise a ligation probe amplification or Oligonucleotide Ligation Assay (OLA) reaction that includes primers specific for the nucleic acid of interest. In some embodiments, the amplification method may comprise a primer extension-ligation reaction comprising a primer specific for the nucleic acid of interest. As one non-limiting example of primer extension and ligation primers that can be specifically designed to amplify a nucleic acid of interest, amplification can include primers for the GoldenGate assay (Illumina, Inc., San Diego, CA), as exemplified by U.S. patent nos. 7,582,420 and 7,611,869.

Exemplary isothermal amplification methods that can be used in the methods of the present disclosure include, but are not limited to, Multiple Displacement Amplification (MDA) as exemplified by, for example, Dean et al., Proc. Natl. Acad. Sci. USA 99:5261-66 (2002), or isothermal strand displacement nucleic acid amplification as exemplified by, for example, U.S. patent No. 6,214,587. Other non-PCR-based methods that may be used in the present disclosure include, for example, strand displacement amplification (SDA), as described in, for example, Walker et al., Molecular Methods for Virus Detection, Academic Press, Inc., 1995; U.S. Pat. Nos. 5,455,166 and 5,130,238; and Walker et al., Nucl. Acids Res. 20:1691-96 (1992), or hyperbranched strand displacement amplification, as described, for example, in Lage et al., Genome Res. 13:294-307 (2003). For random primed amplification of genomic DNA, isothermal amplification methods can be used with the strand-displacing Phi 29 polymerase or Bst DNA polymerase large fragment, 5'->3' exo-. The use of these polymerases exploits their high processivity and strand displacement activity. The high processivity allows the polymerases to generate fragments of 10 kb to 20 kb in length. As set forth above, smaller fragments can be produced under isothermal conditions using polymerases, such as Klenow polymerase, that have low processivity and strand displacement activity. Additional descriptions of amplification reactions, conditions, and components are set forth in detail in the disclosure of U.S. patent No. 7,670,810, which is incorporated by reference herein in its entirety.

Another nucleic acid amplification method useful in the present disclosure is tagged PCR, which uses a population of two-domain primers having a constant 5' region followed by a random 3' region, as described, for example, in Grothues et al., Nucleic Acids Res. 21(5):1321-2 (1993). The first rounds of amplification are carried out to allow a multitude of initiations on heat-denatured DNA, based on individual hybridization of the randomly synthesized 3' region. Due to the nature of the 3' region, the initiation sites are expected to be random throughout the genome. Thereafter, unbound primers may be removed and further replication may take place using primers complementary to the constant 5' region.
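As a minimal illustration of what a population of two-domain primers looks like, the following Python sketch builds primer sequences from a constant 5' region followed by a random 3' region. The constant sequence, the random-region length, and the function name are hypothetical placeholders chosen for the example; they are not taken from the cited reference.

```python
import random


def two_domain_primers(constant_5p: str, random_len: int, count: int, seed: int = 0) -> list:
    """Return `count` primers: a constant 5' region followed by a random 3' region."""
    rng = random.Random(seed)  # seeded so the example is reproducible
    primers = []
    for _ in range(count):
        random_3p = "".join(rng.choice("ACGT") for _ in range(random_len))
        primers.append(constant_5p + random_3p)
    return primers


# Hypothetical constant region, for illustration only.
CONSTANT_5P = "ACGTACGTACGTACGT"
pool = two_domain_primers(CONSTANT_5P, random_len=8, count=5)
for primer in pool:
    print(primer)
```

Every primer in the pool shares the constant region, which is why a single primer complementary to that region can drive the later rounds of replication.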

In some embodiments, isothermal amplification may be performed using Kinetic Exclusion Amplification (KEA), also known as exclusion amplification (ExAmp). Nucleic acid libraries of the present disclosure can be prepared using a method that includes the step of reacting amplification reagents to produce more than one amplification site, each of which includes a substantially clonal population of amplicons from a single target nucleic acid that has seeded the site. In some embodiments, the amplification reaction proceeds until a sufficient number of amplicons has been produced to fill the capacity of the corresponding amplification site. Filling an already seeded site to capacity in this manner inhibits other target nucleic acids from landing and amplifying at the site, thereby generating a clonal population of amplicons at the site. In some embodiments, apparent clonality may be achieved even if the amplification site is not filled to capacity before a second target nucleic acid reaches the site. Under some conditions, amplification of the first target nucleic acid can proceed until a sufficient number of copies has been produced to effectively outcompete or overwhelm the production of copies from a second target nucleic acid that is transported to the site. For example, in embodiments using a bridge amplification process on circular features having a diameter of less than 500 nm, it has been determined that after 14 cycles of exponential amplification of the first target nucleic acid, contamination from a second target nucleic acid at the same site will produce too few contaminating amplicons to adversely affect sequencing-by-synthesis analysis on an Illumina sequencing platform.

In particular embodiments, the amplification sites in the array may be, but need not be, entirely clonal. Rather, for some applications, an individual amplification site can be primarily filled with amplicons from a first target nucleic acid, and can also have low levels of contaminating amplicons from a second target nucleic acid. The array may have one or more amplification sites with low levels of contaminating amplicons, as long as the contamination level has no unacceptable effect on the subsequent use of the array. For example, when the array is to be used in a detection application, an acceptable contamination level will be a level that does not affect the signal-to-noise ratio or resolution of the detection technique in an unacceptable manner. Thus, apparent clonality will generally be relative to the particular use or application of the array made by the methods set forth herein. Exemplary contamination levels that may be acceptable at a single amplification site for a particular application include, but are not limited to, up to 0.1%, 0.5%, 1%, 5%, 10%, or 25% contaminating amplicons. The array may include one or more amplification sites with these exemplary levels of contaminating amplicons. For example, up to 5%, 10%, 25%, 50%, 75%, or even 100% of the amplification sites in the array may have some contaminating amplicons. It will be appreciated that at least 50%, 75%, 80%, 85%, 90%, 95% or 99% or more of the sites in an array or other collection of sites may be clonal or apparently clonal.
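The application-dependent notion of apparent clonality can be expressed as a simple threshold check. The sketch below is a hypothetical illustration: the function names, the amplicon counts, and the choice of a 5% threshold (one of the exemplary levels listed above) are assumptions for the example, not a prescribed procedure.

```python
def contamination_fraction(primary_amplicons: int, contaminating_amplicons: int) -> float:
    """Share of amplicons at a site that come from a second target nucleic acid."""
    total = primary_amplicons + contaminating_amplicons
    if total == 0:
        raise ValueError("site contains no amplicons")
    return contaminating_amplicons / total


def is_apparently_clonal(primary: int, contaminating: int, max_contamination: float = 0.05) -> bool:
    """True if the contamination level is within the application-specific limit."""
    return contamination_fraction(primary, contaminating) <= max_contamination


def fraction_apparently_clonal(sites, max_contamination: float = 0.05) -> float:
    """Share of sites (pairs of primary and contaminating counts) that pass the limit."""
    flags = [is_apparently_clonal(p, c, max_contamination) for p, c in sites]
    return sum(flags) / len(flags)


# Hypothetical amplicon counts for four sites: (primary, contaminating).
example_sites = [(995, 5), (900, 100), (1000, 0), (960, 40)]
print(fraction_apparently_clonal(example_sites, max_contamination=0.05))
```

Changing `max_contamination` changes which sites count as apparently clonal, which mirrors the statement that the acceptable level depends on the intended use of the array.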

In some embodiments, kinetic exclusion can occur when a process occurs at a rate fast enough to effectively exclude another event or process from occurring. Take the preparation of a nucleic acid array as an example, where the sites of the array are randomly seeded with target nucleic acids from a solution, and copies of the target nucleic acids are generated during amplification to fill each seeded site to capacity. According to the kinetic exclusion methods of the present disclosure, the seeding and amplification processes may be performed simultaneously under conditions where the amplification rate exceeds the seeding rate. Thus, a relatively fast rate of copying at a site that has been seeded with a first target nucleic acid will effectively preclude a second nucleic acid from seeding the site for amplification. The kinetic exclusion amplification method can be performed as described in detail in U.S. publication No. 2013/0338042, which is incorporated herein by reference in its entirety.
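The relationship between the seeding rate and the amplification rate can be sketched numerically. The model below is an assumption made for illustration only: further target nucleic acids arrive at a seeded site as a Poisson process, and the first target grows exponentially from one copy until the site reaches capacity. All names and parameter values are hypothetical.

```python
import math


def fill_time(capacity: int, amplification_rate: float) -> float:
    """Time for one copy to grow exponentially to `capacity` copies."""
    return math.log(capacity) / amplification_rate


def probability_exclusive(seeding_rate: float,
                          amplification_rate: float,
                          capacity: int = 1000) -> float:
    """Probability that no second target seeds the site before it is full.

    Assumes Poisson arrivals at `seeding_rate` per unit time, so the chance
    of zero arrivals during the fill time t is exp(-seeding_rate * t).
    """
    return math.exp(-seeding_rate * fill_time(capacity, amplification_rate))


if __name__ == "__main__":
    for rate in (0.1, 1.0, 10.0):
        print(rate, round(probability_exclusive(0.01, rate), 4))
```

In this sketch, raising the amplification rate relative to the seeding rate drives the probability of an exclusively seeded site toward 1, which is the qualitative behavior the paragraph above describes.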

Kinetic exclusion can exploit a relatively slow rate of initiation of amplification (e.g., a slow rate of making a first copy of the target nucleic acid) versus a relatively fast rate of making subsequent copies of the target nucleic acid (or of the first copy of the target nucleic acid). In the example of the preceding paragraph, kinetic exclusion occurs due to a relatively slow rate of target nucleic acid seeding (e.g., relatively slow diffusion or transport) versus a relatively fast rate at which amplification occurs to fill the site with copies of the nucleic acid seed. In another exemplary embodiment, kinetic exclusion can occur due to a delay in the formation of the first copy of a target nucleic acid that has seeded the site (e.g., delayed or slow activation) versus a relatively fast rate at which subsequent copies are made to fill the site. In this example, an individual site may have been seeded with several different target nucleic acids (e.g., several target nucleic acids may be present at each site prior to amplification). However, for any given target nucleic acid, first copy formation can be activated randomly, such that the average rate of first copy formation is relatively slow compared to the rate at which subsequent copies are produced. In this case, although an individual site may have been seeded with several different target nucleic acids, kinetic exclusion will allow only one of those target nucleic acids to be amplified. More specifically, once the first target nucleic acid has been activated for amplification, the site will quickly fill to capacity with its copies, thereby preventing copies of a second target nucleic acid from being made at the site.
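The random-activation scenario can be explored with a small Monte Carlo sketch. The model is an assumption for illustration: each of several targets at a site is activated after an independent exponentially distributed waiting time, and the site counts as clonal when the second activation would occur only after the first target has filled the site. The function names, the rates, and the fixed fill time are hypothetical.

```python
import random


def site_is_clonal(n_targets: int, activation_rate: float,
                   fill_time: float, rng: random.Random) -> bool:
    """Simulate one site; True if only one target gets to amplify."""
    if n_targets < 1:
        raise ValueError("a seeded site needs at least one target")
    if n_targets == 1:
        return True
    # Independent exponential waiting times until first-copy formation.
    times = sorted(rng.expovariate(activation_rate) for _ in range(n_targets))
    # Clonal if the site is already full when the second target activates.
    return times[1] - times[0] >= fill_time


def clonal_fraction(n_sites: int, n_targets: int, activation_rate: float,
                    fill_time: float, seed: int = 0) -> float:
    """Fraction of simulated sites that end up clonal."""
    rng = random.Random(seed)
    clonal = sum(site_is_clonal(n_targets, activation_rate, fill_time, rng)
                 for _ in range(n_sites))
    return clonal / n_sites


if __name__ == "__main__":
    print(clonal_fraction(20000, 3, 0.01, 1.0))    # fast filling
    print(clonal_fraction(20000, 3, 0.01, 100.0))  # slow filling
```

For this model the expected clonal fraction is exp(-(n - 1) * rate * fill_time), so slow activation combined with fast filling keeps most multiply seeded sites clonal.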

Amplification reagents may include additional components that promote amplicon formation and in some cases increase the rate of amplicon formation. One example is a recombinase. The recombinase may facilitate amplicon formation by allowing repeated invasion/extension. More specifically, the recombinase can facilitate invasion of the target nucleic acid by a polymerase and extension of a primer by the polymerase, using the target nucleic acid as a template for amplicon formation. This process can be repeated as a chain reaction in which amplicons generated from each round of invasion/extension are used as templates in a subsequent round. Because denaturation cycles (e.g., by heat or chemical denaturation) are not required, this method can proceed more rapidly than standard PCR. Thus, recombinase-facilitated amplification can be carried out isothermally. It is often desirable to include ATP or other nucleotides (or in some cases, non-hydrolyzable analogs thereof) in recombinase-facilitated amplification reagents to facilitate amplification. Mixtures of recombinases and single-stranded binding (SSB) proteins are particularly useful, as SSB can further facilitate amplification. Exemplary formulations for recombinase-facilitated amplification include those sold commercially by TwistDx (Cambridge, UK) as TwistAmp kits. Useful components and reaction conditions for recombinase-facilitated amplification reagents are set forth in U.S. Pat. Nos. 5,223,414 and 7,399,590, each of which is incorporated herein by reference.

Another example of a component that can be included in the amplification reagents to promote amplicon formation and in some cases increase the rate of amplicon formation is a helicase. A helicase may facilitate amplicon formation by allowing a chain reaction of amplicon formation. This process can proceed faster than standard PCR because denaturation cycles (e.g., by heating or chemical denaturation) are not required. Thus, helicase-facilitated amplification can be performed isothermally. A mixture of helicase and single-stranded binding (SSB) proteins is particularly useful, as SSB can further facilitate amplification. Exemplary formulations for helicase-facilitated amplification include those commercially available from Biohelix (Beverly, MA) as IsoAmp kits. Further examples of useful formulations comprising a helicase protein are described in US 7,399,590 and US 7,829,284, each of which is incorporated herein by reference.

Yet another example of a component that can be included in an amplification reagent to promote amplicon formation and in some cases increase the rate of amplicon formation is an origin binding protein.

Use in sequencing/method of sequencing

After the adaptor-target-adaptor molecules are attached to the surface, the sequence of the immobilized and amplified adaptor-target-adaptor molecules is determined. Sequencing can be carried out using any suitable sequencing technique. Methods for determining the sequence of immobilized and amplified adaptor-target-adaptor molecules, including strand resynthesis, are known in the art and described in, for example, Bignell et al. (US 8,053,192), Gunderson et al. (WO 2016/130704), Shen et al. (US 8,895,249), and Piepenburg et al. (US 9,309,502).

The methods described herein can be used in conjunction with a variety of nucleic acid sequencing techniques. Particularly applicable techniques are those in which nucleic acids are attached at fixed positions in an array, such that their relative positions do not change, and in which the array is repeatedly imaged. Embodiments in which images are obtained in different color channels, e.g., corresponding to the different labels used to distinguish one nucleotide base type from another, are particularly applicable. In some embodiments, the process of determining the nucleotide sequence of a target nucleic acid can be an automated process. Preferred embodiments include sequencing-by-synthesis ("SBS") techniques.

SBS techniques typically involve enzymatically extending a nascent nucleic acid strand by iteratively adding nucleotides against a template strand. In conventional SBS methods, a single nucleotide monomer can be provided to a target nucleic acid in the presence of a polymerase at each delivery. However, in the methods described herein, more than one type of nucleotide monomer can be provided to a target nucleic acid in the presence of a polymerase in a single delivery.

SBS may use nucleotide monomers with a terminator moiety or those that lack any terminator moiety. As set forth in further detail below, methods of using a nucleotide monomer lacking a terminator include, for example, pyrosequencing and sequencing using gamma-phosphate labeled nucleotides. In methods using a terminator-deficient nucleotide monomer, the number of nucleotides added per cycle is typically variable and depends on the template sequence and nucleotide delivery pattern. For SBS techniques using nucleotide monomers with a terminator moiety, the terminator may be effectively irreversible under the sequencing conditions used, as is the case for traditional Sanger sequencing using dideoxynucleotides, or the terminator may be reversible as is the case for the sequencing method developed by Solexa (now Illumina, Inc.).

SBS techniques may use nucleotide monomers that have a label moiety or those that lack a label moiety. Thus, an incorporation event can be detected based on: a characteristic of the label, such as fluorescence of the label; a characteristic of the nucleotide monomer, such as molecular weight or charge; a by-product of nucleotide incorporation, such as released pyrophosphate; and the like. In embodiments where two or more different nucleotides are present in the sequencing reagent, the different nucleotides may be distinguishable from each other, or alternatively, the two or more different labels may be indistinguishable under the detection technique used. For example, different nucleotides present in a sequencing reagent may have different labels, and they may be distinguished using suitable optics, as exemplified by the sequencing method developed by Solexa (now Illumina, Inc.).

Preferred embodiments include pyrosequencing techniques. Pyrosequencing detects the release of inorganic pyrophosphate (PPi) as specific nucleotides are incorporated into the nascent strand (Ronaghi, M., Karamohamed, S., Pettersson, B., Uhlen, M., and Nyren, P. (1996) "Real-time DNA sequencing using detection of pyrophosphate release." Analytical Biochemistry 242(1), 84-9; Ronaghi, M. (2001) "Pyrosequencing sheds light on DNA sequencing." Genome Res. 11(1), 3-11; Ronaghi, M., and Nyren, P. (1998) "A sequencing method based on real-time pyrophosphate." Science 281(5375); U.S. Pat. No. 5,274,568, incorporated herein by reference in its entirety). In pyrosequencing, the released PPi can be detected by immediate conversion to adenosine triphosphate (ATP) by ATP sulfurylase, and the level of ATP produced is detected via luciferase-produced photons. Nucleic acids to be sequenced can be attached to features in the array, and the array can be imaged to capture chemiluminescent signals resulting from incorporation of nucleotides at the features of the array. An image can be obtained after the array is treated with a particular nucleotide type (e.g., A, T, C, or G). The images obtained after the addition of each nucleotide type will differ in which features in the array are detected. These differences in the images reflect the different sequence content of the features on the array. However, the relative position of each feature will remain unchanged in the images. Using the methods set forth herein, the images can be stored, processed, and analyzed. For example, images obtained after treating the array with each different nucleotide type can be handled in the same manner as exemplified herein for images obtained from different detection channels for reversible terminator-based sequencing methods.
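The way per-nucleotide signals translate into a sequence can be illustrated with a minimal Python sketch for a single feature. It assumes, for illustration only, that signals have been normalized so that one incorporation gives a value near 1.0 and that a homopolymer run gives a proportionally larger value. The function name, the flow order, and the signal values are hypothetical.

```python
from typing import Sequence


def decode_flowgram(flow_order: str, signals: Sequence[float]) -> str:
    """Reconstruct the synthesized strand from per-flow light signals.

    flow_order: nucleotide types delivered in repeating order, e.g. "TACG".
    signals:    one normalized signal per flow (about 1.0 per incorporation).
    """
    bases = []
    for i, signal in enumerate(signals):
        base = flow_order[i % len(flow_order)]
        # Round to the nearest whole number of incorporated nucleotides.
        count = max(int(round(signal)), 0)
        bases.append(base * count)
    return "".join(bases)


if __name__ == "__main__":
    signals = [1.02, 0.05, 0.98, 2.10, 0.00, 1.10, 0.03, 0.90]
    print(decode_flowgram("TACG", signals))
```

The decoded string is the newly synthesized strand, that is, the complement of the template attached at the feature.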

In another exemplary type of SBS, cycle sequencing is accomplished by stepwise addition of reversible terminator nucleotides comprising, for example, cleavable or photobleachable dye labels as described, e.g., in WO 04/018497 and U.S. patent No. 7,057,026, the disclosures of which are incorporated herein by reference. This process is being commercialized by Solexa (now Illumina, Inc.) and is also described in WO 91/06678 and WO 07/123744, each of which is incorporated herein by reference. The availability of fluorescently labeled terminators in which the termination can be reversed and the fluorescent label cleaved facilitates efficient cyclic reversible termination (CRT) sequencing. Polymerases can also be co-engineered to efficiently incorporate and extend from these modified nucleotides.

Preferably, in the reversible terminator-based sequencing embodiments, the labels do not substantially inhibit extension under SBS reaction conditions. However, the detection labels may be removable, for example by cleavage or degradation. After the labels are incorporated into the arrayed nucleic acid features, an image can be captured. In particular embodiments, each cycle comprises simultaneously delivering four different nucleotide types to the array, and each nucleotide type has a spectrally distinct label. Four images can then be obtained, each using a detection channel that is selective for one of the four different labels. Alternatively, different nucleotide types may be added sequentially, and images of the array may be obtained between each addition step. In such embodiments, each image will show the nucleic acid features that have incorporated a particular type of nucleotide. Different features will or will not be present in different images due to the different sequence content of each feature. However, the relative positions of the features will remain unchanged in the images. Images obtained from such reversible terminator-SBS methods can be stored, processed, and analyzed as set forth herein. After the image capture step, the labels can be removed and the reversible terminator moieties can be removed for subsequent cycles of nucleotide addition and detection. Removing the labels after they have been detected in a particular cycle and before a subsequent cycle may provide the advantage of reducing background signal and crosstalk between cycles. Examples of useful labels and removal methods are set forth below.
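Because the features keep their positions from image to image, a base can be called per feature and per cycle by comparing the four channel images. The Python sketch below shows one simple way to do this, assuming each image has already been reduced to one intensity per feature and that the brightest channel identifies the incorporated nucleotide. The channel order, function names, and intensities are hypothetical.

```python
from typing import List, Sequence

# Hypothetical assignment of detection channels to nucleotide types.
CHANNEL_TO_BASE = ("A", "C", "G", "T")


def call_cycle(images: Sequence[Sequence[float]]) -> List[str]:
    """Call one base per feature from four per-channel intensity lists."""
    n_features = len(images[0])
    calls = []
    for feature in range(n_features):
        intensities = [image[feature] for image in images]
        brightest = max(range(len(intensities)), key=lambda c: intensities[c])
        calls.append(CHANNEL_TO_BASE[brightest])
    return calls


def call_reads(cycles: Sequence[Sequence[Sequence[float]]]) -> List[str]:
    """Concatenate per-cycle calls into one read per feature."""
    per_cycle = [call_cycle(images) for images in cycles]
    return ["".join(bases) for bases in zip(*per_cycle)]


if __name__ == "__main__":
    cycle1 = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.1, 0.1], [0.0, 0.0, 0.9]]
    cycle2 = [[0.1, 0.1, 0.7], [0.1, 0.1, 0.1], [0.9, 0.0, 0.1], [0.0, 0.95, 0.1]]
    print(call_reads([cycle1, cycle2]))
```

A practical pipeline would also correct for spectral crosstalk and phasing before calling bases; this sketch omits those steps.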

In particular embodiments, some or all of the nucleotide monomers may include a reversible terminator. In such embodiments, the reversible terminator/cleavable fluorophore may comprise a fluorophore linked to a ribose moiety by a 3' ester linkage (Metzker, Genome Res.15: 1767-. Other approaches have separated the terminator chemistry from the cleavage of the fluorescent label (Ruparel et al., Proc Natl Acad Sci USA 102:5932-7 (2005), which is incorporated herein by reference in its entirety). Ruparel et al. describe the development of reversible terminators that use a small 3' allyl group to block extension but can be easily deblocked by brief treatment with a palladium catalyst. The fluorophore is attached to the base through a photocleavable linker that can be easily cleaved by exposure to long-wavelength UV light for 30 seconds. Thus, disulfide bond reduction or photocleavage can be used to cleave the cleavable linker. Another approach to reversible termination is to use the natural termination that ensues after placement of a bulky dye on a dNTP. The presence of a charged bulky dye on the dNTP can act as an effective terminator through steric and/or electrostatic hindrance. The presence of one incorporation event prevents further incorporation unless the dye is removed. Cleavage of the dye removes the fluorophore and effectively reverses the termination. Examples of modified nucleotides are also described in U.S. patent nos. 7,427,673 and 7,057,026, the disclosures of which are incorporated herein by reference in their entirety.

Additional exemplary SBS systems and methods that may be used with the methods and systems described herein are described in U.S. publication nos. 2007/0166705, 2006/0188901, 2006/0240439, 2006/0281109, 2012/0270305, and 2013/0260372, U.S. patent No. 7,057,026, PCT publication No. WO 05/065814, U.S. patent application publication No. 2005/0100900, and PCT publication nos. WO 06/064199 and WO 07/010251, the disclosures of which are incorporated herein by reference in their entirety.

Some embodiments can detect four different nucleotides using fewer than four different labels. For example, SBS can be performed using the methods and systems described in the incorporated material of U.S. publication No. 2013/0079232. As a first example, a pair of nucleotide types may be detected at the same wavelength but distinguished based on a difference in intensity of one member of the pair compared to the other, or based on a change to one member of the pair (e.g., by chemical, photochemical, or physical modification) that causes an apparent signal to appear or disappear compared to the signal detected for the other member of the pair. As a second example, three of the four different nucleotide types can be detected under particular conditions, while the fourth nucleotide type lacks a label detectable under those conditions or is minimally detected under those conditions (e.g., minimal detection due to background fluorescence, etc.). Incorporation of the first three nucleotide types into a nucleic acid can be determined based on the presence of their respective signals, and incorporation of the fourth nucleotide type into a nucleic acid can be determined based on the absence of any signal or minimal detection of any signal. As a third example, one nucleotide type may include one or more labels detected in two different channels, while the other nucleotide types are detected in no more than one channel. The three exemplary configurations mentioned above are not considered mutually exclusive and may be used in various combinations.
One exemplary embodiment that combines all three examples is a fluorescence-based SBS method that uses a first nucleotide type detected in a first channel (e.g., dATP with a label detected in the first channel when excited by a first excitation wavelength), a second nucleotide type detected in a second channel (e.g., dCTP with a label detected in the second channel when excited by a second excitation wavelength), a third nucleotide type detected in both the first and second channels (e.g., dTTP with at least one label detected in both channels when excited by the first and/or second excitation wavelengths), and a fourth nucleotide type that lacks a label and is therefore not detected, or only minimally detected, in either channel (e.g., dGTP without a label).
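The two-channel scheme in this exemplary embodiment amounts to a small truth table, which the following Python sketch makes explicit. The base assignments follow the example above (dATP in the first channel, dCTP in the second, dTTP in both, unlabeled dGTP in neither); the function name and the fixed intensity threshold are assumptions made for illustration.

```python
def call_base_two_channel(ch1: float, ch2: float, threshold: float = 0.5) -> str:
    """Call a base from the signals in two detection channels.

    Signal in channel 1 only -> A, channel 2 only -> C,
    both channels -> T, neither channel -> G (unlabeled).
    """
    on1 = ch1 >= threshold
    on2 = ch2 >= threshold
    if on1 and on2:
        return "T"
    if on1:
        return "A"
    if on2:
        return "C"
    return "G"


if __name__ == "__main__":
    signals = [(0.9, 0.1), (0.1, 0.8), (0.9, 0.9), (0.05, 0.02)]
    print("".join(call_base_two_channel(a, b) for a, b in signals))
```

A real base caller would typically cluster intensities rather than apply a fixed threshold, so the threshold here should be read as a placeholder.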

In addition, sequencing data can be obtained using a single channel as described in the incorporated material of U.S. publication No. 2013/0079232. In such a so-called one-dye sequencing method, a first nucleotide type is labeled, but the label is removed after the first image is generated, and a second nucleotide type is labeled only after the first image is generated. The third nucleotide type retains its label in both the first and second images, and the fourth nucleotide type remains unlabeled in both images.
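The one-dye scheme can likewise be read as a lookup on whether a feature is lit in the first and second images. The sketch below uses neutral labels for the four nucleotide types because the text above does not assign specific bases; the function name and threshold are hypothetical.

```python
# (lit in first image, lit in second image) -> nucleotide type
ONE_DYE_CODE = {
    (True, False): "first",    # label removed after the first image
    (False, True): "second",   # labeled only after the first image
    (True, True): "third",     # label retained in both images
    (False, False): "fourth",  # unlabeled in both images
}


def call_type_one_dye(image1: float, image2: float, threshold: float = 0.5) -> str:
    """Identify the nucleotide type from one channel imaged twice."""
    return ONE_DYE_CODE[(image1 >= threshold, image2 >= threshold)]


if __name__ == "__main__":
    for pair in [(0.9, 0.1), (0.1, 0.9), (0.8, 0.8), (0.0, 0.1)]:
        print(pair, call_type_one_dye(*pair))
```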

Some embodiments may use sequencing-by-ligation techniques. Such techniques use DNA ligase to incorporate oligonucleotides and identify the incorporation of such oligonucleotides. The oligonucleotides typically have different labels that correlate with the identity of a particular nucleotide in the sequence to which the oligonucleotide hybridizes. As with other SBS methods, images can be obtained after treating an array of nucleic acid features with labeled sequencing reagents. Each image will show the nucleic acid features that have incorporated a label of a particular type. Due to the different sequence content of each feature, different features will or will not be present in different images, but the relative positions of the features will remain unchanged in the images. Images obtained from ligation-based sequencing methods can be stored, processed, and analyzed as set forth herein. Exemplary SBS systems and methods that may be used with the methods and systems described herein are described in U.S. patent nos. 6,969,488, 6,172,218, and 6,306,597, the disclosures of which are incorporated herein by reference in their entirety.

Some embodiments may use nanopore sequencing (Deamer, D.W. & Akeson, M. "Nanopores and nucleic acids: prospects for ultrarapid sequencing." Trends Biotechnol. 18, 147-151 (2000); Deamer, D. and D. Branton, "Characterization of nucleic acids by nanopore analysis", Acc. Chem. Res. 35:817-825 (2002); Li, J., M. Gershow, D. Stein, E. Brandin, and J.A. Golovchenko, "DNA molecules and configurations in a solid-state nanopore microscope" Nat. Mater. 2:611-615 (2003), the disclosures of which are incorporated herein by reference in their entirety). In such embodiments, the target nucleic acid passes through a nanopore, which can be a synthetic pore or a biological membrane protein. As the target nucleic acid passes through the nanopore, each base can be identified by measuring fluctuations in the electrical conductance of the pore. Data obtained from nanopore sequencing can be stored, processed, and analyzed as set forth herein.

Some embodiments may employ methods that include monitoring DNA polymerase activity in real time. Nucleotide incorporation can be detected by fluorescence resonance energy transfer (FRET) interactions between a fluorophore-bearing polymerase and gamma-phosphate labeled nucleotides (as described, for example, in U.S. patent nos. 7,329,492 and 7,211,414, both of which are incorporated herein by reference), or with zero-mode waveguides (as described, for example, in U.S. patent No. 7,315,019, which is incorporated herein by reference) using fluorescent nucleotide analogs and engineered polymerases (as described, for example, in U.S. patent No. 7,405,281 and U.S. publication No. 2008/0108082, both of which are incorporated herein by reference). Illumination can be limited to a zeptoliter-scale volume around a surface-tethered polymerase, such that incorporation of fluorescently labeled nucleotides can be observed with low background (Levene, M.J. et al., "Zero-mode waveguides for single-molecule analysis at high concentrations." Science 299, 682-686 (2003); Lundquist, P.M. et al., "Parallel confocal detection of single molecules in real time." Opt. Lett. 33, 1026-1028 (2008); Korlach, J. et al., "Selective aluminum passivation for targeted immobilization of single DNA polymerase molecules in zero-mode waveguide nanostructures." Proc. Natl. Acad. Sci. USA 105, 1176-1181 (2008), the disclosures of which are incorporated herein by reference in their entirety). Images obtained from such methods can be stored, processed, and analyzed as set forth herein.

Some SBS embodiments include detecting protons released upon incorporation of nucleotides into the extension products. For example, sequencing based on detecting released protons may use electrical detectors and related technologies commercially available from Ion Torrent (Guilford, CT, a Life Technologies subsidiary), or the sequencing methods and systems described in U.S. publication nos. 2009/0026082, 2009/0127589, 2010/0137143, or 2010/0282617, each of which is incorporated herein by reference. The methods set forth herein for amplifying a target nucleic acid using kinetic exclusion can be readily applied to substrates used to detect protons. More specifically, the methods set forth herein can be used to generate clonal populations of amplicons that are used to detect protons.

The SBS method above can advantageously be performed in a multiplex format, allowing for simultaneous manipulation of multiple different target nucleic acids. In certain embodiments, different target nucleic acids can be treated in a common reaction vessel or on the surface of a particular substrate. This allows convenient delivery of sequencing reagents, removal of unreacted reagents and detection of incorporation events in a multiplex format. In embodiments using surface-bound target nucleic acids, the target nucleic acids can be in an array format. In array formats, the target nucleic acids can typically be bound to the surface in a spatially distinguishable manner. The target nucleic acid may be bound by direct covalent attachment, attached to beads or other particles or bound to a polymerase or other molecule attached to the surface. The array may include a single copy of the target nucleic acid at each site (also referred to as a feature) or multiple copies of the same sequence may be present at each site or feature. Multiple copies may be generated by amplification methods, such as bridge amplification or emulsion PCR, as described in further detail below.

The methods set forth herein may use arrays of features having any of a variety of densities, including, for example, at least about 10 features/cm², 100 features/cm², 500 features/cm², 1,000 features/cm², 5,000 features/cm², 10,000 features/cm², 50,000 features/cm², 100,000 features/cm², 1,000,000 features/cm², 5,000,000 features/cm², or higher.

An advantage of the methods set forth herein is that they provide rapid and efficient parallel detection of more than one target nucleic acid. Accordingly, the present disclosure provides integrated systems capable of preparing and detecting nucleic acids using techniques known in the art, such as those exemplified above. Thus, the integrated system of the present disclosure may include fluidic components capable of delivering amplification reagents and/or sequencing reagents to one or more immobilized DNA fragments, the system including components such as pumps, valves, reservoirs, fluidic lines, and the like. A flow cell can be configured and/or used in an integrated system for detecting a target nucleic acid. Exemplary flow cells are described, for example, in U.S. publication nos. 2010/0111768 and 2012/0270305, each of which is incorporated herein by reference. As exemplified for flow cells, one or more fluidic components of the integrated system may be used for both the amplification methods and the detection methods. Taking a nucleic acid sequencing embodiment as an example, one or more fluidic components of the integrated system may be used for the amplification methods set forth herein and for delivering sequencing reagents in sequencing methods such as those exemplified above. Alternatively, the integrated system may comprise separate fluidic systems to carry out the amplification methods and the detection methods. Examples of integrated sequencing systems capable of producing amplified nucleic acids and also capable of determining the sequence of the nucleic acids include, but are not limited to, the MiSeq™ platform (Illumina, Inc., San Diego, CA) and the devices described in U.S. publication No. 2012/0270305, which is incorporated herein by reference.

Referring now to fig. 1, a schematic diagram of an adaptor 100 that can be used according to various embodiments described herein is shown. The depicted adaptor 100 comprises a double-stranded region 110 and a non-complementary single-stranded region 120. The double-stranded region 110 may be attached to a double-stranded target polynucleotide. In the depicted embodiment, the 5' end of the strand in the double-stranded region 110 includes an optional 5' phosphate (indicated as "W"), which aids in ligating the adaptor 100 to the double-stranded target polynucleotide and in digestion by an exonuclease having 5' to 3' exonuclease activity with a preference for double-stranded DNA that includes a terminal 5' phosphate. Optionally, the free 5' end of the strand of the single-stranded portion 120 is modified to protect that end from exonuclease activity (indicated by "X"); for example, the free 5' end of the strand of the single-stranded portion 120 does not include a 5' phosphate. If the adaptor 100 is not attached to a double-stranded target fragment, the unincorporated adaptor may be degraded by one or more exonucleases having 5' to 3' exonuclease activity with a preference for double-stranded DNA. The strand with 140, 142, and 144 is selectively degraded, while the other strand and the adaptor-target-adaptor molecules remain intact. Optional modification of the free 5' end of the strand of the single-stranded portion 120 may help reduce any residual activity that a 5' to 3' exonuclease may have on single-stranded DNA. If the adaptor 100 is part of an incomplete product, e.g., only one adaptor 100 is attached to a double-stranded target molecule, the incomplete product may be degraded by one or more exonucleases having 3' to 5' exonuclease activity with a preference for double-stranded DNA having blunt or recessed 3' ends. The strand with 130, 132, and 134 is selectively degraded, while the other strand and the adaptor-target-adaptor molecules remain intact.

Referring now to fig. 2, a schematic diagram of an adaptor 200 that can be used according to various embodiments described herein is shown. In the depicted embodiment, the free end of each strand of the single-stranded portion 220 is modified (indicated by "Y") to protect these ends from exonuclease activity. If the adaptor 200 is not attached to a double-stranded target fragment, the unincorporated adaptor may be degraded by one or more exonucleases having both 5' to 3' exonuclease activity and 3' to 5' exonuclease activity. Protection of the free end of each strand of the single-stranded portion 220 prevents exonucleases from using the desired adaptor-target-adaptor molecules as a substrate. If the adaptor 200 is part of an incomplete product, e.g., only one adaptor 200 is attached to a double-stranded target molecule, the incomplete product may be degraded by one or more exonucleases having 3' to 5' exonuclease activity with a preference for double-stranded DNA having blunt or recessed 3' ends. Again, protection of the free end of each strand of the single-stranded portion 220 prevents exonucleases from using the desired adaptor-target-adaptor molecules as a substrate.

Referring now to fig. 3, a schematic diagram of an adaptor 300 that can be used according to various embodiments described herein is shown. In the depicted embodiment, the 5' end of the strand in the double-stranded region 310 includes an optional 5' phosphate (indicated as "W"), which aids in ligating the adaptor 300 to the double-stranded target polynucleotide and in digestion by an exonuclease having 5' to 3' exonuclease activity with a preference for double-stranded DNA that includes a terminal 5' phosphate. The double-stranded region 310 may be attached to a double-stranded target polynucleotide if the 3' end is not blocked. In the depicted embodiment, each strand of the adaptor 300 includes a blocked 3' end, indicated by "Z". If the adaptor 300 is not attached to a double-stranded target fragment, the unincorporated adaptor may be degraded by one or more exonucleases having both 5' to 3' exonuclease activity and 3' to 5' exonuclease activity. Any remaining adaptor sequences that are not degraded by exonucleases cannot serve as primers for extending any polynucleotide sequence during subsequent amplification and/or sequencing reactions.

Referring now to fig. 4, a schematic diagram of an adaptor 400 that can be used according to various embodiments described herein is shown. In the depicted embodiment, the 5' end of the strand in the double-stranded region 410 includes an optional 5' phosphate (indicated as "W") that aids in ligating the adaptor 400 to the double-stranded target polynucleotide. The double-stranded region 410 may be attached to a double-stranded target polynucleotide. In those embodiments in which only one adaptor is attached to a double-stranded target molecule (an incomplete product), the incomplete product may be digested by one or more exonucleases having 3' to 5' exonuclease activity with a preference for double-stranded DNA having blunt or recessed 3' ends.

One strand of the depicted adaptor 100, 200, 300, or 400 comprises the universal extension primer binding site 130, 230, 330, or 430 (e.g., P5), the tag sequence 132, 232, 332, or 432 (e.g., i5), and the sequencing primer binding site 134, 234, 334, or 434 (e.g., SBS3). The other strand of the depicted adaptor 100, 200, 300, or 400 comprises the universal extension primer binding site 140, 240, 340, or 440 (e.g., P7'), the tag sequence 142, 242, 342, or 442 (e.g., i7), and the sequencing primer binding site 144, 244, 344, or 444 (e.g., SBS12').

For amplification or sequencing purposes, the universal extension primer binding site 130, 230, 330, or 430 (e.g., P5) and the universal extension primer binding site 140, 240, 340, or 440 (e.g., P7') may hybridize to extension primer oligonucleotides attached to a solid surface (if the adaptor 100, 200, 300, or 400 is attached to a target polynucleotide). The universal extension primer binding site 140, 240, 340, or 440 (e.g., P7'), or a portion thereof, can also hybridize to a sequencing primer used to sequence the index tag sequence 142, 242, 342, or 442 (e.g., i7). Alternatively, the strand may comprise other sequencing primer sequences (not shown).

The sequencing primer binding site 134, 234, 334, or 434 (e.g., SBS3) can hybridize to a sequencing primer to allow sequencing of the index tag sequence 132, 232, 332, or 432 (e.g., i5). The tag sequence 142, 242, 342, or 442 and the tag sequence 132, 232, 332, or 432 may be the same or different.

The sequencing primer binding site 144, 244, 344, or 444 (e.g., SBS12') can hybridize to a sequencing primer to allow sequencing of the target polynucleotide sequence (if attached to the adaptor 100, 200, 300, or 400).

If the adaptor is attached to the target in a multi-step process as described above, the sequencing primer binding site 134, 234, 334, or 434 (e.g., SBS3) and the sequencing primer binding site 144, 244, 344, or 444 (e.g., SBS12') may hybridize with, for example, a PCR primer.

It will be understood that suitable adaptors for use in the various embodiments described herein may have more, fewer, or other sequence features than those described in relation to FIGS. 1, 2, 3, and 4.
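The feature layout described for the two adaptor strands can be summarized as a small lookup structure. The following is only an illustrative sketch: the class name `Feature`, the constants `STRAND_1` and `STRAND_2`, and the helper `sites_binding` are hypothetical, and the tuple order simply follows the order in which the text lists the features, without asserting a 5' to 3' orientation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    label: str          # example name given in the text, e.g. "P5"
    kind: str           # feature type as named in the text
    binds: tuple = ()   # oligonucleotides the text says it can hybridize to


# One strand (features 130/132/134 and their counterparts in FIGS. 2 to 4).
STRAND_1 = (
    Feature("P5", "universal extension primer binding site",
            ("surface-attached extension primer",)),
    Feature("i5", "index tag sequence"),
    Feature("SBS3", "sequencing primer binding site",
            ("sequencing primer for i5", "PCR primer (multi-step attachment)")),
)

# The other strand (features 140/142/144 and their counterparts).
STRAND_2 = (
    Feature("P7'", "universal extension primer binding site",
            ("surface-attached extension primer", "sequencing primer for i7")),
    Feature("i7", "index tag sequence"),
    Feature("SBS12'", "sequencing primer binding site",
            ("sequencing primer for target", "PCR primer (multi-step attachment)")),
)


def sites_binding(partner, strands=(STRAND_1, STRAND_2)):
    """Return labels of features whose listed partners mention `partner`."""
    return [f.label for strand in strands for f in strand
            if any(partner in b for b in f.binds)]
```

For example, `sites_binding("extension primer")` returns the two universal extension primer binding sites, while `sites_binding("sequencing primer")` returns the sites used to read i5, i7, and the target.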

Referring now to FIG. 5, a schematic diagram of an adaptor-target-adaptor 500 of a library having adaptor 100-template 510-adaptor 100 sequences is shown. The template 510 is double stranded and is attached to the double stranded portion of the adaptor 100. The 5' end of the single stranded portion of the adaptor is modified to protect against exonuclease digestion (indicated by "X"). Because the adaptors 100 are ligated to both ends of the double stranded target fragment 510, there is no double stranded sequence on the adaptor-target-adaptor molecule available to the exonuclease, and the resulting adaptor-target-adaptor 500 is therefore resistant to exonuclease digestion.

Referring now to FIG. 6, a schematic diagram of an adaptor-target-adaptor 600 of a library having adaptor 200-template 610-adaptor 200 sequences is shown. The template 610 is double stranded and is attached to the double stranded portion of the adaptor 200. The ends of the single stranded portion of the adaptor are modified to protect against exonuclease digestion (indicated by "Y"). Because the adaptors 200 are ligated to both ends of the double stranded target fragment 610, there is no unblocked single stranded sequence on the adaptor-target-adaptor molecule available to the exonuclease, and the resulting adaptor-target-adaptor 600 is therefore resistant to exonuclease digestion.

Referring now to FIG. 7, a schematic diagram of an adaptor-target-adaptor 700 of a library having adaptor 300-template 710-adaptor 300 sequences is shown. The template 710 is double stranded and is attached to the double stranded portion of the adaptor 300. The ends of the single stranded regions of the adaptors are modified to prevent them from acting as primers for extending any polynucleotide in the flow cell. FIG. 7 also shows a schematic of an adaptor that has not been completely degraded by exonucleases: one single strand of the adaptor 300 is shown. Such single stranded adaptors are not capable of acting as primers for extending any polynucleotide in the flow cell.

Referring now to FIG. 8A, a schematic diagram of an incomplete product, adaptor-target 800, of a library having adaptor 400-template 810 sequences is shown. The template 810 is double stranded and is attached to the double stranded portion of the adaptor 400. FIG. 8B shows a schematic of one result of digesting the incomplete product 800 with an exonuclease having 3' to 5' exonuclease activity with a preference for double stranded DNA having blunt or recessed 3' ends. Digestion of one strand of the double stranded portion of the adaptor-target 800 from 3' to 5' can result in two single stranded molecules. One is a single stranded adaptor-target 830. The other, adaptor strand 820, corresponds to a single stranded region of the adaptor 400. In this embodiment, the polynucleotides present in the library pool are 3' blocked (as indicated by "Z") upon exposure to an exonuclease having 3' to 5' exonuclease activity with a preference for double stranded DNA having blunt or recessed 3' ends. The 3' blocked single stranded adaptor-target 830 and adaptor strand 820 cannot serve as primers for extending any polynucleotide in the flow cell.

Referring now to FIGS. 9A and 9B, the nature of the index hopping phenomenon is illustrated. FIG. 9A shows how reads from a given sample are incorrectly demultiplexed and mixed with those of a different sample after demultiplexing. FIG. 9B demonstrates index hopping in a dual index system, where it results in unexpected combinations of index tag sequences.

Referring now to FIGS. 10A and 10B, a general method of measuring the index hopping rate in a given system is illustrated. FIG. 10A shows an exemplary layout of a dual adaptor plate, where each individual well of a 96-well plate contains a unique pair of index tag sequences (12 different P7 indices combined with 8 different P5 indices). FIG. 10B shows an experimental setup designed to measure index hopping rates, in which 8 unique dual index tag combinations were used (i.e., no P5 index is expected to pair with more than one P7 index, and vice versa). An unexpected combination of index tags (e.g., D505-D703) is then easily identified as an instance of index hopping.
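The measurement principle of FIGS. 10A and 10B can be sketched in a few lines of Python. This is a minimal illustration rather than the inventors' analysis pipeline: the index names are assumptions modeled on the "D505-D703" example, the particular choice of 8 expected pairs is arbitrary, and the function name `index_hopping_rate` is hypothetical.

```python
from itertools import product

# Assumed index names, patterned on the "D505-D703" example in the text.
P5_INDICES = [f"D5{n:02d}" for n in range(1, 9)]    # 8 P5 indices
P7_INDICES = [f"D7{n:02d}" for n in range(1, 13)]   # 12 P7 indices

# FIG. 10A layout: every P5/P7 pairing, one pair per well (8 x 12 = 96).
PLATE = list(product(P5_INDICES, P7_INDICES))

# FIG. 10B setup: 8 pairs in which no P5 or P7 index is used twice.
# zip() stops after the 8 P5 indices, giving D501-D701 ... D508-D708.
EXPECTED = set(zip(P5_INDICES, P7_INDICES))


def index_hopping_rate(pair_counts, expected=EXPECTED):
    """Return (rate, hopped) for a dict mapping (P5, P7) pairs to read counts.

    A pair counts as hopped when both of its indices were used in the
    experiment but not together. Pairs containing an unused index are ignored.
    """
    used_p5 = {p5 for p5, _ in expected}
    used_p7 = {p7 for _, p7 in expected}
    hopped = {
        pair: n for pair, n in pair_counts.items()
        if pair not in expected and pair[0] in used_p5 and pair[1] in used_p7
    }
    correct = sum(n for pair, n in pair_counts.items() if pair in expected)
    total = correct + sum(hopped.values())
    rate = sum(hopped.values()) / total if total else 0.0
    return rate, hopped
```

With this choice of expected pairs, D505 belongs with D705 and D703 with D503, so any reads assigned to D505-D703 are classified as hopped, matching the example given in the text.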

Referring now to FIGS. 11A and 11B, the effect of unligated adaptors on index hopping rates is illustrated. FIG. 11A shows a 6-fold increase in index hopping associated with a 50% spike-in of free adaptors. FIG. 11B shows an approximately linear effect of free forked adaptors on the index hopping rate over the range tested. The inventors also observed that the effect of free single stranded P7 adaptors on index hopping rates was more pronounced than that of free single stranded P5 adaptors (data not shown).

Referring now to FIGS. 12A and 12B, the effect of exonuclease treatment alone and in combination with 3' blocking, respectively, on index hopping rates in an Illumina PCR-free library preparation workflow is illustrated. In both cases, a significant reduction in index hopping was observed, but a stronger reduction was observed with the combined exonuclease and 3' blocking treatment.

The invention is illustrated by the following examples. It is to be understood that the specific embodiments, materials, amounts, and procedures are to be interpreted in accordance with the scope and spirit of the invention as set forth herein.

Examples
