Method of attaching adapters to sample nucleic acids

Document No.: 1785746. Publication date: 2019-12-06.

Abstract: The present technology, Method of attaching adapters to sample nucleic acids, was devised by Andrew Kennedy and Stefanie Ann Ward Mortimer on 2018-04-13. Methods of preparing double-stranded nucleic acids with single-stranded overhangs for amplification and sequencing are disclosed. Contacting blunt-ended double-stranded nucleic acid molecules with Taq results in non-template-directed addition of a single nucleotide to the 3' ends of the nucleic acids, with A added most frequently, followed by G, followed by C and T. G-tailing is frequent enough that the efficiency of ligating nucleic acid molecules to adapters can be significantly increased by including adapters tailed with T and adapters tailed with C. Ligating blunt-ended nucleic acid molecules that were not successfully tailed to blunt-ended adapters can increase ligation efficiency even further.

1. A method of preparing a nucleic acid for analysis, the method comprising:

(a) Blunting double-stranded nucleic acid having single-stranded overhangs in a sample by the action of one or more enzymes providing 5'-3' polymerase activity and 3'-5' proofreading activity and four standard types of nucleotides, wherein the single-stranded overhangs at the 5' ends serve as templates for extending complementary strands by the polymerase activity and the single-stranded overhangs at the 3' ends are digested by the proofreading activity, resulting in blunt-ended nucleic acid;

(b) End-tailing the blunt-ended nucleic acid by the action of a polymerase without 3'-5' proofreading function that performs non-template directed addition of nucleotides to the 3' end of the blunt-ended nucleic acid, wherein A is added in preference to G and G is added in preference to C or T, without separating the blunt-ended nucleic acid from other components of the sample;

(c) Annealing the nucleic acid from step (b) to an at least partially double-stranded adaptor having a single nucleotide T or C overhang at the 3' end; and

(d) Ligating the nucleic acid to the adaptor.

2. The method of claim 1, further comprising denaturing the one or more enzymes after step (a).

3. The method of claim 1 or 2, further comprising contacting the sample with the one or more enzymes, the four standard types of nucleotides, and the polymerase without 3'-5' proofreading function.

4. The method of claim 3, wherein the sample is contacted with the one or more enzymes, the four standard types of nucleotides, and the polymerase without 3'-5' proofreading function together.

5. The method of any preceding claim, wherein step (b) is performed at a higher temperature than step (a).

6. The method of claim 5, wherein step (a) is performed at ambient temperature and step (b) is performed at a temperature in excess of 60 ℃.

7. The method of any preceding claim, wherein the one or more enzymes are polymerases having 5'-3' polymerase activity and 3'-5' proofreading activity.

8. The method of any preceding claim, wherein the polymerase without 3'-5' proofreading function is a thermostable polymerase, and the method further comprises increasing the temperature of the sample after step (a) to inactivate the polymerase having 5'-3' polymerase activity and 3'-5' proofreading activity.

9. The method of any preceding claim, further comprising (e) amplifying the nucleic acid ligated to the adaptor; and (f) analyzing the nucleic acid.

10. The method of any preceding claim, further comprising contacting the sample with an at least partially double-stranded blunt-ended adaptor that is ligated to a blunt-ended double-stranded nucleic acid in a ligation step, the blunt-ended double-stranded nucleic acid having not undergone non-template directed addition of nucleotides to a 3' end.

11. The method of claim 7, wherein the polymerase having 5'-3' polymerase activity and 3'-5' proofreading activity is T4 polymerase or Klenow large fragment.

12. The method of any preceding claim, wherein the polymerase without 3'-5' proofreading function is Taq polymerase.

13. The method of any preceding claim, wherein at least steps (a) - (d) are performed in a single tube.

14. The method of any preceding claim, wherein for at least steps (a)-(d), no components are removed from the sample.

15. The method of claim 9, wherein steps (a) - (e) are performed in a single tube.

16. The method of any preceding claim, wherein the molar ratio of at least partially double-stranded adaptors having a single nucleotide T overhang to at least partially double-stranded adaptors having a single nucleotide C overhang is from 4:1 to 2:1.

17. The method of claim 16, wherein the molar ratio of blunt-ended adapters to tailed adapters is from 1:5 to 1:500.

18. The method of any preceding claim, wherein at least 70% of the double stranded nucleic acids in the sample are ligated to an adaptor.

19. The method of claim 9, wherein at least 70% of available double-stranded nucleic acids in the sample are analyzed.

20. The method of claim 9, wherein step (f) comprises sequencing the nucleic acid ligated to the adapter.

21. The method of claim 20, wherein the sequencing sequences the nucleotides that form the overhangs in step (c) or step (d).

22. A method of converting double-stranded DNA into adaptor-tagged DNA, the method comprising:

(a) Contacting a population of double-stranded DNA molecules with a population of adaptors that are at least partially double-stranded, wherein:

(i) The population of double-stranded DNA molecules comprises DNA molecules comprising single nucleotide A overhangs and DNA molecules comprising single nucleotide G overhangs, and wherein the population is more abundant (e.g., 10-fold, 100-fold, 1000-fold) in single nucleotide A overhangs than in single nucleotide G overhangs, and

(ii) The population of at least partially double-stranded adapters comprises adapters comprising single nucleotide T overhangs and adapters comprising single nucleotide C overhangs; and

(b) Ligating said adaptor to said DNA molecule, wherein ligation produces adaptor tagged DNA.

23. The method of claim 22, wherein:

(i) The population of double stranded DNA molecules further comprises at least one of: DNA molecules comprising a single nucleotide C overhang, DNA molecules comprising a single nucleotide T overhang, and blunt-ended DNA molecules, and

(ii) The population of at least partially double-stranded adaptors further comprises at least one of: adapters comprising a single nucleotide G overhang, adapters comprising a single nucleotide A overhang, and blunt-ended adapters.

24. The method of claim 22 or 23, wherein the at least partially double-stranded adaptor comprises an NGS ("next generation sequencing") primer binding site and a DNA barcode.

25. The method of any one of claims 22-24, wherein the population of at least partially double-stranded adaptors comprises more than one different DNA barcode.

26. The method of claim 25, wherein the number of barcode combinations attachable to both ends of a double stranded DNA molecule is less than the number of double stranded DNA molecules in the population, e.g., between 5 and 10,000 different combinations.

27. The method of claim 24, the method further comprising:

Amplifying the adaptor tagged DNA using amplification primers comprising a sample index barcode and a nucleotide sequence suitable for hybridization to an oligonucleotide immobilized to a flow cell support.

28. The method of any one of claims 22-27, wherein the adaptor is a Y-shaped adaptor.

29. The method of any preceding claim, wherein the sample is a bodily fluid sample.

30. The method of claim 29, wherein the sample is whole blood, serum, or plasma.

31. The method of any one of claims 22-30, wherein the nucleic acid population is a cell-free nucleic acid population, preferably cell-free DNA.

32. The method of any preceding claim, wherein the sample is from a subject suspected of having cancer.

33. The method of claim 9, wherein the analyzing step detects somatic or germline variants.

34. The method of claim 9, wherein the analyzing step detects copy number variation.

35. The method of claim 9, wherein the analyzing step detects Single Nucleotide Variations (SNVs).

36. A population of adapted nucleic acids produced by the method of any preceding claim, the population comprising more than one nucleic acid molecule, each nucleic acid molecule comprising a nucleic acid fragment flanked on both sides by adapters, the adapters comprising barcodes, with A/T base pairs or G/C base pairs between the nucleic acid fragment and the adapters.

37. The population of claim 36, wherein the more than one nucleic acid molecule is at least 100,000 molecules.

38. The population of claim 36 or 37, wherein the ratio of A/T base pairs to G/C base pairs is between 2:1 and 4:1.

39. The population of any one of claims 36-38, wherein at least 99% of the nucleic acid molecules in the population have nucleic acid fragments flanked by adaptors with different barcodes.

40. A kit comprising a pair of at least partially double stranded adaptors having T and C single nucleotide 3' tails, respectively, the pair of at least partially double stranded adaptors being identical to each other except for the tail.

41. The kit of claim 40, wherein the adaptor is a Y-shaped adaptor comprising oligonucleotides SEQ ID No.1 and 2 and oligonucleotides SEQ ID No.3 and 2.

42. The kit of claim 40 or 41, further comprising T4 polymerase or Klenow large fragment and Taq polymerase and four standard types of nucleotides.

Brief Description of Drawings

FIG. 1 shows blunting and end-tailing of sample DNA, and ligation of the tailed DNA to T-tailed and C-tailed Y-shaped adapters.

Definitions

A subject refers to an animal, such as a mammalian species (preferably human) or avian (e.g., bird) species or other organism, such as a plant. More specifically, the subject can be a vertebrate, e.g., a mammal such as a mouse, a primate, a simian, or a human. Animals include farm animals, sport animals, and pets. The subject may be a healthy individual, an individual having or suspected of having a disease or of being predisposed to having a disease, or an individual in need of treatment or suspected of being in need of treatment.

Genetic variants refer to alterations, variants or polymorphisms in a nucleic acid sample or genome of a subject. Such alterations, variants, or polymorphisms may be relative to a reference genome, which may be a reference genome of the subject or of another individual. Variations include one or more single nucleotide variations (SNVs), insertions, deletions, repeats, small insertions, small deletions, small repeats, structural variant junctions, variable length tandem repeats, and/or flanking sequences. Copy number variations (CNVs), transversions, and other rearrangements are also forms of genetic variation. A variation may be a base change, insertion, deletion, duplication, copy number variation, transversion or a combination thereof.

Cancer markers are genetic variants associated with the presence of cancer or the risk of developing cancer. A cancer marker may provide an indication that a subject has cancer or has a risk of developing cancer that is higher than that of an age- and sex-matched subject of the same species. The cancer marker may or may not be the cause of the cancer.

Nucleic acid tags are short nucleic acids (e.g., less than 100, 50, or 10 nucleotides in length), typically artificial sequences and typically DNA, that are used to label sample nucleic acids to distinguish between nucleic acids from different samples (e.g., representing a sample index), different types of nucleic acids, or nucleic acids that undergo different processes. The tag may be single-stranded or double-stranded. The nucleic acid tag can be decoded to reveal information, such as the sample source, form, or processing of the nucleic acid. Tags can be used to allow multiple nucleic acids carrying different tags to be pooled and processed in parallel, and then deconvoluted by reading the tags. A tag may also be referred to as a molecular identifier or a barcode.

Adapters are typically short nucleic acids (e.g., less than 500, 100, or 50 nucleotides in length, and typically DNA) that are at least partially double-stranded for ligation to either or both ends of a sample nucleic acid molecule. The adaptors may comprise primer binding sites that allow amplification of sample nucleic acid molecules flanked on both ends by adaptors, and/or sequencing primer binding sites, including primer binding sites for next generation sequencing. The adapter may also comprise a binding site for a capture probe, such as an oligonucleotide attached to a flow cell support. The adapter may also comprise a tag as described above. The tag is preferably positioned relative to the primer binding site and the sequencing primer binding site such that the tag is contained in the amplicon and sequencing reads of the sample nucleic acid. The same or different adaptors can be ligated to the respective ends of the sample molecules. Sometimes the adaptors ligated to the respective ends are identical except for the tag. Preferred adaptors are Y-shaped adaptors in which one end is blunt-ended or tailed as described herein for ligation to sample nucleic acid, which is also blunt-ended or tailed with complementary nucleotides. Another preferred adaptor is a bell-shaped adaptor, also having a blunt or tailed end for ligation to the nucleic acid to be analysed.

The four standard types of nucleotides are A, C, G and T for deoxyribonucleotides and A, C, G and U for ribonucleotides.

Detailed description of the invention

1. General

Sample preparation for next generation sequencing platforms generally follows a similar protocol. The sample typically comprises double-stranded nucleic acid fragments with single-stranded overhangs. Such fragments may be blunt-ended and ligated directly to adapters. However, such ligation also produces by-products in which adaptors or fragments form concatamers. The formation of such by-products can be reduced by an alternative procedure in which blunt-ended fragments are A-tailed and ligated to T-tailed adaptors. Commercial kits for end repair and tailing in a single tube are simple and quick to use, and can be used with commercially available adaptors (e.g., NEBNext Ultra II, New England Biolabs, Ipswich, MA). However, use of kits that are not optimized for A-tailing may result in tailing with other nucleotides such as G, T and C.

The present invention provides improved methods for preparing double-stranded nucleic acids, preferably DNA, having single-stranded overhangs for amplification and subsequent analysis, particularly sequencing. It has been found that contacting a blunt-ended double-stranded nucleic acid with Taq in the presence of all four standard types of nucleotides results in non-template directed addition of single nucleotides to the 3' end of the nucleic acid, such that A is added most frequently, followed by G, followed by C and T. Although the inclusion of additional nucleic acid molecules increases the likelihood of off-target side reactions, it has been found that the ratio of single G-tailing to single A-tailing is sufficiently high that the ligation efficiency of nucleic acid molecules in a sample to adapters can be significantly increased by using a mixture of custom adapters that are not only T-tailed (as in prior methods) but also C-tailed, which anneal to the 3' ends of DNA molecules tailed with A and G, respectively. Ligation efficiency can be increased even further by also including blunt-ended adaptors (i.e., not tailed with any nucleotide) to ligate to blunt-ended nucleic acid molecules in the sample that have not undergone tailing with any nucleotide.
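The reasoning above can be illustrated with a small calculation. The following is a minimal sketch, not the method of the disclosure: the end-type fractions are hypothetical placeholders chosen only to respect the stated ordering (A most frequent, then G, then C and T, with some ends left untailed), and the function name is an illustrative assumption. It estimates, per fragment end, the fraction that has a compatible adapter in a given adapter mixture.

```python
# Which adapter end is compatible with which sample end.
# A-tailed ends pair with T-tailed adapters, G with C, and so on;
# untailed (blunt) sample ends need a blunt-ended adapter.
COMPATIBLE_ADAPTER = {"A": "T", "G": "C", "C": "G", "T": "A", "blunt": "blunt"}


def ligatable_fraction(end_fractions, adapter_ends):
    """Return the fraction of sample ends that have a compatible adapter.

    end_fractions: dict mapping sample end type to its fraction (sums to 1).
    adapter_ends: set of adapter end types present in the mixture.
    """
    return sum(
        fraction
        for end_type, fraction in end_fractions.items()
        if COMPATIBLE_ADAPTER[end_type] in adapter_ends
    )


# HYPOTHETICAL distribution of 3' end types after tailing (not measured data).
example_ends = {"A": 0.70, "G": 0.15, "C": 0.03, "T": 0.02, "blunt": 0.10}

t_only = ligatable_fraction(example_ends, {"T"})
t_and_c = ligatable_fraction(example_ends, {"T", "C"})
t_c_blunt = ligatable_fraction(example_ends, {"T", "C", "blunt"})
print(t_only, t_and_c, t_c_blunt)
```

Under these placeholder fractions, each added adapter type recovers the ends that the T-tailed adapter alone would miss. A whole fragment needs a compatible adapter at both ends, so per-molecule yields would be lower than the per-end values computed here.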

2. Samples

The sample may be any biological sample isolated from a subject. The sample may comprise a body tissue, such as a known or suspected solid tumor, whole blood, platelets, serum, plasma, stool, red blood cells, white blood cells or leucocytes, endothelial cells, tissue biopsies, cerebrospinal fluid, synovial fluid, lymph fluid, ascites fluid, interstitial or extracellular fluid, fluid in the spaces between cells, including gingival crevicular fluid, bone marrow, pleural effusion, saliva, mucus, sputum, semen, sweat, or urine. The sample is preferably a body fluid, in particular blood and fractions thereof, as well as urine. The sample may be in the form in which it was initially isolated from the subject, or may be further processed to remove or add components such as cells, or to enrich one component relative to another. Thus, the preferred body fluid for analysis is plasma or serum containing cell-free nucleic acids.

The volume of plasma may depend on the desired read depth for the sequenced region. Exemplary volumes are 0.4mL to 40mL, 5mL to 20mL, or 10mL to 20mL. For example, the volume may be 0.5mL, 1mL, 5mL, 10mL, 20mL, 30mL, or 40mL. The volume of plasma sampled may be, for example, 5mL to 20mL.

The sample may comprise varying amounts of nucleic acid comprising genomic equivalents. For example, a sample of about 30ng of DNA may contain about 10,000 haploid human genome equivalents and, in the case of cell-free DNA, about 200 billion individual nucleic acid molecules. Similarly, a sample of about 100ng of DNA may contain about 30,000 haploid human genome equivalents and, in the case of cell-free DNA, about 600 billion individual molecules. Some samples contain 1ng-500ng, 2ng-100ng, or 5ng-150ng of cell-free DNA, e.g., 5ng-30ng or 10ng-150ng of cell-free DNA.
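The conversion from DNA mass to genome equivalents and fragment counts can be sketched as follows. This is an order-of-magnitude illustration only. The constants are assumptions not stated in the text: about 3.3 pg per haploid human genome, about 650 g/mol per base pair of double-stranded DNA, and a fragment length of 168 bp taken from the typical cell-free DNA size mode.

```python
# Assumed constants (approximate textbook values, not from the source text).
HAPLOID_GENOME_PG = 3.3        # mass of one haploid human genome, picograms
BP_MASS_G_PER_MOL = 650.0      # average mass of one base pair of dsDNA
AVOGADRO = 6.022e23            # molecules per mole


def haploid_genome_equivalents(mass_ng):
    """Approximate number of haploid human genome copies in mass_ng of DNA."""
    return mass_ng * 1000.0 / HAPLOID_GENOME_PG  # 1 ng = 1000 pg


def fragment_count(mass_ng, fragment_bp=168):
    """Approximate number of dsDNA fragments of length fragment_bp in mass_ng."""
    grams = mass_ng * 1e-9
    moles = grams / (fragment_bp * BP_MASS_G_PER_MOL)
    return moles * AVOGADRO


for ng in (30, 100):
    print(ng, round(haploid_genome_equivalents(ng)), f"{fragment_count(ng):.2e}")
```

With these assumptions, 30 ng corresponds to roughly nine thousand haploid genome equivalents and on the order of 10^11 fragments, and both values scale linearly with input mass.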

The sample may comprise nucleic acids from different sources. For example, the sample may comprise germline DNA or somatic DNA. The sample may comprise nucleic acids carrying mutations. For example, the sample may comprise DNA carrying germline and/or somatic mutations. The sample may also comprise DNA carrying cancer-associated mutations (e.g., cancer-associated somatic mutations).

Exemplary amounts of cell-free nucleic acid in the sample prior to amplification range from about 1fg to about 1ug, e.g., 1pg to 200ng, 1ng to 100ng, 10ng to 1000 ng. For example, the amount of cell-free nucleic acid molecule can be up to about 600ng, up to about 500ng, up to about 400ng, up to about 300ng, up to about 200ng, up to about 100ng, up to about 50ng, or up to about 20 ng. The amount of cell-free nucleic acid molecule can be at least 1fg, at least 10fg, at least 100fg, at least 1pg, at least 10pg, at least 100pg, at least 1ng, at least 10ng, at least 100ng, at least 150ng, or at least 200 ng. The amount of cell-free nucleic acid molecule can be up to 1 femtogram (fg), 10fg, 100fg, 1 picogram (pg), 10pg, 100pg, 1ng, 10ng, 100ng, 150ng, or 200 ng. The method may include obtaining 1 femtogram (fg) to 200 ng.

Exemplary samples are 5ml to 10ml of whole blood, plasma or serum, which include about 30ng of DNA or about 10,000 haploid genome equivalents.

Cell-free nucleic acid is nucleic acid that is not contained within or otherwise bound to a cell or, in other words, nucleic acid that remains in a sample after intact cells are removed. Cell-free nucleic acids include DNA, RNA, and hybrids thereof, including genomic DNA, mitochondrial DNA, siRNA, miRNA, circulating RNA (cRNA), tRNA, rRNA, small nucleolar RNA (snoRNA), Piwi-interacting RNA (piRNA), long non-coding RNA (long ncRNA), or fragments of any of these. The cell-free nucleic acid can be double-stranded, single-stranded, or a hybrid thereof. For any of the methods disclosed herein, double-stranded DNA molecules, at least some of which have single-stranded overhangs, are a preferred form of cell-free DNA. Cell-free nucleic acids can be released into body fluids by secretion or by cell death processes such as necrosis and apoptosis. Some cell-free nucleic acids, such as circulating tumor DNA (ctDNA), are released into body fluids from cancer cells. Others are released from healthy cells.

The cell-free nucleic acid can have one or more epigenetic modifications, e.g., the cell-free nucleic acid can be acetylated, methylated, ubiquitinated, phosphorylated, sumoylated, ribosylated, and/or citrullinated.

Cell-free nucleic acids have a size distribution of about 100-500 nucleotides, particularly 110 to about 230 nucleotides, with a mode of about 168 nucleotides and a second minor peak in the range between 240 to 440 nucleotides.

Cell-free nucleic acids can be isolated from body fluids by a separation step in which, e.g., cell-free nucleic acids present in solution are separated from intact cells and other non-soluble components of the body fluid. Separation may include techniques such as centrifugation or filtration. Alternatively, cells in the body fluid may be lysed and cell-free nucleic acid and cellular nucleic acid processed together. Generally, the nucleic acid can be precipitated with alcohol after the addition of buffers and washing steps. Further clean-up steps, such as silica-based columns, may be used to remove contaminants or salts. Non-specific bulk carrier nucleic acid, for example, may be added throughout the reaction to optimize certain aspects of the procedure such as yield.

After such treatment, the sample may include various forms of nucleic acids, including double-stranded DNA, single-stranded DNA, and single-stranded RNA. Optionally, single-stranded DNA and single-stranded RNA can be converted to double-stranded form, so that they are included in subsequent processing and analysis steps.

3. Ligating sample nucleic acid molecules to adaptors

The nucleic acid present in the sample, with or without pretreatment as described above, typically includes a substantial proportion of molecules that are partially double-stranded with single-stranded overhangs. Such molecules can be converted into blunt-ended double-stranded molecules by treatment with one or more enzymes that provide 5'-3' polymerase activity and 3'-5' exonuclease (proofreading) activity in the presence of all four standard types of nucleotides, as shown in the upper part of FIG. 1. This combination of activities extends strands having a recessed 3' end so that they become flush with the 5' end of the opposing strand (in other words, produces a blunt end), and digests strands having a 3' overhang so that they too are flush with the 5' end of the opposing strand. Both activities may optionally be conferred by a single polymerase. The polymerase is preferably heat sensitive so that its activity can be terminated when the temperature is increased. Klenow large fragment and T4 polymerase are examples of suitable polymerases.

The one or more enzymes conferring 5'-3' polymerase activity and 3'-5' exonuclease activity are preferably denatured by increasing the temperature or otherwise. For example, denaturation can be achieved by raising the temperature to, for example, 75 ℃ to 80 ℃. The sample is then acted upon by a polymerase lacking the proofreading function (FIG. 1, middle). This polymerase is preferably thermostable so as to remain active at elevated temperatures. Taq, Bst large fragment and Tth polymerase are examples of such polymerases. The second polymerase effects the non-template directed addition of a single nucleotide to the 3' end of the blunt-ended nucleic acid. Although the reaction mixture typically contains equal molar amounts of each of the four standard types of nucleotides from the previous step, the four types of nucleotides are not added to the 3' end in equal proportions. Instead, A is added most frequently, followed by G, followed by C and T.

After tailing the sample molecules, and with or without subsequent purification of the tailed sample molecules, the tailed sample molecules are contacted with adaptors that are tailed with complementary T and C nucleotides at one end of the adaptor (FIG. 1, bottom). Adaptors are typically formed by separate synthesis and annealing of their respective strands. Thus, the T and C tails may be added as additional nucleotides during synthesis of one strand. Adapters that are tailed with G and A are generally not included because, although these adapters can anneal to sample molecules that are tailed with C and T, respectively, they will also anneal to other adapters. Adaptor molecules and sample molecules carrying complementary nucleotides (i.e., T-A and C-G) at their 3' ends anneal and can be ligated to each other. The percentage of C-tailed adapters relative to T-tailed adapters ranges from about 5%-40% by moles, such as 10%-35%, 15%-25%, 20%-35%, 25%-35%, or about 30%. Since non-template directed addition of single nucleotides to the 3' ends of the sample molecules does not proceed to completion, the sample also contains some blunt-ended sample molecules that are not tailed. These molecules can also be recovered by providing the sample with adapters having one and preferably only one blunt end. Blunt-ended adaptors are typically provided at a molar ratio of 0.2% to 20%, or 0.5% to 15%, or 1% to 10% relative to the T-tailed and C-tailed adaptors. Blunt-ended adapters may be provided simultaneously with, before, or after the T-tailed and C-tailed adapters. Blunt-ended adaptors are ligated to blunt-ended sample molecules, again resulting in sample molecules flanked on both sides by adaptors. These molecules lack the A-T or C-G nucleotide pairs between the sample and the adapter that are present when a tailed sample molecule is ligated to a tailed adapter.
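The molar proportions described above can be turned into a simple mixing calculation. The following is a minimal sketch under stated assumptions: the function name, its parameters, and the example amounts are illustrative and not part of the disclosure, and the range checks simply mirror the broadest ranges quoted in the text (C-tailed at 5% to 40% of T-tailed, blunt-ended at 0.2% to 20% of the tailed adapters).

```python
def adapter_mix(t_tailed_pmol, c_percent_of_t, blunt_percent_of_tailed):
    """Compute adapter amounts (pmol) for a T/C/blunt adapter mixture.

    t_tailed_pmol: amount of T-tailed adapter (hypothetical input).
    c_percent_of_t: C-tailed adapter as a molar percentage of T-tailed.
    blunt_percent_of_tailed: blunt adapter as a molar percentage of the
        combined T-tailed and C-tailed adapters.
    """
    if not 5 <= c_percent_of_t <= 40:
        raise ValueError("C-tailed percentage outside the 5%-40% range")
    if not 0.2 <= blunt_percent_of_tailed <= 20:
        raise ValueError("blunt percentage outside the 0.2%-20% range")

    c_tailed_pmol = t_tailed_pmol * c_percent_of_t / 100.0
    tailed_total = t_tailed_pmol + c_tailed_pmol
    blunt_pmol = tailed_total * blunt_percent_of_tailed / 100.0
    return {"T": t_tailed_pmol, "C": c_tailed_pmol, "blunt": blunt_pmol}


# Example: 100 pmol T-tailed, C-tailed at 30% of T, blunt at 1% of tailed.
mix = adapter_mix(100.0, 30.0, 1.0)
print(mix)
```

Expressing the blunt-ended adapter relative to the combined tailed adapters follows the wording of the text; a protocol that defines the ratios differently would need the arithmetic adjusted accordingly.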

The adapters used for these reactions preferably have one and only one end tailed with a T or C, or have one and only one blunt end, so that they can be ligated to the sample molecules in only one orientation. The adapter may be, for example, a Y-shaped adapter, in which one end is tailed or blunt and the other end has two single strands. An exemplary Y-shaped adaptor has the following sequences, in which "(6 bases)" indicates a tag. The first oligonucleotide comprises a single-base T tail.

Universal adaptor:

5' AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT (SEQ ID NO. 1).

Adaptor, index 1-12: 5' GATCGGAAGAGCACACGTCTGAACTCCAGTCAC (6 bases) ATCTCGTATGCCGTCTTCTGCTTG (SEQ ID NO. 2).

Another Y-adapter with a C-tail has the following sequence:

5' AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCC (SEQ ID NO.3) and an adaptor, indices 1-12: 5' GATCGGAAGAGCACACGTCTGAACTCCAGTCAC (6 bases) ATCTCGTATGCCGTCTTCTGCTTG (SEQ ID NO. 2).

Such a customized combination of oligonucleotides including oligonucleotides having both a T-tail and a C-tail may be synthesized for use in the present methods.
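The relationship between the listed oligonucleotides can be checked programmatically. The following is a minimal sketch: the helper names are illustrative, only the portion of SEQ ID NO. 2 that precedes the 6-base tag is used, and the "stem" is estimated simply as the longest perfectly complementary run between the 3' end of the tailed strand (excluding the tail) and the 5' end of the index strand.

```python
SEQ_ID_1 = "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT"  # T tail
SEQ_ID_3 = "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCC"  # C tail
# 5' portion of SEQ ID NO. 2, up to (not including) the 6-base tag.
SEQ_ID_2_5PRIME = "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def stem_length(tailed_strand, index_strand):
    """Length of the duplex formed at the ligatable end of a Y adapter.

    The single-nucleotide tail (last base of tailed_strand) is excluded,
    because it is the unpaired 3' overhang.
    """
    body_rc = reverse_complement(tailed_strand[:-1])
    length = 0
    for a, b in zip(body_rc, index_strand):
        if a != b:
            break
        length += 1
    return length


same_body = SEQ_ID_1[:-1] == SEQ_ID_3[:-1]
tails = (SEQ_ID_1[-1], SEQ_ID_3[-1])
stem = stem_length(SEQ_ID_1, SEQ_ID_2_5PRIME)
print(same_body, tails, stem)
```

With the sequences as listed, SEQ ID NO. 1 and SEQ ID NO. 3 differ only in their final base (T versus C), and the double-stranded stem computed this way is 12 base pairs, with the remaining portions of the two strands forming the single-stranded arms of the Y.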

Truncated forms of these adaptor sequences have been described by Rohland et al., Genome Res. 2012 May; 22(5):939-.

Adapters may also be bell-shaped, having only one tailed or blunt end. The adaptors may include primer binding sites for amplification, binding sites for sequencing primers, and/or nucleic acid tags for identification purposes. The same or different adaptors can be used in a single reaction.

When the adapter comprises an identification tag and the nucleic acids in the sample are attached to an adapter at each end, the number of potential tag combinations grows rapidly with the number of unique tags provided (i.e., n × n = n² combinations, where n is the number of unique identification tags). In some methods, the number of combinations of unique tags is sufficient that it is statistically likely that all or substantially all (e.g., at least 90%) of the different double-stranded DNA molecules in the sample receive different combinations of tags. In some methods, the number of unique combinations of identifier tags is less than the number of unique double-stranded DNA molecules in the sample (e.g., 5 to 10,000 different tag combinations).

A kit providing suitable enzymes for carrying out the above methods is the NEBNext Ultra™ II DNA library preparation kit. The kit provides the following reagents:
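The tag arithmetic can be sketched as follows. This is a simplified model under stated assumptions, not the assignment scheme of the disclosure: one tag is attached at each of the two ends, giving n × n ordered tag pairs, and every molecule is assumed to receive a pair uniformly at random and independently of the others. The function names are illustrative.

```python
def tag_pair_count(n_tags):
    """Number of ordered tag pairs when one of n_tags is attached at each end."""
    return n_tags * n_tags


def expected_unique_fraction(n_pairs, n_molecules):
    """Expected fraction of molecules whose tag pair is shared with no other.

    Assumes uniform, independent assignment: a given molecule is unique
    when each of the other (n_molecules - 1) molecules picks a different pair.
    """
    if n_molecules < 1:
        raise ValueError("need at least one molecule")
    return (1.0 - 1.0 / n_pairs) ** (n_molecules - 1)


pairs_12 = tag_pair_count(12)    # e.g. twelve index tags
pairs_100 = tag_pair_count(100)
print(pairs_12, pairs_100)
print(expected_unique_fraction(pairs_100, 100))
print(expected_unique_fraction(pairs_100, 100_000))
```

When the number of tag pairs is much smaller than the number of molecules, most pairs are shared by several molecules. In that regime the tag pair alone cannot identify a molecule and would have to be combined with other information.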

NEBNext Ultra II end preparation enzyme mixture, NEBNext Ultra II end preparation reaction buffer, NEBNext ligation enhancer, NEBNext Ultra II ligation master mixture-20 and Ultra II master mixture.

The blunting and tailing of the sample nucleic acid can be performed in a single tube. The blunt-ended nucleic acid does not have to be separated from the enzymes that perform blunting before the tailing reaction occurs. Optionally, all enzymes, nucleotides and other reagents are provided together before the blunting reaction occurs. Providing together means that all substances are added to the sample sufficiently close in time that all are present when the sample is incubated to allow blunting to occur. Optionally, no material is removed from the sample after providing the enzymes, nucleotides and other reagents, at least until the blunting and end-tailing incubations are complete. Generally, the end-tailing reaction is performed at a higher temperature than the blunting reaction. For example, the blunting reaction may be carried out at ambient temperature, at which the 5'-3' polymerase and 3'-5' exonuclease are active and the thermostable polymerase is inactive or minimally active; and the end-tailing reaction is carried out at an elevated temperature, such as above 60 ℃, at which the 5'-3' polymerase and the 3'-5' exonuclease are inactive and the thermostable polymerase is active.

Attachment of T-tailed adapters and C-tailed adapters as described produces a population of adapted nucleic acids that includes more than one nucleic acid molecule, each nucleic acid molecule comprising a nucleic acid fragment flanked on both sides by adapters, the adapters comprising barcodes, with A/T or G/C base pairs between the nucleic acid fragment and the adapters. The more than one nucleic acid molecule may be at least 10,000, 100,000 or 1,000,000 molecules. The ratio of A/T base pairs to G/C base pairs at the junction region between the fragment and the flanking adapters depends on the ratio of T-tailed adapters to C-tailed adapters and is, for example, between 2:1 and 4:1. Most nucleic acids in the population (e.g., at least 99%) are flanked by adapters with different barcodes. If blunt-ended adaptors are also included, the population comprises nucleic acid molecules in which the nucleic acid fragment is directly attached to an adaptor (i.e., without an intervening A/T pair or G/C pair) at either or both ends.

4. Amplification

Sample nucleic acid flanked by adaptors can be amplified by PCR and other amplification methods, which are typically primed from primers that bind to primer binding sites in the adaptors flanking the nucleic acid to be amplified. Amplification methods may include cycles of extension, denaturation and annealing caused by thermal cycling, or may be isothermal, as in transcription-mediated amplification. Other amplification methods include ligase chain reaction, strand displacement amplification, nucleic acid sequence-based amplification, and self-sustained sequence replication.
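The single-tube temperature logic can be expressed as a small consistency check. The following is a minimal sketch under stated assumptions: the class and function names are illustrative, the specific temperatures (20 ℃ for ambient, 65 ℃ for tailing) are hypothetical example values, and the 60 ℃ threshold is taken from the example in the text and is not a requirement of every embodiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IncubationStep:
    name: str
    temperature_c: float
    active_enzyme: str


def check_single_tube_profile(blunting, tailing, min_tailing_c=60.0):
    """Check that tailing runs hotter than blunting and above a threshold.

    min_tailing_c defaults to the 60 C example value given in the text.
    Raises ValueError if the profile is inconsistent, else returns True.
    """
    if tailing.temperature_c <= blunting.temperature_c:
        raise ValueError("tailing must run at a higher temperature than blunting")
    if tailing.temperature_c <= min_tailing_c:
        raise ValueError("tailing temperature does not exceed the threshold")
    return True


# HYPOTHETICAL example temperatures; enzyme names are examples from the text.
blunting_step = IncubationStep("blunting", 20.0, "T4 polymerase")
tailing_step = IncubationStep("end-tailing", 65.0, "Taq polymerase")
profile_ok = check_single_tube_profile(blunting_step, tailing_step)
print(profile_ok)
```

Encoding the two incubations as ordered steps makes the key design point explicit: temperature alone switches activity from the heat-sensitive proofreading polymerase to the thermostable tailing polymerase, so no reagent has to be removed between the steps.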

Preferably, the method results in at least 75%, 80%, 85%, 90% or 95% of the double stranded nucleic acid in the sample being ligated to the adaptor. Preferably, the use of T-tailing and C-tailing increases the percentage of double stranded nucleic acid ligated to adaptors in the sample by at least 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9% or 10% (increasing yield from 75% to 80%, considered an increase of 5%) relative to a control method performed with T-tailed adaptors alone. Preferably, the use of T-tailing and C-tailing in combination with blunt-ended adaptors increases the percentage of double-stranded nucleic acid ligated to the adaptors by at least 5%, 10%, 15%, 20%, or 25%. The percentage of nucleic acid ligated to the adapter can be determined by comparative gel electrophoresis of the original sample and the treated sample after ligation to the adapter is completed.

Preferably, the method results in at least 75%, 80%, 85%, 90% or 95% of the available double-stranded molecules in the sample being sequenced. Preferably, the use of T-tailing and C-tailing increases the percentage of double-stranded nucleic acid in the sample that is sequenced by at least 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, or 10% relative to a control method performed with T-tailed adaptors alone. Preferably, the use of T-tailed and C-tailed adaptors in combination with blunt-ended adaptors increases the percentage of double-stranded nucleic acid in the sample that is sequenced by at least 5%, 10%, 15%, 20%, or 25% relative to a control method performed with T-tailed adaptors alone. The percentage of nucleic acid sequenced can be determined by comparing the number of molecules actually sequenced with the number that could be sequenced, based on the input nucleic acid and the genomic region targeted for sequencing.

5. Tags

Tags providing molecular identifiers or barcodes may be incorporated into or otherwise ligated to adapters by ligation, overlap extension PCR, and other methods. In general, the assignment of unique or non-unique identifiers or molecular barcodes in a reaction follows the methods and systems described by U.S. patent applications 20010053519, 20030152490, 20110160078 and U.S. patent No. 6,582,908 and U.S. patent No. 7,537,898.

The tags may be randomly or non-randomly attached to the sample nucleic acids. In some cases, they are introduced at a desired ratio of unique identifiers to microwells. For example, the unique identifiers can be loaded such that each genomic sample carries more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, 100, 500, 1000, 5000, 10000, 50,000, 100,000, 500,000, 1,000,000, 10,000,000, 50,000,000, or 1,000,000,000 unique identifiers. In some cases, the unique identifiers may be loaded such that each genomic sample carries fewer than about 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, 100, 500, 1000, 5000, 10000, 50,000, 100,000, 500,000, 1,000,000, 10,000,000, 50,000,000, or 1,000,000,000 unique identifiers. In some cases, the average number of unique identifiers loaded per genomic sample is less than or greater than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, 100, 500, 1000, 5000, 10000, 50,000, 100,000, 500,000, 1,000,000, 10,000,000, 50,000,000, or 1,000,000,000 unique identifiers.

In some cases, the unique identifier may be a predetermined, random, or semi-random sequence oligonucleotide. In other cases, more than one barcode may be used such that the barcodes are not necessarily unique relative to each other. In this example, the barcodes may be attached to individual molecules such that the combination of the barcode and the sequence to which it is attached produces a unique sequence that can be traced individually. As described herein, the detection of non-unique barcodes in combination with sequence data at the beginning and end portions of sequence reads may allow unique identities to be assigned to particular molecules. The length, or number of base pairs, of individual sequence reads can also be used to assign unique identities to such molecules. As described herein, fragments from a single strand of nucleic acid that has been assigned a unique identity can thus be subsequently identified as originating from that parent strand.

Polynucleotides in a sample can be tagged with a sufficient number of different tags such that there is a high probability (e.g., at least 90%, at least 95%, at least 98%, at least 99%, at least 99.9%, or at least 99.99%) that all polynucleotides mapped to a particular genomic region carry different identification tags (i.e., the molecules within that region are substantially uniquely tagged). The genomic region to which a polynucleotide may map may be, for example, (1) the entire set of genes being sequenced, (2) a portion of the set, such as a single gene, exon, or intron, (3) a single nucleotide coordinate (e.g., one to which at least one nucleotide in a polynucleotide maps, for example, a start position, an end position, a midpoint, or any position in between) or (4) a particular start/end nucleotide coordinate pair. The number of different identifiers (tag count) required to substantially uniquely tag the polynucleotides varies with how many original polynucleotide molecules in the sample map to that region. This in turn varies with several factors. One factor is the total number of haploid genome equivalents included in the assay. Another factor is the average size of the polynucleotide molecules. Another factor is the distribution of molecules throughout the region. This in turn can vary with the cleavage pattern: one can expect that cleavage occurs primarily between nucleosomes, such that more polynucleotides map across nucleosome locations than between nucleosomes. Another factor is the distribution of barcodes in the pool and the efficiency of ligating individual barcodes, which may cause differences in the effective concentration of one barcode compared to another. Another factor is the size of the region within which the molecules to be uniquely tagged are confined (e.g., the same start/stop or the same exon).

The identifier may be a single barcode attached to one end of the molecule, or two barcodes, each attached to a different end of the molecule. Attaching barcodes independently to both ends of the molecule squares the number of possible identifiers. In this case, the number of different barcodes is selected such that the combination of barcodes on the two ends of a particular polynucleotide has a high probability of being unique relative to other polynucleotides mapped to the same selected genomic region.

In certain embodiments, the number of different identifiers or barcode combinations used (tag count) may be at least any one of 64, 100, 400, 900, 1400, 2500, 5625, 10,000, 14,400, 22,500, or 40,000, and no more than any one of 90,000, 40,000, 22,500, 14,400, or 10,000. For example, the number of identifiers or barcode combinations may be between 64 and 90,000, between 400 and 22,500, 400 and 14,400, or between 900 and 14,400.
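The squaring relationship between barcodes per end and barcode combinations can be illustrated with a short calculation. The following is a minimal sketch, assuming that every barcode can occur on either end and that the ordered (left, right) pair is what distinguishes an identifier; the function names are illustrative and do not appear in the disclosure.

```python
import math


def identifier_count(barcodes_per_end: int) -> int:
    """Number of ordered (left, right) barcode pairs when each end of a
    molecule is tagged independently from the same barcode set."""
    return barcodes_per_end ** 2


def barcodes_per_end_needed(tag_count: int) -> int:
    """Smallest number of barcodes per end whose square is at least tag_count."""
    root = math.isqrt(tag_count)
    return root if root * root == tag_count else root + 1


if __name__ == "__main__":
    for tag_count in (64, 400, 900, 2500, 5625, 14_400, 90_000):
        print(tag_count, barcodes_per_end_needed(tag_count))
```

Under these assumptions, a tag count of 64 corresponds to 8 barcodes per end and a tag count of 90,000 to 300 barcodes per end.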

In samples comprising fragmented genomic DNA from more than one genome, e.g., cell-free DNA (cfDNA), there is a certain probability that more than one polynucleotide from different genomes will have the same start and stop positions ("duplicates" or "cognates"). The likely number of duplicates starting at any position varies with the number of haploid genome equivalents in the sample and the distribution of fragment sizes. For example, cfDNA has a peak at about 160 nucleotides, and the majority of fragments in this peak range from about 140 nucleotides to 180 nucleotides. Thus, cfDNA from a genome having about 3 billion bases (e.g., the human genome) can contain almost 20 million (2 x 10^7) polynucleotide fragments. A sample of about 30 ng of DNA may contain about 10,000 haploid human genome equivalents. (Similarly, a sample of about 100 ng of DNA may contain about 30,000 haploid human genome equivalents.) A sample comprising about 10,000 (10^4) haploid genome equivalents of such DNA may have about 200 billion (2 x 10^11) individual polynucleotide molecules. It has been empirically determined that in a sample of about 10,000 haploid genome equivalents of human DNA, there are about 3 duplicate polynucleotides beginning at any given position. Thus, such a collection may comprise a diversity of about 6 x 10^10 to 8 x 10^10 (about 60 billion to 80 billion, e.g., about 70 billion (7 x 10^10)) differently sequenced polynucleotide molecules.
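The order-of-magnitude figures in the preceding paragraph can be checked with simple arithmetic. The following is a minimal sketch, assuming a genome of 3 billion bases, a uniform fragment size of 160 nucleotides, about 3 pg of DNA per haploid genome equivalent (a figure given elsewhere in this description), and 3 duplicates per position; the function names are illustrative only.

```python
GENOME_BASES = 3_000_000_000   # about 3 billion bases (human genome)
FRAGMENT_NT = 160              # cfDNA peak size, used here as a uniform size
PG_PER_HAPLOID_GENOME = 3      # about 3 pg of DNA per haploid genome equivalent


def fragments_per_genome(genome_bases=GENOME_BASES, fragment_nt=FRAGMENT_NT):
    """Approximate number of fragments obtained from one haploid genome."""
    return genome_bases / fragment_nt


def genome_equivalents(input_ng, pg_per_genome=PG_PER_HAPLOID_GENOME):
    """Haploid genome equivalents in input_ng nanograms of DNA."""
    return input_ng * 1000 / pg_per_genome


def total_molecules(input_ng):
    """Approximate number of individual fragments in the sample."""
    return genome_equivalents(input_ng) * fragments_per_genome()


def expected_distinct_molecules(input_ng, duplicates_per_position=3):
    """Approximate number of distinct molecules, given the average number
    of duplicates that begin at any one position."""
    return total_molecules(input_ng) / duplicates_per_position


if __name__ == "__main__":
    print(f"{fragments_per_genome():.3g} fragments per genome")
    print(f"{genome_equivalents(30):.0f} genome equivalents in 30 ng")
    print(f"{total_molecules(30):.3g} molecules in 30 ng")
    print(f"{expected_distinct_molecules(30):.3g} distinct molecules")
```

With these inputs the sketch gives roughly 1.9 x 10^7 fragments per genome, 10,000 genome equivalents in 30 ng, about 1.9 x 10^11 molecules, and about 6 x 10^10 distinct molecules, in line with the rounded values in the text.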

The probability of correctly identifying a molecule depends on the initial number of genome equivalents, the length distribution of the sequenced molecules, the sequence uniformity, and the number of tags. This probability may be calculated using a Poisson distribution. A tag count of 1 is equivalent to having no unique tag, i.e., not tagging. Table 1 below lists the probabilities of correctly identifying molecules as unique, assuming a typical cell-free size distribution as above.

TABLE 1

Tag count    Correctly uniquely identified (%)

1000 human haploid genome equivalents
1            96.9643
4            99.2290
9            99.6539
16           99.8064
25           99.8741
100          99.9685

3000 human haploid genome equivalents
1            91.7233
4            97.8178
9            99.0198
16           99.4424
25           99.6412
100          99.9107
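The text does not give the exact model behind Table 1, so the following is only a sketch of one simple Poisson formulation. It assumes that the number of other molecules sharing a molecule's start and stop coordinates is Poisson distributed with mean `expected_duplicates` (a hypothetical parameter, here back-calculated from the tag-count-1 row for 1000 genome equivalents) and that tags are assigned uniformly at random. Its output is close to, but not identical with, the tabulated values.

```python
import math


def p_unique(expected_duplicates: float, tag_count: int) -> float:
    """Probability that no other molecule shares both the coordinates and
    the tag of a given molecule. Under uniform tagging, the number of such
    molecules is Poisson with mean expected_duplicates / tag_count."""
    return math.exp(-expected_duplicates / tag_count)


def expected_duplicates_from_untagged(p_unique_untagged: float) -> float:
    """Back-calculate the Poisson mean from the tag-count-1 probability."""
    return -math.log(p_unique_untagged)


# Assumption: calibrate on the first row of Table 1 (tag count 1, 96.9643%).
lam = expected_duplicates_from_untagged(0.969643)

if __name__ == "__main__":
    for n in (1, 4, 9, 16, 25, 100):
        print(n, round(100 * p_unique(lam, n), 4))
```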

In this case, it may not be possible to determine which sequence reads originate from which parent molecules after sequencing the genomic DNA. This problem can be mitigated by tagging parent molecules with a sufficient number of unique identifiers (e.g., the tag count) such that there is a likelihood that two duplicate molecules (i.e., molecules with the same start and stop positions) carry different unique identifiers, so that sequence reads can be traced to a particular parent molecule. One approach is to uniquely tag every, or nearly every, different parent molecule in the sample. However, depending on the number of haploid genome equivalents and the distribution of fragment sizes in the sample, this may require billions of different unique identifiers.

This process can be cumbersome and expensive. In some aspects, provided herein are methods and compositions wherein a population of polynucleotides in a sample of fragmented genomic DNA is tagged with n different unique identifiers, wherein n is at least 2 and no more than 100,000 x z, wherein z is a measure of central tendency (e.g., mean, median, mode) of the expected number of duplicate molecules having the same start and stop positions. In certain embodiments, n is at least any one of 2 x z, 3 x z, 4 x z, 5 x z, 6 x z, 7 x z, 8 x z, 9 x z, 10 x z, 11 x z, 12 x z, 13 x z, 14 x z, 15 x z, 16 x z, 17 x z, 18 x z, 19 x z, 20 x z, or 100 x z (e.g., a lower limit). In other embodiments, n is no greater than 100,000, 10,000, 2000, 1000, 500, or 100 (e.g., an upper limit). Thus, n may range between any combination of these lower and upper limits. In certain embodiments, n is between 100 and 1000, between 5 and 15, between 8 and 12, or about 10. For example, a haploid human genome equivalent has about 3 picograms of DNA. A sample of about 1 microgram of DNA contains about 300,000 haploid human genome equivalents. The number n may be between 15 and 45, between 24 and 36, between 64 and 2500, between 625 and 31,000, or between about 900 and 4000. Improved sequencing can be achieved as long as at least some of the duplicate or cognate polynucleotides carry unique identifiers, i.e., carry different tags. However, in certain embodiments, the number of tags used is selected such that the chance that all duplicate molecules starting at any one position carry a unique identifier is at least 95%. For example, a sample comprising about 10,000 haploid human genome equivalents of cfDNA can be tagged with about 36 unique identifiers. The unique identifiers may comprise 6 unique DNA barcodes; attached to both ends of a polynucleotide, these produce 36 possible unique identifiers. A sample tagged in this way may be one having fragmented polynucleotides, e.g., genomic DNA, e.g., cfDNA, in a range from about 10 ng to any of about 100 ng, about 1 μg, or about 10 μg.
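How the chance that all duplicate molecules carry different identifiers depends on n and z can be sketched with a birthday-problem calculation. The following is a minimal illustration, assuming exactly z duplicates at a position and identifiers drawn uniformly and independently from n possibilities; the text does not state how its own probabilities were computed, and the function names are illustrative.

```python
def p_all_distinct(z: int, n: int) -> float:
    """Probability that z molecules, each given one of n equally likely
    identifiers independently, all carry different identifiers."""
    if z > n:
        return 0.0  # more molecules than identifiers: a repeat is certain
    p = 1.0
    for i in range(z):
        p *= (n - i) / n
    return p


def min_identifiers(z: int, target: float = 0.95) -> int:
    """Smallest n for which p_all_distinct(z, n) reaches the target."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    n = max(z, 1)
    while p_all_distinct(z, n) < target:
        n += 1
    return n


if __name__ == "__main__":
    for z in (2, 3, 5):
        print(z, min_identifiers(z, 0.95))
```

Because z in the text is a measure of central tendency rather than a fixed count, a fuller model would average this probability over the distribution of duplicates per position.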

Thus, the present disclosure also provides compositions of tagged polynucleotides. The polynucleotides may comprise fragmented DNA, such as cfDNA. A set of polynucleotides in a composition that map to a mappable base position in a genome can be non-uniquely tagged, i.e., the number of different identifiers can be at least 2 and less than the number of polynucleotides that map to the mappable base position. A composition of between about 10 ng and about 10 μg (e.g., any of about 10 ng-1 μg, about 10 ng-100 ng, about 100 ng-10 μg, about 100 ng-1 μg, about 1 μg-10 μg) can carry a number of different identifiers between any of 2, 5, 10, 50, or 100 and any of 100, 1000, 10,000, or 100,000. For example, between 5 and 100 or between 100 and 4000 different identifiers may be used to tag polynucleotides in such compositions.

An event in which different molecules map to the same coordinates (in this case, have the same start/stop positions) and carry the same tag rather than different tags is called a "molecular collision". In some cases, the actual number of molecular collisions may be greater than the number of theoretical collisions, e.g., as calculated above. This may result from an uneven distribution of molecules across coordinates, differences in ligation efficiency between barcodes, and other factors. In this case, empirical methods may be used to determine the number of barcodes needed to approach the theoretical number of collisions. In one embodiment, provided herein are methods for determining the number of barcodes needed to reduce barcode collisions for a given number of haploid genome equivalents, based on the length distribution and sequence uniformity of the sequenced molecules. The method comprises creating more than one pool of nucleic acid molecules; tagging the pools with incrementally increasing numbers of barcodes; and determining an optimal number of barcodes that reduces the number of barcode collisions to a theoretical level, e.g., a level that may result from differences in effective barcode concentration due to differences in pooling and ligation efficiencies.

In one embodiment, the number of identifiers required to substantially uniquely tag the polynucleotides mapped to a region may be determined empirically. For example, a selected number of different identifiers can be attached to molecules in the sample, and the number of different identifiers carried by molecules mapped to the region can be counted. If an insufficient number of identifiers is used, some polynucleotides mapped to that region will carry the same identifier. In this case, the number of identifiers counted will be less than the number of original molecules in the sample. The number of different identifiers used may be iteratively increased for a sample type until no additional identifiers representing new original molecules are detected. For example, in a first iteration, five different identifiers representing at least five different original molecules may be counted. In a second iteration, using more barcodes, seven different identifiers representing at least seven different original molecules are counted. In a third iteration, using more barcodes, 10 different identifiers representing at least 10 different original molecules are counted. In a fourth iteration, using still more barcodes, 10 different identifiers are again counted. At this point, adding more barcodes is unlikely to increase the number of original molecules detected.
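The stopping rule in the iterative procedure above can be expressed compactly. The following is a minimal sketch, assuming the distinct-identifier counts from successive iterations are available as a list; the function name and the plateau criterion (no increase in the next iteration) are illustrative choices, not part of the disclosure.

```python
from typing import Optional, Sequence


def saturation_iteration(distinct_counts: Sequence[int]) -> Optional[int]:
    """Return the 1-based iteration at which the count of distinct
    identifiers stops increasing, or None if it is still rising."""
    for i in range(len(distinct_counts) - 1):
        if distinct_counts[i + 1] <= distinct_counts[i]:
            return i + 1
    return None


if __name__ == "__main__":
    # Counts from the example in the text: 5, 7, 10, then 10 again.
    print(saturation_iteration([5, 7, 10, 10]))
```

For the example counts, the plateau is reached at the third iteration, so the barcode number used in that iteration would be taken as sufficient.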

6. Sequencing

Sample nucleic acids flanked by adaptors can be sequenced with or without prior amplification. Sequencing methods include, for example, Sanger sequencing, high-throughput sequencing, pyrosequencing, sequencing by synthesis, single-molecule sequencing, nanopore sequencing, semiconductor sequencing, sequencing by ligation, sequencing by hybridization, RNA-Seq (Illumina), digital gene expression (Helicos), next-generation sequencing, single-molecule sequencing by synthesis (SMSS) (Helicos), massively parallel sequencing, clonal single-molecule arrays (Solexa), shotgun sequencing, Ion Torrent, Oxford Nanopore, Roche Genia, Maxam-Gilbert sequencing, primer walking, and sequencing using PacBio, SOLiD, Ion Torrent, or nanopore platforms. The sequencing reaction may be performed in a variety of sample processing units, which may include multiple lanes, multiple channels, multiple wells, or other devices that process multiple sets of samples substantially simultaneously. The sample processing unit may further comprise a plurality of sample chambers to enable simultaneous processing of a plurality of runs.

The sequencing reaction may be performed on one or more fragment types known to contain markers for cancer or other diseases. The sequencing reaction may also be performed on any nucleic acid fragments present in the sample. A sequencing reaction may provide at least 5%, 10%, 15%, 20%, 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 99%, 99.9%, or 100% coverage of the genomic sequence. In other cases, genomic sequence coverage may be less than 5%, 10%, 15%, 20%, 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 99%, 99.9%, or 100%.

Simultaneous sequencing reactions can be performed using multiplex sequencing. In some cases, cell-free nucleic acids can be sequenced with at least 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10,000, 50,000, or 100,000 sequencing reactions. In other cases, cell-free nucleic acids can be sequenced with fewer than 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10,000, 50,000, or 100,000 sequencing reactions. The sequencing reactions may be performed sequentially or simultaneously. All or some of the sequencing reactions may be subjected to subsequent data analysis. In some cases, data analysis may be performed on at least 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10,000, 50,000, or 100,000 sequencing reactions. In other cases, data analysis may be performed on fewer than 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10,000, 50,000, or 100,000 sequencing reactions.

The sequencing method may be massively parallel sequencing, i.e., sequencing any of at least 100, 1000, 10,000, 100,000, 1 million, or 10 million nucleic acid molecules simultaneously (or in rapid succession).

7. Analysis

The present methods can be used to diagnose the presence of a condition, particularly cancer, in a subject, to characterize the condition (e.g., to stage the cancer or to determine its heterogeneity), to monitor response to treatment of the condition, and to assess the prognostic risk of developing the condition or of its subsequent progression.

A variety of cancers can be detected using the present methods. Cancer cells, like most cells, can be characterized by a turnover rate at which old cells die and are replaced by newer cells. Generally, dead cells in contact with the vasculature in a given subject can release DNA or DNA fragments into the bloodstream. The same is true of cancer cells during different stages of the disease. Depending on the stage of the disease, cancer cells can also be characterized by a variety of genetic aberrations, such as copy number variation and rare mutations. This phenomenon can be used to detect the presence or absence of cancer in an individual using the methods and systems described herein.

The types and number of cancers that can be detected may include leukemia, brain cancer, lung cancer, skin cancer, nasal cancer, laryngeal cancer, liver cancer, bone cancer, lymphoma, pancreatic cancer, intestinal cancer, rectal cancer, thyroid cancer, bladder cancer, kidney cancer, oral cancer, stomach cancer, solid tumors, heterogeneous tumors, homogeneous tumors, and the like.

Cancer can be detected from genetic variations, including mutations, rare mutations, indels, copy number variations, transversions, translocations, inversions, deletions, aneuploidy, partial aneuploidy, polyploidy, chromosomal instability, chromosomal structure alterations, gene fusions, chromosomal fusions, gene truncations, gene amplifications, gene duplications, chromosomal lesions, DNA lesions, abnormal alterations in chemical modifications of nucleic acids, abnormal alterations in epigenetic patterns, abnormal alterations in methylation of nucleic acids, infections and cancers.

Genetic data can also be used to characterize a particular form of cancer. Cancer is often heterogeneous in both composition and stage. Genetic profile data may allow for the characterization of a particular subtype of cancer, which may be important in the diagnosis or treatment of that particular subtype. This information may also provide the subject or practitioner with clues as to the prognosis of a particular type of cancer and allow the subject or practitioner to adjust treatment options according to the progression of the disease. Some cancers progress, becoming more aggressive and genetically unstable. Other cancers may remain benign, inactive or dormant. The systems and methods of the present disclosure may be used to determine disease progression.

The present analysis can also be used to determine the efficacy of a particular treatment option. If a treatment is successful, it may increase the amount of copy number variation or rare mutations detected in the blood of a subject, as more cancer cells may die and shed DNA. In other instances, this may not occur. In another example, certain treatment options may be correlated with the genetic profiles of cancers over time. This correlation can be used to select a therapy. In addition, if the cancer is observed to be in remission after treatment, the present methods can be used to monitor residual disease or recurrence of the disease.

The methods may also be used to detect genetic variations in conditions other than cancer. Immune cells, such as B cells, can undergo rapid clonal expansion in the presence of certain diseases. Clonal expansion can be monitored using copy number variation detection, and certain immune states can thereby be monitored. In this example, copy number variation analysis can be performed over time to generate a profile of how a particular disease may progress. Copy number variation or even rare mutation detection can be used to determine how a pathogen population changes during the course of an infection. This may be particularly important during chronic infections such as HIV/AIDS or hepatitis infections, where the virus may change life cycle state and/or mutate to a more virulent form during the course of the infection. The methods can be used to determine or profile rejection activity of the host body when immune cells attempt to destroy transplanted tissue, to monitor the status of transplanted tissue, and to alter the course of treatment or prevent rejection.

Furthermore, the methods of the present disclosure can be used to characterize heterogeneity of an abnormal condition in a subject, the methods comprising generating a genetic profile of extracellular polynucleotides in the subject, wherein the genetic profile comprises more than one datum resulting from copy number variation and rare mutation analysis. In some cases, including but not limited to cancer, the disease may be heterogeneous. The diseased cells may not be identical. In the example of cancer, it is known that some tumors contain different types of tumor cells, some of which are at different stages of the cancer. In other examples, heterogeneity may include multiple foci of disease. Also, in the case of cancer, there may be multiple tumor foci, perhaps with one or more of the foci being the result of metastases that have spread from the primary site.

The method may be used to generate or perform profiling of a fingerprint or dataset that is the sum of genetic information obtained from different cells of a heterogeneous disease. The data set can comprise copy number variation and rare mutation analysis, alone or in combination.

The methods may be used to diagnose, prognose, monitor or observe cancer or other diseases of fetal origin. That is, these methods can be used in pregnant subjects to diagnose, prognose, monitor or observe cancer or other diseases in unborn subjects whose DNA and other nucleic acids can co-circulate with maternal molecules.

9. Kits

The present disclosure also provides kits for carrying out any of the above methods. An exemplary kit includes a pair of at least partially double-stranded adaptors having T and C mononucleotide 3' tails, respectively. Preferably, the paired adaptors are identical except for the T-tail and C-tail. Optionally, the kit does not contain at least partially double-stranded adaptors with A and G mononucleotide 3' tails. Preferably, the adaptors are Y-shaped, such as an adaptor comprising the oligonucleotides SEQ ID NO: 1 and 2 and an adaptor comprising the oligonucleotides SEQ ID NO: 3 and 2. The kit may also contain enzymes for performing the method, such as T4 polymerase or Klenow large fragment and/or Taq polymerase, and optionally the four standard types of nucleotides. The kit may also comprise packaging, leaflets, CDs, and the like providing instructions for carrying out the claimed method.

Examples

The use of C-tailed adaptors together with T-tailed adaptors contributes to increased sensitivity by capturing more molecules in the sample. Different ratios of C-tailed adaptors to T-tailed adaptors were tested, ranging from 0 to 1:2.75 (about 36%), as shown in Table 2 below.

TABLE 2

Sample #   Input (ng)   T-tailed (40 μM)   C-tailed (40 μM)   % ligated
1          20           3.25               0.5                80%
2          20           3.25               0.5                77%
3          20           3.25               1                  79%
4          20           3.25               1                  80%
5          20           2.75               0.5                79%
6          20           2.75               0.5                77%
7          20           2.75               1                  80%
8          20           2.75               1                  78%
9          20           3.25               -                  75%
10         20           3.25               -                  75%

All samples in which the C-tailed adaptor was present showed a higher yield of adaptor-ligated nucleic acids (% ligated) than samples in which the C-tailed adaptor was absent. The best yield was obtained at a ratio of C-tailed adaptors to T-tailed adaptors of 1:3.25 (about 30%), but improved yields were obtained at ratios from 0.5:3.25 (about 15%) to 1:2.75 (about 36%).
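The comparison across conditions in Table 2 can be reproduced by averaging the duplicate samples. The following is a minimal sketch that transcribes the table values; the tuple layout and function names are illustrative, and the adaptor amounts are used only as relative quantities because the table does not state their unit.

```python
from collections import defaultdict
from statistics import mean

# (T-tailed amount, C-tailed amount, % ligated), transcribed from Table 2.
# None marks the control samples without C-tailed adaptor.
TABLE_2 = [
    (3.25, 0.5, 80), (3.25, 0.5, 77),
    (3.25, 1.0, 79), (3.25, 1.0, 80),
    (2.75, 0.5, 79), (2.75, 0.5, 77),
    (2.75, 1.0, 80), (2.75, 1.0, 78),
    (3.25, None, 75), (3.25, None, 75),
]


def mean_ligation_by_condition(rows):
    """Average % ligated for each (T-tailed, C-tailed) condition."""
    groups = defaultdict(list)
    for t_amount, c_amount, pct in rows:
        groups[(t_amount, c_amount)].append(pct)
    return {cond: mean(vals) for cond, vals in groups.items()}


def c_to_t_ratio(t_amount, c_amount):
    """Ratio of C-tailed to T-tailed adaptor; 0 when no C-tailed adaptor."""
    return 0.0 if c_amount is None else c_amount / t_amount


if __name__ == "__main__":
    means = mean_ligation_by_condition(TABLE_2)
    for (t, c), pct in sorted(means.items(), key=lambda kv: -kv[1]):
        print(f"T={t} C={c} ratio={c_to_t_ratio(t, c):.2f} mean={pct}")
```

Averaged this way, the 1:3.25 condition has the highest mean (79.5%), and every condition with C-tailed adaptor exceeds the 75% of the T-tailed-only controls.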

After sequencing the amplified DNA, the diversity of each preparation was calculated. Diversity is a measure of the number of molecules sequenced, calculated as follows: (average size of DNA molecules in bp) × (number of unique molecules sequenced) / (size of target region in bp). Diversity was generally greater in samples in which a C-tailed adaptor was present. Sequencing also showed that the ratio of incorporated T-tailed adapters to C-tailed adapters was about 10%.
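The diversity calculation can be written as a one-line function. The following is a minimal sketch, reading the formula as the product of average fragment size and number of unique molecules, divided by the target region size; the input values in the usage example are hypothetical and are not data from the Examples.

```python
def diversity(avg_fragment_bp: float, unique_molecules: int,
              target_region_bp: float) -> float:
    """Diversity = average fragment size (bp) x unique molecules sequenced
    / target region size (bp)."""
    if target_region_bp <= 0:
        raise ValueError("target_region_bp must be positive")
    return avg_fragment_bp * unique_molecules / target_region_bp


if __name__ == "__main__":
    # Hypothetical inputs: 160 bp fragments, 1,000,000 unique molecules,
    # and an 80,000 bp target region.
    print(diversity(160, 1_000_000, 80_000))
```

With these hypothetical inputs the result is 2000, which can be read as the average number of unique molecules covering each targeted base.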

All patent documents, websites, other publications, accession numbers, and the like, cited above or below are incorporated by reference in their entirety for all purposes to the same extent as if each individual item were specifically and individually indicated to be so incorporated by reference. If different versions of a sequence are associated with an accession number at different times, the version associated with the accession number at the effective filing date of the present application is meant. The effective filing date means the earlier of the actual filing date or the filing date of a priority application referring to the accession number, if applicable. Likewise, if different versions of a publication, website, or the like are published at different times, the version most recently published at the effective filing date of the present application is meant, unless otherwise indicated. Any feature, step, element, embodiment or aspect of the present invention may be used in combination with any other, unless specifically indicated otherwise. Although the foregoing invention has been described in some detail by way of illustration and example for purposes of clarity and understanding, it will be apparent that certain changes and modifications may be practiced within the scope of the appended claims.

Sequence listing

<110> Guardant Health, Inc.

Andrew Kennedy

Stefanie Ann Ward Mortimer

<120> method for attaching adapters to sample nucleic acids

<130> 065777-512837

<150> US 62/485,769

<151> 2017-04-14

<150> US 62/486,663

<151> 2017-04-18

<150> US 62/517,145

<151> 2017-06-08

<150> PCT/US2017/027809

<151> 2017-04-14

<160> 3

<170> PatentIn version 3.5

<210> 1

<211> 58

<212> DNA

<213> Artificial sequence (Artificial sequence)

<220>

<223> synthetic

<400> 1

aatgatacgg cgaccaccga gatctacact ctttccctac acgacgctct tccgatct 58

<210> 2

<211> 63

<212> DNA

<213> Artificial sequence (Artificial sequence)

<220>

<223> synthetic

<220>

<221> misc_feature

<222> (34)..(39)

<223> n is a, g, c or t

<400> 2

gatcggaaga gcacacgtct gaactccagt cacnnnnnna tctcgtatgc cgtcttctgc 60

ttg 63

<210> 3

<211> 58

<212> DNA

<213> Artificial sequence (Artificial sequence)

<220>

<223> synthetic

<400> 3

aatgatacgg cgaccaccga gatctacact ctttccctac acgacgctct tccgatcc 58
