Target enrichment by one-way dual probe primer extension

Document No.: 1047711 Published: 2020-10-09

Abstract: The present technology, Target enrichment by one-way dual probe primer extension, was created by D. Burgess, B.C. Godwin, A. Lovejoy, B. Miller, J. Platzer, and J-A.E. Penkler on 2018-12-19. The present invention provides methods for enriching at least one target nucleic acid in a nucleic acid library. A first oligonucleotide is hybridized to a target nucleic acid in a nucleic acid library whose nucleic acids have first and second adaptors. The hybridized first oligonucleotide is extended with a first polymerase, thereby generating a first primer extension complex comprising the target nucleic acid and the extended first oligonucleotide. The first primer extension complex is captured and enriched relative to the nucleic acid library, and a second oligonucleotide is hybridized to the target nucleic acid. The hybridized second oligonucleotide is extended with a second polymerase, thereby generating a second primer extension complex comprising the target nucleic acid and the extended second oligonucleotide and releasing the extended first oligonucleotide from the first primer extension complex.

1. A method for enriching at least one target nucleic acid in a nucleic acid library, the method comprising:

hybridizing a first oligonucleotide to target nucleic acids in a library of nucleic acids, the nucleic acids in the library each having a first end comprising a first adaptor and a second end comprising a second adaptor;

extending the hybridized first oligonucleotide with a first polymerase, thereby generating a first primer extension complex comprising the target nucleic acid and the extended first oligonucleotide;

capturing the first primer extension complex;

enriching the first primer extension complex relative to the nucleic acid library;

hybridizing a second oligonucleotide to the target nucleic acid;

extending the hybridized second oligonucleotide with a second polymerase, thereby generating a second primer extension complex comprising the target nucleic acid and the extended second oligonucleotide, thereby releasing the extended first oligonucleotide from the first primer extension complex; and

amplifying the target nucleic acid with a third polymerase, a first amplification primer and a second amplification primer, the first amplification primer having a 3' end complementary to the first adaptor and the second amplification primer having a 3' end complementary to the second adaptor.

2. The method of claim 1, further comprising sequencing the amplified target nucleic acid.

3. The method of claim 1, wherein the first oligonucleotide comprises a capture moiety.

4. The method of claim 1, wherein the first oligonucleotide is bound to a solid support prior to hybridizing the first oligonucleotide to the target nucleic acid, and wherein the first oligonucleotide is hybridized to the target nucleic acid and the hybridized first oligonucleotide is extended with a polymerase, thereby capturing the first primer extension complex on the solid support.

5. The method of claim 1, further comprising incorporating at least one modified nucleotide into at least one of the extended first oligonucleotide in the first primer extension complex and the extended second oligonucleotide in the second primer extension complex.

6. The method of claim 1, further comprising incorporating at least one modified nucleotide with a capture moiety into the extended first oligonucleotide in the first primer extension complex.

7. The method of claim 1, further comprising incorporating at least one uracil into at least one of:

the extended first oligonucleotide in the first primer extension complex, and

the extended second oligonucleotide in the second primer extension complex,

thereby forming an oligonucleotide product containing uracil.

8. The method of claim 1, further comprising contacting the library of nucleic acids with a blocking oligonucleotide.

9. The method of claim 1, wherein the first adaptor and the second adaptor are forked adaptors.

10. The method of claim 1, wherein said first adaptor and said second adaptor comprise at least one uracil.

11. The method of claim 1, wherein the second oligonucleotide hybridizes to the target nucleic acid at a position 5' to the first oligonucleotide.

12. The method of claim 1, wherein at least one of the first adaptor, the second adaptor, the first amplification primer, and the second amplification primer comprises at least one of a Unique Identifier (UID) sequence and a Molecular Identifier (MID) sequence.

13. A kit for enriching at least one target nucleic acid in a nucleic acid library, the kit comprising:

a first oligonucleotide complementary to a target nucleic acid in a library of nucleic acids, each of the nucleic acids in the library of nucleic acids having a first end comprising a first adaptor and a second end comprising a second adaptor;

a second oligonucleotide complementary to the target nucleic acid;

a first amplification primer; and

a second amplification primer,

wherein the first oligonucleotide comprises a capture moiety,

wherein the second oligonucleotide hybridizes to the target nucleic acid at a position 5' to the first oligonucleotide, and

wherein the first amplification primer has a 3' end complementary to the first adaptor and the second amplification primer has a 3' end complementary to the second adaptor.

14. A kit for enriching at least one target nucleic acid in a nucleic acid library, the kit comprising:

a first oligonucleotide complementary to a target nucleic acid in a library of nucleic acids, each of the nucleic acids in the library of nucleic acids having a first end comprising a first adaptor and a second end comprising a second adaptor;

a modified nucleotide having a capture moiety;

a second oligonucleotide complementary to the target nucleic acid;

a first amplification primer; and

a second amplification primer,

wherein the second oligonucleotide hybridizes to the target nucleic acid at a position 5' to the first oligonucleotide, and

wherein the first amplification primer has a 3' end complementary to the first adaptor and the second amplification primer has a 3' end complementary to the second adaptor.

15. A composition comprising:

a library of nucleic acids comprising at least one target nucleic acid, the nucleic acids in the library of nucleic acids each having a first end comprising a first adaptor, a second end comprising a second adaptor, and a region of interest between the first adaptor and the second adaptor;

an extended first oligonucleotide hybridized to a target region of the target nucleic acid, the extended first oligonucleotide comprising at least one capture moiety;

a solid support bound to at least one capture moiety;

a second oligonucleotide that hybridizes to the target nucleic acid at a position 5' to the extended first oligonucleotide; and

a polymerase associated with the 3' terminus of the second oligonucleotide.

Technical Field

The present invention relates generally to enriching nucleic acid targets in a sample, and more particularly to enriching targets for nucleic acid sequencing (including high throughput sequencing).

Background

The present invention belongs to a class of techniques that allow a user to focus on a target region within a nucleic acid to be sequenced. This reduces the costs associated with sequencing reactions and subsequent data analysis. There are three general types of techniques for selectively capturing a target region within nucleic acids present in a sample. The first technique is hybrid capture, in which the target region is captured by hybridization of a probe that can be selectively bound to a capture surface. This capture allows for removal of non-target nucleic acids, followed by release and collection of the captured target molecules. This type of technique has advantages, including the ability to capture regions of exome size and regions containing unknown structural variations. Disadvantages include long and complex protocols, which tend to take more than 8 hours to complete. The complexity is mainly caused by the need to prepare a randomly fragmented shotgun library prior to hybridization. The hybridization step alone may take three days to complete. Examples of this type of technology include the SEQCAP EZ target enrichment system (ROCHE) and the SURESELECT target enrichment system (AGILENT).

Another method of target enrichment is based on amplification with dual target primers. In this method, two probes at the boundaries of the target are used to enrich the target region. The method tends to take less than 8 hours to complete and is simpler than the hybrid capture method. However, the dual primer based technique cannot enrich for sequences with unknown structural variations. The most mature dual primer method is multiplex polymerase chain reaction (PCR). This is a very simple single-step process, but only tens of targets can be amplified in each reaction tube. Other, newer technologies are currently available, including the TRUSEQ AMPLICON sequencing kit (ILLUMINA) and the ION TORRENT AMPLISEQ sequencing kit (LIFE TECHNOLOGIES), which are capable of amplifying hundreds to thousands of targets in a single reaction tube and require only a few processing steps.

A third technique is based on amplification with single target primers. In this method, the target is enriched by amplifying a region defined by a single target primer and an end-ligated universal primer. Similar to hybridization-based methods, these techniques require the generation of a randomly fragmented shotgun library prior to selective hybridization of a target oligonucleotide. However, rather than using this oligonucleotide to capture the target and wash away non-target molecules, an amplification step is employed that selectively amplifies the region between the randomly generated end and the target-specific oligonucleotide. The advantage of this technique is that, unlike the dual primer technique, it allows the detection of sequences with unknown structural variations. It is also faster and simpler than hybridization-based techniques. However, this type of technique is still slower and more complex than the dual primer based approach. Examples of this type of technology are the ARCHER anchored multiplex PCR (ARCHERDX) and the OVATION target enrichment system (NUGEN).

There remains an unmet need for a rapid and simple method of target enrichment that will also accommodate unknown structural variations in the target sequence.

Summary

According to one embodiment, the present invention provides a method for enriching at least one target nucleic acid in a nucleic acid library. The method includes hybridizing a first oligonucleotide to a target nucleic acid in a nucleic acid library. The nucleic acids in the library of nucleic acids each have a first end comprising a first adaptor and a second end comprising a second adaptor. The method further comprises extending the hybridized first oligonucleotide with a first polymerase, thereby generating a first primer extension complex comprising the target nucleic acid and the extended first oligonucleotide. The method further comprises capturing the first primer extension complex, enriching the first primer extension complex relative to the nucleic acid library, hybridizing a second oligonucleotide to the target nucleic acid, and extending the hybridized second oligonucleotide with a second polymerase, thereby generating a second primer extension complex comprising the target nucleic acid and the extended second oligonucleotide, thereby releasing the extended first oligonucleotide from the first primer extension complex. The method further comprises amplifying the target nucleic acid with a third polymerase, a first amplification primer and a second amplification primer, the first amplification primer having a 3' end complementary to the first adaptor and the second amplification primer having a 3' end complementary to the second adaptor.

In one aspect, the method further comprises sequencing the amplified target nucleic acid.

In another aspect, the first oligonucleotide comprises a capture moiety.

In another aspect, capturing the first primer extension complex comprises capturing the capture moiety on a solid support.

In another aspect, the capture moiety is biotin and the solid support comprises streptavidin.

In another aspect, the first oligonucleotide is bound to a solid support prior to hybridizing the first oligonucleotide to a target nucleic acid, and the first oligonucleotide is hybridized to the target nucleic acid and the hybridized first oligonucleotide is extended with a polymerase, thereby capturing the first primer extension complex on the solid support.

In another aspect, capturing the first primer extension complex is performed after extending the hybridized first oligonucleotide.

In another aspect, the method further comprises incorporating at least one modified nucleotide into at least one of the extended first oligonucleotide in the first primer extension complex and the extended second oligonucleotide in the second primer extension complex.

In another aspect, the modified nucleotide is selected from the group consisting of dUTP and a nucleotide having a capture moiety.

In another aspect, the method further comprises incorporating at least one modified nucleotide into the extended first oligonucleotide in the first primer extension complex, the at least one modified nucleotide having a capture moiety.

In another aspect, capturing the first primer extension complex comprises capturing the capture moiety on a solid support.

In another aspect, the method further comprises incorporating at least one uracil into at least one of the extended first oligonucleotide in the first primer extension complex and the extended second oligonucleotide in the second primer extension complex, thereby forming a uracil-containing oligonucleotide product.

In another aspect, the method further comprises digesting the uracil containing oligonucleotide product.

In another aspect, the uracil containing oligonucleotide product is digested with at least one of uracil DNA glycosylase and DNA glycosylase-lyase.

In another aspect, the DNA glycosylase-lyase is selected from the group consisting of endonuclease IV and endonuclease VIII.

In another aspect, the method further comprises contacting the library of nucleic acids with a blocking oligonucleotide.

In another aspect, the blocking oligonucleotide is at least partially complementary to at least one of the first adaptor and the second adaptor.

In another aspect, the blocking oligonucleotide is a universal blocking oligonucleotide.

In another aspect, the first adaptor and the second adaptor have the same nucleic acid sequence.

In another aspect, the first adaptor and the second adaptor have different nucleic acid sequences.

In another aspect, the first adaptor and the second adaptor are forked adaptors.

In another aspect, the first adaptor and the second adaptor comprise at least one uracil.

In another aspect, at least one of the first polymerase and the second polymerase is a uracil-incompatible polymerase.

In another aspect, the third polymerase is a uracil compatible polymerase.

In another aspect, the second oligonucleotide hybridizes to the target nucleic acid at a position 5' to the first oligonucleotide.

In another aspect, the third polymerase is a uracil-incompatible polymerase.

In another aspect, at least one of the first adaptor, the second adaptor, the first amplification primer, and the second amplification primer comprises at least one of a Unique Identifier (UID) sequence and a Molecular Identifier (MID) sequence.

According to another embodiment, the present invention provides a kit for enriching at least one target nucleic acid in a nucleic acid library. The kit includes a first oligonucleotide complementary to a target nucleic acid in a library of nucleic acids, the nucleic acids in the library each having a first end including a first adaptor and a second end including a second adaptor. The kit further comprises a second oligonucleotide complementary to the target nucleic acid, a first amplification primer, and a second amplification primer. The first oligonucleotide comprises a capture moiety, the second oligonucleotide hybridizes to the target nucleic acid at a position 5' to the first oligonucleotide, the first amplification primer has a 3' end that is complementary to the first adaptor, and the second amplification primer has a 3' end that is complementary to the second adaptor.

According to another embodiment, the present invention provides a kit for enriching at least one target nucleic acid in a nucleic acid library. The kit includes a first oligonucleotide complementary to a target nucleic acid in a library of nucleic acids, the nucleic acids in the library each having a first end including a first adaptor and a second end including a second adaptor. The kit further includes a modified nucleotide having a capture moiety, a second oligonucleotide complementary to the target nucleic acid, a first amplification primer, and a second amplification primer. The second oligonucleotide hybridizes to the target nucleic acid at a position 5' to the first oligonucleotide, the first amplification primer has a 3' end that is complementary to the first adaptor, and the second amplification primer has a 3' end that is complementary to the second adaptor.

In one aspect, the kit further comprises at least one of a uracil nucleotide, a uracil compatible polymerase, a uracil incompatible polymerase, and a blocking oligonucleotide.

According to another embodiment, the present invention provides a composition comprising a library of nucleic acids comprising at least one target nucleic acid. The nucleic acids in the nucleic acid library each have a first end comprising a first adaptor, a second end comprising a second adaptor, and a region of interest between the first adaptor and the second adaptor. The composition further includes an extended first oligonucleotide that is hybridized to a target region of the target nucleic acid. The extended first oligonucleotide includes at least one capture moiety. The composition further includes a solid support bound to the at least one capture moiety, a second oligonucleotide hybridized to the target nucleic acid at a position 5' to the extended first oligonucleotide, and a polymerase associated with a 3' end of the second oligonucleotide.

In one aspect, the composition further comprises a blocking oligonucleotide hybridized to each of the first adaptor and the second adaptor.

In another aspect, the at least one capture moiety is located at the 5' end of the extended first oligonucleotide.

In another aspect, the at least one capture moiety is incorporated into an extension portion of the extended first oligonucleotide.

In another aspect, the extended first oligonucleotide further comprises at least one uracil and at least one thymine.

In another aspect, the polymerase is a uracil-incompatible polymerase.

In another aspect, at least one of the first adaptor and the second adaptor comprises at least one uracil and at least one thymine.

In another aspect, releasing the extended first oligonucleotide from the first primer extension complex is achieved with an enzyme having an activity selected from the group consisting of strand displacement activity, 5' to 3' exonuclease activity, and flap endonuclease activity.
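The geometry behind the release step can be illustrated with a short sketch. It assumes a toy model that is not part of the specification: the target is a single strand written 5' to 3', each oligonucleotide anneals only at an exact reverse-complement match, and "5' to the first oligonucleotide" is read as the second oligonucleotide annealing closer to the 3' end of the target, so that its extension runs into the extended first oligonucleotide. The function names are hypothetical.

```python
# Toy model only: exact-match annealing on a single target strand (5'->3').
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def binding_start(target: str, oligo: str) -> int:
    """0-based index on the target where the oligo anneals, or -1 if none."""
    return target.upper().find(reverse_complement(oligo))


def second_is_5prime_of_first(target: str, first_oligo: str, second_oligo: str) -> bool:
    """True if the second oligo anneals 5' to the first oligo without overlap.

    An annealed oligo is extended toward the 5' end of the target (lower
    indices). A second oligo whose site lies at higher indices is therefore
    upstream of (5' to) the first oligo, and its extension reaches the first
    oligo's 5' end, where displacement or nuclease activity can release it.
    """
    first_site = binding_start(target, first_oligo)
    second_site = binding_start(target, second_oligo)
    if first_site < 0 or second_site < 0:
        raise ValueError("both oligonucleotides must anneal to the target")
    return second_site >= first_site + len(first_oligo)
```

For a target such as `"TTTTGATTACACCCCCAGTCAGTTTT"`, an oligonucleotide complementary to `GATTACA` and a second one complementary to `CAGTCAG` satisfy the check in that order and fail it when their roles are swapped.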

The foregoing and other aspects and advantages of the present invention will become apparent from the following description. In the description, reference is made to the accompanying drawings which form a part hereof, and in which there is shown by way of illustration a preferred embodiment of the invention. Such embodiments, however, do not necessarily represent the full scope of the invention, and reference is therefore made to the claims and herein for interpreting the scope of the invention.

Brief Description of Drawings

FIG. 1 is a schematic flow chart illustrating one embodiment of a method for enriching at least one target nucleic acid in a nucleic acid library according to the present invention.

FIG. 2 is a schematic illustration of a first embodiment of a method for enriching at least one target nucleic acid in a nucleic acid library according to the present invention. In the illustrated embodiment, the first oligonucleotide comprises a capture moiety for solution phase capture of the target nucleic acid.

FIG. 3 is a schematic illustration of a second embodiment of a method for enriching at least one target nucleic acid in a nucleic acid library according to the present invention. In the illustrated embodiment, the first oligonucleotide is bound to a solid support for in situ capture of the target nucleic acid.

FIG. 4 is a schematic illustration of a third embodiment of a method for enriching at least one target nucleic acid in a nucleic acid library according to the present invention. In the illustrated embodiment, one or more capture moieties are incorporated during extension of the first oligonucleotide hybridized to the target nucleic acid, thereby enabling capture of a complex comprising the target nucleic acid and the extended first oligonucleotide on a solid support.

FIG. 5 is a schematic representation of various nucleic acids in library molecules exhibiting intermolecular adaptor-adaptor hybridization.

FIG. 6A is a trace of fluorescence output from an electrophoretic DNA analyzer for a nucleic acid library derived from human genomic DNA and modified with universal adaptor end sequences using a commercially available library preparation kit. Data were collected after 5 and 12 cycles of standard PCR amplification of 1 µL of a 10 ng library and 1 µL of a 100 ng library, respectively. The nucleic acid library was sampled and enriched for target nucleic acids according to the invention.

FIG. 6B is a fluorescence-based size analysis of the nucleic acid library of FIG. 6A after enrichment and amplification of primer extension targets according to the present invention.

FIG. 7 is a bar graph depicting high-level sequencing metrics for the enriched nucleic acid library of FIG. 6B. More than 99% of the sequencing reads map to sequences known to be present in the library, and about half of the sequencing reads map to enriched target nucleic acids. For first oligonucleotide primer annealing temperatures of 60 °C and 65 °C, the fold-80 base penalty for the library was 1.4 and 1.5, respectively. In each cluster of three bars, data are shown for the percent of trimmed reads mapped (left), the percent of bases on padded target nucleic acid (center), and the percent on target in mapped non-duplicate reads (right).
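The specification reports a fold-80 base penalty without defining it. A common convention in sequencing analysis tools takes it as the mean target coverage divided by the coverage depth that 80% of target bases meet or exceed (the 20th percentile of per-base depth). The following minimal sketch assumes that convention and a nearest-rank percentile; the function name and the per-base depth list are hypothetical, and the values in FIG. 7 may have been computed differently.

```python
import math


def fold_80_base_penalty(depths):
    """Mean per-base depth divided by the 20th-percentile depth.

    `depths` holds one coverage value per target base. A value of 1.0 means
    perfectly uniform coverage; larger values mean less uniform coverage.
    """
    if not depths:
        raise ValueError("depths must not be empty")
    ordered = sorted(depths)
    mean_depth = sum(ordered) / len(ordered)
    # Nearest-rank 20th percentile: at least 80% of bases reach this depth.
    rank = max(1, math.ceil(len(ordered) * 20 / 100))
    p20 = ordered[rank - 1]
    if p20 == 0:
        return math.inf  # more than 20% of bases are uncovered
    return mean_depth / p20
```

Under this convention, a value such as 1.4 would mean that about 1.4 times more sequencing is needed to bring 80% of target bases up to the current mean depth.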

Detailed Description of the Invention

I. Definitions

In this application, unless otherwise clear from the context, (i) the term "a" may be understood to mean "at least one"; (ii) the term "or" may be understood to mean "and/or"; (iii) the terms "comprising" and "including" can be understood to encompass itemized components or steps, whether presented by themselves or together with one or more additional components or steps; (iv) the terms "about" and "approximately" can be understood to allow for standard variation as would be understood by one of ordinary skill in the art; and (v) where ranges are provided, endpoints are included.

Adaptor: as used herein, "adaptor" means a nucleotide sequence that can be added to another sequence in order to impart additional properties to that sequence. The adaptor may be single-stranded or double-stranded, or may have both single-stranded and double-stranded portions.

Approximately: as used herein, the term "approximately" or "about," as applied to one or more values of interest, refers to a value that is similar to a stated reference value. In certain embodiments, the term "approximately" or "about" refers to a range of values that fall within 25%, 20%, 19%, 18%, 17%, 16%, 15%, 14%, 13%, 12%, 11%, 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, 1%, or less in either direction (greater than or less than) of the stated reference value, unless otherwise stated or otherwise evident from the context (except where such a number would exceed 100% of a possible value).

Associated with: as the term is used herein, two events or entities are "associated" with each other if the presence, level and/or form of one is correlated with the presence, level and/or form of the other. For example, a particular entity (e.g., a polypeptide, a genetic marker, a metabolite, etc.) is considered to be associated with a particular disease, disorder, or condition if its presence, level, and/or form correlates with the occurrence of and/or susceptibility to the disease, disorder, or condition (e.g., across a relevant population). In some embodiments, two or more entities are physically "associated" with each other if they interact, directly or indirectly, so that they are and/or remain in physical proximity to each other. In some embodiments, two or more entities that are physically associated with each other are covalently linked to each other; in some embodiments, two or more entities that are physically associated with each other are not covalently linked to each other, but are non-covalently associated, for example, by hydrogen bonding, van der Waals interactions, hydrophobic interactions, magnetism, and combinations thereof.

Barcode: as used herein, "barcode" means a nucleotide sequence that confers an identity on a molecule. A barcode can confer a unique identity on a single molecule (and its copies). Such a barcode is a unique ID (UID). A barcode can also confer an identity on an entire population of molecules (and their copies) from the same source (e.g., a patient). Such a barcode is a multiplex ID (MID).
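The distinction between the two barcode types can be illustrated with a short sketch. It assumes a purely hypothetical read layout in which each read starts with a 4-base MID followed by a 6-base UID; the lengths, the layout, and the function name are illustrative assumptions and are not taken from the specification. Reads are grouped by MID (source), and the distinct UIDs within each group give the number of original molecules, since copies of one molecule share its UID.

```python
from collections import defaultdict

MID_LEN = 4  # hypothetical length of the per-source barcode
UID_LEN = 6  # hypothetical length of the per-molecule barcode


def count_unique_molecules(reads):
    """Return {MID: number of distinct UIDs} for an iterable of read strings."""
    uids_by_mid = defaultdict(set)
    for read in reads:
        if len(read) < MID_LEN + UID_LEN:
            continue  # too short to carry both barcodes; skip it
        mid = read[:MID_LEN]
        uid = read[MID_LEN:MID_LEN + UID_LEN]
        uids_by_mid[mid].add(uid)
    return {mid: len(uids) for mid, uids in uids_by_mid.items()}
```

Two reads with the same MID and the same UID are counted once, as copies of one molecule from one source.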

Biological sample: as used herein, the term "biological sample" generally refers to a sample obtained or derived from a biological source of interest (e.g., a tissue, organism, or cell culture) as described herein. In some embodiments, the source of interest comprises or consists of an organism, such as an animal or a human. In some embodiments, the biological sample comprises or consists of a biological tissue or fluid. In some embodiments, the biological sample may be or comprise bone marrow; blood; blood cells; ascites fluid; tissue or fine needle biopsy samples; cell-containing body fluids; free floating nucleic acids; sputum; saliva; urine; cerebrospinal fluid; peritoneal fluid; pleural fluid; feces; lymph; gynecological fluids; skin swabs; vaginal swabs; buccal swabs; nasal swabs; washings or lavages, such as ductal or bronchoalveolar lavages; aspirates; scrapings; bone marrow samples; tissue biopsy samples; surgical samples; other body fluids, secretions and/or excretions; and/or cells derived therefrom, etc. In some embodiments, the biological sample comprises or consists of cells obtained from an individual. In some embodiments, the obtained cells are or include cells from the individual from whom the sample was obtained. In some embodiments, the sample is a "primary sample" obtained directly from the source of interest by any suitable means. For example, in some embodiments, the primary biological sample is obtained by a method selected from the group consisting of biopsy (e.g., fine needle aspiration or tissue biopsy), surgery, collection of bodily fluids (e.g., blood, lymph, feces, etc.), and the like. In some embodiments, as will be apparent from the context, the term "sample" refers to a preparation obtained by processing a primary sample (e.g., by removing one or more of its components and/or by adding one or more agents to it), for example by filtration using a semipermeable membrane. Such "processed samples" may include, for example, nucleic acids or proteins extracted from a sample or obtained by subjecting a primary sample to techniques such as amplification or reverse transcription of mRNA, or isolation and/or purification of certain components.

Blocking oligonucleotide: an oligonucleotide complementary to another nucleic acid present in the reaction mixture and capable of hybridizing to that nucleic acid to prevent its undesired hybridization. The other nucleic acid may be a synthetic nucleic acid, e.g., a primer or an adaptor. The undesired hybridization to be prevented can occur when the primers or adaptors are incorporated into library nucleic acid molecules. The blocking oligonucleotide does not have to be perfectly complementary to the nucleic acid to be protected from undesired hybridization, but must form a sufficiently stable hybrid to prevent undesired hybridization from occurring. To this end, the blocking oligonucleotide may comprise universal bases or Tm-modifying bases.

Comprising: a composition or method described herein as "comprising" one or more named elements or steps is open-ended, meaning that the named elements or steps are essential, but that other elements or steps may be added within the scope of the composition or method. It should be understood that a composition or method described as "comprising" (or which "comprises") one or more named elements or steps also describes a corresponding, more limited composition or method "consisting essentially of" (or which "consists essentially of") the same named elements or steps, meaning that the composition or method includes the named elements or steps, and may also include additional elements or steps that do not materially affect the basic and novel characteristics of the composition or method. It will also be understood that any composition or method described herein as "comprising" or "consisting essentially of" one or more named elements or steps also describes a corresponding, more limited and closed-ended composition or method "consisting of" (or which "consists of") the named elements or steps to the exclusion of any other unnamed element or step. Known or disclosed equivalents of any named essential elements or steps may be substituted for those elements or steps in any of the compositions or methods disclosed herein.

Designed: as used herein, the term "designed" refers to an agent (i) whose structure is or was selected by the hand of man; (ii) that is produced by a process requiring the hand of man; and/or (iii) that is distinct from natural substances and other known agents.

Determining: one of ordinary skill in the art reading this specification will appreciate that "determining" can be accomplished using any of a variety of techniques available to those of skill in the art, including, for example, the specific techniques explicitly mentioned herein. In some embodiments, determining involves manipulation of a physical sample. In some embodiments, determining involves consideration and/or manipulation of data or information, for example, using a computer or other processing unit adapted to perform the relevant analysis. In some embodiments, determining involves receiving relevant information and/or materials from a source. In some embodiments, determining involves comparing one or more characteristics of the sample or entity to a comparable reference.

Identity: as used herein, the term "identity" refers to the overall relatedness between polymeric molecules, for example between nucleic acid molecules (e.g., DNA molecules and/or RNA molecules) and/or between polypeptide molecules. In some embodiments, polymeric molecules are considered "substantially identical" to each other if their sequences are at least 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, or 99% identical. For example, calculation of percent identity of two nucleic acid or polypeptide sequences can be performed by aligning the two sequences for optimal comparison purposes (e.g., gaps can be introduced in one or both of the first and second sequences for optimal alignment, and non-identical sequences can be disregarded for comparison purposes). In certain embodiments, the length of the sequences aligned for comparison purposes is at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, or substantially 100% of the length of the reference sequence. The nucleotides at the corresponding positions are then compared. When a position in the first sequence is occupied by the same residue (e.g., nucleotide or amino acid) as the corresponding position in the second sequence, then the molecules are identical at that position. The percent identity between two sequences is a function of the number of identical positions common to the sequences, taking into account the number of gaps, and the length of each gap, which needs to be introduced for optimal alignment of the two sequences. Sequence comparison and determination of percent identity between two sequences can be accomplished using a mathematical algorithm. For example, the percentage identity between two nucleotide sequences can be determined using the algorithms of Meyers and Miller (CABIOS, 1989, 4:11-17), which have been incorporated into the ALIGN program (version 2.0). 
In some exemplary embodiments, the nucleic acid sequence comparison using the ALIGN program uses a PAM120 weight residue table, a gap length penalty of 12, and a gap penalty of 4. Alternatively, the percent identity between two nucleotide sequences can be determined using the GAP program in the GCG software package using the nwsgapdna.
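The position-by-position comparison described above can be illustrated with a minimal Python sketch. It assumes the two sequences have already been aligned by an external tool, with gaps written as "-". The function name `percent_identity` is hypothetical, and the convention used here (identical columns divided by all alignment columns, so gapped columns count against identity) is only one of several in use. It does not reproduce the scoring of the ALIGN or GAP programs.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pre-computed pairwise alignment.

    Assumption: both inputs are rows of the same alignment, with '-' marking gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns in which both rows carry a gap; they hold no residue to compare.
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment has no comparable columns")
    # A column is identical only when both rows carry the same residue (not a gap).
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


identity = percent_identity("ACGT-ACGTA", "ACGTTACGAA")
print(f"{identity:.1f}% identical")
```

In this toy alignment, 8 of the 10 columns hold the same residue, so the sketch reports 80.0%.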

Ligation site: as used herein, a "ligation site" is a portion of a nucleic acid molecule (other than the blunt end of a double-stranded molecule) that can facilitate ligation. The presence of "compatible ligation sites" on two molecules allows the two molecules to preferentially ligate to each other.

Sample: as used herein, the term "sample" refers to a substance that is or contains a composition of interest for qualitative and/or quantitative evaluation. In some embodiments, the sample is a biological sample (i.e., from a living being, e.g., a cell or organism). In some embodiments, the sample is from a geological, aquatic, astronomical, or agricultural source.

Single-strand ligation: as used herein, "single-stranded ligation" is a ligation procedure that begins with at least one single-stranded substrate and typically involves one or more double-stranded or partially double-stranded adaptors.

Solid support: as used herein, "solid support" refers to any solid material capable of interacting with a capture moiety. The solid support can be a solution-phase support that can be suspended in a solution (e.g., glass beads, magnetic beads, or other similar particles) or a solid-phase support (e.g., silicon wafer, glass slide, etc.). Examples of solution-phase supports include superparamagnetic spherical polymer particles, such as DYNABEADS magnetic beads from INVITROGEN, or magnetic glass particles, such as described in U.S. Pat. Nos. 656568, 6274386, 7371830, 6870047, 6255477, 6746874 and 6255851.

Substantially: as used herein, the term "substantially" refers to the qualitative condition of exhibiting a total or near-total extent or degree of a characteristic or property of interest. One of ordinary skill in the art of biology will appreciate that biological and chemical phenomena rarely, if ever, go to completion or achieve or avoid an absolute result. Thus, the term "substantially" is used herein to capture the potential lack of completeness inherent in many biological and chemical phenomena.

Synthetic: as used herein, the word "synthetic" means produced by man, and thus is in a form that does not occur in nature, either because it has a structure that does not occur in nature, or because it is associated with one or more other components with which it is not associated in nature.

Universal primer: as used herein, "universal primer" and "universal priming site" refer to primers and priming sites that do not naturally occur in a target sequence. Typically, the universal priming site is present in an adaptor or target-specific primer. The universal primer can bind to the universal priming site and direct primer extension from the universal priming site.

Variant: as used herein, the term "variant" refers to an entity that exhibits significant structural identity to a reference entity, but differs structurally from the reference entity in the presence or level of one or more chemical moieties. In many embodiments, the variant is also functionally different from its reference entity. In general, whether a particular entity is considered to be a "variant" of a reference entity is based on the degree to which it shares structural identity with the reference entity. As will be appreciated by those skilled in the art, any biological or chemical reference entity has certain characteristic structural elements. By definition, a variant is a distinct chemical entity that shares one or more of these characteristic structural elements. To name a few examples: a small molecule may have a characteristic core structural element (e.g., a macrocyclic core) and/or one or more characteristic pendant moieties, such that variants of the small molecule are those that share the core structural element and the characteristic pendant moieties, but differ in other pendant moieties and/or in the type of bond present within the core (single vs. double, E vs. Z, etc.); a polypeptide may have a characteristic sequence element consisting of a plurality of amino acids having specified positions relative to each other in linear or three-dimensional space and/or contributing to a particular biological function; and a nucleic acid may have a characteristic sequence element consisting of a plurality of nucleotide residues having specified positions relative to one another in linear or three-dimensional space. For example, a variant polypeptide can differ from a reference polypeptide due to one or more differences in amino acid sequence and/or one or more differences in chemical moieties (e.g., carbohydrates, lipids, etc.) covalently attached to the backbone of the polypeptide.
In some embodiments, the variant polypeptide exhibits an overall sequence identity of at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, or 99% to the reference polypeptide. Alternatively or additionally, in some embodiments, the variant polypeptide does not share at least one characteristic sequence element with the reference polypeptide. In some embodiments, the reference polypeptide has one or more biological activities. In some embodiments, the variant polypeptide shares one or more of the biological activities of the reference polypeptide. In some embodiments, the variant polypeptide lacks one or more of the biological activities of the reference polypeptide. In some embodiments, the variant polypeptide exhibits a reduced level of one or more biological activities as compared to the reference polypeptide. In many embodiments, a polypeptide of interest is considered a "variant" of a parent or reference polypeptide if it has an amino acid sequence that is identical to the amino acid sequence of the parent except for a small number of sequence alterations at specific positions. Typically, less than 20%, 15%, 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, or 2% of the residues in the variant are replaced compared to the parent. In some embodiments, the variant has 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1 substituted residues as compared to the parent. Typically, variants have a very small number (e.g., less than 5, 4, 3, 2, or 1) of substituted functional residues (i.e., residues involved in a particular biological activity). Furthermore, variants typically have no more than 5, 4, 3, 2, or 1 additions or deletions, and often no additions or deletions, as compared to the parent. Furthermore, any addition or deletion is typically less than about 25, about 20, about 19, about 18, about 17, about 16, about 15, about 14, about 13, about 10, about 9, about 8, about 7, or about 6 residues, and typically less than about 5, about 4, about 3, or about 2 residues.
In some embodiments, a variant may also have one or more functional deficiencies and/or may otherwise be considered a "mutant." In some embodiments, the parent or reference polypeptide is a polypeptide found in nature. As will be appreciated by those of ordinary skill in the art, many variants of a particular polypeptide of interest are commonly found in nature, particularly when the polypeptide of interest is an infectious agent polypeptide.

Detailed description of certain embodiments

For many nucleic acid enrichment techniques, it may be useful to first provide a shotgun nucleic acid library, whereby longer nucleic acid sequences derived from the sample are subdivided into smaller fragments (i.e., about 50-500 nucleotides) that are compatible with short-read sequencing techniques. To prepare a shotgun library, high molecular weight nucleic acid strands (typically cDNA or genomic DNA) are sheared into random fragments, optionally modified by ligating consensus end sequences (i.e., adaptors), and size-selected for downstream processing and analysis. It may then be useful, for example, to selectively capture subsets of nucleic acids in a shotgun library.
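As a rough illustration of shotgun library preparation, the following Python sketch shears a random sequence at random positions and keeps fragments in the 50-500 nucleotide window mentioned above. The function names (`shear`, `size_select`), the toy 20,000-nucleotide input, and the mean fragment length of 250 are illustrative assumptions. Real shearing and size selection are physical or enzymatic processes that this simple model does not capture.

```python
import random


def shear(sequence: str, mean_fragment_len: int, rng: random.Random) -> list[str]:
    """Cut a sequence at uniformly random positions (hypothetical shearing model)."""
    # Choose enough cut points to give roughly mean_fragment_len per fragment.
    n_cuts = max(len(sequence) // mean_fragment_len - 1, 0)
    cuts = sorted(rng.sample(range(1, len(sequence)), n_cuts))
    bounds = [0] + cuts + [len(sequence)]
    return [sequence[start:end] for start, end in zip(bounds, bounds[1:])]


def size_select(fragments: list[str], min_len: int = 50, max_len: int = 500) -> list[str]:
    """Keep only fragments whose length falls inside the selection window."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
genome = "".join(rng.choice("ACGT") for _ in range(20_000))
fragments = shear(genome, 250, rng)
selected = size_select(fragments)
print(len(fragments), "fragments sheared;", len(selected), "retained after size selection")
```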

Currently, there are two general categories of capture technologies: hybridization-based capture and amplification-based capture. Hybridization-based capture offers the advantage of enabling recovery of the entire original shotgun library fragment, as opposed to merely copying and recovering a subset of the original library fragment. However, the on-target rates associated with hybridization-based capture are generally lower than those of amplification-based methods. Notably, a lower on-target rate wastes sequencing capacity because off-target capture products must also be sequenced. Furthermore, the workflow associated with hybridization-based capture methods can be complex relative to amplification-based methods, with long turnaround times. In contrast, although amplification-based methods (such as anchored multiplex PCR methods) offer the advantages of a simple workflow, faster turnaround time, and higher on-target rates relative to hybridization-based methods, several disadvantages remain. For example, target-specific primer sequences incorporated into library fragments following amplification result in wasted sequencing capacity. Furthermore, the library fragments do not necessarily represent the original shotgun library, as the template must be truncated at the target-specific primer binding site. Thus, there remains an unmet need for a rapid and simple method of target enrichment that will also accommodate unknown structural variations in the target sequence.

These and other challenges can be overcome with the method of target enrichment by one-way dual probe primer extension according to the present invention. In one aspect, the present invention describes a general method of enrichment based on one-way dual probe primer extension and improvements thereof. To this end, the invention provides a combination of primer extension and hybridization-based capture onto a solid support for enriching one or more target nucleic acids from a library of nucleic acids. The present invention further provides an overall workflow that has many of the aforementioned advantages of anchored multiplex amplification-based enrichment methods and hybridization-based capture methods without many of the aforementioned disadvantages. Advantages of the kits, compositions, and methods of the invention over many existing hybridization-based capture methods and anchored multiplex amplification-based capture methods include recovery of library molecules derived from whole shotgun molecules, simple workflow (e.g., fewer total steps and less hands-on time), fast turnaround time, higher on-target rates, and lower overall material costs.

In one embodiment, the invention is a method for enriching at least one target nucleic acid in a nucleic acid library. The method can include hybridizing a first oligonucleotide to a target nucleic acid in the nucleic acid library. The nucleic acids in the nucleic acid library are each provided having a first end comprising a first adaptor and a second end comprising a second adaptor. The method further comprises extending the hybridized first oligonucleotide with a first polymerase, thereby generating a first primer extension complex comprising the target nucleic acid and the extended first oligonucleotide.

In one aspect, the method can further comprise capturing the first primer extension complex, and enriching the first primer extension complex relative to the nucleic acid library. In another aspect, the method may comprise hybridizing a second oligonucleotide to the target nucleic acid, and extending the hybridized second oligonucleotide with a second polymerase, thereby generating a second primer extension complex comprising the target nucleic acid and the extended second oligonucleotide, thereby releasing the extended first oligonucleotide from the first primer extension complex. The method may further comprise amplifying the target nucleic acid with a third polymerase, a first amplification primer, and a second amplification primer. The first amplification primer has a 3' end complementary to the first adaptor and the second amplification primer has a 3' end complementary to the second adaptor.

The first, second and third polymerases can be any suitable polymerase. An example polymerase is Taq or Taq-derived polymerase (e.g., KAPA 2G polymerase from KAPA BIOSYSTEMS). Another example polymerase is a B-family DNA polymerase (e.g., KAPA HIFI polymerase from KAPA BIOSYSTEMS).

In another embodiment, the invention provides a kit for enriching at least one target nucleic acid in a nucleic acid library. The kit can include a first oligonucleotide complementary to a target nucleic acid in the nucleic acid library. The nucleic acids in the library of nucleic acids each have a first end comprising a first adaptor and a second end comprising a second adaptor. The kit may further comprise a second oligonucleotide complementary to the target nucleic acid, a first amplification primer, and a second amplification primer. The first oligonucleotide may comprise a capture moiety. The second oligonucleotide may hybridize to the target nucleic acid at a position 5' to the first oligonucleotide. The first amplification primer has a 3' end complementary to the first adaptor and the second amplification primer has a 3' end complementary to the second adaptor.

In another embodiment, the invention provides a kit for enriching at least one target nucleic acid in a nucleic acid library. The kit can include a first oligonucleotide complementary to a target nucleic acid in a nucleic acid library. The nucleic acids in the nucleic acid library each can have a first end comprising a first adaptor and a second end comprising a second adaptor. The kit can further include a modified nucleotide having a capture moiety, a second oligonucleotide complementary to the target nucleic acid, a first amplification primer, and a second amplification primer. The second oligonucleotide hybridizes to the target nucleic acid at a 5' position of the first oligonucleotide, and the first amplification primer has a 3' end that is complementary to the first adaptor, and the second amplification primer has a 3' end that is complementary to the second adaptor.

In yet another embodiment, the present invention provides a composition comprising a library of nucleic acids comprising at least one target nucleic acid. The nucleic acids in the nucleic acid library each have a first end comprising a first adaptor, a second end comprising a second adaptor, and a region of interest between the first adaptor and the second adaptor. The composition further includes an extended first oligonucleotide that hybridizes to a target region of the target nucleic acid. The extended first oligonucleotide includes at least one capture moiety. The composition further includes a solid support bound to the at least one capture moiety, a second oligonucleotide hybridized to the target nucleic acid at a 5' position of the first extended oligonucleotide, and a polymerase associated with a 3' end of the second oligonucleotide.

The methods of the invention may be used as part of a sequencing protocol, including high throughput single molecule sequencing protocols. The methods of the invention generate a library of target nucleic acids to be sequenced. The target nucleic acids in the library may incorporate barcodes for molecular identification and sample identification.

The present invention comprises at least one linear primer extension step with a target-specific primer. The linear extension step has several advantages over exponential amplification as practiced in the art. Each target nucleic acid is characterized by a unique synthesis rate that depends on the annealing rate of the target-specific primer and the rate at which the polymerase can read through a particular target sequence. The difference in extension and synthesis rates creates a bias that can result in slight differences in single-round synthesis. However, during PCR, the slight difference becomes exponentially amplified. The resulting difference is called the PCR bias. The bias may mask any differences in the initial number of each sequence in the sample and preclude any quantitative analysis.

The present invention limits extension of target-specific primers (including gene-specific primers and degenerate primers that happen to be specific for binding sites within a genome) to a single step. Any exponential amplification is performed with universal primers that are unbiased or less biased than target-specific primers.
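The compounding of a small per-round difference can be made concrete with a short Python calculation. The efficiencies 0.95 and 0.90 and the 30-cycle count are made-up illustrative values, not measurements. The model is deliberately simple and assumed: each PCR cycle multiplies the copy number by (1 + efficiency), whereas one linear extension round yields a number of copies proportional to the efficiency.

```python
def exponential_yield(efficiency: float, cycles: int) -> float:
    """Copies per starting template after exponential amplification (assumed model)."""
    return (1.0 + efficiency) ** cycles


def linear_yield(efficiency: float, rounds: int = 1) -> float:
    """Copies per starting template after linear primer extension (assumed model)."""
    return efficiency * rounds


eff_a, eff_b = 0.95, 0.90  # hypothetical per-round efficiencies for two targets

single_round_ratio = linear_yield(eff_a) / linear_yield(eff_b)
pcr_ratio = exponential_yield(eff_a, 30) / exponential_yield(eff_b, 30)

print(f"single linear round: target A / target B = {single_round_ratio:.3f}")
print(f"30 PCR cycles:       target A / target B = {pcr_ratio:.3f}")
```

Under these assumed numbers the two targets differ by only about 6% after a single linear round, but by roughly twofold after 30 exponential cycles.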

Referring now to FIG. 1, a method 100 of target enrichment by one-way dual probe primer extension includes a step 102 of preparing nucleic acid library fragments. In one aspect, the nucleic acid library fragments can be prepared from any nucleic acid source including one or more target nucleic acids. Typically, the target nucleic acids will include regions or sequences of interest, and the method 100 enables preferential enrichment of one or more target nucleic acids relative to non-target nucleic acids in the nucleic acid library for downstream detection and analysis of those regions or sequences of interest.

With continuing reference to step 102, the nucleic acid is optionally fragmented and adaptors are ligated to each end of the nucleic acid. Example methods for preparing a library of nucleic acid fragments for use in the present invention include transposon-mediated fragmentation and tagging, mechanical shearing, enzymatic digestion, overhang (e.g., T/A) or blunt-end ligation, template switching-mediated adaptor ligation, and the like, and combinations thereof. Ultimately, step 102 of preparing nucleic acid library fragments yields a nucleic acid library, wherein the nucleic acids in the nucleic acid library each have a first end comprising a first adaptor and a second end comprising a second adaptor. Notably, the first and second adaptors can be the same or different, and can further take on a variety of forms, including but not limited to bifurcated or Y-shaped adaptors having complementary and non-complementary portions, blunt-ended adaptors, overhang adaptors, hairpin adaptors, and the like, and combinations thereof. Typically, at least a portion of the aforementioned adaptors are double-stranded; however, other adaptor configurations may also be used to prepare libraries of nucleic acid fragments according to the present invention. Furthermore, in the case of hairpin adaptors, it may be useful to include blocking elements (e.g., 3' dideoxynucleotides or phosphate groups) to prevent self-priming events.

The next step 104 of the method 100 may comprise hybridizing a first oligonucleotide primer to a target nucleic acid present in the nucleic acid library, thereby forming an unextended first primer-target complex. In one embodiment, the first oligonucleotide primer is a target specific primer having a defined sequence complementary to a sequence of a target nucleic acid. One example of a target-specific primer is a gene-specific primer that is designed to hybridize to or be in proximity to (e.g., upstream or 5' of) a gene of interest (e.g., cDNA, genomic DNA). The target nucleic acid may be RNA, DNA, or a combination thereof. The first oligonucleotide primer can be an oligonucleotide primer composed of ribonucleic acid, deoxyribonucleic acid, modified nucleic acid (e.g., biotinylated, locked nucleic acid, inosine, Seela bases, etc.), or other nucleic acid analogs known in the art.
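The idea of a target-specific primer with a defined sequence complementary to the target can be sketched in Python as a search for the primer's reverse complement in a template strand. This is a simplified, assumed model: it requires a perfect match, uses only A, C, G, and T, and ignores melting temperature and mismatch tolerance. The names `reverse_complement` and `find_binding_site` and the toy sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_binding_site(template: str, primer: str) -> int:
    """Return the 0-based template index where the primer anneals, or -1 if none.

    Assumption: annealing requires exact complementarity over the whole primer.
    """
    return template.upper().find(reverse_complement(primer))


template = "TTGACCGTAGGCTAAC"   # toy target strand, written 5'->3'
primer = "GCCTACG"              # complementary to template positions 5-11
print(find_binding_site(template, primer))     # 5
print(find_binding_site(template, "AAAAAAA"))  # -1 (no complementary site)
```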

In various embodiments of the invention, the first oligonucleotide primer may include one or more modified bases, capture moieties, or a combination thereof. Where the first oligonucleotide primer includes a capture moiety, the first oligonucleotide primer may be attached to a solid support or free in solution (i.e., unbound or otherwise not attached to a solid support) prior to the step 104 of hybridizing the first oligonucleotide primer to the target nucleic acid. In embodiments in which the first oligonucleotide primer comprising the capture moiety is not attached to the solid support via the capture moiety, step 104 can be performed in solution. In embodiments in which the first oligonucleotide primer comprising the capture moiety is attached to the solid support via the capture moiety, step 104 may be performed in situ. Notably, in the case of an in situ reaction, the resulting unextended primer-target complexes will be attached to a solid support. By separating the solution from the solid support to which the primer-target complex is bound, any non-target nucleic acids, as well as any target nucleic acids that remain in solution without having annealed to the first oligonucleotide primer, can be removed.

The next step 106 of the method 100 includes performing a first primer extension reaction. In one aspect, step 106 comprises extending the hybridized first oligonucleotide primer with a first polymerase. After hybridizing the first oligonucleotide primer to the target nucleic acid template in step 104, the first oligonucleotide primer is extended by a first polymerase, thereby generating a first primer extension product or complex comprising a 3' region of the extended first oligonucleotide primer, the 3' region comprising the reverse complement of at least a portion of the target nucleic acid template. As described herein, hybridization and extension reactions are optionally performed simultaneously, while in other embodiments, hybridization and extension reactions are performed separately (e.g., sequentially) and can be separated by washing steps that remove unannealed and uncaptured target nucleic acids from the reaction mixture. Furthermore, step 106 may further comprise terminating the primer extension reaction to control the length of the extended first oligonucleotide primer. Notably, the length of the extended first oligonucleotide primer product can be actively controlled by techniques such as those that inactivate the polymerase added in step 106, or passively controlled by allowing the reaction to run to completion, such as by consumption of a limiting reactant or by controlling/selecting the size of the nucleic acid fragments in the nucleic acid library in step 102 of method 100.
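A minimal Python sketch of the extension reaction, under the same exact-match assumption as a simple annealing model: the primer anneals to the template, and the polymerase adds the reverse complement of the template region lying 5' of the binding site. The optional `max_extension` argument is a hypothetical stand-in for terminating the reaction to control product length. All names and sequences are illustrative.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def extend_primer(template: str, primer: str,
                  max_extension: Optional[int] = None) -> Optional[str]:
    """Return the extended primer (5'->3'), or None if the primer does not anneal.

    The 3' region of the product is the reverse complement of the template
    sequence located 5' of the primer binding site.
    """
    site = template.upper().find(reverse_complement(primer))
    if site < 0:
        return None
    # Extension proceeds toward the template's 5' end (index 0); optionally stop early.
    start = 0 if max_extension is None else max(0, site - max_extension)
    return primer.upper() + reverse_complement(template[start:site])


template = "TTGACCGTAGGCTAAC"
print(extend_primer(template, "GCCTACG"))                   # GCCTACGGTCAA
print(extend_primer(template, "GCCTACG", max_extension=2))  # GCCTACGGT
```

When extension runs to the end of the template, the product equals the reverse complement of the template from its 5' end through the primer binding site.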

The method 100 further comprises a step 108 of capturing the first primer extension complex. The capture of the first primer extension complex can be achieved in various ways as disclosed herein, and can be achieved before, simultaneously with, or after any of steps 104 and 106 of method 100. As described above, the first oligonucleotide may include a capture moiety that may be used to capture the first oligonucleotide primer to a solid support before, during, or after step 104 or step 106 of method 100. In another example, extending the first oligonucleotide primer after hybridization to the target nucleic acid comprises incorporating one or more modified nucleotides. The modified nucleotide may include a capture moiety, or may be configured to enable downstream modification of the modified nucleotide to attach or otherwise incorporate the capture moiety into the extension portion of the first primer extension complex. Thus, the first primer extension complex can be captured during or after step 106 by a capture moiety associated with one or more modified nucleotides. The choice of whether to capture the target nucleic acid, the annealed primer-target complex, or the target-extended primer complex further determines whether steps 104 and 106 of the method are performed in solution or in situ.

The next step 110 of the method 100 may include enriching the first primer extension complex. In one aspect, step 110 includes one or more purification and enrichment steps for recovering the first primer extension complex from the non-target nucleic acids and other molecules in the library, such as unused reaction components (e.g., nucleotides, primer molecules, ATP, etc.), enzymes, buffers, and the like. In some embodiments, step 110 comprises enzymatic digestion, size exclusion based purification, affinity based purification, or the like, or combinations thereof. Notably, enrichment of the first primer extension product can be measured relative to the entirety of the nucleic acid library. In one aspect, enrichment involves increasing the concentration of the target nucleic acid by depleting (i.e., removing) other members of the nucleic acid library that are not the target nucleic acid.
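Because enrichment is measured relative to the entirety of the library, it can be expressed as the ratio of the on-target fraction after depletion to the on-target fraction before. The Python sketch below shows that arithmetic with invented molecule counts. The function names and the numbers are illustrative assumptions only.

```python
def on_target_fraction(target_count: float, total_count: float) -> float:
    """Fraction of library molecules that are target nucleic acids."""
    return target_count / total_count


def fold_enrichment(target_before: float, total_before: float,
                    target_after: float, total_after: float) -> float:
    """Ratio of the on-target fraction after enrichment to that before."""
    return (on_target_fraction(target_after, total_after)
            / on_target_fraction(target_before, total_before))


# Hypothetical counts: 1,000 target molecules in a library of 1,000,000;
# after capture and washing, 900 targets and 100 off-target molecules remain.
fold = fold_enrichment(1_000, 1_000_000, 900, 1_000)
print(f"on-target fraction after enrichment: {on_target_fraction(900, 1_000):.2f}")
print(f"fold enrichment: {fold:.0f}x")
```

Note that in this example the target concentration rises even though some target molecules are lost, because non-target members are depleted far more strongly.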

The next step 112 of the method 100 may include hybridizing a second oligonucleotide primer to the target nucleic acid present in the nucleic acid library. In one aspect, the second oligonucleotide primer is a target-specific primer that binds to a region of interest within the target nucleic acid (as opposed to hybridizing or being complementary to one or both of the first adaptor and the second adaptor). In another aspect, the target nucleic acid is part of the first primer extension complex during step 112. For example, the second oligonucleotide primer may hybridize to the target nucleic acid at a position 5' (i.e., upstream) relative to the extended first oligonucleotide primer in the first primer extension complex. In this case, the resulting unextended second primer-target complex comprises the first extended oligonucleotide primer, the target nucleic acid hybridized to the first extended oligonucleotide primer, and the second (unextended) oligonucleotide primer. In the event that the first primer extension product is attached to a solid support during step 112, the unextended second primer-target complex will similarly be attached to the solid support. In other embodiments, in step 112 (e.g., after removing non-target nucleic acids from the reaction mixture), the first primer extension product is free from the solid support and in solution to enable solution hybridization of the second oligonucleotide primer.

The next step 114 of the method 100 includes performing a second primer extension reaction. After the second oligonucleotide primer is hybridized to the target nucleic acid template in step 112, the second oligonucleotide primer is extended by a second polymerase, thereby generating a second primer extension product or complex comprising the target nucleic acid. The extended second oligonucleotide primer includes a 3' region comprising the reverse complement of at least a portion of the target nucleic acid template. In one aspect, extending the second oligonucleotide primer with the second polymerase releases the extended first oligonucleotide primer from the complex with the target nucleic acid. Releasing the extended first oligonucleotide from the first primer extension complex can include one or more of strand displacement (e.g., by a polymerase) or digestion (e.g., by a nuclease). For example, the extended first oligonucleotide can be released with an enzyme having at least one of strand displacement activity, 5' to 3' exonuclease activity, and flap endonuclease activity.
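The geometry of the release step can be sketched as an interval model in Python. The sketch rests on several assumptions that are not taken from the specification: positions are template indices written 5'->3', extension runs toward index 0, and "5' (upstream) of the extended first primer" is read in the sense of the newly synthesized strand, so the second primer anneals at higher template indices than the first product. The first product is treated as released only when the second extension passes along its whole footprint. The function name and coordinates are hypothetical.

```python
from typing import Optional, Tuple

Interval = Tuple[int, int]  # half-open (start, end) in template coordinates


def second_extension(first_footprint: Interval, second_site: Interval,
                     max_extension: Optional[int] = None) -> Tuple[Interval, bool]:
    """Return (footprint of the extended second primer, whether the first product is released)."""
    first_start, first_end = first_footprint
    site_start, site_end = second_site
    if site_start < first_end:
        raise ValueError("second primer must anneal 3' of the first product on the template")
    # Extension proceeds toward the template's 5' end; optionally stop early.
    stop = 0 if max_extension is None else max(0, site_start - max_extension)
    second_footprint = (stop, site_end)
    # Released only if the second polymerase travels along the entire first product.
    first_released = stop <= first_start
    return second_footprint, first_released


print(second_extension((0, 12), (14, 20)))                   # ((0, 20), True)
print(second_extension((0, 12), (14, 20), max_extension=5))  # ((9, 20), False)
```

The second call shows why the length of the second extension matters in this model: an extension that terminates partway leaves the first product partially annealed.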

As described herein, step 112 and step 114 are optionally performed simultaneously, while in other embodiments, step 112 and step 114 are performed separately (e.g., sequentially). Furthermore, step 114 may further comprise terminating the primer extension reaction to control the length of the extended second oligonucleotide primer. Notably, the length of the extended second oligonucleotide primer product can be actively controlled by techniques such as those that inactivate the polymerase added in step 114, or passively controlled by allowing the reaction to run to completion, such as by consumption of a limiting reactant or by controlling/selecting the size of the nucleic acid fragments in the nucleic acid library.

Where the extended first primer includes one or more capture moieties attached to a solid support, release of the extended first oligonucleotide in step 114 results in the generation of a second primer extension complex that is free in solution, rather than attached to a solid support. Thus, as described in step 110 of method 100, one or more purification techniques may be performed after step 114 to recover unbound second extension products or complexes including the target nucleic acid from the support-attached first extension oligonucleotide primer, second polymerase, other reaction components, and the like, and combinations thereof.

The method 100 further comprises a step 116 of amplification. Step 116 may involve linear or exponential amplification (e.g., PCR). Typically, step 116 comprises amplifying the target nucleic acid with a third polymerase, a first amplification primer, and a second amplification primer. In one aspect, the first and second amplification primers are designed to be complementary to sequences of the adaptors incorporated into the target nucleic acids of the nucleic acid library in step 102. For example, the first amplification primer may have a 3' end complementary to the first adaptor and the second amplification primer may have a 3' end complementary to the second adaptor. However, the primers used for amplification may include any sequence present within the amplified target nucleic acid (e.g., gene/target-specific primers, universal primers, etc.) and may support synthesis of one or both strands (i.e., both the top and bottom strands of the double-stranded nucleic acid corresponding to the template of the amplification reaction).

In some embodiments, step 116 enables selective amplification of the target nucleic acid from the nucleic acid library, rather than amplification of either of the first or second extension oligonucleotide primers derived from the target nucleic acid. In one example, a uracil compatible polymerase and dUTP are included in one or both of the extension reactions performed in steps 106 and 114. The extended oligonucleotide primer resulting from this reaction will include at least one uracil nucleotide, and the target nucleic acid template can be a DNA template without a uracil nucleotide. Thereafter, a uracil-incompatible polymerase is included in step 116 for amplification of the target nucleic acid. Uracil-incompatible polymerases can amplify target nucleic acids without uracil nucleotides; however, a uracil-incompatible polymerase will not be able to replicate an extended oligonucleotide primer containing uracil. Alternatively or additionally, uracil-containing products can be selectively digested or otherwise degraded, thereby leaving only the original molecules from the nucleic acid library.
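The uracil-based selection can be illustrated with a small Python sketch. It models, as an assumption, a uracil-incompatible polymerase as simply unable to copy any strand containing U, and models extension in the presence of dUTP by writing U wherever T would otherwise be incorporated into the newly synthesized region. The function names and sequences are hypothetical.

```python
def extend_with_dutp(primer: str, copied_region: str) -> str:
    """Extended primer whose newly synthesized portion carries U in place of T."""
    return primer + copied_region.replace("T", "U")


def amplifiable_by_uracil_incompatible_polymerase(strand: str) -> bool:
    """Assumed model: any uracil in the template strand blocks replication."""
    return "U" not in strand.upper()


pool = {
    "original target": "TTGACCGTAGGCTAAC",
    "extended first primer": extend_with_dutp("GCCTACG", "GTCAA"),
}

amplified = [name for name, seq in pool.items()
             if amplifiable_by_uracil_incompatible_polymerase(seq)]
print(pool["extended first primer"])  # GCCTACGGUCAA
print(amplified)                      # ['original target']
```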

Following the step 116 of amplifying, the method 100 may include a step 118 of analyzing the amplified target nucleic acid. Step 118 may include any method for determining the nucleic acid sequence of one or more products of method 100. Step 118 may further comprise sequence alignment, identification of sequence variations, counting of unique primer extension products, and the like, or combinations thereof.

In addition to the elements of the invention outlined in method 100, a number of additional considerations may be useful in practicing the kits, compositions, and methods described herein. In one aspect, the primer hybridization step is mediated by a target-specific region of the primer. In some embodiments, the target-specific region is capable of hybridizing to a region of a gene that is located in an exon, intron, or untranslated portion of a gene or an untranscribed portion of a gene (e.g., a promoter or enhancer). In some embodiments, the gene is a protein-encoding gene, but in other embodiments, the gene is not a protein-encoding gene, such as an RNA-encoding gene or a pseudogene. In still other embodiments, the target-specific region is located in an intergenic region. For mRNA or cDNA targets, the primers may comprise an oligo-dT sequence.

Instead of a pre-designed target-specific region, the primer may contain a degenerate sequence (i.e., a string of randomly incorporated nucleotides). Such a primer can also find a binding site within the genome and act as a target-specific primer for that binding site. Notably, fully degenerate primers, which are degenerate at every nucleotide position, may not be used for targeted enrichment. However, partially degenerate primers, in which only a fraction of the nucleotide positions are degenerate, may be used according to the present invention. For example, primers with partial degeneracy at a single nucleotide position can be used to capture a target sequence that includes one or more single nucleotide polymorphisms (SNPs).
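As a non-limiting sketch, partial degeneracy can be expressed with the standard IUPAC nucleotide ambiguity codes, and the set of concrete primer sequences covered by a degenerate primer can be enumerated. The example primer is hypothetical; the code R (A or G) is assumed to sit at a SNP position.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct sequences represented by the degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def degenerate_fraction(primer):
    """Fraction of positions that are degenerate (1.0 means fully degenerate)."""
    return sum(len(IUPAC[b]) > 1 for b in primer.upper()) / len(primer)


def expand(primer):
    """Enumerate every concrete sequence covered by the degenerate primer."""
    choices = (IUPAC[b] for b in primer.upper())
    return ["".join(combo) for combo in product(*choices)]


primer = "ACGTRCCA"  # hypothetical primer; R covers an A/G SNP
print(degeneracy(primer))  # 2
print(expand(primer))      # ['ACGTACCA', 'ACGTGCCA']
```

A fully degenerate octamer such as NNNNNNNN would expand to 65,536 sequences, which illustrates why such primers carry no target specificity.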

The primer may comprise additional sequences in addition to the target specific region. In some embodiments, these sequences are located at the 5' -end of the target-specific region. In other embodiments, these sequences may be included elsewhere within the primer, so long as the target-specific region is capable of hybridizing to the target and driving a primer extension reaction, as described below. Additional sequences within the primers may include one or more barcode sequences, such as a unique molecular identification sequence (UID) or a multiplex sample identification sequence (MID). The barcode sequence may exist as a single sequence or as two or more sequences.

In some embodiments, the additional sequence comprises a sequence that facilitates ligation to the 5' -end of the primer. The primers may contain universal ligation sequences that enable ligation of adaptors as described in the following sections.

In some embodiments, the additional sequence comprises one or more binding sites of one or more universal amplification primers.

The primer extension step is performed by a nucleic acid polymerase. Depending on the type of nucleic acid being analyzed, the polymerase may be a DNA-dependent DNA polymerase ("DNA polymerase") or an RNA-dependent DNA polymerase ("reverse transcriptase").

In some embodiments, it is desirable to control the length of the nucleic acid strand synthesized in the primer extension reaction. As described below, the length of the strand determines the length of the nucleic acid that is subjected to the subsequent steps of the method and any downstream applications. The extension reaction may be terminated by any method known in the art. For example, the reaction may be physically stopped by temperature shift or addition of a polymerase inhibitor. In some embodiments, the reaction is stopped by placing the reactants on ice. In other embodiments, the reaction is terminated by increasing the temperature to inactivate the non-thermostable polymerase. In still other embodiments, the reaction is stopped by adding a chelating agent capable of chelating a key cofactor of the enzyme (such as EDTA) or another chemical or biological compound capable of reversibly or irreversibly inactivating the enzyme.

Another method of controlling the length of the primer extension product is to limit the extension reaction by restricting key components, for example by restricting dNTPs to directly limit extension length, or by restricting Mg2+ to slow the extension rate and improve control over the point at which extension stops. The skilled person is able to determine experimentally or theoretically the appropriate amount of key components that allows limited primer extension to produce predominantly products of the desired length.

Another method of controlling the length of the primer extension product is the addition of terminator nucleotides, including reversible terminator nucleotides. The skilled artisan can experimentally or theoretically determine the appropriate ratio of terminator and non-terminator nucleotides that will allow limited primer extension to produce predominantly a product of the desired length. Examples of terminator nucleotides include dideoxynucleotides; 2'-phosphate nucleotides, such as those described in U.S. Pat. No. 8,163,487 to Gelfand et al.; 3'-O-substituted reversible terminators; and nucleotides such as those described in, for example, U.S. Patent Application Publication No. 2014/0242579 to Zhuo et al. and Guo, J., et al., Four-color DNA sequencing with 3′-O-modified nucleotide reversible terminators and chemically cleavable fluorescent dideoxynucleotides, P.N.A.S. 2008, 105(27): 9145-9150. Yet another method of controlling the length of the primer extension product is to add a limited amount of uracil (dUTP) to the primer extension reaction. The uracil-containing DNA may then be treated with uracil-N-DNA glycosylase to create abasic sites. DNA having abasic sites can be degraded by heat treatment, with optional addition of a base to improve the degradation efficiency, as described in U.S. Pat. No. 8,669,061 to Gupta et al. One skilled in the art can experimentally or theoretically determine a suitable ratio of dUTP to dTTP in the reaction that allows limited incorporation of dUTP, so that endonuclease treatment produces primarily products of the desired length.
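One way to approach the theoretical determination of a terminator ratio is a simple geometric model, sketched below. The model assumes that every incorporation event independently adds a terminator with the same probability, regardless of sequence context, and that a single hypothetical parameter, relative_efficiency, captures how readily the polymerase incorporates a terminator compared with a normal nucleotide. Real reactions depart from these assumptions, so the output is a starting point for experimental titration, not a prediction.

```python
def termination_probability(terminator, normal, relative_efficiency=1.0):
    """Per-position probability that a terminator is incorporated.

    terminator, normal: concentrations in the same (arbitrary) units.
    relative_efficiency: assumed incorporation efficiency of the terminator
    relative to a normal nucleotide (hypothetical parameter).
    """
    weighted = terminator * relative_efficiency
    return weighted / (weighted + normal)


def mean_extension_length(p):
    """Mean product length when termination is geometric with probability p."""
    return 1.0 / p


def fraction_at_least(p, length):
    """Fraction of products reaching at least `length` nucleotides."""
    return (1.0 - p) ** (length - 1)


def ratio_for_mean_length(target_length, relative_efficiency=1.0):
    """Terminator:normal ratio that gives the desired mean length in this model."""
    p = 1.0 / target_length
    return p / ((1.0 - p) * relative_efficiency)


p = termination_probability(terminator=1.0, normal=99.0)
print(round(mean_extension_length(p)))        # 100
print(round(ratio_for_mean_length(200), 5))   # 0.00503
```

The same arithmetic applies to the dUTP to dTTP ratio if each incorporated uracil is assumed to become a cleavage site, with the caveat that uracil can only be incorporated opposite adenine in the template.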

In some embodiments, the length of the extension product is inherently limited by the length of the input nucleic acid. For example, cell-free DNA present in maternal plasma is less than 200 bp in length, with a majority of 166 bp in length. Yu, S.C.Y., et al., Size-based molecular diagnostics using plasma DNA for noninvasive prenatal testing, PNAS USA 2014, 111(23): 8583-8. The median length of cell-free DNA found in the plasma of healthy individuals and cancer patients was about 185-200 bp. Giacona, M.B., et al., Cell-free DNA in human blood plasma: length measurements in patients with pancreatic cancer and healthy controls, Pancreas 1998, 17(1): 89-97. Poorly preserved or chemically treated samples may contain chemically or physically degraded nucleic acids. For example, formalin-fixed paraffin-embedded tissue (FFPET) typically produces nucleic acids with an average length of 150 bp.

In some embodiments, the methods of the invention comprise one or more purification steps after primer extension by a DNA polymerase or reverse transcriptase. The purification removes unused primer molecules and the template molecules used to generate primer extension products. In some embodiments, the template nucleic acid and all nucleic acid fragments except the extended primer are removed by exonuclease digestion. In this embodiment, the primer used for primer extension may have a 5'-terminal modification that renders the primer and any extension products resistant to exonuclease digestion. Examples of such modifications include phosphorothioate linkages. In other embodiments, the RNA template may be removed by enzymatic treatments that spare DNA, such as RNase digestion, including RNase H digestion. In still other embodiments, the primers and large-size template DNA are separated from the extension products by size exclusion methods, such as gel electrophoresis, chromatography, isotachophoresis, or epitachophoresis.

In some embodiments, the purification is by affinity binding. In a variation of this embodiment, the affinity is for a specific target sequence (sequence capture). In other embodiments, the primer comprises an affinity tag. Any affinity tag known in the art may be used, such as biotin or an antibody or an antigen for which a specific antibody is present. The affinity partner of the affinity tag may be present in solution, e.g., on a solution-phase solid support, such as a suspended particle or bead, or bound to a solid support. During affinity purification, unbound components of the reaction mixture are washed away. In some embodiments, additional steps are taken to remove unused primers. In some embodiments, the affinity capture alters the charge of the primer extension product. For example, including one or more biotinylated nucleotides and binding them to streptavidin results in an altered charge on the nascent nucleic acid strand. The altered charge can be used to separate nascent strands (primer extension products) by isotachophoresis or differential electrophoresis.

Notably, the methods of the invention do not require a ligation step (e.g., adding a consensus sequence to the extended first or second oligonucleotide primer). However, in some embodiments, the invention includes a ligation step. For example, a homopolymer tail may be added to the 3' end of the nucleic acid. In this embodiment, the homopolymer can serve as a binding site for a reverse complement homopolymer (similar to the poly-A tail with a poly-T primer for mRNA). The ligation adds one or more adaptor sequences to the primer extension products generated in the previous step. The adaptor sequence provides one or more universal priming sites (for amplification or sequencing) and optionally one or more barcodes. The exact mode of ligation of the adaptors is not important as long as the adaptors become associated with the primer extension products and allow the subsequent steps described below to be performed.

In some embodiments described above, the methods involve target-specific primers that include a universal priming sequence ("priming site") and produce primer extension products with a single priming site. In such embodiments, only one additional priming sequence ("priming site") needs to be provided to achieve exponential amplification. In other embodiments, the target-specific primers do not include a universal priming site. In such embodiments, two priming sites need to be provided to achieve exponential amplification. Adapters with universal priming sites can be added by any single-stranded ligation method available in the art.

In embodiments where the extension primer comprises a universal ligation site, one example of a single-stranded ligation method may be used. In such embodiments, an adaptor having a double-stranded region and a single-stranded overhang complementary to the universal ligation site in the primer may be annealed and ligated. Annealing of the single-stranded 3'-overhang of the adaptor to the universal ligation site at the 5'-end of the primer results in a double-stranded region with a nick in the strand containing the primer extension product. The two strands can be ligated at the nick by a DNA ligase or another enzyme or non-enzymatic reagent that can catalyze the reaction between the 5'-phosphate of the primer extension product and the 3'-OH of the adaptor. The ligation thus provides a universal priming site at one end of the primer extension product.

Another example of a single-stranded ligation method can be used to add universal priming sites to the opposite end of the primer extension product (or to both ends of the extension product in embodiments where the extension primer does not contain a universal ligation site). For this embodiment, one or both ends of the primer extension product to be ligated do not have a universal ligation site. In addition, in some embodiments, at least one end of the primer extension product to be ligated has an unknown sequence (e.g., due to a random termination event or unknown sequence variation). In such embodiments, a sequence-independent single-stranded ligation approach is employed. One exemplary method is described in U.S. application publication No. 20140193860. In essence, the method uses a population of adaptors in which the single-stranded 3'-overhang, instead of having a universal ligation site, has a random sequence, such as a random hexamer sequence. In some embodiments of this method, the adaptor also has a hairpin structure. Another example is a method implemented by the ACCEL-NGS 1S DNA library kit (Swift Biosciences, Ann Arbor, Mich.).

The ligation step of the method utilizes a ligase or another enzymatic or non-enzymatic reagent with similar activity. The ligase may be of viral or bacterial origin, for example, a DNA or RNA ligase such as T4 or E. coli ligase, or a thermostable ligase such as Afu, Taq, Tfl, or Tth. In some embodiments, alternative enzymes, such as topoisomerase enzymes, may be used. In addition, non-enzymatic reagents can be used to form a phosphodiester linkage between the 5'-phosphate of the primer extension product and the 3'-OH of the adaptor, as described and referenced in U.S. patent application publication 2014/0193860 to Bevilacqua et al.

In some embodiments of this method, the first ligation of the adaptor is followed by optional primer extension. The ligated adaptors have free 3' -ends that can be extended to produce double stranded nucleic acids. The end opposite to the adapter will then become suitable for blunt-end ligation of another adapter. To avoid the need for a single-stranded ligation procedure, this double-stranded end of the molecule can be ligated to the double-stranded adaptor by any ligase or another enzymatic or non-enzymatic means. The double-stranded adaptor sequence provides one or more universal priming sites (for amplification or sequencing) and optionally one or more barcodes.

In some embodiments, the methods of the invention comprise one or more purification steps after the ligation step. The purification will remove unused adaptor molecules. The adaptors and large size ligation products are separated from the extension products by size exclusion methods such as gel electrophoresis, chromatography or isotachophoresis.

In some embodiments, the purification is by affinity binding. In a variation of this embodiment, the affinity is for a specific target sequence (sequence capture). In other embodiments, the adaptor comprises an affinity tag. Any affinity tag known in the art (e.g., biotin, or an antibody or an antigen for which a specific antibody exists) may be used. The affinity partner of the affinity tag may be associated with a solution phase support (e.g., on a suspended particle or bead), or bound to a solid phase support. During affinity purification, unbound components of the reaction mixture are washed away. In some embodiments, additional steps are taken to remove unused adaptors.

In some embodiments, the invention includes an amplification step. This step may involve linear or exponential amplification (e.g., PCR). Primers used for amplification may include any sequence present within the amplified nucleic acid and may support synthesis of one or both strands. Amplification may be isothermal or involve thermal cycling.

In some embodiments, the amplification is exponential and involves PCR. It is desirable to reduce PCR amplification bias. If one or more gene-specific primers are used, then to reduce bias the method involves a limited number of amplification cycles (e.g., about 10 or fewer cycles). In other variations of these embodiments, universal primers are used to synthesize both strands. The universal primer sequence may be part of the original extension primer or of one or both ligated adaptors. One or two universal primers may be used. The extension primer and one or both adaptors can be engineered to have the same primer binding site. In this embodiment, a single universal primer may be used to synthesize both strands. In other embodiments, the extension primer (or adaptor) on one side of the molecule to be amplified and the adaptor on the other side contain different universal primer binding sites. The universal primer can be paired with another universal primer (having the same or a different sequence). In other embodiments, the universal primer can be paired with a gene-specific primer. Because PCR using universal primers has reduced sequence bias, the number of amplification cycles need not be limited to the same extent as in PCR using gene-specific primers. The number of amplification cycles using the universal primers may be low, but may also be up to about 20, 30 or more cycles.
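To give a sense of scale for the cycle numbers mentioned above, the following sketch uses the idealized exponential model, in which each cycle multiplies the copy number by (1 + efficiency). The efficiency parameter and the function names are assumptions introduced for illustration; plateau effects and sequence-dependent bias are not modeled.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Idealized copy number after `cycles` rounds of exponential amplification.

    efficiency: assumed fraction of templates copied per cycle (0 < efficiency <= 1).
    """
    return initial_copies * (1.0 + efficiency) ** cycles


def cycles_needed(initial_copies, target_copies, efficiency=1.0):
    """Smallest whole number of cycles that reaches `target_copies` in this model."""
    if efficiency <= 0:
        raise ValueError("efficiency must be positive")
    cycles = 0
    copies = float(initial_copies)
    while copies < target_copies:
        copies *= 1.0 + efficiency
        cycles += 1
    return cycles


for n in (10, 20, 30):
    print(n, pcr_copies(1, n))
```

Under perfect doubling, 10 cycles yield roughly a thousand-fold amplification, whereas 20 and 30 cycles yield roughly a million-fold and a billion-fold, respectively.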

The present invention includes the use of molecular barcodes. The barcodes typically consist of 4 to 36 nucleotides. In some embodiments, the barcodes are designed to have melting temperatures within 10 ℃ or less of each other. Barcodes can be designed to form a minimally cross-hybridizing set, i.e., a set in which as few combinations of sequences as possible form stable hybrids with each other under the desired reaction conditions. The design, placement and use of barcodes for sequence identification and enumeration are known in the art. See, e.g., U.S. patent nos. 7,393,665, 8,168,385, 8,481,292, 8,685,678, and 8,722,368.
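A minimal sketch of how a candidate barcode set might be screened against the two design criteria above is given below. Two simplifications are assumed and are not taken from the present disclosure: melting temperature is estimated with the Wallace rule (2 ℃ per A/T and 4 ℃ per G/C), which is only a rough approximation for short oligonucleotides, and pairwise Hamming distance is used as a crude stand-in for cross-hybridization potential. The barcode sequences and thresholds are hypothetical.

```python
from itertools import combinations


def wallace_tm(seq):
    """Rough melting temperature estimate (Wallace rule), in degrees Celsius."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def check_barcode_set(barcodes, max_tm_spread=10, min_distance=3):
    """True if all Tm estimates lie within the spread and all pairs are distinct enough."""
    tms = [wallace_tm(b) for b in barcodes]
    tm_ok = max(tms) - min(tms) <= max_tm_spread
    distance_ok = all(
        hamming(a, b) >= min_distance for a, b in combinations(barcodes, 2)
    )
    return tm_ok and distance_ok


barcodes = ["ACGGTCAT", "TGCAAGTC", "CATGCTGA"]  # hypothetical 8-mers
print(check_barcode_set(barcodes))  # True
```

A production design would more likely use a nearest-neighbor thermodynamic model and would also screen each barcode against the reverse complements of the others.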

Barcodes can be used to identify each nucleic acid molecule in a sample and its progeny (i.e., a collection of nucleic acid molecules generated using the original nucleic acid molecule). Such barcodes are "unique IDs" (UIDs).

Barcodes may also be used to identify the sample from which the nucleic acid molecule being analyzed is derived. Such barcodes are "multiplex sample IDs" ("MIDs"). All molecules derived from the same sample share the same MID.

Barcodes contain unique sequences of nucleotides characteristic of each barcode. In some embodiments, the sequence of the barcode is pre-designed. In other embodiments, the barcode sequence is random. All or some of the nucleotides within the barcode may be random. Random sequences and random nucleotide bases within a known sequence are referred to as "degenerate sequences" and "degenerate bases", respectively. In some embodiments, the molecule comprises two or more barcodes: one for molecular identification (UID) and one for sample identification (MID). Sometimes, the UID or MID each contain several barcodes, which when used together can identify a molecule or sample.

In some embodiments, the number of UIDs in the reaction may exceed the number of molecules to be labeled. In some embodiments, one or more barcodes are used to group or bin sequences. For example, in some embodiments, one or more UIDs are used to group sequences into bins, where the sequences in each bin contain the same UID, i.e., are amplicons derived from a single target molecule. In some embodiments, the UID is used to align the sequences. In other embodiments, the target-specific regions are used to align sequences. In some embodiments of the invention, the UID is introduced in the initial primer extension event, while the sample barcode (MID) is introduced in the ligated adaptor.
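The binning of sequences by UID can be sketched as follows. The read layout is an assumption made for illustration only: each read is taken to begin with a fixed-length UID, followed by the remainder of the sequence.

```python
from collections import defaultdict


def bin_by_uid(reads, uid_length):
    """Group reads into bins keyed by UID.

    Assumes (hypothetically) that the first `uid_length` bases of each read
    are the UID and the rest is the sequence of interest.
    """
    bins = defaultdict(list)
    for read in reads:
        uid, insert = read[:uid_length], read[uid_length:]
        bins[uid].append(insert)
    return dict(bins)


reads = ["AAACGTTGCA", "AAACGTTGCA", "CCTCGTAGCA"]  # hypothetical reads, 3-base UID
print(bin_by_uid(reads, uid_length=3))
# {'AAA': ['CGTTGCA', 'CGTTGCA'], 'CCT': ['CGTAGCA']}
```

Each bin then represents the amplicons presumed to derive from one original target molecule.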

After ligation has been performed, the nucleic acid product can be sequenced. Sequencing may be performed by any method known in the art. High throughput single molecule sequencing is particularly advantageous. Examples of such techniques include the 454 LIFE SCIENCES GS FLX platform (454 LIFE SCIENCES), the ILLUMINA HISEQ platform (ILLUMINA), the ION TORRENT platform (LIFE TECHNOLOGIES), the PACIFIC BIOSCIENCES platform (PACIFIC BIOSCIENCES) using SMRT sequencing technology, and any other currently existing or future single molecule sequencing technology, with or without the involvement of sequencing by synthesis. In variations of these embodiments, the sequencing utilizes universal primer sites present in one or both of the adaptor sequences or in one or both of the primer sequences. In still other variations of these embodiments, gene-specific primers are used for sequencing. However, it should be noted that universal primers are associated with reduced sequencing bias compared to gene-specific primers.

In some embodiments, the sequencing step involves sequence alignment. In some embodiments, alignments are used to determine consensus sequences from multiple sequences (e.g., multiple sequences having the same unique molecular ID (UID)). In some embodiments, alignment is used to identify sequence variations, such as single nucleotide variations (SNVs). In some embodiments, a consensus sequence is determined from multiple sequences all having the same UID. In other embodiments, a UID is used to eliminate artifacts, i.e., variations that exist in only some of the progeny of a single molecule (characterized by a particular UID). UIDs may be used to eliminate such artifacts caused by PCR errors or sequencing errors.
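As one illustrative possibility, a consensus sequence for the reads sharing a UID can be obtained by a per-position majority vote, which suppresses isolated PCR or sequencing errors. The sketch assumes the reads have already been aligned and trimmed to equal length; the vote rule and the example reads are hypothetical and are not prescribed by the present disclosure.

```python
from collections import Counter


def consensus(reads):
    """Per-position majority vote over aligned, equal-length reads.

    Ties are resolved in favor of the base seen first at that position.
    """
    if not reads:
        raise ValueError("at least one read is required")
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("reads must be aligned to equal length")
    bases = []
    for column in zip(*reads):
        base, _count = Counter(column).most_common(1)[0]
        bases.append(base)
    return "".join(bases)


family = ["ACGTAC", "ACGTAC", "ACTTAC"]  # hypothetical reads sharing one UID
print(consensus(family))  # ACGTAC
```

In this example the single G-to-T difference in the third read is treated as an artifact because the other members of the same UID family do not share it.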

In some embodiments, the number of each sequence in a sample can be quantified by quantifying the relative number of sequences with each UID in a population with the same multiplex sample ID (MID). Each UID represents a single molecule in the original sample, and counting the different UIDs associated with each sequence variant allows the fraction of each sequence variant in the original sample to be determined, where all molecules share the same MID. One skilled in the art will be able to determine the number of sequence reads necessary to determine the consensus sequence. In some embodiments, the relevant number is the number of reads per UID ("sequence depth") necessary for accurate quantitative results. In some embodiments, the desired depth is 5-50 reads per UID.
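The UID-based quantification described above can be sketched as follows. The input format is an assumption: one record per UID family, giving the sample MID, the UID, the consensus variant call for that family, and the number of reads supporting it. Families below a minimum read depth are discarded; the default threshold of 5 reads is taken from the lower end of the range stated above.

```python
from collections import Counter, defaultdict


def variant_fractions(families, min_reads_per_uid=5):
    """Fraction of original molecules carrying each variant, per sample.

    families: iterable of (mid, uid, variant, n_reads) records, assumed to
    contain each (mid, uid) pair at most once.
    """
    counts = defaultdict(Counter)
    for mid, _uid, variant, n_reads in families:
        if n_reads >= min_reads_per_uid:
            counts[mid][variant] += 1  # one UID counts as one original molecule
    fractions = {}
    for mid, per_variant in counts.items():
        total = sum(per_variant.values())
        fractions[mid] = {v: n / total for v, n in per_variant.items()}
    return fractions


families = [  # hypothetical data
    ("S1", "U1", "ref", 10),
    ("S1", "U2", "ref", 8),
    ("S1", "U3", "alt", 6),
    ("S1", "U4", "alt", 2),   # below the depth threshold, ignored
    ("S2", "U1", "ref", 5),
]
print(variant_fractions(families))
```

Because each UID is counted once regardless of how many reads it produced, the resulting fractions are not distorted by unequal amplification of individual molecules.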

The sample used in the method of the invention comprises any individual (e.g. human, patient) or environmental sample containing nucleic acids. The polynucleotide may be extracted from the sample, or the sample may be directly subjected to the method of the invention. The starting sample may also be an extracted or isolated nucleic acid, DNA or RNA. The sample may constitute any tissue or fluid obtained from an organism. For example, the sample may be a tumor biopsy sample or a blood or plasma sample. In some embodiments, the sample is a formalin fixed, paraffin embedded (FFPE) sample. The sample may comprise nucleic acids from one or more sources (e.g., one or more patients). In some embodiments, the tissue may be infected by a pathogen and thus contain nucleic acids of the host and the pathogen.

Methods for DNA extraction are well known in the art. See J. Sambrook et al., "Molecular Cloning: A Laboratory Manual," 1989, 2nd edition, Cold Spring Harbor Laboratory Press: New York, N.Y. Various kits are commercially available for extracting nucleic acids (DNA or RNA) from biological samples (e.g., BD BIOSCIENCES CLONTECH (Palo Alto, Calif.); EPICENTRE TECHNOLOGIES (Madison, Wisc.); GENTRA SYSTEMS, INC. (Minneapolis, Minn.); QIAGEN, INC. (Valencia, Calif.); AMBION, INC. (Austin, Tex.); BIORAD LABORATORIES (Hercules, Calif.); and the like).

In some embodiments, the starting sample used in the methods of the invention is a library, e.g., a genomic library or an expression library comprising a plurality of polynucleotides. In other embodiments, the library is generated by the methods of the invention. Where the starting material is a biological sample, the method generates an amplification library or collection of amplicons representing various species or sequences. The library may be stored and used multiple times for further amplification or sequencing of the nucleic acids in the library.

According to one embodiment of the invention, a method for primer extension target enrichment may comprise primer-mediated capture of target nucleic acids in solution. Turning now to fig. 2A-2E, a nucleic acid library includes a target nucleic acid 200, which target nucleic acid 200 includes a region of interest (ROI) 202 (fig. 2A). The target nucleic acid 200 further comprises a first end comprising a first adaptor 204 and a second end comprising a second adaptor 206. In fig. 2A-2E, the target nucleic acid 200 is illustrated as a single-stranded nucleic acid having a first adaptor 204 located at the 3 'end (i.e., first end) of the target nucleic acid 200 and a second adaptor 206 located at the 5' end (i.e., second end) of the target nucleic acid 200. The first oligonucleotide 208 hybridizes to the target nucleic acid 200 in the nucleic acid library. The first oligonucleotide 208 includes a 3' target-specific region 210 that is complementary to a target nucleic acid and a capture moiety 212. In the illustrated embodiment, the target-specific region 210 is complementary to the ROI 202.

As shown in fig. 2B, the hybridized first oligonucleotide 208 is extended with a first polymerase (not shown), thereby generating a first primer extension complex 214 comprising the target nucleic acid 200 and the extended first oligonucleotide 216 (wherein the dashed line indicates the extended portion of the extended first oligonucleotide 216). The first primer extension complex 214 is captured on a solid support 218. The solid support can be a solution phase support (e.g., a bead or another similar particle), or a solid phase support (e.g., a silicon wafer, a glass slide, etc.). For example, magnetic glass particles and devices employing the same described in U.S. patents 656568, 6274386, 7371830, 6870047, 6255477, 6746874, and 6255851 may be used. In the embodiment illustrated in fig. 2B, the first primer extension complex 214 is captured on a solid support via the capture moiety 212. After capture, the first primer extension complex 214 is enriched relative to the nucleic acid library.

Turning to fig. 2C, a second oligonucleotide 220 hybridizes to the target nucleic acid 200. The second oligonucleotide 220 is complementary to the target nucleic acid 200 and hybridizes to the target nucleic acid 200 at a position 5' relative to the target-specific region 210 of the first oligonucleotide 208. In the illustrated embodiment, the second oligonucleotide 220 is complementary to and hybridizes to the target nucleic acid 200 at a location just outside the ROI 202; however, it is to be understood that the first oligonucleotide 208 and the second oligonucleotide 220 may be designed to hybridize at any defined location along the length of the target nucleic acid 200, wherein the second oligonucleotide 220 hybridizes to the target nucleic acid 200 at a 5' position relative to the target-specific region 210 of the first oligonucleotide 208. As can be seen in fig. 2C, both the first extension oligonucleotide 216 (which is attached to the solid support 218) and the second oligonucleotide 220 hybridize to the target nucleic acid 200.

Referring to fig. 2D, the hybridized second oligonucleotide 220 is extended with a second polymerase (not shown), thereby generating a second primer extension complex 222 comprising the target nucleic acid 200 and an extended second oligonucleotide 224 (where the dashed line indicates the extended portion of the extended second oligonucleotide 224). In one aspect, extension of the hybridized second oligonucleotide 220 releases the extended first oligonucleotide 216 from the first primer extension complex 214. In another aspect, the extended first oligonucleotide 216 (including the first oligonucleotide primer 208) remains attached to the solid support 218.

As illustrated in fig. 2E, the target nucleic acid 200 is amplified with a third polymerase (not shown), a first amplification primer 226, and a second amplification primer 228. The first amplification primer 226 includes a 3 'end that is complementary to the first adaptor 204, and the second amplification primer 228 includes a 3' end that is complementary to the second adaptor 206.
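The complementarity requirement between the 3' end of an amplification primer and its adaptor can be checked computationally, as in the following sketch. The adaptor sequence, the primer, and the minimum match length of 8 bases are all hypothetical values chosen for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def three_prime_anneals(primer, adaptor, min_match=8):
    """True if the last `min_match` bases of the primer can base-pair with the adaptor.

    Both sequences are written 5'->3'. The primer's 3' tail anneals if its
    reverse complement occurs somewhere within the adaptor.
    """
    tail = primer.upper()[-min_match:]
    if len(tail) < min_match:
        return False
    return reverse_complement(tail) in adaptor.upper()


adaptor = "GATTACAGGCTA"                       # hypothetical adaptor sequence
primer = "TT" + reverse_complement(adaptor)    # hypothetical primer with a 5' extension
print(three_prime_anneals(primer, adaptor))    # True
```

Only the 3' end is tested because that is the end from which the polymerase extends; any 5' extension on the primer does not need to match the adaptor.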

According to another embodiment of the invention, a method for primer extension target enrichment may comprise in situ primer-mediated capture of target nucleic acids. Referring to fig. 3A and 3B, a nucleic acid library includes a target nucleic acid 300, the target nucleic acid 300 including a region of interest (ROI) 302 (fig. 3A). The target nucleic acid 300 further comprises a first end comprising a first adaptor 304 and a second end comprising a second adaptor 306. In fig. 3A and 3B, the target nucleic acid 300 is illustrated as a single-stranded nucleic acid having a first adaptor 304 located at the 3' end (i.e., first end) of the target nucleic acid 300 and a second adaptor 306 located at the 5' end (i.e., second end) of the target nucleic acid 300. The first oligonucleotide 308 hybridizes to the target nucleic acid 300 in the nucleic acid library. The first oligonucleotide 308 includes a 3' target-specific region 310 that is complementary to the target nucleic acid 300 and a capture moiety 312. In the illustrated embodiment, the target-specific region 310 is complementary to the ROI 302.

In contrast to the embodiment illustrated in FIGS. 2A-2E, the first oligonucleotide 308 is captured on solid support 318 prior to or simultaneously with hybridization of the first oligonucleotide 308 to the target nucleic acid 300. Solid support 318 can be a solution phase support (e.g., a bead or another similar particle), or a solid phase support (e.g., a silicon wafer, a glass slide, etc.). In the embodiment illustrated in fig. 3A, first oligonucleotide 308 is captured on solid support 318 via capture moiety 312. Turning to fig. 3B, the hybridized first oligonucleotide 308 is extended with a first polymerase (not shown), thereby generating a first primer extension complex 314 comprising the target nucleic acid 300 and an extended first oligonucleotide 316 (where the dashed line indicates the extended portion of the extended first oligonucleotide 316). Notably, the first primer extension complex 314 is captured on a solid support 318, enabling enrichment of the target nucleic acid 300 with respect to the nucleic acid library. Thereafter, a second primer hybridization and extension reaction may be performed as illustrated in FIGS. 2C and 2D, followed by an amplification step as illustrated in FIG. 2E.

According to yet another embodiment of the invention, a method for primer extension target enrichment may comprise extension-mediated capture of target nucleic acids. Referring to fig. 4A-4D, a nucleic acid library includes a target nucleic acid 400, the target nucleic acid 400 including a region of interest (ROI) 402 (fig. 4A). The target nucleic acid 400 further comprises a first end comprising a first adaptor 404 and a second end comprising a second adaptor 406. In fig. 4A-4D, the target nucleic acid 400 is illustrated as a single-stranded nucleic acid having a first adaptor 404 located at the 3 'end (i.e., first end) of the target nucleic acid 400 and a second adaptor 406 located at the 5' end (i.e., second end) of the target nucleic acid 400. The first oligonucleotide 408 hybridizes to the target nucleic acid 400 in the nucleic acid library. The first oligonucleotide 408 is complementary to the target nucleic acid 400. Notably, in contrast to the first oligonucleotide 208 comprising the capture moiety 212 in fig. 2A, the first oligonucleotide 408 does not necessarily comprise a capture moiety. In the embodiment illustrated in fig. 4A, the first oligonucleotide 408 is complementary to a portion of the ROI 402.

As shown in fig. 4B, the hybridized first oligonucleotide 408 is extended with a first polymerase (not shown), thereby generating a first primer extension complex 414 comprising the target nucleic acid 400 and an extended first oligonucleotide 416 (where the dashed lines indicate the extended portion of the extended first oligonucleotide 416). According to the embodiment illustrated in fig. 4A-4D, extension of the first oligonucleotide 408 is performed in the presence of one or more modified nucleic acids 412. Each modified nucleic acid includes a capture moiety 412a or may be modified so that the capture moiety 412a is added at the same time as or after extension of the first oligonucleotide 408. Incorporation of one or more modified nucleic acids 412 including a capture moiety 412a enables extension-mediated capture of the target nucleic acid 400 on the solid support 418. The solid support 418 can be a solution phase support (e.g., a bead or another similar particle), or a solid phase support (e.g., a silicon wafer, a glass slide, etc.). In the embodiment illustrated in fig. 4B, the first primer extension complex 414 is captured on a solid support 418 via the modified nucleic acid 412 including the capture moiety 412a. After capture, the first primer extension complex 414 is enriched relative to the nucleic acid library.

Turning to fig. 4C, a second oligonucleotide 420 hybridizes to the target nucleic acid 400. The second oligonucleotide 420 is complementary to the target nucleic acid 400 and hybridizes to the target nucleic acid 400 at a 5' position relative to the first oligonucleotide 408. In the illustrated embodiment, the second oligonucleotide 420 is complementary to and hybridizes to the target nucleic acid 400 at a location just within the ROI 402; however, it is understood that the first oligonucleotide 408 and the second oligonucleotide 420 can be designed to hybridize at any defined position along the length of the target nucleic acid 400, wherein the second oligonucleotide 420 hybridizes to the target nucleic acid 400 at a 5' position relative to the target-specific region 410 of the first oligonucleotide 408.

With continued reference to fig. 4C, the hybridized second oligonucleotide 420 is extended with a second polymerase (not shown), thereby generating a second primer extension complex 422 comprising the target nucleic acid 400 and an extended second oligonucleotide 424 (where the dashed line indicates the extended portion of the extended second oligonucleotide 424). The extended first oligonucleotide 416 (which is attached to the solid support 418) and the second oligonucleotide 420 are each hybridized to the target nucleic acid 400 prior to extension of the second oligonucleotide 420 with the second polymerase. Extension of the hybridized second oligonucleotide 420 releases the extended first oligonucleotide 416 from the first primer extension complex 414. In another aspect, the extended first oligonucleotide 416 (including the first oligonucleotide primer 408 and the modified nucleic acid 412) remains attached to the solid support 418.

As illustrated in fig. 4D, the target nucleic acid 400 is amplified with a third polymerase (not shown), a first amplification primer 426, and a second amplification primer 428. The first amplification primer 426 includes a 3' end that is complementary to the first adaptor 404 and the second amplification primer 428 includes a 3' end that is complementary to the second adaptor 406.

In one aspect, target and non-target nucleic acids in a nucleic acid library can exhibit intermolecular interactions that produce a daisy-chain structure. As shown in fig. 5, the target nucleic acid 200 (see also fig. 2A) includes a ROI, a first adaptor 204, and a second adaptor 206. The first oligonucleotide 208 hybridizes to the target nucleic acid 200. The first oligonucleotide 208 includes a 3' target-specific region 210 and a capture moiety 212. The nucleic acid library may further include one or more non-target nucleic acids, including a first non-target nucleic acid 500 and a second non-target nucleic acid 500'. Similar to the target nucleic acid 200, the first non-target nucleic acid 500 and the second non-target nucleic acid 500' each include a first end comprising a first adaptor 504 and 504', respectively, and a second end comprising a second adaptor 506 and 506', respectively. In one aspect, the first adaptor 204 is at least partially complementary to the first adaptor 504, and the second adaptor 506 is at least partially complementary to the second adaptor 506'. Thus, the target nucleic acid 200 may form a daisy chain with the non-target nucleic acid 500 and the non-target nucleic acid 500', as illustrated in fig. 5.
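The partial complementarity between adaptor ends described above can be illustrated with a short sketch. This is only an illustration under stated assumptions: the function names and the example sequences are hypothetical and do not appear in the disclosure, and the length of the longest perfectly base-paired antiparallel stretch is used here as a crude stand-in for the tendency of two adaptor ends to associate. It does not model mismatches, temperature, or salt conditions.

```python
# Hypothetical helper names and sequences; not taken from the disclosure.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def longest_complementary_run(adaptor_a, adaptor_b):
    """Length of the longest contiguous stretch of adaptor_a that can
    base-pair, in antiparallel orientation, with a stretch of adaptor_b.

    Implemented as the longest common substring between adaptor_a and the
    reverse complement of adaptor_b (simple dynamic programming).
    """
    target = reverse_complement(adaptor_b)
    best = 0
    prev = [0] * (len(target) + 1)
    for base in adaptor_a.upper():
        cur = [0] * (len(target) + 1)
        for j, other in enumerate(target, start=1):
            if base == other:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


# Made-up adaptor ends: "GTTG" in the first can pair with "CAAC" in the second.
print(longest_complementary_run("ACGTTGCA", "TTCAACTT"))
```

A longer run suggests a greater chance that the two adaptor ends associate and link otherwise unrelated library molecules, which is the interaction that produces the daisy chain of fig. 5.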

In various instances, it may be useful to minimize or eliminate the formation of the daisy chain structure. For example, capture of the target nucleic acid 200 by hybridization and extension of the first oligonucleotide 208 may also capture the non-target nucleic acid 500 and the non-target nucleic acid 500' by association, which may reduce the specificity of the capture and enrichment process. To reduce intermolecular interactions between the adaptor ends of target and non-target nucleic acids in a nucleic acid library, blocking oligonucleotides may be hybridized to the adaptor end sequences.

To facilitate reduction of off-target hybridization, the blocking oligonucleotide has sequences that are complementary to the adaptors (e.g., the first adaptor 204 and the second adaptor 206) and preferentially hybridizes to these adaptor sequences. Blocking oligonucleotides can be used in both single and multiplex formats. Where multiplexing is desired, various sample index sequences can be incorporated into the adaptors. However, this requires the use of matched blocking oligonucleotides. Where a large number of sample indices is used (e.g., 24, 96, etc.), one possibility is to use a "universal" blocking oligonucleotide. The universal blocking oligonucleotide has a unique sequence comprising non-natural nucleotides that are capable of binding to a large number of different sample index sequences. As a result, only a single blocking oligonucleotide is added to the nucleic acid sample. Alternatively (or in addition), the single universal blocking oligonucleotide may be a mixture of oligonucleotides that together make up the universal blocking oligonucleotide composition.

In one aspect, the universal blocking oligonucleotide comprises a non-specific region flanked by first and second specific regions. Non-specific regions include, for example, a series of inosines that align with the sample index sequence when the universal blocking oligonucleotide hybridizes to the target adaptor sequence. The specific region of the universal blocking oligonucleotide is complementary to an invariant portion of the adaptor sequence and includes one or more melting temperature (Tm) modified bases to increase the Tm of the blocking oligonucleotide-adaptor duplexes. Examples of Tm modified base substitutions are illustrated in table 1.
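The layout of such a universal blocking oligonucleotide can be sketched as follows. This is a minimal illustration, not a design procedure from the disclosure: the function names, the adaptor sequence, and the index coordinates are all hypothetical, inosine is written as the letter "I", and the Tm modified base substitutions of the specific regions are not modeled.

```python
# Hypothetical helper names and sequences; not taken from the disclosure.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def universal_blocker(adaptor, index_start, index_len):
    """Build a blocker (5'->3') for an adaptor written 5'->3'.

    The sample index is assumed to occupy
    adaptor[index_start:index_start + index_len]. The blocker is the reverse
    complement of the invariant flanks, with one inosine ("I") opposite each
    index position, giving: specific region, non-specific region, specific region.
    """
    if index_start < 0 or index_len <= 0 or index_start + index_len > len(adaptor):
        raise ValueError("index region must lie within the adaptor")
    left_flank = adaptor[:index_start]
    right_flank = adaptor[index_start + index_len:]
    # The blocker pairs antiparallel to the adaptor, so the flanks swap order.
    return (
        reverse_complement(right_flank)
        + "I" * index_len
        + reverse_complement(left_flank)
    )


# Made-up adaptor: invariant "AACC", 4-base index "GTGT", invariant "TTGG".
print(universal_blocker("AACCGTGTTTGG", index_start=4, index_len=4))
# prints CCAAIIIIGGTT
```

Because the index positions are paired only with inosines, the same blocker sequence is produced regardless of which sample index the adaptor carries, which is what allows a single blocking oligonucleotide to serve a multiplexed library.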

Table 1:

In another aspect, a library of unamplified nucleic acids prepared with two different adaptor sequences can be processed without blocking oligonucleotides if the adaptor ends do not hybridize to each other. Adaptor types suitable for use in this method include bifurcated and Y-shaped adaptors.

Examples
