Sequencing method for genome rearrangement detection

Document No. 1409295, published 2020-03-06

This technology, "Sequencing method for genome rearrangement detection", was created by M·科里尼 and D·N·罗伯茨 on 2018-07-10. Its main content is as follows: The present disclosure relates to single-ended sequencing methods for improved detection of genomic rearrangements, such as deletions, insertions, inversions, and translocations, present in a polynucleotide. A first priming event on the adapter allows sequencing of the target sequence, and a second priming event allows identification of sequences amplified and tagged by selective amplification. The combination of priming events in the same direction facilitates read alignment and identification of any genomic rearrangements.

1. A method of preparing a polynucleotide for sequencing by attaching a target-specific barcode, the method comprising:

amplifying a polynucleotide with a first amplification primer and a second amplification primer, wherein the first amplification primer hybridizes to a first priming site of the polynucleotide and the first amplification primer comprises a target-specific barcode;

wherein the amplification produces a polynucleotide amplicon, wherein the polynucleotide amplicon comprises a sequence that is the same as or complementary to the polynucleotide of interest and the target-specific barcode.

2. The method of claim 1, wherein the second amplification primer hybridizes (1) to a portion of an adapter attached to the polynucleotide at a distance from the first priming site, or (2) to a second priming site of the polynucleotide, wherein the second priming site is at a distance from the first priming site.

3. The method of claim 1 or claim 2, further comprising attaching an adaptor to the polynucleotide at a distance from the first priming site, wherein the adaptor comprises the second priming site.

4. The method of any preceding claim, wherein the first priming site is part of a fusion gene and the target-specific barcode is specific for the part of the fusion gene.

5. The method of claim 4, wherein said portion of the fusion gene is a junction point of the fusion gene.

6. The method of any one of the preceding claims, wherein the polynucleotide of interest comprises a plurality of polynucleotides of interest, and the method comprises attaching a plurality of adaptors to the plurality of polynucleotides, thereby forming a plurality of adaptor-bearing polynucleotides, each of the plurality of adaptor-bearing polynucleotides comprising a different molecular barcode.

7. The method of any one of the preceding claims, wherein the polynucleotide of interest comprises a plurality of polynucleotides of interest and the first amplification primers comprise a plurality of first amplification primers having different target-specific primers and different target-specific barcodes, thereby forming a plurality of adaptor-bearing polynucleotide amplicons, wherein each of the plurality of adaptor-bearing polynucleotide amplicons comprises a different target-specific barcode.

8. The method of any one of the preceding claims, wherein the polynucleotide amplicon or adaptor-bearing polynucleotide comprises a binding partner, such as a biotin moiety.

9. The method of any one of the preceding claims, further comprising sequencing the polynucleotide amplicon at a first position and a second position by performing a first primer extension and a second primer extension, wherein the first primer extension and the second primer extension are performed in the same direction.

10. The method of claim 9, wherein the first primer extension and the second primer extension are performed in the same direction on the polynucleotide in separate sequencing runs.

11. A composition or kit for detecting a genomic rearrangement in a polynucleotide having a first binding site, the composition or kit comprising:

a first amplification primer comprising a target-specific primer and a target-specific barcode; and

a second amplification primer.

12. The composition or kit of claim 11, further comprising:

an adaptor comprising a second priming site and an adaptor barcode,

wherein the second amplification primer comprises a priming sequence that is complementary or identical to a sequence within the adaptor.

13. A method of detecting genomic rearrangements in a polynucleotide, the method comprising:

amplifying a polynucleotide with a first amplification primer and a second amplification primer, wherein the first amplification primer hybridizes to a first priming site of the polynucleotide and the first amplification primer further comprises a target-specific barcode,

wherein the amplification produces a polynucleotide amplicon comprising a sequence that is the same as or complementary to the polynucleotide of interest and the target-specific barcode; and

sequencing the polynucleotide amplicon at a first position and a second position by performing a first primer extension and a second primer extension, wherein the first primer extension and the second primer extension are performed in the same direction.

14. The method of claim 13, wherein sequencing at the first position provides a sequence of at least a portion of a polynucleotide of interest and sequencing at the second position provides a sequence of a target-specific barcode.

15. The method of claim 13 or claim 14, wherein the first primer extension and the second primer extension are performed in the same direction on the polynucleotide in separate sequencing runs.

16. The method of any one of claims 13-15, wherein the sequencing is Next Generation Sequencing (NGS) or massively parallel sequencing.

17. The method of any one of claims 13-16, further comprising detecting genomic rearrangements using single-ended sequencing of at least one of the polynucleotide amplicons, e.g., by identifying genomic rearrangements based on data generated from sequencing of the first primer extension and the second primer extension.

18. The method of claim 17, wherein the frequency of genomic rearrangements is about 10% or less.

19. The method of claim 17, wherein the genomic rearrangement is a translocation.

20. The method of any one of claims 13-19, wherein data generated from sequencing of the first primer extension is compared to a known nucleic acid sequence, such as a known gDNA sequence, to determine genomic rearrangement.

Technical Field

The present disclosure relates to sequencing methods, compositions, and kits for improving the detection of genomic rearrangements, such as fusion genes. The disclosure also relates to methods of making libraries of target polynucleotides comprising genomic rearrangements.

Background

The ability to identify genomic rearrangements using nucleic acid sequencing methods has proven very beneficial in the detection of human genetic disorders and diseases. Genomic rearrangements generally refer to any rearrangement of nucleotides in a nucleic acid strand, including a deletion, insertion, inversion, or translocation of one or more nucleotides, and can be detected by sequencing the nucleic acid of interest and comparing the sequence data to a reference (such as a known nucleic acid sequence). Next Generation Sequencing (NGS) can be used to rapidly analyze polynucleotides and detect any genomic rearrangements in them, because it allows a large number of sequences to be analyzed in parallel. In some formats, polynucleotides such as DNA are immobilized on a solid surface by one or more adapters and amplified to increase signal intensity. Typically, libraries for sequencing are prepared by fragmenting a sample into polynucleotide fragments, tagging the fragments with one or more adaptors, and amplifying the fragments, for example with one or more amplification primers. In the sequencing-by-synthesis format, the fragments are hybridized to sequencing primers and labeled dideoxynucleotides are added enzymatically. The signal from each incorporated labeled dideoxynucleotide is detected and analyzed to determine the sequence.

Polynucleotides of interest can be analyzed using single-ended or paired-end sequencing methods. Single-ended sequencing methods sequence a fragment from one of its ends toward the other. Each fragment yields one single-ended read, which corresponds to n base pairs at one of the two ends of the fragment, where n is the number of sequencing cycles. Single-ended sequencing is generally less suitable for detecting large-scale genomic rearrangements and repetitive sequence elements. Single-ended reads that cross a fusion junction provide base-pair-level evidence for a fusion event. However, it can be difficult to ensure that a single-ended read extends far enough across the junction to identify the fusion event.
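To make the junction-coverage problem concrete, the following Python sketch checks whether a single-ended read of n cycles overlaps a fusion junction with a minimum number of bases on each side. The function name `read_spans_junction` and the 10-base minimum anchor are illustrative assumptions, not values taken from this disclosure.

```python
def read_spans_junction(read_start: int, read_length: int,
                        junction: int, min_anchor: int = 10) -> bool:
    """Return True if a single-ended read covers a fusion junction with at
    least `min_anchor` bases on each side (min_anchor is an assumed threshold).

    Coordinates are 0-based on the fusion sequence; `junction` is the index
    of the first base contributed by the 3' partner gene.
    """
    read_end = read_start + read_length      # exclusive end of the read
    left_anchor = junction - read_start      # bases before the junction
    right_anchor = read_end - junction       # bases at or after the junction
    return left_anchor >= min_anchor and right_anchor >= min_anchor


# A 50-cycle read starting 45 bases upstream of the junction reaches only
# 5 bases into the 3' partner, too few for a 10-base anchor.
print(read_spans_junction(read_start=55, read_length=50, junction=100))  # False
print(read_spans_junction(read_start=75, read_length=50, junction=100))  # True
```

Raising the anchor threshold makes a call more specific but requires longer reads or reads that start closer to the junction, which is the trade-off described above.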

The paired-end method reads a nucleic acid fragment from one end until a specified read length is reached, followed by another round of reading from the other end of the fragment. Forward and reverse sequence reads are thus obtained for each fragment, and the data are paired. The paired reads are aligned to a reference sequence to identify variants. Paired-end sequencing methods are commonly used to detect genomic rearrangements because they generally provide good positional information, which makes structural rearrangements in the genome easier to resolve. However, many sequencing instruments cannot be configured for paired-end sequencing and are capable only of single-ended sequencing.
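The positional information that paired-end reads provide is commonly exploited by flagging read pairs whose mates align discordantly. The sketch below is a simplified, hypothetical classifier: the `MateAlignment` type, the forward/reverse library orientation, and the 1,000-base insert limit are all assumptions, and each returned label only indicates the kind of rearrangement a pair may suggest. Practical structural-variant callers model insert-size distributions and mapping quality far more carefully.

```python
from dataclasses import dataclass


@dataclass
class MateAlignment:
    chrom: str     # reference sequence the mate aligned to
    pos: int       # 0-based leftmost aligned position
    reverse: bool  # True if the mate aligned to the reverse strand


def classify_pair(m1: MateAlignment, m2: MateAlignment,
                  max_insert: int = 1000) -> str:
    """Label a read pair as concordant or by the rearrangement it may indicate.

    Assumes a forward/reverse library in which a normal pair has the
    upstream mate on the forward strand and the downstream mate reversed.
    """
    if m1.chrom != m2.chrom:
        return "translocation"
    if m1.reverse == m2.reverse:
        return "inversion"
    upstream, downstream = sorted((m1, m2), key=lambda m: m.pos)
    if upstream.reverse:
        return "tandem duplication"
    # Distance between mate start positions, used as a rough proxy for insert size.
    if downstream.pos - upstream.pos > max_insert:
        return "deletion"
    return "concordant"


print(classify_pair(MateAlignment("chr2", 100, False), MateAlignment("chr2", 400, True)))
print(classify_pair(MateAlignment("chr2", 100, False), MateAlignment("chr5", 400, True)))
```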

WO2007133831A2 discusses methods and compositions for obtaining nucleotide sequence information of a target sequence using adapters interspersed within the target polynucleotide. The method can be used to insert a plurality of adaptors at spaced locations within a target polynucleotide or fragment. The adapters can serve as a platform for interrogating adjacent sequences using various sequencing chemistries, such as those that identify nucleotides by primer extension, probe ligation, and the like. That disclosure includes methods and compositions for inserting known adaptor sequences into target sequences such that contiguous target sequences are interrupted by adaptors, and indicates that the entire target sequence can be identified by sequencing both "upstream" and "downstream" of the adapter.

WO2015112974A1 discusses aspects related to methods for preparing and analyzing nucleic acids. In some embodiments, methods of preparing nucleic acids for sequence analysis (e.g., using next generation sequencing) are provided.

WO2015148219A1 discusses a method of analyzing a target nucleic acid fragment, the method comprising: generating a first strand by primer extension, using one strand of the target as a template and a first oligonucleotide primer that comprises, from 5' to 3', an overhang adaptor region, a primer ID region, a sequencing primer binding site, and a target-specific sequence region complementary to one end of the target fragment; optionally removing unbound primer; amplifying the target from the generated first strand to produce an amplification product; and detecting the amplification product. That document also discusses the use of unique primers in such target analysis methods.

An improved method for detecting genomic rearrangements using single-ended sequencing would make a useful contribution to the art, particularly if the method is used in conjunction with high throughput sequencing analysis.

Summary of The Invention

Methods, compositions, and kits for detecting genomic rearrangements in polynucleotides are provided. The methods, compositions, and kits of the invention can be used to more easily and reliably detect genomic rearrangements using single-ended sequencing of a nucleic acid of interest.

These and other features and advantages of the present invention will be apparent from the following detailed description, in conjunction with the appended claims.

Brief description of the drawings

The present teachings are best understood from the following detailed description when read with the accompanying drawing figures. The features shown are not necessarily drawn to scale.

FIG. 1 shows one embodiment of a method for preparing a polynucleotide for sequencing.

FIG. 2 shows another embodiment of a method for preparing a polynucleotide for sequencing.

Definitions

It is to be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting. The defined terms are in addition to the technical and scientific meanings of the defined terms as commonly understood and accepted in the technical field of the present teachings.

As used in the specification and the appended claims, and in addition to their ordinary meaning, the terms "substantial" and "substantially" mean within limits or degrees acceptable to those of ordinary skill in the art. For example, "substantially cancel" means that one skilled in the art would consider the cancellation to be acceptable.

As used in the specification and the appended claims, and in addition to their ordinary meaning, the terms "about" and "approximately" mean within limits or amounts acceptable to those of ordinary skill in the art. The term "about" generally refers to plus or minus 15% of the indicated numerical value. For example, "about 10" may indicate a range of 8.5 to 11.5. Similarly, "approximately the same" means that one of ordinary skill in the art would consider the items being compared to be the same.
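As a worked check of the 15% convention, the bounds for "about 10" follow from multiplying by 0.85 and 1.15. The small Python helper below (its name `about_range` is hypothetical) uses exact fractions so that the bounds are not affected by floating-point rounding.

```python
from fractions import Fraction


def about_range(value, tolerance=Fraction(15, 100)):
    """Return the (low, high) interval meant by "about value" at +/- 15%."""
    value = Fraction(value)
    return value * (1 - tolerance), value * (1 + tolerance)


low, high = about_range(10)
print(float(low), float(high))  # 8.5 11.5
```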

The terms "polynucleotide" and "nucleic acid" are used interchangeably herein to describe a polymer of nucleotides of any length, for example greater than about 10 bases, greater than about 100 bases, greater than about 500 bases, greater than 1000 bases, or up to about 10,000 or more bases, composed of deoxyribonucleotides or ribonucleotides, or a synthetically produced compound (e.g., PNA, as described in U.S. Patent No. 5,948,902 and references cited therein) that can hybridize to a naturally occurring nucleic acid in a sequence-specific manner similar to the hybridization of two naturally occurring nucleic acids, e.g., by participating in Watson-Crick base pairing interactions. Naturally occurring nucleotides include guanine, cytosine, adenine, and thymine (G, C, A, and T, respectively). As used in the specification and the appended claims, unless otherwise specified, a polynucleotide may be an adaptor-bearing polynucleotide, a polynucleotide amplicon, or an adaptor-bearing polynucleotide amplicon. An adaptor-bearing polynucleotide differs from the polynucleotide of interest in that an adaptor has been added to the polynucleotide of interest.

As used herein, the term "target nucleic acid" or "target" refers to a nucleic acid that contains a target nucleic acid sequence. The target nucleic acid can be single-stranded or double-stranded, and is often double-stranded DNA. As used herein, "target nucleic acid sequence," "target sequence," or "target region" refers to a particular sequence or its complement. The target sequence may be present in any form of single-stranded or double-stranded nucleic acid, in vitro or in vivo, including nucleic acid within the genome of a cell.

"Hybridization" or "hybridizing" refers to the process by which completely or partially complementary nucleic acid strands come together under specified hybridization conditions to form a double-stranded structure or region in which the two constituent strands are hydrogen bonded. Although hydrogen bonds typically form between adenine and thymine or uracil (A and T or U) or between cytosine and guanine (C and G), other base pairs can also form hydrogen bonds (see, e.g., Adams et al., The Biochemistry of the Nucleic Acids, 11th ed., 1992).

The term "primer" refers to an enzymatically prepared or synthesized oligonucleotide that, when formed into a duplex with a polynucleotide template, is capable of acting as a point of initiation of nucleic acid synthesis and extending from its 3' end along the template, thereby forming an extended duplex. The nucleotide sequence added during the extension process is determined by the sequence of the template polynucleotide. The primer serves as a starting point for nucleotide polymerization catalyzed by a DNA polymerase, an RNA polymerase or a reverse transcriptase. The length of the primer may be 4 to 1000 bases or longer, for example 10 to 500 bases.
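The template-directed extension described above can be modelled on strings. This is a minimal sketch, assuming perfect complementarity, a primer site that occurs in the template, and full-length extension; `extend_primer` and `reverse_complement` are illustrative helper names, not part of any library.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def extend_primer(template: str, primer: str) -> str:
    """Simulate full-length extension of `primer` annealed to `template`.

    Both sequences are written 5'->3'. The primer anneals where its reverse
    complement occurs in the template; nucleotides are then added to the
    primer's 3' end, copying the template toward the template's 5' end.
    """
    binding_site = reverse_complement(primer)
    site = template.find(binding_site)
    if site == -1:
        raise ValueError("primer has no perfectly complementary site in template")
    # New strand = complement of the template from the 3' end of the site to its 5' end.
    return reverse_complement(template[: site + len(binding_site)])


print(extend_primer("ATGCGTACCTTAGG", "CCTAAG"))  # CCTAAGGTACGCAT
```

The product begins with the primer itself, reflecting that the primer is the starting point of synthesis and the template determines every added nucleotide.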

As used herein, the term "primer extension" refers to the extension, by a polymerase, of a primer that is annealed to a particular oligonucleotide or polynucleotide template.

The term "adaptor" refers to a nucleic acid molecule that is attached to a polynucleotide of interest to form a synthetic polynucleotide. Adapters may be single-stranded or double-stranded, and may comprise DNA, RNA, and/or artificial nucleotides. An adapter may be located at an end of the polynucleotide of interest, or internally. Adapters may add one or more functions or properties to a polynucleotide of interest, such as providing priming sites for amplification or sequencing, or adding barcodes. For example, adaptors may include universal primers and/or universal priming sites, including priming sites for sequencing. As further examples, adapters may contain one or more barcodes of various types or for various purposes, such as molecular barcodes, sample barcodes, and/or target-specific barcodes. Various adapters are known in the art and may be used, or modified for use, in the methods, compositions, and kits of the present invention. For example, adapters include Y adapters, which can be attached to polynucleotides to generate libraries with altered 5' ends. Adapters may also comprise separate sequences (e.g., A/B adapters), where the A adapter is attached to one end of the polynucleotide and the B adapter is attached to the other end. Adaptors also include stem-loop adaptors, in which a hairpin loop is attached to an end of the polynucleotide; a portion (typically the stem) may be cleaved prior to amplification or sequencing. An adapter may be attached to the polynucleotide of interest by any suitable technique, including but not limited to ligation, use of a transposase, hybridization, and/or primer extension. For example, an adaptor can be ligated to an end of the polynucleotide of interest. As another example, adaptors can be attached by using a transposase to insert transposons comprising the adaptors into the polynucleotides of interest, thereby providing the adaptors at the ends of fragments of the polynucleotides of interest. In some embodiments, the adapter comprises a target-specific primer and a target-specific barcode, which allow the adapter to be attached to the polynucleotide of interest (more specifically, to a complementary polynucleotide) by primer extension of the target-specific primer.
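The transposase route can be pictured with a deliberately simplified Python model: each insertion event cuts the polynucleotide and leaves adaptor sequence on the new ends, so every internal fragment ends up flanked by adaptors. The function name, the placeholder adaptor sequences, and the rule of placing the A adaptor on the left end and the B adaptor on the right end are assumptions for illustration; the short target-site duplication and random adaptor orientation of real transposition are ignored.

```python
def tagment(polynucleotide: str, insertion_sites: list,
            adaptor_a: str, adaptor_b: str) -> list:
    """Return adaptor-bearing fragments produced by cutting at `insertion_sites`.

    Only fragments bounded by two insertion events carry adaptors on both
    ends; the two terminal pieces of the original molecule are discarded.
    """
    sites = sorted(insertion_sites)
    fragments = [polynucleotide[start:end] for start, end in zip(sites, sites[1:])]
    return [adaptor_a + fragment + adaptor_b for fragment in fragments]


# Hypothetical adaptor sequences, for illustration only.
print(tagment("AAAACCCCGGGGTTTT", [4, 8, 12], adaptor_a="ACAC", adaptor_b="GTGT"))
# ['ACACCCCCGTGT', 'ACACGGGGGTGT']
```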

The term "sequencing" refers to determining the identity of one or more nucleotides, i.e., whether a nucleotide is G, A, T or C.

The term "single-ended sequencing" refers to determining the sequence of a polynucleotide using reads from one end of the polynucleotide ("single-ended reads"). Single-ended reads can be generated by any sequencing process, including next generation sequencing and other massively parallel sequencing techniques. Instruments configured to perform single-ended sequencing are commercially available from a number of companies. For example, the Illumina HiSeq 2500 can provide single-ended read lengths of 50 bp and 100 bp. In some embodiments, the nominal, average, mean, or absolute length of a single-ended read is at least 20 consecutive nucleotides, or at least 30 consecutive nucleotides, or at least 40 consecutive nucleotides, or at least 50 consecutive nucleotides. In some embodiments, the nominal, average, mean, or absolute length of a single-ended read is at most 300 consecutive nucleotides, at most 200 consecutive nucleotides, or at most 150 consecutive nucleotides, or at most 120 consecutive nucleotides, or at most 100 consecutive nucleotides. The foregoing minimum and maximum values may be combined to form a range.

As used herein, the term "portion" or "fragment" of a sequence refers to any portion of the sequence that is smaller than the complete sequence (e.g., a nucleotide subsequence or an amino acid subsequence). A portion of a polynucleotide can be of any length, for example, at least 5, 10, 15, 20, 25, 30, 40, 50, 75, 100, 150, 200, 300, or 500 or more nucleotides in length. A portion of a leader sequence may be about 50%, 40%, 30%, 20%, or 10% of the leader sequence, for example one third or less of the leader sequence, for example 7, 6, 5, 4, 3, or 2 nucleotides in length.

The term "fusion gene" refers to a polynucleotide formed from two previously separate genes. Fusion genes can result from chromosomal translocations, interstitial deletions, or inversions, which are often found in human cancer cells. A fusion gene can result in the expression of a fusion transcript that is translated into a fusion protein, which can alter the normal regulatory pathways of the cell and/or promote the growth of cancer cells. Gene variants may also produce abnormal proteins that affect normal regulatory pathways. Many fusion gene polynucleotides are known, and more are being discovered. For example, US20100279890, US20140120540, US20140272956, and US20140315199 disclose a number of fusion genes associated with cancer and other diseases, and methods of detecting such fusion genes. The methods, compositions, and kits of the invention can be used to detect known gene fusions and to discover previously unknown gene fusions.
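A fusion sequence and its junction point can be represented very simply: the 5' partner contributes sequence up to its breakpoint and the 3' partner contributes sequence from its breakpoint onward. The sketch below assumes the breakpoints are already known and uses short hypothetical sequences; `make_fusion` is an illustrative name.

```python
def make_fusion(five_prime_gene: str, three_prime_gene: str,
                break_5: int, break_3: int) -> tuple:
    """Join a 5' partner (bases before `break_5`) to a 3' partner (bases from `break_3`).

    Returns the fusion sequence and the 0-based index of its junction point,
    i.e. the first base contributed by the 3' partner.
    """
    head = five_prime_gene[:break_5]
    tail = three_prime_gene[break_3:]
    return head + tail, len(head)


fusion, junction = make_fusion("ATGGCGTACG", "TTGACCGATG", break_5=6, break_3=4)
print(fusion, junction)                    # ATGGCGCCGATG 6
print(fusion[junction - 3:junction + 3])   # bases flanking the junction: GCGCCG
```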

As used herein, the term "priming site" refers to a site within an oligonucleotide or polynucleotide that is configured to hybridize to a primer, such that adjacent sequences or sequences that are sufficiently close for single-ended sequencing can be amplified or sequenced, e.g., by primer extension. The priming site may be a sequence present in the polynucleotide of interest or may be a sequence added to the polynucleotide by the addition of an adaptor comprising the priming site. Adapters containing priming sites may be added by ligation, by use of transposase, by primer extension, or by other techniques.

In the present disclosure, numerical ranges include the numbers defining the range. Wherever the word "comprising" is used in the present disclosure, it is contemplated that the phrase "consisting essentially of" or "consisting of" may be used instead. It should be appreciated that chemical structures and formulas may be elongated or expanded for illustrative purposes.

As used in the specification and the appended claims, the terms "a", "an", and "the" include both singular and plural referents unless the context clearly dictates otherwise. Thus, for example, reference to "a primer" includes a primer and a plurality of primers. In this disclosure, an ordinal number such as the terms first, second, third, etc., does not indicate that a first event occurs before a second event (unless the context indicates otherwise); rather, they are used to distinguish different events from each other.

As disclosed herein, a number of numerical ranges are provided. It is understood that each intervening value, to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limit of that range is also specifically disclosed. Each smaller range between any stated value or intervening value in a stated range and any other stated or intervening value in that stated range is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included or excluded in the stated range, and each range where either, neither or both limits are included in the smaller ranges is also encompassed within the invention, subject to any specifically excluded limit in the stated range. Where it is stated that a range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the invention.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present teachings, some exemplary methods and materials are now described.

All patents and publications mentioned herein are expressly incorporated herein by reference. The citation of any publication is for its disclosure prior to the filing date and should not be construed as an admission that the present claims are not entitled to antedate such publication. Further, the dates of publication provided may be different from the actual publication dates, which may need to be independently confirmed.

It will be apparent to those skilled in the art upon reading this disclosure that each of the individual embodiments described and illustrated herein has discrete components and features which may be readily separated from or combined with the features of any of the other several embodiments without departing from the scope or spirit of the present teachings. Any recited method may be performed in the order of events recited or in any other order that is logically possible.

Detailed Description

In some embodiments, the present disclosure provides methods of preparing polynucleotides for sequencing by attaching target-specific barcodes. The method comprises amplifying the polynucleotide with a first amplification primer and a second amplification primer, wherein the first amplification primer comprises a first priming sequence and a target-specific barcode, wherein the first priming sequence hybridizes to a first priming site of the polynucleotide. This amplification produces a polynucleotide amplicon, wherein the polynucleotide amplicon comprises a sequence that is the same as or complementary to the polynucleotide of interest and the target-specific barcode.

The first amplification primer comprises a target-specific first priming sequence (i.e., a sequence complementary to and/or hybridizing to a target sequence within the adapter-bearing polynucleotide). The first amplification primer further comprises a target-specific barcode, that is, a barcode specific for the target sequence, e.g., a barcode specific for a portion of a gene (such as a portion of a fusion gene). The second amplification primer hybridizes (1) to a portion of an adapter attached to the polynucleotide at a distance from the first priming site, or (2) to a second priming site of the polynucleotide, wherein the second priming site is at a distance from the first priming site. In some embodiments, the method may further comprise attaching an adaptor to the polynucleotide to form an adaptor-tagged (adapted) polynucleotide, wherein the adaptor comprises a second priming site and optionally an adaptor barcode. In some embodiments, the second priming site is on the adaptor and is a universal priming site and/or a sequencing primer site, and/or the second priming site is a universal priming site at the 5' end of the adaptor-bearing polynucleotide. In some embodiments, the adaptor and/or the second priming site is at the 5' end of a strand of the polynucleotide and the first priming site is at the 3' end of that strand. In some embodiments, the adapter barcode is a sample barcode or a molecular barcode. A molecular barcode may be unique in the sense that it is unique within the set of adaptors attached to a pool of polynucleotides of interest.
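Uniqueness of molecular barcodes within an adaptor set can be illustrated by drawing random barcodes until the set contains one per molecule. This is a minimal sketch; the 8-base length, the seeded generator, and the function name are assumptions, and practical designs usually also enforce sequence-distance and composition constraints that are omitted here.

```python
import random


def assign_molecular_barcodes(n_molecules: int, length: int = 8,
                              seed: int = 0) -> list:
    """Return `n_molecules` distinct random barcodes of `length` bases."""
    if n_molecules > 4 ** length:
        raise ValueError("not enough distinct barcodes of this length")
    rng = random.Random(seed)   # seeded so the example is reproducible
    barcodes = set()
    while len(barcodes) < n_molecules:
        barcodes.add("".join(rng.choice("ACGT") for _ in range(length)))
    return sorted(barcodes)


barcodes = assign_molecular_barcodes(1000)
print(len(barcodes), len(set(barcodes)))   # 1000 1000
```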

In some embodiments, the present disclosure provides methods, compositions, and kits for preparing a polynucleotide library for sequencing by attaching target-specific barcodes. A pool of polynucleotides is amplified using a first set of amplification primers and a second set of amplification primers, wherein the first set of amplification primers hybridizes to a plurality of different sequences within the pool of polynucleotides, and each primer of the first set comprises a different target-specific barcode. In some embodiments, an adaptor comprising an adaptor barcode is attached to the polynucleotide amplicon. The second set of amplification primers hybridizes (1) to a portion of an adapter attached to the polynucleotide at a distance from the first priming site, or (2) to a second priming site of the polynucleotide, wherein the second priming site is at a distance from the first priming site. Amplification with the first and second sets of primers generates a library of polynucleotide amplicons. Adapters may be added to the polynucleotide amplicons. In some embodiments, the adapter is added prior to amplification and comprises a second priming site that hybridizes to the second set of amplification primers. In some embodiments, the adapter is added after amplification, e.g., to provide a sequencing priming site on the polynucleotide amplicon.
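Because each primer in the first set carries a different target-specific barcode, a simple design check is that no two barcodes can be confused. The sketch below, in which the target names, barcodes, and minimum Hamming distance of 3 are all hypothetical, verifies that the barcodes in a primer set are pairwise separated by at least that many mismatches.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def barcodes_well_separated(primer_set: dict, min_distance: int = 3) -> bool:
    """Check that the target-specific barcodes in `primer_set` (target -> barcode)
    are equal in length and pairwise at least `min_distance` mismatches apart."""
    barcodes = list(primer_set.values())
    if len({len(b) for b in barcodes}) > 1:
        raise ValueError("this sketch assumes equal-length barcodes")
    return all(hamming(a, b) >= min_distance for a, b in combinations(barcodes, 2))


# Hypothetical first-primer set: target name -> target-specific barcode.
primer_set = {"target_1": "ACGTAC", "target_2": "TGCATG", "target_3": "GATCGA"}
print(barcodes_well_separated(primer_set))                       # True
print(barcodes_well_separated({"a": "ACGTAC", "b": "ACGTAA"}))   # False
```

A distance of 3 is a common choice because it allows a single sequencing error to be corrected to the nearest barcode, but the appropriate threshold depends on the error profile of the sequencing platform.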

Sequencing each of the plurality of polynucleotide amplicons may be performed at two positions by performing a first primer extension and a second primer extension, wherein the sequencing of the first primer extension and the second primer extension is performed in the same direction for each of the adapter-bearing polynucleotide amplicons. Genomic rearrangements can be identified based on data generated from sequencing of the first primer extension and the second primer extension.
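One way to combine the two same-direction reads is sketched below in Python. The second read yields the target-specific barcode, which identifies the gene that the first amplification primer targeted, and the first read is assigned to genes by exact k-mer matching. If the first read contains sequence from a different gene, the molecule is flagged as a candidate rearrangement such as a fusion. The barcode table, reference snippets, k = 8, and all function names are hypothetical, and exact k-mer lookup stands in for a real aligner.

```python
from typing import Dict, Set

# Hypothetical barcode-to-target table and reference snippets, for illustration only.
BARCODE_TO_TARGET = {"ACGTAC": "GENE_X", "TGCATG": "GENE_Y"}
REFERENCES = {
    "GENE_X": "ATGGCGTACGTTAGCCTAGG",
    "GENE_Y": "TTGACCGATGCAAGTCCGTA",
}


def build_kmer_index(references: Dict[str, str], k: int) -> Dict[str, str]:
    """Map each k-mer to the single reference it occurs in; k-mers shared by
    two references are discarded because they cannot discriminate between them."""
    index: Dict[str, str] = {}
    shared: Set[str] = set()
    for name, seq in references.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if index.setdefault(kmer, name) != name:
                shared.add(kmer)
    for kmer in shared:
        del index[kmer]
    return index


def genes_hit(read: str, index: Dict[str, str], k: int) -> Set[str]:
    """Return every reference that shares at least one k-mer with the read."""
    return {index[read[i:i + k]] for i in range(len(read) - k + 1)
            if read[i:i + k] in index}


def rearrangement_partners(read1: str, barcode: str,
                           index: Dict[str, str], k: int) -> Set[str]:
    """Combine the two reads of one amplicon.

    read1: sequence read from the adaptor priming site (first extension).
    barcode: target-specific barcode read by the second extension.
    Returns genes seen in read 1 other than the primed target; a non-empty
    set flags a candidate rearrangement (e.g. a fusion partner).
    """
    primed = BARCODE_TO_TARGET.get(barcode)
    if primed is None:
        raise ValueError(f"unrecognized target-specific barcode {barcode!r}")
    return genes_hit(read1, index, k) - {primed}


index = build_kmer_index(REFERENCES, k=8)
print(rearrangement_partners("CCGATGCAAGTC", "ACGTAC", index, k=8))  # {'GENE_Y'}
print(rearrangement_partners("GGCGTACGTTAG", "ACGTAC", index, k=8))  # set()
```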

In other embodiments, the present disclosure provides compositions and kits for detecting genomic rearrangements in a polynucleotide having a first binding site. The compositions and kits comprise first and second amplification primers. The first amplification primer comprises a target-specific primer and a target-specific barcode. The compositions and kits may further comprise an adaptor. The adapter comprises a second priming site and an adapter barcode. In some embodiments, the second amplification primer comprises a priming sequence that is complementary or identical to a sequence within the adaptor, e.g., a second priming site. In some embodiments of the compositions and kits, the second amplification primer hybridizes (1) to a portion of an adapter attached to a polynucleotide at a distance from a first priming site, or (2) to a second priming site of a polynucleotide, wherein the second priming site is at a distance from the first priming site. In some embodiments, the adaptor and/or the second priming site is at the 5' end of the strand of the polynucleotide and the first priming site is at the 3' end of the strand.

In other embodiments, the present disclosure provides methods, compositions, and kits for detecting genomic rearrangements in polynucleotides. The methods include amplifying a polynucleotide with a first amplification primer and a second amplification primer. The first amplification primer hybridizes to a first priming site of a polynucleotide, and the first amplification primer further comprises a target-specific barcode. The amplification produces a polynucleotide amplicon comprising a sequence that is the same as or complementary to the polynucleotide of interest and the target-specific barcode. The polynucleotide amplicon is sequenced at a first position and a second position by performing a first primer extension and a second primer extension, which may be performed in the same direction.

In the foregoing methods, compositions, and kits, the target-specific barcode is specific to a target, such as a gene, a portion of a gene, a fusion gene, a portion of a fusion gene, or another polynucleotide of interest. The fusion gene may be a known fusion gene, including the junction point of a known fusion gene, and/or a suspected or putative fusion gene, or the junction point of such a fusion gene. The target may be a genomic rearrangement, such as a deletion, insertion, inversion, or translocation in the polynucleotide of interest. In some embodiments, the target is a cDNA junction or an exon junction.

In some embodiments, the second amplification primer hybridizes to a portion of the adaptor, such as a second priming site, which may be a sequencing priming site of the adaptor. In some embodiments, the adapter-bearing polynucleotide comprises an adapter at the 5' end and/or a target-specific barcode at the 3' end.

In some embodiments, the polynucleotide of interest comprises a plurality of polynucleotides of interest, and the method comprises attaching a plurality of adapters to the plurality of polynucleotides, thereby forming a plurality of adapter-tagged polynucleotides each comprising a different adapter barcode. Alternatively or additionally, where the polynucleotide of interest comprises a plurality of polynucleotides of interest, the first amplification primer may comprise a plurality of first amplification primers having different target-specific primers and target-specific barcodes, thereby forming a plurality of adaptor-bearing polynucleotide amplicons each comprising a different target-specific barcode.

In some embodiments, the adapter-bearing polynucleotide amplicons are sequenced at the first and second positions by performing a first primer extension and a second primer extension, wherein the first primer extension and the second primer extension are performed in the same direction. In some embodiments, the first primer extension is performed with a first sequencing primer that is complementary or identical to a portion of the adaptor, such as the second priming site. In some embodiments, the second primer extension is performed with a second sequencing primer that is complementary to or the same as a portion of the first amplification primer, such as a portion adjacent to or sufficiently close to the target-specific barcode for single-ended sequencing of the target-specific barcode.

Sequencing by primer extension is performed as follows: a primer is hybridized to the polynucleotide amplicon; the primer is extended by adding one or more labeled nucleotides, thereby generating incorporated labeled nucleotides; and the incorporated labeled nucleotides are detected. The sequencing primer may be complementary or identical to a sequence on the adapter. In some embodiments, the first primer extension and the second primer extension are performed in the same direction on the polynucleotide in separate sequencing runs. In some embodiments, the sequencing is Next Generation Sequencing (NGS) or massively parallel sequencing. The data generated from sequencing of the first primer extension and/or the second primer extension can be compared with known nucleic acid sequences (such as known gDNA sequences).
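To make the three steps concrete, here is a toy simulation. It is a minimal sketch under stated assumptions, not a model of any particular instrument: the polymerase is error-free, the primer must match its binding site exactly, both sequences are written 5'->3', and the helper names `revcomp` and `extend_primer` are hypothetical.

```python
# Toy model of sequencing by primer extension (idealized; not instrument chemistry).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def extend_primer(template: str, primer: str, cycles: int) -> str:
    """Return the bases detected after `cycles` single-base extensions.

    Both strings are written 5'->3'. The primer anneals antiparallel, i.e.
    where revcomp(primer) occurs in the template, and each cycle adds the
    complement of the next template base toward the template's 5' end.
    """
    site = template.find(revcomp(primer))
    if site < 0:
        raise ValueError("primer does not hybridize to the template")
    detected = []
    # Walk from the base just 5' of the binding site toward template position 0.
    for pos in range(site - 1, max(site - 1 - cycles, -1), -1):
        detected.append(COMPLEMENT[template[pos]])  # the labeled base that is detected
    return "".join(detected)


if __name__ == "__main__":
    primer = "GATTACA"
    template = revcomp(primer + "ACGTT")  # a template whose copy reads primer + ACGTT
    print(extend_primer(template, primer, 5))  # ACGTT
```

Extension stops at the end of the template, so asking for more cycles than there are template bases simply returns the full read.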

The methods, compositions, and kits of the invention are useful for sequencing polynucleotides, including genomic DNA (gDNA), complementary DNA (cDNA) derived from an RNA template (e.g., messenger RNA (mRNA) or microRNA), mitochondrial DNA (mtDNA), RNA (e.g., mRNA, microRNA), and other polynucleotides. The polynucleotide may be of any origin, such as microbial, viral, fungal, plant, or mammalian.

In some embodiments, the methods, compositions, and kits of the invention are used to detect the presence, location, or absence of a genomic rearrangement in a polynucleotide of interest. The genomic rearrangement may be a deletion, a duplication, an insertion, an inversion, or a translocation, and the methods, compositions, and kits may be used to detect whether certain genomic sequences or genes have been deleted, duplicated, inserted, inverted, or translocated in a polynucleotide of interest. In some embodiments, the methods, compositions, and kits of the invention are used to detect genomic deletions. In some embodiments, they are used to detect genomic duplications. In some embodiments, they are used to detect genomic insertions. In some embodiments, they are used to detect genomic inversions. In some embodiments, they are used to detect genomic translocations. In some embodiments, they are used to detect genomic rearrangements in polynucleotides such as gDNA or cDNA derived from RNA. In some embodiments, the frequency of the genomic rearrangement is about 100% or less, about 50% or less, about 10% or less, about 5% or less, or about 1% or less. In some embodiments, the methods of the invention further comprise detecting genomic rearrangements by single-ended sequencing of the polynucleotide amplicons, for example, by identifying genomic rearrangements based on data generated from sequencing of the first primer extension and the second primer extension. In some embodiments, the genomic rearrangement is a translocation.
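Because a rearrangement may be present in only a small fraction of molecules, it can help to estimate that fraction and the chance of sampling it at all. The sketch below is an illustrative assumption rather than the disclosed method: it counts unique molecules (for example, after barcode collapsing) that do or do not support a junction, and uses a simple binomial model for the probability of sampling at least one supporting molecule. The function names are hypothetical.

```python
def rearrangement_frequency(supporting: int, non_supporting: int) -> float:
    """Fraction of unique molecules whose reads support the rearrangement."""
    total = supporting + non_supporting
    if total <= 0:
        raise ValueError("no informative molecules")
    return supporting / total


def detection_probability(frequency: float, molecules: int) -> float:
    """Probability of sampling at least one supporting molecule.

    Assumes molecules are sampled independently (binomial model) and that
    every sampled supporting molecule is sequenced and recognized.
    """
    if not 0.0 <= frequency <= 1.0:
        raise ValueError("frequency must be between 0 and 1")
    return 1.0 - (1.0 - frequency) ** molecules
```

For example, at a frequency of 1%, sampling 300 molecules gives a probability of roughly 0.95 of seeing at least one supporting molecule, which is one reason low-frequency rearrangements call for deep coverage.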

Sequencing methods useful for detecting genomic rearrangements in polynucleotides are provided. The methods of the invention can be used to detect genomic rearrangements more easily and reliably using single-ended sequencing of a nucleic acid of interest. The methods can be used in Next Generation Sequencing (NGS) processes to detect deletions, insertions, inversions, and translocations in a polynucleotide of interest. They involve sequencing first and second primer extensions in the same direction to improve the accuracy of rearrangement detection. The combined sequence data from the first and second primer extensions facilitate read alignment and identification of genomic rearrangements in the polynucleotide, because combining reads generated in the same direction allows the relative positions of nucleic acid sequences in the polynucleotide to be identified more accurately. Compared with standard single-ended sequencing methods, the method thus improves the ability to identify the relative positions of nucleotides in the genome during single-ended sequencing, enabling more effective structural rearrangement analysis.

The methods of the invention can be used in high throughput sequencing methods, such as Next Generation Sequencing (NGS) processes. In some embodiments, a high throughput sequencing method comprises three steps: library preparation, library immobilization, and sequencing. Polynucleotides are typically fragmented randomly, and adapters are ligated to one or both ends of the fragments. The adapter may be a linear adapter, a circular adapter, or a bubble adapter. Sequencing library fragments are immobilized on a solid support, and parallel sequencing reactions are performed to interrogate the polynucleotide sequences. High throughput sequencing methods may employ emulsion PCR, bridge PCR, or rolling circle amplification to provide copies of the original polynucleotide.

Polymerases tend to introduce errors during PCR (most commonly misincorporation of nucleotides), which can appear as variants in sequencing data analysis if they occur in early cycles. Molecular barcodes can be used to distinguish PCR errors from actual variants in a polynucleotide of interest. The concept is that each polynucleotide in the library to be amplified is attached to a unique molecular barcode. Sequence reads with different molecular barcodes represent different original DNA molecules, while reads with the same barcode are PCR copies of the same original molecule. Molecular barcodes called Degenerate Base Regions (DBRs) are disclosed in U.S. patent 8,481,292 (Population Genetics Technologies Ltd.). DBRs are random sequence tags attached to molecules present in a sample. DBRs and other molecular barcodes allow PCR errors introduced during sample preparation to be distinguished from mutations and other variants present in the original polynucleotide.
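The barcode logic can be illustrated with a minimal consensus sketch. It assumes reads have already been parsed into `(molecular_barcode, sequence)` pairs and that reads from one molecule are aligned from their first base; `collapse_by_barcode` is a hypothetical name, and simple per-position majority voting stands in for whatever consensus rule a real pipeline uses.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def collapse_by_barcode(reads: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Collapse reads sharing a molecular barcode into one consensus sequence.

    Reads in a group are compared position by position up to the shortest
    read; the most common base wins (ties go to the base seen first).
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for barcode, seq in reads:
        groups[barcode].append(seq)
    consensus = {}
    for barcode, seqs in groups.items():
        length = min(len(s) for s in seqs)
        consensus[barcode] = "".join(
            Counter(s[i] for s in seqs).most_common(1)[0][0] for i in range(length)
        )
    return consensus


if __name__ == "__main__":
    reads = [("AAC", "ACGT"), ("AAC", "ACGT"), ("AAC", "ACTT"), ("GGT", "ACTT")]
    print(collapse_by_barcode(reads))  # {'AAC': 'ACGT', 'GGT': 'ACTT'}
```

Here the single T among the three AAC reads is outvoted as a likely PCR error, while the T carried by the GGT molecule survives as a candidate true variant.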

Attaching adapters to polynucleotides

In some embodiments, the polynucleotide is attached to an adaptor to form an adaptor-bearing polynucleotide. The adapter may be attached to the polynucleotide before or after amplification, and in some embodiments, the polynucleotide is a polynucleotide amplicon and the adapter-bearing polynucleotide is an adapter-bearing polynucleotide amplicon. The adaptors may be attached by any suitable technique, such as by ligation, use of transposases, hybridization, and/or primer extension. In some embodiments, the polynucleotide is ligated to an adaptor at one or both ends. In a ligation reaction, a covalent bond or linkage is formed between the ends of two or more polynucleotides (such as a nucleic acid of interest) or oligonucleotides (such as an adaptor). The nature of the bond or linkage may vary, and the ligation may be carried out enzymatically or chemically. Ligation is typically performed enzymatically to form a phosphodiester bond between the 5' carbon of the terminal nucleotide of one polynucleotide or oligonucleotide and the 3' carbon of the other polynucleotide or oligonucleotide. In some embodiments, the adaptor is a Y adaptor that can generate libraries with varying 5' ends and with P5 and P7 priming sites suitable for use on MiniSeq, NextSeq, and HiSeq 3000/4000 sequencing instruments.

In some embodiments, A/B adaptors are attached to the polynucleotide of interest, wherein the A adaptor is attached to one end of the polynucleotide and the B adaptor is attached to the other end. In some embodiments, the A/B adaptors are attached by random ligation, by amplification, by using a transposase, or by primer extension. The distinct characteristics of the A and B adaptors are intended to ensure that each polynucleotide included in the sequencing procedure carries both an A and a B adaptor (i.e., one type of adaptor is attached to the 5' end of each polynucleotide undergoing sequencing and the other type to the 3' end, denoted an A/B adaptor combination). Because ligation is random, polynucleotides with A/A and B/B adaptors are also produced, and subsequent processing steps may be employed to ensure that only molecules with an A/B adaptor combination are selected and/or included in the sequencing procedure. The adapter-bearing polynucleotide can be amplified using primers directed to portions of the adapter to increase the amount of the polynucleotide of interest, either before or after the amplification that attaches the target-specific barcode as described herein. In some embodiments, the adapters are ligated in a manner, and to a sufficient number of polynucleotides, to generate a fully sequenceable library for massively parallel sequencing.
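Under a simple assumption, the expected mix of adaptor combinations can be worked out: if each end of a fragment independently receives an A adaptor with probability p (and a B adaptor otherwise), the expected fractions are p² for A/A, 2p(1 - p) for A/B, and (1 - p)² for B/B. With p = 0.5, only half of the ligated molecules carry the desired A/B combination. The sketch below encodes this binomial model; it is an idealization, and the function name is hypothetical.

```python
from typing import Dict


def adaptor_combination_fractions(p_a: float = 0.5) -> Dict[str, float]:
    """Expected fractions of A/A, A/B and B/B molecules after random ligation.

    Assumes each of the two ends independently receives an A adaptor with
    probability p_a and a B adaptor otherwise.
    """
    if not 0.0 <= p_a <= 1.0:
        raise ValueError("p_a must be between 0 and 1")
    p_b = 1.0 - p_a
    # A/B counts both orientations (A at the 5' end or A at the 3' end).
    return {"A/A": p_a * p_a, "A/B": 2 * p_a * p_b, "B/B": p_b * p_b}


if __name__ == "__main__":
    print(adaptor_combination_fractions())  # {'A/A': 0.25, 'A/B': 0.5, 'B/B': 0.25}
```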

In some embodiments, the adaptor comprises an adaptor barcode. The adapter barcode may be used for any desired purpose, such as identifying the source or nature of the polynucleotide. Barcodes generally refer to any sequence information used for polynucleotide identification, grouping, or processing. Barcodes may be included to identify individual reads, groups of reads, subsets of reads associated with probes, exons, or samples, any other group, or any combination thereof. For example, sequences can be sorted by sample, exon, probe set, or a combination thereof by reference to barcode information (e.g., using a computer processor). Barcode information can also be used to assemble contigs: the computer processor may identify the barcodes and assemble the reads by grouping reads that share a barcode.
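A minimal sketch of barcode-based sorting follows. It assumes, hypothetically, that the sample barcode occupies the first `barcode_len` bases of each read and tolerates at most one mismatch; reads matching no barcode, or more than one, are set aside under `None`. All names are illustrative, not part of any sequencing vendor's software.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(observed: str, sample_barcodes: Dict[str, str],
                  max_mismatches: int = 1) -> Optional[str]:
    """Return the unique sample whose barcode is within max_mismatches, else None."""
    hits = [name for name, bc in sample_barcodes.items()
            if len(bc) == len(observed) and hamming(observed, bc) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None  # unknown or ambiguous -> None


def group_reads(reads: Iterable[str], sample_barcodes: Dict[str, str],
                barcode_len: int, max_mismatches: int = 1) -> Dict[Optional[str], List[str]]:
    """Sort reads by their leading barcode and strip the barcode from each read."""
    groups: Dict[Optional[str], List[str]] = defaultdict(list)
    for read in reads:
        sample = assign_sample(read[:barcode_len], sample_barcodes, max_mismatches)
        groups[sample].append(read[barcode_len:])
    return dict(groups)
```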

The polynucleotide may be obtained by any suitable mechanism. The polynucleotide of interest may be genomic deoxyribonucleic acid (gDNA), cDNA, mRNA, mitochondrial DNA, or another type. The polynucleotide may be a mammalian, viral, fungal, or bacterial polynucleotide or a mixture thereof. In some embodiments, polynucleotide strands, such as genomic DNA, are fragmented using any suitable technique before the adapters are attached. Polynucleotide strands may be fragmented using physical, enzymatic, or chemical shear fragmentation as is known in the art. In some embodiments, the polynucleotides are fragmented using physical fragmentation methods such as sonication, sonic shearing, or hydrodynamic shearing. In some embodiments, the polynucleotide is fragmented using a restriction enzyme. In some embodiments, the polynucleotide is fragmented using an enzyme, such as DNase I or a transposase. In some embodiments, the polynucleotides are fragmented using chemical shearing methods such as thermal digestion in the presence of metal cations. In some embodiments, the polynucleotides are randomly fragmented. In some embodiments, the polynucleotide may be treated with sodium bisulfite or other chemical modifiers. In some embodiments, the polynucleotide fragments are used to populate sequencing libraries.

The polynucleotide fragments may be of any suitable length. In various embodiments, the fragments are about 30 to about 2,000 bases, about 30 to about 800 bases, about 30 to about 500 bases, about 100 to about 800 bases, or about 200 to about 600 bases in length.
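Such length ranges are typically reached by fragmenting and then size-selecting. The toy sketch below, which assumes uniformly random breakpoints, cuts a molecule at random positions and keeps fragments within the 200 to 600 base window of one embodiment; real shearing produces different length distributions, and the function names are hypothetical.

```python
import random
from typing import List, Tuple


def fragment_randomly(length: int, n_breaks: int, seed: int = 0) -> List[Tuple[int, int]]:
    """Cut a molecule of `length` bases at `n_breaks` distinct random positions."""
    if not 0 <= n_breaks < length:
        raise ValueError("need 0 <= n_breaks < length")
    rng = random.Random(seed)  # seeded so the example is reproducible
    cuts = sorted(rng.sample(range(1, length), n_breaks))
    bounds = [0] + cuts + [length]
    return list(zip(bounds, bounds[1:]))  # half-open (start, end) intervals


def size_select(fragments: List[Tuple[int, int]],
                min_len: int = 200, max_len: int = 600) -> List[Tuple[int, int]]:
    """Keep fragments whose length falls inside [min_len, max_len]."""
    return [(s, e) for s, e in fragments if min_len <= e - s <= max_len]
```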

After fragmentation, one or more adaptors can be attached to the polynucleotide fragments. In some embodiments, the adaptor is a linear adaptor, a circular adaptor, or a bubble adaptor. In some embodiments, the polynucleotide is ligated to at least one circular adaptor. In some embodiments, the polynucleotide fragments are contacted with a circular adaptor to generate a circular polynucleotide molecule. In some embodiments, only the circular polynucleotide molecule is amplified during the amplification process. In any of these embodiments, the adapter may comprise an adapter barcode.

Amplification of target polynucleotides

The methods of the invention comprise amplifying the polynucleotide before and/or after it is attached to the adapter. In some embodiments, the adapter is located at the 5' end of the sequence of interest in the polynucleotide and provides a priming site for amplification of the sequence of interest. The adapter-bearing polynucleotide is amplified using a first amplification primer and a second amplification primer. The first amplification primer is specific for a target sequence in the polynucleotide and is capable of hybridizing to a portion of the target sequence (the polynucleotide of interest). The second amplification primer is capable of hybridizing to a priming site of the adapter or to a target-specific priming site of the polynucleotide of interest. During the amplification step, the first amplification primer hybridizes to the target sequence and the second amplification primer hybridizes to a sequencing priming site on the adapter. In some embodiments, the first amplification primer hybridizes to the 5' end of the adapter-bearing polynucleotide. The primers used in the methods of the invention should be long enough to hybridize stably to the target sequence of the polynucleotide.

For amplification, the polynucleotide of interest is hybridized to a first amplification primer comprising a target-specific barcode. The first amplification primer is complementary to at least a portion of the polynucleotide and hybridizes to a first priming site of the polynucleotide. The polynucleotide comprises a target sequence at the 3' end, optionally followed by an adaptor. If the target sequence is present in the adapter-bearing polynucleotide, the first amplification primer hybridizes to it, allowing selective amplification and detection of the target sequence. The first amplification primer can be complementary to and/or hybridize to a genomic rearrangement, such as a deletion, insertion, inversion, or translocation in the polynucleotide of interest. In some embodiments, the first amplification primer is complementary to and/or hybridizes to a cDNA junction or an exon junction. In some embodiments, the first amplification primer is complementary to and/or hybridizes to a fusion gene, such as a known fusion gene, including the junction point of a known fusion gene, and/or a suspected or putative fusion gene, or the junction point of such a fusion gene.

The second amplification primer hybridizes to the polynucleotide or adaptor at a distance from the first priming site. In some embodiments, the second amplification primer hybridizes to a portion of an adaptor attached to the polynucleotide at a distance from the first priming site. In some embodiments, the second amplification primer hybridizes to a second priming site of the polynucleotide, wherein the second priming site is at a distance from the first priming site.
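The arrangement of the two amplification primers can be checked in silico. The sketch below assumes exact primer matching and a double-stranded template represented by its top strand (5'->3'), with the adapter priming site at the 5' end. The second amplification primer matches the top strand, and the target-specific part of the first amplification primer anneals to it, carrying its 5' barcode tail into the product. The function and parameter names are hypothetical.

```python
from typing import Optional

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def amplify_in_silico(top: str, fwd: str, rev_specific: str,
                      rev_tail: str = "") -> Optional[str]:
    """Return the top strand of the expected amplicon, or None if a primer fails to bind.

    top:          top strand of the template, 5'->3' (adapter priming site first).
    fwd:          primer identical to the top strand (here, the adapter-directed
                  second amplification primer).
    rev_specific: 3' target-specific part of the first amplification primer;
                  it anneals where revcomp(rev_specific) occurs in `top`.
    rev_tail:     5' tail of that primer (here, the target-specific barcode),
                  which is not templated but is copied into the product.
    """
    f = top.find(fwd)
    if f < 0:
        return None
    site = revcomp(rev_specific)
    r = top.find(site, f + len(fwd))  # binding site must lie downstream of fwd
    if r < 0:
        return None
    return top[f:r + len(site)] + revcomp(rev_tail)
```

The returned top strand ends in the reverse complement of the barcode tail, which is how the target-specific barcode becomes part of every amplicon.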

The polynucleotide of interest may be amplified using any suitable method. In some embodiments, the polynucleotide is amplified using the Polymerase Chain Reaction (PCR). In general, PCR involves denaturation of a polynucleotide strand (e.g., DNA melting), annealing of primers to the denatured polynucleotide strand, and extension of the primers with a polymerase to synthesize a complementary polynucleotide. The process typically requires a DNA polymerase, forward and reverse primers, deoxynucleotide triphosphates, divalent cations, and a buffer solution. In some embodiments, the polynucleotide is amplified by linear amplification. In some embodiments, the polynucleotide is amplified using emulsion PCR, bridge PCR, or rolling circle amplification. The amplified polynucleotides can be analyzed using suitable sequencing methods to determine the order of the base pairs.
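The amount of product is often estimated with the idealized exponential model N = N0 × (1 + E)^n, where N0 is the starting number of copies, E the per-cycle efficiency (1.0 meaning perfect doubling), and n the number of cycles. The sketch below is a back-of-the-envelope aid under that assumption, not a description of the disclosed amplification conditions; real reactions plateau in later cycles, and the function names are hypothetical.

```python
import math


def expected_copies(initial: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized exponential PCR yield: N = N0 * (1 + E) ** n."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial * (1.0 + efficiency) ** cycles


def cycles_needed(initial: float, target: float, efficiency: float = 1.0) -> int:
    """Smallest n with N0 * (1 + E) ** n >= target under the same model."""
    if initial <= 0 or not 0.0 < efficiency <= 1.0:
        raise ValueError("need initial > 0 and 0 < efficiency <= 1")
    if target <= initial:
        return 0
    n = math.log(target / initial) / math.log(1.0 + efficiency)
    # Small tolerance guards against floating-point round-up at exact powers.
    return math.ceil(n - 1e-9)
```

For example, with perfect doubling, 20 cycles take one template molecule past 10^6 copies (2^20 = 1,048,576).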

In some embodiments, one or more primers or polynucleotides are immobilized on a solid support. Immobilization of the amplification primers and/or polynucleotide can facilitate washing of the polynucleotide to remove any undesirable species (e.g., deoxynucleotides). In some embodiments, the polynucleotide comprises one or more adapters attached to a solid support, thereby immobilizing the polynucleotide on the support. In some embodiments, the polynucleotide is immobilized on the surface of a flow cell or slide. In some embodiments, the polynucleotide is immobilized on a microtiter well or magnetic bead. In some embodiments, the solid support may be coated with a polymer attached to a functional group or moiety. In some embodiments, the solid support may carry functional groups, such as amino, hydroxyl, or carboxyl groups, or other moieties, such as avidin or streptavidin, for attachment of adapters.

The polynucleotide amplicon may be an adaptor-bearing polynucleotide amplicon. In some embodiments, the adapter-bearing polynucleotide or polynucleotide amplicon comprises a binding partner, such as a biotin moiety. The polynucleotide may be attached to an adaptor comprising a binding partner, or the polynucleotide may be amplified using one or more primers comprising a binding partner. In some embodiments, the methods of the invention comprise forming a complex between cognate binding partners, such as a biotinylated primer extension product and solid-supported avidin or streptavidin. The method may further comprise enriching a sample for adaptor-bearing polynucleotides that comprise a binding partner by binding them to the cognate binding partner. Avidin and streptavidin form exceptionally tight complexes with biotin and certain biotin analogues. Typically, when biotin is coupled to a second molecule through its carboxyl side chain, the resulting conjugate is still tightly bound by avidin or streptavidin; the second molecule is then said to be "biotinylated". In general, the methods of the invention involve complexing biotinylated nucleic acids with avidin or streptavidin and then detecting, analyzing, and/or using the complexes. In some embodiments, the biotinylated polynucleotide is immobilized on a streptavidin-coated flow cell or a streptavidin-coated metal bead. In some embodiments of the methods, compositions, and kits of the invention, a target-specific primer (e.g., a first amplification primer) can be attached to a binding partner, such as a biotin moiety, to allow selection or purification by binding to a cognate binding partner, such as streptavidin or avidin. Useful binding partner pairs include biotin-avidin, biotin-streptavidin, antibody-antigen, and complementary nucleic acids.
In some embodiments, the target-specific primer may include a binding partner, such as biotin, to allow capture of the selectively amplified pool.

To prepare polynucleotides for next generation sequencing, target enrichment is typically performed before sequencing, and one or more target enrichment protocols may be included in the present methods. Enriching for one or more desired target polynucleotides focuses sequencing on those targets, reducing workload and expense and/or increasing depth of coverage. Examples of enrichment protocols currently used for next generation sequencing include hybridization-based capture protocols such as SureSelect Hybrid Capture from Agilent and TruSeq Capture from Illumina. Other examples include PCR-based protocols such as HaloPlex from Agilent, AmpliSeq from ThermoFisher, TruSeq Amplicon from Illumina, and emulsion/digital PCR from RainDance.

In some embodiments, a library of polynucleotides having universal linkers at both ends is amplified using methods such as PCR. Target specific primers comprising custom adaptors can be added to the reaction to amplify the target sequence. In such embodiments, two pools of fragments are generated: (a) a pool of fragments with universal linkers at both ends, and (b) a pool of fragments generated by selective amplification with sequence-specific linkers at one or both ends. If desired, target enrichment can be performed on the mixed pool of fragments.

In some embodiments of the methods, compositions, and kits of the invention, more than one target-specific primer is employed or provided for amplification. Amplification may be singleplex or multiplex. Multiplex PCR is a molecular biology technique used to amplify multiple nucleic acid targets in a single PCR experiment. Kits for multiplex amplification of target sequences are available from Multiplicom NV.

In some embodiments of the methods, compositions, and kits of the invention, the polynucleotide amplicons are used in Transposable Element (TE) protocols. The adaptors may be attached to the amplicons by using transposases to insert transposons containing the adaptors, thereby providing the adaptors at the ends of the amplicon fragments. In some embodiments, the polynucleotide may be fragmented and barcoded simultaneously. For example, a transposase (e.g., Nextera) can be used to fragment the polynucleotides and add barcodes to them.

Fusion gene

The target-specific primer may be complementary to or identical to a portion of any known or suspected fusion gene. For example, the target-specific primer may be complementary to or identical to any of the fusion genes disclosed in US20100279890, US20140120540, US20140272956, or US20140315199. As a further example, the target-specific primer may be complementary or identical to any of the following fusion genes: BCR-ABL, EML4-ALK, TEL-AML1, AML1-ETO, and TMPRSS2-ERG. Alternatively, the target-specific primer may be complementary to or identical to a newly discovered fusion gene or to the junction point of such a fusion gene. Alternatively, the target-specific primer may be complementary to or identical to a suspected or putative fusion gene or to the junction point of such a fusion gene.

In some embodiments, the methods, compositions, and kits of the invention comprise a plurality of target-specific primers for different fusion genes. For example, the plurality of target-specific primers may include a first target-specific primer for a BCR-ABL junction and a second target-specific primer for EML4-ALK. In some embodiments, the methods, compositions, and kits of the invention comprise a plurality of target-specific primers for a single fusion gene (including multiple junction points of a single fusion gene). For example, the plurality of target-specific primers may include a first target-specific primer for a first EML4-ALK junction and a second target-specific primer for a second EML4-ALK junction. The methods, compositions, and kits of the invention can comprise a third, a fourth, a fifth, and up to a twentieth target-specific primer, or even more target-specific primers.
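When many target-specific barcodes are pooled in one reaction, they should differ enough that a sequencing error in one barcode does not turn it into another. The sketch below checks the minimum pairwise Hamming distance of a panel; a minimum distance of 3 allows any single substitution to be corrected. The barcodes and panel labels are hypothetical placeholders, not sequences from the disclosure.

```python
from itertools import combinations
from typing import Dict


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(panel: Dict[str, str]) -> int:
    """Smallest Hamming distance between any two barcodes (the dict keys)."""
    barcodes = list(panel)
    if len(barcodes) < 2:
        raise ValueError("need at least two barcodes")
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


def check_panel(panel: Dict[str, str], min_distance: int = 3) -> int:
    """Raise if any two barcodes are closer than min_distance; else return the minimum."""
    d = min_pairwise_distance(panel)
    if d < min_distance:
        raise ValueError(f"barcodes too similar: minimum distance {d} < {min_distance}")
    return d


if __name__ == "__main__":
    panel = {  # hypothetical barcode -> target label
        "ACGTAC": "BCR-ABL junction",
        "TGCATG": "EML4-ALK junction 1",
        "GATCGA": "EML4-ALK junction 2",
    }
    print(check_panel(panel))  # 6
```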

Sequencing of target sequences

Following amplification, the adapter-bearing polynucleotide amplicons can be sequenced. For example, sequencing can be performed by first primer extension and second primer extension of the adapter-bearing polynucleotide amplicons generated during amplification. In some embodiments, the first and second primer extensions are performed in the same direction on a single amplicon or on a group of identical amplicons. In first primer extension sequencing, the bases incorporated as the first primer (and any other primers) is extended are detected, allowing determination of at least a portion of the target sequence of the polynucleotide, particularly the portion located 5' to the adapter. The adapter-bearing polynucleotide may comprise a sequencing priming site, such as a P5 or P7 priming site. In some embodiments, first primer extension may also be used to detect the sequence of the adapter barcode. In second primer extension sequencing, the bases incorporated as the second primer is extended are detected, allowing detection of the target-specific barcode. Sequencing of the target-specific barcode is used to confirm the presence and/or location, in the polynucleotide of interest, of the gene or other polynucleotide to which that barcode is specific.
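The way the two same-direction reads complement each other can be sketched as follows. The read layout is a hypothetical assumption: read 1 begins with the adapter barcode followed by target sequence, and read 2 begins with the target-specific barcode. The panel entry and junction sequence used in any example are made-up placeholders, and `decode_cluster` is a hypothetical name.

```python
from typing import Dict, Tuple


def decode_cluster(read1: str, read2: str,
                   panel: Dict[str, Tuple[str, str]],
                   adapter_bc_len: int, target_bc_len: int) -> dict:
    """Combine two same-direction reads from one amplicon cluster.

    panel maps a target-specific barcode to (target name, expected junction sequence).
    """
    adapter_barcode = read1[:adapter_bc_len]
    target_read = read1[adapter_bc_len:]
    name, junction = panel.get(read2[:target_bc_len], (None, None))
    return {
        "adapter_barcode": adapter_barcode,
        "target_read": target_read,
        "target": name,
        # The barcode names the expected target; read 1 confirms it if it spans the junction.
        "confirmed": junction is not None and junction in target_read,
    }
```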

In some embodiments, sequencing is performed by massively parallel sequencing using sequencing-by-synthesis with reversible dye terminators. In some embodiments, sequencing is performed by massively parallel sequencing using sequencing-by-ligation. In some embodiments, sequencing is performed by single molecule sequencing. In some embodiments, sequencing is performed using pyrosequencing.

The polynucleotide may be sequenced using any suitable reaction method. In some embodiments, a single reaction cycle may be completed using a single nucleotide (i.e., a nucleotide corresponding to G, A, T or C), and the method involves detecting whether a nucleotide is incorporated. If a nucleotide is incorporated, the identity of the nucleotide will become known. In such embodiments, the method may involve cycling through all four nucleotides (i.e., the nucleotides corresponding to G, A, T and C) in turn, and one of the nucleotides should be incorporated. In such embodiments, the addition of nucleotides can be detected by, for example, detecting pyrophosphate release, proton release, or fluorescence, as is known for such methods. For example, in some embodiments, the chain terminator nucleotide can be a terminal phosphate-labeled fluorescent nucleotide (i.e., a nucleotide having a fluorophore attached to a terminal phosphate), and the identifying step comprises reading fluorescence. In other embodiments, the chain terminator nucleotide may be a fluorescent nucleotide comprising a quencher on a terminal phosphate. In such embodiments, incorporation of the nucleotide removes the quencher from the nucleotide, thereby allowing detection of the fluorescent label. In other embodiments, the terminal phosphate-labeled chain terminator nucleotide may be labeled on the terminal phosphate with a mass tag, charge label, charge blocking label, chemiluminescent label, redox label, or other detectable label.
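A minimal base-calling sketch for this one-nucleotide-per-flow scheme follows. It assumes reversible chain terminators (so each cycle of four flows should add exactly one base), a fixed flow order of G, A, T, C, and a normalized signal threshold of 0.5; these values and the function name are illustrative assumptions.

```python
from typing import List, Sequence


def call_bases(signals: Sequence[Sequence[float]], flow_order: str = "GATC",
               threshold: float = 0.5) -> str:
    """Call one base per cycle from single-nucleotide flow signals.

    signals[i][j] is the detector signal when nucleotide flow_order[j] was
    added in cycle i. With a terminator, exactly one flow per cycle should
    give a signal; zero or several signals yield an 'N'.
    """
    read: List[str] = []
    for cycle in signals:
        hits = [nuc for nuc, s in zip(flow_order, cycle) if s >= threshold]
        read.append(hits[0] if len(hits) == 1 else "N")
    return "".join(read)
```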

In some embodiments, a single reaction cycle may be performed using all four nucleotides (i.e., the nucleotides corresponding to G, A, T and C), where each nucleotide is labeled with a different fluorophore. In such embodiments, the sequencing step may comprise adding four chain terminators corresponding to G, A, T and C to the amplified polynucleotide, wherein the four chain terminators comprise different fluorophores. In such embodiments, the identifying step can include identifying which of the four chain terminators is added to the end of the primer.
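For the four-color variant, a simple caller picks the brightest channel in each cycle and rejects cycles where two dyes are too close. The sketch assumes background-corrected intensities in a fixed channel order and a hypothetical brightest-to-second-brightest ratio of 2.0; real instruments use more elaborate models (for example, for dye cross-talk and phasing).

```python
from typing import List, Sequence, Tuple


def call_four_color(intensities: Sequence[Sequence[float]],
                    channels: Tuple[str, ...] = ("G", "A", "T", "C"),
                    min_ratio: float = 2.0) -> str:
    """Call one base per cycle from four dye-channel intensities."""
    read: List[str] = []
    for cycle in intensities:
        ranked = sorted(zip(cycle, channels), reverse=True)
        (top, base), (second, _) = ranked[0], ranked[1]
        if top <= 0:
            read.append("N")  # no incorporation detected
        elif second <= 0 or top / second >= min_ratio:
            read.append(base)  # one dye clearly dominates
        else:
            read.append("N")  # two dyes too similar to call
    return "".join(read)
```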

The sequencing step may be performed using single-ended sequencing, i.e., reading the first primer extension sequence and the second primer extension sequence in the same direction. In some embodiments, a genome analyzer capable of single-end sequencing is used to sequence the polynucleotide. In some embodiments, the method comprises continuously monitoring the sequencing reaction (i.e., base incorporation) in real time. This can be achieved simply by including a "detector enzyme" in the chain extension reaction mixture and performing the chain extension and detection (signal-generating) reactions simultaneously. In other embodiments, a separate chain extension reaction is first performed as a first reaction step, followed by a separate "detection" reaction in which the primer extension product is detected.

Sequencing data analysis

Genomic rearrangements in the polynucleotide can be identified based on data generated from sequencing of the first primer extension and the second primer extension. Sequencing data from the first primer extension provide the base sequence of the target sequence. Sequencing data from the second primer extension provide the sequence of the target-specific barcode, which can be used to indicate or confirm the presence of the target sequence, because the first amplification primer carrying that barcode is designed to hybridize specifically to the target sequence in the polynucleotide sample. Together, the data from the two primer extensions provide positional information for determining any genomic rearrangements in the polynucleotide.

The data generated from the first and second primer extensions are compared with a reference sample. Any difference between the reference sample and the data generated from the first and second primer extensions indicates that a genomic rearrangement may be present in the sample under investigation. Comparing the sequences generated by the first and second primer extensions with the sequence of the reference sample can identify the type and location of any genomic rearrangement.
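The comparison with a reference can be illustrated with a split-read sketch: a read that spans a rearrangement has a prefix matching one reference region and a suffix matching another. The sketch assumes exact matching, two candidate reference sequences (for example, the two partner genes of a fusion), and a hypothetical minimum anchor of 8 bases on each side. Production aligners handle mismatches, repeats, and strand orientation, which this toy version ignores, and `find_junction` is a hypothetical name.

```python
from typing import Optional, Tuple


def find_junction(read: str, ref_a: str, ref_b: str,
                  min_anchor: int = 8) -> Optional[Tuple[int, int, int]]:
    """Look for a breakpoint where read[:k] lies in ref_a and read[k:] lies in ref_b.

    Returns (k, end_in_a, start_in_b) for the smallest such k, or None if the
    read maps contiguously to either reference or no split is found.
    """
    if read in ref_a or read in ref_b:
        return None  # read is consistent with an unrearranged reference
    for k in range(min_anchor, len(read) - min_anchor + 1):
        start_a = ref_a.find(read[:k])
        start_b = ref_b.find(read[k:])
        if start_a >= 0 and start_b >= 0:
            return k, start_a + k, start_b
    return None
```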

The methods, compositions, and kits of the invention can be used to detect any sequence of interest, including sequences associated with common deletion syndromes.

Example 1

FIG. 1 illustrates a method of preparing polynucleotides for sequencing by attaching adapters and barcodes to the polynucleotides, as well as adapter-bearing polynucleotides and adapter-bearing polynucleotide amplicons generated by the present techniques. According to one embodiment of the invention, the adapter-bearing polynucleotides may be used to detect fusion events using selective gene amplification. In FIG. 1, the adapter-bearing polynucleotide 102 comprises a nucleic acid of interest, in this case a junction point of a fusion gene. The adapter-bearing polynucleotide 102 comprises a first gene 104 and a second gene 106. The adapter-bearing polynucleotide 102 further comprises adapters 108, 110 at each end. Adapters may be attached by any suitable procedure, such as by ligation. At least one of the adapters comprises an adapter barcode 112, which may be a molecular barcode or a sample barcode.

In time period A, the adapter-bearing polynucleotides are prepared for target-specific amplification. The adapter-bearing polynucleotide may be denatured to provide a single-stranded polynucleotide, or it may be provided in double-stranded form for amplification. In some embodiments, the adapter-bearing polynucleotides are first amplified non-specifically, e.g., with primers complementary to priming sites on the adapters attached to the members of the adapter-bearing polynucleotide library.

The prepared adapter-bearing polynucleotide is contacted with a first amplification primer 114 comprising a target-specific primer 116. The target-specific primer 116 is complementary to a sequence known or suspected to be present in the adapter-bearing polynucleotide (e.g., a sequence within the second gene 106). The first amplification primer 114 also contains a target-specific barcode 118 that is specific for a portion of a gene or other target known or suspected to be present in the sample being analyzed or in the polynucleotide of interest. In this context, "gene-specific" (used for the same barcode 118) does not mean that the barcode is complementary to a gene, but that the barcode is specifically associated with a gene, so that detecting the sequence of the gene-specific barcode reliably indicates that the associated sequence is present.
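A minimal Python sketch of the barcode-to-target association follows. It builds a barcoded first amplification primer and inverts a target-to-barcode assignment into a lookup table, refusing duplicate barcodes because a shared barcode could not reliably indicate which target is present. Placing the barcode in the primer's 5' tail, with the target-specific sequence at the 3' end, is an assumption (the 3' end is the part that must anneal for extension); all names and sequences are hypothetical.

```python
from typing import Dict


def build_barcoded_primer(barcode: str, target_specific: str) -> str:
    """Return a first-amplification-primer sequence, written 5'->3'.

    Assumed layout: barcode in the 5' tail, target-specific sequence at the
    3' end, where annealing is required for extension.
    """
    return barcode + target_specific


def make_barcode_table(assignments: Dict[str, str]) -> Dict[str, str]:
    """Invert {target_name: barcode} into {barcode: target_name}.

    Raises ValueError if two targets share a barcode, since such a barcode
    could not reliably indicate which target was amplified.
    """
    table: Dict[str, str] = {}
    for target, barcode in assignments.items():
        if barcode in table:
            raise ValueError(
                f"barcode {barcode} assigned to both {table[barcode]} and {target}")
        table[barcode] = target
    return table
```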

In time period B, the adapter-bearing polynucleotides are amplified in the presence of the first amplification primer 114 and the second amplification primer 120 to generate a library of adapter-bearing polynucleotide amplicons. Each adapter-bearing polynucleotide amplicon comprises the nucleic acid of interest, an adapter or its complement, and the gene-specific barcode or its complement. For ease of illustration, FIG. 1 shows a single first amplification primer 114 and a single second amplification primer 120, although amplification reactions can employ a large number of target-specific primers for different sequences and can generate amplicons of a large number of nucleic acids of interest. In some embodiments, the adapter-bearing polynucleotides are enriched from the pool of polynucleotides, for example where the tag comprises biotin or another binding partner.
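The following minimal Python sketch performs a simplified, exact-match "in silico PCR" to show why the amplicon carries the barcode complement as well as the adapter and the nucleic acid of interest. It assumes that the second amplification primer 120 anneals to adapter 108 and that only the 3'-most `anneal_len` bases of each primer must match the template. The function names and the exact-match model are hypothetical and ignore hybridization thermodynamics, mismatches, and priming on the bottom strand.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def in_silico_pcr(template: str, fwd: str, rev: str, anneal_len: int) -> Optional[str]:
    """Return the amplicon (top strand, 5'->3') or None if either primer fails.

    fwd and rev are primers written 5'->3'.  Only the 3'-most `anneal_len`
    bases of each primer must match the template exactly; any 5' tail
    (e.g. a target-specific barcode) is carried into the product.
    """
    fwd_core = fwd[-anneal_len:]
    rev_site = revcomp(rev[-anneal_len:])  # reverse-primer site on the top strand
    start = template.find(fwd_core)
    if start < 0:
        return None
    end = template.find(rev_site, start + len(fwd_core))
    if end < 0:
        return None
    inner = template[start + len(fwd_core):end]
    # Product = full forward primer + template between the primer sites
    # + reverse complement of the full (tailed) reverse primer.
    return fwd + inner + revcomp(rev)
```

For example, with a template made of an adapter, a first-gene segment, a second-gene segment, and a second adapter, using the adapter sequence as `fwd` and a barcode-tailed primer whose 3' end is complementary to the end of the second-gene segment as `rev`, the returned amplicon begins with the adapter and the first gene and ends with the reverse complement of the barcode.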

In some embodiments (in addition to or instead of enrichment), polynucleotides (including adapter-bearing polynucleotides) may be amplified with outer and inner (nested) primers. In such embodiments, the outer primer or primers used in earlier rounds of amplification are target-specific primers that do not necessarily include a target-specific barcode. The inner primer or primers used in subsequent rounds of amplification are also target-specific primers and comprise target-specific barcodes. In general, nested PCR refers to one or more subsequent PCR amplifications using one or more new primers that bind at least one base pair internal to the primers used in the earlier round. Nested PCR reduces unwanted amplification products because, in the subsequent reaction, only those products of the previous round that contain the correct internal sequence are amplified. Nested PCR usually requires designing primers that lie entirely within the previous outer primer binding sites.
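Whether a candidate inner primer pair is nested within an outer pair can be checked from binding-site coordinates alone. The following Python sketch assumes half-open, 0-based top-strand coordinates for each primer binding site (reverse-primer sites are also expressed on the top strand); the `Site` type and `is_nested` function are hypothetical. With `strict=False` it applies the "at least one base pair internal" criterion; with `strict=True` it requires the inner sites to lie entirely between the outer primer binding sites.

```python
from typing import NamedTuple


class Site(NamedTuple):
    """A primer binding site on the top strand: 0-based, half-open [start, end)."""
    start: int
    end: int


def is_nested(outer_fwd: Site, outer_rev: Site,
              inner_fwd: Site, inner_rev: Site, strict: bool = False) -> bool:
    """Return True if the inner primer pair is nested within the outer pair."""
    if strict:
        # Inner sites must lie entirely between the outer primer sites.
        lo, hi = outer_fwd.end, outer_rev.start
    else:
        # Inner sites need only be shifted inward by at least one base.
        lo, hi = outer_fwd.start + 1, outer_rev.end - 1
    return (lo <= inner_fwd.start
            and inner_rev.end <= hi
            and inner_fwd.end <= inner_rev.start)  # inner sites must not overlap
```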

The adapter-bearing polynucleotide amplicons can then be sequenced. In some embodiments, a first sequencing primer 122 complementary to a first priming site 124 of the adapter 108 is used for a first primer extension that sequences at least the first gene 104. Labeled nucleotides are added to the primer in a sequencing reaction, generating a first extension sequence 126 that is complementary to the adapter-bearing polynucleotide amplicon and thereby provides sequence information about it. The first primer extension occurs at a first position of the adapter-bearing polynucleotide amplicon. The first priming site 124 may be located 5' or 3' of the adapter barcode, depending on whether the adapter barcode is to be sequenced together with the first gene 104 or separately from it. A second sequencing primer 128 is used for a second primer extension that sequences at least the gene-specific barcode 118. The second sequencing primer 128 is complementary to a portion 130 of the first amplification primer 114, the portion 130 lying 3' of the gene-specific barcode 118 and 5' of the target-specific sequence 116. Labeled nucleotides are added to the primer in a sequencing reaction, generating a second extension sequence 132 that is complementary to the gene-specific barcode and thereby provides sequence information about it. As noted above, the ordinals "first" and "second" do not imply that the first primer is used before the second; they only distinguish the primers from one another.
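The geometry of the second primer extension can be sketched by simulating extension on the amplicon strand that begins with the first amplification primer 114. The sketch assumes the primer layout 5'-[barcode 118][portion 130][target-specific sequence 116]-3', which follows from the positions given above, and that barcode 118 is the 5'-most part of primer 114. The function names are hypothetical, and the simulation ignores chemistry such as labeled-nucleotide incorporation and read errors.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def sequencing_primer_for(portion: str) -> str:
    """Sequencing primer (5'->3') complementary to `portion` of the template."""
    return revcomp(portion)


def extend(template: str, primer: str, read_len: int) -> str:
    """Simulate primer extension on `template` (written 5'->3').

    The primer anneals where revcomp(primer) occurs in the template; its 3' end
    pairs with the first base of that site, so synthesis copies the template
    toward the template's 5' end.  Returns up to `read_len` new bases, 5'->3'.
    """
    site = template.find(revcomp(primer))
    if site < 0:
        raise ValueError("primer does not anneal to template")
    return revcomp(template[:site])[:read_len]
```

With this layout the simulated second read is the reverse complement of the barcode, matching the statement that the second extension sequence 132 is complementary to the gene-specific barcode.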

In time period C, the data from the sequencing reaction are processed and interpreted. In some embodiments, the first extension sequence 126 is determined to be a sequence of the first gene, and the second extension sequence 132 is determined to be the sequence of the gene-specific barcode associated with the second gene. On the basis of these determinations, the data are interpreted as indicating the presence of the fusion gene in the nucleic acid of interest. The fusion gene comprises portions of the first gene and the second gene, and its presence can be determined without directly sequencing the second gene 106 itself.
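The interpretation in time period C can be sketched as follows, assuming that the first extension sequence (read 1) is assigned to a gene by exact matching against reference gene sequences and that the second extension sequence (read 2), being complementary to the barcode, is reverse-complemented before lookup in a barcode-to-target table. The gene names, sequences, and exact-match rules are hypothetical; a practical pipeline would more likely use a read aligner and error-tolerant barcode decoding. The second gene is identified only through its barcode, never by sequencing it.

```python
from typing import Dict, Optional, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def assign_gene(read: str, genes: Dict[str, str]) -> Optional[str]:
    """Return the single gene whose reference contains the read (either
    orientation), or None if there is no hit or more than one."""
    hits = [name for name, ref in genes.items()
            if read in ref or revcomp(read) in ref]
    return hits[0] if len(hits) == 1 else None


def call_fusion(read1: str, read2: str, genes: Dict[str, str],
                barcode_table: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return (first_gene, barcode_gene) if the reads indicate a fusion, else None."""
    first = assign_gene(read1, genes)
    # Read 2 is complementary to the barcode, so reverse-complement before lookup.
    partner = barcode_table.get(revcomp(read2))
    if first is None or partner is None or first == partner:
        return None
    return (first, partner)
```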
