Nucleic acid amplification and identification method

Document No.: 395373. Publication date: 2021-12-14.

Abstract: Nucleic acid amplification and identification method, by Y. Göpel, P. Moll, T. Reda and A. Seitz, dated 2019-12-13. The present invention provides a method of generating a labeled amplified fragment of a nucleic acid template, comprising: providing said template nucleic acid; annealing at least one oligonucleotide primer to the template nucleic acid; extending the at least one oligonucleotide primer in a template-specific manner to produce an extension product, wherein the extension reaction stops when the extension product reaches the 5' end of the template nucleic acid or reaches an extension terminator that is annealed to the template nucleic acid downstream of the extension product; providing an adaptor nucleic acid comprising at its 5' end a recognition sequence that does not hybridize to the extension terminator when contacted with the extension terminator; and ligating the 5' end of the adaptor nucleic acid to the 3' end of the extension product, thereby generating a labeled amplified fragment.

1. A method of generating a labeled amplified fragment of a nucleic acid template comprising the steps of:

providing the template nucleic acid,

annealing at least one oligonucleotide primer to said template nucleic acid,

extending the at least one oligonucleotide primer in a template-specific manner to produce an extension product, wherein the extension reaction is terminated when the extension product reaches the 5' end of the template nucleic acid or reaches a nucleic acid extension terminator that anneals to the template nucleic acid downstream of the extension product,

providing an adaptor nucleic acid comprising at its 5' end a recognition sequence that does not hybridize to the extension terminator or the template,

ligating the 5' end of the adaptor nucleic acid to the 3' end of the extension product, thereby generating labeled amplified fragments.

2. The method of claim 1, wherein the nucleotide polymerase is allowed to add non-templated nucleotides to the extension products when the extension products reach the 5' end of the template nucleic acid, preferably by terminal transferase activity of the polymerase, and/or preferably between 1 and 15 non-templated nucleotides in at least 70% of the extension products.

3. The method of claim 1 or 2, wherein a plurality of adapter nucleic acids are provided and used in the ligating step, wherein the plurality of adapters have different recognition sequences, preferably at least 10, more preferably at least 50 adapter nucleic acids having different recognition sequences are provided and used in the ligating step.

4. A method according to any one of claims 1 to 3 wherein the recognition sequence is a random sequence.

5. The method of any one of claims 1 to 4, wherein the extension terminator has primer activity and is also extended in the extension step, preferably at least 9, more preferably at least 49 extension terminators having different annealing sequences to the template are used, thereby potentially annealing to different sites of the template nucleic acid.

6. The method of claim 5, wherein the annealing sequence is a random sequence.

7. The method of any one of claims 1 to 6, wherein the adapter nucleic acid either is bound or hybridized to the extension terminator, or is not bound or hybridized to the extension terminator; preferably, when the adapter nucleic acid is bound or hybridized to the extension terminator, the recognition sequence is independent of the annealing sequence on the extension terminator that anneals the extension terminator to the template.

8. The method of any one of claims 1 to 7, wherein the template is RNA, preferably wherein the extension reaction is performed with reverse transcriptase.

9. The method of any one of claims 1 to 8, wherein the oligonucleotide primers and preferably also the extension terminators comprise universal amplification sequences and/or wherein the adaptor nucleic acids comprise universal adaptor amplification sequences.

10. The method according to any one of claims 1 to 9, wherein the oligonucleotide primer comprises an annealing sequence to the template, said annealing sequence comprising an oligo(T) sequence which anneals to an oligo(A) sequence in the template, preferably said oligo(T) sequence comprises one or more 3' anchor nucleotides which are different from the oligo (A) sequence.

11. The method of any one of claims 1 to 10, wherein the ligation reaction is carried out in the presence of a crowding agent, preferably a polymer or a complex comprising a polymer, such as a polyalkyl glycol, preferably PEG, octoxynol or Triton X, or a polysorbate, preferably Tween; and/or wherein the extension terminator and preferably also the oligonucleotide primer comprise one or more modified nucleotides that increase the melting temperature of the annealing sequence that anneals to the template.

12. The method of any one of claims 1 to 11, wherein at least 1, preferably at least 9, extension terminators have primer activity and are also extended during the extension step and at least 2, preferably at least 10 adaptor nucleic acids comprising different recognition sequences are used, thereby generating at least 2, preferably at least 10, differently labeled amplified fragments, which are optionally amplified, further comprising assembling the sequences of the unique amplified fragments, wherein the labels are used for identifying the unique amplified fragments.

13. A kit for performing the method of any one of claims 1 to 12, comprising

at least one oligonucleotide primer capable of hybridizing to the template nucleic acid and priming an extension reaction at its 3' end,

one or more extension terminators capable of hybridizing to the template nucleic acid, preferably capable of initiating an extension reaction at its 3' end,

one or more adapter nucleic acids comprising at their 5' end a recognition sequence which does not hybridize to an extension terminator, preferably wherein the adapter nucleic acid either is bound or hybridized to an extension terminator, or is not bound or hybridized to an extension terminator,

reverse transcriptase, and oligonucleotide ligase.

14. The kit of claim 13, comprising at least 10, preferably at least 50 adaptor nucleic acids with different recognition sequences.

15. The kit of claim 13 or 14, wherein at least one oligonucleotide primer comprises an annealing sequence that anneals to the template, said annealing sequence comprising an oligo(T) sequence that anneals to an oligo(A) sequence in the template, preferably said oligo(T) sequence comprises one or more 3' anchor nucleotides different from the oligo(T) sequence.

Background

US 2010/0273219 A1 describes a multi-primer amplification method for barcoding target nucleic acids.

WO 2012/134884 A1 describes barcoding of template nucleic acids in multiplex amplification reactions.

WO 2013/038010 A2 describes a method of generating an amplified nucleic acid portion of a template nucleic acid using oligonucleotide primers and terminators to prevent strand displacement and read-through by the polymerase used to generate the nucleic acid portion for sequencing. This method eliminates bias in the nucleic acid amplification process.

WO 2014/071361 A1 describes a method of preparing double-barcoded nucleic acids using barcoded adaptor nucleic acids.

US 2014/0274729 A1 describes a method of generating a cDNA library using a DNA polymerase with strand displacement activity.

EP 3119886 B1 describes a quantitative method for producing a nucleic acid product from a template RNA.

US 2018/163201 A1 relates to a reverse transcription method in which a C tail is added to the 3' end of the cDNA strand.

WO 2016/138500 A1 describes a method for barcoding nucleic acids for sequencing. Stochastic barcodes are used as molecular labels.

Molecular tags or unique molecular identifiers (UMIs), also known as molecular barcodes, have been developed to identify PCR duplicates in order to reduce sequence-specific PCR bias and to detect rare mutations. The unique molecular identifier is attached to the RNA molecules before any PCR amplification during sequencing library preparation, establishing a unique identity for each input molecule. This makes it possible to eliminate the effect of subsequent PCR amplification bias, which is particularly important where many cycles of PCR are required, for example when generating sequencing libraries from low template input in single-cell studies. After PCR, molecules sharing the same sequence and the same UMI are assumed to be derived from the same input molecule (Sena et al., Scientific Reports (2018) 8:13121).
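The duplicate-collapsing rule stated above (same sequence plus same UMI means same input molecule) can be illustrated with a minimal Python sketch. The function name and the toy reads are hypothetical and are not taken from the cited publication; real pipelines typically also tolerate sequencing errors in the UMI, which this sketch does not.

```python
from collections import Counter

def collapse_umi_duplicates(reads):
    """Group reads by (UMI, sequence) and count the copies of each group.

    Reads sharing both the UMI and the sequence are treated as PCR
    duplicates of a single input molecule.
    """
    return Counter(reads)

# Hypothetical toy data: one (UMI, insert sequence) pair per sequenced read.
reads = [
    ("ACGTAC", "GGATCCTT"),
    ("ACGTAC", "GGATCCTT"),  # PCR duplicate of the first read
    ("TTGCAA", "GGATCCTT"),  # same sequence, different UMI: distinct molecule
    ("ACGTAC", "CCTTAGGA"),  # same UMI, different sequence: distinct molecule
]

molecules = collapse_umi_duplicates(reads)
unique_molecules = len(molecules)
print(unique_molecules, "input molecules from", len(reads), "reads")
```

Counting groups rather than reads is what removes the amplification bias: a molecule that was amplified many times still contributes one count.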

Summary of the Invention

It is an object of the present invention to provide an improved method for generating sequence fragments of a template nucleic acid which facilitates the assignment and assembly of these sequence fragments into a combined sequence (joined sequence) corresponding to the sequence of the template nucleic acid. The desired improvements also reduce sequence bias during fragment generation and increase sequence fragment coverage over the entire template length to increase confidence in the generated merged sequence.

Accordingly, the present invention provides a method of generating a tagged amplified fragment of a nucleic acid template, comprising the steps of: providing said template nucleic acid; annealing at least one oligonucleotide primer to said template nucleic acid; extending the at least one oligonucleotide primer in a template-specific manner, thereby generating extension products, wherein the extension reaction stops when the extension products reach the 5' end of the template nucleic acid or reach a nucleic acid extension terminator that anneals to the template nucleic acid downstream of the extension products; providing an adaptor nucleic acid comprising at its 5' end a recognition sequence which does not hybridise to the extension terminator when contacted with the extension terminator and preferably also does not hybridise to the template; ligating the 5' end of the adaptor nucleic acid to the 3' end of the extension product, thereby generating a labeled amplified fragment.

The invention also provides a method of generating a labeled amplified fragment of a nucleic acid template, comprising the following steps: providing the template nucleic acid; annealing at least one oligonucleotide primer to the template nucleic acid; extending the at least one oligonucleotide primer in a template-specific manner to produce an extension product; providing an adaptor nucleic acid comprising a recognition sequence, wherein the recognition sequence does not hybridize to the template; ligating the adaptor nucleic acid, preferably its 5' end, to the 3' end of the extension product, thereby generating labeled amplified fragments.

The invention further provides kits suitable for carrying out the methods. The kit of the present invention may comprise at least one oligonucleotide primer capable of hybridizing to a template nucleic acid and initiating an extension reaction at its 3' end; one or more extension terminators capable of hybridizing to the template nucleic acid (preferably capable of initiating an extension reaction at its 3' end); one or more adapter nucleic acids comprising a recognition sequence at their 5' end, wherein the recognition sequence does not hybridize to an extension terminator, preferably the adapter nucleic acid binds, hybridizes to, or does not bind to an extension terminator; reverse transcriptase, and oligonucleotide ligase. The different components of the kit may be provided in different containers (e.g., vials).

The following detailed disclosure is directed to all aspects of the invention, including methods and kits, and embodiments. That is, the description of the method is applicable to the kit. Any of the components described in the methods may be part of a kit. The components of the kit may be used in the methods of the invention.
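The method steps summarized above (primer annealing, extension up to the next annealed oligonucleotide or the template 5' end, then ligation of an adaptor carrying a recognition sequence) can be pictured with a toy in-silico model. This is only a sketch under simplifying assumptions that are not part of the disclosure: primers have a fixed length, do not overlap, every extension runs to completion, and the recognition sequence is a random string. All names are hypothetical.

```python
import random

def simulate_tagged_fragments(template_len, primer_starts, primer_len=6,
                              umi_len=6, seed=0):
    """Toy model of stop-at-terminator extension followed by adaptor ligation.

    Coordinates are 0-based along the template, 5' -> 3'. Each primer
    covers [start, start + primer_len) and is extended towards the
    template 5' end until it meets the next annealed oligonucleotide
    or position 0 (the template 5' end).
    """
    rng = random.Random(seed)
    fragments = []
    boundary = 0  # position at which the current extension product stops
    for start in sorted(primer_starts):
        end = start + primer_len
        if start < boundary or end > template_len:
            raise ValueError("primer sites must not overlap or overhang")
        # Random recognition sequence standing in for the ligated adaptor.
        umi = "".join(rng.choice("ACGT") for _ in range(umi_len))
        fragments.append({"start": boundary, "end": end, "umi": umi})
        boundary = end  # this oligonucleotide terminates the next product
    return fragments

fragments = simulate_tagged_fragments(100, [20, 60, 94])
for f in fragments:
    print(f["start"], f["end"], f["umi"])
```

In this idealized picture the fragments tile the template without overlap, and the most 5' fragment ends at template position 0, which corresponds to the case where extension stops at the 5' end of the template.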

The attached drawings are as follows:

FIG. 1: Schematic representation of the generation of UMI-linker-tagged short cDNA libraries within the RNA body, using primers with strand displacement stop (SDS) properties and linker oligonucleotides that contain a UMI and are partially complementary.

a) The universal strand displacement stop primer Pn hybridizes to the RNA transcript, with primer Pn+1 hybridizing to a more upstream (5') position of the template RNA than primer Pn. When the reverse transcriptase extending Pn reaches primer Pn+1, the polymerase reaction is terminated by the strand displacement stop technique described in WO 2013/038010 A2. An adapter oligonucleotide comprising a UMI, which also comprises a sequence L2 complementary to L1, hybridizes to primers Pn and Pn+1. b) During ligation, the extension product is ligated to the UMI that precedes the L2 strand of the linker. In this way, a cDNA library is created that carries two linker sequences (L1, L2) at its ends and contains unique molecular identifiers. c) Finally, PCR is performed to amplify the library.

FIG. 2: generation of UMI-containing libraries

FIG. 2a) shows a library generated by the SDS + ligation method. Ligation of the UMI-containing, partially complementary L2 adaptors (see FIG. 1) can be performed using a single-stranded (ss) ligase or a double-stranded (ds) ligase (lanes 2, 3). Omitting the ligase did not generate a library (lane 1). After ligation, the cDNA fragments containing the L1 and L2 linkers were amplified by PCR and analyzed. Gel images are from an HS DNA Assay run on a Bioanalyzer (Agilent Technologies, Inc.). b) Schematic representation of the generation of UMI-containing libraries using the SDS + ligation method together with non-hybridizing extension initiator and adaptor oligonucleotides. In this example, the adaptor oligonucleotide L2' does not contain a sequence complementary to the extension initiator Pn. c) Gel images and electropherograms of replicate libraries generated using non-hybridizing extension initiators and UMI-containing adaptor oligonucleotides (SEQ ID No. 10). Images are from an HS DNA Assay run on a Bioanalyzer (Agilent Technologies, Inc.).

FIG. 3: Improved coverage of the 5' end of the transcript is achieved by ligation of an L2 linker to the cDNA at the 5' end of the RNA template.

a) Schematic representation of the reverse transcription (RT) reaction at the 5' end of the transcript. In the absence of SDS at a downstream primer Pn+1, the terminal deoxynucleotidyl transferase (TdT) activity of the RT adds non-templated nucleotides to the 3' end of the cDNA, creating an overhang. b) The non-templated nucleotides (nt) may serve as a hybridization site for an L1-containing primer Pn+1. Ligation of the UMI-L2 linker can then occur in a double-stranded configuration together with the partially hybridized L2. c) Alternatively, the UMI-L2 linker may be ligated to the single strand without a hybridized primer. d) The library generated as shown in FIG. 3a-c) was sequenced on an Illumina NextSeq500 (single read, 75 bp). Reads mapped to the 5' end of ERCC-0130 are shown (see SIRV Set 3, Lexogen Cat. No. 051.0N). Reads were analyzed without trimming of added and mismatched bases. Nucleotides marked in gray correspond to the ERCC-0130 annotation; nucleotides shown in black result from non-templated addition by the TdT activity of the RT. Thirty representative read sequences from the 5' end of ERCC-0130 are shown. The read sequences are SEQ ID Nos. 12-42 from top to bottom. e) The SDS/ligation method improves 5' coverage compared to a conventional protocol. Libraries were prepared using a conventional protocol (NEBNext Ultra II Directional RNA Library Prep Kit for Illumina, New England Biolabs, Cat. No. E7760S) or the SDS/ligation method and sequenced on an Illumina NextSeq500 (paired-end reads, 150 bp). The reads mapped to ERCC-0130 are superimposed and compared to the expected coverage, shown as a rectangle. Left: conventional RNA library preparation protocol; right: coverage obtained with the new SDS/ligation technique.

FIG. 4: schematic representation of reactions using the SDS/ligation method and combination of universal primers (Pn) and oligo-dT primers (PdT) for improved 3' end coverage.

a) The universal primer Pn hybridizes to the RNA template within the RNA body. In addition, an oligo-dT primer (PdT) that is also present hybridizes to the poly(A) tail at the 3' end of the polyadenylated transcript. The RT extends PdT until it reaches the downstream primer Pn, where extension stops without strand displacement. b) During ligation, a UMI-containing L2 linker is ligated to the cDNA fragment spanning the 3' end, generating a UMI-containing cDNA library that covers the 3' end of the transcript and is flanked by L1 and L2. c) Gene body coverage plot showing increased coverage of the 3' end of transcripts across the transcriptome. The library was prepared using the SDS + ligation protocol and a mixture of random primers and oligo-dT first strand synthesis primers (as described in example 3). The library was sequenced on a NextSeq500 instrument, and the gene body coverage across the entire transcriptome was plotted and compared to the SDS + ligation protocol described previously. d) Exemplary coverage of an endogenous housekeeping gene (HSP90) obtained by conventional library preparation (upper panel) and by the SDS + ligation protocol with oligo-dT titration (lower panel), the latter resulting in improved 3' end coverage.

FIG. 5: Overall improvement in 5' and 3' coverage of transcripts. The transcription start site (i.e., the true 5' end of the transcript) and the transcript end site (i.e., the true 3' end of the transcript) were resolved using the SDS + ligation protocol, but not using two exemplary conventional library preparation methods. The library generated by the SDS + ligation protocol as shown in FIG. 3a-c) was sequenced on an Illumina NextSeq500 (paired-end, 150 bp). Conventional libraries were prepared as described for the TruSeq Stranded Total RNA Library Prep Human/Mouse/Rat, Illumina Cat. No. 20020596 or 20020597 (conventional 1), or for the NEBNext Ultra II Directional RNA Library Prep Kit for Illumina, New England Biolabs, Cat. No. E7760S (conventional 2). a) Reads mapped to the true 5' and 3' ends of the detected ERCCs are shown (see SIRV Set 3, Lexogen Cat. No. 051.0N). Reads were mapped to the ERCC internal reference RNAs of known sequence. The normalized coverage of the cumulative mapped reads for all detected ERCCs is plotted against absolute nucleotide position relative to the transcription start site (TSS) and the transcript end site (TES), which are marked by dashed lines. b) The extended 5' coverage reveals the full TSS. Upper panel: gapdh coverage, shown with introns in condensed (thumbnail) form, generated using the SDS + ligation protocol or conventional library preparation as described above. Reads mapped to gapdh were analyzed without trimming of added and mismatched bases. The read sequences are SEQ ID Nos. 43-67 from top to bottom. Nucleotides marked in black correspond to the gapdh annotation; nucleotides shown in gray are mismatches or non-templated additions due to the TdT activity of the RT. The cluster of start sites formed by reads stacking at the 5' end of the transcript can be used to re-annotate the TSS. Annotated and manually determined TSS are indicated by arrows at the annotated consensus sequence, shown in bold black.

Examples:

Example 1: Ligation of unique molecular identifiers (UMIs) to first strand cDNA fragments.

Libraries were prepared from universal human reference RNA (Agilent Technologies, cat #740000) containing SIRV Set 3 internal reference control mix (Lexogen, cat #051.0N) as described.

After cDNA synthesis, a downstream primer (Pn+1 (L2)) comprising a unique molecular identifier 2 to 24 nucleotides in length, preferably 6 to 12 nucleotides, can be ligated to the newly transcribed cDNA strand (which is present in a hybrid with the template RNA). Reverse transcription was performed using the oligonucleotides, templates and conditions described in WO 2013/038010 A2. Various ligases and combinations thereof may be used to ligate the following oligonucleotides:

SEQ ID No. 1 (phosphorylated) (5'-NNNNNNAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC-3' (3' inverted dT)),

SEQ ID No. 2 (phosphorylated) (5'-NNNNNNNNNNAGATCGGAAGAGCACACGTCTGAA-3' (3' inverted dT)),

SEQ ID No. 3 (phosphorylated) (5'-NNNNNNNNNNAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTG-3' (3' inverted dT)),

SEQ ID No. 4 (phosphorylated) (5'-NNNNNNNNNNAGATCGGAAGAGCGTCGTGTAGG-3' (3' inverted dT)),

SEQ ID No. 5 (phosphorylated) (5'-NNNNNNNNNNNAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTG-3' (3' inverted dT)),

SEQ ID No. 6 (phosphorylated) (5'-NNNNNNNNNNNNAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTG-3' (3' inverted dT)),

SEQ ID No. 7 (phosphorylated) (5'-NNNNNNNNNNNNAGATCGGAAGAGCGTCGTGTAGG-3' (3' inverted dT)),

SEQ ID No. 8 (phosphorylated) (5'-+NNNNNNNNNNAGATCGGAAGAGCGTCGTGTAGG-3' (3' inverted dT)),

SEQ ID No. 9 (phosphorylated) (5'-+NNNNNNNNNNNNAGATCGGAAGAGCGTCGTGTAGG-3' (3' inverted dT)).
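Each adaptor oligonucleotide listed above starts with a run of random positions (N) followed by a constant linker beginning with AGATCGGAAGAGC. As a rough illustration, the sketch below computes how many distinct UMIs a given number of random positions can encode and splits an adaptor sequence into its UMI and linker part. The function names are hypothetical, the sequence is assumed to be given in the orientation written above, and modified positions (the "+" marks) are ignored.

```python
def umi_diversity(umi_length: int) -> int:
    """Number of distinct UMIs encodable by umi_length random A/C/G/T positions."""
    return 4 ** umi_length

def extract_umi(adaptor_seq: str, umi_length: int,
                linker_prefix: str = "AGATCGGAAGAGC"):
    """Return the leading UMI if the constant linker follows it, else None."""
    umi = adaptor_seq[:umi_length]
    if len(umi) < umi_length:
        return None
    # str.startswith accepts a start offset, so the linker must begin
    # directly after the UMI.
    if not adaptor_seq.startswith(linker_prefix, umi_length):
        return None
    return umi

# Hypothetical adaptor instance with a 10-nt UMI in front of the linker.
example = "ACGTTGCAAC" + "AGATCGGAAGAGCGTCGTGTAGG"
print(extract_umi(example, 10), umi_diversity(10))
```

With 6, 10 and 12 random positions this gives 4,096, 1,048,576 and 16,777,216 possible UMIs respectively, which is why longer UMIs are less likely to label two different input molecules identically.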

After reverse transcription (RT), the samples were purified by solid phase reversible immobilization (SPRI) using magnetic purification beads (AMPure Beads; Agencourt) according to the manufacturer's instructions. The cDNA:RNA hybrids were eluted in 20 µl water or 10 mM Tris, pH 8.0, and 17 µl of the supernatant was then transferred to a new PCR plate. The ligation reaction was then performed in 60 µl containing 20% PEG-8000, 50 mM Tris-HCl (pH 7.5 at 25 °C), 10 mM MgCl2, 5 mM DTT, 0.4 mM ATP, 0.01% Triton X-100, 50 µg/ml BSA and 20 units of ligase (single-strand-specific ligase and/or double-strand-specific ligase). Unligated small fragments and remaining oligonucleotides were removed by SPRI purification. The entire remaining primary cDNA library was amplified in a PCR reaction using a high-fidelity polymerase and the following program: 30 seconds at 98 °C, followed by 10-25 PCR cycles of 10 seconds at 98 °C, 20 seconds at 65 °C and 30 seconds at 72 °C. The final extension was carried out at 72 °C for 60 seconds. FIG. 1b) shows the basic principle of ligating the extended cDNA to a UMI-containing adapter oligonucleotide (L2) that has a sequence complementary to the strand displacement stop primer (L1).

The example in FIG. 2 shows that various ligases can ligate UMI-containing oligonucleotides and thus generate cDNA fragments that comprise two PCR adaptors and can be amplified by PCR (FIG. 2a, lanes 2-3). In contrast, the control experiment omitting ligase showed no library amplification, which underscores the specificity of the above reaction (FIG. 2a, lane 1).

Example 2: Library generation using non-hybridizing extension initiators and adaptor oligonucleotides.

Libraries were prepared from universal human reference RNA (Agilent Technologies, cat #740000) containing SIRV Set 3 internal reference control mix (Lexogen, cat #051.0N) as described.

Reverse transcription (RT) was performed as described in example 1. After RT, the samples were purified by solid phase reversible immobilization (SPRI) using magnetic purification beads (AMPure Beads; Agencourt) according to the manufacturer's instructions, the purified cDNA:RNA hybrids were eluted in 20 µl 10 mM Tris, pH 8.0, and 17 µl of the supernatant was transferred to a new PCR plate. Ligation was performed using the conditions described in example 1, except that the adapter oligonucleotides provided did not contain sequences complementary to the extension initiator used to prime the reverse transcription reaction. The oligonucleotide adaptor therefore cannot hybridize to the initiator and is not recruited into proximity of the 3' end of the newly generated extension product (FIG. 2b). The oligonucleotide is shown in SEQ ID No. 10 (phosphorylated) (5'-NNNNNNNNNNNNTGGAATTCTCGGGTGCCAAGG-3' (SpcC3)) and has no sequence complementarity to the extension initiator. Fragments containing both adaptor sequences were amplified after clean-up as described in example 1. FIG. 2c) shows gel images and electropherogram traces of two replicate SDS + ligation libraries generated with non-hybridizing extension initiators and adaptor oligonucleotides.

Example 3: improved 5' end coverage due to terminal transferase activity and ss ligation of the UMI-linker to the first strand cDNA fragment.

Libraries were prepared from universal human reference RNA (Agilent Technologies, cat #740000) containing SIRV Set 3 internal reference control mix (Lexogen, cat #051.0N) as described.

First strand cDNA synthesis terminates at the 5' end of the template RNA molecule. The terminal transferase activity of reverse transcriptase catalyzes the non-templated addition of nucleotides at the 3' end of the cDNA strand (FIG. 3 a).

Ligation of the UMI-adaptor oligonucleotides (e.g., SEQ ID Nos. 1-9) after reverse transcription can occur in a double-stranded configuration (FIG. 3b) as well as at single-stranded overhangs (FIG. 3c). After SPRI purification and PCR amplification, the library was sequenced in single-read or paired-end mode on a NextSeq500. Reads mapped to the 5' end of ERCC-0130 were analyzed without prior trimming of mismatched nucleotides. Reads covering the 5' end of ERCC-0130 are shown by way of example in FIG. 3d. The addition of terminal nucleotides and UMI ligation to the extended single strand result in improved 5' coverage. A comparison of the coverage obtained with conventional RNA-seq library preparation and with the present invention is shown in FIG. 3e. Coverage is shown as the superposition of all aligned reads (traces shown in gray) and compared to the expected uniform coverage, shown as a rectangle. Whereas in sequencing data from the conventional protocol both the 5' and 3' ends were covered less efficiently, as is evident from the slope towards either end (FIG. 3e, left), the new protocol generated more reads mapping to the very 5' end of the transcript (FIG. 3e, right).
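Because the reads in this example are analyzed without trimming, bases added by the terminal transferase activity show up as extra bases next to the first templated position. The following Python sketch shows one simple way such additions could be counted for a read that reaches the transcript 5' end. It is an illustration only, not the analysis used here: it assumes the read has already been oriented like the transcript (5' to 3'), requires an exact match of a short anchor, and uses a hypothetical transcript sequence. The default upper bound of 15 mirrors the range of non-templated nucleotides mentioned in claim 2.

```python
def count_nontemplated(read, transcript_5prime, max_added=15, min_match=12):
    """Count leading read bases that are not explained by the transcript.

    Returns the smallest offset k such that read[k:k + min_match] equals
    the first min_match bases of the transcript, or None if no such
    offset exists within max_added bases.
    """
    for k in range(max_added + 1):
        anchor = read[k:k + min_match]
        if len(anchor) == min_match and transcript_5prime.startswith(anchor):
            return k
    return None

# Hypothetical 5' end of a transcript and two reads derived from it.
transcript = "GATTACAGGCTTAACCGGTTAGC"
read_with_addition = "CC" + transcript[:18]   # two extra bases at the end
read_without_addition = transcript[:18]

print(count_nontemplated(read_with_addition, transcript))
print(count_nontemplated(read_without_addition, transcript))
```

An exact-match anchor is the simplest possible criterion; mismatches near the transcript start or low-complexity 5' ends would need a tolerant alignment instead.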

Example 4: improvement of 3' end coverage by oligo-dT first strand synthesis primer titration.

The coverage of the 3' end of transcripts can be varied, preferably increased, by using first strand primers comprising oligo-dT (Pn comprising L1; e.g., SEQ ID No. 11: 5'-GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT + TTT TTT TTT TTT TTT TTT + V-3'). To enhance coverage of the 3' end, these primers are added to the mixture of random-priming SDS oligonucleotides, which, given a conventional distribution of random nucleotides, already contains a fraction of T-rich and T-only priming sequences. The change in sequencing depth at 3' end sites can be predicted from the chosen ratio between the random primers and the poly-dT L1 primer (FIG. 4). The ratio of random SDS primers to specific oligo-dT primers, as well as primer length and LNA content, may be varied and will determine the degree of over-representation of the 3' end.

The library was prepared by SDS + ligation using random-priming strand displacement stop primers, either alone or mixed with various amounts of oligo-dT first strand primer (SEQ ID No. 11). The resulting individual libraries were sequenced on a NextSeq500, the data were analyzed, and gene body coverage plots for the whole transcriptome were generated from the mapped reads using the geneBody_coverage.py Python script from RSeQC (FIG. 4c). 3' end coverage can be significantly increased by adding oligo-dT primer during reverse transcription.
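The idea behind a transcriptome-wide gene body coverage profile can be sketched in a few lines of Python: each transcript's per-base coverage is rescaled onto a common 5' to 3' axis, summed over transcripts and normalized. The code below is a simplified stand-in written for illustration, not the implementation used by the script named above; the function name, the sampling scheme and the toy coverage vectors are assumptions.

```python
def gene_body_profile(coverages, bins=100):
    """Average coverage over relative transcript position (5' -> 3').

    coverages: list of per-base coverage lists, one per transcript,
    ordered from the transcript 5' end to its 3' end. Transcripts
    shorter than `bins` bases are skipped.
    """
    totals = [0.0] * bins
    for cov in coverages:
        n = len(cov)
        if n < bins:
            continue
        for b in range(bins):
            # Sample the base at the start of each relative-position bin.
            totals[b] += cov[(b * n) // bins]
    peak = max(totals)
    # Scale so that the best-covered bin equals 1.0.
    return [t / peak for t in totals] if peak > 0 else totals

# Toy example: one transcript whose coverage rises towards the 3' end.
profile = gene_body_profile([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]], bins=5)
print(profile)
```

A flat profile indicates even coverage along transcripts, while a drop in the first or last bins corresponds to the loss of 5' or 3' end coverage discussed in this example.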

In addition, the coverage of individual endogenous genes was examined by way of example using a custom script. FIG. 4d shows the coverage of the housekeeping gene HSP90 obtained using a conventional RNA library preparation protocol (upper panel), in which the 5' and 3' ends are poorly represented. In contrast, the SDS + ligation protocol with oligo-dT titration showed improved 5' and 3' coverage (lower panel).

Example 5: The improvement in 5' and 3' coverage helps to identify the true transcript initiation and termination sites.

SDS + ligation libraries were prepared as described in examples 3 and 4 from ribo-depleted universal human reference RNA (Agilent Technologies, cat #740000) containing SIRV Set 3 internal reference control mix (Lexogen, cat #051.0N). Ribosomal RNA was removed using RiboCop (Lexogen, Cat. No. 037.96) as described. For comparison, two conventional library preparation methods were applied to the same ribo-depleted universal human reference RNA, following the manufacturers' instructions: TruSeq Stranded Total RNA Library Prep Human/Mouse/Rat, Illumina Cat. No. 20020596 or 20020597 (= conventional 1), or NEBNext Ultra II Directional RNA Library Prep Kit for Illumina, New England Biolabs, Cat. No. E7760S (= conventional 2). The resulting libraries were sequenced on a NextSeq500 and the data were analyzed. Coverage plots were generated for all detected ERCCs in SIRV Set 3. FIG. 5a) shows the normalized coverage of the cumulative mapped reads across the ERCCs at absolute nucleotide positions relative to the known transcription start site (TSS) and transcript end site (TES), both indicated by dashed lines. Samples from the SDS + ligation libraries showed significantly increased coverage at the 5' and 3' ends compared to both conventional library preparation methods, which showed reduced coverage at the 3' end and failed to resolve the true 5' end.

In addition, the coverage of the endogenous housekeeping gene gapdh was examined by way of example using custom scripts that evaluate the coverage of each individual gene. FIG. 5b) shows the gapdh coverage with introns in condensed (thumbnail) form. Reads mapped to gapdh (SEQ ID Nos. 43-67) were analyzed without trimming of added and mismatched bases. Nucleotides that match the consensus sequence (topmost row) are marked in black; nucleotides that deviate from the annotated consensus sequence or that result from non-templated addition are marked in gray. Based on the stacking of reads observed for the SDS + ligation library preparation samples, the true transcription start site can be determined and the transcript of interest re-annotated. In the example shown in FIG. 5b), the TSS is manually adjusted to position -15 (relative to the annotated +1 position). Likewise, the true transcription start and end sites of other transcripts of interest can be re-evaluated, allowing a comprehensive analysis of entire transcripts in a high-throughput NGS assay, including single-nucleotide resolution of the true TSS. This can be achieved simply by using the SDS + ligation library preparation method, without the need for specialized, more complex methods such as cap analysis of gene expression sequencing (CAGE-Seq) or low-throughput methods such as 5' RACE (rapid amplification of cDNA ends).
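The TSS in this example was adjusted manually from the stack of read starts. As a rough illustration of how such a stack could be evaluated programmatically, the Python sketch below picks the position at which most reads begin. It is a hypothetical helper, not the custom script used in the example; the minimum read threshold, the tie-breaking rule (most upstream position wins) and the toy positions are assumptions.

```python
from collections import Counter

def call_tss(read_starts, min_reads=3):
    """Return the position where most reads start, or None if unsupported.

    read_starts: 5' start positions of reads, given relative to the
    annotated +1 position (negative values lie upstream of it).
    """
    if not read_starts:
        return None
    counts = Counter(read_starts)
    # Highest count wins; ties are resolved in favour of the most
    # upstream (smallest) position.
    position, support = max(counts.items(), key=lambda kv: (kv[1], -kv[0]))
    return position if support >= min_reads else None

# Toy data: most reads stack 15 nt upstream of the annotated start.
starts = [-15, -15, -15, -15, -14, 1, 1, -15]
print(call_tss(starts))
```

Taking the mode of the start positions is only one possible criterion; a cluster-based call would be more appropriate for transcripts with dispersed start sites.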

Sequence listing

<110> LEXOGEN GMBH

<120> Nucleic acid amplification and identification method

<130> R 75980

<150> EP18212743

<151> 2018-12-14

<160> 67

<170> BiSSAP 1.3

<210> 1

<211> 41

<212> DNA

<213> Artificial sequence

<220>

<223> oligonucleotide

<220>

<221> modified_base

<222> 1

<223> /mod_base="OTHER"

/note="5' phosphorylated a or g or c or t"

<220>

<221> misc_difference

<222> 2,3,4,5,6

<223> /note="a or g or c or t"

<220>

<221> modified_base

<222> 41

<223> /mod_base="OTHER"

/note="3' inverted dT (reverse linkage)"

<400> 1

nnnnnnagat cggaagagca cacgtctgaa ctccagtcac n 41

<210> 2

<211> 35

<212> DNA

<213> Artificial sequence

<220>

<223> oligonucleotide

<220>

<221> modified_base

<222> 1

<223> /mod_base="OTHER"

/note="5' phosphorylated a or g or c or t"

<220>

<221> misc_difference

<222> 2,3,4,5,6,7,8,9,10

<223> /note="a or g or c or t"

<220>

<221> modified_base

<222> 35

<223> /mod_base="OTHER"

/note="3' inverted dT (reverse linkage)"

<400> 2

nnnnnnnnnn agatcggaag agcacacgtc tgaan 35

<210> 3

<211> 43

<212> DNA

<213> Artificial sequence

<220>

<223> oligonucleotide

<220>

<221> modified_base

<222> 1

<223> /mod_base="OTHER"

/note="5' phosphorylated a or g or c or t"

<220>

<221> misc_difference

<222> 2,3,4,5,6,7,8,9,10

<223> /note="a or g or c or t"

<220>

<221> modified_base

<222> 43

<223> /mod_base="OTHER"

/note="3' inverted dT (reverse linkage)"

<400> 3

nnnnnnnnnn agatcggaag agcgtcgtgt agggaaagag tgn 43

<210> 4

<211> 34

<212> DNA

<213> Artificial sequence

<220>

<223> oligonucleotide

<220>

<221> modified_base

<222> 1

<223> /mod_base="OTHER"

/note="5' phosphorylated a or g or c or t"

<220>

<221> misc_difference

<222> 2,3,4,5,6,7,8,9,10

<223> /note="a or g or c or t"

<220>

<221> modified_base

<222> 34

<223> /mod_base="OTHER"

/note="3' inverted dT (reverse linkage)"

<400> 4

nnnnnnnnnn agatcggaag agcgtcgtgt aggn 34

<210> 5

<211> 44

<212> DNA

<213> Artificial sequence

<220>

<223> oligonucleotide

<220>

<221> modified_base

<222> 1

<223> /mod_base="OTHER"

/note="5' phosphorylated a or g or c or t"

<220>

<221> misc_difference

<222> 2,3,4,5,6,7,8,9,10,11

<223> /note="a or g or c or t"

<220>

<221> modified_base

<222> 44

<223> /mod_base="OTHER"

/note="3' inverted dT (reverse linkage)"

<400> 5

nnnnnnnnnn nagatcggaa gagcgtcgtg tagggaaaga gtgn 44

<210> 6

<211> 45

<212> DNA

<213> Artificial sequence

<220>

<223> oligonucleotide

<220>

<221> modified_base

<222> 1

<223> /mod_base="OTHER"

/note="5' phosphorylated a or g or c or t"

<220>

<221> misc_difference

<222> 2,3,4,5,6,7,8,9,10,11,12

<223> /note="a or g or c or t"

<220>

<221> modified_base

<222> 45

<223> /mod_base="OTHER"

/note="3' inverted dT (reverse linkage)"

<400> 6

nnnnnnnnnn nnagatcgga agagcgtcgt gtagggaaag agtgn 45

<210> 7

<211> 36

<212> DNA

<213> Artificial sequence

<220>

<223> oligonucleotide

<220>

<221> modified_base

<222> 1

<223> /mod_base="OTHER"

/note="5' phosphorylated a or g or c or t"

<220>

<221> misc_difference

<222> 2,3,4,5,6,7,8,9,10,11,12

<223> /note="a or g or c or t"

<220>

<221> modified_base

<222> 36

<223> /mod_base="OTHER"

/note="3' inverted dT (reverse linkage)"

<400> 7

nnnnnnnnnn nnagatcgga agagcgtcgt gtaggn 36

<210> 8

<211> 34

<212> DNA

<213> Artificial sequence

<220>

<223> oligonucleotide

<220>

<221> modified_base

<222> 1

<223> /mod_base="OTHER"

/note="5' phosphorylated a or g or c or t"

<220>

<221> misc_difference

<222> 2,3,4,5,6,7,8,9,10

<223> /note="a or g or c or t"

<220>

<221> modified_base

<222> 34

<223> /mod_base="OTHER"

/note="3' inverted dT (reverse linkage)"

<400> 8

nnnnnnnnnn agatcggaag agcgtcgtgt aggn 34

<210> 9

<211> 36

<212> DNA

<213> Artificial sequence

<220>

<223> oligonucleotide

<220>

<221> modified_base

<222> 1

<223> /mod_base="OTHER"

/note="5' phosphorylated a or g or c or t"

<220>

<221> misc_difference

<222> 2,3,4,5,6,7,8,9,10,11,12

<223> /note="a or g or c or t"

<220>

<221> modified_base

<222> 36

<223> /mod_base="OTHER"

/note="3' inverted dT (reverse linkage)"

<400> 9

nnnnnnnnnn nnagatcgga agagcgtcgt gtaggn 36

<210> 10

<211> 33

<212> DNA

<213> Artificial sequence

<220>

<223> oligonucleotide

<220>

<221> modified_base

<222> 1

<223> /mod_base="OTHER"

/note="5' phosphorylated a or g or c or t"

<220>

<221> misc_difference

<222> 2,3,4,5,6,7,8,9,10,11,12

<223> /note="a or g or c or t"

<220>

<221> modified_base

<222> 33

<223> /mod_base="OTHER"

/note="g, 3' C3 spacer"

<400> 10

nnnnnnnnnn nntggaattc tcgggtgcca agn 33

<210> 11

<211> 53

<212> DNA

<213> Artificial sequence

<220>

<223> oligonucleotide

<400> 11

gtgactggag ttcagacgtg tgctcttccg atcttttttt tttttttttt ttv 53

<210> 12

<211> 75

<212> DNA

<213> Artificial sequence

<220>

<223> sequencing reads

<400> 12

cgatttctaa agggaattcg agctcgcatt ttgaaaattc tatggaagag ctagcatctc 60

tgacgaaaac agcag 75

<210> 13

<211> 68

<212> DNA

<213> Artificial sequence

<220>

<223> sequencing reads

<400> 13

cctttgggga attcgagctc gcattttgaa aattctatgg aagagctagc atctctgacg 60

aaaaccag 68

<210> 14

<211> 68

<212> DNA

<213> Artificial sequence

<220>

<223> sequencing reads

<400> 14

caaaacggga attcgagctc gcattttgaa aattctatgg aagagctagc atctctgacg 60

aaaacaac 68

<210> 15

<211> 66

<212> DNA

<213> Artificial sequence

<220>

<223> sequencing reads

<400> 15

agtggtggga attcgagctc gcattttgaa aattctatgg aagagctagc atctctgacg 60

aaatgc 66

<210> 16

<211> 70

<212> DNA

<213> Artificial sequence

<220>

<223> sequencing reads

<400> 16

caaaatggga attcgagctc gcattttgaa aattctatgg aagagctagc atctctgacg 60

aaaacagcgt 70

<210> 17

<211> 65

<212> DNA

<213> Artificial sequence

<220>

<223> sequencing reads

<400> 17

tcggacggga attcgagctc gcattttgaa aattctatgg aagagctagc atctcttacg 60

aaaac 65

<210> 18

<211> 66

<212> DNA

<213> Artificial sequence

<220>

<223> sequencing reads

<400> 18

ggggacggga attcgagctc gcattttgaa aattctatgg aagagctagc atctctgaca 60

aaaaca 66

<210> 19

<211> 73

<212> DNA

<213> Artificial sequence

<220>

<223> sequencing reads

<400> 19

cccgagggga attcgagctc gcattttgaa aattctatgg aagagctagc atctctgacg 60

aaaacggcag aca 73

<210> 20

<211> 71

<212> DNA

<213> Artificial sequence

<220>

<223> sequencing reads

<400> 20

aatacaggga attcgagctc gcattttgaa aattctatgg aagagctagc atctctgacg 60

aaaacagaga g 71

<210> 21

<211> 70

<212> DNA

<213> Artificial sequence

<220>

<223> sequencing reads

<400> 21

caaaatggga attcgagctc gcattttgaa aattctatgg aagagctagc atctctgacg 60

aaaacagcgt 70

<210> 22

<211> 74

<212> DNA

<213> Artificial sequence

<220>

<223> sequencing reads

<400> 22

atttggggaa ttcgagctcg cattttgaaa attctatgga agagctagca tctctgacga 60

aaacagcagg cgga 74

<210> 23

<211> 75

<212> DNA

<213> Artificial sequence

<220>

<223> sequencing reads

<400> 23

aatggggaat tcgagctcgc attttgaaaa ttctatggaa gagctagcat ctctgacgaa 60

aacagcaatc ggaaa 75

<210> 24

<211> 75

<212> DNA

<213> Artificial sequence

<220>

<223> sequencing reads

<400> 24

aaggggaatt cgagctcgca ttttgaaaat tctctggaag agctagcatc tctgacgaaa 60

acagcagaac agaaa 75

<210> 25

<211> 75

<212> DNA

<213> Artificial sequence

<220>

<223> sequencing reads

<400> 25

ggggaattcg agctcgcatt ttgaaaatac tatggaagag ctagcatctc tgacgaaaac 60

agcagacgaa aaagt 75

<210> 26

<211> 61

<212> DNA

<213> Artificial sequence

<220>

<223> sequencing reads

<400> 26

gggaattcga gctcgcattt tgaaaattct atggaagagc tagcatctct gactactaca 60

g 61

<210> 27

<211> 60

<212> DNA

<213> Artificial sequence

<220>

<223> sequencing reads

<400> 27

aagatctcgc attttgaaaa ttctatggaa gagctagcat ctctgacgaa aacagcagaa 60

<210> 28

<211> 74

<212> DNA

<213> Artificial sequence

<220>

<223> sequencing reads

<400> 28

cgagctcgca ttttgaaaat tctatggaag agctagcatc tctgacgaaa acagcagacg 60

gaaaaggaga gacc 74

<210> 29

<211> 75

<212> DNA

<213> Artificial sequence

<220>

<223> sequencing reads

<400> 29

cgagctcgca ttttgaaaat tctatggaag agctagcatc tctgacgaaa acagcagacg 60

gaaaagtact gacca 75

<210> 30

<211> 75

<212> DNA

<213> Artificial sequence

<220>

<223> sequencing reads

<400> 30

cgagctcgca ttttgaaaat tctatggaag agctagcacc tctgacgaaa acagcagacg 60

gaaaaggact gaaaa 75

<210> 31

<211> 75

<212> DNA

<213> Artificial sequence

<220>

<223> sequencing reads

<400> 31

cgagctcgca ttttgaaaat tctatggaag agctagcatc tctgacgaaa acagcagacg 60

gaaaagtact gagcc 75

<210> 32

<211> 75

<212> DNA

<213> Artificial sequence

<220>

<223> sequencing reads

<400> 32

cgagctcgca ttttgaaaat tctatggaag agctagcacc tctgacgaaa acagcagacg 60

gaaaaggact gaaaa 75

<210> 33

<211> 75

<212> DNA

<213> Artificial sequence

<220>

<223> sequencing reads

<400> 33

cgagctcgca ttttgaaaat tctatggaag agctagcatc tctgacgaaa acagcagacg 60

gaaaagtact gactc 75

<210> 34

<211> 75

<212> DNA

<213> Artificial sequence

<220>

<223> sequencing reads

<400> 34

cgagctcgca ttttgaaaat tctatggaag agctagcatc tctgacgaaa acagcagacg 60

gaaaagtact gacca 75

<210> 35

<211> 75

<212> DNA

<213> Artificial sequence

<220>

<223> sequencing reads

<400> 35

cgagctcgca ttttgaaaat tctatggaag agctagcatc tctgacgaaa acagcagacg 60

gaaaagtaca aaacc 75

<210> 36

<211> 75

<212> DNA

<213> Artificial sequence

<220>

<223> sequencing reads

<400> 36

gagctcgcat tttgaaaatt ctatggaaga gctagcatct ctgacgaaaa cagcagacgg 60

aaaagtagct gacca 75

<210> 37

<211> 75

<212> DNA

<213> Artificial sequence

<220>

<223> sequencing reads

<400> 37

agctcgcatt ttgaaaattc tatggaagag ctagcatctc tgacgaaaac agcagacgga 60

aaagtactga ccaga 75

<210> 38

<211> 75

<212> DNA

<213> Artificial sequence

<220>

<223> sequencing reads

<400> 38

gctcgcattt tgaaaattct atggaagagc tagcatctct gacgaaaaca gcagacggaa 60

aagtacagac ccaac 75

<210> 39

<211> 74

<212> DNA

<213> Artificial sequence

<220>

<223> sequencing reads

<400> 39

cgcattttga aaattctatg gaagagctag catctctgac gaaaacagca gacggaaaag 60

tactgaccag ctag 74

<210> 40

<211> 73

<212> DNA

<213> Artificial sequence

<220>

<223> sequencing reads

<400> 40

cgcattttga aaattctatg gaagagctag catctctgac gaaaacagca gacggaaaag 60

tactgaccat gca 73

<210> 41

<211> 74

<212> DNA

<213> Artificial sequence

<220>

<223> sequencing reads

<400> 41

cgcattttga aaattctatg gaagagctag catctctgac gaaaacagca gacggaaaag 60

tactgaccag ccac 74

<210> 42

<211> 73

<212> DNA

<213> Artificial sequence

<220>

<223> sequencing reads

<400> 42

cgcattttga aaattctatg gaagagctag catctctgac gaaaacagca gacggaaaag 60

tactgaccag cat 73

<210> 43

<211> 129

<212> DNA

<213> Artificial sequence

<220>

<223> consensus sequence

<400> 43

ataaattgag cccgcagcct cccgcttcgc tctctgctcc tcctgttcga cagtcagccg 60

catcttcttt tgcgtcgcca gccgagccac atcgctcaga caccatgggg aaggtgaagg 120

tcggagtca 129

<210> 44

<211> 104

<212> DNA

<213> Artificial sequence

<220>

<223> sequencing reads

<400> 44

acgtgtgctc gtcactacct ccccgggtgc tctctgctcc tcctgttcga cagtcagccg 60

catcttcttt tgcgtcgcca gccgagccac atcgctcaga cacc 104

<210> 45

<211> 129

<212> DNA

<213> Artificial sequence

<220>

<223> sequencing reads

<400> 45

gaaaattgag cccgcagcct cccgcttcgc tctctgctcc tcctgttcga cagtcagccg 60

catcttcttt tgcgtcgcca gccgagccac atcgctcaga caccatgggg aaggtgaagg 120

tcggagtca 129

<210> 46

<211> 122

<212> DNA

<213> Artificial sequence

<220>

<223> sequencing reads

<400> 46

aaaatgcatt agaggaactg taaaaatctg ctcctccgtt cgacagtcag ccgcatcttc 60

ttttgcgtcg ccagccgagc cacatcgctc agacaccatg gggaaggtga aggtcggagt 120

ca 122

<210> 47

<211> 122

<212> DNA

<213> Artificial sequence

<220>

<223> sequencing reads

<400> 47

ctagaggaga ttggccaacg agattcactg gactcctgtt cgacagtcag ccgcatcttc 60

ttttgcgtcg ccagccgagc cacatcgctg agacaccatg gggaaggtga aggtcggagt 120

ca 122

<210> 48

<211> 118

<212> DNA

<213> Artificial sequence

<220>

<223> sequencing reads

<400> 48

ttttctgaac gctctggccg ctctgctcct cctgttcgac agtcagccgc ctcttcgttt 60

gcgtcgccag ccgagccaca tagctcagac accaagggga aggtgaaggt cggagtca 118

<210> 49

<211> 114

<212> DNA

<213> Artificial sequence

<220>

<223> sequencing reads

<400> 49

accgagcacc agctttctcc gacgccggga agtcgcagtc agccgcatct tcttttgcgt 60

cgccagccga gccacatcgc tcagacacca tggggaaggt gaaggtcgga gtca 114

<210> 50

<211> 112

<212> DNA

<213> Artificial sequence

<220>

<223> sequencing reads

<400> 50

aacgtgtgct ggctctctgc tcctcctgtt cgacagtcag ccgcatcttc ttttgcgtcg 60

ccagccgagc cacatcgctc agacaccatg gggaaggtga aggtcggagt ca 112

<210> 51

<211> 110

<212> DNA

<213> Artificial sequence

<220>

<223> sequencing reads

<400> 51

ttgctttggg ctctctgctc ctcctgttcg acagtcagcc gcatcttctt ttgcgtcgcc 60

agccgagcca catcgctcag acaccatggg gaaggtgaag gtcggagtca 110

<210> 52

<211> 106

<212> DNA

<213> Artificial sequence

<220>

<223> sequencing reads

<400> 52

gatgggctct ctgctcctcc tgttcgacag tcagccgcat cttcttttgc gtcgccagcc 60

gagccacatc gctcagacac catggggaag gtgaaggtcg gagtca 106

<210> 53

<211> 106

<212> DNA

<213> Artificial sequence

<220>

<223> sequencing reads

<400> 53

atagggctct ctgctcctcc tgttcgacag tcagccgcat cttcttttgc gtcgccagcc 60

gagccacatc gctcagacac catggggaag gcgaaggtcg gagtca 106

<210> 54

<211> 106

<212> DNA

<213> Artificial sequence

<220>

<223> sequencing reads

<400> 54

atctggctct ctgctcctcc tgttcgacag tcagccgcat cttcttttgc gtcgccagcc 60

gagccacatc gctgagacac catggggaag gtgaaggtcg gagtca 106

<210> 55

<211> 106

<212> DNA

<213> Artificial sequence

<220>

<223> sequencing reads

<400> 55

acgtggctct ctgctcctcc tgttcgacag tcagccgcat cttcttttgc gtcgccagcc 60

gagccacatc gctcagacac catggggaag gtgaaggtcg gagtca 106

<210> 56

<211> 106

<212> DNA

<213> Artificial sequence

<220>

<223> sequencing reads

<400> 56

acgtggctct ctgctcctcc tgttcgacag tcagccgcat cttcttttgc gtcgccagcc 60

gagccacatc gctcagacac catggggaag gtgaaggtcg gagtca 106

<210> 57

<211> 97

<212> DNA

<213> Artificial sequence

<220>

<223> sequencing reads

<400> 57

ttgcggctct ctgctcctcc tgttcgacag tcagccgcat cttcttttgc gtcgccagcc 60

gagccacatc gctcagacac catggggaag cggaaca 97

<210> 58

<211> 82

<212> DNA

<213> Artificial sequence

<220>

<223> sequencing reads

<400> 58

gggggctctc tgctcctcct gttcgacagt cagccgcatc ttcttttgcg tcgccagccg 60

agccacatcg ctcagacccc ac 82

<210> 59

<211> 105

<212> DNA

<213> Artificial sequence

<220>

<223> sequencing reads

<400> 59

aatggctctc tgctcctcct gttcgacagt cagccgcatc ttcttttgcg tcgccagccg 60

agccacatcg ctcagacacc atggggaagg tgaaggtcgg agtca 105

<210> 60

<211> 105

<212> DNA

<213> Artificial sequence

<220>

<223> sequencing reads

<400> 60

atcggctctc tgctcctcct gttcgacagt cagccgcatc ttcttttgcg tcgccagccg 60

agccacatcg ctcagacacc atggggaagg tgaaggtcgg agtca 105

<210> 61

<211> 105

<212> DNA

<213> Artificial sequence

<220>

<223> sequencing reads

<400> 61

attggctctc tgctcctcct gttcgacagt cagccgcatc ttcttttgcg tcgccagccg 60

agccacatcg ctcagacacc atggggaagg tgaaggtcgg agtca 105

<210> 62

<211> 57

<212> DNA

<213> Artificial sequence

<220>

<223> sequencing reads

<400> 62

gttggctctc tgctcctcct gttcgacagt cagccgcatc ttcttttgca atcgcca 57

<210> 63

<211> 104

<212> DNA

<213> Artificial sequence

<220>

<223> sequencing reads

<400> 63

atggctctct gctcctcctg ttcgacagtc agccgcatct tcttttgcgt cgccagccga 60

gccacatcgc tcagacacca tggggaaggt gaaggtcgga gtca 104

<210> 64

<211> 104

<212> DNA

<213> Artificial sequence

<220>

<223> sequencing reads

<400> 64

ttggctctct gctcctcctg ttcgacagtc agccgcatct tcttttgcgt cgccagccga 60

gccacatcgc tcagacacca tggggaaggt gaaggtcgga gtca 104

<210> 65

<211> 82

<212> DNA

<213> Artificial sequence

<220>

<223> sequencing reads

<400> 65

ggggctctct gctcctcctg ttcgacagtc agccgcatct tcttttgcgt cgccagccga 60

gccacatcgc tcagaacagc ca 82

<210> 66

<211> 104

<212> DNA

<213> Artificial sequence

<220>

<223> sequencing reads

<400> 66

gtggctctct gctcctcctg ttcgacagtc agccgcatct tcctttgcgt cgccagccga 60

gccacatcgc tcagacacca tggggaaggt gaaggtcgga gtca 104

<210> 67

<211> 103

<212> DNA

<213> Artificial sequence

<220>

<223> sequencing reads

<400> 67

gggctctctg ctcctcctgt tcgacagtca gccgcatctt cttttgcgtc gccagccgag 60

ccacatcgct cagacaccat ggggaaggtg aaggtcggag tca 103

Detailed Description

The present invention provides a method for generating labeled amplified fragments of a nucleic acid template, wherein a recognition sequence is introduced as a label prior to amplification of the fragments. The template nucleic acid may be present in multiple copies. According to the present invention, fragmentation typically occurs during amplification, i.e., when portions of a template are amplified, one or more (typically multiple) fragments are generated from a template of a given length. When multiple fragments are generated at the same time from multiple copies of a template, and the primers used to synthesize these complementary nucleic acid fragments anneal at different positions on different template copies, the sequences of the generated fragments may overlap. Although the inventive concepts are applicable to a single fragment per template, it is preferred to generate multiple fragments from one template molecule, typically by using multiple primers that bind to the template at different positions.

The present invention improves upon existing methods by combining recognition sequences with the generated fragments. The recognition sequence may be introduced together with the primer or after extension (synthesis of a complementary nucleic acid fragment). The recognition sequence is then introduced by ligating the extension product to an adaptor nucleic acid. Surprisingly, the ligation reaction works with a single-stranded recognition sequence, i.e., a recognition sequence portion with a non-hybridized (or "free") 5' end can be ligated to the 3' end of the extension product. The ligation reaction usually involves a phosphate residue, which is preferably provided at the 5' end of the recognition sequence. Surprisingly, the adaptor nucleic acid need not be held close to the 3' end of the extension product by hybridization to the template or to a terminator sequence (as shown in the examples). Such proximity may be supported by providing the adaptor nucleic acid with a complementary sequence portion (downstream of the recognition sequence, i.e., in the 3' direction) for hybridization to an oligonucleotide bound to the template (referred to herein as an extension terminator or simply a terminator, which is a further primer when more than one fragment is generated per template). However, direct proximity is not required, and ligation may result from a simple, undirected diffusion process. In particular, it has been shown that adaptor nucleic acids can be ligated to extension products that have reached the 5' end of the template nucleic acid, where no further downstream extension terminator is present. This ligation reaction can occur directly at this end of the extension product, or after the polymerase has added one or more non-templated nucleotides by its terminal transferase activity (which some polymerases have).
This ligation to extension products that correspond to the 5' end of the template has surprising advantages: it increases the occurrence of fragments at the 5' end of the template and thus substantially increases sequence coverage there, which is lacking in prior art methods. In previous methods, the distribution of fragment start sites was constant along the template, which resulted in high coverage in the middle of the template and very low coverage, approaching zero, at the 3' and 5' ends (a consequence of template copy number, average fragment size, and sequencing read length). The method of the present invention mitigates this effect at the 5' end. In addition, embodiments are provided that increase coverage of the 3' end of the template.
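The end effect described here can be made concrete with a toy model. The sketch below is an illustration under simplifying assumptions that are not taken from the source: all fragments have the same length, exactly one fragment starts at every possible position, and reads cover whole fragments. The function name `coverage` and the numbers are hypothetical.

```python
def coverage(template_len, frag_len):
    """Per-base coverage when one fragment starts at every possible position."""
    cov = [0] * template_len
    for start in range(template_len - frag_len + 1):
        for pos in range(start, start + frag_len):
            cov[pos] += 1
    return cov

cov = coverage(template_len=20, frag_len=5)
print(cov)
# [1, 2, 3, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 3, 2, 1]
```

Even with perfectly uniform start sites, the terminal positions are covered only once while the middle is covered five times; fragments that are additionally captured at the template's 5' end raise coverage exactly where this model is weakest.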

The amplified fragments (generated as one fragment molecule per extension reaction) are usually further amplified, i.e., copied. This means that the ligated recognition sequence is amplified and thus copied as well. Typically, the recognition sequences are so diverse that, through random selection, they uniquely identify individual fragments that carry the same sequence but originate from different copies of one template. In all embodiments of the invention, the recognition sequence helps to determine whether sequenced fragment copies stem from different copies of the template, in which case they carry different recognition sequences, or from the same template molecule, in which case they were generated only in said further amplification.
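How diverse the recognition sequences need to be can be estimated with a short calculation. The sketch below assumes, for illustration only, that every degenerate position is drawn uniformly from a, c, g and t and that all degenerate positions contribute to the label; the adaptor oligonucleotides in the sequence listing carry between 6 and 12 such positions at their 5' end. The function name `p_all_distinct` is hypothetical.

```python
def p_all_distinct(n_random_bases, n_copies):
    """Probability that n_copies identical fragments all receive different labels.

    Assumes each of the n_random_bases positions is a uniform draw from 4 bases.
    """
    n_labels = 4 ** n_random_bases
    if n_copies > n_labels:
        return 0.0
    p = 1.0
    for i in range(n_copies):
        p *= (n_labels - i) / n_labels
    return p

for n in (6, 10, 12):
    print(n, 4 ** n, p_all_distinct(n, n_copies=10))
```

With 6 degenerate positions there are 4096 possible labels; with 12 there are 16,777,216, so the chance that two identical fragments from different template copies receive the same label drops accordingly.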

Also provided is a method for generating a labeled amplified fragment of a nucleic acid template, comprising the steps of: providing the template nucleic acid; annealing at least one oligonucleotide primer to the template nucleic acid; extending the at least one oligonucleotide primer in a template-specific manner to produce an extension product; providing an adaptor nucleic acid comprising a recognition sequence, wherein the recognition sequence does not hybridize to the template; and ligating the adaptor nucleic acid (preferably at its 5' end) to the 3' end of the extension product, thereby generating a labeled amplified fragment. This method is essentially the same as the one described above, and all preferred embodiments described herein are applicable, except that terminators need not be used. Multiple primers may be used, which need not have a terminating function. The adaptor nucleic acid can still reach the extension product by diffusion and be ligated to it. For ligation, the extension product may remain hybridized to the template or may be single-stranded. However, it is preferred to use a terminator.

The method of the invention begins with the step of providing the template nucleic acid. The template molecule is made available to the skilled person for use in the method of the invention. Typically, the template is provided in a sample of nucleic acid molecules. Such template nucleic acids may be isolated from a cell (e.g., a eukaryotic or prokaryotic cell). In a particularly preferred embodiment, the template is RNA. Total RNA or a fraction of the RNA of a cell, such as mRNA or rRNA-depleted RNA, may be provided. Amounts of RNA that can be easily handled are, for example, 0.1 to 500 ng, 1 to 200 ng, 10 to 100 ng, or 0.1 to 100 ng of rRNA-depleted RNA, or 0.1 to 1000 ng of total RNA. In some embodiments, the amount of total RNA may be, for example, 10 pg, and the amount of non-rRNA RNA may be less than 1 pg. The primers, terminators and adaptors are preferably DNA.

The method further comprises annealing at least one oligonucleotide primer to the template nucleic acid. Oligonucleotide primers are oligonucleotide molecules (preferably DNA) that anneal to the template and are capable of initiating an extension reaction, as is standard practice in the art. Oligonucleotide primers (or simply "primers") preferably anneal to the template over at least a portion of their length, e.g., over a length of 4-30 nucleotides (nt). Annealing is performed by hybridization. The primer may have a portion that does not anneal to the template. Such other portions may be used to anneal to other oligonucleotides and/or, when the amplified fragment is further amplified to produce copies thereof, for the further amplification described above. Such other portions may therefore have sequences that bind to other primers used in the amplification/replication reaction. Such portions are also referred to as primer adaptor sequences. The primer adaptor sequence is preferably 4 nt to 30 nt in length.

Returning to the main inventive method, the at least one oligonucleotide primer is extended in a template-specific manner, thereby generating an extension product (complementary sequence). Such reactions are standard in the art and typically employ a polymerase. If the template is RNA, an RNA-dependent polymerase, such as a reverse transcriptase, is used. If the template is DNA, a DNA-dependent polymerase is used. The extension reaction stops when the extension product reaches a nucleic acid extension terminator that is annealed to the template nucleic acid downstream of the extension product, or when the extension product reaches the 5' end of the template nucleic acid. Evidently, the reaction stops at the 5' end of the template because the template is then used up. Some polymerases may add one or more non-templated nucleotides to the extension product at this point, which is acceptable and may even be beneficial for selecting 5' coverage products in the sequence analysis of the generated labeled amplified fragments. However, the addition of such non-templated nucleotides is not necessary. Stopping the reaction at an extension terminator annealed downstream of the extension product is described in detail in WO 2013/038010 A2 (incorporated herein by reference). In that document, the extension terminator is referred to as an "oligonucleotide terminator" or "further oligonucleotide primer". Herein, the term "nucleic acid extension terminator", or simply "extension terminator" or "terminator", is used. The terminator of the invention may also be a primer, which then corresponds to the "further oligonucleotide primer" of WO 2013/038010 A2.
In essence, such a terminator stops the extension reaction of an upstream extension product by providing a barrier on the template (such a terminator is therefore located downstream of the extension product). The terminator anneals or hybridizes to the template; the extension reaction does not displace the terminator and is thus terminated. Read-through, i.e., displacement of the terminator, is a side reaction. Measures to prevent terminator displacement are described in detail in WO 2013/038010 A2, and these measures can be used according to the present invention. Briefly, the preferred way of preventing terminator displacement (due to strand displacement activity) is to use an extension terminator comprising one or more modified nucleotides that raise the melting temperature of the annealing sequence (the portion of the terminator that anneals/hybridizes to the template). The melting temperature is raised relative to unmodified, native nucleic acids, such as DNA or RNA. Such modifications are, for example, LNA (locked nucleic acid), ZNA (zip nucleic acid), 2'-fluoro nucleosides/nucleotides or PNA (peptide nucleic acid). Other measures are the use of polymerases without strand displacement activity or the use of intercalators. Preferably, 1, 2, 3, 4, 5 or 6 nucleotides are modified. Preferably, the modified nucleotides are located on the 5' side of the terminator sequence that hybridizes to the template. The terminator may have further, non-hybridizing portions in its 5' direction, for example amplification sequences that serve the same function as the "primer adaptor sequences" of the oligonucleotide primers in the further amplification/replication reactions described above. Such further portions are preferably used for binding/hybridizing to adaptor nucleic acids (see below). The adaptor may bind/hybridize to a "primer adaptor sequence" or to another portion of an oligonucleotide terminator.
In a preferred embodiment, the extension terminator and preferably also the oligonucleotide primer comprise one or more modified nucleotides which increase the melting temperature at which the annealing sequence (linker) anneals to the template.
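The stopping rule can be summarized in a small sketch. It is an idealized illustration rather than a model of the actual reaction: oligonucleotides are given as non-overlapping half-open intervals in template coordinates (0 being the template's 5' end), every annealed oligonucleotide acts both as a primer and as a terminator for the primer upstream of it, and read-through is ignored. The function name `extension_products` and the coordinates are hypothetical.

```python
def extension_products(oligos):
    """Return (stop, primer_start, reason) for each annealed oligonucleotide.

    oligos: (start, end) half-open intervals on the template, 0 = template 5' end.
    Extension runs from the primer toward lower coordinates and synthesizes the
    complement of template positions [stop, primer_start).
    """
    products = []
    ordered = sorted(oligos)
    for i, (start, _end) in enumerate(ordered):
        if i == 0:
            # Nothing is annealed further downstream: run off the template end.
            products.append((0, start, "template 5' end"))
        else:
            # Stop at the 3' side of the next annealed oligonucleotide.
            products.append((ordered[i - 1][1], start, "terminator"))
    return products

for product in extension_products([(40, 50), (10, 20), (80, 90)]):
    print(product)
# (0, 10, "template 5' end")
# (20, 40, 'terminator')
# (50, 80, 'terminator')
```

Only the product closest to the template's 5' end stops because the template is used up; all others stop at a terminator, which is where an adaptor hybridized to that terminator would be positioned for ligation.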

Preferably, primers and terminators that are not bound to the template are removed in a purification step after the extension reaction. That is, the extension product hybridized to the template is purified and retained for further processing. Other embodiments of the invention are performed in a single volume without purification. Such purification can be carried out by methods known in the art, such as immobilizing the template or extension product on a solid phase (e.g., beads) and washing to remove any unbound primers and terminators. An exemplary method is solid phase reversible immobilization (SPRI; DeAngelis et al., Nucleic Acids Research, 1995, 23(22): 4742-.

The method of the invention comprises the step of providing an adaptor nucleic acid comprising a recognition sequence at its 5' end. Other sequence tags, such as sequences used for amplification (amplification sequences), may also be part of the adaptor nucleic acid. The 5' end is the end that is ligated to the 3' end of the extension product, which is thereby labeled with the recognition sequence. The recognition sequence should not hybridize to the extension terminator or the template. Thus, it is generally single-stranded rather than hybridized. In this context, the term "recognition sequence" is used for the 5' end portion of the adaptor nucleic acid that does not hybridize or anneal, even if only part of the recognition sequence is later used for recognition. Other portions of the adaptor nucleic acid may hybridize or anneal to the extension terminator. The adaptor nucleic acid may also comprise a complementary primer sequence that serves as a target for the further amplification reactions of the labeled amplified fragments described above (referred to as adaptor sequences). Hybridization of the recognition sequence to the extension terminator can be prevented by selecting a recognition sequence that is not complementary to the extension terminator. The recognition sequence may likewise be selected so that it is not complementary to the template. This is easily done if the template sequence is known. If the template sequence is unknown but of biological origin, the recognition sequence can be selected from sequences that are not found, or are rare, in biological nucleic acids. Such sequences are known from "spike-in" nucleic acids, such as ERCC (External RNA Controls Consortium) sequences or SIRV (Spike-In RNA Variant) sequences (see, e.g., ERCC, BMC Genomics 2005, 6: 150; Jiang et al., Genome Res. 2011, 21(9): 1543-1551; WO 2016/005524 A1, all incorporated herein by reference).
If the recognition sequence anneals to the template in a side reaction, this will generally prevent further ligation, so that no labeled fragment is produced and the event does not appear in the results. Such side reactions are tolerable but not preferred. The simplest and most preferred way to prevent annealing of the recognition sequence (and preferably of the entire adaptor nucleic acid) to the template is to provide the adaptor nucleic acid only after the extension reaction. After the extension reaction, the template is in double-stranded form with the extension product (as well as the primer and terminator). In this form, the adaptor nucleic acid can no longer bind to the template because the template is already covered by its hybridization partners. In this preferred method, the recognition sequence may even have a sequence that is complementary to the template and would be able to hybridize with it, but is prevented from doing so by the order of the method steps. Thus, the template sequence need not be considered in this embodiment.
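Selecting a recognition sequence that is not complementary to a known terminator (or template) sequence can be checked computationally. The following is a minimal sketch, assuming plain Watson-Crick pairing between concrete a/c/g/t sequences and ignoring mismatches and thermodynamics; the function names and the example sequences are hypothetical, and the source specifies no threshold for how long a complementary stretch may be.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence consisting of a, c, g and t."""
    comp = {"a": "t", "c": "g", "g": "c", "t": "a"}
    return "".join(comp[base] for base in reversed(seq.lower()))

def longest_complementary_stretch(recognition, other):
    """Length of the longest part of `recognition` that can base-pair with `other`."""
    target = reverse_complement(other)
    rec = recognition.lower()
    best = 0
    for i in range(len(rec)):
        # Only test substrings longer than the best match found so far.
        for j in range(i + best + 1, len(rec) + 1):
            if rec[i:j] in target:
                best = j - i
            else:
                break
    return best

print(longest_complementary_stretch("acgtac", "gtacgt"))  # 6
print(longest_complementary_stretch("aaaaaa", "cccccc"))  # 0
```

A candidate recognition sequence with a long complementary stretch to the terminator would be rejected in favor of one with little or none; with a degenerate (random) recognition sequence, the check would have to be applied to the members of the pool.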

The most preferred option for preventing annealing of the recognition sequence to the terminator is for a portion of the terminator and a portion of the adaptor to carry sequences complementary to each other. When the adaptor approaches the terminator, these complementary sequences hybridize first and the recognition sequence remains single-stranded.

The methods of the invention further comprise ligating the 5' end of the adaptor nucleic acid to the 3' end of the extension product, thereby generating a labeled amplified fragment. Ligation is usually performed using a ligase. The type of ligase depends on the nature of the oligonucleotides to be ligated and may be selected by the skilled practitioner. Exemplary ligases include DNA ligases and RNA ligases. The ligase may in particular be an RNA ligase having DNA ligation activity, such as T4 RNA ligase 2. Further ligases are T4 DNA ligase, T4 RNA ligase 1, DNA ligase I, DNA ligase III, DNA ligase IV, E. coli DNA ligase, Ampligase DNA ligase, truncated Rnl2, truncated Rnl2 K227Q, Thermus scotoductus ligase, Methanobacterium thermoautotrophicum RNA ligase, thermostable App-ligase (NEB), Chlorella virus DNA ligase or SplintR ligase. The ligase may be a single-strand ligase or a double-strand ligase. It is also possible to combine ligases for different reactions in one reaction volume in order to perform parallel reactions, for example when different extension products and/or adaptor nucleic acid molecules are present and need to be ligated simultaneously. Preferred combinations are a DNA ligase with an RNA ligase, or a single-strand ligase with a double-strand ligase. The ligase reaction typically involves a phosphate residue, which is preferably provided at the 5' end of the recognition sequence of the adaptor nucleic acid. Other 5' moieties may also be used for ligation, for example adenylated ends. Such ends can be ligated with a truncated ligase or an App-ligase.

The resulting tagged amplified fragments have the following structure in the 5' to 3' direction: primer sequence, extension product sequence, adaptor sequence (in which the recognition sequence borders the extension product sequence). The primer sequence may have a "primer adapter sequence" and/or the adaptor sequence may have an "adaptor adapter sequence". The product of the method of the invention, i.e. the resulting labelled amplified fragment, is preferably further amplified. Such further amplification produces copies of the resulting tagged amplified fragments by methods known in the art, such as PCR (polymerase chain reaction) or linear amplification. Such further amplification typically involves the use of further primers which bind to the labelled amplified fragments, preferably on the adapter sequences, especially at the ends of these fragments, i.e. within the primer sequence and within part of the adaptor sequence, particularly preferably at the 5' end of the primer sequence and the 3' end of the adaptor sequence. As described above for these primers and adaptors, they may have regions of known sequence to which such primers for further amplification bind ("primer adapter sequences" and "adaptor adapter sequences"). These regions (or "portions") may be long and specific, and do not bind to the template; they may be universal primer binding sites, i.e. not selective between different adaptors/primers, as opposed to the recognition sequences, which are preferably unique. The recognition sequence provides a unique marker for the amplified fragment and is therefore also referred to herein as a Unique Molecular Identifier (UMI). The recognition sequence makes it possible to identify duplicates arising from such further amplification (e.g., PCR) and to reduce the effect of sequence-dependent amplification bias.
In a preferred embodiment, the recognition sequence is an oligonucleotide (having a predominantly random nucleotide distribution at each position) that is ligated to the extension product (fragment) prior to further amplification. If the recognition sequences are evenly distributed and their number is much larger than the number of identical extension products, it is unlikely that the same recognition sequence will be ligated to two identical extension products (different copies). In this case, the number of distinct recognition sequences after further amplification is the same as the number before further amplification. The recognition sequences of the present invention may also be used as described for UMIs by Sena et al. (Scientific Reports (2018) 8: 13121). In next-generation sequencing methods and further sequence analysis, the entire sequence of a tagged fragment, or some portion of it, may be considered a "read". One or more reads are assembled during data analysis to obtain a merged sequence of the template. Data analysis may subsequently also include quantitative analysis of template molecules and fragments, which can reveal when a particular template copy is over-represented or under-represented, thereby suggesting, for example, different expression rates of RNA splice variants. In a preferred embodiment, the invention further comprises the step of assembling the sequences of the unique amplified fragments, wherein the tag is used to identify the unique amplified fragments. The different recognition sequences in the amplified tagged fragments identify the unique amplified fragments. The recognition sequences enable duplicates to be identified and removed during assembly or any other data analysis step.
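The duplicate identification described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each read has already been reduced to a (mapping position, recognition sequence, read sequence) tuple; this input layout and the function names are hypothetical and are not part of the described method.

```python
from collections import defaultdict

def collapse_duplicates(reads):
    """Group reads by (mapping position, recognition sequence).

    `reads` is an iterable of (position, umi, sequence) tuples. This tuple
    layout is a hypothetical input format chosen for illustration only.
    Returns a dict mapping each (position, umi) key to its list of sequences.
    """
    groups = defaultdict(list)
    for position, umi, sequence in reads:
        groups[(position, umi)].append(sequence)
    return dict(groups)

def count_unique_fragments(reads):
    """Count distinct (position, umi) pairs, i.e. fragments present
    before further amplification under the assumptions stated above."""
    return len(collapse_duplicates(reads))

example_reads = [
    (100, "ACGTAC", "GATTACA"),
    (100, "ACGTAC", "GATTACA"),  # same position and same UMI: amplification duplicate
    (100, "TTGCAA", "GATTACA"),  # same position, different UMI: separate fragment
    (250, "ACGTAC", "CCGGTTA"),  # different position
]
print(count_unique_fragments(example_reads))  # prints 3
```

Exact matching of recognition sequences is the simplest possible choice; a real analysis would probably also need to tolerate sequencing errors within the recognition sequence itself.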

In a preferred embodiment, the recognition sequence is 3 nt (nucleotides) or longer in length, preferably 3 nt to 20 nt, particularly preferably 4 nt to 15 nt or 5 nt to 10 nt, for example 3 nt, 4 nt, 5 nt, 6 nt, 7 nt, 8 nt, 9 nt, 10 nt, 11 nt, 12 nt, 13 nt, 14 nt, 15 nt or longer in length. Such a length is short enough to be handled easily in an efficient ligation reaction, yet, owing to the possible nucleotide arrangements, provides a sufficiently large number of different recognition sequences to enable the desired identification of the individual amplified fragments, preferably by tagging them uniquely.

In a preferred embodiment, the nucleotide polymerase is allowed to add non-templated nucleotides to the extension products when the extension products reach the 5' end of the template nucleic acid, preferably by terminal transferase activity of the polymerase, and/or preferably wherein 1-15 non-templated nucleotides are added to at least 70% of the extension products. As described above, this non-templated nucleotide addition is characteristic of some polymerases (see Chen et al., Biotechniques 2001, 30(3): 574-582). This activity is most prominent in reverse transcriptases, such as M-MLV (Moloney murine leukemia virus) reverse transcriptase or AMV (avian myeloblastosis virus) reverse transcriptase. These non-templated nucleotides are generally of any nucleotide type (A, T(U), G, C) and may occur randomly. This means that extension products reaching the 5' end of different template copies may share the same sequence corresponding to that 5' end, but may then be extended by different, seemingly random, further nucleotides that are the products of such non-templated addition. These differing additions can be used to identify the exact location of the 5' end of the template sequence, namely at the transition between the templated, recurring sequence and the non-templated random additions. After the non-templated nucleotides, the tagged fragment continues with the recognition sequence (which can be used as described above). If the recognition sequence is (also) random, the non-templated random nucleotides may be considered part of the recognition sequence. The position of the recognition sequence relative to the constant portion of the adapter sequence unambiguously identifies the recognition sequence.
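As a hedged illustration of the last point, the sketch below locates the recognition sequence in a read from its position immediately 5' of the constant adapter portion. The adapter sequence `AGATCGGAAG`, the fixed recognition sequence length and the function name are assumptions made for this example only; the text above does not specify them.

```python
def extract_recognition_sequence(read, adapter_constant, umi_length):
    """Split a read (5'->3') into (upstream_part, recognition_sequence).

    Assumes the read has the layout
        [primer + extension product (+ non-templated nt)] [recognition sequence] [constant adapter]
    and that the recognition sequence has a fixed, known length.
    Uses the first occurrence of the constant adapter portion.
    Returns None if the constant portion is absent or too close to the read start.
    """
    idx = read.find(adapter_constant)
    if idx < umi_length:  # also covers idx == -1 (adapter not found)
        return None
    recognition = read[idx - umi_length:idx]
    upstream = read[:idx - umi_length]
    return upstream, recognition

# Hypothetical read: 9 nt upstream part, 6 nt recognition sequence, constant adapter
read = "GATTACAGG" + "ACGTAC" + "AGATCGGAAG"
print(extract_recognition_sequence(read, "AGATCGGAAG", 6))
```

Any non-templated nucleotides stay in the upstream part in this sketch. If they are to be treated as part of the recognition sequence, the boundary has to be determined from the template sequence instead.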

In a particularly preferred embodiment, a plurality of adaptor nucleic acids are provided and used in the ligation step. The plurality of adaptors can have different recognition sequences. This allows unique identification of the adaptors and of the generated fragments to which they are ligated. Preferably, at least 10, more preferably at least 50, or even 100 or more or 200 or more adapter nucleic acids with different recognition sequences are provided and used in the ligation step. The more fragments with the same sequence are expected to be generated, the more adapters with different recognition sequences are preferably used. The expected copy number of the template may be estimated from the type of sample (e.g., whole-cell RNA or whole-cell mRNA (transcriptome)), the amount of RNA, and the complexity of the sample (how many different transcript variants are targeted: the entire transcriptome, or only selected genes or transcripts, as is the case in gene panels), etc.
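The relationship between the number of different recognition sequences and the expected copy number can be made concrete with a small calculation. The sketch below assumes that every copy receives a recognition sequence drawn uniformly and independently from the pool (a simplification; real ligation may be biased), and the function name is hypothetical.

```python
def prob_all_distinct(num_sequences, num_copies):
    """Probability that `num_copies` identical extension products all
    receive different recognition sequences, assuming each one is drawn
    uniformly and independently from `num_sequences` possibilities."""
    if num_copies > num_sequences:
        return 0.0  # pigeonhole: at least two copies must share a sequence
    p = 1.0
    for i in range(num_copies):
        p *= (num_sequences - i) / num_sequences
    return p

# Compare a small and a large pool of recognition sequences for 10 identical copies
for pool in (50, 4096):
    print(pool, round(prob_all_distinct(pool, 10), 3))
```

Under these assumptions a larger pool makes repeated use of the same recognition sequence markedly less likely, which is consistent with the preference for more adapters when more identical fragments are expected.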

Particularly preferably, the recognition sequence is a random sequence. A "random sequence" is understood to mean a mixture of different sequences which are highly diverse owing to the random synthesis of at least a part of the recognition sequence. The random sequences may cover the entire combinatorial space of the four natural nucleotides (A, T(U), G, C). The random sequence may cover 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 or more nucleotides selected randomly from A, G, C or T(U). With regard to the ability of a nucleotide sequence to hybridize, T and U are used interchangeably herein. The number of all possible combinations of the random sequence portion is m^n, where m is the number of nucleotide types used (preferably all four of A, G, C, T(U)) and n is the number of random nucleotides. Thus, a random hexamer (in which each possible sequence is represented) comprises 4^6 = 4096 different sequences. The recognition sequence should not bind to the template. In all cases, but especially for random recognition sequences, it is preferred to add the adapter nucleic acid after the extension reaction. Once the extension product has reached the terminator (or the template end), substantially the entire template is in double-stranded form with the extension product, so the adaptor nucleic acid is prevented from binding to the template.
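The number of possible random sequences (m to the power n, with m nucleotide types and n random positions) can be checked with a short sketch. The function names are hypothetical, and DNA letters are used for the alphabet.

```python
from itertools import product

def count_random_sequences(n, alphabet="ACGT"):
    """Number of possible sequences of length n: m**n, where m = len(alphabet)."""
    return len(alphabet) ** n

def enumerate_random_sequences(n, alphabet="ACGT"):
    """Lazily yield every possible sequence of length n over the alphabet."""
    return ("".join(letters) for letters in product(alphabet, repeat=n))

print(count_random_sequences(6))  # 4**6, prints 4096

# Cross-check the formula by explicit enumeration for a short length
assert sum(1 for _ in enumerate_random_sequences(3)) == count_random_sequences(3)
```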

In a further embodiment of the invention, the primer and terminator are selected to bind to one or more specific target sequences of interest in the template nucleic acid (the terminator being located downstream of the extension product), thereby obtaining an extension sequence of a specific template portion. Such targeting of specific regions is preferably used when transcripts (RNA) or genes (gDNA) are used as templates. The recognition sequences are particularly useful in gene panels, for example when analyzing different types of templates for sequence variants, such as splice variants or other variations of the template sequence.

In a particularly preferred embodiment of all embodiments and aspects of the invention, the extension terminator has primer activity and is also extended during the extension step. This means that more than one primer is used and that most primers have terminator function (i.e. they prevent displacement, see above). The use of multiple primers means that one template will produce multiple fragments, i.e., coverage is increased. Although each of the multiple primers binds to one template, they will provide full coverage when different primers bind to different positions of the template. The method of the invention using multiple primers (preferably also acting as terminators) increases coverage, since a new extension product starts at the position on the template where the upstream extension product has just stopped. This results in many fragments covering the entire template. Furthermore, this also means that terminators/primers (the terms are used synonymously in this embodiment) are used that bind to different parts of the template molecule. In general, binding to the template molecule is determined by the annealing sequences of the primer and the terminator. The annealing sequence hybridizes to the template and can be altered to bind to a different location on the template. Preferably, at least 9, at least 10, more preferably at least 49, at least 50, for example 100 or more or 200 or more, extension terminators are used, and they have different annealing sequences for annealing to the template. They may therefore anneal to different positions of the template nucleic acid. Preferably, the annealing sequence is a random sequence. Random sequences are described above with respect to the recognition sequence, and those descriptions apply equally to the annealing sequence of the primer, the annealing sequence of the terminator, and the annealing sequence of a terminator having primer function. Preferably, the random sequence of the annealing sequence may cover 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 or more nucleotides randomly selected from A, G, C or T(U).

Preferably, the adaptor nucleic acid is bound, hybridized or not bound to the extension terminator. Such binding, e.g., achieved by a chemical reaction, complex formation or hybridization, helps to position the adapter nucleic acid near the 3' end of the upstream extension product without its recognition sequence itself hybridizing to the terminator or template; surprisingly, such hybridization of the recognition sequence is not necessary for the ligation reaction to proceed normally. Preferably, when the adapter nucleic acid is bound or hybridized to the extension terminator, the recognition sequence can be selected independently of the annealing sequence used to anneal the extension terminator to the template. Both the annealing sequence and the recognition sequence may be random sequences, preferably selected independently of each other. This is generally ensured when the nucleic acid portions of the terminator and the adapter that bind each other are universal sequences, i.e., any adapter can bind to any terminator (which is preferred for all embodiments of the invention), and the terminator is not already bound by another adapter nucleic acid, as is the case when the adapter is provided only after the extension reaction. In other embodiments, or in other parts of the reaction, they do not bind. For example, when the extension reaction reaches the 5' end of the template, there is typically no terminator hybridized there, since the terminator requires at least a minimal annealing sequence on the template, which moves the most downstream termination position several nucleotides upstream from the 5' end. Adapters can also be ligated to extension products without binding or hybridizing to an extension terminator. However, it is preferred in all embodiments that, when the adapter nucleic acid is ligated to the extension product, the extension terminator and/or the extension product (particularly preferably its 3' end) is still hybridized to the template. It is also preferred that the adapter nucleic acid hybridizes to the extension terminator, particularly preferably after the extension reaction and/or, particularly preferably, for ligation.

In preferred embodiments of the methods and kits of the invention, the oligonucleotide primers (and preferably, but not necessarily, also the extension terminators) comprise universal amplification sequences ("primer adapter sequences", see above), and/or the adaptor nucleic acids comprise universal adaptor amplification sequences ("adaptor adapter sequences", see above). Such amplification sequences or "adaptors" may be used to bind primers for further amplification as described above. A universal sequence means that it is the same for all primers, for all terminators or for all adapters, respectively. This allows the same primer type to bind to these oligonucleotides. In a particularly preferred embodiment, the universal amplification sequence (adaptor sequence) is also the same for the primer, terminator and adaptor, i.e. a single further amplification primer can bind equally to the oligonucleotide primer, the extension terminator and the adaptor nucleic acid. This facilitates simple handling, since only one type of primer is required for further amplification. In other embodiments, the primers, terminators and adaptors have different universal amplification sequences (adaptor sequences), i.e., one further amplification primer may bind only to the oligonucleotide primer, another further amplification primer may bind only to the extension terminator, and yet another further amplification primer may bind only to the adaptor nucleic acid. Within each of these groups, the amplification sequence is preferably universal. This still allows simple handling but gives better control, since the primers at the two ends of the tagged fragment will be different and can be specifically selected.

In a preferred embodiment, specific oligonucleotide primers are used to select and anneal to selected sequences of the template (preferably at the 3' end of the template). In the case of mRNA or any other type of RNA comprising an oligo(A) tail, such a 3' end may be annealed to a complementary oligonucleotide primer, e.g. a primer comprising an oligo(dT) annealing sequence complementary to the oligo(A) tail. Preferably, at least one oligonucleotide primer comprises an annealing sequence that anneals to a selected sequence of the template (which may be at or near the 3' end of the template). Such a selected sequence may be any known sequence of the template, such as the oligo(A) tail, but any other sequence, once known, may also be used. Preferably, the oligonucleotide primer for the selected sequence comprises an oligo(dT) sequence that anneals to an oligo(A) sequence in the template. Preferably, the oligo(dT) sequence is followed by one or more 3' anchor nucleotides that are not part of the oligo(dT) sequence. This allows proper positioning at, and binding to, the 5' end of the oligo(A) sequence of the template. The anchor nucleotide anneals to the next non-A nucleotide (e.g., T, G, C) on the template immediately adjacent to the oligo(A) portion. If the next non-A nucleotide is not known, a mixture of oligonucleotide primers with different anchor nucleotides may be used, for example three oligonucleotide primers, each having one non-T (e.g., A, G, C) anchor nucleotide (complementary to the next non-A (e.g., T, G, C) nucleotide on the template). In a preferred embodiment, two anchor nucleotides are used. The anchor nucleotide next to the non-T nucleotide may be selected from any nucleotide type (A, T(U), G, C), as it does not border the oligo(dT) sequence. The specific oligonucleotide primer need not be a terminator and need not contain sequences that hybridize to an adaptor, since these are not required if the specific oligonucleotide primer anneals at or near the 3' end of the template, meaning that no upstream extension product reaches its position. Of course, such sequences and/or terminator functions may be present for convenience or uniformity in the production of primers/terminators.
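The anchored oligo(dT) primer mixture described above can be enumerated with a minimal sketch. The oligo(dT) length of 18 is an arbitrary assumption for illustration, the function name is hypothetical, and any 5' adapter portion of the primer is omitted.

```python
from itertools import product

def anchored_oligo_dt_primers(dt_length, num_anchors=1):
    """Return primer annealing sequences (5'->3'): an oligo(dT) stretch followed
    by 3' anchor nucleotides. The first anchor is non-T (A, C or G) so that the
    primer is positioned at the boundary of the oligo(A) stretch; any further
    anchor may be any nucleotide."""
    if num_anchors < 1:
        raise ValueError("at least one anchor nucleotide is required")
    first_anchor = "ACG"    # non-T, adjacent to the oligo(dT) stretch
    other_anchors = "ACGT"  # any nucleotide type
    pools = [first_anchor] + [other_anchors] * (num_anchors - 1)
    return ["T" * dt_length + "".join(anchors) for anchors in product(*pools)]

print(anchored_oligo_dt_primers(18, 1))            # three primers, one per non-T anchor
print(len(anchored_oligo_dt_primers(18, 2)))       # 3 * 4 = 12 primers, prints 12
```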

Preferably, the ligation reaction is performed in the presence of a crowding agent. Crowding agents increase the probability of interaction between adaptor and extension product by reducing the effective reaction volume, see Zimmerman et al., Proc Natl Acad Sci U S A. 1983; 80(19): 5852-6. Other crowding agents are disclosed in, for example, US 5,554,730, US 8,017,339 and WO 2013/038010 A2. Preferably, the crowding agent is a macromolecule, polymer or polymer-containing complex, such as a polyalkylene glycol, preferably PEG; octoxynol or Triton X; or a polysorbate, preferably Tween. In a preferred embodiment, the crowding agent is used at a concentration of 5% to 35% (v/v), particularly preferably 10% to 25% (v/v). Preferably, the molecular weight of the crowding agent is 200-35000 g/mol, preferably 1000-10000 g/mol. Particularly preferred are polyalkylene glycols, such as PEG, especially those having the stated molecular weights. The crowding agent is preferably provided in the kit of the invention, preferably in a ligation buffer.

Other components of the kit include buffers, salts, enzyme cofactors and metals (e.g., Mn2+ and Mg2+ for polymerases and ligases), solvents and containers.

The invention provides kits for performing the methods of the invention. Such kits may comprise any of the compounds and tools described thus far. Preferably, the kit comprises (i) at least one oligonucleotide primer capable of hybridizing to the template nucleic acid and priming the extension reaction at its 3' end, (ii) one or more extension terminators capable of hybridizing to the template nucleic acid, preferably capable of priming the extension reaction at their 3' end, (iii) one or more adaptor nucleic acids comprising a recognition sequence at their 5' end, wherein said recognition sequence does not hybridize to the extension terminator, preferably wherein the adaptor nucleic acid is bound, hybridized or not bound to the extension terminator, (iv) a reverse transcriptase, and (v) an oligonucleotide ligase. Components (iv) and (v) may be optional, as they are widely available in laboratories independently of the invention. An important part is the design of the adapter/terminator, in particular the recognition sequence on the adapter. Preferably, a plurality of adaptors having different recognition sequences are provided in the kit, as described above. All of these components of the kit have been described above, and any preferred embodiment thereof applies equally to the kit. Preferably, the kit comprises at least 10, more preferably at least 50, adaptor nucleic acids with different recognition sequences. The reasons for such preferred embodiments have already been given above. Preferably, the oligonucleotide primer comprises an annealing sequence that anneals to the template and comprises an oligo(dT) sequence that anneals to an oligo(A) sequence in the template, preferably wherein the oligo(dT) sequence is followed by one or more 3' anchor nucleotides that are not part of the oligo(dT) sequence. The kit may also comprise a solid phase for purification, e.g. beads, preferably magnetic beads (see above for details of the method, and for applicability to and embodiments of the kit components).

All of the preferred embodiments described above may be combined. One such method uses a random primer (with an adaptor sequence) which is also a terminator (also known as a "strand displacement terminator primer"). After the extension reaction, the extension product (hybridized to the template) is preferably purified to remove unbound primers and terminators. Adapters with linkers and recognition sequences are then ligated to the extension products. The recognition sequence has a random sequence, preferably 4-12 nt in length. One preferred option is to use a mixture of recognition sequences of different lengths, since ligases tend to introduce ligation bias by favoring certain 5' nucleotides at the last and penultimate positions. Since such bias can affect read quality in sequencing, such mixtures can equalize the nucleotide distribution when sequencing across the junction region. However, a variable recognition sequence provides a less biased ligation than any defined sequence and at the same time also serves as a UMI (unique molecular identifier). The recognition sequence (e.g. UMI) makes it possible to determine whether sequencing reads that have the same sequence (apart from minor sequencing errors) or map to the same position in the reference sequence originate from different template molecules, or originate from the same template molecule and are simply the result of further amplification (PCR duplicates). The adapter, when present, hybridizes to the primer.

Recognition sequences (e.g., UMI) can also distinguish true SNPs (single nucleotide polymorphisms) between individuals from errors (mutations) introduced, and subsequently amplified, during reverse transcription or early PCR cycles. All copies of such a randomly occurring and amplified error should carry the same identifier, whereas "true SNPs" in the sample carry a variety of different identifiers. Alternatively, RNA editing events that introduce modified bases, which cause misincorporation during RT and thus errors, can be quantified more reliably.
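A minimal sketch of this distinction counts how many different recognition sequences support each observed variant. The input layout, the labels and the threshold `min_umis` are assumptions made for illustration; the text above does not specify a cut-off.

```python
from collections import defaultdict

def classify_variants(observations, min_umis=2):
    """Classify variants by the number of distinct recognition sequences
    (UMIs) among the reads supporting them.

    `observations` is an iterable of (position, variant_base, umi) tuples,
    one per read carrying the variant (a hypothetical input format).
    A variant seen with at least `min_umis` different UMIs is labelled
    "candidate_true_variant"; otherwise "possible_amplified_error".
    """
    umis_per_variant = defaultdict(set)
    for position, base, umi in observations:
        umis_per_variant[(position, base)].add(umi)
    return {
        variant: ("candidate_true_variant" if len(umis) >= min_umis
                  else "possible_amplified_error")
        for variant, umis in umis_per_variant.items()
    }

observations = [
    (120, "T", "ACGTAC"), (120, "T", "GGCATT"), (120, "T", "TTAGCA"),  # three different UMIs
    (305, "G", "CCATGA"), (305, "G", "CCATGA"), (305, "G", "CCATGA"),  # one UMI, amplified
]
print(classify_variants(observations))
```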

Recognition sequences (e.g., UMI) can also be used to reliably determine and quantify allele frequencies, molecular markers, and causative mutations of genetic diseases in populations. Preferably, a DNA template is used in this embodiment.

A further preferred combination is a method of the invention wherein at least one, preferably at least 9, extension terminators have primer activity and are also extended during the extension step, and at least two, preferably at least 10, adaptor nucleic acids comprising different recognition sequences are used, thereby generating at least two, preferably at least 10, different labeled fragments, optionally amplifying said labeled fragments, and further comprising assembling the sequences of the unique amplified fragments, wherein the labels are used to identify the unique amplified fragments. Different labels in the amplified tagged fragments can be used to identify unique amplified fragments.

A further preferred method uses a terminator having a primer function. Preferably, a plurality of such primers are used. In such a method, without distinguishing between a terminator and a primer, one embodiment of the present invention can be defined as follows: a method for generating a tagged amplified fragment of a nucleic acid template, comprising providing the template nucleic acid; annealing a plurality of oligonucleotide primers to the template nucleic acid; extending the oligonucleotide primers in a template-specific manner to produce a plurality of extension products, wherein the extension reaction terminates when an extension product reaches the 5' end of a template nucleic acid or reaches an oligonucleotide primer that is annealed to the template nucleic acid downstream of the extension product; providing a plurality of adaptor nucleic acids comprising a recognition sequence at their 5' ends, wherein said recognition sequence does not hybridize to said oligonucleotide primer or said template; ligating the plurality of adaptor nucleic acids at their respective 5' ends to the 3' ends of the extension products, thereby generating a plurality of labeled amplified fragments. This is a preferred embodiment that may be combined with the claims and any of the specifically described aspects set forth above. All the above descriptions of the terminator apply to the primers in the present embodiment, since these primers are terminators having primer function. The term "plurality" is used for oligonucleotide primers, extension products (as a result of primer extension), adaptor nucleic acids and labeled amplified fragments (as a result of extension and adaptor ligation). As noted, the number of some of these pluralities is a result of the method. The amounts of oligonucleotide primer and adaptor nucleic acid may be selected as described above. Their amounts can be selected independently, but are preferably about the same so that they can be paired with a given extension product. Preferably, the plurality is, for example, 10 or more, 50 or more, 100 or more, 200 or more, etc. Many different oligonucleotide primers and adaptor nucleic acids can be used: the oligonucleotide primers to bind to a plurality of different positions on the template, and the adaptor nucleic acids to impart different recognition sequences, preferably unique recognition sequences, to the tagged amplified fragments. Although in this embodiment the primer and terminator are the same, it is also possible to add special primers that do not require (but may have) terminator function, such as 3' end specific primers, like the oligo(A)-targeting primers described above.

The invention is further illustrated in the following figures and examples, but is not limited to these embodiments of the invention.
