Prokaryotic full-length initial transcript library building method suitable for PacBio sequencing platform and application

Document No.: 1350625. Publication date: 2020-07-24.

Abstract: This invention, "Prokaryotic full-length initial transcript library building method suitable for PacBio sequencing platform and application", was created by Fang Tao on 2020-04-09. The invention provides a prokaryotic full-length initial transcript library construction method suitable for the PacBio sequencing platform, and its application. The method comprises, in order: adding a tail to the 3' end of prokaryotic total RNA, adding a biotin-labeled cap to the 5' end, capturing the biotin-labeled initial transcripts, and finally synthesizing full-length cDNA of the initial transcripts and constructing a PacBio library from it. The tail added at the 3' end of the prokaryotic transcripts serves as the primer site for first-strand synthesis during full-length cDNA amplification, which makes full-length transcriptome sequencing possible for prokaryotes and extends the range of application of the technology. In addition, adding a biotin-labeled cap to the 5' end of the initial transcripts and then enriching the target full-length initial transcripts removes rRNA from the total RNA and greatly increases the proportion of useful data. During full-length cDNA synthesis, an additional nucleotide sequence is added to the end of the first strand as the binding site for the second-strand primer, which avoids the bias of the SMARTer technology and makes sequencing more uniform.

1. A prokaryotic full-length initial transcript library construction method suitable for a PacBio sequencing platform, characterized by comprising the following steps: 1) adding a tail to the 3' end of prokaryotic total RNA; 2) adding a biotin-labeled cap to the 5' end of the 3'-tailed prokaryotic total RNA; 3) capturing the biotin-labeled initial transcripts; 4) synthesizing full-length cDNA of the initial transcripts; 5) constructing a PacBio library from the full-length cDNA.

2. The method for constructing a prokaryotic full-length initial transcript library suitable for the PacBio sequencing platform according to claim 1, wherein the step of synthesizing the initial transcript full-length cDNA comprises the following steps:

(1) synthesizing the first strand of the full-length cDNA of the initial transcript;

(2) adding an additional nucleotide sequence at the end of the first strand of the cDNA as a primer binding site for the second strand;

(3) synthesizing a second strand of the full-length cDNA of the initial transcript;

(4) amplifying and purifying the full-length cDNA.

3. The method for constructing a prokaryotic full-length initial transcript library suitable for the PacBio sequencing platform according to claim 2, wherein in step (2) terminal transferase (TdT) is used to add the additional nucleotide sequence to the end of the first cDNA strand.

4. The method for constructing a prokaryotic full-length initial transcript library suitable for the PacBio sequencing platform according to claim 3, wherein the reaction system for terminal base addition using TdT is as follows: purified first-strand product of the initial transcript full-length cDNA, 39 μL; TdT Buffer (10X), 5 μL; CoCl2 solution (2.5 mM), 5 μL; 10 mM dCTP or dGTP, 0.5 μL; Terminal Transferase (20 units/μL), 0.5 μL.

5. The method for constructing the prokaryotic full-length initial transcript library suitable for the PacBio sequencing platform according to claim 1, wherein the PacBio library construction of the full-length cDNA comprises the following steps:

(1) repairing damage in the full-length cDNA of the initial transcript;

(2) repairing the ends;

(3) ligating sequencing adapters to the end-repaired product;

(4) digesting with exonuclease and purifying.

6. The method of claim 1, wherein the library construction further comprises library quality inspection to determine library size.

7. A full-length transcriptome sequencing method suitable for prokaryotes, comprising constructing a library of prokaryotic full-length initial transcripts using the library construction method of any one of claims 1-6, followed by sequencing on a high-throughput PacBio sequencing platform.

Technical Field

The invention relates to the technical field of high-throughput sequencing library construction, in particular to a prokaryotic full-length initial transcript library construction method suitable for a PacBio sequencing platform and application.

Background

With the development of second-generation (next-generation) sequencing technology, transcriptome sequencing of biological samples has become increasingly mature. Researchers sequence the transcriptomes of their samples to explore differences in gene expression between strains, environmental treatments, developmental stages, tissues, or cells. Transcriptome sequencing on a next-generation platform can also be used to study gene structure, for example cSNPs, SSRs, and ORF prediction. The advantage of second-generation sequencing is its high throughput, which allows differences in gene expression between samples to be measured accurately. Because read length is limited, however, mRNA must be fragmented during library construction and reassembled during data analysis, which introduces errors. In addition, information on the 5' and 3' ends is incomplete in the sequencing results, and the full-length mRNA sequence often cannot be obtained.

To address these shortcomings of transcriptome sequencing on second-generation platforms, the third-generation sequencing platform from the US company PacBio can sequence the full-length transcriptome of eukaryotes directly. Thanks to the long reads of PacBio third-generation sequencing, mRNA sequences can be read end to end without fragmentation, yielding full-length transcripts. This provides accurate alternative splicing and transcription start site information, accurately distinguishes isoforms, homologous genes, gene family members, and alleles, and supplies an excellent sequence library for downstream molecular cloning experiments. For specific germplasm resources, novel genes can be discovered and identified. Sequencing the full-length transcriptome of a sample yields a large number of full-length mRNA sequences from the species, which gives researchers better technical support. Many research projects, particularly those on gene function, require the full-length mRNA sequence of a target gene; for species without a known genome sequence, the full-length mRNA has to be obtained by RACE (rapid amplification of cDNA ends). Full-length transcriptome sequencing on a third-generation platform solves this problem well: it yields a large number of full-length eukaryotic mRNA sequences, from which researchers can select genes of interest for further study.

Third-generation full-length transcriptome sequencing means sequencing the mRNA of a species on the PacBio third-generation sequencing platform. Thanks to its very long reads, the reverse-transcribed full-length cDNA can be sequenced directly, without fragmenting the RNA molecules, to obtain high-quality full-length transcript sequences from the 5' end to the 3' poly(A) tail. This allows accurate analysis of isoform expression, alternative splicing, fusion genes, homologous genes, superfamily genes, alleles, and so on. By combining a multi-fragment library screening technique with an average read length of 10-15 kb, third-generation full-length transcriptome sequencing analyzes transcripts without assembly and overcomes the short assemblies and incomplete transcript structures of conventional second-generation Unigenes. The technique takes its name from the fact that high-quality, complete transcript information for a single RNA molecule, from the 5' end to the 3' end, is obtained directly.

Compared with second-generation transcriptome sequencing, third-generation full-length transcriptome sequencing has the following advantages: a) its very long reads (average read length 20-30 kb, maximum 300 kb) can read the complete full-length transcript information of eukaryotes in a single pass; b) no fragmentation or assembly is needed, which avoids assembly errors; c) the complete and accurate transcript information from full-length sequencing can be combined with second-generation data to identify specific expression and to quantify gene and transcript expression more accurately; d) for species with a reference genome, full-length transcript information can correct genome misassemblies, identify new transcripts and genes more accurately, and support the analysis of gene fusion events.

Current full-length transcriptome sequencing relies on the poly(A) structure of eukaryotic mRNA: mature eukaryotic mRNA carries a poly(A) tail at its 3' end, so eukaryotic full-length cDNA can be synthesized directly with the SMARTer PCR cDNA Synthesis Kit from Clontech. Prokaryotic transcripts lack a poly(A) tail, so the eukaryotic full-length transcriptome workflow cannot be applied to them. Moreover, because of the limited template-switching efficiency of the SMARTer technology, sequencing is biased at the 5' end of transcripts, and some full-length transcripts are difficult to sequence. To address these shortcomings, this patent develops a full-length initial transcript library construction technique for prokaryotes.

Disclosure of Invention

To address the technical problems in the prior art, the invention provides a prokaryotic full-length initial transcript library construction method suitable for the PacBio sequencing platform, and its application. The method overcomes the low template-switching efficiency and the transcript sequencing bias of the SMARTer technology, makes full-length transcriptome sequencing possible for prokaryotes, and extends the range of application of full-length transcriptome sequencing.

To achieve this purpose, the invention adopts the following technical scheme:

The first purpose of the invention is to provide a prokaryotic full-length initial transcript library construction method suitable for a PacBio sequencing platform, which comprises the following steps: 1) adding a tail to the 3' end of prokaryotic total RNA; 2) adding a biotin-labeled cap to the 5' end of the 3'-tailed prokaryotic total RNA; 3) capturing the biotin-labeled initial transcripts; 4) synthesizing full-length cDNA of the initial transcripts; 5) constructing a PacBio library from the full-length cDNA.

Further, the synthesis of the full-length cDNA of the initial transcript comprises the following steps:

(1) synthesizing the first strand of the full-length cDNA of the initial transcript;

(2) adding an additional nucleotide sequence at the end of the first strand of the cDNA as a primer binding site for the second strand;

(3) synthesizing a second strand of the full-length cDNA of the initial transcript;

(4) amplifying and purifying the full-length cDNA.

Further, in step (2), terminal transferase (TdT) is used to add the additional nucleotide sequence to the end of the first cDNA strand.
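The strand logic of steps (1) to (3) can be illustrated with a toy string model. This is only a sketch under stated assumptions, not the patented protocol: the text does not say which nucleotide forms the 3' tail on the RNA, so a poly(A) tail is assumed here; the dC homopolymer follows the "dCTP or dGTP" option mentioned for TdT; the tail lengths, the example sequence, and all function names are hypothetical.

```python
# Toy model: all sequences are written 5'->3'.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def tail_rna(rna: str, length: int = 8) -> str:
    """Add a 3' tail to the RNA (ASSUMPTION: poly(A); the source only says 'tailing')."""
    return rna + "A" * length


def first_strand(tailed_rna: str) -> str:
    """First-strand cDNA: DNA reverse complement of the tailed RNA."""
    return reverse_complement(tailed_rna.replace("U", "T"))


def tdt_tail(cdna: str, base: str = "C", length: int = 5) -> str:
    """TdT extends the 3' end of the first strand with a homopolymer (dC or dG)."""
    return cdna + base * length


def second_strand(tailed_first_strand: str) -> str:
    """Second strand primed on the TdT tail: reverse complement of the tailed first strand."""
    return reverse_complement(tailed_first_strand)


transcript = "AUGGCUAAC"  # hypothetical initial transcript
cdna1 = tdt_tail(first_strand(tail_rna(transcript)))
cdna2 = second_strand(cdna1)
print(cdna1)
print(cdna2)
```

In this model the second strand starts with the complement of the TdT tail, which is where the second-strand primer would anneal, and it retains the complete transcript body up to the 3' tail.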

Further, the reaction system for terminal base addition using TdT is as follows: purified first-strand product of the initial transcript full-length cDNA, 39 μL; TdT Buffer (10X), 5 μL; CoCl2 solution (2.5 mM), 5 μL; 10 mM dCTP or dGTP, 0.5 μL; Terminal Transferase (20 units/μL), 0.5 μL.
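As a quick arithmetic check of the reaction system listed above, the sketch below sums the component volumes and derives the final concentrations by simple dilution. The volumes and stock concentrations are taken from the text; the derived final values are not stated in the source and are only computed here, and the dictionary layout and function names are hypothetical.

```python
import math

# Volumes (in microliters) and stock concentrations as listed in the text.
COMPONENTS = {
    "first-strand cDNA product": {"volume_ul": 39.0},
    "TdT Buffer": {"volume_ul": 5.0, "stock": 10.0, "unit": "X"},
    "CoCl2": {"volume_ul": 5.0, "stock": 2.5, "unit": "mM"},
    "dCTP or dGTP": {"volume_ul": 0.5, "stock": 10.0, "unit": "mM"},
    "Terminal Transferase": {"volume_ul": 0.5, "stock": 20.0, "unit": "units/uL"},
}


def total_volume(parts: dict) -> float:
    """Total reaction volume in microliters."""
    return sum(p["volume_ul"] for p in parts.values())


def final_concentration(parts: dict, name: str) -> float:
    """Final concentration after dilution: stock * component volume / total volume."""
    part = parts[name]
    return part["stock"] * part["volume_ul"] / total_volume(parts)


print("total volume (uL):", total_volume(COMPONENTS))
for name, part in COMPONENTS.items():
    if "stock" in part:
        print(name, round(final_concentration(COMPONENTS, name), 3), part["unit"])

# Total enzyme in the reaction: 20 units/uL * 0.5 uL.
enzyme = COMPONENTS["Terminal Transferase"]
print("TdT units per reaction:", enzyme["stock"] * enzyme["volume_ul"])
```

The listed volumes add up to a 50 μL reaction, so the 10X buffer ends up at 1X.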

Further, the PacBio library construction of the full-length cDNA comprises the following steps:

(1) repairing damage of the full-length cDNA of the initial transcript;

(2) repairing the ends;

(3) ligating sequencing adapters to the end-repaired product;

(4) digesting with exonuclease and purifying.

Further, the library construction also includes library quality testing to determine library size.

The second purpose of the invention is to provide a full-length transcriptome sequencing method suitable for prokaryotes, which comprises the steps of constructing a library of the full-length initial transcripts of the prokaryotes by adopting the library construction method described in any one of the above methods, and then sequencing by adopting a high-throughput PacBio sequencing platform.

Compared with the prior art, the invention has the beneficial effects that:

(1) The prokaryotic full-length initial transcript library construction method provided by the invention uses prokaryotic total RNA as the sample and adds a tail to the 3' end of the prokaryotic transcripts to serve as the primer site for first-strand synthesis during the subsequent full-length cDNA amplification. This makes prokaryotic transcripts compatible with the eukaryotic library construction workflow, enables full-length transcriptome sequencing of prokaryotes, and extends the range of application of full-length transcriptome sequencing.

(2) Total RNA contains more than 90% rRNA, which is useless as sequencing data. The application exploits the fact that prokaryotic initial transcripts carry a triphosphate group at the 5' end, whereas degraded transcripts and mature ribosomal RNA carry only a single phosphate group at the 5' end. A biotin-labeled cap structure is added to the 5' ends that carry a triphosphate group, and the full-length initial transcripts are then captured with streptavidin magnetic beads. This removes ribosomal RNA and also ensures that the captured RNA is full-length.

(3) After the first cDNA strand is synthesized by reverse transcription, the invention uses the TdT enzyme to add an additional nucleotide sequence to its end as the binding site for the second-strand primer, which enables full-length amplification of the cDNA. This avoids the amplification bias of the SMARTer technology and makes transcript sequencing more uniform and reliable.
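The selection principle in effect (2) can be sketched as a small filter: only molecules with a 5' triphosphate receive the biotin-labeled cap, and only capped molecules are retained by streptavidin capture. This is a minimal illustration rather than a description of the wet-lab procedure; the `Rna` class, the function names, and the example pool are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Rna:
    name: str
    five_prime_phosphates: int  # 3 = initial transcript; 1 = degraded transcript or mature rRNA
    biotin_capped: bool = False


def add_biotin_cap(rna: Rna) -> Rna:
    """Cap only molecules that carry a 5' triphosphate."""
    if rna.five_prime_phosphates == 3:
        return replace(rna, biotin_capped=True)
    return rna


def streptavidin_capture(pool: list[Rna]) -> list[Rna]:
    """Keep only biotin-capped molecules, as the beads would."""
    return [rna for rna in pool if rna.biotin_capped]


# Hypothetical pool of total RNA.
total_rna = [
    Rna("initial transcript A", 3),
    Rna("mature 16S rRNA", 1),
    Rna("degraded fragment of A", 1),
    Rna("initial transcript B", 3),
]

captured = streptavidin_capture([add_biotin_cap(rna) for rna in total_rna])
print([rna.name for rna in captured])
```

Under this model both the mature rRNA and the degraded fragment are left behind, which matches the stated goal of removing rRNA while keeping intact initial transcripts.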

Drawings

FIG. 1 is a flow chart of a prokaryotic full-length initial transcript library construction method suitable for the PacBio sequencing platform.

FIG. 2 is a schematic diagram of screening initial transcripts by adding biotin-labeled caps to the 5' ends of total RNAs from prokaryotes.

FIG. 3 is a diagram illustrating the effect of synthesizing a full-length cDNA using different protocols.

FIG. 4 is a diagram showing the size distribution of the constructed E. coli full-length initial transcript library.

Detailed Description

The following examples illustrate particular embodiments of the invention and should not be construed as limiting its scope. The materials, methods, and reaction conditions of this disclosure may be modified, and all such modifications are intended to fall within the spirit and scope of the invention. Unless otherwise specified, the technical means used in the examples are conventional means well known to those skilled in the art.
