Library construction method based on double-stranded circularization and its application in sequencing

Document No.: 1916846 Publication date: 2021-12-03

Abstract: The technology "Library construction method based on double-stranded circularization and its application in sequencing" (基于双链环化的文库构建方法及其在测序中的应用) was created by Hu Yugang (胡玉刚), Wang Biao (汪彪), Zheng Wenli (郑文莉) and Wu Qiang (吴强) on 2021-08-13. The invention provides a library construction method based on double-stranded circularization and its application in sequencing. The library construction method comprises: performing PCR amplification on a library with an adaptor sequence using double-stranded circularization amplification primers to obtain an amplification product, wherein the double-stranded circularization amplification primers are a pair of library amplification primers carrying RNA bases; enzymatically digesting the RNA bases in the amplification product to obtain a double-stranded digestion product whose two ends carry protruding complementary structures; and circularizing the double-stranded digestion product by means of the protruding complementary structures at its two ends to obtain a gapped double-stranded circularized library. Because circularization by this method is not affected by the base composition of the fragments to be circularized, it solves the prior-art problem that the MGI sequencing platform yields biased data output for libraries of different GC content.

1. A library construction method based on double-stranded circularization, characterized by comprising the following steps:

performing PCR amplification on the library with the adaptor sequence by using double-stranded circularization amplification primers to obtain an amplification product, wherein the double-stranded circularization amplification primers are a pair of library amplification primers with RNA bases;

enzymatically digesting the RNA bases in the amplification product to obtain a double-stranded digestion product, wherein the two ends of the double-stranded digestion product carry protruding complementary structures;

and circularizing the double-stranded digestion product by means of the protruding complementary structures at its two ends to obtain a gapped double-stranded circularized library.

2. The library construction method of claim 1, wherein the double-stranded circularization amplification primers comprise a template strand containing 1 RNA base and a non-template strand containing 2 RNA bases;

preferably, the 1 RNA base of the template strand is located at positions 2-8 counted from the 5' end;

preferably, the 2 RNA bases of the non-template strand are both located at the 5' end, more preferably at positions 2-8 counted from the 5' end; or, one of the 2 RNA bases of the non-template strand is located at the 5' end and the other at the 3' end, more preferably with at least 10 bases between the RNA base at the 5' end and the RNA base at the 3' end;

preferably, the sequence of the template strand is AN1N2N3UNNNNNNNNNNNNNNNN, and the sequence of the non-template strand is AN4N5N6UNNNNNNNNNNNNNNNUNN or AN7N8N9UNUNNNNNNNNNNNNNNNN, wherein N1 is complementary to N6 or N9, N2 is complementary to N5 or N8, and N3 is complementary to N4 or N7;

preferably, the sequence of the template strand is ATGCCUCTCAGTACGTCAGCAGTT and the sequence of the non-template strand is AGGCAUGGCGACCTTAUCAG; alternatively, the sequence of the template strand is ACUCTCAGTACGTCAGCAGTT and the sequence of the non-template strand is AGUGCAUGGCGACCTTATCAG.

3. The library construction method according to claim 2, wherein the RNA base in the double-stranded circularization amplification primer is U, and the RNA base in the amplification product is subjected to enzyme digestion by using USER enzyme to obtain a double-stranded enzyme digestion product.

4. The library construction method according to any one of claims 1 to 3, wherein the library with the adaptor sequence is a library with the adaptor sequence obtained after hybridization and elution, or a whole genome library with the adaptor sequence;

preferably, the library with the adaptor sequence is a library with the adaptor sequence obtained after methylation hybrid capture and elution.

5. The library construction method of claim 4, wherein after the gapped double-stranded circularized library is obtained, the library construction method further comprises: digesting the gapped double-stranded circularized library with an exonuclease to obtain a single-stranded circularized library.

6. A preparation method of a DNA nanosphere based on an MGI platform is characterized by comprising the following steps:

constructing a double-stranded circularized library with gaps by using the library construction method of any one of claims 1 to 4; preparing the double-stranded circularized library with gaps into DNA nanospheres.

7. The method of claim 6, wherein preparing the gapped double-stranded circularized library into DNA nanospheres comprises:

digesting the gapped double-stranded circularized library with an exonuclease to obtain a single-stranded circularized library;

performing rolling circle replication on the single-stranded circularized library to obtain the DNA nanosphere;

preferably, the exonuclease comprises Exo I and Exo III.

8. The method of claim 6, wherein preparing the gapped double-stranded circularized library into DNA nanospheres comprises:

using the gapped strand of the gapped double-stranded circularized library as a primer, and performing rolling circle replication on the other, intact circular strand to obtain the DNA nanosphere.

9. A sequencing method based on an MGI platform, the sequencing method comprising:

preparing a library to be tested into a DNA nanosphere according to the preparation method of any one of claims 6 to 8;

and loading the DNA nanospheres on a sequencing chip array for sequencing.

10. A sequencing reagent comprising the double-stranded circularization amplification primer of the library construction method of any one of claims 1 to 3, or the nicked double-stranded circularization library constructed by the library construction method of any one of claims 1 to 5, or a DNA nanosphere prepared by the preparation method of any one of claims 6 to 8.

Technical Field

The invention relates to the field of high-throughput sequencing detection, in particular to a library construction method based on double-stranded cyclization and application thereof in sequencing.

Background

With continuous innovation in gene sequencing technologies and the deepening of their applications, the competitive landscape of the industry keeps evolving. In the second-generation sequencing field, instruments and reagents are mainly supplied by foreign manufacturers such as Illumina and Thermo Fisher. After BGI (Huada Gene) fully acquired Complete Genomics (CG), it successively launched the BGISEQ-50, MGISEQ-200, MGISEQ-2000 and MGISEQ-T7 sequencers, which cover throughputs from 8G to 6T and greatly expand the range of applications. The BGI sequencers stably produce high-quality sequencing data; compared with the Illumina platform, the rate of duplicate reads (duplicates) is lower and the index hopping problem is markedly reduced.

The workflow of the BGI BGISEQ sequencing platform comprises three main steps: 1) library preparation; 2) preparation and loading of DNBs (DNA NanoBalls, DNA nanospheres); and 3) on-machine sequencing and analysis. Single-stranded circular DNA molecules are a prerequisite for DNB preparation. In single-strand circularization, double-stranded DNA (dsDNA) carrying adaptor sequences is denatured at high temperature into single-stranded DNA (ssDNA); a splint oligo then pairs complementarily with both ends of the ssDNA, and a ligase joins the two ends to form a single-stranded circular DNA molecule (ssCirDNA). Phi29 polymerase, which has strong strand-displacement activity, then performs linear amplification: every amplification round uses the original ssCirDNA as the template, so each copy remains independent, and a single original ssCirDNA molecule is amplified into a DNA nanosphere of 300-500 copies. Finally, the prepared DNBs are loaded onto a chip for on-machine sequencing.

Methylation of the genome is an important regulatory signal, and tumor cells in particular differ markedly from normal cells in methylation level. In recent years, the degree of methylation has also been used as an important marker for early tumor diagnosis and is detected by second-generation sequencing. Accurate detection of the methylation state is therefore of great significance, which requires that neither the research method nor the sequencing instrument introduce bias into the measured methylation state.

However, the inventors found that when the existing MGI sequencing platform is used to sequence libraries with different GC contents (especially in mixed-sample sequencing), the data output can be biased, and the prior art contains no report of a solution to this problem.

Disclosure of Invention

The main object of the invention is to provide a library construction method based on double-stranded circularization and its application in sequencing, so as to solve the prior-art problem that the MGI sequencing platform yields biased data output for libraries with different GC contents.

In order to achieve the above object, according to one aspect of the present invention, there is provided a library construction method based on double-stranded circularization, the library construction method comprising: performing PCR amplification on the library with the adaptor sequence by using double-stranded circularization amplification primers to obtain an amplification product, wherein the double-stranded circularization amplification primers are a pair of library amplification primers with RNA bases; enzymatically digesting the RNA bases in the amplification product to obtain a double-stranded digestion product, wherein the two ends of the double-stranded digestion product carry protruding complementary structures; and circularizing the double-stranded digestion product by means of the protruding complementary structures at its two ends to obtain a gapped double-stranded circularized library.

Further, the double-stranded circularization amplification primers comprise a template strand and a non-template strand, wherein the template strand contains 1 RNA base and the non-template strand contains 2 RNA bases; preferably, the 1 RNA base of the template strand is located at positions 2-8 counted from the 5' end; preferably, the 2 RNA bases of the non-template strand are both located at the 5' end, more preferably at positions 2-8 counted from the 5' end; alternatively, one of the 2 RNA bases of the non-template strand is located at the 5' end and the other at the 3' end, more preferably with at least 10 bases between the RNA base at the 5' end and the RNA base at the 3' end; preferably, the sequence of the template strand is AN1N2N3UNNNNNNNNNNNNNNNN and the sequence of the non-template strand is AN4N5N6UNNNNNNNNNNNNNNNUNN or AN7N8N9UNUNNNNNNNNNNNNNNNN, wherein N1 is complementary to N6 or N9, N2 is complementary to N5 or N8, and N3 is complementary to N4 or N7; preferably, the sequence of the template strand is ATGCCUCTCAGTACGTCAGCAGTT and the sequence of the non-template strand is AGGCAUGGCGACCTTAUCAG; or, the sequence of the template strand is ACUCTCAGTACGTCAGCAGTT and the sequence of the non-template strand is AGUGCAUGGCGACCTTATCAG.

Further, the RNA base in the double-stranded circularization amplification primer is U, and the RNA base in the amplification product is subjected to enzyme digestion by using USER enzyme to obtain a double-stranded enzyme digestion product.

Further, the library with the adaptor sequence is a library with the adaptor sequence obtained after hybridization and elution, or a whole genome library with the adaptor sequence; preferably, the library with the adaptor sequence is a library with the adaptor sequence obtained after methylation hybrid capture and elution.

Further, after the gapped double-stranded circularized library is obtained, the library construction method further comprises: digesting the gapped double-stranded circularized library with an exonuclease to obtain a single-stranded circularized library.

In order to achieve the above object, according to a second aspect of the present invention, there is provided a method for preparing DNA nanospheres based on an MGI platform, comprising: constructing a gapped double-stranded circularized library by using any of the above library construction methods; and preparing the gapped double-stranded circularized library into DNA nanospheres.

Further, preparing the gapped double-stranded circularized library into DNA nanospheres comprises: digesting the gapped double-stranded circularized library with an exonuclease to obtain a single-stranded circularized library; and performing rolling circle replication on the single-stranded circularized library to obtain the DNA nanospheres; preferably, the exonuclease includes Exo I and Exo III.

Further, preparing the gapped double-stranded circularized library into DNA nanospheres comprises: using the gapped strand of the gapped double-stranded circularized library as a primer, and performing rolling circle replication on the other, intact circular strand to obtain the DNA nanospheres.

In order to achieve the above object, according to a third aspect of the present invention, there is provided a MGI platform-based sequencing method, comprising: preparing the sequence library to be detected into DNA nanospheres according to any one of the preparation methods; and loading the DNA nanospheres on a sequencing chip array for sequencing.

In order to achieve the above object, according to a fourth aspect of the present invention, there is provided a sequencing reagent, wherein the sequencing reagent comprises a double-stranded circularization amplification primer in any one of the library construction methods, or a nicked double-stranded circularization library constructed by any one of the library construction methods, or a DNA nanosphere prepared by any one of the preparation methods.

By applying the technical scheme of the invention, the library with the adaptor sequence is PCR-amplified using library amplification primers that carry RNA bases, yielding an amplification product with RNA bases at both ends. Enzymatic digestion then removes the RNA bases, producing protruding structures at both ends of the amplification product that can pair complementarily with each other. These protruding complementary structures are used to circularize the double-stranded digestion product, giving a gapped double-stranded circularized library. Because this double-stranded circularization is not affected by the base composition of the fragments to be circularized, a library of any fragments can be circularized. This reduces the differences that single-strand circularization on the current MGI sequencing platform introduces between fragments of different GC content, and helps make the sequencing data yield ratio of the libraries more accurate.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the invention and, together with the description, serve to explain the invention and not to limit the invention. In the drawings:

FIG. 1 shows a schematic diagram of the differences between the sequencing workflows of the Illumina and MGI sequencing platforms;

FIGS. 2A and 2B show that Illumina sequencing is biased toward high-GC regions, whereas the MGI sequencing platform is biased toward low-GC regions;

FIG. 3 shows that MGI sequencing is unfavorable for sequencing high-CG methylated libraries;

FIG. 4 shows library amplification with the specially modified primers;

FIG. 5 shows sequencing of the single-stranded circle obtained after double-stranded circularization;

FIG. 6 shows sequencing of the double-stranded circle obtained after double-stranded circularization;

FIG. 7 shows two modified primer features with no chaperone sequence removed;

FIG. 8 shows a structure in which two ends of a chaperone sequence are linked to each other in a complementary structure;

FIG. 9 shows the two modified primer features lost following cleavage of the chaperone sequence;

FIG. 10 shows a state in which two ends lost after cleavage of a chaperone sequence are linked to each other in a complementary structure;

FIG. 11 shows a comparison of single-stranded circularization and double-stranded circularization data yield ratios.

Detailed Description

It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present invention will be described in detail with reference to examples.

As mentioned in the Background, some differences in sequencing performance between the two platforms have been reported, but there has so far been no analysis or report of differences in sequencing yield between libraries of different GC content. While studying methylation target capture, the inventors of the present application found that when mixtures of methylated and unmethylated libraries at defined ratios were tested on the MGI platform, the methylated data were markedly over-represented, whereas no such deviation was found on the Illumina sequencing platform. As shown in FIG. 1, on the Illumina sequencing platform the mixing ratio of the methylated library was consistent with the data yield ratio, while on the MGI sequencing platform the data yield ratio deviated from the mixing ratio (at mixing ratios of 10% and 50%, the data yield was higher). The inventors analyzed and studied this deviation in depth, as follows:

The MGI sequencer is very close to the Illumina sequencing platform in throughput and quality. The essential difference between the two platforms lies in how the library is amplified and replicated before on-machine sequencing. Illumina uses linear amplification and replication: as shown in FIG. 2A, the detection signal is increased by amplifying the library fixed at its position on the chip. MGI sequencing instead amplifies the detection signal by rolling circle amplification after assisted circularization of single-stranded library molecules: as shown in FIG. 2B, the constructed library forms a single-stranded circle with the help of a chaperone (splint) sequence, and after the circle is formed a primer is used to amplify the signal by rolling circle amplification. In practice, as shown in FIG. 3, the Illumina sequencing platform preferentially amplifies high-GC regions, so high-AT regions are under-represented, and GC correction is required in sensitive analyses to bring the GC-biased regions back to normal. On the MGI sequencing platform, by contrast, high-GC regions are sequenced poorly.

Specifically, on the one hand, different amplification systems may lead to different amplification preferences. On the other hand, the amplification processes differ: the MGI sequencing platform uses single-strand circularization, and the circularized product remains single-stranded, so high-GC and low-GC fragments differ in single-strand stability and spatial structure. In addition, DNBs (DNA nanoballs) are moved between preparation and loading (that is, DNBs are generated first and then loaded onto the sequencing chip array, rather than being generated in situ on the chip array). During this transfer, high-GC and high-AT regions of two circles may bind to each other, which scrambles the sequencing signals and lowers the proportion of sequencing data from both parts. These possible causes need to be explored in specific applications.
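To make the notion of "GC content" used in this discussion concrete, the following minimal Python sketch computes the GC fraction of a fragment. The function name `gc_fraction` and the two example fragments are hypothetical and serve only as illustration; they are not taken from the application.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence (0.0 to 1.0)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    gc = sum(1 for base in seq if base in "GC")
    return gc / len(seq)


# Hypothetical fragments, chosen only to illustrate the two extremes.
high_gc = "GCGGCCGCGGCATGCC"
high_at = "ATTATAATTTAAGCAT"

print(f"high-GC fragment: {gc_fraction(high_gc):.2f}")
print(f"high-AT fragment: {gc_fraction(high_at):.2f}")
```

A per-library GC fraction of this kind is the quantity against which a yield bias between mixed libraries would be compared.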

As described above, since the essential difference between the MGI and Illumina sequencing platforms lies in the single-strand circularization and DNB loading steps, the inventors speculate that the bias arises during circularization and DNB loading. The inventors therefore propose the solution of the present application for these two steps. Its core idea is as follows: 1) the library is amplified with a set of special primers; 2) the template-strand primer carries an RNA base at its 5' end; 3) the non-template-strand primer carries one RNA base at its 3' end and one at its 5' end, or one in the middle and one at its 5' end; 4) the amplified library is enzymatically digested to remove the RNA bases, where the enzyme may be one that removes U bases or one that removes other RNA bases; 5) after RNA base removal, the library has double-stranded ends with protruding, mutually complementary 3' overhangs capable of self-circularization; 6) after the double-stranded library circularizes, a gap of one or more bases remains in the product; 7) this gap can be recognized by an exonuclease to remove the non-template strand, or alternatively the gapped non-template strand can be used as a primer by a polymerase for rolling circle amplification.

As shown in FIG. 4, amplification with a pair of RNA-base-containing primers produces a library with one RNA base at one end and two RNA bases at the other. In target capture sequencing this step adds no operational difficulty or complexity: the post-capture amplification primers are simply replaced with the special primers of the invention, and the high-fidelity DNA polymerase currently used is replaced with an enzyme capable of amplifying RNA bases. In whole genome sequencing, a separate step is needed to convert the existing library into a library amplified with the special primers of the invention.

The RNA bases are then removed from the library enzymatically. This is not restriction digestion in the traditional sense: the RNA bases of the primers of the invention (not limited to U bases) are treated with an enzyme that removes RNA bases, so that the RNA bases introduced by primer amplification are removed and a sticky end with a 3' overhang is formed. These ends can ligate to each other to form a circular structure, achieving double-stranded circularization and overcoming the bias that chaperone-assisted single-strand circularization shows for libraries of different GC content. Meanwhile, the non-template strand carries 2 RNA bases, and a gap is formed once the RNA base at the 3' end (or in the middle) of the primer is removed by digestion. Exonuclease I and exonuclease III can therefore be used to digest both the uncircularized library and the gapped non-template strand of the circularized library, leaving a single-stranded circle derived from the double-stranded circle and eliminating the influence of the single-strand circularization step. As shown in FIG. 5, the advantage of this scheme is that the drawbacks of single-strand circularization are eliminated, and the resulting single-stranded circular library can be mixed with other single-stranded circular libraries and loaded on the sequencer together.

The method of the invention can address not only the differences in single-strand circularization but also the structural stability after circularization. After double-stranded circularization, the digestion step may be omitted. Although uncircularized library molecules then remain in the reaction system, these fragments are linear and cannot serve as templates for rolling circle amplification, so they do not generate DNBs. As shown in FIG. 6, the non-template strand and the template strand form a perfectly paired structure after circularization, so regardless of the base composition and GC content of the library, the presence of the non-template strand prevents differences in secondary structure. Meanwhile, because the non-template strand carries a gap, it can both stabilize the structure of the template strand and act as the primer for rolling circle amplification of the template strand, which removes the influence of base composition on circularization and sequencing during single-strand circularization and DNB generation.

It should be noted that although the GC bias of the MGI platform was discovered during methylation research, the circularization scheme of the present application is general and can also be applied to other applications of the MGI sequencing platform.

The primers of one embodiment of the invention are shown in FIG. 7. The two amplification primers differ from ordinary amplification primers in that the template-strand primer has one RNA base at its 5' end, and the non-template-strand primer has one RNA base at each of its 3' and 5' ends. The RNA base at the 5' end of each primer is preferably not placed within the last two bases, so as not to interfere with cleavage, and is most preferably placed at positions 2 to 8. The two RNA bases on the non-template strand are preferably more than 10 bases apart, to ensure that the sequence between them does not dissociate from the template strand after cleavage. As shown in FIG. 8, the library from which the RNA bases have been removed can be circularized into a double-stranded circle that is complete except for one gap. Removing the RNA bases at the 5' ends of the primers produces a library with 3'-protruding sticky ends, which self-circularizes under the action of a ligase; the circularization product is a double-stranded DNA circle with a gap of only one base.

In a second embodiment of the invention, the template-strand primer is unchanged and the non-template-strand primer again carries two RNA bases, but both are located between the middle and the 5' end of the primer and are relatively close to each other; their positions on the non-template primer are shown in FIG. 9. The amplification product of this embodiment can likewise be ligated into a gapped circular product after removal of the RNA bases, as shown in FIG. 10, achieving the same effect as the first embodiment, so it is not discussed in further detail here.

In summary, the core idea of the invention is to introduce three RNA bases through a specific primer design. Two of them create the sticky ends for double-stranded circularization; the third creates a gap in the non-template strand, which can be used either to digest the non-template strand or to let the gapped strand stabilize the template and prime rolling circle amplification for DNB generation.

Based on the above research results, the applicant proposes the technical solution of the present application. In a preferred embodiment, there is provided a library construction method based on double-stranded circularization, the library construction method comprising: performing PCR amplification on the library with the adaptor sequence by using double-stranded circularization amplification primers to obtain an amplification product, wherein the double-stranded circularization amplification primers are a pair of library amplification primers with RNA bases; enzymatically digesting the RNA bases in the amplification product to obtain a double-stranded digestion product, wherein the two ends of the double-stranded digestion product carry protruding complementary structures; and circularizing the double-stranded digestion product by means of the protruding complementary structures at its two ends to obtain a gapped double-stranded circularized library.

In this method, the library with the adaptor sequence is PCR-amplified using library amplification primers that carry RNA bases, yielding an amplification product with RNA bases at both ends. Enzymatic digestion then removes the RNA bases, producing protruding structures at both ends of the amplification product that can pair complementarily with each other. These protruding complementary structures are used to circularize the double-stranded digestion product, giving a gapped double-stranded circularized library. Because this double-stranded circularization is not affected by the base composition of the fragments to be circularized, a library of any fragments can be circularized. This reduces the differences that single-strand circularization on the current MGI sequencing platform introduces between fragments of different GC content, and helps make the sequencing data yield ratio of the libraries more accurate.

It should be noted that the ingenuity of the above library construction method lies in the use of amplification primers carrying RNA bases. When the primers are designed, the RNA bases must satisfy the following requirements: 1) they are located near the ends of the amplification primers; 2) enzymatic removal of the RNA bases produces a gap and a protruding end; 3) the protruding ends at the two ends of the double strand can pair complementarily. Any primers meeting these requirements are therefore suitable for use in the present application. Furthermore, for use on the MGI sequencing platform, the remaining base sequence of the amplification primers, apart from the RNA bases used for digestion, preferably matches the platform's universal primers, which improves general applicability for MGI-based sequencing.

In a preferred embodiment, the double-stranded circularization amplification primers comprise a template strand containing 1 RNA base (for creating a protruding end at one end of the double strand) and a non-template strand containing 2 RNA bases (one for forming the gap and the other for creating a protruding end at the other end of the double strand). Preferably, the 1 RNA base of the template strand is located at positions 2-8 counted from the 5' end (this placement does not interfere with digestion, lets the small fragment released by digestion dissociate, and leaves a longer remaining segment that keeps the duplex stably paired). Preferably, the 2 RNA bases of the non-template strand are both located at the 5' end, more preferably at positions 2-8 counted from the 5' end (with the same effect as above); alternatively, one is located at the 5' end and the other at the 3' end, more preferably with at least 10 bases between them.

In accordance with the principles set forth above, in some embodiments, various structural forms of the template strand and the non-template strand may be employed. For example, the template strand is: AN1N2N3UNNNNNNNNNNNNNNNN (SEQ ID NO: 1); the non-template strand is: AN4N5N6UNNNNNNNNNNNNNNNUNN (SEQ ID NO: 2); or AN7N8N9UNUNNNNNNNNNNNNNNNN (SEQ ID NO: 3), wherein N1 is complementary to N6 or N9, N2 is complementary to N5 or N8, and N3 is complementary to N4 or N7.

In accordance with the principles set forth above, in other embodiments, the following specific structures of the template strand and the non-template strand are employed:

The sequence of the template strand is ATGCCUCTCAGTACGTCAGCAGTT (SEQ ID NO: 4) and the sequence of the non-template strand is AGGCAUGGCGACCTTAUCAG (SEQ ID NO: 5); alternatively, the sequence of the template strand is ACUCTCAGTACGTCAGCAGTT (SEQ ID NO: 6) and the sequence of the non-template strand is AGUGCAUGGCGACCTTATCAG (SEQ ID NO: 7).
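The complementarity rule for the primer 5' ends (N1 pairs with N6, N2 with N5, N3 with N4) can be illustrated on the specific sequences. The sketch below assumes one reading of that rule: the 5' segment of each primer, from the 5' end through its first U (with U read as T), must be the reverse complement of the corresponding segment of the other primer, which also pairs the leading A with the U. The helper names are hypothetical.

```python
# Complement table; U is treated like T (it pairs with A).
_COMP = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a sequence, returned in the DNA alphabet."""
    return seq.upper().translate(_COMP)[::-1]


def five_prime_arm(primer: str) -> str:
    """Bases from the 5' end through the first U, with that U written as T."""
    p = primer.upper()
    first_u = p.index("U")  # raises ValueError if the primer has no RNA base
    return p[:first_u] + "T"


def arms_can_pair(template: str, non_template: str) -> bool:
    """True if the two 5' arms are reverse complements of each other."""
    return five_prime_arm(template) == reverse_complement(five_prime_arm(non_template))


print(arms_can_pair("ATGCCUCTCAGTACGTCAGCAGTT", "AGGCAUGGCGACCTTAUCAG"))   # SEQ ID NO: 4 / 5
print(arms_can_pair("ACUCTCAGTACGTCAGCAGTT", "AGUGCAUGGCGACCTTATCAG"))     # SEQ ID NO: 6 / 7
```

Both primer pairs satisfy this check, while mixing a template strand from one pair with the non-template strand of the other does not.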

The double-stranded circularization strategy of the present application can be achieved regardless of the structure of the template strand and the non-template strand, as long as the aforementioned principles are satisfied.

The terms template strand and non-template strand above serve only to distinguish the two primers by name; they do not mean that the sequences of the two strands are complementary to each other. Rather, the overhang ends formed after cleavage are complementary to each other. In the present application, "enzymatic cleavage" refers to digestion with an enzyme capable of excising an RNA base, not with an enzyme that merely cleaves a phosphodiester bond.

In a preferred embodiment, the RNA base in the double-stranded circularization amplification primers is U, and the RNA bases in the amplification product are digested with USER enzyme to obtain the double-stranded digestion product. USER enzyme generates a single-nucleotide gap at the position of a uracil. It is a mixture of Uracil DNA Glycosylase (UDG) and the DNA glycosylase-lyase Endo VIII. UDG catalyzes excision of the uracil base, forming an abasic (apyrimidinic) site while leaving the phosphodiester backbone intact. The lyase activity of Endo VIII then breaks the phosphodiester backbone on the 3' and 5' sides of the abasic site, releasing the base-free deoxyribose.
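At the level of the primer sequence, the effect of USER digestion can be sketched as splitting a strand at each U and discarding the U itself, which corresponds to the single-nucleotide gap described above. This is a string-level toy model only; it says nothing about reaction kinetics or about which fragments stay annealed, and the function name is an assumption.

```python
def user_digest(strand: str) -> list[str]:
    """Split a strand (written 5'->3') at every U, dropping the U nucleotide.

    Each removed U leaves a one-nucleotide gap between neighbouring fragments.
    """
    return [fragment for fragment in strand.upper().split("U") if fragment]


# Template strand primer (SEQ ID NO: 4): one short 5' fragment plus the remainder.
print(user_digest("ATGCCUCTCAGTACGTCAGCAGTT"))
# Non-template strand primer (SEQ ID NO: 5): three fragments from two U positions.
print(user_digest("AGGCAUGGCGACCTTAUCAG"))
```

In this model the short 5' fragment is the piece expected to fall off after digestion, exposing the overhang on the opposite strand.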

As described above, although the technical solution of the present application was proposed in response to the output data bias observed in methylation capture sequencing libraries, the library construction method of the present application is not limited to methylation capture sequencing libraries. The library to be circularized may be a captured library awaiting sequencing or a whole-genome library awaiting sequencing. That is, the library with the adaptor sequence may be an adaptor-carrying library obtained after hybridization elution or an adaptor-carrying whole-genome library; preferably, it is an adaptor-carrying library obtained after methylation hybrid capture elution.

In a preferred embodiment, after the nicked double-stranded circularized library is obtained, the library construction method further comprises digesting the nicked double-stranded circularized library with exonuclease to obtain a single-stranded circularized library. Following the sequencing workflow of the MGI sequencing platform, the exonuclease uses the nick as an entry point to digest the nicked strand, leaving the intact strand as a single-stranded circle. The subsequent sequencing process is therefore compatible with the MGI sequencing platform and can be carried out according to the MGI platform workflow.

According to a second exemplary embodiment of the present application, there is provided a method for preparing DNA nanospheres (DNBs) for the MGI platform, comprising: constructing a nicked double-stranded circularized library by any of the above library construction methods; and preparing the nicked double-stranded circularized library into DNA nanospheres.

As mentioned above, the sequencing step of the MGI sequencing platform is to construct a single-stranded cyclization library, then prepare the single-stranded cyclization library into DNA nanospheres, and finally load the DNA nanospheres onto a sequencing chip array for sequencing. Therefore, the nicked double-stranded circularized library obtained by the library construction method of the present application can also be prepared into a DNA nanosphere and then subjected to MGI platform sequencing.

Specifically, the way in which the nicked double-stranded circularized library is prepared into DNA nanospheres is not limited: it may follow the steps used for a single-stranded circularized library, or it may be carried out in a modified manner.

In a preferred embodiment, preparing the nicked double-stranded circularized library into DNA nanospheres comprises: digesting the nicked double-stranded circularized library with exonuclease to obtain a single-stranded circularized library; and performing rolling circle replication on the single-stranded circularized library to obtain DNA nanospheres. Preferably, the exonuclease includes Exo I and Exo III. This approach follows the current DNA nanosphere preparation process of the MGI platform: a single-stranded circularized library is obtained first, and rolling circle replication is then performed.

In another preferred embodiment, preparing the nicked double-stranded circularized library into DNA nanospheres comprises: using the nicked strand of the nicked double-stranded circularized library as a primer, and performing rolling circle replication on the other, intact circular strand to obtain DNA nanospheres. In this embodiment, the nicked strand is extended from the nick as a primer, so rolling circle replication is performed directly to obtain the DNA nanospheres.
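Rolling circle replication copies a circular template repeatedly, so the product is a concatemer of tandem repeats complementary to the circle. The sketch below is a simplified string model under stated assumptions: the circle is written 5'->3', the polymerase reads it 3'->5' starting at a chosen index (for example, the base next to the nick), and strand displacement is taken for granted. The function name and parameters are illustrative.

```python
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def rolling_circle_product(circle: str, start: int, n_bases: int) -> str:
    """Return the first n_bases of the new strand, written 5'->3'.

    circle  : circular template written 5'->3' (index 0 follows the last index).
    start   : index of the first template base that is copied.
    n_bases : number of bases to synthesise; may exceed len(circle).
    """
    circle = circle.upper()
    length = len(circle)
    product = []
    for step in range(n_bases):
        # The polymerase walks the template 3'->5', i.e. towards lower indices,
        # wrapping around the circle as many times as needed.
        template_base = circle[(start - step) % length]
        product.append(_COMPLEMENT[template_base])
    return "".join(product)


# Two full laps around a 4-base toy circle give two tandem copies
# of the reverse complement of the circle.
print(rolling_circle_product("AACG", start=3, n_bases=8))
```

Changing `start` only changes the phase of the repeats, which is why priming from the nick or from an annealed primer at the same position would yield the same kind of concatemer.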

In a third exemplary embodiment of the present application, there is provided an MGI platform-based sequencing method, comprising: preparing the library to be sequenced into DNA nanospheres according to any of the above preparation methods; and loading the DNA nanospheres onto a sequencing chip array for sequencing. Sequencing according to this method reduces the data output bias caused by differences in GC content.

According to a fourth exemplary embodiment of the present application, a sequencing reagent is provided, wherein the sequencing reagent comprises a double-stranded circularization amplification primer in any one of the library construction methods, or a nicked double-stranded circularization library constructed by any one of the library construction methods, or a DNA nanosphere prepared by any one of the preparation methods. By containing the double-stranded circularization amplification primer, a library to be sequenced can be constructed into a library for on-machine sequencing according to needs after double-stranded circularization, and the deviation of sequencing output data caused by the difference of base composition is avoided.

The following examples are provided to further illustrate the benefits of the present application. It should be further noted that the following examples are only illustrative, and the method of the present application is not limited to the following method.

Example 1

In this example, the library obtained after hybridization elution of a single-end or double-end adaptor library for the MGI sequencer is enriched by PCR with primers that allow double-stranded circularization (according to the flow shown in FIG. 4), and the amplification product is then circularized in double-stranded form. The final circularization product of this example is a single-stranded circle (as shown in FIG. 5), which is compatible with the normal DNB preparation flow of the MGI sequencing platform (as shown in FIG. 2B).

1 PCR amplification of the MGI hybridization elution library with double-stranded circularization amplification primers

Hybridization elution step: after the hybridization elution library was obtained with reference to the Naonda DNA library hybrid capture (MGI platform) instructions (Cat: 1005102), the amplification reaction system was prepared as shown in Table 1:

table 1:

wherein, the structure of the double-stranded circularization amplification primer mix is shown in FIG. 7, and the specific sequence is as follows:

the template strand sequence is: ATGCCUCTCAGTACGTCAGCAGTT (SEQ ID NO: 4);

the non-template strand sequence is: AGGCAUGGCGACCTTAUCAG (SEQ ID NO: 5).

After thorough mixing, PCR was carried out under the reaction conditions shown in Table 2:

table 2:

2 magnetic bead purification

After the reaction was completed, the product was purified with 60 μL of SP Beads, and the recovered product was dissolved in 22 μL of TE buffer. An aliquot of the recovered product was taken and its concentration was quantified using the Qubit dsDNA HS assay kit (Invitrogen) before proceeding to the next reaction.

3 double-stranded cyclization

3.1 enzyme digestion

A certain amount of the PCR product obtained in step 1 was taken as the template for double-stranded cyclization. The reaction system was prepared according to Table 3:

Table 3:

Component	Amount
PCR product	made up to 44 μL with TE
10X CutSmart Buffer (NEB)	5 μL
USER Enzyme (NEB)	1 μL
Total volume	50 μL

The reaction was carried out on a PCR machine under the conditions shown in Table 4:

Table 4:

Temperature	Time	Cycles
37 ℃	30 min	1 cycle
4 ℃	Hold	/

3.2 cyclization

The reaction system shown in Table 5 was added to the enzyme-cleaved product to carry out the double-strand cyclization reaction. The results after cyclization are shown in FIG. 8.

Table 5:

Component	Amount
TE	32 μL
Ligation buffer (Enzymatics)	17.5 μL
T4 DNA Ligase (Rapid) (Enzymatics)	0.5 μL
Total volume	100 μL

The reaction was carried out on a PCR machine under the conditions shown in Table 6:

Table 6:

Temperature	Time	Cycles
37 ℃	30 min	1 cycle
4 ℃	Hold	/

3.3 digestion

The reagents shown in Table 7 were added to the cyclization product to carry out the digestion reaction.

Table 7:

Component	Amount
Exo I (Enzymatics)	1 μL
Exo III (Enzymatics)	1 μL
Total volume	102 μL
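The total volumes in Tables 3, 5 and 7 are cumulative, because each set of reagents is added to the product of the previous step. The short check below is a sketch that reads the ligation buffer and ligase volumes of Table 5 as 17.5 μL and 0.5 μL (the decimal points appear to have been lost in the source tables); with that reading the totals of 100 μL and 102 μL are reproduced. The function name is an assumption.

```python
def add_reagents(start_volume_ul: float, reagents_ul: dict[str, float]) -> float:
    """Volume after adding the listed reagents to an existing reaction (in μL)."""
    return start_volume_ul + sum(reagents_ul.values())


# Table 3: USER digestion, 50 μL in total.
digest_ul = 50.0

# Table 5: ligation reagents added to the 50 μL digestion product.
ligation_ul = add_reagents(digest_ul, {
    "TE": 32.0,
    "Ligation buffer": 17.5,        # assumed reading of "175μL"
    "T4 DNA Ligase (Rapid)": 0.5,   # assumed reading of "05μL"
})

# Table 7: exonucleases added to the ligation reaction.
exo_ul = add_reagents(ligation_ul, {"Exo I": 1.0, "Exo III": 1.0})

print(ligation_ul, exo_ul)
```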

The reaction was carried out on a PCR machine under the conditions shown in Table 8:

Table 8:

Temperature	Time	Cycles
37 ℃	30 min	1 cycle
4 ℃	Hold	/

3.4 magnetic bead purification

After the reaction was completed, the product was purified with 200 μL of SP Beads, and the recovered product was dissolved in 42 μL of TE buffer. 1 μL of the recovered product was taken and the concentration of the cyclization product was quantified using the Qubit ssDNA HS assay kit (Invitrogen).

The double-stranded cyclization in this example was performed on a methylation capture library. The total input of the initial library was 80 ng (0.3 pmol), and the total amount of single-stranded cyclization product obtained by the method of this example was 16 ng, giving a cyclization efficiency of 40% (the 80 ng double-stranded input corresponds to 40 ng of single strand, and 16/40 = 40%). The product was then used for standard DNB preparation followed by on-machine sequencing.
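The efficiency figure follows from comparing the single-stranded product with half of the double-stranded input mass, since only one of the two strands ends up in the single-stranded circle. A minimal sketch of that arithmetic, with a hypothetical function name:

```python
def cyclization_efficiency(input_dsdna_ng: float, product_ssdna_ng: float) -> float:
    """Fraction of input recovered as single-stranded circles.

    The double-stranded input is halved because only one strand of each
    duplex can be recovered as the single-stranded circular product.
    """
    single_strand_input_ng = input_dsdna_ng / 2
    return product_ssdna_ng / single_strand_input_ng


efficiency = cyclization_efficiency(input_dsdna_ng=80, product_ssdna_ng=16)
print(f"{efficiency:.0%}")
```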

Example 2

In this example, the initial MGI single-end/double-end adaptor library obtained after hybridization elution is enriched by PCR with double-stranded circularization amplification primers (as shown in FIG. 4), and the enriched amplification product is then subjected to double-stranded circularization. The final circularized product of this example is a double-stranded circle (as shown in FIG. 6), which undergoes a dedicated DNB preparation protocol before on-machine sequencing.

1 PCR amplification of the MGI hybridization elution library with double-stranded circularization amplification primers

Hybridization elution step: after the hybridization elution library was obtained with reference to the Naonda DNA library hybrid capture (MGI platform) instructions, the amplification reaction system was prepared according to Table 9:

table 9:

the structure of the double-stranded circularized amplification primer mix is shown in FIG. 9. The specific sequence is as follows:

template strand: ACUCTCAGTACGTCAGCAGTT (SEQ ID NO: 6),

non-template strand: AGUGCAUGGCGACCTTATCAG (SEQ ID NO: 7).

After thorough mixing, PCR was carried out under the reaction conditions shown in Table 10:

table 10:

2 magnetic bead purification

3 double-stranded cyclization

3.1 enzyme digestion

A certain amount of the PCR product obtained in step 1 was taken as the template for double-stranded cyclization. The reaction system was prepared according to Table 11:

Table 11:

Component	Amount
PCR product	made up to 44 μL with TE
10X CutSmart Buffer (NEB)	5 μL
USER Enzyme (NEB)	1 μL
Total volume	50 μL

The reaction was carried out on a PCR machine under the conditions shown in Table 12:

Table 12:

Temperature	Time	Cycles
37 ℃	30 min	1 cycle
4 ℃	Hold	/

3.2 cyclization

The reaction system shown in Table 13 was added to the digested product to carry out double-strand cyclization reaction. The results after cyclization are shown in FIG. 10.

Table 13:

Component	Amount
TE	32 μL
Ligation buffer (Enzymatics)	17.5 μL
T4 DNA Ligase (Rapid) (Enzymatics)	0.5 μL
Total volume	100 μL

The reaction was carried out on a PCR machine under the conditions shown in Table 14:

Table 14:

Temperature	Time	Cycles
37 ℃	30 min	1 cycle
4 ℃	Hold	/

3.3 magnetic bead purification of double stranded circularized products

After the reaction was completed, the product was purified with 200 μL of SP Beads, and the recovered product was dissolved in 42 μL of TE buffer. An aliquot of the recovered product was taken and the concentration of the circularized product was quantified using the Qubit dsDNA HS assay kit (Invitrogen).

3.4 preparation of the double-stranded cyclization product DNB

In the double-stranded cyclization product of Example 2, one strand contains a single-base nick, so strand displacement amplification can be started from the nick position using a strand-displacing polymerase. The amplification product is the same as that obtained by strand displacement amplification after a single-stranded cyclization product is annealed to a DNB primer. The reaction system for DNB preparation from the double-stranded cyclization product is shown in Table 15:

Table 15:

Component	Amount
Double-stranded cyclization product (40 fmol)	made up to 40 μL with TE
DNB polymerase mixture I (MGI)	40 μL
DNB polymerase mixture II (LC) (MGI)	4 μL
Total volume	84 μL

The reaction was carried out on a PCR machine under the conditions shown in Table 16:

Table 16:

Temperature	Time	Cycles
30 ℃	25 min	1 cycle
4 ℃	Hold	/

When the PCR instrument reaches 4 ℃, 20 μL of DNB stop solution is added immediately and mixed by pipetting slowly 5 to 8 times with a wide-bore tip, after which sequencing is carried out according to the subsequent MGI workflow.

Sequencing libraries were constructed from a sample with a 10% methylation mixing ratio using both the double-stranded cyclization library construction method of Example 2 and the existing single-stranded cyclization method. As shown in FIG. 11, the cyclization library construction method of the present application largely resolves the bias in sequencing proportion that arises during single-stranded cyclization. For the 10% methylation mixture sample, sequencing after single-stranded cyclization gave a data proportion of 19%, whereas the double-stranded cyclization of the present invention gave 11%, which is closer to the true ratio. Double-stranded cyclization therefore compensates for the data bias between regions of different GC content caused by single-stranded cyclization.
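The comparison can be expressed as the absolute deviation of each observed data proportion from the expected 10% mixing ratio. The sketch below uses only the three percentages reported above; the function name is illustrative.

```python
def deviation_pp(observed_percent: float, expected_percent: float) -> float:
    """Absolute deviation from the expected proportion, in percentage points."""
    return abs(observed_percent - expected_percent)


expected = 10  # methylation mixing ratio of the sample, in percent
single_stranded = deviation_pp(19, expected)   # existing single-stranded cyclization
double_stranded = deviation_pp(11, expected)   # double-stranded cyclization, Example 2

print(single_stranded, double_stranded)
```

On these figures the deviation drops from 9 percentage points with single-stranded cyclization to 1 percentage point with double-stranded cyclization.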

The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Sequence listing

<110> Naon Dada (Nanjing) Biotechnology Ltd

<120> library construction method based on double-stranded circularization and application thereof in sequencing

<130> PN154954NAGD

<160> 7

<170> SIPOSequenceListing 1.0

<210> 1

<211> 21

<212> DNA/RNA

<213> Artificial Sequence (Artificial Sequence)

<220>

<221> misc_feature

<222> (5)..(5)

<223> RNA bases of template strand

<220>

<221> misc_feature

<222> (1)..(21)

<223> all n are a, t, c or g

<400> 1

annnunnnnn nnnnnnnnnn n 21

<210> 2

<211> 23

<212> DNA/RNA

<213> Artificial Sequence (Artificial Sequence)

<220>

<221> misc_feature

<222> (6)..(21)

<223> RNA bases having non-template strands at positions 6 and 21

<220>

<221> misc_feature

<222> (1)..(23)

<223> all n are a, t, c or g

<400> 2

annnunnnnn nnnnnnnnnn unn 23

<210> 3

<211> 23

<212> DNA/RNA

<213> Artificial Sequence (Artificial Sequence)

<220>

<221> misc_feature

<222> (6)..(8)

<223> RNA bases with non-template strands at positions 6 and 8

<220>

<221> misc_feature

<222> (1)..(23)

<223> all n are a, t, c or g

<400> 3

annnununnn nnnnnnnnnn nnn 23

<210> 4

<211> 24

<212> DNA/RNA

<213> Artificial Sequence (Artificial Sequence)

<220>

<221> misc_feature

<222> (6)..(6)

<223> RNA base having template strand at position 6

<400> 4

atgccuctca gtacgtcagc agtt 24

<210> 5

<211> 20

<212> DNA/RNA

<213> Artificial Sequence (Artificial Sequence)

<220>

<221> misc_feature

<222> (6)..(17)

<223> RNA bases of non-template strand at positions 6 and 17

<400> 5

aggcauggcg accttaucag 20

<210> 6

<211> 21

<212> DNA/RNA

<213> Artificial Sequence (Artificial Sequence)

<220>

<221> misc_feature

<222> (3)..(3)

<223> RNA base having template strand at position 3

<400> 6

acuctcagta cgtcagcagt t 21

<210> 7

<211> 21

<212> DNA/RNA

<213> Artificial Sequence (Artificial Sequence)

<220>

<221> misc_feature

<222> (3)..(7)

<223> RNA bases having non-template strands at positions 3 and 7

<400> 7

agugcauggc gaccttatca g 21
