Design method and preparation method of a dual-tag adapter

Document No.: 1138392 Publication date: 2020-10-09

Reading note: This technology, "Design method and preparation method of a dual-tag adapter" (一种双标签接头设计方法及制备方法), was designed and created by 郑建超, 汪宇盈, 羊光辉 and 叶明芝 on 2019-03-27. Its main content is as follows. The invention discloses a design method and a preparation method for a dual-tag adapter. It provides a kit for constructing a sequencing library of a DNA molecule to be tested, comprising a dual-sample-tag adapter formed by annealing an adapter sequence L with an adapter sequence S; one end of the adapter is ligated to the DNA molecule to be tested. The invention has the following advantages: 1) by introducing new sample tags immediately adjacent to both ends of the inserted DNA fragment, it reduces the number of sequencing primer additions and thus the sequencing cost; 2) it enables dual sample tags in single-end sequencing projects without increasing sequencing cost, avoiding false positives caused by sample-tag crosstalk. This adapter design allows dual sample tags to be read by single-end sequencing, so that erroneous data produced by sample-tag crosstalk can be filtered out.

1. A kit for constructing a sequencing library of a DNA molecule to be tested, comprising a dual-sample-tag adapter;

the dual-sample-tag adapter is formed by annealing an adapter sequence L and an adapter sequence S; one end of the dual-sample-tag adapter is used for ligation to the DNA molecule to be tested;

the adapter sequence L comprises, in order from the end proximal to the DNA molecule to be tested, a region A that is complementary to the adapter sequence S and a region C that is not complementary to the adapter sequence S;

the region A consists, in order from the end proximal to the DNA molecule to be tested, of a second sample tag sequence and a fragment B used for annealing;

the region C contains a binding region for a primer PF of a library construction primer pair;

the adapter sequence S comprises, in order from the end proximal to the DNA molecule to be tested, a region D that is complementary to the adapter sequence L and a region E that is not complementary to the adapter sequence L;

the region D consists, in order from the end proximal to the DNA molecule to be tested, of the complementary sequence of the second sample tag sequence and the complementary sequence of the fragment B;

and the region E contains, from the end proximal to the DNA molecule to be tested, a binding region for a primer PR of the library construction primer pair.

2. The kit of claim 1, wherein:

the kit further comprises the library construction primer pair;

the library construction primer pair consists of the primer PF and the primer PR;

the primer PR comprises, from its 5' end, a first sample tag sequence and a region that binds to the region E.

3. The kit of claim 1, wherein: the region E comprises, from the end proximal to the DNA molecule to be tested, a first sample tag sequence and the binding region for the primer PR of the library construction primer pair;

the kit further comprises the library construction primer pair;

the library construction primer pair consists of the primer PF and the primer PR;

the primer PR contains a region that binds to the region E and does not contain the first sample tag sequence.

4. The kit according to any one of claims 1 to 3, wherein:

the length of the second sample tag sequence is more than 3 nt;

or the length of the second sample tag sequence is 3-10 nt.

5. The kit according to any one of claims 1 to 4, wherein:

the dual-sample-tag adapter has a bubble or Y-shaped structure;

or, the terminal base of the adapter sequence L at the end proximal to the DNA molecule to be tested carries a phosphorylation modification.

6. The kit according to any one of claims 1 to 5, wherein:

the dual-sample-tag adapter is formed by annealing the adapter sequence L shown in Sequence 1 and the adapter sequence S shown in Sequence 3;

or the dual-sample-tag adapter is formed by annealing the adapter sequence L shown in Sequence 2 and the adapter sequence S shown in Sequence 4;

or the library construction primer pair consists of the primer shown in Sequence 5 and the primer shown in Sequence 6 or 7.

7. A method for constructing a sequencing library of a DNA molecule to be tested using the kit of any one of claims 1-6, wherein:

when dual sample tags are introduced during library construction, the second sample tag sequence of the dual sample tags is positioned between the DNA molecule to be tested and a sequencing primer binding region;

or, when dual sample tags are introduced during library construction, the second sample tag sequence of the dual sample tags is adjacent to both ends of the DNA molecule to be tested.

8. The method of claim 7, comprising the following steps:

1) ligating the dual-sample-tag adapter of any one of claims 1-6 to the DNA molecule to be tested to obtain a ligation product;

2) amplifying the ligation product with the library construction primer pair of any one of claims 1 to 6 to obtain a sequencing library of the DNA molecule to be tested, in which the second sample tag sequence is adjacent to both ends of the DNA molecule to be tested.

9. Use of the kit according to any one of claims 1 to 6 or the method according to claim 7 or 8 for constructing a sequencing library of a DNA molecule to be tested;

or, use of the kit according to any one of claims 1 to 6 or the method according to claim 7 or 8 in single-end sequencing of a DNA molecule to be tested;

or, use of the kit according to any one of claims 1 to 6 or the method according to claim 7 or 8 in paired-end sequencing of a DNA molecule to be tested;

or, use of the dual-sample-tag adapter of any one of claims 1-6 and the corresponding library construction primers for constructing a sequencing library of a DNA molecule to be tested;

or, use of the dual-sample-tag adapter of any one of claims 1-6 and the corresponding library construction primers in single-end sequencing of a DNA molecule to be tested;

or, use of the dual-sample-tag adapter of any one of claims 1-6 and the corresponding library construction primers in paired-end sequencing of a DNA molecule to be tested.

10. The use according to claim 9, wherein the single-end sequencing is non-invasive prenatal genetic sequencing, pathogenic microorganism gene sequencing or RNA sequencing.

Technical Field

The invention belongs to the field of biotechnology, and particularly relates to a design method and a preparation method for a dual-tag adapter.

Background

High-throughput sequencing has become an important genetic testing technology and is widely used in scientific research, medical testing, agricultural breeding, forensic identification and other fields. The current mainstream providers of high-throughput sequencing technology include Illumina, Thermo Fisher, PacBio, Oxford Nanopore Technologies (UK) and BGI (China). To reduce the average sequencing cost per sample, multiple sample libraries are usually pooled and sequenced together in a single run. During library construction, a sample tag (index) is added to each sample so that the sequencing data can be split back to the individual samples according to their tags, achieving high throughput at low cost. Sample tagging has therefore become an integral part of high-throughput sequencing.

In practice, sample-tag crosstalk (index cross-talk or index switching) is frequently encountered: data from other samples appear among the reads assigned to a given sample tag. This compromises the accuracy of the sequencing data, for example causing false-positive results in pathogenic microorganism detection and low-frequency tumor mutation detection, and inaccurate RNA quantification. The main causes of sample-tag crosstalk include contamination during adapter synthesis, contamination during library construction, contamination during target-region capture, contamination during amplification before sequencing, misreading of sample tags during sequencing, and sample carryover in the instrument fluidics between two sequencing runs.

The main current solution to sample-tag crosstalk is dual-tag sequencing: sample tags are introduced at both ends of the DNA to be tested, and during data analysis only reads in which both tags are correct are passed on to the next analysis step. This can greatly reduce, or even eliminate, sample-tag crosstalk.
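The filtering rule just described (keep a read only if both of its tags are correct) can be sketched in a few lines of Python. This is a minimal illustration, not the inventors' pipeline: the sample sheet, tag sequences and the function name `filter_dual_tag` are hypothetical, and each read is reduced to a tuple of its two observed tags and its insert.

```python
from typing import Dict, List, Tuple

# Hypothetical sample sheet: each sample is identified by a (first tag, second tag) pair.
SAMPLE_SHEET: Dict[Tuple[str, str], str] = {
    ("ACGTAC", "GAT"): "sample_1",
    ("TGCATG", "CTA"): "sample_2",
}


def filter_dual_tag(
    reads: List[Tuple[str, str, str]],
) -> Tuple[Dict[str, List[str]], int]:
    """Keep reads whose two observed tags form an expected pair.

    Each read is (first_tag, second_tag, insert). A read that carries the
    first tag of one sample and the second tag of another is counted as
    crosstalk and discarded.
    """
    assigned: Dict[str, List[str]] = {name: [] for name in SAMPLE_SHEET.values()}
    discarded = 0
    for tag1, tag2, insert in reads:
        sample = SAMPLE_SHEET.get((tag1, tag2))
        if sample is None:
            discarded += 1  # tag combination not in the experimental design
        else:
            assigned[sample].append(insert)
    return assigned, discarded


reads = [
    ("ACGTAC", "GAT", "TTGACCA"),  # expected pair -> sample_1
    ("TGCATG", "CTA", "GGCATTA"),  # expected pair -> sample_2
    ("ACGTAC", "CTA", "CCCGGTA"),  # mixed pair -> discarded
]
assigned, discarded = filter_dual_tag(reads)
print(assigned)   # {'sample_1': ['TTGACCA'], 'sample_2': ['GGCATTA']}
print(discarded)  # 1
```

With a single tag, the third read would have been assigned to `sample_1` on the strength of its first tag alone; requiring both tags to agree is what removes it.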

The conventional dual-tag library structure is shown in FIG. 1: a sequencing primer binding region lies between each sample tag and the DNA to be tested, and Illumina sequencing uses this scheme to filter sample data. After the library is loaded onto the sequencing chip, read 1 and index 1 are sequenced from one end. The template for the second end is then copied and synthesized, the read 2 and index 2 sequencing primers are added in turn using it as the template, and the two sample tag sequences are thereby read. If the two sample tags do not match the experimental design, the corresponding reads are deleted, which filters out the sample-tag crosstalk data.

The existing dual-sample-tag design has the following drawbacks: 1) to acquire data for both sample tags, index sequencing primers must be added twice, which increases sequencing cost; 2) because the two sample tags are sequenced from the two different strands of the DNA library, the template strand must be synthesized before the second sample tag can be read, which increases sequencing time; 3) current dual-sample-tag designs are not compatible with single-end sequencing.

Disclosure of Invention

To overcome the shortcomings of existing dual-sample-tag designs, the invention provides the following technical solution.

The invention provides a kit for constructing a sequencing library of a DNA molecule to be tested, comprising a dual-sample-tag adapter;

the dual-sample-tag adapter is formed by annealing an adapter sequence L and an adapter sequence S; one end of the dual-sample-tag adapter is used for ligation to the DNA molecule to be tested;

the dual-sample-tag adapters ligated to the two ends of the DNA molecule to be tested are identical.

The DNA molecule to be tested may have sticky (cohesive) ends or blunt ends; if it is blunt-ended, it can be ligated to the dual-sample-tag adapter after A-tailing.

The adapter sequence L comprises, in order from the end proximal to the DNA molecule to be tested, a region A that is complementary to the adapter sequence S and a region C that is not complementary to the adapter sequence S;

the region A consists, in order from the end proximal to the DNA molecule to be tested, of a second sample tag sequence and a fragment B used for annealing;

the region C contains a binding region for a primer PF of a library construction primer pair;

the adapter sequence S comprises, in order from the end proximal to the DNA molecule to be tested, a region D that is complementary to the adapter sequence L and a region E that is not complementary to the adapter sequence L;

the region D consists, in order from the end proximal to the DNA molecule to be tested, of the complementary sequence of the second sample tag sequence and the complementary sequence of the fragment B;

and the region E contains, from the end proximal to the DNA molecule to be tested, a binding region for a primer PR of the library construction primer pair.
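To make the region layout concrete, the sketch below assembles the two adapter strands from their regions and counts how many bases pair from the insert-proximal end. All sequences are made-up placeholders (they are not Sequences 1-4), and the strand orientation is an assumption not stated in the text: L is written 5'->3' with its 5' end proximal to the insert, so S, written 5'->3', ends with the reverse complements of fragment B and the second sample tag. Any overhang used for ligation is omitted.

```python
COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMP)[::-1]


# Placeholder region sequences (hypothetical; not the patent's Sequences 1-4).
TAG2 = "GAT"              # second sample tag
FRAG_B = "TCGCAGTTAGCA"   # fragment B, drives annealing of L and S
REGION_C = "GGTTCAACCA"   # region C, would carry the PF binding site
REGION_E = "TTGCATGCCA"   # region E, would carry the PR binding site

# Assumed orientation: L written 5'->3' with its 5' end proximal to the insert.
ADAPTER_L = TAG2 + FRAG_B + REGION_C
# S written 5'->3'; its 3' end is insert-proximal, so region D
# (complement of tag2, then complement of B, counted from that end) comes last.
ADAPTER_S = REGION_E + revcomp(FRAG_B) + revcomp(TAG2)


def paired_length(strand_l: str, strand_s: str) -> int:
    """Count bases that pair perfectly, starting at the insert-proximal end.

    Reading L forward (from its 5' end) and S backward (from its 3' end)
    steps through the two antiparallel strands base by base, moving away
    from the insert toward the arms.
    """
    n = 0
    for a, b in zip(strand_l, reversed(strand_s)):
        if a.translate(COMP) != b:
            break
        n += 1
    return n


stem = paired_length(ADAPTER_L, ADAPTER_S)
print(stem, len(TAG2) + len(FRAG_B))  # 15 15
```

The 15 paired bases are region A of L against region D of S, i.e. the double-stranded stem; regions C and E remain unpaired and form the two arms, consistent with the Y-shaped structure used in the embodiments.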

The kit further comprises the library construction primer pair;

the library construction primer pair consists of the primer PF and the primer PR;

the primer PR comprises, from its 5' end, a first sample tag sequence and a region that binds to the region E.

Alternatively, in the kit, the region E comprises, from the end proximal to the DNA molecule to be tested, a first sample tag sequence and the binding region for the primer PR of the library construction primer pair;

the kit further comprises the library construction primer pair;

the library construction primer pair consists of the primer PF and the primer PR;

the primer PR contains a region that binds to the region E and does not contain the first sample tag sequence.

In the kit, the length of the second sample tag sequence is greater than 3 nt;

or the length of the second sample tag sequence is 3-10 nt. The second sample tag may be 3 bases long or any length greater than 3 bases; lengths of 10 bases or more are not recommended because they waste a large amount of sequencing data.
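The trade-off behind this length recommendation can be checked with simple arithmetic: a tag of n nt allows at most 4^n distinct sequences, but because the second sample tag sits next to the insert, every tag base occupies a read cycle that would otherwise read insert DNA. The sketch below assumes a 50-nt single-end read purely for illustration; the read length and the helper name `tag_tradeoff` are hypothetical, and practical tag sets are usually smaller than 4^n because tags are chosen to differ at several positions.

```python
def tag_tradeoff(tag_len: int, read_len: int = 50) -> tuple:
    """Return (maximum number of distinct tags, fraction of the read used by the tag).

    read_len=50 is an illustrative single-end read length, not a value from the source.
    """
    max_tags = 4 ** tag_len       # each tag position can be A, C, G or T
    used = tag_len / read_len     # read cycles spent on the tag instead of the insert
    return max_tags, used


for n in (3, 6, 10):
    tags, frac = tag_tradeoff(n)
    print(f"{n:>2} nt: up to {tags:>7} tags, {frac:.0%} of a 50-nt read")
```

Under this assumption a 3-nt tag already gives up to 64 sequences for 6% of the read, while a 10-nt tag costs 20% of every read.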

In the above kit, the dual-sample-tag adapter has a bubble or Y-shaped structure, or may have another structure; any design that introduces new sample tags at both ends of the DNA immediately adjacent to the insert falls within the scope of the invention.

In the embodiments of the invention, the adapter has a Y-shaped structure: the complementary region of the two adapter sequences forms the stem of the Y, and the non-complementary regions form its forked arms.

The other end of the dual-sample-tag adapter is bubble-shaped or consists of two free, mutually non-complementary strands; or, the terminal base of the adapter sequence L at the end proximal to the DNA molecule to be tested carries a phosphorylation modification.

In the kit, the dual-sample-tag adapter is formed by annealing the adapter sequence L shown in Sequence 1 and the adapter sequence S shown in Sequence 3;

or the dual-sample-tag adapter is formed by annealing the adapter sequence L shown in Sequence 2 and the adapter sequence S shown in Sequence 4;

or the library construction primer pair consists of the primer shown in Sequence 5 and the primer shown in Sequence 6 or 7.

Another object of the invention is to provide a method for constructing a sequencing library of a DNA molecule to be tested using the kit.

In the method provided by the invention:

when dual sample tags are introduced during library construction, the second sample tag sequence of the dual sample tags is positioned between the DNA molecule to be tested and a sequencing primer binding region;

or, when dual sample tags are introduced during library construction, the second sample tag sequence of the dual sample tags is adjacent to both ends of the DNA molecule to be tested.

The method comprises the following steps:

1) ligating the dual-sample-tag adapter to the DNA molecule to be tested to obtain a ligation product;

the DNA molecule to be tested may have sticky ends or blunt ends; if it is blunt-ended, it can be ligated to the dual-sample-tag adapter after A-tailing;

2) amplifying the ligation product with the library construction primer pair to obtain a sequencing library of the DNA molecule to be tested, in which the second sample tag sequence is adjacent to both ends of the DNA molecule to be tested.
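The effect of step 1) on tag placement can be traced with strings. The sketch below ligates the same adapter to both ends of an insert and reports the bases immediately flanking the insert on the top strand, showing that a copy of the second sample tag ends up directly adjacent to each end. It is a minimal model under stated assumptions: the sequences are hypothetical placeholders, L's 5' end is taken to be insert-proximal, end repair, A-tailing and overhangs are ignored, and the PCR of step 2) is not modelled (it copies the same arrangement).

```python
COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMP)[::-1]


# Placeholder regions (hypothetical; not the patent's sequences).
TAG2, FRAG_B = "GAT", "TCGCAGTTAGCA"
REGION_C, REGION_E = "GGTTCAACCA", "TTGCATGCCA"
ADAPTER_L = TAG2 + FRAG_B + REGION_C                    # 5' end insert-proximal (assumed)
ADAPTER_S = REGION_E + revcomp(FRAG_B) + revcomp(TAG2)  # 3' end insert-proximal


def ligate_both_ends(insert: str) -> str:
    """Top strand after the same adapter is ligated to both ends of the insert.

    The insert's 5' end joins the 3' end of S from the left-hand adapter;
    its 3' end joins the 5' end of L from the right-hand adapter.
    """
    return ADAPTER_S + insert + ADAPTER_L


insert = "ACGTTGCAAC"
top = ligate_both_ends(insert)
start, end = len(ADAPTER_S), len(ADAPTER_S) + len(insert)
left_flank = top[start - len(TAG2):start]
right_flank = top[end:end + len(TAG2)]
print(left_flank, right_flank)  # ATC GAT
```

The left flank reads `ATC` because the top strand carries the reverse complement of the tag at that end (the complementary strand carries `GAT` there); either way, each end of the insert is bordered by the second sample tag, so a read starting from either end encounters it before the insert.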

The use of the kit or the method for constructing a sequencing library of a DNA molecule to be tested also falls within the protection scope of the invention;

or, the use of the kit or the method in single-end sequencing of a DNA molecule to be tested also falls within the protection scope of the invention;

or, the use of the kit or the method in paired-end sequencing of a DNA molecule to be tested also falls within the protection scope of the invention;

or, the use of the dual-sample-tag adapter and its corresponding library construction primers for constructing a sequencing library of a DNA molecule to be tested also falls within the protection scope of the invention;

or, the use of the dual-sample-tag adapter and its corresponding library construction primers in single-end sequencing of a DNA molecule to be tested also falls within the protection scope of the invention;

or, the use of the dual-sample-tag adapter and its corresponding library construction primers in paired-end sequencing of a DNA molecule to be tested also falls within the protection scope of the invention.

In the above uses, the single-end sequencing is non-invasive prenatal genetic sequencing, pathogenic microorganism gene sequencing or RNA sequencing.

The invention achieves dual-sample-tag sequencing at lower sequencing cost and in less sequencing time; in particular, it enables efficient filtering of sample-tag crosstalk data in single-end sequencing projects, such as non-invasive prenatal testing and pathogenic microorganism detection, without increasing sequencing cost.
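Why a single-end run can still apply the dual-tag check is sketched below: the first sample tag is taken from the index read, while the second sample tag, which sits between the sequencing primer binding region and the insert, appears at the start of read 1. The sketch assumes read 1 begins exactly at the second sample tag (any adapter bases read before it are ignored) and uses exact matching; the design table, sample names and the function `demux_single_end` are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical design: first tag (from the index read) -> (sample, expected second tag).
DESIGN: Dict[str, Tuple[str, str]] = {
    "ACGTAC": ("sample_A", "GAT"),
    "TGCATG": ("sample_B", "CTA"),
}


def demux_single_end(index_read: str, read1: str) -> Optional[Tuple[str, str]]:
    """Call the sample from the index read, then confirm it with the second tag.

    read1 is assumed to start with the second sample tag, followed by the insert.
    Returns (sample, insert) or None when the read should be discarded.
    """
    entry = DESIGN.get(index_read)
    if entry is None:
        return None                       # unknown first tag
    sample, tag2 = entry
    if not read1.startswith(tag2):
        return None                       # second tag disagrees: likely crosstalk
    return sample, read1[len(tag2):]      # trim the tag, keep the insert


print(demux_single_end("ACGTAC", "GATTTGACCAGT"))  # ('sample_A', 'TTGACCAGT')
print(demux_single_end("ACGTAC", "CTATTGACCAGT"))  # None
```

Both tags are obtained from read 1 plus one index read, so no second index read and no second-strand template synthesis are needed; the trimmed insert is what goes on to downstream analysis.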

The invention has the following advantages: 1) by introducing new sample tags immediately adjacent to both ends of the inserted DNA fragment, it reduces the number of sequencing primer additions and thus the sequencing cost; 2) it enables dual sample tags in single-end sequencing projects without increasing sequencing cost, avoiding false positives caused by sample-tag crosstalk. This adapter design allows dual sample tags to be read by single-end sequencing, so that erroneous data produced by sample-tag crosstalk can be filtered out.

Drawings

FIG. 1 shows a common design and sequencing method for a dual-sample-tag adapter.

FIG. 2 is a schematic diagram of the library construction result obtained with the dual-sample-tag adapter of the present invention.

FIG. 3 illustrates a method for implementing the dual-sample-tag adapter design of the present invention.

Detailed Description

The experimental procedures used in the following examples are all conventional procedures unless otherwise specified.

Materials, reagents and the like used in the following examples are commercially available unless otherwise specified.
