ctDNA library construction and sequencing data analysis method for simultaneously detecting multiple common liver cancer mutations

Document No.: 1609443; Publication date: 2020-01-10

Abstract: This technology, a ctDNA library construction and sequencing data analysis method for simultaneously detecting multiple common liver cancer mutations, was designed by 焦宇辰, 曲春枫, 王沛, 陈坤, 王宇婷, 宋欠欠, 王思振 and 阎海 on 2018-07-03. The invention discloses a ctDNA library construction and sequencing data analysis method for simultaneously detecting multiple common liver cancer mutations. The library construction method and the sequencing data analysis method established by the invention have the following advantages: 1. multiple mutation types of liver cancer are detected simultaneously without capture; 2. the method is suitable for efficient enrichment of ultra-small target regions; 3. the library can support 10-20 tests; 4. during library construction, DNA barcodes are ligated to the initial ctDNA molecules and, together with the bioinformatic analysis pipeline, enable highly specific detection of low-frequency ctDNA mutations; 5. the library can be used for both PCR-based hot-spot detection and capture-based sequencing, and the added DNA barcodes effectively filter out false-positive mutations, enabling highly specific duplex-based sequencing. The invention is of clinical significance for early screening, disease tracking, evaluation of therapeutic efficacy, prognosis prediction and the like of liver cancer.

1. A sequencing library construction method, sequentially comprising the following steps:

(1) sequentially subjecting the DNA sample to end repair and 3'-end A-tailing;

(2) ligating the DNA sample treated in step (1) to an adapter mixture, and performing PCR amplification to obtain a library;

the adapter mixture consists of n adapters;

each adapter is obtained by forming a partially double-stranded structure from one upstream primer A and one downstream primer A; the upstream primer A has a sequencing adapter A, a random tag, an anchor sequence A and a base T at the 3' end; the downstream primer A has an anchor sequence B and a sequencing adapter B; the partially double-stranded structure is formed by reverse complementarity between the anchor sequence A in the upstream primer A and the anchor sequence B in the downstream primer A;

the sequencing adapter A and the sequencing adapter B are the corresponding sequencing adapters selected according to the sequencing platform used;

the random tag consists of 8-14 bp of random bases;

the anchor sequence A is 14-20 bp long and contains no run of more than 3 identical consecutive bases;

the n adapters use n different anchor sequences A, in which the bases at each position are balanced and the number of mismatched bases between anchor sequences is more than 3;

n is any natural number greater than or equal to 8.

2. The DNA library constructed by the method of claim 1.

3. A kit for constructing a sequencing library, comprising the adapter mixture of claim 1.

4. A kit for detecting liver cancer mutations in a DNA sample, comprising the adapter mixture of claim 1 and a primer combination; the primer combination comprises a primer group I, a primer group II, a primer group III and a primer group IV;

the primer group I consists of the single-stranded DNAs shown in sequences 28 to 105 of the sequence listing;

the primer group II consists of the single-stranded DNAs shown in sequences 106 to 187 of the sequence listing;

the primer group III consists of the single-stranded DNAs shown in sequences 191 to 265 of the sequence listing;

the primer group IV consists of the single-stranded DNAs shown in sequences 266 to 344 of the sequence listing.

5. A primer combination comprising a primer group I, a primer group II, a primer group III and a primer group IV;

the primer group I consists of the single-stranded DNAs shown in sequences 28 to 105 of the sequence listing;

the primer group II consists of the single-stranded DNAs shown in sequences 106 to 187 of the sequence listing;

the primer group III consists of the single-stranded DNAs shown in sequences 191 to 265 of the sequence listing;

the primer group IV consists of the single-stranded DNAs shown in sequences 266 to 344 of the sequence listing.

6. Use of the primer combination of claim 5 in the preparation of a kit for detecting liver cancer mutations in a DNA sample.

7. A method of detecting a mutation of interest in a DNA sample, comprising the steps of:

(1) constructing a library according to the method of claim 1;

(2) performing two rounds of nested PCR amplification on the library obtained in step (1), sequencing the product, and determining from the sequencing result whether the target mutation is present in the DNA sample;

in step (2), the first round of PCR amplification is performed with a primer combination A;

the primer combination A consists of an upstream primer A and a downstream primer combination A;

the upstream primer A is a library amplification primer, used for the library amplification of step (1);

the downstream primer combination A is a combination of n primers designed for n target sites;

the second round of PCR amplification is performed with a primer combination B, using the product of the first round of PCR as the template;

the primer combination B consists of an upstream primer B, a downstream primer combination B and an index primer;

the upstream primer B is a library amplification primer, used for amplifying the product of the first round of PCR;

each primer in the downstream primer combination B is nested relative to the primer in the downstream primer combination A that detects the same target site, and each has a segment that binds the index primer;

the index primer comprises a segment that binds the primers in the downstream primer combination B, and an index sequence.

8. The method of claim 7, wherein the sequencing result is analyzed as follows: sequencing data of DNA molecules that have the same random tag sequence, the same DNA insert length and the same breakpoints at both ends of the DNA insert are traced back to one molecular cluster; if the number of molecules in a cluster is greater than 5, the concordance rate of the mutation among the molecules in the cluster is greater than 80%, and the number of such clusters is greater than or equal to 5, the mutation is a true mutation derived from the original DNA sample.

9. A method for detecting multiple mutations of interest in a DNA sample, comprising the steps of:

(1) constructing a library according to the method of claim 1;

(2) enriching the target region of the library obtained in step (1), sequencing, and determining from the sequencing result whether the target mutations are present in the DNA sample.

10. The method of claim 9, wherein the sequencing result is analyzed as follows: sequencing data of an initial single DNA strand that have the same DNA insert length, the same breakpoints at both ends of the DNA insert and the same anchor sequences at both ends are traced back to one molecular cluster; molecular clusters derived from the same initial double-stranded DNA, which have the same insert length, the same sequence except at the mutation site, and the same anchor sequences at the two ends but in opposite positions, are marked as a pair of duplex molecular clusters; a mutation is judged to be true if it is supported by at least one pair of duplex molecular clusters, or, in the absence of duplex cluster support, if it is supported by at least 4 molecular clusters.

Technical Field

The invention relates to a ctDNA library construction and sequencing data analysis method for simultaneously detecting various liver cancer common mutations.

Background

ctDNA (circulating tumor DNA) refers to cell-free tumor DNA present in body fluids such as blood and cerebrospinal fluid. In blood, ctDNA is usually mixed with free DNA derived from normal cells; together they are called cfDNA (cell-free DNA). Detecting mutations in ctDNA can guide targeted drug therapy, treatment monitoring, early cancer screening and the like. ctDNA-based detection methods include: 1) PCR-based hot-spot mutation detection, which typically detects one or a few hot-spot or known mutations and cannot detect complex mutations such as gene fusions or unknown mutations; 2) capture followed by next-generation sequencing, which can detect mutations at more gene positions, including complex mutations, but whose capture kits are generally expensive, complex to operate and time-consuming. Against the background of these two methods, ctDNA detection currently faces the following difficulties:

1) The amount of ctDNA obtained from a single blood draw is limited and usually supports only one test, so clinical ctDNA testing is usually single-platform and one-off; once one mutation has been tested with a low-cost hot-spot method, no further mutation can be tested. In clinical testing, the targets and scheme of follow-up tests are often decided from the result of the first test, so blood must be drawn again for each follow-up test. In addition, ctDNA-related clinical trials or studies often need to compare the merits of several techniques, which requires several times the usual blood draw volume and is generally unacceptable to patients.

2) With either the PCR method or the capture method, noise mutations generated during amplification seriously interfere with the detection of low-frequency ctDNA mutations, producing false-positive results that mislead diagnosis and treatment.

3) The mutant fraction in ctDNA is low, and samples are easily contaminated during handling, which also causes false-positive results.

Liver cancer is the fifth most common tumor and the second most lethal tumor worldwide. More than half of the world's liver cancer cases occur in China, and most of them are hepatitis B related. Hepatitis B related liver cancer has almost no hot-spot mutations such as those of KRAS or BRAF; its mutations are mainly coding-region mutations of genes such as TP53 and CTNNB1 and mutations in the GC-rich promoter region of TERT, together with complex mutation forms such as HBV integration and TERT copy number variation. As a result, there is currently no simple, low-cost and reliable scheme for detecting ctDNA mutations of liver cancer. ctDNA detection is of clinical significance for early screening, disease tracking, evaluation of therapeutic efficacy, prognosis prediction and the like of liver cancer.

Disclosure of Invention

The invention aims to provide a ctDNA library construction and sequencing data analysis method for simultaneously detecting various liver cancer common mutations.

The invention provides a sequencing library construction method, which sequentially comprises the following steps:

(1) sequentially subjecting the DNA sample to end repair and 3'-end A-tailing;

(2) ligating the DNA sample treated in step (1) to an adapter mixture, and performing PCR amplification to obtain a library;

the adapter mixture consists of n adapters;

each adapter is obtained by forming a partially double-stranded structure from one upstream primer A and one downstream primer A; the upstream primer A has a sequencing adapter A, a random tag, an anchor sequence A and a base T at the 3' end; the downstream primer A has an anchor sequence B and a sequencing adapter B; the partially double-stranded structure is formed by reverse complementarity between the anchor sequence A in the upstream primer A and the anchor sequence B in the downstream primer A;

the sequencing adapter A and the sequencing adapter B are the corresponding sequencing adapters selected according to the sequencing platform used;

the random tag consists of 8-14 bp of random bases;

the anchor sequence A is 14-20 bp long and contains no run of more than 3 identical consecutive bases;

the n adapters use n different anchor sequences A, in which the bases at each position are balanced and the number of mismatched bases between anchor sequences is more than 3;

n is any natural number greater than or equal to 8.

The anchor sequence does not interact with other parts of the primer (for example, it does not form hairpin structures or dimers).
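The design constraints on the anchor sequences can be checked mechanically. The following is a minimal sketch, not the inventors' tool: the function names, the pairwise reading of "number of mismatched bases is more than 3" (Hamming distance of at least 4 between any two anchors) and the `max_base_fraction` threshold used to express "balanced" are all assumptions, since the text gives no numeric definition of balance. Hairpin and dimer checks are not covered.

```python
from collections import Counter
from itertools import combinations


def longest_run(seq):
    """Length of the longest run of identical consecutive bases."""
    longest = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest


def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def check_anchor_set(anchors, min_n=8, min_len=14, max_len=20,
                     max_repeat=3, min_mismatch=4, max_base_fraction=0.5):
    """Return a list of human-readable problems; an empty list means all checks passed.

    max_base_fraction is a hypothetical stand-in for "balanced": no single
    base may exceed this fraction at any position across the anchor set.
    """
    problems = []
    if len(anchors) < min_n:
        problems.append(f"only {len(anchors)} anchors, need at least {min_n}")
    for a in anchors:
        if not min_len <= len(a) <= max_len:
            problems.append(f"{a}: length {len(a)} outside {min_len}-{max_len}")
        if longest_run(a) > max_repeat:
            problems.append(f"{a}: run of {longest_run(a)} identical bases")
    if len({len(a) for a in anchors}) > 1:
        problems.append("anchors differ in length; pairwise checks skipped")
        return problems
    for a, b in combinations(anchors, 2):
        if hamming(a, b) < min_mismatch:
            problems.append(f"{a}/{b}: only {hamming(a, b)} mismatches")
    for pos, column in enumerate(zip(*anchors)):
        base, count = Counter(column).most_common(1)[0]
        if count / len(anchors) > max_base_fraction:
            problems.append(f"position {pos}: base {base} over-represented")
    return problems
```

The length bounds are parameters because the general statement gives 14-20 bp while the specific embodiment uses 12 bp anchors; checking that embodiment would need `min_len=12`.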

The upstream primer A comprises, in order from the 5' end, the sequencing adapter A, the random tag, the anchor sequence A and the base T.

The downstream primer A comprises, in order from the 5' end, the anchor sequence B and the sequencing adapter B.

The anchor sequence can serve as a built-in tag of fixed sequence for labeling the original template molecule.

The DNA sample may be a genomic DNA, cDNA, ctDNA or cfDNA sample.

n may specifically be 12.

The random tag may specifically be 8 bp of random bases.

The length of the anchor sequence A may specifically be 12 bp.

When n is 12, the anchor sequences A may specifically be as shown in positions 30-41 from the 5' end of each of sequences 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21 and 23 of the sequence listing.

The sequencing adapter A may specifically be a sequencing adapter of the TruSeq sequencing kit from Illumina, and may specifically be as shown in positions 1-29 from the 5' end of sequence 1 of the sequence listing.

The sequencing adapter B may specifically be a sequencing adapter of the Nextera sequencing kit from Illumina, and may specifically be as shown in positions 13-41 from the 5' end of sequence 2 of the sequence listing.

When n is 12, the 12 adapters are as follows, each being obtained by forming a partially double-stranded structure from the two single-stranded DNA molecules of the sequence listing named below:

adapter 1: sequence 1 and sequence 2;
adapter 2: sequence 3 and sequence 4;
adapter 3: sequence 5 and sequence 6;
adapter 4: sequence 7 and sequence 8;
adapter 5: sequence 9 and sequence 10;
adapter 6: sequence 11 and sequence 12;
adapter 7: sequence 13 and sequence 14;
adapter 8: sequence 15 and sequence 16;
adapter 9: sequence 17 and sequence 18;
adapter 10: sequence 19 and sequence 20;
adapter 11: sequence 21 and sequence 22;
adapter 12: sequence 23 and sequence 24.

Each adapter can be obtained by annealing the upstream primer A with the downstream primer A.

In the adapter mixture, the adapters are mixed in equimolar amounts.

The method also comprises the step of amplifying the library obtained in the step (2). The primer pair adopted by the amplification consists of two single-stranded DNA molecules shown as a sequence 25 and a sequence 26 in a sequence table.

The invention also protects the DNA library constructed by the method.

The invention also provides a kit for constructing a sequencing library, which comprises the adaptor mixture.

The kit also comprises a reagent for DNA extraction, a reagent for DNA library construction, a reagent for library purification, a reagent for library capture and the like, which are used as materials for library construction.

The invention also provides a kit for detecting liver cancer mutations in a DNA sample, which comprises any of the adapter mixtures described above and a primer combination; the primer combination comprises a primer group I, a primer group II, a primer group III and a primer group IV;

the primer group I consists of the single-stranded DNAs shown in sequences 28 to 105 of the sequence listing;

the primer group II consists of the single-stranded DNAs shown in sequences 106 to 187 of the sequence listing;

the primer group III consists of the single-stranded DNAs shown in sequences 191 to 265 of the sequence listing;

the primer group IV consists of the single-stranded DNAs shown in sequences 266 to 344 of the sequence listing.

The kit may further comprise reagents for DNA extraction, DNA library construction, library purification, library capture and the like, as materials for library construction.

The invention also claims a primer combination which comprises a primer group I, a primer group II, a primer group III and a primer group IV;

the primer group I consists of the single-stranded DNAs shown in sequences 28 to 105 of the sequence listing;

the primer group II consists of the single-stranded DNAs shown in sequences 106 to 187 of the sequence listing;

the primer group III consists of the single-stranded DNAs shown in sequences 191 to 265 of the sequence listing;

the primer group IV consists of the single-stranded DNAs shown in sequences 266 to 344 of the sequence listing.

The primer combination is used for preparing a kit for detecting liver cancer mutations in a DNA sample.

The invention also claims the use of the primer combination in the preparation of a kit for detecting liver cancer mutations in a DNA sample.

The invention also provides a method for detecting target mutation in a DNA sample, which comprises the following steps:

(1) constructing a library according to the method of claim 1;

(2) performing two rounds of nested PCR amplification on the library obtained in step (1), sequencing the product, and determining from the sequencing result whether the target mutation is present in the DNA sample;

in step (2), the first round of PCR amplification is performed with a primer combination A;

the primer combination A consists of an upstream primer A and a downstream primer combination A;

the upstream primer A is a library amplification primer, used for the library amplification of step (1);

the downstream primer combination A is a combination of n primers designed for n target sites;

the second round of PCR amplification is performed with a primer combination B, using the product of the first round of PCR as the template;

the primer combination B consists of an upstream primer B, a downstream primer combination B and an index primer;

the upstream primer B is a library amplification primer, used for amplifying the product of the first round of PCR;

each primer in the downstream primer combination B is nested relative to the primer in the downstream primer combination A that detects the same target site, and each has a segment that binds the index primer;

the index primer comprises a segment that binds the primers in the downstream primer combination B, and an index sequence.

The upstream primer A is specifically shown in sequence 27 of the sequence listing.

The upstream primer B is specifically shown in sequence 188 of the sequence listing.

The index primer comprises, in order from the 5' end, a segment A, an index sequence and a segment B; the segment A is shown in sequence 189 of the sequence listing, and the segment B is shown in sequence 190 of the sequence listing.

When the target mutation is a liver cancer mutation, the primer combination A consists of either the primer group I or the primer group II, and the primer combination B consists of either the primer group III or the primer group IV. First-round PCR amplification of the template is performed separately with the primer group I and with the primer group II; the product amplified with the primer group I serves as the template for second-round amplification with the primer group III, and the product amplified with the primer group II serves as the template for second-round amplification with the primer group IV; the amplification products are then mixed in equal volumes.

The sequencing result is analyzed as follows: sequencing data of DNA molecules that have the same random tag sequence, the same DNA insert length and the same breakpoints at both ends of the DNA insert are traced back to one molecular cluster; if the number of molecules in a cluster is greater than 5, the concordance rate of the mutation among the molecules in the cluster is greater than 80%, and the number of such clusters is greater than or equal to 5, the mutation is a true mutation derived from the original DNA sample.
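The clustering rule above can be illustrated with a short sketch. It assumes that reads have already been aligned and reduced to a random tag, two breakpoint coordinates and a flag saying whether the read carries the mutation; the `Read` type and the function names are hypothetical and are not part of the disclosed pipeline. Because the insert length follows from the two breakpoints, the cluster key is (tag, start, end).

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    tag: str            # random tag sequence carried by the adapter
    start: int          # breakpoint at one end of the insert
    end: int            # breakpoint at the other end of the insert
    has_mutation: bool  # whether this read shows the candidate mutation


def supporting_clusters(reads, min_reads=6, min_consistency=0.8):
    """Count clusters with more than 5 reads and more than 80% mutant reads."""
    clusters = defaultdict(list)
    for r in reads:
        clusters[(r.tag, r.start, r.end)].append(r.has_mutation)
    count = 0
    for flags in clusters.values():
        if len(flags) >= min_reads and sum(flags) / len(flags) > min_consistency:
            count += 1
    return count


def is_true_mutation(reads, min_clusters=5):
    """Apply the final rule: at least 5 supporting clusters."""
    return supporting_clusters(reads) >= min_clusters
```

Here "more than 5 molecules" is expressed as `min_reads=6`, and the 80% threshold is applied as a strict inequality, matching the wording of the rule.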

The invention also provides a method for detecting multiple target mutations in a DNA sample, which comprises the following steps:

(1) constructing a library according to the method of claim 1;

(2) enriching the target region of the library obtained in step (1), sequencing, and determining from the sequencing result whether the target mutations are present in the DNA sample.

The target region enrichment can be performed with an existing commercially available targeted capture kit (such as the Agilent SureSelectXT targeted capture kit, Agilent 5190-8646), with the primer pair of the final PCR amplification step replaced by a primer pair consisting of a primer A and a primer B; the primer A is shown in sequence 345 of the sequence listing; the primer B comprises a segment A, an index sequence and a segment B; the segment A is shown in sequence 346 of the sequence listing, and the segment B is shown in sequence 347 of the sequence listing.

The sequencing result is analyzed as follows: sequencing data of an initial single DNA strand that have the same DNA insert length, the same breakpoints at both ends of the DNA insert and the same anchor sequences at both ends are traced back to one molecular cluster; molecular clusters derived from the same initial double-stranded DNA, which have the same insert length, the same sequence except at the mutation site, and the same anchor sequences at the two ends but in opposite positions, are marked as a pair of duplex molecular clusters; a mutation is judged to be true if it is supported by at least one pair of duplex molecular clusters, or, in the absence of duplex cluster support, if it is supported by at least 4 molecular clusters.
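The duplex rule can be sketched in the same way. This is an illustration under stated assumptions, not the disclosed implementation: each single-strand cluster is represented only by its two breakpoints and the anchors observed at its two ends, the input contains only clusters already judged to carry the mutation, and a duplex pair is taken to support the mutation when both of its clusters do. The `Cluster` type and the function names are hypothetical.

```python
from typing import NamedTuple


class Cluster(NamedTuple):
    start: int     # breakpoint at one end of the insert
    end: int       # breakpoint at the other end of the insert
    anchor_a: str  # anchor sequence observed at the first end
    anchor_b: str  # anchor sequence observed at the second end


def duplex_partner(c):
    """Cluster from the opposite strand: same insert, anchors swapped."""
    return Cluster(c.start, c.end, c.anchor_b, c.anchor_a)


def count_duplex_pairs(supporting):
    """Number of duplex pairs among the mutation-supporting clusters."""
    keys = set(supporting)
    pairs = set()
    for c in keys:
        p = duplex_partner(c)
        # Identical anchors at both ends cannot tell the two strands apart.
        if p != c and p in keys:
            pairs.add(frozenset((c, p)))
    return len(pairs)


def call_mutation_duplex(supporting, min_single=4):
    """True if at least one duplex pair, or at least 4 clusters, support the mutation."""
    keys = set(supporting)
    return count_duplex_pairs(keys) >= 1 or len(keys) >= min_single
```

Swapping the two anchors is one way to express "the same anchor sequences at the two ends but in opposite positions"; a production pipeline would also compare the cluster consensus sequences outside the mutation site, which this sketch leaves out.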

Due to the adoption of the technical scheme, the invention has the following advantages:

1. Without any capture step, multiple mutation types in liver cancer ctDNA, such as point mutations, insertion-deletion mutations and HBV integration, are detected simultaneously. Compared with the capture method, the technology needs only a set of DNA primers and no expensive capture probes or hybridization reagents, which greatly reduces cost; the workflow is simple, and the time required is shortened from the 36 hours of the capture method to 8 hours.

2. The method is suitable for efficient enrichment of ultra-small target regions; the target region can be as small as 10% of the minimum target region of the capture method, which greatly improves sequencing efficiency. For example, the combination of common liver cancer mutations in TP53, CTNNB1, AXIN1 and TERT together with HBV integration is an ultra-small target region suited to the technology: when this target region is enriched by the capture method the on-target rate is below 10%, whereas the technology reaches more than 80%, greatly improving sequencing efficiency and reducing sequencing cost.

3. After one test, the amplified library can support 10-20 subsequent tests, and the result of each test represents the mutation status of the entire original ctDNA sample, without loss of sensitivity or specificity.

4. During library construction, DNA barcodes are ligated to the initial ctDNA molecules and, together with the bioinformatic analysis pipeline, enable highly specific detection of low-frequency ctDNA mutations.

5. The library constructed by the technology can be used for both PCR-based hot-spot detection and capture-based sequencing; the library built from one specimen can support multiple tests, and the added DNA barcodes effectively filter out false-positive mutations, enabling highly specific duplex-based sequencing.

The invention has important clinical significance for early screening, disease tracking, curative effect evaluation, prognosis prediction and the like of liver cancer.

Drawings

FIG. 1 is a schematic diagram of an adapter and a primer architecture.

FIG. 2 is a schematic diagram of enrichment of RaceSeq target regions and library construction.

FIG. 3 is a schematic of MC library capture and duplex sequencing.

Detailed Description

The following examples are given to facilitate a better understanding of the invention, but do not limit the invention. Unless otherwise specified, the experimental procedures in the following examples are conventional methods, and the test materials used were purchased from conventional biochemical reagent suppliers. All quantitative tests in the following examples were performed in triplicate, and the results were averaged.
