Method, device and storage medium for detecting joint deletion of chromosomes

Document No.: 685296 Publication date: 2021-04-30

Reading note: this technology, "Method, device and storage medium for detecting joint deletion of chromosomes", was designed and created by 许明炎, 陈亚如, 周衍庆 and 陈实富 on 2020-12-30. Main content: The application discloses a method, a device and a storage medium for detecting joint deletion of chromosomes. The method comprises: performing SNP analysis on the deduplicated data using variation detection software to obtain variation information; reading dbSNP site information with a population frequency higher than 0.01 in the 1000 Genomes database within the 1p and 19q capture regions; obtaining copy number variation information in the 1p and 19q regions using copy number variation detection software; obtaining mutation frequency information of the normal control sample and mutation frequency of the tumor tissue sample; and analyzing the 1p/19q joint deletion based on point mutation frequency and copy number change. The method analyzes the 1p/19q chromosome joint deletion through the point mutation frequency and copy number change of high-throughput sequencing data; it fills the gap in detecting 1p/19q chromosome joint deletion from high-throughput sequencing data; it also allows detection of other variations such as point mutations, insertions/deletions and fusions; and it improves the use efficiency of sequencing data.

1. A method for detecting a combined chromosomal deletion, characterized by comprising the following steps:

a data acquisition and comparison step, which comprises acquiring capture sequencing results of the short arm of chromosome 1 and the long arm of chromosome 19 of the tumor tissue sample and the corresponding normal control sample respectively, aligning the capture sequencing results of the tumor tissue sample and the normal control sample to a reference genome, and removing duplicates to obtain data after duplication removal;

a variation detection step, which comprises carrying out SNP analysis on the data after duplication removal by using variation detection software to obtain variation information, wherein the variation information comprises a variation position, a base type of the position on a reference genome, a variation base type of the position in a sample and mutation frequency information;

a high-frequency dbSNP acquisition step, which comprises reading dbSNP site information with a population frequency higher than 0.01 in the 1000 Genomes database within the capture regions of the short arm of chromosome 1 and the long arm of chromosome 19;

a copy number variation detection step, which comprises performing copy number analysis on the data after duplication removal using copy number variation detection software, and obtaining copy number variation information in the short arm region of chromosome 1 and the long arm region of chromosome 19;

a mutation frequency analysis step, which comprises filtering out high-frequency dbSNP locus mutation information existing in the mutation information obtained in the mutation detection step based on the information obtained in the high-frequency dbSNP acquisition step, and acquiring mutation frequency information of a normal control sample and mutation frequency of a tumor tissue sample;

and a chromosome joint deletion analysis step of analyzing the copy number of the short arm of chromosome 1 and the long arm of chromosome 19 based on the result of the copy number variation detection step and the result of the mutation frequency analysis step, thereby obtaining the chromosome joint deletion analysis result of the two.

2. The method for detecting combined chromosomal deletion according to claim 1, wherein: the normal control sample is leucocyte DNA;

preferably, the reference genome is the reference genome hg19.

3. The method for detecting combined chromosomal deletion according to claim 1, wherein: the mutation detection software is VarScan 2.

4. A method for detecting combined chromosomal deletions according to any of claims 1-3, wherein: the copy number variation detection software is CNVkit.

5. An apparatus for detecting a combined chromosomal deletion, characterized by comprising: a data acquisition and comparison module, a mutation detection module, a high-frequency dbSNP acquisition module, a copy number variation detection module, a mutation frequency analysis module and a chromosome joint deletion analysis module;

the data acquisition and comparison module is used for respectively acquiring the capture sequencing results of the short arm of chromosome 1 and the long arm of chromosome 19 of the tumor tissue sample and the corresponding normal control sample, aligning the capture sequencing results of the tumor tissue sample and the normal control sample to a reference genome, and removing duplicates to obtain data after duplication removal;

the mutation detection module is used for carrying out SNP analysis on the data after duplication removal by adopting mutation detection software to obtain mutation information, wherein the mutation information comprises a mutation position, a base type of the position on a reference genome, a mutation base type of the position in a sample and mutation frequency information;

the high-frequency dbSNP acquisition module is used for reading dbSNP site information with a population frequency higher than 0.01 in the 1000 Genomes database within the capture regions of the short arm of chromosome 1 and the long arm of chromosome 19;

the copy number variation detection module is used for performing copy number analysis on the data subjected to duplication removal by using copy number variation detection software to obtain copy number variation information in the short arm regions of the No. 1 chromosome and the long arm regions of the No. 19 chromosome;

the mutation frequency analysis module is used for filtering out high-frequency dbSNP site mutation information existing in the mutation information obtained by the mutation detection module based on the information of the high-frequency dbSNP acquisition module, and acquiring mutation frequency information of a normal control sample and mutation frequency of a tumor tissue sample;

the chromosome joint deletion analysis module is used for analyzing the copy number status of the short arm of chromosome 1 and the long arm of chromosome 19 based on the result of the copy number variation detection module and the result of the mutation frequency analysis module, so as to obtain the chromosome joint deletion analysis result of the two.

6. The apparatus for detecting chromosomal co-deletions of claim 5, wherein: the normal control sample is leucocyte DNA;

preferably, the reference genome is the reference genome hg19.

7. The apparatus for detecting chromosomal co-deletions of claim 5, wherein: the mutation detection software is VarScan 2.

8. Device for detecting combined chromosomal deletions according to any of claims 5 to 7, characterized in that: the copy number variation detection software is CNVkit.

9. An apparatus for detecting a combined chromosomal deletion, comprising: the apparatus includes a memory and a processor;

the memory is used for storing a program;

the processor is used for implementing the method for detecting a joint chromosome deletion of any one of claims 1-4 by executing the program stored in the memory.

10. A computer-readable storage medium characterized by: the medium has stored therein a program executable by a processor to implement the method for detecting a joint deletion of chromosomes according to any one of claims 1 to 4.

Technical Field

The present application relates to the field of chromosome detection, and in particular, to a method, an apparatus, and a storage medium for detecting a combined deletion of chromosomes.

Background

Gliomas are mainly classified, according to differences in tissue morphology, into astrocytic tumors, oligodendroglial tumors, oligoastrocytic tumors, ependymal tumors, choroid plexus tumors, and the like. Among these, 1p/19q combined deletion, namely the combined deletion of the short arm of chromosome 1 and the long arm of chromosome 19 (also referred to as heterozygous deletion of the chromosomes), occurs in astrocytic, oligodendroglial and oligoastrocytic tumors. Research shows that the rate of 1p/19q combined deletion is highest in oligodendroglial tumors, where the incidence can reach 50%-80%; it is next highest in oligoastrocytic tumors, at about 36%; and it is about 11% in astrocytic tumors.

Therefore, detection of the 1p/19q chromosome combined deletion provides important reference data and a theoretical basis for the diagnosis of glioma, individualized treatment, and the selection of postoperative radiotherapy or chemotherapy. At present there are two main ways to detect the combined deletion: one is a kit and method that detects chromosomal loss of heterozygosity based on amplicon sequencing; the other labels multiple sites with a single fluorescent dye in one PCR system to detect the 1p/19q combined deletion. Both techniques can only detect the 1p/19q combined deletion status and cannot obtain other variation information about the tumor tissue sample.

High-throughput sequencing can simultaneously sequence millions of short sequences, and with the development of high-throughput sequencing technology, a plurality of variations such as point mutation, insertion deletion, fusion and the like of a sample can be analyzed and detected based on high-throughput sequencing data. Therefore, the mutation detection based on the high-throughput sequencing has the advantages of high efficiency, rapidness, accuracy and the like. However, no method for analyzing the 1p/19q chromosome combined deletion aiming at high-throughput sequencing data exists at present.

Disclosure of Invention

The purpose of the present application is to provide a novel method, apparatus and storage medium for detecting a combined deletion of chromosomes.

In order to achieve the purpose, the following technical scheme is adopted in the application:

a first aspect of the present application discloses a method for detecting a combined deletion of chromosomes, comprising the steps of:

a data acquisition and comparison step, which comprises acquiring capture sequencing results of the short arm of chromosome 1 and the long arm of chromosome 19 of the tumor tissue sample and the corresponding normal control sample respectively, aligning the capture sequencing results of the tumor tissue sample and the normal control sample to a reference genome, and removing duplicates to obtain data after duplication removal;

a variation detection step, which comprises adopting variation detection software to carry out SNP analysis on the data after duplication removal to obtain variation information; wherein the mutation information comprises a mutation position, a base type of the position on the reference genome, a mutation base type of the position in the sample and mutation frequency information;

a high-frequency dbSNP acquisition step, which comprises reading dbSNP site information with a population frequency higher than 0.01 in the 1000 Genomes database within the capture regions of the short arm of chromosome 1 and the long arm of chromosome 19;

a copy number variation detection step, which comprises performing copy number analysis on the data after duplication removal using copy number variation detection software, and obtaining copy number variation information in the short arm region of chromosome 1 and the long arm region of chromosome 19;

a mutation frequency analysis step, which comprises filtering out high-frequency dbSNP site mutation information existing in the mutation information obtained in the mutation detection step based on the information obtained in the high-frequency dbSNP acquisition step, and acquiring mutation frequency information of a normal control sample and mutation frequency of a tumor tissue sample;

and a chromosome joint deletion analysis step, which comprises analyzing the copy number status of the short arm of chromosome 1 (1p) and the long arm of chromosome 19 (19q) based on the result of the copy number variation detection step and the result of the mutation frequency analysis step, thereby obtaining the chromosome joint deletion analysis result of the two.

The chromosome joint deletion detection method is used for detecting the chromosome heterozygous deletion based on high-throughput capture sequencing analysis, and analyzing the chromosome heterozygous deletion by combining two factors of point mutation frequency and copy number change, so that the blank of carrying out 1p/19q chromosome joint deletion analysis on high-throughput sequencing data is filled. It is understood that the detection method of the present invention can not only perform 1p/19q chromosome combination deletion analysis, but also perform other mutation detection, such as detection of multiple mutations, e.g., point mutation, indel, fusion, etc., based on high throughput sequencing data, and is not limited herein.

In one implementation of the present application, the normal control sample is leukocyte DNA.

In one implementation of the present application, the reference genome is reference genome hg19.

In one implementation of the present application, the mutation detection software employs VarScan 2.

In one implementation of the present application, the copy number variation detection software employs CNVkit.

A second aspect of the present application discloses a device for detecting combined chromosome deletion, which comprises a data acquisition and comparison module, a mutation detection module, a high-frequency dbSNP acquisition module, a copy number variation detection module, a mutation frequency analysis module and a combined chromosome deletion analysis module;

the data acquisition and comparison module is used for respectively acquiring the capture sequencing results of the short arm of chromosome 1 and the long arm of chromosome 19 of the tumor tissue sample and the corresponding normal control sample, aligning the capture sequencing results of the tumor tissue sample and the normal control sample to a reference genome, and removing duplicates to obtain data after duplication removal;

the variation detection module is used for carrying out SNP analysis on the data after duplication removal by using variation detection software to obtain variation information, wherein the variation information comprises a variation position, a base type of the position on a reference genome, a variation base type of the position in a sample and mutation frequency information;

the high-frequency dbSNP acquisition module is used for reading dbSNP site information with a population frequency higher than 0.01 in the 1000 Genomes database within the capture regions of the short arm of chromosome 1 and the long arm of chromosome 19;

the copy number variation detection module is used for carrying out copy number analysis on the data subjected to duplication removal by using copy number variation detection software to obtain copy number variation information in the short arm regions of the No. 1 chromosome and the long arm regions of the No. 19 chromosome;

the mutation frequency analysis module is used for filtering out high-frequency dbSNP site mutation information existing in the mutation information obtained by the mutation detection module based on the information of the high-frequency dbSNP acquisition module, and acquiring mutation frequency information of a normal control sample and mutation frequency of a tumor tissue sample;

and the chromosome joint deletion analysis module is used for analyzing the copy number status of the short arm of chromosome 1 and the long arm of chromosome 19 based on the result of the copy number variation detection module and the result of the mutation frequency analysis module, so as to obtain the chromosome joint deletion analysis result of the two.

It should be noted that the apparatus for detecting combined deletion of chromosomes according to the present application implements each step of the method for detecting combined deletion of chromosomes according to the present application through its modules; therefore, for the specific definition of each module, reference can be made to the method for detecting combined deletion of chromosomes of the present application, and the details are not repeated here.

A third aspect of the present application discloses an apparatus for detecting a joint deletion of chromosomes, the apparatus comprising a memory and a processor; wherein the memory is used for storing a program, and the processor is used for implementing the method for detecting a joint chromosome deletion of the present application by executing the program stored in the memory.

A fourth aspect of the present application discloses a computer-readable storage medium having stored therein a program executable by a processor to implement the method of detecting a joint deletion of chromosomes of the present application.

Due to the adoption of the technical scheme, the beneficial effects of the application are as follows:

the method and the device for detecting the combined chromosome deletion carry out point mutation frequency and copy number change analysis through high-throughput sequencing data, and analyze the 1p/19q combined chromosome deletion based on the point mutation frequency and the copy number change; not only fills the blank of analyzing and detecting 1p/19q chromosome combined deletion by high-throughput sequencing data; furthermore, various variations such as point mutation, insertion deletion, fusion and the like can be further analyzed and detected; the use efficiency of high-throughput sequencing data is improved.

Drawings

FIG. 1 is a block diagram of a flow chart of a method for detecting a combined chromosome deletion in an embodiment of the present application;

FIG. 2 is a block diagram showing the structure of a chromosome joint deletion detection apparatus according to an embodiment of the present invention.

Detailed Description

The present application will be described in further detail below with reference to the accompanying drawings by way of specific embodiments. In the following description, numerous details are set forth in order to provide a better understanding of the present application. However, those skilled in the art will readily recognize that some of the features may be omitted, or replaced with other elements, materials or methods, in different instances. In some instances, certain operations related to the present application are not shown or described in detail in this specification, in order to avoid obscuring the core of the present application with excessive description; a detailed description of these operations is not necessary for those skilled in the art, who can fully understand them from the description in this specification and the general technical knowledge of the art.

The method for detecting the combined chromosome deletion comprises a data acquisition and comparison step 11, a mutation detection step 12, a high-frequency dbSNP acquisition step 13, a copy number variation detection step 14, a mutation frequency analysis step 15 and a combined chromosome deletion analysis step 16, as shown in FIG. 1.

The data acquisition and comparison step 11 comprises respectively acquiring the capture sequencing results of the short arm of chromosome 1 and the long arm of chromosome 19 of the tumor tissue sample and the normal control sample, aligning the capture sequencing results of the tumor tissue sample and the normal control sample to a reference genome, and removing duplicates to obtain the data after duplication removal.

In one implementation of the present application, tumor tissue and blood are taken as samples: DNA is extracted from the tumor tissue as tumor DNA, and leukocyte DNA is extracted from the blood as normal DNA. Probes for the short arm of chromosome 1 and the long arm of chromosome 19 are used for capture, and the captured library is subjected to PE150 high-throughput sequencing on a NovaSeq 6000 sequencer, yielding R1.fastq.gz and R2.fastq.gz files after the run. PE150 is paired-end sequencing with a read length of 150 bp, which generates two files, R1 and R2, for subsequent analysis. In one implementation of the present application, the reference genome is hg19, and deduplication after alignment uses conventional methods, which are not described here.
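The text leaves deduplication to "conventional methods". As a rough illustration of the idea only, the following Python sketch collapses read pairs that share the same alignment coordinates and strand, keeping the one with the highest mean base quality. The `Alignment` record and its fields are hypothetical simplifications introduced here; a real pipeline would operate on BAM records with an established duplicate-marking tool.

```python
from __future__ import annotations

from typing import Iterable, NamedTuple


class Alignment(NamedTuple):
    """Hypothetical minimal record for one aligned read pair."""
    name: str
    chrom: str
    start: int        # leftmost aligned position of the pair
    end: int          # rightmost aligned position of the pair
    strand: str       # '+' or '-'
    mean_qual: float  # mean base quality, used to pick the representative


def deduplicate(alignments: Iterable[Alignment]) -> list[Alignment]:
    """Keep the highest-quality pair for each (chrom, start, end, strand) key."""
    best: dict[tuple[str, int, int, str], Alignment] = {}
    for aln in alignments:
        key = (aln.chrom, aln.start, aln.end, aln.strand)
        current = best.get(key)
        if current is None or aln.mean_qual > current.mean_qual:
            best[key] = aln
    # Return in coordinate order so that later steps see sorted data.
    return sorted(best.values(), key=lambda a: (a.chrom, a.start, a.end))


reads = [
    Alignment("r1", "chr1", 1000, 1300, "+", 35.0),
    Alignment("r2", "chr1", 1000, 1300, "+", 30.0),  # same coordinates as r1
    Alignment("r3", "chr19", 45000000, 45000280, "-", 33.0),
]
print([a.name for a in deduplicate(reads)])
```

In this toy input, r2 is treated as a duplicate of r1 because both pairs cover the same interval on the same strand.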

A variation detection step 12, including performing SNP analysis on the data after duplication removal by using variation detection software to obtain variation information; wherein the mutation information comprises a mutation position, a base type of the position on the reference genome, a mutation base type of the position in the sample and mutation frequency information.

In one implementation of the present application, the variation detection software VarScan2 is specifically used to perform SNP analysis on the mpileup data generated from the aligned and deduplicated reads.
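To make the notion of mutation frequency concrete, the sketch below computes a variant allele frequency from the read-bases column of a single mpileup line. It is not VarScan2's algorithm: it applies no base-quality or strand filters, and the function names are hypothetical. It only shows how reference matches (`.` and `,`), mismatches, read-start markers (`^` plus a quality character) and indel annotations (`+n...` and `-n...`) can be tallied.

```python
from collections import Counter


def count_bases(ref: str, pileup: str) -> Counter:
    """Count bases observed in one mpileup read-bases string."""
    counts: Counter = Counter()
    i = 0
    while i < len(pileup):
        ch = pileup[i]
        if ch == "^":        # read start marker, followed by a mapping-quality char
            i += 2
            continue
        if ch in "+-":       # indel: sign, length digits, then that many bases
            j = i + 1
            while j < len(pileup) and pileup[j].isdigit():
                j += 1
            i = j + int(pileup[i + 1:j])
            continue
        if ch in ".,":       # match to the reference (forward / reverse strand)
            counts[ref.upper()] += 1
        elif ch.upper() in "ACGT":
            counts[ch.upper()] += 1
        # '$' (read end), '*' (deleted base) and 'N' are ignored
        i += 1
    return counts


def variant_frequency(ref: str, pileup: str) -> tuple:
    """Return (most common non-reference base, its fraction of counted bases)."""
    counts = count_bases(ref, pileup)
    depth = sum(counts.values())
    alts = {b: n for b, n in counts.items() if b != ref.upper()}
    if depth == 0 or not alts:
        return ("", 0.0)
    alt = max(alts, key=alts.get)
    return (alt, alts[alt] / depth)


# Six reads support the reference A, three support T.
print(variant_frequency("A", "..,,TTt^F.$-2ag."))
```

For a germline heterozygous SNP in the normal sample this fraction is expected to lie near 0.5; a marked shift at the same site in the tumor sample is the kind of signal the later analysis steps rely on.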

The high-frequency dbSNP acquisition step 13 comprises reading dbSNP site information with a population frequency higher than 0.01 in the 1000 Genomes database within the capture regions of the short arm of chromosome 1 and the long arm of chromosome 19.

In an implementation manner of the application, a program written in perl language is specifically used for reading, and high-frequency dbSNP site information is obtained.
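The text states that this step is done with a Perl program, which is not reproduced in the source. The Python sketch below shows one way such a reader could work, under loudly stated assumptions: the input is a hypothetical tab-separated table with columns chromosome, 1-based position, rsID and population frequency, and the capture regions are given as BED-style 0-based half-open intervals. Only the 0.01 threshold comes from the text.

```python
import io


def load_high_freq_sites(handle, regions, min_af=0.01):
    """Return {(chrom, pos): (rsid, af)} for sites with af > min_af inside regions.

    handle  : iterable of tab-separated lines (hypothetical layout:
              chrom, 1-based pos, rsid, population frequency)
    regions : list of (chrom, start, end), BED-style 0-based half-open
    """
    sites = {}
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, rsid, af = line.rstrip("\n").split("\t")[:4]
        pos, af = int(pos), float(af)
        # A 1-based position p lies in a BED interval when start < p <= end.
        in_capture = any(c == chrom and s < pos <= e for c, s, e in regions)
        if af > min_af and in_capture:
            sites[(chrom, pos)] = (rsid, af)
    return sites


REGIONS = [("chr1", 1000, 2000), ("chr19", 45000000, 45001000)]
TABLE = "\n".join([
    "#chrom\tpos\trsid\taf",
    "chr1\t1500\trs_demo1\t0.25",         # kept
    "chr1\t1600\trs_demo2\t0.005",        # frequency too low
    "chr1\t5000\trs_demo3\t0.40",         # outside the capture regions
    "chr19\t45000500\trs_demo4\t0.011",   # kept
]) + "\n"

sites = load_high_freq_sites(io.StringIO(TABLE), REGIONS)
print(sorted(sites))
```

The rsIDs and coordinates are made-up demonstration values, not real dbSNP entries.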

The copy number variation detection step 14 comprises performing copy number analysis on the data after duplication removal using copy number variation detection software, and obtaining copy number variation information in the short arm region of chromosome 1 and the long arm region of chromosome 19.

In one implementation of the present application, the copy number variation detection software CNVkit is specifically used to analyze the bam data obtained after alignment to the reference genome hg19 and deduplication. The command line is "cnvkit.py batch tumor.bam -n normal.bam -t HapOnco605panel.bed -f hg19.fa --access access-5k-mapped.hg19.bed -d result_cnv". CNVkit is Python software, hence the .py suffix of cnvkit.py; batch is the subcommand used for CNV analysis. tumor.bam is the bam file obtained after aligning the tumor tissue sample to hg19 and deduplicating; -n takes the bam file obtained after aligning the leukocyte normal control sample to hg19 and deduplicating; -t takes the bed file of the capture region; -f takes the reference genome fa file; --access takes a file generated by cnvkit; and -d gives the output path, under which the final results are generated.

The command line that generates the file access-5k-mapped.hg19.bed is "cnvkit.py access hg19.fa -o access-5k-mapped.hg19.bed"; access generates a bed file of the non-N regions of the reference genome, which contains N in addition to A, T, C and G. The essential information in a bed file is its three columns, chr, start and end: the chromosome and the coordinates of the start and end sites. The HapOnco605 panel bed file has the same format.
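The two command lines and the three-column bed layout described above can be sketched in Python as follows. The helper names are hypothetical; the file names follow the commands quoted in the text with spacing normalised, and the options used (`-n`, `-t`, `-f`, `--access`, `-d` for `batch`, `-o` for `access`) are the ones the text itself lists. The sketch only builds the argument lists and does not run CNVkit.

```python
import shlex


def cnvkit_access_cmd(fasta, out_bed):
    """Argument list for generating the accessible-regions bed file."""
    return ["cnvkit.py", "access", fasta, "-o", out_bed]


def cnvkit_batch_cmd(tumor_bam, normal_bam, targets_bed, fasta, access_bed, out_dir):
    """Argument list for the tumor versus normal batch analysis."""
    return [
        "cnvkit.py", "batch", tumor_bam,
        "-n", normal_bam,        # normal control bam
        "-t", targets_bed,       # capture-region bed
        "-f", fasta,             # reference genome fasta
        "--access", access_bed,  # file produced by the access command
        "-d", out_dir,           # output directory
    ]


def read_bed(lines):
    """Parse the first three bed columns into (chrom, start, end) tuples."""
    regions = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.append((chrom, int(start), int(end)))
    return regions


access = cnvkit_access_cmd("hg19.fa", "access-5k-mapped.hg19.bed")
batch = cnvkit_batch_cmd("tumor.bam", "normal.bam", "HapOnco605panel.bed",
                         "hg19.fa", "access-5k-mapped.hg19.bed", "result_cnv")
print(shlex.join(access))
print(shlex.join(batch))
print(read_bed(["chr1\t100\t200\n", "chr19\t5\t50\tdemo\n"]))
```

Passing the commands as argument lists (for example to `subprocess.run`) rather than as a single shell string avoids quoting problems if a path contains spaces.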

The mutation frequency analysis step 15 comprises filtering out the high-frequency dbSNP site mutation information existing in the mutation information obtained in the mutation detection step 12 based on the information obtained in the high-frequency dbSNP acquisition step 13, and acquiring the mutation frequency information of the normal control sample and the mutation frequency of the tumor tissue sample.

In an implementation manner of the present application, a program written in perl language is specifically used for reading, and mutation frequency information of a normal control sample and mutation frequency of a tumor tissue sample are obtained.
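The Perl program mentioned here is not given in the source. The Python sketch below is one possible reading of the step, and it rests on an interpretation that should be treated as an assumption: the variants retained are those located at the high-frequency dbSNP sites, and for each such site the normal and tumor mutation frequencies are paired. The function name, the dictionary layout and the choice to report 0.0 when a site is absent from the tumor calls are all hypothetical.

```python
def paired_frequencies(normal, tumor, high_freq_sites):
    """Pair normal and tumor frequencies at high-frequency dbSNP sites.

    normal, tumor   : dict mapping (chrom, pos) -> mutation frequency
    high_freq_sites : collection of (chrom, pos) keys
    Returns a list of (chrom, pos, normal_freq, tumor_freq) rows.
    """
    rows = []
    for site in sorted(high_freq_sites):
        if site in normal:
            chrom, pos = site
            # Assumption: a site missing from the tumor calls is reported as 0.0;
            # it could equally reflect missing coverage in real data.
            rows.append((chrom, pos, normal[site], tumor.get(site, 0.0)))
    return rows


normal = {("chr1", 1500): 0.49, ("chr1", 1800): 0.51, ("chr19", 45000500): 0.50}
tumor = {("chr1", 1500): 0.18, ("chr1", 1800): 0.50, ("chr19", 45000500): 0.82}
high_freq = {("chr1", 1500), ("chr19", 45000500)}

for row in paired_frequencies(normal, tumor, high_freq):
    print(row)
```

In this toy data the site chr1:1800 is dropped because it is not in the high-frequency list, and the two remaining sites show tumor frequencies far from the roughly 0.5 seen in the normal sample.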

The combined chromosome deletion analysis step 16 comprises analyzing the copy number status of the short arm of chromosome 1 (1p) and the long arm of chromosome 19 (19q) based on the result of the copy number variation detection step 14 and the result of the mutation frequency analysis step 15, thereby obtaining the combined chromosome deletion analysis result of the two.

In an implementation manner of the present application, a program written in perl language is specifically used for analysis, and a 1p/19q chromosome joint deletion analysis result is obtained.
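The source does not disclose the decision rule used by this program, so the following Python sketch is purely illustrative of how copy number and mutation frequency evidence might be combined. Everything numeric in it is an assumption made for the example: the approximate hg19 arm boundaries, the log2 ratio cut-off of -0.2, the 0.4 to 0.6 window for calling a site heterozygous in the normal sample, the 0.1 shift from 0.5 in the tumor, and the requirement that at least half of the heterozygous sites be shifted.

```python
# Approximate hg19 arm coordinates, for illustration only (assumption).
ARMS = {"1p": ("chr1", 0, 125_000_000), "19q": ("chr19", 26_500_000, 59_128_983)}


def arm_log2(segments, arm):
    """Length-weighted mean log2 ratio of the segments overlapping an arm."""
    chrom, a_start, a_end = arm
    total = weighted = 0.0
    for c, s, e, log2 in segments:
        overlap = min(e, a_end) - max(s, a_start)
        if c == chrom and overlap > 0:
            total += overlap
            weighted += overlap * log2
    return weighted / total if total else None


def imbalance_fraction(pairs, arm, het=(0.4, 0.6), shift=0.1):
    """Fraction of normal-heterozygous sites whose tumor frequency leaves 0.5."""
    chrom, a_start, a_end = arm
    tumor_freqs = [t for c, pos, n, t in pairs
                   if c == chrom and a_start < pos <= a_end and het[0] <= n <= het[1]]
    if not tumor_freqs:
        return None
    return sum(abs(t - 0.5) > shift for t in tumor_freqs) / len(tumor_freqs)


def arm_deleted(segments, pairs, arm, max_log2=-0.2, min_fraction=0.5):
    """Call an arm deleted only when both lines of evidence agree."""
    log2 = arm_log2(segments, arm)
    frac = imbalance_fraction(pairs, arm)
    return (log2 is not None and frac is not None
            and log2 <= max_log2 and frac >= min_fraction)


def codeleted(segments, pairs):
    """True when both 1p and 19q are called deleted."""
    return all(arm_deleted(segments, pairs, ARMS[name]) for name in ("1p", "19q"))


# (chrom, start, end, log2 ratio) segments and (chrom, pos, normal, tumor) pairs.
segments = [("chr1", 1_000_000, 120_000_000, -0.45),
            ("chr1", 150_000_000, 240_000_000, 0.02),
            ("chr19", 30_000_000, 58_000_000, -0.40)]
pairs = [("chr1", 5_000_000, 0.48, 0.15), ("chr1", 60_000_000, 0.52, 0.85),
         ("chr19", 40_000_000, 0.50, 0.20), ("chr19", 50_000_000, 0.47, 0.49)]
print(codeleted(segments, pairs))
```

Requiring agreement between the two signals is a design choice of this sketch: a copy number drop without allelic imbalance, or the reverse, is not reported as a deletion. Any real thresholds would need to be validated on samples with known 1p/19q status.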

Those skilled in the art will appreciate that all or part of the functions of the above-described methods may be implemented by hardware, or may be implemented by computer programs. When all or part of the functions of the above method are implemented by means of a computer program, the program may be stored in a computer-readable storage medium, and the storage medium may include: a read only memory, a random access memory, a magnetic disk, an optical disk, a hard disk, etc., and the program is executed by a computer to realize the above functions. For example, the program may be stored in a memory of the device, and when the program in the memory is executed by the processor, all or part of the functions described above may be implemented. In addition, when all or part of the functions in the above embodiments are implemented by a computer program, the program may be stored in a storage medium such as a server, another computer, a magnetic disk, an optical disk, a flash disk, or a removable hard disk, and may be downloaded or copied to a memory of a local device, or may be version-updated on a system of the local device, and when the program in the memory is executed by a processor, all or part of the functions in the above methods may be implemented.

Therefore, based on the method for detecting the combined chromosome deletion, the present application provides an apparatus for detecting the combined chromosome deletion, as shown in FIG. 2, which includes a data acquisition and comparison module 21, a mutation detection module 22, a high-frequency dbSNP acquisition module 23, a copy number variation detection module 24, a mutation frequency analysis module 25, and a combined chromosome deletion analysis module 26.

The data acquisition and comparison module 21 is configured to acquire the capture sequencing results of the short arm of chromosome 1 and the long arm of chromosome 19 of the tumor tissue sample and the corresponding normal control sample, align the capture sequencing results of the tumor tissue sample and the normal control sample to a reference genome, and perform deduplication to obtain data after deduplication.

The mutation detection module 22 is configured to perform SNP analysis on the deduplicated data by using mutation detection software to obtain mutation information, where the mutation information includes a mutation position, a base type of the position on the reference genome, a variant base type of the position in the sample, and mutation frequency information.

The high-frequency dbSNP acquisition module 23 is configured to read dbSNP site information with a population frequency higher than 0.01 in the 1000 Genomes database within the capture regions of the short arm of chromosome 1 and the long arm of chromosome 19.

The copy number variation detection module 24 is configured to perform copy number analysis on the deduplicated data by using copy number variation detection software, and obtain copy number variation information in the short arm region of chromosome 1 and the long arm region of chromosome 19.

The mutation frequency analysis module 25 is configured to filter out high-frequency dbSNP site mutation information existing in the mutation information obtained by the mutation detection module 22 based on the information of the high-frequency dbSNP acquisition module 23, and to acquire mutation frequency information of a normal control sample and mutation frequency of a tumor tissue sample.

The combined chromosome deletion analysis module 26 is configured to analyze the copy number status of the short arm of chromosome 1 and the long arm of chromosome 19 based on the result of the copy number variation detection module 24 and the result of the mutation frequency analysis module 25, so as to obtain the combined chromosome deletion analysis result of the two.

The device can realize the method for detecting the combined chromosome deletion, particularly realize corresponding steps in the method through the modules of the device, thereby realizing automatic combined chromosome deletion detection.

There is also provided, in another implementation of the present application, an apparatus for detecting a joint deletion of chromosomes, the apparatus including a memory and a processor; the memory is used for storing a program; the processor is used for implementing the following method by executing the program stored in the memory: a data acquisition and comparison step, which comprises acquiring capture sequencing results of the short arm of chromosome 1 and the long arm of chromosome 19 of the tumor tissue sample and the corresponding normal control sample respectively, aligning the capture sequencing results of the tumor tissue sample and the normal control sample to a reference genome, and removing duplicates to obtain data after duplication removal; a variation detection step, which comprises performing SNP analysis on the data after duplication removal using variation detection software to obtain variation information, wherein the variation information comprises a variation position, the base type of the position on the reference genome, the variant base type of the position in the sample, and mutation frequency information; a high-frequency dbSNP acquisition step, which comprises reading dbSNP site information with a population frequency higher than 0.01 in the 1000 Genomes database within the capture regions of the short arm of chromosome 1 and the long arm of chromosome 19; a copy number variation detection step, which comprises performing copy number analysis on the data after duplication removal using copy number variation detection software, and obtaining copy number variation information in the short arm region of chromosome 1 and the long arm region of chromosome 19; a mutation frequency analysis step, which comprises filtering out high-frequency dbSNP site mutation information existing in the mutation information obtained in the variation detection step based on the information obtained in the high-frequency dbSNP acquisition step, and acquiring mutation frequency information of the normal control sample and mutation frequency of the tumor tissue sample; and a chromosome joint deletion analysis step, which comprises analyzing the copy number status of the short arm of chromosome 1 and the long arm of chromosome 19 based on the result of the copy number variation detection step and the result of the mutation frequency analysis step, thereby obtaining the chromosome joint deletion analysis result of the two.

In another implementation, the present application also provides a computer-readable storage medium having a program stored therein, the program being executable by a processor to implement the following method: a data acquisition and alignment step, which comprises obtaining the capture sequencing results of the short arm of chromosome 1 and the long arm of chromosome 19 for the tumor tissue sample and the corresponding normal control sample, aligning the capture sequencing results of both samples to a reference genome, and removing duplicates to obtain de-duplicated data; a variation detection step, which comprises using variation detection software to perform SNP analysis on the de-duplicated data to obtain variation information, wherein the variation information comprises the variation position, the base type at that position on the reference genome, the variant base type at that position in the sample, and mutation frequency information; a high-frequency dbSNP acquisition step, which comprises reading the dbSNP site information whose population frequency in the 1000 Genomes database is higher than 0.01 within the capture regions of the short arm of chromosome 1 and the long arm of chromosome 19; a copy number variation detection step, which comprises using copy number variation detection software to perform copy number analysis on the de-duplicated data to obtain the copy number variation information in the short arm region of chromosome 1 and the long arm region of chromosome 19; a mutation frequency analysis step, which comprises filtering out, based on the information obtained in the high-frequency dbSNP acquisition step, the high-frequency dbSNP site variation information present in the variation information obtained in the variation detection step, and obtaining the mutation frequency information of the normal control sample and the mutation frequency of the tumor tissue sample; and a chromosome joint deletion analysis step, which comprises analyzing the copy number status of the short arm of chromosome 1 and the long arm of chromosome 19 based on the results of the copy number variation detection step and the mutation frequency analysis step, thereby obtaining the joint deletion analysis result for the short arm of chromosome 1 and the long arm of chromosome 19.

The terms used in this application are explained as follows:

High-throughput sequencing: also called second-generation sequencing. Compared with first-generation sequencing technology, represented by Sanger sequencing, it is characterized by high throughput, high yield, high accuracy and automated analysis.

High-throughput capture sequencing: densely synthesized probes are used to enrich the regions of interest on the genome through base complementarity, and the enriched regions are then sequenced with high-throughput sequencing technology.

BAM file: the file generated when the BWA alignment software aligns the reads output by the sequencer to the human reference genome. It contains details such as each read's position on the reference genome and its alignment quality.

dbSNP: the single nucleotide polymorphism database, established jointly by NCBI and the National Human Genome Research Institute. It includes data such as SNPs and short insertion/deletion polymorphisms, together with information such as source, detection and validation methods, genotype information and population frequency. The present application mainly uses, for analysis, dbSNP sites with higher population frequency located in the capture regions on 1p and 19q.

SNP: single nucleotide variation. The base at a position in the sample genome differs from the base at the same position on the reference genome, having been replaced by another base type.

VarScan: software for detecting genetic variation in sample data. The present application mainly uses the SNP result file it produces, which is usually in VCF format.

CNV: copy number variation. An increase or decrease in the copy number of large fragment sequences on the genome, which can be divided into deletion and duplication; it is an important molecular mechanism.

CNVkit: software for detecting copy number variation in sample data. The present application mainly uses the CNS result file it produces.

CNS file: the result file generated when CNVkit detects copy number variation in sample data. It contains details such as the start and end positions of each large segment on the reference genome and the log2 ratio.

Examples

In this example, genomic DNA was captured by hybridization using capture probes for the 1p and 19q regions and then subjected to high-throughput sequencing. The example analyzes and detects chromosomal heterozygous deletions on the basis of high-throughput capture sequencing; specifically, it combines two factors, point mutation frequency and copy number variation, to analyze heterozygous deletions of 1p and 19q. The specific steps are as follows:

The capture sequencing results of the short arm of chromosome 1 and the long arm of chromosome 19 are obtained for the tumor tissue sample and the corresponding normal control sample, the capture sequencing results of both samples are aligned to a reference genome, and duplicates are removed to obtain de-duplicated data. The following operations are then carried out:

a. Read the SNP result file generated by VarScan2 to obtain variation information, mainly the variation position, the base type at that position on the reference genome, the variant base type at that position in the sample, and the mutation frequency.

The variation detection software used in this example, VarScan2, performs SNP analysis on the mpileup data generated from the aligned, de-duplicated reads. The samples were prepared as follows: tumor tissue and blood were collected; DNA extracted from the tumor tissue served as tumor DNA, and leukocyte DNA extracted from the blood served as normal DNA. The Heplos HapOnco605 probe was used for capture, and the constructed, captured library was sequenced in PE150 mode on a NovaSeq 6000 sequencer, giving R1.fastq.gz and R2.fastq.gz files after the run. PE150 denotes paired-end sequencing with a read length of 150 bp, which is why two files, R1 and R2, are generated.
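The variation information listed in step a can be pulled out of a VCF-style result file with a few lines of code. The following is a minimal Python sketch rather than the Perl program of this example. It assumes that each data line carries a FORMAT column containing a FREQ key whose value is a percent string such as 48%, as VarScan2 VCF output usually does; the names Variant and parse_vcf_lines are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int      # 1-based position, as in VCF
    ref: str      # base type on the reference genome
    alt: str      # variant base type observed in the sample
    freq: float   # mutation frequency as a fraction between 0 and 1


def parse_vcf_lines(lines: Iterable[str], sample_index: int = 0) -> Iterator[Variant]:
    """Yield one Variant per data line that carries a FREQ value.

    Assumption: FREQ is written as a percent string such as "48%".
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10 + sample_index:
            continue  # no FORMAT/sample columns for the requested sample
        chrom, pos, _id, ref, alt = fields[:5]
        keys = fields[8].split(":")
        values = fields[9 + sample_index].split(":")
        sample = dict(zip(keys, values))
        if "FREQ" not in sample:
            continue
        freq = float(sample["FREQ"].rstrip("%")) / 100.0
        yield Variant(chrom, int(pos), ref, alt, freq)


if __name__ == "__main__":
    demo = [
        "##fileformat=VCFv4.1\n",
        "chr1\t1000\t.\tA\tG\t.\tPASS\tADP=100\tGT:DP:RD:AD:FREQ\t0/1:100:52:48:48%\n",
    ]
    for v in parse_vcf_lines(demo):
        print(v)
```

Converting the frequency to a fraction at this point keeps later comparisons between the normal and tumor samples free of string handling.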

b. Read the dbSNP site information whose population frequency in the 1000 Genomes database is higher than 0.01 within the capture regions of the short arm of chromosome 1 and the long arm of chromosome 19.

This example uses a program written in Perl to read these sites within the captured regions of the short arm of chromosome 1 and the long arm of chromosome 19.
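The selection in step b amounts to a frequency threshold combined with a region check. The Python sketch below illustrates it under stated assumptions: the input is a hypothetical tab-separated table with the columns chromosome, 1-based position, rs identifier and population frequency (the layout of the file actually used is not given in the text), and the capture regions are passed as BED-style (chromosome, start, end) tuples with a 0-based start.

```python
from typing import Dict, Iterable, List, Tuple

Region = Tuple[str, int, int]  # (chromosome, 0-based start, end), BED convention
Site = Tuple[str, int]         # (chromosome, 1-based position)


def load_common_sites(
    lines: Iterable[str],
    regions: List[Region],
    min_af: float = 0.01,
) -> Dict[Site, Tuple[str, float]]:
    """Return {(chrom, pos): (rsid, af)} for sites with af > min_af inside a region.

    Assumed (hypothetical) input layout: chrom, pos, rsid, af, tab-separated.
    """
    sites: Dict[Site, Tuple[str, float]] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos_s, rsid, af_s = line.rstrip("\n").split("\t")[:4]
        pos, af = int(pos_s), float(af_s)
        if af <= min_af:
            continue  # keep only sites with population frequency above the cutoff
        # A 1-based position p lies in a BED interval when start < p <= end.
        if any(c == chrom and s < pos <= e for c, s, e in regions):
            sites[(chrom, pos)] = (rsid, af)
    return sites


if __name__ == "__main__":
    table = [
        "chr1\t1500\trs0000001\t0.25\n",   # hypothetical records
        "chr1\t1600\trs0000002\t0.005\n",
    ]
    print(load_common_sites(table, [("chr1", 1000, 2000)]))
```

A linear scan over the regions is enough for a panel-sized BED file; an interval index would only matter for much larger region sets.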

c. Read the CNS result file generated by CNVkit, mainly to obtain the copy number variation information in the 1p and 19q regions.
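Reading the segments of step c could look like the following Python sketch. It relies on the tab-separated header that CNVkit writes to .cns files (chromosome, start, end, gene, log2, ...). The arm boundaries in ARMS are approximate hg19 values supplied here as an assumption and should be checked against a cytoband table; the "chr" prefix on chromosome names is likewise assumed.

```python
import csv
from typing import Iterable, List, NamedTuple


class Segment(NamedTuple):
    chrom: str
    start: int
    end: int
    log2: float  # log2 ratio of the segment


# Approximate hg19 arm extents (assumption, verify before real use).
ARMS = {
    "1p": ("chr1", 0, 125_000_000),
    "19q": ("chr19", 26_500_000, 59_128_983),
}


def read_cns(lines: Iterable[str]) -> List[Segment]:
    """Parse the lines of a CNVkit .cns file, header line included."""
    reader = csv.DictReader(lines, delimiter="\t")
    return [
        Segment(row["chromosome"], int(row["start"]), int(row["end"]), float(row["log2"]))
        for row in reader
    ]


def segments_on_arm(segments: List[Segment], arm: str) -> List[Segment]:
    """Return the segments that overlap the given arm ("1p" or "19q")."""
    chrom, lo, hi = ARMS[arm]
    return [s for s in segments if s.chrom == chrom and s.start < hi and s.end > lo]


if __name__ == "__main__":
    demo = [
        "chromosome\tstart\tend\tgene\tlog2\tdepth\tprobes\tweight\n",
        "chr1\t1000000\t120000000\t-\t-0.45\t300.0\t500\t450.0\n",
    ]
    print(segments_on_arm(read_cns(demo), "1p"))
```

With a real file, the lines can come straight from an open file handle, for example read_cns(open("tumor.cns")), where the file name is a placeholder.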

In this example, CNVkit analyzes the BAM data obtained after alignment to the reference genome hg19 and de-duplication. The command line is "cnvkit.py batch tumor.bam -n normal.bam -t haponco605panel.bed -f hg19.fa --access access-5k-mapped.hg19.bed -d result.cnv". cnvkit.py is a Python program, hence the .py suffix. batch is the subcommand used to analyze CNV; cnvkit.py has many subcommands corresponding to different functions. tumor.bam is the BAM file of the tissue sample after alignment to hg19 and de-duplication; -n takes the BAM file of the leukocyte sample after alignment to hg19 and de-duplication; -t takes the BED file of the capture region; -f takes the reference genome FASTA file; --access takes a file generated by cnvkit.py itself; -d gives the output path, and the final results are generated under this folder.

The command line that generates the file access-5k-mapped.hg19.bed is "cnvkit.py access hg19.fa -o access-5k-mapped.hg19.bed". The access subcommand generates a BED file of the non-N regions of the reference genome, which contains N bases in addition to A, T, C and G.

The essential information in a BED file is its three columns, chr, start and end: the chromosome and the coordinates of the start and end sites. The HapOnco605.bed file has the same format.
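For illustration, the three BED columns described above can be read as follows. This Python sketch is an aid to the reader, not part of the example's pipeline; parse_bed and covered_bases are hypothetical names, and the code relies on the usual BED convention that start is 0-based and end is exclusive.

```python
from typing import Iterable, List, Tuple

Region = Tuple[str, int, int]  # (chr, start, end)


def parse_bed(lines: Iterable[str]) -> List[Region]:
    """Read the first three columns of each BED record."""
    regions: List[Region] = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines, comments and optional header lines
        chrom, start, end = line.split()[:3]
        regions.append((chrom, int(start), int(end)))
    return regions


def covered_bases(regions: List[Region]) -> int:
    """Total length of the regions, assuming they do not overlap."""
    return sum(end - start for _, start, end in regions)


if __name__ == "__main__":
    demo = ["chr1\t1000\t2000\n", "chr19\t40000000\t40000100\n"]
    print(covered_bases(parse_bed(demo)))
```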

d. Based on the information from step b, filter out the high-frequency dbSNP site variation information present in step a, mainly obtaining the mutation frequency information of the normal control sample and the mutation frequency in the tumor tissue sample. This step is also carried out with a program written in Perl.
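Step d combines the site list of step b with the variant calls of step a. The Python sketch below shows one way to organize this, under assumptions: variant frequencies are held in dictionaries keyed by (chromosome, position), and, because the text does not spell out which group of variants feeds the later analysis, the first function simply returns both groups. All names are hypothetical.

```python
from typing import Dict, Set, Tuple

Site = Tuple[str, int]  # (chromosome, 1-based position)


def split_by_known_sites(
    freqs: Dict[Site, float], known_sites: Set[Site]
) -> Tuple[Dict[Site, float], Dict[Site, float]]:
    """Split variant frequencies into (at high-frequency dbSNP sites, elsewhere)."""
    at_known = {site: f for site, f in freqs.items() if site in known_sites}
    others = {site: f for site, f in freqs.items() if site not in known_sites}
    return at_known, others


def pair_frequencies(
    normal: Dict[Site, float], tumor: Dict[Site, float]
) -> Dict[Site, Tuple[float, float]]:
    """Return {site: (normal frequency, tumor frequency)} for sites called in both."""
    return {site: (normal[site], tumor[site]) for site in normal if site in tumor}


if __name__ == "__main__":
    known = {("chr1", 1500)}
    normal = {("chr1", 1500): 0.49, ("chr1", 1700): 0.02}
    tumor = {("chr1", 1500): 0.81}
    normal_known, _ = split_by_known_sites(normal, known)
    tumor_known, _ = split_by_known_sites(tumor, known)
    print(pair_frequencies(normal_known, tumor_known))
```

Pairing the two samples site by site makes it straightforward to compare the mutation frequency of the normal control sample with that of the tumor tissue sample in the next step.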

In this example, the tumor tissue sample and the normal control sample are analyzed in the same way once the sequencing data come off the machine. The common analysis procedure is as follows, taking tumor as the example:

(1) Data quality control: fastp is used to process tumor.raw.R1.fastq.gz and tumor.raw.R2.fastq.gz, mainly filtering out low-quality data, to obtain tumor.clean.R1.fastq.gz and tumor.clean.R2.fastq.gz. Command line: fastp -i tumor.raw.R1.fastq.gz -I tumor.raw.R2.fastq.gz -o tumor.clean.R1.fastq.gz -O tumor.clean.R2.fastq.gz.

(2) Alignment: the filtered tumor.clean.R1.fastq.gz and tumor.clean.R2.fastq.gz are aligned to the reference genome hg19 to generate a SAM file; samtools converts the SAM file into a BAM file and then sorts the BAM file. This takes three command lines: bwa mem -R '@RG\tID:tumor\tLB:tumor\tSM:tumor\tPL:ILLUMINA' -M hg19.fa tumor.clean.R1.fastq.gz tumor.clean.R2.fastq.gz > tumor.sam; samtools view -bS tumor.sam -o tumor.bam; samtools sort tumor.bam -o tumor.sort.bam.

(3) De-duplication: gencore is used to de-duplicate tumor.sort.bam, removing PCR duplicates and the like. Command line: gencore -i tumor.sort.bam -o tumor.dedup.bam -r hg19.fa.

(4) Generating the mpileup file: VarScan requires an mpileup file as input, so samtools is used to process tumor.dedup.bam and generate tumor.dedup.mpileup. Command line: samtools mpileup -AB -q 25 -Q 30 -d 10000 -f hg19.fa -l HapOnco.bed tumor.dedup.bam > tumor.dedup.mpileup.
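Because the tumor and normal samples go through the same four stages, the command lines can be generated from the sample name. The Python sketch below only builds and prints the commands; it does not run them. The flags follow the command lines quoted above as far as they can be read; the lower-case -q for the mapping quality cutoff of samtools mpileup is an assumption, and build_commands and render are hypothetical helper names.

```python
import shlex
from typing import List, Optional, Tuple

Command = Tuple[List[str], Optional[str]]  # (argv, file that receives stdout, if any)


def build_commands(sample: str, ref: str = "hg19.fa", bed: str = "HapOnco.bed") -> List[Command]:
    """Build the QC, alignment, de-duplication and mpileup commands for one sample."""
    raw1, raw2 = f"{sample}.raw.R1.fastq.gz", f"{sample}.raw.R2.fastq.gz"
    clean1, clean2 = f"{sample}.clean.R1.fastq.gz", f"{sample}.clean.R2.fastq.gz"
    # bwa expects the two characters "\t" literally, hence the doubled backslashes.
    read_group = f"@RG\\tID:{sample}\\tLB:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    return [
        (["fastp", "-i", raw1, "-I", raw2, "-o", clean1, "-O", clean2], None),
        (["bwa", "mem", "-R", read_group, "-M", ref, clean1, clean2], f"{sample}.sam"),
        (["samtools", "view", "-bS", f"{sample}.sam", "-o", f"{sample}.bam"], None),
        (["samtools", "sort", f"{sample}.bam", "-o", f"{sample}.sort.bam"], None),
        (["gencore", "-i", f"{sample}.sort.bam", "-o", f"{sample}.dedup.bam", "-r", ref], None),
        # Assumption: -q is the mapping quality cutoff and -Q the base quality cutoff.
        (["samtools", "mpileup", "-A", "-B", "-q", "25", "-Q", "30", "-d", "10000",
          "-f", ref, "-l", bed, f"{sample}.dedup.bam"], f"{sample}.dedup.mpileup"),
    ]


def render(command: Command) -> str:
    """Format one command as a shell line, adding the redirection if present."""
    argv, stdout_path = command
    line = shlex.join(argv)
    return f"{line} > {shlex.quote(stdout_path)}" if stdout_path else line


if __name__ == "__main__":
    for name in ("tumor", "normal"):
        for command in build_commands(name):
            print(render(command))
```

Keeping each command as an argument list avoids quoting mistakes around the read group string; the same lists could be handed to subprocess.run if execution is wanted.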

e. Analyze the 1p/19q copy number status based on the results of steps c and d. This analysis is also carried out with a program written in Perl.
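The decision rule used in step e is not given in the text. Purely as an illustration of how copy number and point mutation frequency can be combined, the Python sketch below calls an arm deleted when the length-weighted mean log2 ratio of its segments is clearly negative and SNPs that are heterozygous in the normal sample show a shifted frequency in the tumor. Every threshold and function name here is hypothetical and is not the rule of this example.

```python
from statistics import median
from typing import List, Optional, Tuple

Segment = Tuple[int, int, float]  # (start, end, log2 ratio)
FreqPair = Tuple[float, float]    # (normal frequency, tumor frequency)


def weighted_mean_log2(segments: List[Segment]) -> Optional[float]:
    """Length-weighted mean log2 ratio over the segments of one arm."""
    total = sum(end - start for start, end, _ in segments)
    if total <= 0:
        return None
    return sum((end - start) * log2 for start, end, log2 in segments) / total


def median_allelic_shift(
    pairs: List[FreqPair], het_lo: float = 0.4, het_hi: float = 0.6
) -> Optional[float]:
    """Median |tumor frequency - 0.5| over SNPs heterozygous in the normal sample."""
    shifts = [abs(t - 0.5) for n, t in pairs if het_lo <= n <= het_hi]
    return median(shifts) if shifts else None


def arm_is_deleted(
    segments: List[Segment],
    pairs: List[FreqPair],
    log2_cutoff: float = -0.2,   # hypothetical threshold
    shift_cutoff: float = 0.1,   # hypothetical threshold
) -> bool:
    """Require both a copy number loss and an allelic shift (illustrative rule)."""
    mean_log2 = weighted_mean_log2(segments)
    shift = median_allelic_shift(pairs)
    if mean_log2 is None or shift is None:
        return False  # not enough evidence to call a deletion
    return mean_log2 < log2_cutoff and shift > shift_cutoff


def classify(deleted_1p: bool, deleted_19q: bool) -> str:
    """Turn the two arm-level calls into a sample-level label."""
    if deleted_1p and deleted_19q:
        return "1p/19q co-deletion"
    if deleted_1p:
        return "1p deletion, 19q normal"
    if deleted_19q:
        return "19q deletion, 1p normal"
    return "1p/19q copy number normal"


if __name__ == "__main__":
    segments_1p = [(0, 100, -0.5), (100, 200, -0.3)]
    pairs_1p = [(0.5, 0.8), (0.48, 0.2), (0.52, 0.75)]
    print(classify(arm_is_deleted(segments_1p, pairs_1p), False))
```

Requiring both signals is a conservative design choice in this sketch: a log2 drop alone can reflect capture noise, and a frequency shift alone can reflect copy-neutral loss of heterozygosity.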

In this example, the method described above was used to detect 1p/19q chromosome joint deletion in 14 glioma tissue samples. The same glioma tissue samples were also tested for chromosome joint deletion with the conventional FISH technique, in order to verify the 1p/19q joint deletion results of this example. The results for the 14 glioma tissue samples are shown in Table 1.

Table 1. 1p/19q chromosome joint deletion detection results for glioma tissue samples

Sample No.  Tumor type  1p/19q analysis result       FISH result                  Comparison
S001        Glioma      1p/19q co-deletion           1p/19q co-deletion           Consistent
S002        Glioma      1p/19q co-deletion           1p/19q co-deletion           Consistent
S003        Glioma      1p/19q co-deletion           1p/19q co-deletion           Consistent
S004        Glioma      1p/19q copy number normal    1p/19q copy number normal    Consistent
S005        Glioma      1p/19q co-deletion           1p/19q co-deletion           Consistent
S006        Glioma      1p/19q copy number normal    1p/19q copy number normal    Consistent
S007        Glioma      1p/19q copy number normal    1p/19q copy number normal    Consistent
S008        Glioma      1p/19q co-deletion           1p/19q co-deletion           Consistent
S009        Glioma      1p/19q co-deletion           1p/19q co-deletion           Consistent
S010        Glioma      1p/19q copy number normal    1p/19q copy number normal    Consistent
S011        Glioma      19q deletion, 1p normal      19q deletion, 1p normal      Consistent
S012        Glioma      1p/19q co-deletion           1p/19q co-deletion           Consistent
S013        Glioma      1p/19q copy number normal    1p/19q copy number normal    Consistent
S014        Glioma      1p/19q copy number normal    1p/19q copy number normal    Consistent

The results in Table 1 show that the method of this example detects 1p/19q chromosome joint deletion accurately: for all 14 samples, the result agrees with the FISH verification result. In addition, because this example performs 1p/19q joint deletion detection on high-throughput sequencing results, the same sequencing data can also be used to analyze and detect other variations such as point mutations, insertions/deletions and fusions, which improves the utilization of the high-throughput sequencing data.

The foregoing is a more detailed description of the present application in connection with specific embodiments thereof, and it is not intended that the present application be limited to the specific embodiments thereof. It will be apparent to those skilled in the art from this disclosure that many more simple derivations or substitutions can be made without departing from the spirit of the disclosure.
