Gene variation cis-trans position relation detection method, device, equipment and storage medium

Document No.: 1289188; publication date: 2020-08-28

Reading note: this technology, "Gene variation cis-trans position relation detection method, device, equipment and storage medium", was designed by 胡金伟, 谢克林, 王明民 and 王凯 on 2020-04-15. Abstract: The invention relates to a method, a device, equipment and a storage medium for detecting the cis-trans positional relationship of gene variations, which can improve detection efficiency and help improve the accuracy of the detection result. The detection method comprises: obtaining gene variation sequencing data; grouping the variants in the gene variation sequencing data by gene, with variants in the same gene forming one group; for any two target variants to be analyzed in the same group, obtaining the number of each type of read among the uniquely aligned reads; and detecting the cis-trans positional relationship of the two target variants according to the obtained numbers of each type of read. With this method and device, gene variation sequencing data that passed sequencing quality control can be annotated and analyzed automatically, and whether two variants are in a cis or a trans positional relationship can be determined directly. The whole detection process requires no manual inspection, which saves labor cost, and the rate of misjudged or missed results is low.

1. A method for detecting cis-trans positional relationship of gene variation is characterized by comprising the following steps:

obtaining gene variation sequencing data;

grouping variation conditions in the gene variation sequencing data according to genes, wherein variation in the same gene is the same group;

for any two target variants to be analyzed in the same group, obtaining, from the uniquely aligned reads, the number of reads that meet each of the following conditions:

the number of reads in which both target variants are located on the same continuous read;

the number of reads that contain only the target variant at the front position and whose end position lies before the position of the other target variant;

the number of reads that contain only the target variant at the rear position and whose start position lies after the position of the other target variant;

the number of reads that are continuous between the positions of the two target variants and contain only the target variant at the front position;

the number of reads that are continuous between the positions of the two target variants and contain only the target variant at the rear position;

and detecting the cis-trans positional relationship of the two target variants according to the obtained numbers of each type of read.

2. The method of detecting cis-trans positional relationship of genetic variation according to claim 1, further comprising, before obtaining the number of reads:

the variation of each group is ordered from small to large according to its position on the corresponding chromosome.

3. The method for detecting cis-trans positional relationship of genetic variation according to claim 1 or 2, wherein the step of detecting cis-trans positional relationship of two target variations according to the obtained number of each type of read comprises:

detecting the cis-trans position relation of the two target variations according to the cis-relation judgment condition and the trans-relation judgment condition;

the cis relationship judgment condition is: m ≥ 1 and m + n + x ≥ 3;

the trans relationship judgment condition is: p ≥ 1, q ≥ 1, and p + q ≥ 3;

wherein m is the number of reads in which both target variants are located on the same continuous read;

n is the number of reads that contain only the target variant at the front position and whose end position lies before the position of the other target variant;

x is the number of reads that contain only the target variant at the rear position and whose start position lies after the position of the other target variant;

p is the number of reads that are continuous between the positions of the two target variants and contain only the target variant at the front position;

q is the number of reads that are continuous between the positions of the two target variants and contain only the target variant at the rear position.

4. The method for detecting cis-trans positional relationship of genetic variation according to claim 3, wherein detecting the cis-trans positional relationship of the two target variants according to the cis relationship judgment condition and the trans relationship judgment condition comprises determining that the two target variants satisfy the cis relationship, or the trans relationship, or both the cis relationship and the trans relationship, or are conventional variants that satisfy neither the cis relationship nor the trans relationship.

5. The method for detecting cis-trans positional relationship of genetic variation according to claim 3, further comprising:

calculating, in the case of a cis relationship, the abundance of the merged variant: AF(in cis)=(2m+n+x)/(Dvar1+Dvar2);

where Dvar1 and Dvar2 are the sequencing depths at the positions of the two targeted variations, respectively.

6. The method for detecting cis-trans positional relationship of genetic variation according to claim 3, further comprising:

calculating, in the case of a trans relationship, the abundances AF(var1) and AF(var2) of the two target variants separately: AF(var1)=total abundance of var1, AF(var2)=total abundance of var2, where var1 and var2 denote the two target variants, respectively.

7. The method for detecting cis-trans positional relationship of genetic variation according to claim 3, further comprising:

calculating, in the case where both the cis relationship and the trans relationship are satisfied, the abundance AF(in cis) of the merged variant in the cis relationship and the abundances AF(var1) and AF(var2) of the two target variants in the trans relationship:

AF(in cis)=(2m+n+x)/(Dvar1+Dvar2),

AF(var1)=p/Dvar1,

AF(var2)=q/Dvar2;

where Dvar1 and Dvar2 are the sequencing depths at the positions of the two targeted variations, respectively.

8. A genetic variation cis-trans positional relationship detection device is characterized by comprising:

the sequencing data acquisition module is used for acquiring gene variation sequencing data;

the grouping module is used for grouping variation conditions in the gene variation sequencing data according to genes, and variation in the same gene is the same group;

a read number obtaining module, configured to, for any two target variants to be analyzed in the same group, obtain from the uniquely aligned reads: the number of reads in which both target variants are located on the same continuous read; the number of reads that contain only the target variant at the front position and whose end position lies before the position of the other target variant; the number of reads that contain only the target variant at the rear position and whose start position lies after the position of the other target variant; the number of reads that are continuous between the positions of the two target variants and contain only the target variant at the front position; and the number of reads that are continuous between the positions of the two target variants and contain only the target variant at the rear position; and

a cis-trans detection module, configured to detect the cis-trans positional relationship of the two target variants according to the obtained numbers of each type of read.

9. The apparatus according to claim 8, further comprising a sorting module for sorting the variations of each group from small to large according to their positions on the corresponding chromosomes.

10. The apparatus according to claim 8, wherein the cis-trans detecting module is configured to detect the cis-trans positional relationship between two target variants according to the cis-relationship determining condition and the trans-relationship determining condition;

the cis relationship judgment condition is: m ≥ 1 and m + n + x ≥ 3;

the trans relationship judgment condition is: p ≥ 1, q ≥ 1, and p + q ≥ 3;

wherein m is the number of reads in which both target variants are located on the same continuous read;

n is the number of reads that contain only the target variant at the front position and whose end position lies before the position of the other target variant;

x is the number of reads that contain only the target variant at the rear position and whose start position lies after the position of the other target variant;

p is the number of reads that are continuous between the positions of the two target variants and contain only the target variant at the front position;

q is the number of reads that are continuous between the positions of the two target variants and contain only the target variant at the rear position.

11. The apparatus according to claim 10, further comprising a cis-abundance calculating module for calculating, in the case of a cis relationship, the abundance of the merged variant AF(in cis)=(2m+n+x)/(Dvar1+Dvar2), wherein Dvar1 and Dvar2 are the sequencing depths at the positions of the two target variants, respectively.

12. The apparatus according to claim 10, further comprising a trans-abundance calculating module for calculating, in the case of a trans relationship, the abundances AF(var1) and AF(var2) of the two target variants separately: AF(var1)=total abundance of var1, AF(var2)=total abundance of var2, where var1 and var2 denote the two target variants, respectively.

13. The apparatus according to claim 10, further comprising a cis-trans abundance calculating module for calculating the abundance AF (in cis) of the combined variation in the cis relationship and the abundance AF (var1) and AF (var2) of two target variations in the trans relationship, respectively, while satisfying the cis relationship and the trans relationship:

AF(in cis)=(2m+n+x)/(Dvar1+Dvar2),

AF(var1)=p/Dvar1,

AF(var2)=q/Dvar2;

where Dvar1 and Dvar2 are the sequencing depths at the positions of the two targeted variations, respectively.

14. A computer device having a processor and a memory, the memory having a computer program stored thereon, the processor implementing the steps of the method for detecting cis-trans positional relationship of genetic variations according to any one of claims 1 to 7 when executing the computer program.

15. A computer storage medium having a computer program stored thereon, wherein the computer program is executed to implement the steps of the method for detecting cis-trans positional relationship of genetic variations according to any one of claims 1 to 7.

Technical Field

The invention relates to the technical field of high-throughput sequencing data analysis, in particular to a method, a device, equipment and a storage medium for detecting cis-trans position relation of genetic variation.

Background

In genetic testing for diseases such as cancer, a patient's tissue or blood sample goes through an extraction experiment, a library construction experiment, a library capture experiment, a library detection experiment, sequencing on the Illumina platform, bioinformatic quality interpretation of the Illumina sequencing data with output of variant data, annotation of the qualified sequencing variant data, and writing of a clinical test report for the disease genes, so that a qualified clinical test report on the disease-related genes is finally provided.

Traditionally, the annotation of qualified sequencing variant data relies on manual judgment using software called IGV to view the uniquely aligned sequencing reads (reads that map to only one position). During manual review, each variant is located by entering its chromosomal position, the reviewer judges whether the variant is real, and any cis-trans (in cis/in trans) positional relationship with other variants is found by visual inspection. This approach costs a great deal of labor. In addition, variants that are far apart or lie in regions of high sequencing depth are hard to display on the same IGV screen, so their cis-trans positional relationship is easily missed.

Disclosure of Invention

In view of the above, it is necessary to provide a method, an apparatus, a device and a storage medium for detecting cis-trans positional relationship of genetic variation, which can improve the detection efficiency and is advantageous for improving the accuracy of the detection result.

A method for detecting cis-trans positional relationship of gene variation comprises the following steps:

obtaining gene variation sequencing data;

grouping variation conditions in the gene variation sequencing data according to genes, wherein variation in the same gene is the same group;

for any two target variants to be analyzed in the same group, obtaining, from the uniquely aligned reads, the number of reads that meet each of the following conditions:

the number of reads in which both target variants are located on the same continuous read;

the number of reads that contain only the target variant at the front position and whose end position lies before the position of the other target variant;

the number of reads that contain only the target variant at the rear position and whose start position lies after the position of the other target variant;

the number of reads that are continuous between the positions of the two target variants and contain only the target variant at the front position;

the number of reads that are continuous between the positions of the two target variants and contain only the target variant at the rear position;

and detecting the cis-trans positional relationship of the two target variants according to the obtained numbers of each type of read.

In one embodiment, before obtaining the number of reads, the method further comprises:

the variation of each group is ordered from small to large according to its position on the corresponding chromosome.

In one embodiment, the detecting the cis-trans positional relationship of the two target variants according to the obtained number of the various types of reads includes:

detecting the cis-trans position relation of the two target variations according to the cis-relation judgment condition and the trans-relation judgment condition;

the cis relationship judgment condition is: m ≥ 1 and m + n + x ≥ 3;

the trans relationship judgment condition is: p ≥ 1, q ≥ 1, and p + q ≥ 3;

wherein m is the number of reads in which both target variants are located on the same continuous read;

n is the number of reads that contain only the target variant at the front position and whose end position lies before the position of the other target variant;

x is the number of reads that contain only the target variant at the rear position and whose start position lies after the position of the other target variant;

p is the number of reads that are continuous between the positions of the two target variants and contain only the target variant at the front position;

q is the number of reads that are continuous between the positions of the two target variants and contain only the target variant at the rear position.

In one embodiment, the detecting the cis-trans positional relationship of the two target variants according to the cis-relationship determination condition and the trans-relationship determination condition includes detecting that the two target variants are in a cis-relationship, or in a trans-relationship, or satisfy both the cis-relationship and the trans-relationship, or satisfy neither the cis-relationship nor the trans-relationship.
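The two judgment conditions and the four possible outcomes can be sketched in a few lines of Python. This is only an illustrative sketch of the conditions as stated above; the function name `judge_cis_trans` and the returned labels are hypothetical and do not come from the original disclosure.

```python
def judge_cis_trans(m: int, n: int, x: int, p: int, q: int) -> str:
    """Apply the cis and trans judgment conditions to the five read counts.

    The function name and the returned labels are illustrative assumptions.
    """
    # Cis condition from the text: m >= 1 and m + n + x >= 3
    is_cis = m >= 1 and (m + n + x) >= 3
    # Trans condition from the text: p >= 1, q >= 1 and p + q >= 3
    is_trans = p >= 1 and q >= 1 and (p + q) >= 3

    if is_cis and is_trans:
        return "cis and trans"
    if is_cis:
        return "cis"
    if is_trans:
        return "trans"
    return "neither"
```

For instance, `judge_cis_trans(2, 1, 0, 0, 0)` satisfies only the cis condition, while counts with m = 0 can never be judged cis regardless of how large n and x are.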

In one embodiment, the method for detecting cis-trans positional relationship of genetic variation further comprises:

calculating, in the case of a cis relationship, the abundance of the merged variant: AF(in cis)=(2m+n+x)/(Dvar1+Dvar2);

where Dvar1 and Dvar2 are the sequencing depths at the positions of the two targeted variations, respectively.

In one embodiment, the method for detecting cis-trans positional relationship of genetic variation further comprises:

calculating, in the case of a trans relationship, the abundances AF(var1) and AF(var2) of the two target variants separately: AF(var1)=total abundance of var1, AF(var2)=total abundance of var2, where var1 and var2 denote the two target variants, respectively.

In one embodiment, the method for detecting cis-trans positional relationship of genetic variation further comprises:

calculating, in the case where both the cis relationship and the trans relationship are satisfied, the abundance AF(in cis) of the merged variant in the cis relationship and the abundances AF(var1) and AF(var2) of the two target variants in the trans relationship:

AF(in cis)=(2m+n+x)/(Dvar1+Dvar2),

AF(var1)=p/Dvar1,

AF(var2)=q/Dvar2;

where Dvar1 and Dvar2 are the sequencing depths at the positions of the two targeted variations, respectively.
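The abundance formulas for the case where both relationships are satisfied translate directly into code. The sketch below is a minimal illustration; the function names and the zero-depth check are assumptions added for the example and are not part of the original disclosure.

```python
def af_in_cis(m: int, n: int, x: int, d_var1: int, d_var2: int) -> float:
    """AF(in cis) = (2m + n + x) / (Dvar1 + Dvar2)."""
    total_depth = d_var1 + d_var2
    if total_depth <= 0:
        # Guard added for the sketch; the source does not discuss zero depth.
        raise ValueError("combined sequencing depth must be positive")
    return (2 * m + n + x) / total_depth


def af_in_trans(p: int, q: int, d_var1: int, d_var2: int) -> tuple[float, float]:
    """AF(var1) = p / Dvar1 and AF(var2) = q / Dvar2."""
    if d_var1 <= 0 or d_var2 <= 0:
        raise ValueError("sequencing depths must be positive")
    return p / d_var1, q / d_var2
```

With m = 10, n = 5, x = 5 and a depth of 100 at each position, the merged abundance is (20 + 5 + 5)/200 = 0.15. Each read counted in m carries both variants, which is why m is weighted by 2 in the numerator.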

A gene variation cis-trans positional relationship detection device comprises:

the sequencing data acquisition module is used for acquiring gene variation sequencing data;

the grouping module is used for grouping variation conditions in the gene variation sequencing data according to genes, and variation in the same gene is the same group;

a read number obtaining module, configured to, for any two target variants to be analyzed in the same group, obtain from the uniquely aligned reads: the number of reads in which both target variants are located on the same continuous read; the number of reads that contain only the target variant at the front position and whose end position lies before the position of the other target variant; the number of reads that contain only the target variant at the rear position and whose start position lies after the position of the other target variant; the number of reads that are continuous between the positions of the two target variants and contain only the target variant at the front position; and the number of reads that are continuous between the positions of the two target variants and contain only the target variant at the rear position; and

a cis-trans detection module, configured to detect the cis-trans positional relationship of the two target variants according to the obtained numbers of each type of read.

In one embodiment, the apparatus for detecting cis-trans positional relationship of genetic variation further comprises a sorting module, wherein the sorting module is configured to sort the variations of each group from small to large according to their positions on the corresponding chromosomes.

In one embodiment, the cis-trans detection module is configured to detect a cis-trans positional relationship between two target variants according to a cis-relationship determination condition and a trans-relationship determination condition;

the cis relationship judgment condition is: m ≥ 1 and m + n + x ≥ 3;

the trans relationship judgment condition is: p ≥ 1, q ≥ 1, and p + q ≥ 3;

wherein m is the number of reads in which both target variants are located on the same continuous read;

n is the number of reads that contain only the target variant at the front position and whose end position lies before the position of the other target variant;

x is the number of reads that contain only the target variant at the rear position and whose start position lies after the position of the other target variant;

p is the number of reads that are continuous between the positions of the two target variants and contain only the target variant at the front position;

q is the number of reads that are continuous between the positions of the two target variants and contain only the target variant at the rear position.

In one embodiment, the apparatus for detecting cis-trans positional relationship of genetic variation further comprises a cis-abundance calculating module, configured to calculate, in the case of a cis relationship, the abundance of the merged variant AF(in cis)=(2m+n+x)/(Dvar1+Dvar2), where Dvar1 and Dvar2 are the sequencing depths at the positions of the two target variants, respectively.

In one embodiment, the apparatus for detecting cis-trans positional relationship of genetic variation further comprises a trans-abundance calculating module, configured to calculate, in the case of a trans relationship, the abundances AF(var1) and AF(var2) of the two target variants separately: AF(var1)=total abundance of var1, AF(var2)=total abundance of var2, where var1 and var2 denote the two target variants, respectively.

In one embodiment, the apparatus for detecting cis-trans positional relationship of genetic variation further comprises a cis-trans abundance calculating module, wherein the cis-trans abundance calculating module is configured to calculate abundance AF (in cis) of the combined variation in the cis relationship and abundance AF (var1) and AF (var2) of two target variations in the trans relationship, respectively, while satisfying the cis relationship and the trans relationship:

AF(in cis)=(2m+n+x)/(Dvar1+Dvar2),

AF(var1)=p/Dvar1,

AF(var2)=q/Dvar2;

where Dvar1 and Dvar2 are the sequencing depths at the positions of the two targeted variations, respectively.

A computer device having a processor and a memory, wherein the memory stores a computer program, and the processor implements the steps of the method for detecting cis-trans positional relationship of genetic variation according to any of the above embodiments when executing the computer program.

A computer storage medium having a computer program stored thereon, the computer program when executed implementing the steps of the method for detecting cis-trans positional relationship of genetic variation according to any one of the above embodiments.

By using the method and the device for detecting the cis-trans positional relationship of the genetic variation, the sequencing data of the genetic variation qualified in sequencing can be automatically analyzed and annotated, and whether the two variations are in cis (in cis) positional relationship or in trans (in trans) positional relationship can be directly judged.

Drawings

FIG. 1 is a schematic flow chart of a method for detecting cis-trans positional relationship of genetic variation according to an embodiment of the present invention;

FIG. 2 is a schematic diagram showing the grouping and ranking of all variants satisfying the conditions according to the genes;

FIG. 3 is a schematic representation of two target variants var1 and var2 both located in the same continuous read;

FIG. 4 is a schematic diagram of a read that contains only the front target variant var1 and whose end position (position end) lies before the position (position B) of the other target variant var2;

FIG. 5 is a schematic diagram of a read that contains only the rear target variant var2 and whose start position (position start) lies after the position (position A) of the other target variant var1;

FIG. 6 is a schematic diagram of reads that are continuous between the positions (position A and position B) of the two target variants var1 and var2 and contain only the front target variant var1 or only the rear target variant var2;

FIG. 7 is a schematic block diagram of a device for detecting cis-trans positional relationship of genetic variation according to an embodiment of the present invention;

FIG. 8 is a flowchart of the analysis of step (1) in a specific test case;

FIG. 9 is a schematic diagram, for the specific test case, of the relative positions of the EGFR p.T790M and p.C797G variants ordered by chromosomal position, with the uniquely aligned reads arranged from smaller to larger chromosomal position and different reads ordered from top to bottom;

FIG. 10 is a diagram of the uniquely aligned reads counted in the m value in the specific test case;

FIG. 11 is a diagram of the uniquely aligned reads counted in the n value in the specific test case;

FIG. 12 is a diagram of the uniquely aligned reads counted in the x value in the specific test case.

Detailed Description

To facilitate an understanding of the invention, the invention will now be described more fully with reference to the accompanying drawings. Preferred embodiments of the present invention are shown in the drawings. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.

"read" as used herein refers to sequencing data generated by a sequencer, which is actually a small segment of DNA fragments, and a read corresponds to a record in the sequencing data, wherein the unique alignment read is a read that can uniquely match (mapping) to a specific position in the genome in a second generation high-throughput paired-end sequencing (paired-end sequencing) detection result, also called unique read; the normal cells of the human body are Diploid (Diploid), each gene locus has two alleles (allele) which are respectively inherited from a male parent and a female parent, the two variations (variant) are in cis-position relation on the same chromosome, namely incis, and the two variations are in trans-position relation on different chromosomes, namely in trans; the sequencing depth refers to the ratio of the total base number obtained by sequencing to the size of a genome to be tested, for example, the size of the genome is about 5G after one genome is subjected to resequencing, and the sequencing depth is 20X when 100G of data volume is obtained by sequencing; the coverage degree refers to the proportion of the sequence obtained by sequencing in the whole genome, and due to the existence of complex structures such as high GC, repeated sequences and the like in the genome, the sequence obtained by final splicing and assembling of sequencing often cannot cover all regions, and the region which is not obtained in the sequencing often becomes Gap, for example, the coverage rate is 98% in the whole genome sequencing, which indicates that 2% of the sequence region is not obtained by sequencing; the abundance of a variation refers to the relative proportion of the alleles of the variation (relative to the wild-type allele) among all alleles at a given locus, i.e., equal to variation/(variation + wild-type).

As shown in fig. 1, an embodiment of the invention provides a method for detecting cis-trans positional relationship of genetic variation, which includes the following steps S110 to S140.

Step S110: obtaining gene variation sequencing data.

If the gene variation sequencing data are derived from somatic cells (somatic), they may be, but are not limited to, sequencing data of tumor tissue and the like. Specifically, the sequencing quality and sample requirements for tumor tissue are: mean coverage above 700X, an interval between sampling and sequencing of less than 6 months, and a tumor variant cell content of at least 20 percent; variants with a variant abundance greater than or equal to 1 are preferred. Non-somatic sequencing data are not subject to these sequencing quality and sample requirements.
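The three hard requirements for tumor tissue can be expressed as a simple pre-check before analysis. The following is a hedged sketch: the function name, parameter names and the representation of tumor cell content as a fraction are assumptions for illustration. The preference for variants with abundance greater than or equal to 1 is left out because the text states it as a preference rather than a requirement.

```python
def meets_tumor_sample_requirements(
    mean_coverage: float,
    months_since_sampling: float,
    tumor_cell_fraction: float,
) -> bool:
    """Check the sequencing quality and sample requirements for tumor tissue.

    Thresholds follow the text: mean coverage > 700X, sampling-to-sequencing
    interval < 6 months, tumor variant cell content >= 20% (given here as a
    fraction, which is an assumption of this sketch).
    """
    return (
        mean_coverage > 700
        and months_since_sampling < 6
        and tumor_cell_fraction >= 0.20
    )
```

A sample with 800X mean coverage, sequenced 3 months after sampling, with 25% tumor cell content would pass; the same sample at exactly 700X would not, since the coverage bound is strict.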

In a specific example, the genetic variation includes, but is not limited to, base substitution mutation (mutation), deletion mutation (optionally <50bp), insertion mutation (insertion, including repeat mutation and deletion-insertion mutation, optionally <50bp), and the like.

Step S120: the variation in the gene variation sequencing data is grouped according to genes, and the variation in the same gene is the same group.

As shown in FIG. 2, in this step, all mutations satisfying the condition are grouped according to the gene, and the mutations in the same gene are grouped into the same group.

Optionally, in a specific example, the method further comprises ranking the variations of each group according to their positions on the corresponding chromosome. Further optionally, the position coordinates are sorted sequentially from small to large.

For example, in FIG. 2, the variants var1, var2, var3 … … var9 … … varN are grouped by gene (GeneN), and the variants in each group are further arranged from small to large according to their position coordinates.
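The grouping and sorting of step S120 can be sketched as follows. This is a minimal illustration that assumes each variant is a dictionary with hypothetical keys `gene`, `name` and `pos`; the original text does not specify a data format, and the gene names and coordinates used in the usage note are made up.

```python
from collections import defaultdict
from itertools import combinations


def group_and_sort(variants):
    """Group variants by gene, then sort each group by chromosomal position."""
    groups = defaultdict(list)
    for variant in variants:
        groups[variant["gene"]].append(variant)
    # Sort every group from the smallest to the largest position coordinate.
    return {
        gene: sorted(members, key=lambda v: v["pos"])
        for gene, members in groups.items()
    }


def target_pairs(sorted_group):
    """Yield every pair of variants in one group, front variant first."""
    return combinations(sorted_group, 2)
```

Given three variants, two in `Gene1` at positions 250 and 100 and one in `Gene2`, `group_and_sort` returns the `Gene1` variants in the order 100, 250, and `target_pairs` yields the single pair (front variant, rear variant) that step S130 would then analyze.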

Step S130: for any two target variants to be analyzed in the same group, obtaining the number of each type of read among the uniquely aligned reads:

the number of reads in which both target variants are located on the same continuous read;

the number of reads that contain only the target variant at the front position and whose end position lies before the position of the other target variant;

the number of reads that contain only the target variant at the rear position and whose start position lies after the position of the other target variant;

the number of reads that are continuous between the positions of the two target variants and contain only the target variant at the front position;

the number of reads that are continuous between the positions of the two target variants and contain only the target variant at the rear position.

The location is the location of the corresponding genetic variation on the chromosome.

This embodiment classifies the reads into types according to the positions of the target variants they contain, counts each type, and determines from these counts whether the two target variants are in a cis or a trans positional relationship.

FIG. 3 shows the case where both target variations var1 and var2 are located on the same continuous read;

FIG. 4 shows the case where a read contains only the target variation var1 at the front position and the read end position (position end) is in front of the position (position B) of the other target variation var2;

FIG. 5 shows the case where a read contains only the target variation var2 at the rear position and the read start position (position start) is behind the position (position A) of the other target variation var1;

FIG. 6 shows the case where a read is continuous between the positions (position A and position B) of the two target variations var1 and var2 and contains only the target variation var1 at the front position or only the target variation var2 at the rear position.
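The five read types of FIGS. 3 to 6 can be illustrated with a small classifier. This is a sketch under stated assumptions: reads are already filtered to unique alignments, coordinates are inclusive, each read carries precomputed flags saying whether it contains var1 or var2, and "continuous between the positions" is read as "the read covers both position A and position B". The `Read` record and the labels `m`, `n`, `x`, `p`, `q` used as dictionary keys are illustrative.

```python
from typing import NamedTuple, Optional


class Read(NamedTuple):
    # Hypothetical record for one uniquely aligned read.
    start: int       # first reference position covered (inclusive)
    end: int         # last reference position covered (inclusive)
    has_var1: bool   # read carries the front variation (at position A)
    has_var2: bool   # read carries the rear variation (at position B)


def classify(read: Read, pos_a: int, pos_b: int) -> Optional[str]:
    """Return the read type, or None if the read is not counted."""
    if read.has_var1 and read.has_var2:
        return "m"  # both variations on the same read (FIG. 3)
    if read.has_var1:
        # Ends before position B (FIG. 4), otherwise it spans B without var2 (FIG. 6).
        return "n" if read.end < pos_b else "p"
    if read.has_var2:
        # Starts after position A (FIG. 5), otherwise it spans A without var1 (FIG. 6).
        return "x" if read.start > pos_a else "q"
    return None  # carries neither variation


def count_reads(reads, pos_a: int, pos_b: int) -> dict:
    counts = {key: 0 for key in "mnxpq"}
    for read in reads:
        label = classify(read, pos_a, pos_b)
        if label is not None:
            counts[label] += 1
    return counts


# Made-up reads around A = 100 and B = 120, one of each type plus one uncounted.
reads = [
    Read(90, 130, True, True),
    Read(80, 110, True, False),
    Read(110, 150, False, True),
    Read(90, 130, True, False),
    Read(90, 130, False, True),
    Read(90, 130, False, False),
]
counts = count_reads(reads, pos_a=100, pos_b=120)
```

How the per-read flags are derived from an alignment file is outside the scope of this sketch.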

Step S140: detecting the cis-trans positional relationship of the two target variations according to the obtained number of reads of each type.

In a specific example, detecting the cis-trans positional relationship of the two target variations according to the obtained number of reads of each type includes:

detecting the cis-trans positional relationship of the two target variations according to a cis relationship judgment condition and a trans relationship judgment condition;

the cis relationship judgment condition is: m ≥ 1 and m + n + x ≥ 3;

the trans relationship judgment condition is: p ≥ 1, q ≥ 1, and p + q ≥ 3;

wherein m is the number of reads in which both target variations are located on the same continuous read;

n is the number of reads that contain only the target variation at the front position and whose end position is in front of the position of the other target variation;

x is the number of reads that contain only the target variation at the rear position and whose start position is behind the position of the other target variation;

p is the number of reads that are continuous between the positions of the two target variations and contain only the target variation at the front position;

q is the number of reads that are continuous between the positions of the two target variations and contain only the target variation at the rear position.

After the detection result is determined to be a cis positional relationship or a trans positional relationship, the method further comprises merging and outputting the cis positional relationship result (for example, p.[variant1;variant2]) or the trans positional relationship result (for example, p.[variant1];[variant2]) in the HGVS (Human Genome Variation Society sequence variant nomenclature) standard format.
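As an illustration of the two output formats, the following sketch builds the merged string from two protein-level variant names. The function name `format_hgvs` and the `relation` argument are hypothetical, and only the two patterns given in the text are produced.

```python
def format_hgvs(var1: str, var2: str, relation: str) -> str:
    """Merge two protein-level variant names into one HGVS-style string."""
    # Drop a leading "p." so that the prefix appears only once in the output.
    a = var1[2:] if var1.startswith("p.") else var1
    b = var2[2:] if var2.startswith("p.") else var2
    if relation == "cis":
        return f"p.[{a};{b}]"    # both variations on the same allele
    if relation == "trans":
        return f"p.[{a}];[{b}]"  # variations on different alleles
    raise ValueError(f"unsupported relation: {relation!r}")


cis_text = format_hgvs("p.T790M", "p.C797G", "cis")
trans_text = format_hgvs("p.T790M", "p.C797G", "trans")
```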

Detecting the cis-trans positional relationship of the two target variations according to the cis relationship judgment condition and the trans relationship judgment condition includes determining that the two target variations are in a cis relationship, are in a trans relationship, satisfy both the cis relationship and the trans relationship, or satisfy neither.
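The two judgment conditions and the four possible outcomes can be written as a short function. This is a minimal sketch; the function name `judge` and the returned labels are illustrative, while the thresholds are those stated in the text.

```python
def judge(m: int, n: int, x: int, p: int, q: int) -> str:
    """Apply the cis and trans judgment conditions to the five read counts."""
    cis = m >= 1 and m + n + x >= 3
    trans = p >= 1 and q >= 1 and p + q >= 3
    if cis and trans:
        return "cis and trans"
    if cis:
        return "cis"
    if trans:
        return "trans"
    return "undetermined"


# Made-up counts exercising each of the four outcomes.
results = [
    judge(3, 0, 0, 0, 0),    # cis only
    judge(0, 5, 5, 2, 1),    # trans only (m = 0 fails the cis condition)
    judge(1, 1, 1, 2, 2),    # both conditions hold
    judge(0, 10, 10, 1, 1),  # neither: m = 0 and p + q = 2
]
```

Note that n and x alone never establish a cis relationship: at least one read must carry both variations.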

In a more specific example, the method for detecting cis-trans positional relationship of genetic variation further includes:

calculating, in the case of a cis relationship, the abundance of the merged variation: AF(in cis) = (2m + n + x)/(Dvar1 + Dvar2);

where Dvar1 and Dvar2 are the sequencing depths at the positions of the two targeted variations, respectively.

In a more specific example, the method for detecting cis-trans positional relationship of genetic variation further comprises:

calculating, in the case of a trans relationship, the abundances AF(var1) and AF(var2) of the two target variations separately: AF(var1) = total abundance of var1, AF(var2) = total abundance of var2, where var1 and var2 denote the two target variations.

In a more specific example, the method for detecting cis-trans positional relationship of genetic variation further comprises:

when both the cis relationship and the trans relationship are satisfied, calculating separately the abundance AF(in cis) of the merged variation under the cis relationship and the abundances AF(var1) and AF(var2) of the two target variations under the trans relationship:

AF(in cis)=(2m+n+x)/(Dvar1+Dvar2),

AF(var1)=p/Dvar1,

AF(var2)=q/Dvar2;

where Dvar1 and Dvar2 are the sequencing depths at the positions of the two targeted variations, respectively.
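The abundance formulas above translate directly into code. The sketch below uses hypothetical function names and made-up counts and depths. The factor 2 on m presumably reflects that a read carrying both variations contributes to the depth at both positions; that interpretation is an assumption and is not stated in the text.

```python
def af_in_cis(m: int, n: int, x: int, d_var1: int, d_var2: int) -> float:
    """Abundance of the merged variation under a cis relationship."""
    return (2 * m + n + x) / (d_var1 + d_var2)


def af_trans_when_both(p: int, q: int, d_var1: int, d_var2: int):
    """AF(var1) and AF(var2) when both cis and trans conditions are satisfied."""
    return p / d_var1, q / d_var2


# Made-up counts and sequencing depths for illustration only.
cis_af = af_in_cis(m=10, n=4, x=6, d_var1=1000, d_var2=1000)
af_var1, af_var2 = af_trans_when_both(p=20, q=30, d_var1=1000, d_var2=1000)
```

With these inputs the cis abundance is 30/2000 and the two trans abundances are 20/1000 and 30/1000.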

Optionally, reads that support neither a cis nor a trans positional relationship are excluded from the statistics.

As shown in fig. 7, the present invention further provides an apparatus 200 for detecting cis-trans positional relationship of genetic variation, which includes a sequencing data obtaining module 210, a grouping module 220, a read number obtaining module 230, and a cis-trans detecting module 240.

The sequencing data obtaining module 210 is configured to obtain genetic variation sequencing data.

The grouping module 220 is configured to group variations in the genetic variation sequencing data according to genes, where variations in the same gene are in the same group.

The read number obtaining module 230 is configured to, for any two target variations to be analyzed in the same group, count among the uniquely aligned reads: the number of reads in which both target variations are located on the same continuous read; the number of reads that contain only the target variation at the front position and whose end position is in front of the position of the other target variation; the number of reads that contain only the target variation at the rear position and whose start position is behind the position of the other target variation; the number of reads that are continuous between the positions of the two target variations and contain only the target variation at the front position; and the number of reads that are continuous between the positions of the two target variations and contain only the target variation at the rear position.

The cis-trans detection module 240 is configured to detect a cis-trans positional relationship of two target variations according to the obtained number of each type of read.

Further, in a specific example, the apparatus 200 for detecting cis-trans positional relationship of genetic variation further includes a sorting module 250. The sorting module 250 is configured to sort the variations of each group in ascending order of their positions on the corresponding chromosome.

In a specific example, the cis-trans detection module 240 is configured to detect a cis-trans positional relationship of two target variants according to a cis-relationship determination condition and a trans-relationship determination condition;

the cis relationship judgment condition is: m ≥ 1 and m + n + x ≥ 3;

the trans relationship judgment condition is: p ≥ 1, q ≥ 1, and p + q ≥ 3;

wherein m is the number of reads in which both target variations are located on the same continuous read;

n is the number of reads that contain only the target variation at the front position and whose end position is in front of the position of the other target variation;

x is the number of reads that contain only the target variation at the rear position and whose start position is behind the position of the other target variation;

p is the number of reads that are continuous between the positions of the two target variations and contain only the target variation at the front position;

q is the number of reads that are continuous between the positions of the two target variations and contain only the target variation at the rear position.

In a specific example, the apparatus 200 for detecting cis-trans positional relationship of genetic variation further includes a cis-abundance calculating module. The cis-abundance calculating module is configured to calculate, in the case of a cis relationship, the abundance of the merged variation, AF(in cis) = (2m + n + x)/(Dvar1 + Dvar2), where Dvar1 and Dvar2 are the sequencing depths at the positions of the two target variations, respectively.

In a specific example, the apparatus 200 for detecting cis-trans positional relationship of genetic variation further includes a trans-abundance calculating module. The trans-abundance calculating module is configured to calculate, in the case of a trans relationship, the abundances AF(var1) and AF(var2) of the two target variations separately: AF(var1) = total abundance of var1, AF(var2) = total abundance of var2, where var1 and var2 denote the two target variations.

In a specific example, the apparatus 200 for detecting cis-trans positional relationship of genetic variation further includes a cis-trans abundance calculation module. The cis-trans abundance calculating module is used for respectively calculating the abundance AF (in cis) of the combined variation in the cis relation and the abundances AF (var1) and AF (var2) of two target variations in the trans relation under the condition of simultaneously satisfying the cis relation and the trans relation:

AF(in cis)=(2m+n+x)/(Dvar1+Dvar2),

AF(var1)=p/Dvar1,

AF(var2)=q/Dvar2;

where Dvar1 and Dvar2 are the sequencing depths at the positions of the two targeted variations, respectively.

By using the above method and apparatus for detecting the cis-trans positional relationship of genetic variation, genetic variation sequencing data that meets sequencing quality requirements can be annotated and analyzed automatically, and it can be determined directly whether two variations are in a cis (in cis) or a trans (in trans) positional relationship. The whole detection process is efficient and requires no manual inspection, which saves labor cost; the rate of misjudgment or missed judgment is low, and accuracy is markedly improved.

Based on the embodiments and/or specific examples described above, the present invention further provides a computer device for detecting cis-trans positional relationship of genetic variation, which has a processor and a memory, wherein the memory stores a computer program, and the processor implements the steps of the cis-trans positional relationship detection method of genetic variation according to any one of the specific examples when executing the computer program.

It will be understood by those skilled in the art that all or part of the processes of the above methods may be implemented by a computer program, which may be stored in a non-volatile computer-readable storage medium, and in the embodiments of the present invention, the program may be stored in the storage medium of a computer system and executed by at least one processor in the computer system to implement the processes of the embodiments including the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.

Accordingly, the present invention further provides a computer storage medium for detecting cis-trans positional relationship of genetic variation, wherein a computer program is stored thereon, and when executed, the steps of the method for detecting cis-trans positional relationship of genetic variation according to any of the above embodiments are realized.

The method for detecting cis-trans positional relationship of genetic variation according to the present invention will be further described with reference to a specific detection example.

This specific detection example automatically determines the in cis relationship between EGFR p.T790M and p.C797G among the genetic variations of sample SSS from a tumor patient. The detection steps are as follows.

(1) Among the numerous somatic variations of sample SSS, the EGFR p.T790M and p.C797G variations both belong to the EGFR gene; they are therefore placed in the EGFR group and sorted by position on the chromosome. The flow chart is shown in FIG. 8 and the arrangement of positions is shown in FIG. 9;

(2) as shown in FIG. 9, 26 reads can be detected automatically, i.e., m = 26; as shown in FIG. 10, these reads support the EGFR p.T790M and p.C797G variations being located together on a uniquely aligned read. Since m > 3, the conditions m ≥ 1 and m + n + x ≥ 3 are satisfied, and the two variations in this sample satisfy only the case in which both target variations are located on the same continuous read, so EGFR p.T790M and p.C797G are in a cis positional relationship.

(3) According to the output of the bioinformatics pipeline, the sequencing depth at EGFR p.T790M is 1800X and the sequencing depth at EGFR p.C797G is 1800X; m = 26, n = 9 (FIG. 11), and x = 2 (FIG. 12) are read out automatically, so the abundance of the variation in the cis positional relationship is calculated as AF(in cis) = (26 × 2 + 9 + 2)/(1800 + 1800) = 1.75%.
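The arithmetic of this detection example can be checked with a few lines. The counts and depths are the ones reported above (26 reads carrying both variations, n = 9, x = 2, depth 1800X at each position); the variable names are illustrative.

```python
# Values reported in the detection example.
m, n, x = 26, 9, 2
depth_t790m = 1800
depth_c797g = 1800

# Cis relationship judgment condition: m >= 1 and m + n + x >= 3.
cis_ok = m >= 1 and m + n + x >= 3

# AF(in cis) = (2m + n + x) / (Dvar1 + Dvar2) = 63 / 3600.
af = (2 * m + n + x) / (depth_t790m + depth_c797g)

print(cis_ok, f"{af:.2%}")  # prints: True 1.75%
```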

The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.

The above-mentioned embodiments express only several implementations of the present invention, and although their description is specific and detailed, it should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, all of which fall within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.
