Mutation detection method and device, storage medium and electronic equipment

Document No.: 1688205 | Published: 2020-01-03

Reading note: this technology, "Mutation detection method and device, storage medium and electronic equipment", was created by Liu Bing (刘兵) and Zhang Kai (张凯) on 2019-11-28.

Abstract: The present disclosure relates to the field of data processing technologies, and in particular to a mutation detection method and apparatus, a computer-readable storage medium, and an electronic device. The method includes: adding basic data to a probability calculation queue; obtaining a basic data group from the probability calculation queue, calculating in a parallel manner, based on each item of basic data in the group, the probability value that each fragment to be sequenced in each activation region is the haplotype, and adding the basic data and the corresponding probability values to a probability output queue; and obtaining the basic data and the corresponding probability values from the probability output queue, and calculating variation information in the activation region from the basic data and the probability values. Because the probability values for multiple items of basic data are obtained and calculated at the same time, the scheme avoids the low throughput of calculating the probability values one after another and thereby speeds up mutation detection.

1. A mutation detection method, comprising:

adding basic data to a probability calculation queue, wherein the basic data comprises a fragment to be sequenced in an activation region and a haplotype;

obtaining a basic data group from the probability calculation queue, calculating in a parallel manner, based on each item of basic data in the basic data group, the probability value that each fragment to be sequenced in each activation region is the haplotype, and adding the basic data and the corresponding probability values to a probability output queue; and

obtaining the basic data and the corresponding probability values from the probability output queue, and calculating variation information in the activation region according to the basic data and the probability values.
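The three steps of claim 1 can be sketched as a two-queue pipeline. The following is a minimal illustration only, not the patented implementation: the names `toy_probability`, `probability_stage`, `run_pipeline` and the `SENTINEL` end marker are hypothetical, the scoring function is a placeholder rather than a real read-haplotype model, and a thread pool stands in for whatever parallel hardware is actually used.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

SENTINEL = None  # end-of-stream marker (an assumption of this sketch)


def toy_probability(item):
    """Placeholder score: fraction of positions where read and haplotype agree."""
    read, haplotype = item
    matches = sum(a == b for a, b in zip(read, haplotype))
    return matches / max(len(read), 1)


def probability_stage(calc_q, out_q, group_size):
    """Take groups of basic data from calc_q and score each group in parallel."""
    with ThreadPoolExecutor(max_workers=group_size) as pool:
        done = False
        while not done:
            group = []
            while len(group) < group_size:
                item = calc_q.get()  # blocks until an item or the sentinel arrives
                if item is SENTINEL:
                    done = True
                    break
                group.append(item)
            # map() keeps input order, so each probability stays paired with its item
            for item, prob in zip(group, pool.map(toy_probability, group)):
                out_q.put((item, prob))
    out_q.put(SENTINEL)


def run_pipeline(items, group_size=4):
    """Feed basic data in, collect (basic data, probability) pairs out."""
    calc_q, out_q = queue.Queue(), queue.Queue()
    worker = threading.Thread(
        target=probability_stage, args=(calc_q, out_q, group_size)
    )
    worker.start()
    for item in items:  # step 1: add basic data to the calculation queue
        calc_q.put(item)
    calc_q.put(SENTINEL)
    results = []
    while True:  # step 3: consume the probability output queue
        entry = out_q.get()
        if entry is SENTINEL:
            break
        results.append(entry)
    worker.join()
    return results
```

In this sketch the consumer simply collects the pairs; in the claimed method that final stage would calculate the variation information for the activation region.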

2. The method of claim 1, wherein before the adding of the basic data to the probability calculation queue, the method further comprises:

generating at least one item of basic data according to the fragments to be sequenced and preset genetic data.

3. The method of claim 2, wherein the generating of at least one item of basic data according to the fragments to be sequenced and the preset genetic data comprises:

aligning the fragments to be sequenced with the preset genetic data to obtain alignment data, and identifying at least one activation region according to the alignment data;

determining the haplotypes in each activation region according to the fragments to be sequenced in that activation region and the preset genetic data; and

generating the basic data according to the fragments to be sequenced and the haplotypes in each activation region.

4. The method of claim 3, wherein the determining of the haplotypes in each activation region according to the fragments to be sequenced in that activation region and the preset genetic data comprises:

locally assembling the fragments to be sequenced in each activation region with the preset genetic data, and determining the haplotypes in the activation region according to the local assembly result.

5. The method of claim 2, wherein the generating of at least one item of basic data according to the fragments to be sequenced and the preset genetic data is performed in a multi-threaded manner.

6. The method of claim 1, wherein the calculating, based on each item of basic data in the basic data group, of the probability value that each fragment to be sequenced in each activation region is the haplotype comprises:

inputting each item of basic data into a preset model to calculate the probability value that the fragment to be sequenced in the activation region corresponding to that item of basic data is the haplotype.

7. The method of claim 6, wherein the preset model comprises a pair hidden Markov model (pair-HMM).
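The preset model of claim 7 is read here as a pair hidden Markov model, which sums over all alignments of a read against a haplotype. The sketch below is a simplified textbook-style forward recursion, not the model used in the patent: the function name `pair_hmm_likelihood` and the constant parameters `base_err`, `gap_open` and `gap_ext` are assumptions, whereas production tools typically derive these from per-base quality scores and work in log or scaled space.

```python
def pair_hmm_likelihood(read, hap, base_err=0.01, gap_open=0.001, gap_ext=0.1):
    """Approximate P(read | haplotype) with a three-state pair-HMM forward pass."""
    if not read or not hap:
        raise ValueError("read and haplotype must be non-empty")
    n, m = len(read), len(hap)
    match_to_match = 1.0 - 2.0 * gap_open
    gap_to_match = 1.0 - gap_ext

    # M: read[i-1] aligned to hap[j-1]; I: read[i-1] inserted; D: hap[j-1] deleted
    M = [[0.0] * (m + 1) for _ in range(n + 1)]
    I = [[0.0] * (m + 1) for _ in range(n + 1)]
    D = [[0.0] * (m + 1) for _ in range(n + 1)]

    # The read may start at any haplotype position with equal prior weight.
    for j in range(m + 1):
        D[0][j] = 1.0 / m

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = read[i - 1] == hap[j - 1]
            emit = (1.0 - base_err) if same else base_err / 3.0
            M[i][j] = emit * (
                match_to_match * M[i - 1][j - 1]
                + gap_to_match * (I[i - 1][j - 1] + D[i - 1][j - 1])
            )
            # An inserted base is emitted uniformly over the four nucleotides.
            I[i][j] = 0.25 * (gap_open * M[i - 1][j] + gap_ext * I[i - 1][j])
            D[i][j] = gap_open * M[i][j - 1] + gap_ext * D[i][j - 1]

    # The read may end at any haplotype position.
    return sum(M[n][j] + I[n][j] for j in range(1, m + 1))
```

Each cell depends only on its left, upper, and upper-left neighbours, so cells on the same anti-diagonal are independent of one another, and separate read-haplotype pairs are fully independent, which is what makes this computation a natural target for parallel execution.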

8. The method of claim 1, wherein the calculating of variation information in the activation region according to the basic data and the probability values comprises:

performing a statistical analysis of the probability values that the fragments to be sequenced are the haplotype, to determine the variation information of each variation data point on the fragments to be sequenced in the activation region.

9. The method of claim 8, wherein the statistical analysis of the probability values that the fragments to be sequenced are the haplotype is performed using a Bayesian statistical method.
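The claims do not spell out the Bayesian method. One common formulation, shown here purely as an assumed illustration, combines per-read haplotype probabilities into posterior probabilities for diploid genotypes. The function name `genotype_posteriors`, the flat default prior, and the input layout (one dict per read mapping haplotype name to a strictly positive probability) are all hypothetical.

```python
import math
from itertools import combinations_with_replacement


def genotype_posteriors(read_likelihoods, priors=None):
    """Posterior over diploid genotypes from per-read haplotype likelihoods.

    read_likelihoods: list of dicts, haplotype name -> P(read | haplotype) > 0.
    priors: optional dict, genotype tuple -> prior probability.
    """
    haplotypes = sorted(read_likelihoods[0])
    genotypes = list(combinations_with_replacement(haplotypes, 2))
    if priors is None:
        # Flat prior over genotypes (an assumption of this sketch).
        priors = {g: 1.0 / len(genotypes) for g in genotypes}

    log_post = {}
    for h1, h2 in genotypes:
        # Each read is assumed to come from either chromosome copy with equal chance.
        log_lik = sum(
            math.log(0.5 * r[h1] + 0.5 * r[h2]) for r in read_likelihoods
        )
        log_post[(h1, h2)] = math.log(priors[(h1, h2)]) + log_lik

    # Normalise in log space to avoid underflow with many reads.
    top = max(log_post.values())
    total = sum(math.exp(v - top) for v in log_post.values())
    return {g: math.exp(v - top) / total for g, v in log_post.items()}
```

With three reads favouring a reference haplotype and three favouring an alternative one, the heterozygous genotype receives the highest posterior, which is the kind of per-site conclusion the variation information would record.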

10. The method of claim 1, wherein the calculating in a parallel manner, based on each item of basic data in the basic data group, of the probability value that each fragment to be sequenced in each activation region is the haplotype, and the adding of the basic data and the corresponding probability values to the probability output queue, are performed in parallel by a programmable logic gate array.

11. The method of claim 1, wherein the calculating of variation information in the activation region according to the basic data and the probability values is performed in a multi-threaded manner.

12. The method of claim 1, wherein the maximum number of items of basic data in the basic data group is determined according to a preset number.
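Capping the group size as in claim 12 amounts to draining at most a fixed number of entries from the queue per round. A minimal sketch follows; the helper name `take_group` and the choice not to wait for further items are assumptions, not details from the claims.

```python
import queue


def take_group(q, max_items):
    """Remove up to max_items entries from q without waiting for more to arrive."""
    group = []
    while len(group) < max_items:
        try:
            group.append(q.get_nowait())
        except queue.Empty:
            break  # fewer than max_items were available; return a partial group
    return group
```

Returning a partial group keeps the downstream hardware busy when input is sparse, at the cost of less work per dispatch; a blocking variant would trade latency for fuller groups.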

13. A mutation detection apparatus, comprising:

a data generation module, configured to add basic data to a probability calculation queue, wherein the basic data comprises a fragment to be sequenced in an activation region and a haplotype;

a probability calculation module, configured to obtain a basic data group from the probability calculation queue, calculate in a parallel manner, based on each item of basic data in the basic data group, the probability value that each fragment to be sequenced in each activation region is the haplotype, and add the basic data and the corresponding probability values to a probability output queue; and

a variation calculation module, configured to obtain the basic data and the corresponding probability values from the probability output queue, and calculate variation information in the activation region according to the basic data and the probability values.

14. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the mutation detection method of any one of claims 1 to 12.

15. An electronic device, comprising:

one or more processors; and

a memory storing one or more programs that, when executed by the one or more processors, cause the one or more processors to implement the mutation detection method of any one of claims 1 to 12.
