Activation area identification method and device, storage medium and electronic equipment

Document No.: 1536701 Publication date: 2020-02-14

Reading note: This technology, "Activation area identification method and device, storage medium and electronic equipment", was designed by 赵俊涛, 蔡怡然 and 沈一鸣 on 2019-10-17.

Summary: The present disclosure relates to the field of data processing technologies, and in particular to an activation region identification method and apparatus, a computer-readable storage medium, and an electronic device. The method includes: acquiring comparison data between the data to be identified and preset genetic data, and partitioning the comparison data according to a preset rule to obtain data blocks; traversing the data blocks in parallel with a preset window length to calculate, for each data point in the preset genetic data, the probability that it is an activation point; and smoothing all probability values within each window to obtain a probability curve for that window, and identifying the activation regions within each window from its probability curve. Because the comparison data is partitioned into data blocks that are then processed in parallel, activation regions are identified faster, which avoids the situation where slow activation region identification limits the speed of variant detection.

1. An activation region identification method, comprising:

acquiring comparison data of data to be identified and preset genetic data, and partitioning the comparison data according to a preset rule to acquire partitioned data blocks;

traversing each data block according to the length of a preset window in a parallel mode to calculate the probability value that a data point in the preset genetic data is an activation point;

and smoothing all the probability values in each window to obtain a probability curve corresponding to each window, and identifying the activation regions in each window according to the probability curves.

2. The method of claim 1, wherein the preset rule comprises a chromosome rule and a preset partitioning value;

partitioning the comparison data according to the preset rule to acquire partitioned data blocks comprises:

dividing the comparison data according to the chromosome where the preset genetic data is located to obtain chromosome data corresponding to each chromosome;

and partitioning each chromosome data according to a preset partitioning value to obtain at least one data block.

3. The method of claim 2, wherein the preset partitioning value comprises a preset block length or a preset number of blocks.
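For illustration only, the two-stage partitioning of claims 2 and 3 might be sketched as follows. The record layout (a chromosome name plus a reference position) and the names `partition_alignments`, `block_length` and `num_blocks` are assumptions made for this sketch; the claims do not specify a data format or an API.

```python
from collections import defaultdict
from math import ceil


def partition_alignments(records, block_length=None, num_blocks=None):
    """Split alignment records into blocks: first by chromosome, then by position.

    records: iterable of (chromosome, position) tuples (assumed, simplified layout).
    Exactly one of block_length / num_blocks must be given (the "preset
    partitioning value" is either a block length or a number of blocks).
    Returns a dict mapping (chromosome, block_index) -> sorted list of positions.
    """
    if (block_length is None) == (num_blocks is None):
        raise ValueError("give exactly one of block_length or num_blocks")

    # Stage 1: group the comparison data by chromosome.
    by_chrom = defaultdict(list)
    for chrom, pos in records:
        by_chrom[chrom].append(pos)

    # Stage 2: cut each chromosome's covered span into blocks.
    blocks = {}
    for chrom, positions in by_chrom.items():
        positions.sort()
        start = positions[0]
        span = positions[-1] - start + 1
        # A block count is converted into an equivalent block length.
        length = block_length if block_length is not None else ceil(span / num_blocks)
        for pos in positions:
            blocks.setdefault((chrom, (pos - start) // length), []).append(pos)
    return blocks


records = [("chr1", 0), ("chr1", 5), ("chr1", 12), ("chr2", 3)]
by_length = partition_alignments(records, block_length=10)
by_count = partition_alignments(records, num_blocks=2)
```

Because each resulting block only depends on its own records, the blocks can be handed to independent workers, which is what makes the later parallel traversal possible.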

4. The method of claim 1, wherein traversing each of the data blocks by a preset window length to calculate a probability value that a data point in the preset genetic data is an activation point comprises:

locating, by means of a preset tool, the first data point in the preset genetic data corresponding to each data block that is covered by the data to be identified;

and traversing each data block with the preset window length, starting from the first data point corresponding to that data block, to calculate the probability value corresponding to each data point.

5. The method of claim 4, wherein said calculating probability values corresponding to each of said data points comprises:

calculating, for each data point, the matching degree between the preset genetic data and each piece of data to be identified that covers that data point;

and calculating the average of the matching degrees corresponding to each data point, and configuring that average as the probability value that the data point is an activation point.
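The per-point calculation of claims 4 and 5 could look roughly like the sketch below. It assumes that each read is given as a start position plus a list of per-position matching degrees in [0, 1]; how the matching degree itself is scored is not stated in the claims, so it is treated here as an input. The function names are hypothetical.

```python
def activation_probabilities(reads, block_start, block_end):
    """Average the matching degrees of all reads covering each data point.

    reads: list of (start, scores); scores[i] is the assumed matching degree
           of the read at reference position start + i.
    Only positions in [block_start, block_end) are considered.
    Returns {position: mean matching degree}; uncovered positions are omitted.
    """
    totals, counts = {}, {}
    for start, scores in reads:
        for offset, score in enumerate(scores):
            pos = start + offset
            if block_start <= pos < block_end:
                totals[pos] = totals.get(pos, 0.0) + score
                counts[pos] = counts.get(pos, 0) + 1
    return {pos: totals[pos] / counts[pos] for pos in sorted(totals)}


def windows_from_first_covered(probs, window_length):
    """Tile half-open windows of window_length, starting at the first covered point."""
    if not probs:
        return []
    first, last = min(probs), max(probs)
    return [(s, min(s + window_length, last + 1))
            for s in range(first, last + 1, window_length)]


reads = [(10, [1.0, 0.5, 0.0]), (11, [0.5, 1.0])]
probs = activation_probabilities(reads, block_start=0, block_end=100)
# probs == {10: 1.0, 11: 0.5, 12: 0.5}
windows = windows_from_first_covered(probs, window_length=2)
# windows == [(10, 12), (12, 13)]
```

Starting at the first covered data point, rather than at the block boundary, avoids spending traversal time on stretches of the reference that no read touches.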

6. The method of claim 1, wherein smoothing all the probability values in each window to obtain a probability curve corresponding to each window comprises:

smoothing, in parallel, all the probability values in each window to obtain the probability curve corresponding to each window.

7. The method of claim 6, wherein the smoothing process comprises a Gaussian filtering process.
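A minimal sketch of the smoothing in claims 6 and 7 is given below, under stated assumptions: the kernel is truncated at three standard deviations, window edges are handled by renormalizing over the taps that fall inside the window, and `sigma` is a free parameter. None of these choices comes from the claims. A thread pool is used only to show that windows are independent of one another; it is not the parallel hardware the claims have in mind.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def gaussian_kernel(sigma, radius=None):
    """Return a normalized, symmetric Gaussian kernel of 2 * radius + 1 taps."""
    if radius is None:
        radius = max(1, int(3 * sigma))  # assumed truncation at about 3 sigma
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def gaussian_smooth(values, sigma=1.0):
    """Smooth one window of probability values; the output has the same length."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    smoothed = []
    for i in range(len(values)):
        acc = 0.0
        norm = 0.0
        for k, w in enumerate(kernel):
            j = i + k - radius
            if 0 <= j < len(values):  # ignore taps outside the window
                acc += w * values[j]
                norm += w
        smoothed.append(acc / norm)  # renormalize so edges are not biased low
    return smoothed


def smooth_windows_parallel(windows, sigma=1.0, max_workers=4):
    """Smooth each window independently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda w: gaussian_smooth(w, sigma), windows))


curves = smooth_windows_parallel([[0.5] * 4, [0, 0, 0, 1, 0, 0, 0]], sigma=1.0)
```

In CPython a thread pool gives little speed-up for CPU-bound work like this; the point of the sketch is only that no window needs data from any other window.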

8. The method of claim 1, wherein determining whether an activation region exists in each of the windows according to the probability curve comprises:

identifying, in the probability curve corresponding to each window, continuous regions whose probability values are larger than a preset threshold value, and configuring the continuous regions as the activation regions in that window, wherein each continuous region comprises the probability values corresponding to at least a preset number of consecutive data points.
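The region test of claim 8 amounts to a single scan for runs above the threshold. The sketch below is one possible reading; the function name, the half-open `(start, end)` output convention and the `offset` parameter are assumptions made for illustration.

```python
def find_activation_regions(curve, threshold, min_points, offset=0):
    """Find runs of consecutive points whose probability exceeds threshold.

    curve: smoothed probability values for one window.
    min_points: minimum number of consecutive data points a run must contain.
    offset: reference position of curve[0] (assumed convention).
    Returns a list of half-open (start, end) reference intervals.
    """
    regions = []
    run_start = None
    for i, p in enumerate(curve):
        if p > threshold:
            if run_start is None:
                run_start = i  # a new run begins here
        elif run_start is not None:
            if i - run_start >= min_points:
                regions.append((offset + run_start, offset + i))
            run_start = None
    # Close a run that extends to the end of the window.
    if run_start is not None and len(curve) - run_start >= min_points:
        regions.append((offset + run_start, offset + len(curve)))
    return regions


curve = [0.1, 0.6, 0.7, 0.8, 0.2, 0.9, 0.1, 0.6, 0.6]
regions = find_activation_regions(curve, threshold=0.5, min_points=2)
# regions == [(1, 4), (7, 9)]; the single point at index 5 is too short
```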

9. The method of claim 1, wherein traversing the data blocks in parallel according to the preset window length to calculate the probability value that a data point in the preset genetic data is an activation point is implemented by a field-programmable gate array (FPGA).

10. An activation region identification apparatus, comprising:

a data partitioning module, configured to acquire comparison data of data to be identified and preset genetic data, and to partition the comparison data according to a preset rule to acquire partitioned data blocks;

a probability calculation module, configured to traverse the data blocks in parallel according to a preset window length to calculate the probability value that a data point in the preset genetic data is an activation point;

and a region identification module, configured to smooth all the probability values in each window to obtain a probability curve corresponding to each window, and to determine whether an activation region exists in each window according to the probability curves.

11. The apparatus of claim 10, wherein the region identification module comprises:

a smoothing unit, configured to smooth, in parallel, all the probability values in each window to obtain the probability curve corresponding to each window.

12. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the activation region identification method according to any one of claims 1 to 9.

13. An electronic device, comprising:

one or more processors; and

a memory for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the activation region identification method as claimed in any one of claims 1 to 9.
