Sequencing sequence processing method and device, storage medium and electronic equipment

Document No.: 1536707  Publication date: 2020-02-14

Summary: This technology, "Sequencing sequence processing method and device, storage medium and electronic equipment" (测序序列处理方法及装置、存储介质、电子设备), was designed by Zhang Kai (张凯), He Xinjun (何新军) and Shen Yiming (沈一鸣) on 2019-10-17.

Abstract: The embodiments of the invention relate to a sequencing sequence processing method and device, a storage medium and electronic equipment, in the technical field of biological information processing. The method comprises: acquiring a plurality of completely matched substrings obtained when a sequence to be sequenced is mapped to a reference sequencing sequence; grouping the completely matched substrings according to the Euclidean distance between each completely matched substring and the reference sequencing sequence to obtain a plurality of substring groups; sorting the substring groups according to a preset sorting rule, and processing each completely matched substring in each sorted substring group to obtain a processing result; and, when it is confirmed that the processing result meets a preset condition, taking the processing result as the alignment result of the sequence to be sequenced. The embodiments improve the efficiency of processing the completely matched substrings in the sorted substring groups, and thereby the efficiency of aligning the sequence to be sequenced.

1. A sequencing sequence processing method, comprising:

acquiring a plurality of completely matched substrings obtained when a sequence to be sequenced is mapped to a reference sequencing sequence;

grouping the completely matched substrings according to the Euclidean distance between each completely matched substring and the reference sequencing sequence to obtain a plurality of substring groups;

sorting the substring groups according to a preset sorting rule, and processing each completely matched substring in each sorted substring group to obtain a processing result; and

when it is confirmed that the processing result meets a preset condition, taking the processing result as the alignment result of the sequence to be sequenced.
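
The grouping step of claim 1 can be sketched as follows. The claim does not define how the Euclidean distance is computed, so this sketch makes two assumptions: each completely matched substring is a seed with a read offset, a reference offset and a length, and seeds whose reference offsets lie within a hypothetical `max_gap` of the previous seed fall into the same group. `Seed`, `group_seeds` and `max_gap` are illustrative names that do not appear in the patent.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Seed:
    """One completely matched substring (hypothetical representation)."""
    read_start: int  # offset of the match in the sequence to be sequenced
    ref_start: int   # offset of the match in the reference sequencing sequence
    length: int      # number of matching bases


def group_seeds(seeds: List[Seed], max_gap: int) -> List[List[Seed]]:
    """Group seeds whose reference offsets are close to each other.

    Assumption: in one dimension the Euclidean distance between two
    positions is the absolute difference of their coordinates, and a seed
    joins the current group when it is within max_gap of the previous seed.
    """
    groups: List[List[Seed]] = []
    for seed in sorted(seeds, key=lambda s: s.ref_start):
        if groups and abs(seed.ref_start - groups[-1][-1].ref_start) <= max_gap:
            groups[-1].append(seed)
        else:
            groups.append([seed])
    return groups
```

With `max_gap=100`, seeds at reference offsets 100 and 126 end up in one group and a seed at offset 5000 in another.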

2. The method of claim 1, wherein sorting the substring groups according to the preset sorting rule comprises:

calculating the total number of completely matched substrings in each substring group; and

sorting the substring groups according to the total number of completely matched substrings in each substring group.
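
A minimal sketch of the count-based ordering in claim 2. The claim does not state the sort direction, so placing larger groups first is an assumption here. The tuple layout of `Seed` and the function name `sort_groups_by_size` are also hypothetical.

```python
from typing import List, Sequence, Tuple

# (read_start, ref_start, length): a hypothetical seed representation
Seed = Tuple[int, int, int]


def sort_groups_by_size(groups: Sequence[List[Seed]]) -> List[List[Seed]]:
    """Return the groups ordered by how many seeds each contains.

    Assumption: larger groups come first; the claim only says the groups
    are sorted by their totals, not in which direction.
    """
    return sorted(groups, key=len, reverse=True)
```

Because `sorted` is stable, groups with the same number of seeds keep their original relative order.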

3. The method of claim 2, wherein after grouping the completely matched substrings to obtain the plurality of substring groups, the method further comprises:

sorting the completely matched substrings in each substring group according to their current lengths; and

when the total number of completely matched substrings in a substring group is determined to be greater than a preset value, deleting, according to the sorting result, the completely matched substrings ranked after the preset value.
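
The truncation in claim 3 might look like the sketch below. It assumes that longer substrings rank first, so the shortest ones are the ones deleted. The claim does not state the direction, and `trim_group` and `max_seeds` are hypothetical names.

```python
from typing import List, Tuple

# (read_start, ref_start, length): a hypothetical seed representation
Seed = Tuple[int, int, int]


def trim_group(group: List[Seed], max_seeds: int) -> List[Seed]:
    """Sort a group by seed length and keep at most max_seeds entries.

    Assumption: longer seeds rank first, so the shortest ones are dropped.
    """
    ranked = sorted(group, key=lambda seed: seed[2], reverse=True)
    if len(ranked) > max_seeds:
        ranked = ranked[:max_seeds]
    return ranked
```

Capping the group size bounds the work done later for each group, which is consistent with the efficiency goal stated in the abstract.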

4. The method of claim 1, wherein before acquiring the plurality of completely matched substrings obtained when the sequence to be sequenced is mapped to the reference sequencing sequence, the method further comprises:

mapping a plurality of sequences to be sequenced to the reference sequencing sequence in parallel to obtain the plurality of completely matched substrings.
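
One way to picture the parallel mapping of claim 4 is the sketch below. The claim does not say how the completely matched substrings are found, so a simple fixed-length k-mer lookup stands in for that step as an assumption. All function names and the `k` and `workers` parameters are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# (read_start, ref_start, length): a hypothetical seed representation
Seed = Tuple[int, int, int]


def build_kmer_index(reference: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer of the reference to the offsets where it occurs."""
    index: Dict[str, List[int]] = {}
    for pos in range(len(reference) - k + 1):
        index.setdefault(reference[pos:pos + k], []).append(pos)
    return index


def find_seeds(read: str, index: Dict[str, List[int]], k: int) -> List[Seed]:
    """Return fixed-length exact matches between one read and the reference."""
    seeds: List[Seed] = []
    for start in range(len(read) - k + 1):
        for ref_pos in index.get(read[start:start + k], []):
            seeds.append((start, ref_pos, k))
    return seeds


def map_reads_parallel(reads: List[str], reference: str, k: int,
                       workers: int = 4) -> List[List[Seed]]:
    """Map several reads concurrently; results keep the order of reads."""
    index = build_kmer_index(reference, k)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda read: find_seeds(read, index, k), reads))
```

A thread pool keeps the example short. In CPython it illustrates the structure rather than real CPU parallelism, so a process pool or dedicated hardware would be needed for an actual speed-up.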

5. The method of claim 1, wherein processing each completely matched substring in each sorted substring group to obtain the processing result comprises:

performing fuzzy matching scoring and backtracking on each completely matched substring in each sorted substring group to obtain the processing result.
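
Claim 5 does not name the scoring algorithm. As an illustration only, the sketch below assumes that "fuzzy matching scoring and backtracking" resembles Smith-Waterman local alignment with traceback. The function name `local_align` and the match, mismatch and gap scores are hypothetical choices.

```python
from typing import List, Tuple


def local_align(read: str, ref: str, match: int = 2, mismatch: int = -1,
                gap: int = -2) -> Tuple[int, str, str]:
    """Score two sequences with Smith-Waterman and trace back the best path.

    Returns (best score, aligned read, aligned reference); '-' marks a gap.
    """
    rows, cols = len(read) + 1, len(ref) + 1
    score: List[List[int]] = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    def substitution(i: int, j: int) -> int:
        return match if read[i - 1] == ref[j - 1] else mismatch

    # Scoring pass: each cell holds the best local score ending there.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(0,
                              score[i - 1][j - 1] + substitution(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Backtracking pass: walk from the best cell until the score drops to 0.
    i, j = best_pos
    out_read: List[str] = []
    out_ref: List[str] = []
    while i > 0 and j > 0 and score[i][j] > 0:
        if score[i][j] == score[i - 1][j - 1] + substitution(i, j):
            out_read.append(read[i - 1])
            out_ref.append(ref[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_read.append(read[i - 1])
            out_ref.append("-")
            i -= 1
        else:
            out_read.append("-")
            out_ref.append(ref[j - 1])
            j -= 1
    return best, "".join(reversed(out_read)), "".join(reversed(out_ref))
```

In a seed-and-extend setting, such a routine would typically run only on the reference window around each substring group rather than on the whole reference.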

6. The method of claim 1, wherein after acquiring the plurality of completely matched substrings obtained when the sequence to be sequenced is mapped to the reference sequencing sequence, the method further comprises:

determining whether the length of each completely matched substring is less than a preset length; and

deleting any completely matched substring whose length is determined to be less than the preset length.
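
The length filter of claim 6 reduces to a single pass over the substrings. In this sketch the tuple layout of `Seed` and the names `drop_short_seeds` and `min_length` are hypothetical.

```python
from typing import List, Tuple

# (read_start, ref_start, length): a hypothetical seed representation
Seed = Tuple[int, int, int]


def drop_short_seeds(seeds: List[Seed], min_length: int) -> List[Seed]:
    """Keep only seeds whose length is at least min_length."""
    return [seed for seed in seeds if seed[2] >= min_length]
```

A seed whose length equals the preset length is kept, because the claim deletes only those that are strictly shorter.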

7. The sequencing sequence processing method according to claim 1, further comprising:

grouping the alignment results by chromosome type to obtain a plurality of result groups; and

converting the alignment results in each result group into a preset format to obtain a plurality of converted files in the preset format.
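
The per-chromosome grouping and conversion in claim 7 could be sketched as below. The preset format is not specified in the claims, so a simple tab-separated layout is assumed purely for illustration. It is not the SAM format, and the `Alignment` fields and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Alignment(NamedTuple):
    """Hypothetical alignment record; field names are illustrative."""
    read_name: str
    chromosome: str
    position: int  # 1-based leftmost reference position
    cigar: str


def group_by_chromosome(alignments: List[Alignment]) -> Dict[str, List[Alignment]]:
    """Collect alignment results into one result group per chromosome."""
    groups: Dict[str, List[Alignment]] = defaultdict(list)
    for aln in alignments:
        groups[aln.chromosome].append(aln)
    return dict(groups)


def to_tab_separated(group: List[Alignment]) -> str:
    """Render one result group as tab-separated text, sorted by position."""
    lines = [f"{a.read_name}\t{a.chromosome}\t{a.position}\t{a.cigar}"
             for a in sorted(group, key=lambda a: a.position)]
    return "\n".join(lines) + "\n"


def convert_all(alignments: List[Alignment]) -> Dict[str, str]:
    """Return the converted text for each chromosome, keyed by its name."""
    return {chrom: to_tab_separated(group)
            for chrom, group in group_by_chromosome(alignments).items()}
```

Each value in the returned dictionary could then be written to its own file, giving one converted file per result group.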

8. The sequencing sequence processing method according to claim 1, wherein the steps of sorting the substring groups according to the preset sorting rule, processing each completely matched substring in each sorted substring group to obtain the processing result, and, when it is confirmed that the processing result meets the preset condition, taking the processing result as the alignment result of the sequence to be sequenced, are performed in parallel by a field programmable gate array.

9. A sequencing sequence processing apparatus, comprising:

a completely matched substring acquisition module, configured to acquire a plurality of completely matched substrings obtained when a sequence to be sequenced is mapped to a reference sequencing sequence;

a completely matched substring grouping module, configured to group the completely matched substrings according to the Euclidean distance between each completely matched substring and the reference sequencing sequence to obtain a plurality of substring groups;

a first sorting module, configured to sort the substring groups according to a preset sorting rule and to process each completely matched substring in each sorted substring group to obtain a processing result; and

a processing result determination module, configured to take the processing result as the alignment result of the sequence to be sequenced when it is confirmed that the processing result meets a preset condition.

10. A sequencing sequence processing system, comprising:

a general-purpose processor, configured to acquire a plurality of completely matched substrings obtained when a sequence to be sequenced is mapped to a reference sequencing sequence, and to group the completely matched substrings according to the Euclidean distance between each completely matched substring and the reference sequencing sequence to obtain a plurality of substring groups; and

a field programmable gate array, communicatively connected to the general-purpose processor and configured to sort the substring groups according to a preset sorting rule, to process each completely matched substring in each sorted substring group to obtain a processing result, and, when it is confirmed that the processing result meets a preset condition, to take the processing result as the alignment result of the sequence to be sequenced.

11. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the sequencing sequence processing method of any one of claims 1 to 8.

12. An electronic device, comprising:

a processor; and

a memory for storing executable instructions of the processor;

wherein the processor is configured to perform the sequencing sequence processing method of any of claims 1-8 via execution of the executable instructions.
