Visualization analysis method, system and storage medium based on next generation sequencing

Document No.: 1143096 Publication date: 2020-09-11

Reading note: this technology, "Visualization analysis method, system and storage medium based on next generation sequencing" (基于二代测序的可视化分析方法、系统及存储介质), was designed by 周在威, 王剑, 陈瑛, 陈爱玲 and 吴境雄 on 2020-05-27. Abstract: The invention discloses a visual analysis method, system and storage medium based on next-generation sequencing. The method comprises: acquiring gene next-generation sequencing data stored in a database and a preset sample type; generating sequencing task information according to preset pattern information; screening the gene next-generation sequencing data according to the sequencing task information; and analyzing the screened gene next-generation sequencing data according to the preset sample type by calling a bioinformatics software library and a script library, to obtain an analysis result. The method, system and storage medium spare medical personnel the need to be familiar with computer programming, are simple and convenient to operate, and meet the needs of medical personnel performing next-generation sequencing (NGS) data analysis.

1. A visual analysis method based on next-generation sequencing, characterized by comprising the following steps:

acquiring gene next-generation sequencing data stored in a database and a preset sample type;

generating sequencing task information according to preset pattern information;

screening the gene next-generation sequencing data according to the sequencing task information;

and analyzing the screened gene next-generation sequencing data according to the preset sample type by calling a bioinformatics software library and a script library, to obtain an analysis result.

2. The visual analysis method of claim 1, wherein acquiring the gene next-generation sequencing data stored in the database comprises:

judging whether the volume of the gene next-generation sequencing data is greater than 10 G;

if the data volume is greater than 10 G, acquiring the gene next-generation sequencing data;

if the data volume is not greater than 10 G, performing next-generation sequencing on the gene sample again.

3. The visual analysis method of claim 1, wherein the preset pattern information includes an allele frequency of less than one in a thousand, or biological information affecting protein function.

4. The visual analysis method of claim 1, wherein the preset sample type includes: an inheritance pattern type, comprising dominant inheritance or recessive inheritance, and a medical phenotype.

5. The visual analysis method of claim 1, further comprising displaying the analysis result.

6. The visual analysis method of claim 1, wherein the analysis result is a gene mutation table corresponding to the disease characteristics.

7. A visual analysis system based on next-generation sequencing, comprising:

an acquisition unit for acquiring gene next-generation sequencing data stored in a database and a preset sample type;

a test task generating module for generating sequencing task information according to preset pattern information;

a screening unit for screening the gene next-generation sequencing data according to the sequencing task information;

and an analysis unit for analyzing the screened gene next-generation sequencing data according to the preset sample type by calling a bioinformatics software library and a script library, to obtain an analysis result.

8. The visual analysis system of claim 7, wherein the acquisition unit is further configured to:

judge whether the volume of the gene next-generation sequencing data is greater than 10 G;

if the data volume is greater than 10 G, acquire the gene next-generation sequencing data;

if the data volume is not greater than 10 G, have next-generation sequencing performed on the gene sample again.

9. The visual analysis system of claim 7, further comprising:

a display unit for displaying the result of the genome re-sequencing analysis.

10. A storage medium storing computer-executable instructions for performing the method for visual analysis based on next-generation sequencing of any one of claims 1-6.

Technical Field

The invention relates to the field of gene next-generation sequencing, and in particular to a visual analysis method, system and storage medium based on next-generation sequencing.

Background

With the popularization and commercialization of next-generation sequencing (NGS) technologies, a great deal of human genetic information has been examined. However, the human genome is large: about 3 billion base pairs, or 3000 Mbp. Genome size is usually expressed as a number of nucleotide base pairs, in units of millions, written Mb or Mbp. The human genome consists of 23 pairs of chromosomes (46 in total), each containing hundreds of genes. Chromosomes 1 to 22 are numbered roughly in order of size from largest to smallest, and the 23rd pair consists of the sex chromosomes, which determine sex. The largest chromosome contains about 250 million base pairs and the smallest about 38 million base pairs.

Disclosure of Invention

In order to solve the above problems, embodiments of the present invention provide a visualization analysis method, system and storage medium based on second-generation sequencing.

In a first aspect, an embodiment of the present invention provides a visual analysis method based on next-generation sequencing, including: acquiring gene next-generation sequencing data stored in a database and a preset sample type; generating sequencing task information according to preset pattern information; screening the gene next-generation sequencing data according to the sequencing task information; and analyzing the screened gene next-generation sequencing data according to the preset sample type by calling a bioinformatics software library and a script library, to obtain an analysis result.

In one possible implementation, acquiring the gene next-generation sequencing data stored in the database includes: judging whether the volume of the gene next-generation sequencing data is greater than 10 G; if the data volume is greater than 10 G, acquiring the gene next-generation sequencing data; if the data volume is not greater than 10 G, performing next-generation sequencing on the gene sample again.

In one possible implementation, the preset pattern information includes an allele frequency of less than one in a thousand, or biological information affecting protein function.

In one possible implementation, the preset sample type includes: an inheritance pattern type, comprising dominant inheritance or recessive inheritance, and a medical phenotype.

In one possible implementation, the visual analysis method further includes displaying the analysis result.

In one possible implementation, the analysis result is a gene mutation table corresponding to the disease characteristics.

In a second aspect, an embodiment of the present invention further provides a visual analysis system based on next-generation sequencing, including: an acquisition unit for acquiring gene next-generation sequencing data stored in a database and a preset sample type; a test task generating module for generating sequencing task information according to preset pattern information; a screening unit for screening the gene next-generation sequencing data according to the sequencing task information; and an analysis unit for analyzing the screened gene next-generation sequencing data according to the preset sample type by calling a bioinformatics software library and a script library, to obtain an analysis result.

In one possible implementation, the acquisition unit is further configured to: judge whether the volume of the gene next-generation sequencing data is greater than 10 G; if the data volume is greater than 10 G, acquire the gene next-generation sequencing data; if the data volume is not greater than 10 G, have next-generation sequencing performed on the gene sample again.

In one possible implementation, the visual analysis system further includes a display unit for displaying the result of the genome re-sequencing analysis.

In a third aspect, embodiments of the present invention further provide a storage medium storing computer-executable instructions for performing the visual analysis method based on next-generation sequencing described above.

The visualized analysis method, the visualized analysis system and the storage medium based on the second-generation sequencing provided by the embodiment of the invention can effectively solve the problem that medical personnel are not familiar with computer programming codes. The method is simple to implement and convenient to operate, and meets the requirement of medical personnel on the analysis of Next Generation Sequencing (NGS) data.

Drawings

To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.

FIG. 1 is a flow chart of a method for visualization analysis based on next generation sequencing according to an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of a visualization analysis system based on next generation sequencing according to an embodiment of the present invention.

Detailed Description

The following detailed description of the present invention is provided in conjunction with the accompanying drawings, but it should be understood that the scope of the present invention is not limited to the specific embodiments.

In the description of the present invention, it is to be understood that the terms "center", "longitudinal", "lateral", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", "counterclockwise", and the like, indicate orientations and positional relationships based on those shown in the drawings, and are used only for convenience of description and simplicity of description, and do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be considered as limiting the present invention.

Throughout the specification and claims, unless explicitly stated otherwise, the word "comprise", or variations such as "comprises" or "comprising", will be understood to imply the inclusion of a stated element or component but not the exclusion of any other element or component.

Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.

As shown in FIG. 1, an embodiment of the present invention provides a visual analysis method based on next-generation sequencing, comprising the following steps:

Step 1: acquiring gene next-generation sequencing (NGS) data and a preset sample type;

Step 2: generating sequencing task information according to preset pattern information;

In one implementation, the preset pattern information includes an allele frequency of less than one in a thousand, or biological information affecting protein function.

Step 3: screening the gene next-generation sequencing data according to the sequencing task information;

Step 4: analyzing the screened gene next-generation sequencing data according to the preset sample type by calling a bioinformatics software library and a script library, to obtain an analysis result.
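The screening in steps 2 and 3 can be illustrated with a minimal sketch. The record layout (`Variant`), the task object (`TaskInfo`) and the function `screen` are hypothetical names introduced here for illustration, and the text does not state how the two criteria are combined; this sketch applies both by default and lets the caller switch either one off.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Variant:
    """Hypothetical variant record; field names are illustrative assumptions."""
    gene: str
    allele_frequency: float  # population allele frequency, 0.0 to 1.0
    affects_protein: bool    # True if annotated as altering protein function


@dataclass(frozen=True)
class TaskInfo:
    """Sequencing task information derived from the preset pattern information."""
    max_allele_frequency: Optional[float] = 0.001  # "less than one in a thousand"
    require_protein_effect: bool = True


def screen(variants: List[Variant], task: TaskInfo) -> List[Variant]:
    """Keep only the variants that satisfy every enabled criterion of the task."""
    kept = []
    for v in variants:
        # Drop common variants (frequency at or above the threshold).
        if (task.max_allele_frequency is not None
                and v.allele_frequency >= task.max_allele_frequency):
            continue
        # Drop variants with no annotated effect on protein function.
        if task.require_protein_effect and not v.affects_protein:
            continue
        kept.append(v)
    return kept
```

Keeping the criteria in a separate task object means the screening function itself does not change when the preset pattern information changes.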

Therefore, the visual analysis method based on next-generation sequencing provided by this embodiment spares medical personnel the need to be familiar with computer programming. It is simple to implement and convenient to operate, and it meets the needs of medical personnel performing next-generation sequencing (NGS) data analysis.

In one implementation, the preset sample type includes: an inheritance pattern type, comprising dominant inheritance or recessive inheritance, and a medical phenotype.
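As a rough illustration of how an inheritance pattern could drive the analysis, the sketch below checks a genotype against dominant or recessive inheritance. It is a deliberate simplification under stated assumptions: the genotype is represented as a count of alternate alleles (0, 1 or 2), and cases such as compound heterozygosity or sex-linked inheritance are ignored. The names `Inheritance` and `matches_inheritance` are hypothetical.

```python
from enum import Enum


class Inheritance(Enum):
    DOMINANT = "dominant"
    RECESSIVE = "recessive"


def matches_inheritance(alt_allele_count: int, mode: Inheritance) -> bool:
    """Return True if a genotype is compatible with the given inheritance mode.

    alt_allele_count is the number of alternate alleles at the site:
    0 (homozygous reference), 1 (heterozygous) or 2 (homozygous alternate).
    """
    if alt_allele_count not in (0, 1, 2):
        raise ValueError("alt_allele_count must be 0, 1 or 2")
    if mode is Inheritance.DOMINANT:
        # One altered copy is enough under a dominant model.
        return alt_allele_count >= 1
    # Under this simplified recessive model both copies must be altered.
    return alt_allele_count == 2
```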

In one implementation, step 1 may be preceded by: performing next-generation sequencing on the gene sample.

In one implementation, step 1 may include: judging whether the volume of the gene next-generation sequencing data is greater than 10 G; if the data volume is greater than 10 G, acquiring the gene next-generation sequencing data; if the data volume is not greater than 10 G, performing next-generation sequencing on the gene sample again.
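The data-volume check can be sketched as a small decision function. The unit of the threshold is an assumption: the text gives only "10 g", which this sketch reads as 10 × 10^9 units (bytes or bases, whichever the sequencing report counts). The names `THRESHOLD` and `next_action` are hypothetical.

```python
# Assumption: "10 G" is read as 10 * 10**9 units (bytes or bases).
THRESHOLD = 10 * 10**9


def next_action(data_volume: int, threshold: int = THRESHOLD) -> str:
    """Decide what to do with a sample based on its sequencing data volume.

    Returns "acquire" when the volume is strictly greater than the threshold,
    and "resequence" otherwise, including when it equals the threshold.
    """
    if data_volume < 0:
        raise ValueError("data_volume must be non-negative")
    return "acquire" if data_volume > threshold else "resequence"
```

The comparison is strict, so a sample with exactly the threshold volume is sent back for sequencing, matching the "not more than" branch in the text.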

In one implementation, step 4 may be further followed by:

Step 5: displaying the analysis result.

In one implementation, the analysis result is a gene mutation table corresponding to the disease characteristics.

After a hospital or third-party testing institution receives the next-generation sequencing data, medical and genetics professionals operate the system to extract the relevant data; once the relevant sample type, inheritance pattern and medical phenotype have been matched and processed, the relevant gene mutation table is obtained.
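One way to picture the gene mutation table is as a tab-separated listing of the variants that survive screening and matching. The sketch below is illustrative only: the column set (`COLUMNS`) and the function `build_mutation_table` are assumptions, since the text does not specify the table layout.

```python
import csv
import io
from typing import Dict, List

# Hypothetical column layout; the source does not define the table format.
COLUMNS = ["gene", "chrom", "pos", "ref", "alt", "zygosity"]


def build_mutation_table(variants: List[Dict]) -> str:
    """Render variants as a tab-separated table, sorted by gene and position."""
    rows = sorted(variants, key=lambda v: (v["gene"], v["chrom"], v["pos"]))
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=COLUMNS,
        delimiter="\t",
        lineterminator="\n",
        extrasaction="ignore",  # silently drop keys that are not table columns
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

A plain tab-separated table like this can be opened directly in a spreadsheet, which suits users who do not work with code.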

In this embodiment, the bioinformatics software library may include SEEDERSEQ, GATK and ANNOVAR.

The script library may include SNP detection scripts, medical analysis scripts, and scripts for merging non-conventional raw data.
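The idea of calling a software library and a script library by name can be sketched with two registries and a plan that lists the steps to run. Everything here is hypothetical: the step names, the placeholder step bodies (which only record that they ran, rather than invoking GATK, ANNOVAR or any real script) and the example plan are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

Step = Callable[[dict], dict]

# Registries standing in for the two libraries named in the text.
SOFTWARE_LIBRARY: Dict[str, Step] = {}
SCRIPT_LIBRARY: Dict[str, Step] = {}
LIBRARIES = {"software": SOFTWARE_LIBRARY, "script": SCRIPT_LIBRARY}


def register(library: Dict[str, Step], name: str):
    """Decorator that stores a step function in a library under a name."""
    def decorator(fn: Step) -> Step:
        library[name] = fn
        return fn
    return decorator


def _record(ctx: dict, name: str) -> dict:
    # Placeholder behaviour: note that the step ran; a real step would call a tool.
    return {**ctx, "history": ctx.get("history", []) + [name]}


@register(SOFTWARE_LIBRARY, "variant_calling")
def variant_calling(ctx: dict) -> dict:
    return _record(ctx, "variant_calling")


@register(SOFTWARE_LIBRARY, "annotation")
def annotation(ctx: dict) -> dict:
    return _record(ctx, "annotation")


@register(SCRIPT_LIBRARY, "snp_detection")
def snp_detection(ctx: dict) -> dict:
    return _record(ctx, "snp_detection")


@register(SCRIPT_LIBRARY, "medical_analysis")
def medical_analysis(ctx: dict) -> dict:
    return _record(ctx, "medical_analysis")


# Hypothetical plan: an ordered list of (library, step name) pairs.
EXAMPLE_PLAN: List[Tuple[str, str]] = [
    ("software", "variant_calling"),
    ("software", "annotation"),
    ("script", "snp_detection"),
    ("script", "medical_analysis"),
]


def run_plan(plan: List[Tuple[str, str]], ctx: dict) -> dict:
    """Run each step of the plan in order, passing the context along."""
    for library_name, step_name in plan:
        ctx = LIBRARIES[library_name][step_name](ctx)
    return ctx
```

Because the steps are looked up by name, a different sample type could be served by a different plan without the user touching any code.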

It is understood that the visual analysis in this embodiment is based on sequencing the genomes of individuals against the known human genome sequence, and performs medical relevance analysis on affected individuals or populations. By sequence comparison, a large number of single nucleotide polymorphism (SNP) sites can be found in a sequenced individual, and a comprehensive analysis is then carried out according to the patient's medical phenotype and the inheritance pattern of the genetic disease concerned.
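Finding SNP sites by sequence comparison can be shown with a toy example that compares two already-aligned sequences of equal length. This is not how production pipelines work (they align many short reads and call variants statistically); the function `find_snps` is a hypothetical helper that only illustrates the principle.

```python
from typing import List, Tuple

BASES = set("ACGT")


def find_snps(reference: str, sample: str, start: int = 1) -> List[Tuple[int, str, str]]:
    """List single-base differences between two aligned sequences.

    Returns (position, reference_base, sample_base) tuples, with positions
    counted from `start`. Positions where either sequence holds something
    other than A, C, G or T (for example N or a gap) are skipped.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    snps = []
    for offset, (ref_base, alt_base) in enumerate(zip(reference.upper(), sample.upper())):
        if ref_base != alt_base and ref_base in BASES and alt_base in BASES:
            snps.append((start + offset, ref_base, alt_base))
    return snps
```

For example, comparing `"ACGTACGT"` with `"ACGAACGT"` reports a single site at position 4, where the reference base T is replaced by A.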

In the prior art, sequencing data analysis requires several pieces of bioinformatics software; the user must be reasonably familiar with how each is used, and the hand-off between analysis modules requires manual intervention, so the analysis is cumbersome and inefficient.

By calling the bioinformatics software library and the personalized analysis script library, this embodiment simplifies the analysis process, improves the efficiency of genome sequencing analysis, and saves research costs.

This embodiment also provides a visual analysis system based on next-generation sequencing which, as shown in FIG. 2, includes an acquisition unit 1, a test task generating module 2, a screening unit 3 and an analysis unit 4.

The acquisition unit 1 is used for acquiring gene next-generation sequencing data and a preset sample type, wherein the gene next-generation sequencing data are stored in a database and are the data obtained by next-generation sequencing of a gene sample.

The test task generating module 2 is used for generating sequencing task information according to preset pattern information.

The screening unit 3 is used for screening the gene next-generation sequencing data according to the sequencing task information.

The analysis unit 4 is used for analyzing the screened gene next-generation sequencing data according to the preset sample type by calling the bioinformatics software library and the script library, to obtain an analysis result.

The acquisition unit 1 is further configured to: judge whether the volume of the gene next-generation sequencing data is greater than 10 G; if the data volume is greater than 10 G, acquire the gene next-generation sequencing data; if the data volume is not greater than 10 G, have next-generation sequencing performed on the gene sample again.

In one implementation, the visual analysis system further includes a display unit 5 for displaying the genome re-sequencing analysis result.

Therefore, the visual analysis system based on next-generation sequencing provided by this embodiment spares medical personnel the need to be familiar with computer programming. It is simple to implement and convenient to operate, and it meets the needs of medical personnel performing next-generation sequencing (NGS) data analysis.
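The relationship between the units can be sketched as a small class that wires them together in the order described. This is an architectural illustration under assumptions, not the patented implementation: each unit is modelled as a plain callable, and the class name `VisualAnalysisSystem` and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class VisualAnalysisSystem:
    """Wires the units together; each unit is supplied as a callable."""
    acquire: Callable[[], List[Any]]                  # acquisition unit 1
    generate_task: Callable[[dict], dict]             # test task generating module 2
    screen: Callable[[List[Any], dict], List[Any]]    # screening unit 3
    analyze: Callable[[List[Any], str], Any]          # analysis unit 4
    display: Optional[Callable[[Any], None]] = None   # optional display unit 5

    def run(self, pattern_info: dict, sample_type: str) -> Any:
        data = self.acquire()
        task = self.generate_task(pattern_info)
        screened = self.screen(data, task)
        result = self.analyze(screened, sample_type)
        if self.display is not None:
            self.display(result)
        return result
```

A usage example with stand-in callables:

```python
shown = []
system = VisualAnalysisSystem(
    acquire=lambda: [0.0002, 0.05, 0.0009],
    generate_task=lambda info: {"max_af": info["max_af"]},
    screen=lambda data, task: [x for x in data if x < task["max_af"]],
    analyze=lambda data, sample_type: {"sample_type": sample_type, "kept": len(data)},
    display=shown.append,
)
result = system.run({"max_af": 0.001}, "recessive")
```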

Embodiments of the present invention further provide a storage medium storing computer-executable instructions, which include a program for executing the above visual analysis method based on next-generation sequencing; the computer-executable instructions can execute the method in any of the above method embodiments.

The storage medium may be any available medium or data storage device that can be accessed by a computer, including but not limited to magnetic memory (e.g., floppy disks, hard disks, magnetic tape, magneto-optical disks (MOs), etc.), optical memory (e.g., CDs, DVDs, BDs, HVDs, etc.), and semiconductor memory (e.g., ROMs, EPROMs, EEPROMs, nonvolatile memory (NAND FLASH), Solid State Disks (SSDs)), etc.

The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
