Biological sequence database storage method and system

Document No.: 1783986 Publication date: 2019-12-06

Reading note: This technology, "Biological sequence database storage method and system" (一种生物学序列的数据库保存方法和系统), was designed by 蓝田, 李钟文, and 岑文杰 on 2019-08-06. The invention discloses a database storage method and system for biological sequences. The method comprises the following steps: step S1, constructing, according to the information about a biological sequence contained in a general storage format for biological sequences, a plurality of data tables, each storing part of the information of the biological sequence, the data tables being associated with one another through specific logic; and step S2, acquiring biological sequence information, parsing the acquired biological sequence information, and storing each part of the biological sequence information in the corresponding data table according to the parsing result. The invention reduces the information complexity of biological sequences and facilitates their data use and network transmission.

1. A database storage method for biological sequences, comprising the following steps:

Step S1: constructing, according to the information about a biological sequence contained in a general storage format for biological sequences, a plurality of data tables, each storing part of the information of the biological sequence, the data tables being associated with one another through specific logic; and

Step S2: acquiring biological sequence information, parsing the acquired biological sequence information, and storing each part of the biological sequence information in the corresponding data table according to the parsing result.

2. The database storage method for biological sequences of claim 1, further comprising, after step S2:

Step S3: designing a data interface for each data table, so that the biological sequence information in each data table can be operated on through the data interface.

3. The database storage method for biological sequences of claim 2, wherein the data tables constructed in step S1 include, but are not limited to: an annotation table for storing the name of the biological sequence information and the corresponding description information; a type table for storing the type of the biological sequence information; a component table for storing the hash code of the biological sequence and the sequence itself; a node table for storing the evolutionary tree nodes of the biological sequence; and an annotation_types table for establishing the relationship between the annotation table and the type table.

4. The database storage method for biological sequences of claim 3, wherein in step S2 the biological sequence information is parsed; its name and corresponding description information are obtained and stored in the annotation table, generating an annotation_ID; its type is obtained and stored in the type table, generating a type_ID; its hash code and the sequence itself are obtained and stored in the component table, generating a component_ID; its evolutionary tree nodes are obtained and stored in the node table, generating a node_ID; the annotation_types table is established from the annotation_ID and the type_ID; and finally the data tables are associated with one another through the IDs of the tables.

5. The database storage method for biological sequences of claim 4, wherein in step S3 the data interface is used to add, delete, modify, and query the biological sequence information.

6. A database storage system for biological sequences, comprising:

a data table construction unit, configured to construct, according to the information about a biological sequence contained in a general storage format for biological sequences, a plurality of data tables, each storing part of the information of the biological sequence, the data tables being associated with one another through specific logic; and

a parsing storage unit, configured to acquire biological sequence information, parse the acquired biological sequence information, and store each part of the biological sequence information in the corresponding data table according to the parsing result.

7. The database storage system for biological sequences of claim 6, further comprising:

a data interface design unit, configured to design a data interface for each data table, so that the biological sequence information in each data table can be operated on through the data interface.

8. The database storage system for biological sequences of claim 7, wherein the data tables constructed by the data table construction unit include, but are not limited to: an annotation table for storing the name of the biological sequence information and the corresponding description information; a type table for storing the type of the biological sequence information; a component table for storing the hash code of the biological sequence and the sequence itself; a node table for storing the evolutionary tree nodes of the biological sequence; and an annotation_types table for establishing the relationship between the annotation table and the type table.

9. The database storage system for biological sequences of claim 8, wherein the parsing storage unit parses the biological sequence information; obtains its name and corresponding description information and stores them in the annotation table, generating an annotation_ID; obtains its type and stores it in the type table, generating a type_ID; obtains its hash code and the sequence itself and stores them in the component table, generating a component_ID; obtains its evolutionary tree nodes and stores them in the node table, generating a node_ID; establishes the annotation_types table from the annotation_ID and the type_ID; and finally associates the data tables with one another through the IDs of the tables.

10. The database storage system for biological sequences of claim 9, wherein the data interface design unit uses the data interface to add, delete, modify, and query the biological sequence information.

Technical Field

The invention relates to the technical field of data processing, and in particular to a database storage method and system for biological sequences.

Background

Most existing biological sequences are stored in file formats such as GenBank (a DNA sequence database) and FASTA (a text-based representation of nucleic acid or polypeptide sequences). Biological software packages generally have their own data storage formats, which can be converted into one another to some extent by using GenBank as a common format. However, users must then know how to perform format conversion with GenBank, which is time-consuming and laborious. In addition, although NCBI (National Center for Biotechnology Information) offers an approach that splits GenBank records and stores them in a database, the hierarchy and complexity of that approach make it inconvenient for use in network applications.

Disclosure of Invention

To overcome the defects of the prior art, the invention aims to provide a database storage method and system for biological sequences that reduce the information complexity of biological sequences and facilitate their data use and network transmission.

To achieve the above object, the present invention provides a database storage method for biological sequences, comprising the following steps:

Step S1: constructing, according to the information about a biological sequence contained in a general storage format for biological sequences, a plurality of data tables, each storing part of the information of the biological sequence, the data tables being associated with one another through specific logic; and

Step S2: acquiring biological sequence information, parsing the acquired biological sequence information, and storing each part of the biological sequence information in the corresponding data table according to the parsing result.

Preferably, after step S2, the method further includes the following steps:

Step S3: designing a data interface for each data table, so that the biological sequence information in each data table can be operated on through the data interface.

Preferably, the data tables constructed in step S1 include, but are not limited to: an annotation table for storing the name of the biological sequence information and the corresponding description information; a type table for storing the type of the biological sequence information; a component table for storing the hash code of the biological sequence and the sequence itself; a node table for storing the evolutionary tree nodes of the biological sequence; and an annotation_types table for establishing the relationship between the annotation table and the type table.

Preferably, in step S2 the biological sequence information is parsed; its name and corresponding description information are obtained and stored in the annotation table, generating an annotation_ID; its type is obtained and stored in the type table, generating a type_ID; its hash code and the sequence itself are obtained and stored in the component table, generating a component_ID; its evolutionary tree nodes are obtained and stored in the node table, generating a node_ID; the annotation_types table is established from the annotation_ID and the type_ID; and finally the data tables are associated with one another through the IDs of the tables.

Preferably, in step S3 the data interface is used to add, delete, modify, and query the biological sequence information.

To achieve the above object, the present invention further provides a database storage system for biological sequences, comprising:

a data table construction unit, configured to construct, according to the information about a biological sequence contained in a general storage format for biological sequences, a plurality of data tables, each storing part of the information of the biological sequence, the data tables being associated with one another through specific logic; and

a parsing storage unit, configured to acquire biological sequence information, parse the acquired biological sequence information, and store each part of the biological sequence information in the corresponding data table according to the parsing result.

Preferably, the system further comprises:

a data interface design unit, configured to design a data interface for each data table, so that the biological sequence information in each data table can be operated on through the data interface.

Preferably, the data tables constructed by the data table construction unit include, but are not limited to: an annotation table for storing the name of the biological sequence information and the corresponding description information; a type table for storing the type of the biological sequence information; a component table for storing the hash code of the biological sequence and the sequence itself; a node table for storing the evolutionary tree nodes of the biological sequence; and an annotation_types table for establishing the relationship between the annotation table and the type table.

Preferably, the parsing storage unit parses the biological sequence information; obtains its name and corresponding description information and stores them in the annotation table, generating an annotation_ID; obtains its type and stores it in the type table, generating a type_ID; obtains its hash code and the sequence itself and stores them in the component table, generating a component_ID; obtains its evolutionary tree nodes and stores them in the node table, generating a node_ID; establishes the annotation_types table from the annotation_ID and the type_ID; and finally associates the data tables with one another through the IDs of the tables.

Preferably, the data interface design unit uses the data interface to add, delete, modify, and query the biological sequence information.

Compared with the prior art, the database storage method and system for biological sequences of the present invention split the biological sequence information into several data tables according to the information about the biological sequence in the general storage format for biological sequences, and store the data through specific logical relationships. In addition, by designing data interfaces for adding, deleting, modifying, and querying each data table, the biological sequence information can be managed and used over a network conveniently and quickly by calling these interfaces, from either the back end or the front end of a platform. This lowers the threshold for information processing, reduces the information complexity of biological sequences, increases the information processing speed, and facilitates the data use and network transmission of biological sequences.

Drawings

FIG. 1 is a flow chart illustrating the steps of a method for database storage of biological sequences according to the present invention;

FIG. 2 is a diagram of the system architecture of a database storage system for biological sequences according to the present invention;

FIG. 3 is a diagram illustrating the structure of each data table according to an embodiment of the present invention.

Detailed Description

The embodiments of the present invention are described below through specific examples in conjunction with the accompanying drawings; other advantages and effects of the invention will be readily apparent to those skilled in the art from this disclosure. The invention can also be implemented or applied through other, different embodiments, and the details herein can be modified in various respects without departing from the spirit and scope of the invention.

FIG. 1 is a flow chart of the steps of a method for database storage of biological sequences according to the present invention. As shown in FIG. 1, the method for storing a database of biological sequences of the present invention comprises the following steps:

Step S1: constructing, according to the information about a biological sequence contained in a general storage format for biological sequences, several data tables, each storing part of the information of the biological sequence, the data tables being associated with one another through specific logic.

In the embodiment of the invention, five data tables are constructed, each storing part of the information of a biological sequence: an annotation table, a type table, a component table, a node table, and an annotation_types table. The annotation table stores the name of the biological sequence information and the corresponding description information; the type table stores the type of the biological sequence information; the component table stores the hash code of the biological sequence and the sequence itself; the node table stores the evolutionary tree nodes of the biological sequence; and the annotation_types table is a relationship table that links the annotation table and the type table.
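As a minimal sketch of how these five tables might be declared, the following Python snippet uses the standard-library sqlite3 module. The text names the tables but does not give the column layout of FIG. 3, so every column name here (including the `component_id` link on `annotation` and the `parent_id` column on `node`) is an assumption made for illustration, as is the choice of SQLite.

```python
import sqlite3

# Hypothetical column layout: only the five table names come from the text.
SCHEMA = """
CREATE TABLE component (
    component_id INTEGER PRIMARY KEY,
    hash_code    TEXT NOT NULL UNIQUE,   -- hash of the sequence
    sequence     TEXT NOT NULL           -- the sequence itself
);
CREATE TABLE annotation (
    annotation_id INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    description   TEXT,
    component_id  INTEGER REFERENCES component(component_id)  -- assumed link
);
CREATE TABLE "type" (
    type_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE node (
    node_id      INTEGER PRIMARY KEY,
    parent_id    INTEGER REFERENCES node(node_id),  -- assumed tree encoding
    component_id INTEGER REFERENCES component(component_id)
);
CREATE TABLE annotation_types (
    annotation_id INTEGER NOT NULL REFERENCES annotation(annotation_id),
    type_id       INTEGER NOT NULL REFERENCES "type"(type_id),
    PRIMARY KEY (annotation_id, type_id)
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the five tables in an open SQLite connection."""
    conn.executescript(SCHEMA)


def table_names(conn: sqlite3.Connection) -> list:
    """Return the names of all user tables, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
create_schema(conn)
print(table_names(conn))
```

The composite primary key on `annotation_types` lets one annotation carry several types and one type apply to many annotations, which is one plausible reading of "a relationship table".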

Step S2: acquiring biological sequence information, parsing the acquired biological sequence information, and storing each part of the biological sequence information in the corresponding data table according to the parsing result. Specifically, when a piece of biological sequence information is obtained, it is first parsed. Its name and corresponding description information are obtained and stored in the annotation table, generating an annotation_ID; its type is obtained and stored in the type table, generating a type_ID; its hash code and the sequence itself are obtained and stored in the component table, generating a component_ID; its evolutionary tree nodes are obtained and stored in the node table, generating a node_ID; the annotation_types table is established from the annotation_ID and the type_ID; and finally the data tables are associated with one another through the IDs of the tables.
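The parse-and-store flow of step S2 can be sketched as follows. This is an illustration under stated assumptions rather than the patented implementation: the input is taken to be a single FASTA record, the type is passed in by the caller, the hash code is a SHA-256 digest, and the column names and the helper names `parse_fasta_record` and `store_record` are hypothetical.

```python
import hashlib
import sqlite3

# Hypothetical column layout; only the table names come from the text.
SCHEMA = """
CREATE TABLE component (component_id INTEGER PRIMARY KEY,
                        hash_code TEXT NOT NULL UNIQUE, sequence TEXT NOT NULL);
CREATE TABLE annotation (annotation_id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                         description TEXT, component_id INTEGER);
CREATE TABLE "type" (type_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE node (node_id INTEGER PRIMARY KEY, parent_id INTEGER,
                   component_id INTEGER);
CREATE TABLE annotation_types (annotation_id INTEGER NOT NULL,
                               type_id INTEGER NOT NULL,
                               PRIMARY KEY (annotation_id, type_id));
"""


def parse_fasta_record(text):
    """Split one FASTA record into (name, description, sequence)."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA header line starting with '>'")
    name, _, description = lines[0][1:].partition(" ")
    return name, description, "".join(lines[1:]).upper()


def store_record(conn, text, type_name, parent_node_id=None):
    """Parse a record, store each part in its table, and return the new IDs."""
    name, description, sequence = parse_fasta_record(text)
    hash_code = hashlib.sha256(sequence.encode("utf-8")).hexdigest()
    with conn:  # one transaction for the whole record
        # Identical sequences share one component row, found by hash code.
        conn.execute(
            "INSERT OR IGNORE INTO component (hash_code, sequence) VALUES (?, ?)",
            (hash_code, sequence),
        )
        component_id = conn.execute(
            "SELECT component_id FROM component WHERE hash_code = ?", (hash_code,)
        ).fetchone()[0]
        annotation_id = conn.execute(
            "INSERT INTO annotation (name, description, component_id) VALUES (?, ?, ?)",
            (name, description, component_id),
        ).lastrowid
        conn.execute('INSERT OR IGNORE INTO "type" (name) VALUES (?)', (type_name,))
        type_id = conn.execute(
            'SELECT type_id FROM "type" WHERE name = ?', (type_name,)
        ).fetchone()[0]
        node_id = conn.execute(
            "INSERT INTO node (parent_id, component_id) VALUES (?, ?)",
            (parent_node_id, component_id),
        ).lastrowid
        # The relationship row is built from annotation_ID and type_ID.
        conn.execute(
            "INSERT INTO annotation_types (annotation_id, type_id) VALUES (?, ?)",
            (annotation_id, type_id),
        )
    return {"annotation_id": annotation_id, "type_id": type_id,
            "component_id": component_id, "node_id": node_id}


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
first = store_record(conn, ">seq1 example promoter\nACGT\nacgt\n", "promoter")
second = store_record(conn, ">seq1_copy same bases\nACGTACGT\n", "promoter",
                      parent_node_id=first["node_id"])
print(first, second)
```

Storing the hash code next to the sequence makes it cheap to detect that two records carry the same bases, so the second call above reuses the existing component row while still creating its own annotation and node.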

Preferably, the method for storing a database of biological sequences of the present invention further comprises:

Step S3: designing a data interface for each data table, so that the biological sequence information can be added, deleted, modified, and queried through the data interface. In the present invention, by calling these interfaces, the biological sequence information can be managed and used over a network conveniently and quickly, from either the back end or the front end of a platform.
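One way to picture such a data interface is a small class per table that exposes add, query, modify, and delete operations. The sketch below covers only the annotation table; the class name `AnnotationInterface`, its method names, and the column names are assumptions for illustration, and the other tables would presumably get analogous interfaces. How the interface is exposed over a network is not described in the text and is left out here.

```python
import sqlite3


class AnnotationInterface:
    """Hypothetical add/query/modify/delete interface for the annotation table."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS annotation ("
            "annotation_id INTEGER PRIMARY KEY, "
            "name TEXT NOT NULL, description TEXT)"
        )

    def add(self, name, description=""):
        """Insert a row and return its generated annotation_id."""
        with self.conn:
            cur = self.conn.execute(
                "INSERT INTO annotation (name, description) VALUES (?, ?)",
                (name, description),
            )
        return cur.lastrowid

    def get(self, annotation_id):
        """Return (annotation_id, name, description), or None if absent."""
        return self.conn.execute(
            "SELECT annotation_id, name, description FROM annotation "
            "WHERE annotation_id = ?",
            (annotation_id,),
        ).fetchone()

    def update(self, annotation_id, name=None, description=None):
        """Change the given fields; fields left as None keep their value."""
        with self.conn:
            cur = self.conn.execute(
                "UPDATE annotation SET name = COALESCE(?, name), "
                "description = COALESCE(?, description) WHERE annotation_id = ?",
                (name, description, annotation_id),
            )
        return cur.rowcount == 1

    def delete(self, annotation_id):
        """Delete the row; return True if a row was removed."""
        with self.conn:
            cur = self.conn.execute(
                "DELETE FROM annotation WHERE annotation_id = ?", (annotation_id,)
            )
        return cur.rowcount == 1


api = AnnotationInterface(sqlite3.connect(":memory:"))
aid = api.add("seq1", "example promoter")
api.update(aid, description="revised description")
print(api.get(aid))
api.delete(aid)
```

Because callers only see these four operations, the same interface could be called from server-side code or wrapped in web endpoints for a front end without either side needing to know the table layout.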

FIG. 2 is a system architecture diagram of a database storage system for biological sequences according to the present invention. As shown in FIG. 2, the database storage system for biological sequences of the present invention comprises:

A data table construction unit 201, configured to construct, according to the information about a biological sequence contained in a general storage format for biological sequences, a plurality of data tables, each storing part of the information of the biological sequence, the data tables being associated with one another through specific logic.

In an embodiment of the present invention, the data table construction unit 201 constructs five data tables, each storing part of the information of a biological sequence: an annotation table, a type table, a component table, a node table, and an annotation_types table. The annotation table stores the name of the biological sequence information and the corresponding description information; the type table stores the type of the biological sequence information; the component table stores the hash code of the biological sequence and the sequence itself; the node table stores the evolutionary tree nodes of the biological sequence; and the annotation_types table is a relationship table that links the annotation table and the type table.

A parsing storage unit 202, configured to acquire biological sequence information, parse the acquired biological sequence information, and store each part of the biological sequence information in the corresponding data table according to the parsing result. Specifically, when a piece of biological sequence information is obtained, the parsing storage unit 202 first parses it. It obtains the name and corresponding description information and stores them in the annotation table, generating an annotation_ID; obtains the type and stores it in the type table, generating a type_ID; obtains the hash code and the sequence itself and stores them in the component table, generating a component_ID; obtains the evolutionary tree nodes and stores them in the node table, generating a node_ID; establishes the annotation_types table from the annotation_ID and the type_ID; and finally associates the data tables with one another through the IDs of the tables.

Preferably, the database storage system for biological sequences of the present invention further comprises:

A data interface design unit, configured to design a data interface for each data table, so that the biological sequence information can be added, deleted, modified, and queried through the data interface. In the present invention, by calling these interfaces, the biological sequence information can be managed and used over a network conveniently and quickly, from either the back end or the front end of a platform.
