Interactive database system for whole process of virtual screening of drugs

Document No.: 1467606. Published: 2020-02-21.

Reading note: This technology, "Interactive database system for whole process of virtual screening of drugs" (面向药物虚拟筛选全过程交互式数据库系统), was designed by Liu Hao (刘昊), Wang Hanxing (王寒星), Wei Zhiqiang (魏志强) and Li Yangyang (李阳阳) on 2019-10-10. Main content: The invention discloses an interactive database system for the whole process of virtual drug screening, comprising a database bottom layer, data sets, a Web background framework and an interactive foreground. The data sets comprise a data set that can be used directly in molecular docking experiments, a molecular docking result data set and a molecular docking calculation auxiliary data set. The database system interactively manages the whole molecular docking calculation process: the foreground integrates data selection for docking calculation, calculation task submission, docking calculation, viewing of calculation nodes, allocation of calculation cores, deletion of calculation tasks, viewing of task processes, error checking of calculation tasks, resubmission of failed tasks, screening of result files, and three-dimensional visualization of docking data and result data. Using the interactive database system for virtual drug screening addresses the time cost of molecular docking experiments on large data volumes, and the optimal docked molecules are selected using multiple scoring functions, giving high screening accuracy.

1. An interactive database system for the whole process of virtual drug screening, characterized in that: the system comprises a database bottom layer, data sets, a Web background framework and an interactive foreground, wherein the database bottom layer adopts a relational database; the relational database is a MySQL database, which is used to construct the entities of the whole database system, the relationships among the entities and the specific attributes of the stored entities, to store the dok files generated in the molecular docking process, and to support file retrieval and downloading; the data sets comprise a data set that can be used directly in molecular docking experiments, a molecular docking result data set and a molecular docking calculation auxiliary data set; the data set that can be used directly in molecular docking experiments includes the attributes of, and relationships among, the following entities: receptors, ligands, proteins, drugs, docking sites, pocket information and protein-ligand complexes.

2. The interactive database system for the whole process of virtual drug screening of claim 1, wherein: the interactive foreground is used for interactive management of the whole molecular docking calculation process and comprises a homepage, a ligand information module, a receptor information module, a result data set module, an online data sharing module, an online molecular docking calculation module and a database introduction module; the online data sharing module is used for uploading data from molecular docking experiments, and the uploaded data are stored in the database after being reviewed by administrators; the online molecular docking calculation module is used for performing molecular docking experiments online.

3. The interactive database system for the whole process of virtual drug screening of claim 2, wherein: the online molecular docking calculation module comprises a data preprocessing tool, an online calculation task submission tool and a resource management tool; the data preprocessing tool comprises a file format conversion tool for converting between files in different formats; the online calculation task submission tool is used for storing the processed data in a data set and assigning specific nodes to perform online molecular docking calculation; the resource management tool is used for automatically allocating docking computing resources.

4. A method for establishing the interactive database system for the whole process of virtual drug screening, characterized by comprising the following steps:

a. constructing a database bottom layer: the relational database MySQL is adopted at the bottom layer of the database and is responsible for constructing entities, the relations among the entities and the specific attributes of the stored entities in the whole database system;

b. building a Web background framework based on an SSM framework: the SSM framework realizes decoupling by layered design and is divided into five layers including DAOImpl, DAO, serviceImpl, Service and Action layers;

c. building a user-facing interactive foreground: the foreground comprises a homepage, a ligand information module, a receptor information module, a result data set module, an online data sharing module, an online molecular docking calculation module and a database introduction module;

d. acquiring data for the whole process of molecular docking: the data stored in the database system comprise data from the internationally open, free ZINC and PDB libraries, molecular docking experiment result data and molecular docking calculation auxiliary data;

e. entering data: the data are formatted in batches and stored; when data are entered into the relational database, indexes are added to specific fields, and the database is optimized by splitting fields and splitting tables.

5. The database system establishment method according to claim 4, wherein: in step a, entity tables for ligands, receptors, proteins, drugs, molecular docking results, docking sites, molecular docking pockets and protein-ligand complexes, together with their relationship tables, are created in the relational database to hold the specific attributes of each entity and the relationships among entities.

6. The database system establishment method according to claim 4, wherein: in step c, researchers perform molecular docking experiments online through the online molecular docking calculation module; the online data sharing module is used for uploading data from molecular docking experiments: researchers upload their own experimental data in a specified file format or fill in an Excel document in a specified format, and, after experts have verified that the uploaded data are correct, the administrators of the database system store the data in the database and publish them according to the requirements of the data sharer.

7. A virtual drug screening method carried out using the interactive database system for the whole process of virtual drug screening of claim 3, comprising the following steps:

step1. establishment of receptor model

First, protein information is extracted from the data set to establish a receptor model, which comprises the macromolecular structure and binding site information; the protein target structure is then preprocessed;

step2. Generation of Small molecule datasets

Before molecular docking, the two-dimensional ligand molecule data extracted from the data set are converted into three-dimensional structures by a structure conversion program; hydrogen atoms and charges are then added to the generated three-dimensional structures to complete the three-dimensional small-molecule data set used for docking;

step3. on-line molecular docking and scoring

step4. post-treatment of hit Compounds

The selected high-scoring result information is integrated and stored in the database.

8. The virtual drug screening method of claim 7, wherein step1 comprises the following steps:

(1) protein pdb format file pre-processing

Checking whether the protein has a ligand, and when a plurality of molecules exist, distinguishing whether the molecules are the ligands or not;

(2) checking whether the protein has mutations

Mutations in proteins are classified into pathological and non-pathological mutations, which can be looked up in the PDB;

(3) checking whether any amino acid residues are missing or defective;

(4) removing water molecules in the crystal, unless literature reports show that they must be kept, and removing ligand heteroatoms;

(5) retaining the protein and the cofactors required for the protein to function;

(6) defining the binding pocket sites.

9. The virtual drug screening method according to claim 7, wherein: in the molecular docking calculation of step3, the data preprocessing tool is used to convert between files in different formats; the processed data are submitted to a data set using the online calculation task submission tool, and specific nodes are assigned to perform the molecular docking calculation.

10. The virtual drug screening method according to claim 7, wherein: in step3, results whose docking scores satisfy both a Gibbs free energy score of less than -8 and a ratio of Gibbs free energy to number of heavy atoms of less than -0.3 are selected from the fine-screening result set, yielding the small molecules that have the highest activity and are most likely to become drugs for a specific receptor.

Technical Field

The invention belongs to the technical field of computer-aided drug design, and particularly relates to an interactive database system for a whole process of virtual drug screening.

Background

In the field of computer-aided drug design, virtual screening has become a practical tool: before biological activity screening is carried out, molecular docking software is used to simulate the interaction between a target and candidate drugs and to calculate the affinity between them, so as to reduce the number of compounds that actually need to be screened and improve the efficiency of lead compound discovery. During virtual-screening molecular docking experiments, database websites provide the basic data for the docking process.

At home and abroad, there are very few large database websites oriented to the whole process of virtual screening. Existing sites mainly provide the source files and related parameter information needed for virtual drug screening, and fall into three types according to the data sets they provide: receptor databases, ligand databases and receptor-ligand complex databases.

The receptor databases mainly provide basic data on receptor proteins. The most popular protein database at present is the RCSB Protein Data Bank, which collects information on thousands of pdb entries and related ligand small molecules, provides experimental data for each pdb entry together with visualization of those data, and provides target-related data for each pdb molecule; it thus supplies basic data and auxiliary tools for molecular docking. The ligand small-molecule databases are mainly ZINC12 and ZINC15, two versions of the same database; ZINC provides basic data on ligand small molecules and related functions such as downloading, visualization and retrieval. The receptor-ligand complex databases collect information on complexes, mainly on the basis of the Protein Data Bank; they mainly comprise BindingDB and PDBbind, both of which contain information on the complexes produced in actual molecular docking and on the related pdb entries and ligands.

Although existing data set websites can basically provide data support for virtual drug screening, they have a number of shortcomings.

1. The content is relatively narrow, the molecular libraries are clearly limited, and the databases are weakly systematic: each can provide data on only one aspect, cannot support the whole virtual drug screening process, and cannot give researchers a comprehensive data set that can be used directly in molecular docking experiments. Researchers have to reprocess the basic data provided by these websites before applying them to molecular docking. For example, the receptor database Protein Data Bank lacks information on ligand small molecules and on the complexes resulting from molecular docking; the ligand database ZINC15 lacks support for receptor information; and the basic data on active compounds and their pockets in the receptor-ligand complex databases are not comprehensive. The content of existing databases leans toward basic data and toward sharing of the original molecules, and lacks data on the screening process and its results; and because the databases lack interactivity, researchers can only extract data at the database level, which increases their workload.

2. Many different docking programs are currently used in the field of virtual drug screening, and different molecular docking programs often require input files in different formats; existing data set websites, however, cannot provide files in the format required by each docking program. Moreover, in actual research, to improve docking accuracy, researchers often use several docking programs at the same time, which places new requirements on the formats of the basic receptor and ligand data.

3. Database websites in the field of virtual drug screening do not currently integrate management of the whole virtual screening process; they merely provide data query and download and lack interactive experiment management. Researchers must query and download data from the website and run experiments locally, which adds to their workload.

At present, researchers performing virtual drug screening experiments urgently need a data set website system that has a complete data set, can be applied directly to molecular docking experiments, and supports online experiments and experiment management. Existing databases do not meet these requirements well. The very large data volume, the difficulty of storing molecular structures, and online experiment management are the main difficulties in realizing such a database.

Disclosure of Invention

To address the shortcomings of the prior art, the invention provides an interactive database system for the whole process of virtual drug screening, and a virtual drug screening method. The database system integrates all the data of the virtual screening process, the complete molecular docking process, and dynamic display of the crystal state before and after docking. By using the parallel computing capability of a supercomputer, it addresses the time cost of molecular docking experiments on large data volumes, and it selects the optimal docked molecules using multiple scoring functions.

To achieve this purpose, the invention adopts the following technical scheme: the interactive database system for the whole process of virtual drug screening comprises a database bottom layer, data sets, a Web background framework and an interactive foreground, wherein the database bottom layer adopts a relational database; the relational database is a MySQL database, which is used to construct the entities of the whole database system, the relationships among the entities and the specific attributes of the stored entities, to store the dok files generated in the molecular docking process, and to support file retrieval and downloading; the data sets comprise a data set that can be used directly in molecular docking experiments, a molecular docking result data set and a molecular docking calculation auxiliary data set.

Further, the data set that can be used directly in molecular docking experiments includes the attributes of, and relationships among, the following entities: receptors, ligands, proteins, drugs, docking sites, pocket information and protein-ligand complexes.

Further, the interactive foreground is used for interactive management of the whole molecular docking calculation process and comprises a homepage, a ligand information module, a receptor information module, a result data set module, an online data sharing module, an online molecular docking calculation module and a database introduction module; the online data sharing module is used for uploading data from molecular docking experiments, and the uploaded data are stored in the database after being reviewed by administrators; the online molecular docking calculation module is used for performing molecular docking experiments online. The interactive foreground integrates data selection for docking calculation, calculation task submission, docking calculation, viewing of calculation nodes, allocation of calculation cores, deletion of calculation tasks, viewing of tasks, error checking of calculation tasks, resubmission of failed tasks, screening of result files, and three-dimensional visualization of docking data and result data.

Furthermore, the online molecular docking calculation module comprises a data preprocessing tool, an online calculation task submission tool and a resource management tool. The data preprocessing tool comprises a file format conversion tool for converting between files in different formats. The online calculation task submission tool is used for storing the processed data in a data set and assigning specific nodes to perform online molecular docking calculation. The resource management tool is used for automatically allocating docking computing resources; in addition to automatic node allocation, a user can specify the node resources on which to run a calculation, and the tool can also be used to view the occupancy of computing resources and to carry out interactive management operations such as submitting, viewing and deleting task processes.
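As an illustration of automatic allocation, the following minimal Python sketch shows two simple policies: splitting a list of ligands evenly across nodes, and a greedy load-balancing assignment by estimated cost. The function names, the cost estimates and the greedy heuristic are assumptions made for illustration; the source does not specify the allocation algorithm, and user-specified nodes are not modelled here.

```python
import heapq
from typing import Dict, List, Sequence


def split_evenly(items: Sequence[str], n_nodes: int) -> List[List[str]]:
    """Split items into n_nodes contiguous chunks whose sizes differ by at most one."""
    if n_nodes <= 0:
        raise ValueError("n_nodes must be positive")
    base, extra = divmod(len(items), n_nodes)
    chunks, start = [], 0
    for i in range(n_nodes):
        size = base + (1 if i < extra else 0)  # the first `extra` nodes get one more item
        chunks.append(list(items[start:start + size]))
        start += size
    return chunks


def balance_by_cost(costs: Dict[str, float], n_nodes: int) -> List[List[str]]:
    """Greedy balancing: give the next most expensive task to the least loaded node."""
    if n_nodes <= 0:
        raise ValueError("n_nodes must be positive")
    heap = [(0.0, node) for node in range(n_nodes)]  # (current load, node index)
    heapq.heapify(heap)
    assignment: List[List[str]] = [[] for _ in range(n_nodes)]
    for task, cost in sorted(costs.items(), key=lambda kv: kv[1], reverse=True):
        load, node = heapq.heappop(heap)
        assignment[node].append(task)
        heapq.heappush(heap, (load + cost, node))
    return assignment
```

Even splitting is adequate when every ligand costs about the same to dock; the cost-based variant is only useful if a per-task cost estimate (a hypothetical input here) is available.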

Furthermore, the foreground is also provided with a visualization module for dynamically displaying the crystal state before and after docking.

Further, the online molecular docking calculation process comprises data selection for docking calculation, calculation task submission, docking calculation, viewing of calculation nodes, allocation of calculation cores, deletion of calculation tasks, viewing of task processes, error checking of calculation tasks, resubmission of failed tasks and screening of result files. The supercomputing nodes are connected by means of a VPN (virtual private network) and Shell scripts, and access is provided through an API (application programming interface). Management of the docking calculation process can be started from the interactive foreground, and docking jobs can be created, deleted, modified and queried.
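The task management operations described above (submit, view, delete, resubmit failed tasks) can be illustrated with a small in-memory model. This is a minimal sketch only: the class `DockingJobManager`, the state names and the `runner` callback are hypothetical, and the real system's VPN, Shell-script and API connection to the supercomputing nodes is not modelled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DockingJob:
    job_id: int
    ligand_file: str
    receptor_file: str
    status: str = "PENDING"  # PENDING, then DONE or FAILED after a run
    attempts: int = 0


class DockingJobManager:
    """Keeps track of docking jobs; `runner` stands in for the remote computation."""

    def __init__(self, runner: Callable[[DockingJob], bool]):
        self._runner = runner
        self._jobs: Dict[int, DockingJob] = {}
        self._next_id = 1

    def submit(self, ligand_file: str, receptor_file: str) -> int:
        job = DockingJob(self._next_id, ligand_file, receptor_file)
        self._jobs[job.job_id] = job
        self._next_id += 1
        return job.job_id

    def run(self, job_id: int) -> None:
        job = self._jobs[job_id]
        job.attempts += 1
        job.status = "DONE" if self._runner(job) else "FAILED"

    def resubmit_failed(self) -> List[int]:
        """Re-run every job currently marked FAILED and return their ids."""
        retried = []
        for job in list(self._jobs.values()):
            if job.status == "FAILED":
                self.run(job.job_id)
                retried.append(job.job_id)
        return retried

    def delete(self, job_id: int) -> None:
        self._jobs.pop(job_id, None)

    def status(self, job_id: int) -> str:
        return self._jobs[job_id].status

    def list_jobs(self) -> List[int]:
        return sorted(self._jobs)
```

In a deployment, `runner` would be replaced by whatever call dispatches the job to a compute node and reports success or failure.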

The invention also provides a method for establishing the interactive database system for the whole process of the virtual screening of the medicines, which comprises the following steps:

Further, in step c, researchers perform molecular docking experiments online through the online molecular docking calculation module; the online data sharing module is used for uploading data from molecular docking experiments: researchers upload their own experimental data in a specified file format or fill in an Excel document in a specified format, and, after experts have verified that the uploaded data are correct, the administrators of the database system store the data in the database and publish them according to the requirements of the data sharer.

Further, in step a, entity tables for ligands, receptors, proteins, drugs, molecular docking results, docking sites, molecular docking pockets and protein-ligand complexes, together with their relationship tables, are created in the relational database to hold the specific attributes of each entity and the relationships among entities.

The invention also provides a virtual drug screening method which is carried out by utilizing the interactive database system facing the whole process of virtual drug screening and comprises the following steps:

step1. establishment of receptor model

the pretreatment process is as follows:

(1) protein pdb format file pre-processing

Checking whether the protein has a ligand, and when a plurality of molecules exist, distinguishing whether the molecules are the ligands or not;

(2) checking whether the protein has mutations

Mutations in proteins are classified into pathological and non-pathological mutations, which can be looked up in the PDB;

(3) checking whether any amino acid residues are missing or defective;

(4) removing water molecules in the crystal, unless literature reports show that they must be kept, and removing ligand heteroatoms;

(5) retaining the protein and the cofactors required for the protein to function;

(6) defining the binding pocket sites.

step2. Generation of Small molecule datasets

step3. on-line molecular docking and scoring

step4. post-treatment of hit Compounds

The selected high-scoring result information is integrated and stored in the database.
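Preprocessing steps (4) and (5) of step1, removing crystal waters and ligand heteroatoms while retaining required cofactors, can be sketched as a filter over PDB-format lines. This is a minimal illustration, not the system's actual tool: the function name `clean_receptor` and the `keep_residues` parameter are hypothetical, and only ATOM and HETATM records are considered. It relies on the fixed-column PDB layout, in which the residue name occupies columns 18 to 20.

```python
from typing import FrozenSet, Iterable, List


def clean_receptor(
    pdb_lines: Iterable[str],
    keep_residues: FrozenSet[str] = frozenset(),
    keep_waters: bool = False,
) -> List[str]:
    """Drop waters and other HETATM records unless explicitly retained.

    keep_residues: residue names of cofactors that the protein needs (assumed
    to be supplied by the user after consulting the literature).
    keep_waters: set to True when reports show that waters must be kept.
    """
    kept = []
    for line in pdb_lines:
        if line[:6].strip() == "HETATM":
            resname = line[17:20].strip()  # columns 18-20 of the PDB record
            if resname == "HOH":
                if keep_waters:
                    kept.append(line)
                continue
            if resname not in keep_residues:
                continue  # ligand or other heteroatom: removed
        kept.append(line)
    return kept
```

For example, calling `clean_receptor(lines, keep_residues=frozenset({"HEM"}))` would keep a heme cofactor while discarding waters and any bound ligand; the residue name used here is only an example.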

Further, in step3, the online molecular docking calculation process comprises data selection for docking calculation, calculation task submission, docking calculation, viewing of calculation nodes, allocation of calculation cores, deletion of calculation tasks, viewing of task processes, error checking of calculation tasks, resubmission of failed tasks and screening of result files; the database system is connected to the supercomputing nodes by means of a VPN, Shell scripts and an API.

Further, in the molecular docking calculation of step3, the data preprocessing tool is used to convert between files in different formats; the processed data are submitted to a data set using the online calculation task submission tool, and specific nodes are assigned to perform the molecular docking calculation.

Further, in step3, results with a Gibbs free energy score of less than -8 and a ratio of Gibbs free energy to number of heavy atoms of less than -0.3 are selected from the fine-screening result set, yielding the small molecules that have the highest activity and are most likely to become drugs for a specific receptor.
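The two thresholds stated above translate directly into a filter. The sketch below applies them to a list of docking results; the `DockingResult` fields and function names are hypothetical, and the score is taken in whatever units the docking program reports, since the source does not state them.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class DockingResult:
    ligand_id: str
    free_energy: float  # docking score (estimated Gibbs free energy)
    heavy_atoms: int    # number of non-hydrogen atoms in the ligand


def passes_fine_screen(
    r: DockingResult,
    energy_cutoff: float = -8.0,
    ratio_cutoff: float = -0.3,
) -> bool:
    """True if the score is below -8 and the score per heavy atom is below -0.3."""
    if r.heavy_atoms <= 0:
        return False  # the ratio is undefined without heavy atoms
    return (
        r.free_energy < energy_cutoff
        and r.free_energy / r.heavy_atoms < ratio_cutoff
    )


def fine_screen(results: Iterable[DockingResult]) -> List[DockingResult]:
    """Keep the results that pass both cutoffs, best (most negative) score first."""
    return sorted(
        (r for r in results if passes_fine_screen(r)),
        key=lambda r: r.free_energy,
    )
```

The per-heavy-atom ratio penalizes large molecules that reach a low score only through their size, which is why a ligand with a score of -8.5 and 40 heavy atoms (ratio about -0.21) is rejected while one with -9.0 and 25 heavy atoms (ratio -0.36) passes.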

Compared with the prior art, the invention has the advantages that:

(1) The scope and content of the database are expanded: in addition to the ligand and receptor files for docking experiments, the database also contains auxiliary information such as docking sites and pockets. Using the parallel computing power of a supercomputer, it achieves complete, large-volume storage of ligands, receptors, docking sites, pocket information and docking results, together with integration of public database websites. The database system integrates all the data of the virtual screening process, the complete molecular docking process and dynamic display of the crystal state before and after docking; in the receptor data in particular, it provides a large number of calibrated and preprocessed protein conformation files. The invention addresses the time cost of molecular docking experiments on large data volumes and selects the optimal docked molecules using multiple scoring functions, improving both computational efficiency and the accuracy of small-molecule screening.

(2) The database system of the invention provides a complete data set that can be used directly in molecular docking experiments without format conversion; retrieval is fast, online molecular docking calculation is fast, and drug screening time is shortened.

(3) The database system supports querying, visual display and sharing of the molecular data sets used in virtual drug screening, such as ligands and receptors, as well as querying, display and sharing of docking and screening results. MySQL is optimized by table splitting to improve retrieval speed, and molecules are stored directly in file formats such as mol2, pdb and dok, so that users can download the relevant data sets quickly and conveniently.

(4) The database system of the invention can automatically allocate computing resources when performing molecular docking calculations. Computing cores are allocated mainly on the supercomputer, according to three resource allocation principles: even distribution of resources, load balancing, and user-specified cores. Traditional molecular docking is oriented toward single-core operation and can hardly complete docking at large data volumes.

(5) The database is provided with an online molecular docking calculation module through which researchers can perform molecular docking experiments online; in addition to automatic resource allocation, calculation on user-specified resources is supported. Management of the docking calculation process can be started directly from the website, the occupancy of computing resources can be viewed, and interactive management operations such as submitting, viewing and deleting task processes can be carried out, which simplifies manual experimental operation and greatly improves efficiency.

(6) The database system is provided with an online data sharing module, allows other scientific researchers to upload open data including receptor and ligand data, molecular docking experiment result data and the like at any time, and has certain error correction and inspection functions.

(7) The database system interactively manages the whole docking calculation process: the interactive foreground website integrates data selection for docking calculation, calculation task submission, docking calculation, viewing of calculation nodes, allocation of calculation cores, deletion of calculation tasks, viewing of task processes, error checking of calculation tasks, resubmission of failed tasks, screening of result files, and three-dimensional visualization of docking data and result data.
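Advantage (3) mentions optimizing MySQL by table splitting. One common way to do this is to route each record to one of several sub-tables by hashing its key; the sketch below assumes that approach. The table prefix `ligand`, the column `zinc_id`, the shard count of 16 and the CRC32 hash are all hypothetical choices, not details from the source.

```python
import zlib
from typing import Tuple


def shard_table(base: str, key: str, n_shards: int = 16) -> str:
    """Return the sub-table name for a key, e.g. base 'ligand' -> 'ligand_07'."""
    if n_shards <= 0:
        raise ValueError("n_shards must be positive")
    # CRC32 is stable across processes, unlike Python's built-in hash() for str.
    index = zlib.crc32(key.encode("utf-8")) % n_shards
    return f"{base}_{index:02d}"


def build_lookup_sql(base: str, key: str, n_shards: int = 16) -> Tuple[str, Tuple[str]]:
    """Build a parameterized lookup against the sub-table that holds the key.

    The table name is interpolated, so `base` must come from trusted code,
    never from user input; the key itself is passed as a bound parameter.
    """
    table = shard_table(base, key, n_shards)
    return f"SELECT * FROM {table} WHERE zinc_id = %s", (key,)
```

Because the same key always maps to the same sub-table, a lookup touches one small table instead of scanning a single table with tens of millions of rows.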

Drawings

FIG. 1 is a functional block diagram of an interactive database system of the present invention.

FIG. 2 is a flow chart of the virtual drug screening process of the present invention.

Detailed Description

The invention is further described with reference to the following figures and specific embodiments.

The interactive database system for the whole process of virtual drug screening is a database system covering the whole molecular docking process. It is built on public data sets, combined with the actual molecular docking calculation requirements of virtual drug screening, and relies on a supercomputer and domestic many-core processors to meet the computing requirements of drug screening on the order of hundreds of millions of molecules and the storage requirements of hundreds of millions of files.

The supercomputer center has tens of thousands of computing nodes and strong computing power, which greatly improves the efficiency of molecular docking calculations and ensures that virtual drug screening runs efficiently and smoothly. In comparison experiments carried out by researchers, a molecular docking task that takes several days on a single i7 processor can be completed within a few hours when distributed across 10000 computing nodes of the supercomputing center cluster. In new drug research and development, traditional methods often take several years or even more than a decade to produce a drug. Applying virtual drug screening to the process, and using the computing power of a supercomputer to simulate the affinity between drug targets and pharmacologically active small molecules, can greatly improve efficiency, save cost and shorten the development cycle.

The amount of calculation required in virtual drug screening is very large, and it is difficult to complete the calculation tasks quickly using only the ten thousand Intel nodes of the supercomputer; porting the molecular docking calculation to domestic many-core processors will therefore accelerate the overall progress of virtual drug screening. In addition, porting molecular docking software written for Intel processors (such as Vina and LeDock) to the domestic Shenwei many-core processor benefits the independent development of China's pharmaceutical industry.

The composition and construction of the database system of the present invention are described below.

Database system

The database system comprises a database bottom layer, data sets, a Web background framework and an interactive foreground. The database bottom layer adopts the relational database MySQL, which is used to construct the entities of the whole database system, the relationships among the entities and the specific attributes of the stored entities, to store the dok files generated in the molecular docking process, and to support file retrieval and downloading. A relational database was chosen because the system must present the relationships among entities such as receptors, ligands, proteins and drugs, together with their attributes, and a non-relational database such as MongoDB is comparatively weak at representing relationships among entities. Entity tables for ligands, receptors, proteins, drugs, docking sites, protein-ligand complexes and so on, and the relationship tables linking them, are therefore created in the database system to hold the specific attributes of each entity and the relationships among entities.
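The entity and relationship tables described above can be sketched as follows. This is an illustrative schema only: all table and column names are hypothetical, only a few of the entities are shown, and Python's built-in `sqlite3` module with an in-memory database stands in for MySQL so that the example is self-contained.

```python
import sqlite3

# Hypothetical, simplified schema: entities plus a result table relating them.
SCHEMA = """
CREATE TABLE receptor (
    receptor_id INTEGER PRIMARY KEY,
    pdb_code    TEXT NOT NULL UNIQUE
);
CREATE TABLE ligand (
    ligand_id INTEGER PRIMARY KEY,
    zinc_id   TEXT NOT NULL UNIQUE
);
CREATE TABLE docking_site (
    site_id     INTEGER PRIMARY KEY,
    receptor_id INTEGER NOT NULL REFERENCES receptor(receptor_id),
    label       TEXT
);
CREATE TABLE docking_result (
    result_id     INTEGER PRIMARY KEY,
    ligand_id     INTEGER NOT NULL REFERENCES ligand(ligand_id),
    site_id       INTEGER NOT NULL REFERENCES docking_site(site_id),
    docking_score REAL,
    dok_path      TEXT  -- location of the stored dok result file
);
CREATE INDEX idx_result_score ON docking_result(docking_score);
"""


def create_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def best_ligands(conn: sqlite3.Connection, pdb_code: str, limit: int = 10):
    """Return (zinc_id, docking_score) pairs for a receptor, lowest score first."""
    return conn.execute(
        """
        SELECT l.zinc_id, r.docking_score
        FROM docking_result AS r
        JOIN ligand AS l       ON l.ligand_id = r.ligand_id
        JOIN docking_site AS s ON s.site_id = r.site_id
        JOIN receptor AS p     ON p.receptor_id = s.receptor_id
        WHERE p.pdb_code = ?
        ORDER BY r.docking_score ASC
        LIMIT ?
        """,
        (pdb_code, limit),
    ).fetchall()
```

The join in `best_ligands` is the kind of cross-entity query that motivates the relational design: a ligand is linked to a receptor only through the docking site and the docking result.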

The database system covers a complete data set, and the data set comprises a data set capable of directly carrying out a molecular docking experiment, a molecular docking result data set and a molecular docking calculation auxiliary data set.

Data set that can be used directly in molecular docking experiments: this includes the attributes of, and relationships among, receptors, ligands, proteins, drugs, docking sites, pocket information and protein-ligand complexes. The database system integrates data from the internationally open, free ZINC and PDB libraries, combining the basic PDB and Ligands data, so that users can browse and download the data and also run molecular docking calculations on the system directly with the existing data. The Ligands data come from the ZINC15 data set website; the system holds more than 16.6 million ligand small-molecule records from the ZINC library, thousands of pdb records, and protein, drug and other information. The pdb data come from the RCSB Protein Data Bank; the drug and protein data come from the RCSB Protein Data Bank and other authoritative data set websites in the field.

Molecular docking result data set: one part of the result data comes from molecular docking experiments and integrates the experimental data obtained from docking with LeDock and Schrodinger, comprising docking data for 16.6 million marine small molecules against thousands of pdb structures. In addition to all the dok files generated by docking, specific scores from the result files, such as the docking score and mmgbsa, are stored in the relational database, which makes small-molecule screening easier for users. The other part is molecular docking data uploaded through the foreground that has passed review.

Molecular docking calculation auxiliary data set: the auxiliary data mainly comprise the dock.in preparation files for ledock molecular docking, and the lattice (grid) files, protein information, drug information and the like required by the Schrodinger calculation process.

As shown in fig. 1, the interactive foreground is used for performing interactive management on the whole molecular docking calculation process, integrating data selection of docking calculation, calculation task submission, docking calculation, checking calculation nodes, allocating calculation cores, deleting calculation tasks, checking task processes, error correction checking of calculation tasks, resubmission of failed tasks, result file screening, and three-dimensional visualization of docking data and result data.

The interactive foreground comprises a homepage, a Ligands ligand information module, a pdb receptor information module, a results data set module, an online data sharing module, an online molecular docking calculation module and a database profile module. The homepage mainly displays news updates and related information, for example the latest scientific achievements and news; the database profile module is mainly an introduction to the software. The foreground adopts the UI framework EasyUI, and the database system is standardized through this foreground framework. During actual development of the database system, the display of foreground data was optimized: the code that fetches back-end data was optimized so that JSON data are returned according to the actual needs of the user, which speeds up the display of data at the ten-million-record level and solves the problem of stalling under large data volumes.
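One common way to return only the JSON that the foreground needs is server-side paging. The sketch below is a minimal illustration, assuming the EasyUI datagrid convention of a response object with `total` and `rows` fields. The function name `page_response` and the sample records are hypothetical, and the patent does not state that paging is the mechanism used.

```python
import json


def page_response(records, page, page_size):
    """Return one page of records as a JSON string with `total` and `rows`.

    `page` is 1-based. Only the requested slice is serialized, so the
    payload stays small however many records exist.
    """
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    start = (page - 1) * page_size
    rows = records[start:start + page_size]
    return json.dumps({"total": len(records), "rows": rows})


# Hypothetical ligand records; a real back end would slice in SQL instead.
ligands = [{"zinc_id": f"ZINC{i:08d}"} for i in range(1, 26)]
print(page_response(ligands, page=3, page_size=10))
```

In a real deployment the slicing would more plausibly happen in the SQL query itself, so that the ten-million-row table is never loaded into memory.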

(1) The Ligands ligand information module integrates detailed data for more than 16.6 million ZINC small molecules and supports operations such as conditional query, retrieval, viewing and downloading of ligand small molecule information;

(2) the pdb receptor information module presents detailed data for thousands of receptors, for example the time of discovery, two-dimensional structure, three-dimensional structure, type, protein and related drug information, and supports operations such as conditional query, viewing and downloading of receptors;

(3) the results data set module covers the scoring data generated in the docking process. It includes the molecular docking results of the ledock software, which are classified as 'primary screening' data, and the experimental data from Schrodinger molecular docking calculations, which are classified as 'fine screening' data. The module supports docking result query, three-dimensional display of docking results and the like. A user can search the molecular docking data precisely using the search conditions 'screening stage, pdbid and ZINCID'.

(4) The online data sharing module mainly allows researchers to upload data from molecular docking experiments. Researchers upload their own experimental data in a specified file format or fill in an Excel document in a specified format. After relevant experts verify that the uploaded data are correct, the administrators of the database system store the data in the database and can present it publicly according to the requirements of the data contributor. The online data sharing module also supports downloading of ligand small molecule mol2 files, protein conformation pdb files and docking result dok files.

(5) The online molecular docking calculation module provides researchers with a set of tools for carrying out molecular docking experiments online, comprising a data preprocessing tool, an online calculation task submission tool and a resource management tool. The data preprocessing tool includes a file format conversion tool for converting between files of different formats. The online calculation task submission tool stores the processed data into the data set and allocates specific nodes for molecular docking calculation. The resource management tool automatically allocates and uses supercomputing resources; its basic functions are even distribution of resources, load balancing and support for specified computing cores. On top of automatic node allocation, a user can specify the node resources to compute on, can start and manage the docking calculation process directly through the website, can check the occupancy of computing resources, and can perform interactive management operations such as submitting, viewing and deleting task processes. After processing the ligand and receptor data into the specified format (or directly using the data provided by the database system), researchers can perform molecular docking experiments online with ledock_go, ledock_pro and Schrodinger according to their actual needs. A user of the database system must obtain the relevant permission in advance in order to submit molecular docking calculation tasks online, and can view the progress of the molecular docking.
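The allocation behaviour described for the resource management tool (automatic balancing plus user-specified nodes) can be sketched as follows. This is an illustrative greedy scheme, not the scheduler used by the system: the function name `assign_tasks`, the cost estimates and the node names are all assumptions.

```python
import heapq


def assign_tasks(task_costs, nodes, pinned=None):
    """Assign each task to the currently least-loaded node.

    task_costs: dict of task name -> estimated cost (arbitrary units).
    nodes: list of node names.
    pinned: optional dict of task name -> node chosen by the user.
    Returns a dict of node -> list of task names.
    """
    pinned = pinned or {}
    load = {node: 0 for node in nodes}
    plan = {node: [] for node in nodes}

    # Honor user-specified nodes first.
    for task, node in pinned.items():
        plan[node].append(task)
        load[node] += task_costs[task]

    # Min-heap of (current load, node name).
    heap = [(load[node], node) for node in nodes]
    heapq.heapify(heap)

    # Place the largest remaining tasks first, each on the least-loaded node.
    remaining = sorted(
        (t for t in task_costs if t not in pinned),
        key=lambda t: -task_costs[t],
    )
    for task in remaining:
        current, node = heapq.heappop(heap)
        plan[node].append(task)
        heapq.heappush(heap, (current + task_costs[task], node))
    return plan


costs = {"a": 4, "b": 3, "c": 2, "d": 1}
print(assign_tasks(costs, ["n1", "n2"]))
print(assign_tasks(costs, ["n1", "n2"], pinned={"d": "n2"}))
```

Placing the largest tasks first is a simple heuristic that tends to keep node loads close to each other; a production scheduler would also account for running jobs and core counts.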

The foreground is also provided with a visual module for dynamically displaying the crystal states before and after docking, and the visual module comprises small molecule two-dimensional structure display, small molecule three-dimensional structure display, protein conformation two-dimensional structure display, protein conformation three-dimensional structure display, docking result three-dimensional display and the like.

Second, database system establishment method

The database system establishing step includes:

a. Constructing the database bottom layer: the relational database MySQL is adopted as the bottom layer of the database. It is responsible for constructing the entities and the relationships between them and for storing the specific attributes of the entities; entity tables and their relationship tables are created to present the specific attributes of each entity and the relationships between the entities in the whole database system. The relational database contains entities such as Ligands, Protein, pdb, results, medicine and pocket, which respectively represent ligand small molecules, proteins, receptor pdb entries, molecular docking results, drugs and molecular docking pockets.

The Ligands entity holds information on ligand small molecules, with attributes such as hydrogen bonds, rotational energy, van der Waals force and purchasability. The pdb entity describes a target, and its table holds attributes such as the pdb year, concept and the file in pdb format. The results entity mainly holds molecular docking result data, consisting of primary screening data and fine screening data. The protein entity stores protein-related information used for auxiliary calculation, so that a user of the database system can find the corresponding protein data when looking up pdb data; the protein table includes data such as type, year, organizational structure and crystal structure. The table for the medicine entity mainly holds drug information such as the drug type, the developing organization, the year the drug was completed and a description of its efficacy.

To complete the logical structure between the database entities, a many-to-many relationship table, m_protein_medicine, is created in the database to record the correspondence between proteins and drugs.
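A many-to-many relationship table of this kind can be sketched as below. The example uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server. Only the table name m_protein_medicine comes from the text; the column names and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE protein  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE medicine (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Junction table: one row per (protein, medicine) pair.
CREATE TABLE m_protein_medicine (
    protein_id  INTEGER NOT NULL REFERENCES protein(id),
    medicine_id INTEGER NOT NULL REFERENCES medicine(id),
    PRIMARY KEY (protein_id, medicine_id)
);
""")
conn.executemany("INSERT INTO protein VALUES (?, ?)",
                 [(1, "protein_A"), (2, "protein_B")])
conn.executemany("INSERT INTO medicine VALUES (?, ?)",
                 [(1, "drug_X"), (2, "drug_Y")])
conn.executemany("INSERT INTO m_protein_medicine VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 2)])


def medicines_for(protein_name):
    """Return the names of all medicines linked to the given protein."""
    rows = conn.execute(
        """SELECT m.name FROM medicine AS m
           JOIN m_protein_medicine AS pm ON pm.medicine_id = m.id
           JOIN protein AS p ON p.id = pm.protein_id
           WHERE p.name = ? ORDER BY m.name""",
        (protein_name,),
    ).fetchall()
    return [name for (name,) in rows]


print(medicines_for("protein_A"))
```

The composite primary key prevents the same protein and drug pair from being recorded twice, while still letting one protein map to several drugs and one drug to several proteins.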

b. Building a Web background framework based on the SSM framework: the SSM framework is decoupled through a layered design with five layers: DAOImpl, DAO, serviceImpl, Service and Action. The main purpose of the layered design is decoupling; a structure with high cohesion and low coupling is beneficial to project optimization and makes it more efficient to respond when project requirements change. The DAO and Service layers are interface layers, mainly used for passing parameters and achieving decoupling. The serviceImpl and DAOImpl layers are implementation layers; DAOImpl performs data interaction with the bottom layer of the SSM framework, and SQL optimization implemented in the DAOImpl layer can greatly improve the overall performance of the system. The Action layer implements specific service logic on the basis of the Service interface layer and handles data interaction with the foreground UI.

c. Building a user-facing interactive foreground: the foreground comprises a homepage, a Ligands ligand information module, a pdb receptor information module, a results data set module, an online data sharing module, an online molecular docking calculation module and a database profile module.

d. Data acquisition oriented to the whole process of molecular docking: the data stored in the database system comprise data from the internationally open and free ZINC and PDB libraries, molecular docking experiment result data and molecular docking calculation auxiliary data.

e. Data entry: the database system involves entering and retrieving data at the ten-million-record level. There are more than 16.6 million Ligands records, and docking each receptor in pdb format with the more than 16.6 million ligand small molecules generates more than 16.6 million dok files. Overall, the database system contains data on the order of tens of TB, so entering this volume of data into the database system in a standardized and orderly manner is undoubtedly a challenge. In the actual data normalization and entry process, data are formatted and stored in batches, using for example a program that reads system files in batches and stores them in the database, and a data formatting preprocessing program.

During batch data entry, retrieval becomes extremely slow once ten-million-level data are stored in the relational database. To improve retrieval speed at this scale, indexes are added to specific fields in the database, and the database is further optimized by splitting fields, splitting tables and the like.
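The effect of adding an index to the fields used for lookup can be illustrated with a small experiment. The sketch uses Python's built-in sqlite3 module as a stand-in for MySQL; the table layout, the index name `idx_results_pdb_zinc` and the data are hypothetical, and the query planner output shown by MySQL's own EXPLAIN has a different format.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results (pdb_id TEXT, zinc_id TEXT, stage TEXT, score REAL)"
)
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?, ?)",
    [("pdb_a", f"ZINC{i:08d}", "primary", -5.0 - i % 5) for i in range(1000)],
)
query = "SELECT score FROM results WHERE pdb_id = ? AND zinc_id = ?"
params = ("pdb_a", "ZINC00000042")


def plan(sql, args):
    """Return SQLite's query-plan description for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, args).fetchall()
    return " ".join(row[-1] for row in rows)


before = plan(query, params)  # no index yet: the whole table is scanned
conn.execute("CREATE INDEX idx_results_pdb_zinc ON results (pdb_id, zinc_id)")
after = plan(query, params)   # the composite index is used for the lookup
print(before)
print(after)
```

Without the index the planner reports a scan of the whole table; with it, the lookup goes through the composite index, which is what keeps retrieval fast as the row count grows.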

Third, virtual screening method for medicine

The interactive database system provides a whole set of auxiliary molecular docking calculation tools such as a file preparation tool, a data preprocessing tool, an online submission calculation task tool and the like for the molecular docking process.

(1) Molecular docking preparation files: these comprise processed pdb files containing receptor three-dimensional data, dock.in files marked with pocket docking site information, ligand small molecule files in mol2 and sdf formats, and pdb and lattice (grid) files supporting Schrodinger calculation. The preparation files can be applied directly in both ledock and Schrodinger to carry out molecular docking calculations. Having the preparation files ready makes docking calculation more convenient and speeds up its progress.

(2) A data pre-processing tool: files in various formats are often required in actual molecular docking calculations, for example: the ligand small molecule has the formats of mol2, sdf and the like. The database system provides a file format conversion tool which supports mutual conversion among data formats such as mol2, sdf and the like.

(3) Online calculation task submission tool: in the process of virtual drug screening, the processed data need to be submitted to a cluster, and specific nodes are allocated for molecular docking calculation. This calculation process requires a series of auxiliary tools, such as file resource allocation tools, computing resource allocation tools, docking result verification tools and docking result screening tools. With permission, the system connects remotely through VPN to the supercomputer of the Qingdao National Laboratory for Marine Science and Technology and, through an API provided in advance, uses scripts such as shell and python on the supercomputer's Linux system to complete the online molecular docking calculation process: checking calculation nodes, allocating calculation cores, querying data, selecting data for docking calculation, submitting calculation tasks, running docking calculations, deleting calculation tasks, checking task progress, error checking of calculation tasks, resubmitting failed tasks, and checking and screening docking results.
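The "error checking" and "resubmitting failed tasks" steps can be sketched as a simple retry loop. This is a minimal illustration under stated assumptions: `run_with_resubmission` and `fake_runner` are hypothetical names, and the real system submits jobs to a supercomputer through scripts and an API that are not described in detail here.

```python
def run_with_resubmission(tasks, run_task, max_attempts=3):
    """Run every task, resubmitting failures up to max_attempts times.

    run_task(task) must return True on success and False on failure.
    Returns (succeeded, failed) lists of task names.
    """
    attempts = {task: 0 for task in tasks}
    pending = list(tasks)
    succeeded = []
    while pending:
        task = pending.pop(0)
        attempts[task] += 1
        if run_task(task):
            succeeded.append(task)
        elif attempts[task] < max_attempts:
            pending.append(task)  # resubmit the failed task
    failed = [t for t in tasks if t not in succeeded]
    return succeeded, failed


calls = {}


def fake_runner(task):
    """Stand-in for a docking job: job_b fails once, job_c always fails."""
    calls[task] = calls.get(task, 0) + 1
    if task == "job_c":
        return False
    if task == "job_b":
        return calls[task] >= 2
    return True


done, failed = run_with_resubmission(["job_a", "job_b", "job_c"], fake_runner)
print(done, failed)
```

Capping the number of attempts keeps a permanently broken task from blocking the queue; such tasks end up in the failed list for manual inspection.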

As shown in fig. 2, the virtual drug screening method using the database system includes the following steps:

Step 1. Establishment of the receptor model

The pretreatment process is as follows:

(1) pre-process the protein pdb format file

Check whether the protein has a ligand; when several molecules are present, determine which of them are ligands;

(2) check whether the protein has mutations

Mutations in proteins are classified into pathological and non-pathological mutations, which can be looked up in the PDB;

(3) check whether any amino acid residues are missing or defective;

(4) remove the water molecules in the crystal, unless literature reports show that they must be kept, and remove the ligand heteroatoms;

(5) retain the protein and the cofactors required for the protein to function;

(6) define the binding pocket site.
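Steps (4) and (5) above can be sketched as a filter over the lines of a pdb file. The sketch relies on the PDB format rule that the residue name occupies columns 18 to 20 of ATOM and HETATM records. The function name `clean_pdb`, the `keep_hetero` parameter and the sample records are hypothetical, and a real preparation workflow would do considerably more (alternate locations, missing residues, protonation).

```python
def clean_pdb(lines, keep_hetero=()):
    """Drop water and unwanted hetero-atom records from PDB text lines.

    keep_hetero: residue names (for example cofactors, or HOH when the
    literature says a water must stay) that are retained.
    """
    keep = {name.upper() for name in keep_hetero}
    cleaned = []
    for line in lines:
        if line.startswith("HETATM"):
            # Residue name: columns 18-20, i.e. string indices 17..19.
            resname = line[17:20].strip().upper()
            if resname not in keep:
                continue  # water (HOH), ligands and other hetero atoms
        cleaned.append(line)
    return cleaned


# Shortened sample records; only the columns read above are exact.
sample = [
    "ATOM      1  N   MET A   1",
    "HETATM 1000  O   HOH A 201",
    "HETATM 1001 MG    MG A 202",
    "HETATM 1002  C1  LIG A 203",
]
for line in clean_pdb(sample, keep_hetero=("MG",)):
    print(line)
```

Here the magnesium ion is kept as an assumed cofactor, while the water and the co-crystallized ligand records are removed.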

Step 2. Generation of the small molecule data set

Step 3. Molecular docking and scoring

All molecules of the generated small molecule data set are placed at the binding site of the receptor to simulate binding and predict the conformation of the receptor-ligand complex. During molecular docking calculation, the ligand and receptor data are first screened with the ledock_pro and ledock_go scoring functions to obtain the primary screening result set. Schrodinger is then used to perform molecular docking calculation on the primary screening result set; this process is called fine screening. Finally, results with high scores and good docking poses are selected from the fine screening result set, namely results whose docking score satisfies both conditions: the Gibbs free energy score is less than -8, and the ratio of the Gibbs free energy to the number of heavy atoms is less than -0.3. This yields the small molecules that have the highest activity for a specific receptor and are most likely to become drugs. Combining multiple docking methods in this virtual screening method improves the accuracy of small molecule screening and avoids screening errors caused by inaccurate calculation in a single piece of software.
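The two thresholds in the final selection step translate directly into a filter. In the sketch below the cutoffs -8 and -0.3 come from the text, while the function name `is_hit`, the record layout and the candidate values are hypothetical; the text does not state the units of the score.

```python
def is_hit(score, heavy_atoms, score_cutoff=-8.0, efficiency_cutoff=-0.3):
    """Return True when a pose passes both thresholds stated in the text."""
    if heavy_atoms <= 0:
        raise ValueError("heavy_atoms must be positive")
    return score < score_cutoff and score / heavy_atoms < efficiency_cutoff


candidates = [
    {"zinc_id": "ZINC_A", "score": -9.6, "heavy_atoms": 24},  # -0.40 per atom
    {"zinc_id": "ZINC_B", "score": -8.4, "heavy_atoms": 35},  # -0.24 per atom
    {"zinc_id": "ZINC_C", "score": -7.5, "heavy_atoms": 20},  # score above -8
]
hits = sorted(
    (c for c in candidates if is_hit(c["score"], c["heavy_atoms"])),
    key=lambda c: c["score"],
)
print([c["zinc_id"] for c in hits])
```

The second condition normalizes the score by molecule size, so a large molecule cannot pass merely by accumulating many weak contacts.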

Step 4. Post-processing of hit compounds

The selected high-scoring result information is integrated and stored in the database.

16 million ZINC small molecules and a number of target structures are selected and processed for molecular docking against protein conformations, and each protein generates 16 million scoring result files. In view of the time complexity of molecular docking calculation, the invention distributes the docking calculation of each protein conformation against all small molecules over 2500 computing cores of a supercomputer, which compute in parallel. After parallelization, the calculation time of the ledock_pro function is reduced to about 3 days and that of ledock_go to about 2 days, an efficiency improvement of 2500 times, which makes virtual screening of protein conformations against a large-scale small molecule library feasible.
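The arithmetic behind this division of work can be checked with a short sketch: 16 million ligands spread evenly over 2500 cores gives 6400 ligands per core. The function name `split_evenly` is hypothetical, and the even split is an assumption, since the text does not say how ligands are assigned to cores.

```python
def split_evenly(n_items, n_workers):
    """Return (start, stop) index ranges covering n_items across n_workers.

    Range sizes differ by at most one item.
    """
    base, extra = divmod(n_items, n_workers)
    ranges, start = [], 0
    for worker in range(n_workers):
        size = base + (1 if worker < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


ranges = split_evenly(16_000_000, 2500)
print(len(ranges), ranges[0], ranges[-1])
```

The stated 2500-fold gain corresponds to ideal linear scaling over these 2500 independent chunks, which is plausible here because each ligand is docked independently of the others.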

It is understood that the above description is not intended to limit the present invention, and the present invention is not limited to the above examples, and those skilled in the art should understand that they can make various changes, modifications, additions and substitutions within the spirit and scope of the present invention.
