Inverse synthetic problem solving method and device based on improved Monte Carlo reinforcement learning method

Document No.: 1818173  Publication date: 2021-11-09

Reading note: this technology, "Inverse synthetic problem solving method and device based on improved Monte Carlo reinforcement learning method" (基于改进的蒙特卡罗强化学习方法的逆合成问题求解方法及装置), was designed and created by 刘娟 (Liu Juan), 张蔷 (Zhang Qiang), 杨志辉 (Yang Zhihui) and 冯晶 (Feng Jing) on 2021-07-05. Its main content is as follows: The invention provides an inverse synthesis (retrosynthesis) problem solving method and device based on an improved Monte Carlo reinforcement learning method. The method comprises: step 1, taking the target compound whose inverse synthesis is to be solved as the root node, and selecting the child node with the highest improved UCT function value as the optimal child node; step 2, executing an expansion action to generate a new child node; step 3, if all products generated during the iteration are found among the chassis-strain metabolites in the metabolic space, the final result has been obtained and a reward or penalty is returned according to the reward policy; otherwise, a reaction rule is randomly sampled from the available transformations and applied to the current compound, the Tanimoto score between the resulting product of the child node and the set of chassis-strain metabolites or commercially available compounds is calculated, and the low-ranked reaction rules are discarded; step 4, returning the Tanimoto score obtained by the current node to its parent node; step 5, repeating until termination to obtain the inverse synthesis result.

1. The method for solving the inverse synthetic problem based on the improved Monte Carlo reinforcement learning method is characterized by comprising the following steps of:

step 1, selecting: taking the target compound whose inverse synthesis is to be solved as the root node, calculating the improved UCT function value of each node starting from the root node, and selecting the node with the highest improved UCT function value as the optimal child node so as to determine an intermediate product, until a leaf node is reached, the leaf node corresponding to a compound found among the chassis-strain metabolites in the metabolic space; the improved UCT function is as follows:

in the formula, node $v_i$ is the i-th child of node $v$; $Q(\cdot)$ returns the cumulative value of a node; $N(\cdot)$ returns the cumulative visit count of a node; $T_i$ is the Tanimoto score of node $v_i$; and $C$ is a weight parameter;

step 2, expanding: taking the optimal child node as a node to be expanded, determining a reaction rule which is not expanded by the current child node in the current metabolic space as an unexpanded action, and then executing an expansion action on the node to be expanded to generate a new child node;

step 3, simulating: this is an iterative process that starts with a state check; if all products generated during the inverse synthesis iteration are found among the chassis-strain metabolites in the metabolic space, the final result has been obtained and a reward is returned according to the reward policy; if no final result has been obtained, a reaction rule is randomly sampled from the available transformations and applied to the current compound; the Tanimoto score is then calculated between the product $M$, obtained by the random simulated transformation of child node $v_i$, and the set $S = (S_1, S_2, \ldots, S_n)$ of chassis-strain metabolites in the metabolic space or commercially available compounds, according to the following formula:

$T_i = \min E(S_i, M) \qquad (2)$

where $E(\cdot)$ is the Tanimoto score function;

the Tanimoto score is substituted into the improved UCT function, the reaction rules are sorted by their improved UCT function values, the low-ranked reaction rules, which are unlikely to occur, are removed, and this process is repeated until the maximum number of expansion steps or the maximum depth of the tree is reached;

step 4, updating: the improved UCT function value or Tanimoto score obtained by the current node is returned to its parent node to update the value and the visit count, which serve as the basis for node selection in the next iteration;

step 5, repeating steps 1 to 4 until a loop termination condition is reached, to obtain the inverse synthesis solution.

2. The improved Monte Carlo reinforcement learning method-based inverse synthetic problem solving method according to claim 1, wherein:

wherein, in step 3, the Tanimoto score is calculated in Python with the open-source cheminformatics toolkit RDKit, using extended-connectivity fingerprints (ECFP) of diameter 4.

3. The improved Monte Carlo reinforcement learning method-based inverse synthetic problem solving method according to claim 1, wherein:

wherein, in step 3, E (-) is specifically:

where n is the length of the molecular sequence computed for the compound in Python with the open-source cheminformatics toolkit RDKit, using the extended-connectivity fingerprint (ECFP) of diameter 4.

4. The improved Monte Carlo reinforcement learning method-based inverse synthetic problem solving method according to claim 1, wherein:

in step 3, the reaction rules are sorted by improved UCT function value, and the reaction rules ranked below the top 10, which are unlikely to occur, are removed.

5. The improved Monte Carlo reinforcement learning method-based inverse synthetic problem solving method according to claim 1, further comprising:

a standardization step: standardizing all compounds in the metabolic database;

a reaction rule coding step: extracting all known biochemical reactions with complete reaction information from the standardized metabolic database, identifying the atoms whose configuration changes during a reaction as the reaction center by using the atom-atom mapping performed by the reaction decoder software, defining the atoms around the reaction center by bond distance, and encoding the chemical reactions as a set of reaction rules in SMARTS form;

a metabolic space expanding step: applying the reaction rules to all compounds in the metabolic database to generate reaction rule templates.

6. An inverse synthetic problem solving device based on an improved Monte Carlo reinforcement learning method, characterized by comprising:

a selection module, which takes the target compound whose inverse synthesis is to be solved as the root node, calculates the improved UCT function value of each node starting from the root node, and selects the node with the highest improved UCT function value as the optimal child node so as to determine an intermediate product, until a leaf node is reached, the leaf node corresponding to a compound found among the chassis-strain metabolites in the metabolic space; the improved UCT function is as follows:

in the formula, node $v_i$ is the i-th child of node $v$; $Q(\cdot)$ returns the cumulative value of a node; $N(\cdot)$ returns the cumulative visit count of a node; $T_i$ is the Tanimoto score of node $v_i$; and $C$ is a weight parameter;

the expansion module is used for determining a reaction rule which is not expanded by the current child node in the current metabolic space as an unexpanded action by taking the optimal child node as the node to be expanded, and then executing an expansion action on the node to be expanded to generate a new child node;

a simulation module, which performs an iterative process that starts with a state check; if all products generated during the iteration are found among the chassis-strain metabolites in the metabolic space, the final result has been obtained and a reward is returned according to the reward policy; if no final result has been obtained, a reaction rule is randomly sampled from the available transformations and applied to the current compound; the Tanimoto score is then calculated between the product $M$, obtained by the random simulated transformation of child node $v_i$, and the set $S = (S_1, S_2, \ldots, S_n)$ of chassis-strain metabolites in the metabolic space or commercially available compounds, according to the following formula:

$T_i = \min E(S_i, M) \qquad (2)$

where $E(\cdot)$ is the Tanimoto score function;

the Tanimoto score is then substituted into the improved UCT function, the reaction rules are sorted by their improved UCT function values, the reaction rules that are unlikely to occur are removed, and this process is repeated until the maximum number of expansion steps or the maximum depth of the tree is reached;

an updating module, which returns the improved UCT function value or Tanimoto score obtained by the current node to its parent node to update the value and the visit count, which serve as the basis for node selection in the next iteration; and

and the control module is in communication connection with the selection module, the expansion module, the simulation module and the updating module and controls the selection module, the expansion module, the simulation module and the updating module to circularly operate and process until a circulation termination condition is reached to obtain an inverse synthesis solving result.

7. The apparatus for solving the inverse synthetic problem based on the improved monte carlo reinforcement learning method according to claim 6, wherein:

wherein, in the simulation module, the Tanimoto score is calculated in Python with the open-source cheminformatics toolkit RDKit, using extended-connectivity fingerprints (ECFP) of diameter 4.

8. The apparatus for solving the inverse synthetic problem based on the improved monte carlo reinforcement learning method according to claim 6, wherein:

wherein, in the simulation module, E (-) is specifically:

where n is the length of the molecular sequence computed for the compound in Python with the open-source cheminformatics toolkit RDKit, using the extended-connectivity fingerprint (ECFP) of diameter 4.

9. The apparatus for solving the inverse synthetic problem based on the improved monte carlo reinforcement learning method according to claim 6, further comprising:

and the input display is in communication connection with the control module, enables a user to input an operation instruction and displays the solved inverse synthesis result according to the operation instruction.

10. The apparatus for solving the inverse synthetic problem based on the improved monte carlo reinforcement learning method according to claim 6, further comprising:

a preprocessing module, which standardizes all compounds in the metabolic database; extracts all known biochemical reactions with complete reaction information from the standardized metabolic database; identifies the atoms whose configuration changes during a reaction as the reaction center by using the atom-atom mapping performed by the reaction decoder software; defines the atoms around the reaction center by bond distance; encodes the chemical reactions as a set of reaction rules in SMARTS form; and applies the reaction rules to all compounds in the metabolic database to generate reaction rule templates.

Technical Field

The invention belongs to the technical field of organic chemistry inverse synthesis solving, and particularly relates to an inverse synthesis problem solving method and device based on an improved Monte Carlo reinforcement learning method.

Background Art

Organic synthesis is at the core of organic chemistry, and inverse synthesis (retrosynthesis) is an important method for solving organic synthesis problems. The goal of inverse synthesis is to find a synthesis path that leads backwards from the target molecule to available starting materials.

In recent years, deep learning techniques have been gradually introduced into inverse synthetic analysis, which can be roughly classified into two categories: 1) a rule-based two-step model; 2) fully data driven end-to-end analysis.

Both methods use a training set of known reactions to learn the inverse mapping from a given product to its unknown reactants. The first method consists of two separate steps. In the first step, reaction templates are either compiled by experts or extracted automatically from a database using machine learning. In the second step, the target molecule is inversely synthesized into simpler precursors based on the templates. Automatic extraction of reaction rules from a database is currently the more mainstream approach. In 2017, Waller et al. trained a deep neural network model on 3.5 million collected reactions so that templates could be extracted automatically. The Waller group subsequently attempted to search for synthetic routes for 40 drug-like molecules using Monte Carlo tree search combined with deep neural network policies.

As the field has developed, fully end-to-end methods based on neural networks have gradually emerged. The chemical structures of the product and the reactants are encoded in the SMILES chemical notation, and the inverse synthesis problem becomes equivalent to finding a conversion path from the character-encoded product to the character-encoded reactants. In 2017, Liu et al. established an end-to-end sequence model that converts the SMILES of a reaction product into the SMILES of the reactants. For a given synthetic target molecule, the inverse prediction model can recursively generate branching reactant sequences until the growing inverse synthesis tree reaches a prescribed set of purchasable compounds. The commonly used Monte Carlo tree search can efficiently identify chemically reasonable synthetic routes in an infinite search tree. For example, in 2019 Mathilde et al. applied Monte Carlo tree reinforcement learning to a neural network and achieved good results.

At present, solving the inverse synthesis problem with Monte Carlo reinforcement learning has become a research hotspot. However, the method suffers from a large search volume and low search efficiency, which greatly restricts its development and urgently needs to be solved.

Disclosure of Invention

The present invention has been made to solve the above-mentioned problems, and an object of the present invention is to provide a method and an apparatus for solving an inverse synthetic problem based on an improved monte carlo reinforcement learning method, which can effectively reduce a search space of monte carlo reinforcement learning and greatly improve search efficiency.

In order to achieve the purpose, the invention adopts the following scheme:

< method >

As shown in FIGS. 1 and 2, the present invention provides an inverse synthetic problem solving method based on an improved Monte Carlo reinforcement learning method, which is characterized by comprising the following steps:

step 1, selecting: taking the target compound whose inverse synthesis is to be solved as the root node, calculating the improved UCT function value of each node starting from the root node, and selecting the node with the highest improved UCT function value as the optimal child node so as to determine an intermediate product, until a leaf node is reached, the leaf node corresponding to a compound found among the chassis-strain metabolites in the metabolic space; the improved UCT function is as follows:

in the formula, node $v_i$ is the i-th child of node $v$; $Q(\cdot)$ returns the cumulative value of a node; $N(\cdot)$ returns the cumulative visit count of a node; $T_i$ is the Tanimoto score of node $v_i$; and $C$ is a weight parameter;

step 2, expanding: taking the optimal child node as a node to be expanded, determining a reaction rule which is not expanded by the current child node in the current metabolic space as an unexpanded action, and then executing an expansion action on the node to be expanded to generate a new child node;

step 3, simulating: this is an iterative process that starts with a state check; if all products generated during the inverse synthesis iteration are found among the chassis-strain metabolites in the metabolic space, the final result has been obtained and a reward or penalty is returned according to the reward policy; if no final result has been obtained, a reaction rule is randomly sampled from the available transformations and applied to the current compound; the Tanimoto score is then calculated between the product $M$, obtained by the random simulated transformation of child node $v_i$ (corresponding to the node shown by the dashed circle in FIG. 2), and the set $S = (S_1, S_2, \ldots, S_n)$ of chassis-strain metabolites in the metabolic space or commercially available compounds, according to the following formula:

$T_i = \min E(S_i, M) \qquad (2)$

where $E(\cdot)$ is the Tanimoto score function;

the Tanimoto score is substituted into the improved UCT function, the reaction rules are sorted by their improved UCT function values, the low-ranked reaction rules, which are unlikely to occur, are removed, and this process is repeated until the maximum number of expansion steps or the maximum depth of the tree is reached;

step 4, updating: the improved UCT function value or Tanimoto score obtained by the current node is returned to its parent node to update the value and the visit count, which serve as the basis for node selection in the next iteration;

step 5, repeating steps 1 to 4 until a loop termination condition is reached (the maximum number of iterations or the time limit is reached), to obtain the inverse synthesis solution.
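The loop in step 5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `run_search`, the four callables, and the default budget are hypothetical, and the two stopping conditions mirror the iteration limit and time limit mentioned above.

```python
import time

def run_search(select, expand, simulate, update, root,
               max_iterations=1000, time_limit_s=None):
    """Repeat steps 1 to 4 until the iteration or time budget is used up.

    The four callables are hypothetical stand-ins for the selection,
    expansion, simulation and update steps. Returns the number of
    completed iterations.
    """
    start = time.monotonic()
    iterations = 0
    while iterations < max_iterations:
        # Stop early if a time limit was given and has been reached.
        if time_limit_s is not None and time.monotonic() - start >= time_limit_s:
            break
        leaf = select(root)       # step 1: walk down to the node to expand
        child = expand(leaf)      # step 2: create a new child node
        score = simulate(child)   # step 3: score the child by simulation
        update(child, score)      # step 4: pass the score back up the tree
        iterations += 1
    return iterations
```

Keeping the four steps as interchangeable callables makes it easy to swap the scoring or pruning strategy without touching the termination logic.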

Preferably, the inverse synthetic problem solving method based on the improved Monte Carlo reinforcement learning method provided by the invention may further have the following feature: in step 3, the Tanimoto score is calculated in Python with the open-source cheminformatics toolkit RDKit, using extended-connectivity fingerprints (ECFP) of diameter 4.
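As an illustration of formula (2), the sketch below takes the minimum score between the product M and each member of the reference set S. The function name `rollout_score` and the `score_fn` argument are assumptions introduced here: any pairwise Tanimoto function E can be passed in, and fingerprints are represented as plain Python sets of on-bit indices rather than RDKit objects.

```python
def rollout_score(product_fp, reference_fps, score_fn):
    """Formula (2): minimum of E(S_j, M) over all references S_j in S.

    product_fp    -- fingerprint of the simulated product M
    reference_fps -- fingerprints of the set S (chassis-strain metabolites
                     or commercially available compounds)
    score_fn      -- pairwise score function E (hypothetical parameter)
    """
    if not reference_fps:
        raise ValueError("the reference set S must not be empty")
    return min(score_fn(ref, product_fp) for ref in reference_fps)
```

Passing the score function as an argument keeps the formula independent of how the fingerprints are computed.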

Preferably, the inverse synthetic problem solving method based on the improved monte carlo reinforcement learning method provided by the invention can further have the following characteristics: in step 3, E (-) is specifically:

where n is the length of the molecular sequence computed for the compound in Python with the open-source cheminformatics toolkit RDKit, using the extended-connectivity fingerprint (ECFP) of diameter 4.
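The expression for E(·) is not reproduced in the text above, so the sketch below assumes the standard Tanimoto coefficient: the number of shared on-bits divided by the number of on-bits present in either fingerprint. Fingerprints are modeled as sets of on-bit indices; in practice they would come from RDKit (an ECFP of diameter 4 corresponds to a Morgan fingerprint of radius 2). The function name `tanimoto` and the value returned for two empty fingerprints are choices made for this example.

```python
def tanimoto(bits_a, bits_b):
    """Standard Tanimoto coefficient of two fingerprints.

    Each fingerprint is given as an iterable of on-bit indices.
    ASSUMPTION: this is the textbook definition, not necessarily the exact
    expression used in the patent.
    """
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union
```

For example, fingerprints with on-bits {1, 2, 3} and {2, 3, 4} share two bits out of four distinct bits, giving a score of 0.5.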

Preferably, the inverse synthetic problem solving method based on the improved Monte Carlo reinforcement learning method provided by the invention may further have the following feature: in step 3, the reaction rules are sorted by improved UCT function value, and the reaction rules ranked below the top 10, which are less likely to occur, are removed (that is, the 10 highest-scoring reaction rules are kept).
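The top-10 pruning can be sketched as a simple sort-and-truncate. The function name `keep_top_rules` and the dictionary mapping rule identifiers to scores are hypothetical; only the cutoff of 10 comes from the text.

```python
def keep_top_rules(rule_scores, k=10):
    """Return the identifiers of the k highest-scoring rules, best first.

    rule_scores -- dict mapping a rule identifier to its improved UCT value
    k           -- number of rules to keep (10 in the text above)
    """
    ranked = sorted(rule_scores.items(), key=lambda item: item[1], reverse=True)
    return [rule_id for rule_id, _ in ranked[:k]]
```

If fewer than k rules are available, all of them are returned in ranked order.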

Preferably, the method for solving the inverse synthetic problem based on the improved monte carlo reinforcement learning method provided by the present invention may further include:

a standardization step: standardizing all compounds in the metabolic database;

First, all given target molecules are standardized, and biochemical reactions with known reaction information are extracted from a database (e.g., the MetaNetX database). The reaction center is then identified. The reaction center consists of the atoms whose configuration changes when the reaction occurs, that is, the atoms attached to bonds that are broken, formed, or changed in order, and the atoms whose charge or stereochemistry changes. The simplest way to control how far a reaction is abstracted from its substrate is to encode the reaction around its center. This entails compiling the list of atoms belonging to the reaction center, identifying them from the atom-atom mapping performed by the reaction decoder software, and defining the surrounding atoms by their bond distance from the center. Rather than reversing multi-product reactions as in the retrosynthetic approach, the reaction rules are used to construct an extended metabolic space around the chassis-strain metabolites. Co-substrates and co-products (e.g., water, CO2, ATP, NADP) can be omitted from the rules, on the assumption that they are available in the cell;

a reaction rule coding step: extracting all known biochemical reactions with complete reaction information from the standardized metabolic database, identifying the atoms whose configuration changes during a reaction as the reaction center by using the atom-atom mapping performed by the reaction decoder software, defining the atoms around the reaction center by bond distance, and encoding the chemical reactions as a set of reaction rules in SMARTS form, wherein the diameter around the reaction center ranges from 2 to 16 (that is, 2 to 16 bonds around the reaction center are retained);

a metabolic space expanding step: the reaction rules are applied to all compounds in the metabolic database to generate reaction rule templates and expand the metabolic space. The number of rules returned depends on the enzyme promiscuity parameter (diameter) applied to the compounds and reactions in the database. Several generated rules may belong to the same EC class, and one rule may correspond to several EC classes.

< apparatus >

Further, the present invention provides an inverse synthetic problem solving device based on the improved Monte Carlo reinforcement learning method, which is characterized by comprising:

a selection module, which takes the target compound whose inverse synthesis is to be solved as the root node, calculates the improved UCT function value of each node starting from the root node, and selects the node with the highest improved UCT function value as the optimal child node so as to determine an intermediate product, until a leaf node is reached, the leaf node corresponding to a compound found among the chassis-strain metabolites in the metabolic space; the improved UCT function is as follows:

in the formula, node $v_i$ is the i-th child of node $v$; $Q(\cdot)$ returns the cumulative value of a node; $N(\cdot)$ returns the cumulative visit count of a node; $T_i$ is the Tanimoto score of node $v_i$; and $C$ is a weight parameter;

the expansion module is used for determining a reaction rule which is not expanded by the current child node in the current metabolic space as an unexpanded action by taking the optimal child node as the node to be expanded, and then executing an expansion action on the node to be expanded to generate a new child node;

a simulation module, which performs an iterative process that starts with a state check; if all products generated during the iteration are found among the chassis-strain metabolites in the metabolic space, the final result has been obtained and a reward is returned according to the reward policy; if no final result has been obtained, a reaction rule is randomly sampled from the available transformations and applied to the current compound; the Tanimoto score is then calculated between the product $M$, obtained by the random simulated transformation of child node $v_i$, and the set $S = (S_1, S_2, \ldots, S_n)$ of chassis-strain metabolites in the metabolic space or commercially available compounds, according to the following formula:

$T_i = \min E(S_i, M) \qquad (2)$

where $E(\cdot)$ is the Tanimoto score function;

the Tanimoto score is then substituted into the improved UCT function, the reaction rules are sorted by their improved UCT function values, the low-ranked reaction rules, which are unlikely to occur, are removed, and this process is repeated until the maximum number of expansion steps or the maximum depth of the tree is reached;

an updating module, which returns the improved UCT function value or Tanimoto score obtained by the current node to its parent node to update the value and the visit count, which serve as the basis for node selection in the next iteration; and

and the control module is in communication connection with the selection module, the expansion module, the simulation module and the updating module and controls the selection module, the expansion module, the simulation module and the updating module to circularly operate and process until a circulation termination condition is reached to obtain a solved inverse synthetic result.

Preferably, the inverse synthetic problem solving device based on the improved Monte Carlo reinforcement learning method provided by the invention may further have the following feature: in the simulation module, the Tanimoto score is calculated in Python with the open-source cheminformatics toolkit RDKit, using extended-connectivity fingerprints (ECFP) of diameter 4.

Preferably, the inverse synthetic problem solving device based on the improved monte carlo reinforcement learning method provided by the invention can further have the following characteristics: in the simulation module, E (-) is specifically:

where n is the length of the molecular sequence computed for the compound in Python with the open-source cheminformatics toolkit RDKit, using the extended-connectivity fingerprint (ECFP) of diameter 4.

Preferably, the apparatus for solving an inverse synthetic problem based on the improved monte carlo reinforcement learning method provided by the present invention may further include: and the input display is in communication connection with the control module, so that a user can input an operation instruction and display the solved inverse synthesis result according to the operation instruction.

Preferably, the inverse synthetic problem solving device based on the improved Monte Carlo reinforcement learning method provided by the present invention may further include a preprocessing module, which standardizes all compounds in the metabolic database; extracts all known biochemical reactions with complete reaction information from the standardized metabolic database; identifies the atoms whose configuration changes during a reaction as the reaction center by using the atom-atom mapping performed by the reaction decoder software; defines the atoms around the reaction center by bond distance; encodes the chemical reactions as a set of reaction rules in SMARTS form; and applies the reaction rules to all compounds in the metabolic database to generate reaction rule templates.

Action and Effect of the invention

According to the method and the device for solving the inverse synthetic problem based on the improved Monte Carlo reinforcement learning method, molecular Tanimoto scores are returned in the simulation strategy of Monte Carlo reinforcement learning and are applied to UCT scores, so that reaction rules which are considered to be unreliable can be eliminated, the search space of Monte Carlo reinforcement learning is reduced, and the search efficiency is greatly improved.
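How a Tanimoto score can enter a UCT score is sketched below. The exact improved UCT formula is not reproduced in the text, so the combination used here (the classic UCT exploitation and exploration terms, with the Tanimoto score added inside the C-weighted part) is only an assumption, as are the names `improved_uct` and `select_best_child`.

```python
import math

def improved_uct(q_child, n_child, n_parent, tanimoto_score, c=1.0):
    """Score a child node; unvisited children score +inf so they are tried first.

    ASSUMPTION: the form Q/N + C * (sqrt(2 ln N(v) / N(v_i)) + T_i) is
    illustrative only and is not taken from the patent.
    """
    if n_child == 0:
        return math.inf
    exploitation = q_child / n_child                      # average value Q/N
    exploration = math.sqrt(2.0 * math.log(n_parent) / n_child)
    return exploitation + c * (exploration + tanimoto_score)

def select_best_child(children, n_parent, c=1.0):
    """Return the index of the highest-scoring child.

    children -- list of (Q, N, T) tuples: cumulative value, visit count,
                Tanimoto score of each child
    """
    scores = [improved_uct(q, n, n_parent, t, c) for q, n, t in children]
    return max(range(len(children)), key=scores.__getitem__)
```

With two children that have the same value and visit count, the one with the higher Tanimoto score is selected, which is the bias the paragraph above describes.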

Although prior-art methods achieve good results to some extent, they still have significant shortcomings in algorithmic complexity and solution accuracy. When Monte Carlo reinforcement learning is used to solve the inverse synthesis problem, it searches purely at random, which causes a large number of invalid searches, lowers the efficiency of the algorithm, and can even yield results that have no chemical meaning. Increasing the number of Monte Carlo trials alone is not enough to improve the accuracy of the algorithm.

For the UCT function in Monte Carlo reinforcement learning, assuming the tree is empty before the search, the time complexity of the search is $O\big(O(p)\log N + N\,O(V)\big)$, where $N$ is the number of searches (that is, the number of nodes in the tree once the search is complete), $O(p)$ is the complexity of the expansion operation, and $O(V)$ is the complexity of the simulation operation.

As shown in Table 1 below, the improved Monte Carlo reinforcement learning method of the invention reduces the complexity $O(p)$ of the expansion operation. Within a limited number of searches $N$, the improved UCT function allows the algorithm to carry out more effective searches; that is, the improved Monte Carlo reinforcement learning algorithm reduces the time complexity and improves the solution accuracy.

TABLE 1 Algorithm time complexity analysis

Drawings

FIG. 1 is a flow chart of an inverse synthetic problem solving method based on an improved Monte Carlo reinforcement learning method according to the present invention;

FIG. 2 is a diagram of the search process of the improved Monte Carlo tree to which the present invention relates.

Detailed Description

The following describes in detail a specific embodiment of an inverse synthetic problem solving method based on an improved monte carlo reinforcement learning method according to the present invention with reference to the drawings.

< example >

The inverse synthetic problem solving method based on the improved monte carlo reinforcement learning method provided by the embodiment comprises the following steps:

Step 1. Standardize the compounds.

(1a) Process the compounds with the SanitizeMol method from RDKit.

(1b) Remove isotopes.

(1c) Neutralize charges.

(1d) Remove stereochemistry.

(1e) Convert each compound to its International Chemical Identifier (InChI) to ensure a uniform structure representation.

Step 2. Encode the reaction rules.

(2a) Extract known biochemical reactions from the metabolic database.

(2b) Identify the reaction centers using the atom-atom mapping performed by the reaction decoder software.

(2c) Encode the reactions in SMARTS form and extract the reaction rules.

Step 3. Expand the metabolic space.

The reaction rules are applied once to all compounds in the metabolic database to generate reaction rule templates.

Step 4. Obtain the inverse reaction path recursively using the improved Monte Carlo reinforcement learning, as shown in FIGS. 1 and 2.

(4a) Selection: starting from the root node, the optimal child node is selected according to the improved UCT function. The main form of the UCT function in this patent is given by the following formula:

In the above formula, node $v_i$ is the i-th child of node $v$; $Q(\cdot)$ returns the cumulative value of a node; $N(\cdot)$ returns the cumulative visit count of a node; $T_i$ is the Tanimoto score of node $v_i$; and $C$ is a weight parameter that balances the two parts before and after the plus sign.

(4b) Expansion: the node that most needs to be expanded is chosen according to the ranking given above, and one of its not-yet-expanded actions is executed to generate a new child node.

(4c) Simulation: this is an iterative process, from the start of the state check. If the final result is obtained, the reward is based onThe policy returns the reward (or fine). If not terminal, randomly sampling a transformation from the available transformations, calculating the random simulated variation of the sub-node leaves to yield compound M and metabolite chassis strain in metabolic space or a set of commercially available chemical products S ═ S (S)1,S2,…Sn) Tanimoto score, the corresponding formula is as follows:

Ti=min E(Si,M) (2)

where E (-) is the Tanimoto score function calculated in Python using the open Source chemical informatics toolkit RDkit with an extended connection fingerprint of diameter 4. And the process is repeated. This will be performed until a maximum number of expansion steps or a maximum depth of the tree is reached.

(4d) Updating: the score obtained by the current node is returned to its parent node to update its value and visit count.
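The update step can be sketched as below. The text only states that the score is returned to the parent node; propagating it further up to the root, as in standard Monte Carlo tree search, is an assumption of this sketch, and the `Node` class is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical node holding only what the update step needs."""
    value: float = 0.0               # cumulative value Q
    visits: int = 0                  # cumulative visit count N
    parent: Optional["Node"] = None


def backpropagate(node: Node, score: float) -> None:
    """Walk from `node` up to the root, adding `score` and one visit each."""
    current = node
    while current is not None:
        current.value += score
        current.visits += 1
        current = current.parent
```

After each simulation, calling `backpropagate(child, score)` makes the accumulated values and visit counts available to the selection function in the next iteration.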

In addition, the embodiment also provides an inverse synthetic problem solving device capable of automatically realizing the method to obtain an inverse synthetic solving result, and the device comprises a preprocessing module, a selection module, an extension module, a simulation module, an updating module, an input display and a control module.

A preprocessing module: standardizing all compounds in the metabolic database; extracting all known biochemical reactions with complete reaction information from a standardized metabolic database, identifying atoms which change the configuration of the reaction as reaction centers by using atom-atom mapping executed by reaction decoder software, defining atoms around the reaction centers by bond distance, and coding the chemical reactions into a set of reaction rules by using a SMARTS form; the reaction rule is applied to all compounds in the metabolic database, generating a template for the reaction rule.

A selection module: taking the target compound whose inverse synthesis is to be solved as the root node, calculating the UCT function value of each node starting from the root node, and selecting the node with the highest UCT function value as the optimal child node, thereby determining the intermediate products, until a leaf node is reached, where a leaf node corresponds to a product present in the chassis-strain metabolites of the metabolic space; the UCT function is as follows:

In the formula, node $v_i$ is the $i$-th child of node $v$, $Q(\cdot)$ returns the cumulative value of a node, $N(\cdot)$ returns the cumulative visit count of a node, $T_i$ is the Tanimoto score of the node, and $C$ is a weight parameter.

An expansion module: taking the optimal child node as the node to be expanded, determining a reaction rule in the current metabolic space that the current child node has not yet expanded as the unexpanded action, and then executing that expansion action on the node to be expanded to generate a new child node.

A simulation module: performing an iterative process that starts with a state check; if all products generated in the iterative process are present in the chassis-strain metabolites of the metabolic space, the final result has been obtained and a reward is returned according to the reward policy; if no final result has been obtained, randomly sampling a reaction rule from the available transformations and applying it to the current compound; calculating the Tanimoto score between the strategy product M obtained by random simulation of child node $v_i$ and the set $S = (S_1, S_2, \dots, S_n)$ of chassis-strain metabolites in the metabolic space or commercially available chemical products, the corresponding formula being as follows:

$$T_i = \min E(S_i, M) \qquad (2)$$

where $E(\cdot)$ is the Tanimoto score function;

then substituting the Tanimoto score into the UCT function, sorting the reaction rules by UCT function value from high to low, eliminating the lower-ranked reaction rules that are unlikely to occur, and repeating this process until the maximum number of expansion steps or the maximum depth of the tree is reached.
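The ranking and pruning of reaction rules described for the simulation module might look like the following sketch. The cutoff `keep` and the function name `prune_rules` are hypothetical; the text does not state how many rules are retained.

```python
def prune_rules(scored_rules, keep):
    """Sort (rule, uct_value) pairs from high to low and keep the top `keep`.

    `keep` is an illustrative cutoff, not a value given in the source.
    """
    ranked = sorted(scored_rules, key=lambda pair: pair[1], reverse=True)
    return [rule for rule, _ in ranked[:keep]]


# Toy rule identifiers with made-up UCT values.
kept = prune_rules([("r1", 0.2), ("r2", 0.9), ("r3", 0.5)], keep=2)
```

In this toy input the lowest-ranked rule `r1` is eliminated and the remaining rules are returned in descending order of their UCT value.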

An update module: returning the UCT function value or Tanimoto score obtained by the current node to its parent node to update the value and visit count, which serve as the basis for node selection in the next iteration.

A control module: communicatively connected to the selection module, the expansion module, the simulation module and the update module, and controlling them to operate cyclically until a loop termination condition is reached, yielding the solved inverse synthesis result.

An input display: communicatively connected to the control module, allowing a user to input operation instructions and showing the corresponding output. For example, the input display may show the inverse synthesis result obtained by the solution according to the corresponding operation instruction, may show the progress of the solution, and may show the information generated by each module during the solution process.
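The cyclic operation coordinated by the control module can be sketched as a simple driver loop. This is a minimal illustration under stated assumptions: the four modules are passed in as plain callables, and the names `run_search`, `is_done` and `max_iterations` are hypothetical rather than taken from the patent.

```python
from typing import Any, Callable


def run_search(root: Any,
               select: Callable[[Any], Any],
               expand: Callable[[Any], Any],
               simulate: Callable[[Any], float],
               update: Callable[[Any, float], None],
               is_done: Callable[[Any], bool],
               max_iterations: int = 1000) -> int:
    """Cycle select -> expand -> simulate -> update; return iterations run."""
    iterations = 0
    while iterations < max_iterations and not is_done(root):
        leaf = select(root)        # selection module
        child = expand(leaf)       # expansion module
        score = simulate(child)    # simulation module
        update(child, score)       # update module
        iterations += 1
    return iterations
```

Keeping the modules as interchangeable callables mirrors the modular device description: the loop itself only enforces the order of the four phases and the termination condition.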

The above embodiments merely illustrate the technical solutions of the present invention. The method and device for solving the inverse synthetic problem based on the improved Monte Carlo reinforcement learning method according to the present invention are not limited to the contents described in the above embodiments, but are subject to the scope defined by the claims. Any modification, supplement or equivalent replacement made by a person skilled in the art on the basis of these embodiments falls within the scope of the invention as claimed.
