Structure search method, structure search device, and recording medium

Document No.: 587542  Published: 2021-05-25

Abstract: This technology, Structure search method, structure search device, and recording medium, was created by 佐藤博之 on 2020-11-03. A structure search method, a structure search apparatus, and a recording medium are provided. The structure search method includes: sequentially arranging, by a computer, n compound groups at each of a plurality of lattice points of a three-dimensional lattice space to create a three-dimensional structure of a compound in the three-dimensional lattice space, the n compound groups being coupled to each other in the compound; and calculating a minimum energy of an Ising model, transformed based on a constraint condition for each lattice point, by performing a ground state search using an annealing method on the Ising model, the constraint condition including: a first constraint that each compound group of the n compound groups is disposed at only one lattice point; a second constraint that the n compound groups do not overlap each other at each lattice point; and a third constraint that relates to the coupling of the n compound groups and that increases the calculated energy of the Ising model when the constraint is not satisfied.

1. A structure search method, comprising:

sequentially arranging, by a computer, n compound groups at each of a plurality of lattice points of a three-dimensional lattice space to create a three-dimensional structure of a compound in the three-dimensional lattice space, the n compound groups being coupled to one another in the compound; and

calculating a minimum energy of an Ising model transformed based on a constraint condition for each of the lattice points by performing a ground state search using an annealing method on the Ising model, the constraint condition including:

a first constraint that each compound group of the n compound groups is disposed at only one lattice point;

a second constraint that the n compound groups do not overlap with each other at each of the lattice points; and

a third constraint associated with the coupling of the n compound groups and increasing the calculated energy of the Ising model when the constraint is not satisfied.

2. The structure search method according to claim 1, wherein

the third constraint is the following constraint:

in the case where a compound group is present at a first lattice point, a compound group is present at only one lattice point of all lattice points adjacent to the first lattice point, and

in the case where no compound group is present at the first lattice point, no compound group is present at any lattice point adjacent to the first lattice point, or a compound group is present at only one lattice point of all lattice points adjacent to the first lattice point.

3. The structure search method according to claim 1, wherein

the compound is a protein, and

the compound group is an amino acid residue.

4. A non-transitory computer-readable recording medium in which a program that causes a computer to execute a process is stored, the process comprising:

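sequentially arranging n compound groups at each of a plurality of lattice points of a three-dimensional lattice space to create a three-dimensional structure of a compound in the three-dimensional lattice space, the n compound groups being coupled to each other in the compound; and

calculating a minimum energy of an Ising model transformed based on a constraint condition for each of the lattice points by performing a ground state search using an annealing method on the Ising model, the constraint condition including: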

a first constraint that each compound group of the n compound groups is disposed at only one lattice point;

a second constraint that the n compound groups do not overlap with each other at each of the lattice points; and

a third constraint associated with the coupling of the n compound groups and increasing the calculated energy of the Ising model when the constraint is not satisfied.

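5. The non-transitory computer-readable recording medium of claim 4, wherein

the third constraint is the following constraint:

in the case where a compound group is present at a first lattice point, a compound group is present at only one lattice point of all lattice points adjacent to the first lattice point, and

in the case where no compound group is present at the first lattice point, no compound group is present at any lattice point adjacent to the first lattice point, or a compound group is present at only one lattice point of all lattice points adjacent to the first lattice point.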

6. The non-transitory computer-readable recording medium of claim 4, wherein

the compound is a protein, and

the compound group is an amino acid residue.

7. A structure search apparatus comprising:

a unit configured to create a three-dimensional structure of a compound in a three-dimensional lattice space by sequentially arranging n compound groups at each of a plurality of lattice points of the three-dimensional lattice space, the n compound groups being coupled to each other in the compound; and

a unit configured to calculate a minimum energy of an Ising model transformed based on a constraint condition for each of the lattice points by performing a ground state search using an annealing method on the Ising model, the constraint condition including:

a first constraint that each compound group of the n compound groups is disposed at only one lattice point;

a second constraint that the n compound groups do not overlap with each other at each of the lattice points; and

a third constraint associated with the coupling of the n compound groups and increasing the calculated energy of the Ising model when the constraint is not satisfied.

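8. The structure search apparatus according to claim 7, wherein

the third constraint is the following constraint:

in the case where a compound group is present at a first lattice point, a compound group is present at only one lattice point of all lattice points adjacent to the first lattice point, and

in the case where no compound group is present at the first lattice point, no compound group is present at any lattice point adjacent to the first lattice point, or a compound group is present at only one lattice point of all lattice points adjacent to the first lattice point.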

9. The structure search apparatus according to claim 7, wherein

the compound is a protein, and

the compound group is an amino acid residue.

Technical Field

Embodiments discussed herein relate to a structure search method, a structure search apparatus, and a recording medium.

Background

In recent years, in fields such as drug discovery, it is sometimes necessary to obtain a stable structure of a large molecule by using an information processing apparatus (computer). However, for a large molecule such as a protein, it may be difficult to find a stable structure in a realistic time with a calculation that treats every atom explicitly.

Therefore, techniques for reducing the calculation time by roughly capturing the structure of the molecule (coarse graining) have been studied.

As a technique for coarse-graining a molecular structure, for example, the following technique has been studied: the molecular structure is coarse-grained into a linear (chain) structure on a simple cubic lattice based on the one-dimensional sequence information of the amino acid residues in the protein, and is treated as a lattice protein. A technique of searching for a stable structure of a lattice protein at high speed by using quantum annealing has been reported.

In such a technique of searching for a stable structure in a lattice protein using an annealing machine, it is difficult to satisfy a plurality of constraints simultaneously and to efficiently search for a stable structure.

Related art is disclosed in, for example, R. Babbush et al., "Construction of Energy Functions for Lattice Heteropolymer Models: A Case Study in Constraint Satisfaction Programming and Adiabatic Quantum Optimization," Advances in Chemical Physics, 155, 201-244, April 4, 2014.

In one aspect, an object of an embodiment is to provide a structure search method, a structure search program, and a structure search device capable of efficiently searching for a stable structure.

Disclosure of Invention

According to an aspect of an embodiment, a structure search method includes: sequentially arranging, by a computer, n compound groups at each of a plurality of lattice points of a three-dimensional lattice space to create a three-dimensional structure of a compound in the three-dimensional lattice space, the n compound groups being coupled to each other in the compound; and calculating a minimum energy of an Ising model transformed based on a constraint condition for each of the lattice points by performing a ground state search using an annealing method on the Ising model, the constraint condition including: a first constraint that each compound group of the n compound groups is disposed at only one lattice point; a second constraint that the n compound groups do not overlap with each other at each of the lattice points; and a third constraint associated with the coupling of the n compound groups and increasing the calculated energy of the Ising model when the constraint is not satisfied.

In one aspect of the embodiments, a structure search method, a structure search program, and a structure search apparatus for searching for a stable structure may be provided.

Drawings

Fig. 1A is a schematic diagram (part 1) showing an example in which a protein is coarse-grained to search for a stable structure.

Fig. 1B is a schematic diagram (part 2) showing an example in which a protein is coarse-grained to search for a stable structure.

Fig. 1C is a schematic diagram (part 3) showing an example in which a protein is coarse-grained to search for a stable structure.

Fig. 2A is a schematic diagram (part 1) for explaining an example of a diamond encoding method.

Fig. 2B is a diagram (part 2) for explaining an example of the diamond encoding method.

Fig. 2C is a diagram (part 3) for explaining an example of the diamond encoding method.

Fig. 2D is a diagram (part 4) for explaining an example of the diamond encoding method.

Fig. 2E is a diagram (part 5) for explaining an example of the diamond encoding method.

Fig. 3 is a diagram for explaining an example of H_one.

Fig. 4 is a diagram for explaining an example of H_olap.

Fig. 5 is a diagram for explaining an example of H_conn.

Fig. 6 is a diagram for explaining an example of H_pair.

Fig. 7 is a diagram for explaining an example of the third constraint.

Fig. 8 shows a block diagram of an example of the structure search apparatus disclosed herein.

Fig. 9 is a diagram showing a configuration example of the structure search device disclosed herein.

Fig. 10 is a diagram showing another configuration example of the structure search device disclosed herein.

Fig. 11 is a diagram showing another configuration example of the structure search device disclosed herein.

Fig. 12 is a flowchart illustrating an example of a method for searching for a stable structure of a protein.

Fig. 13 is a diagram showing a state in which each lattice set with radius r is denoted by S_r.

Fig. 14A is a diagram (part 1) showing a lattice point set of destinations of amino acid residues.

Fig. 14B is a diagram (part 2) showing a lattice point set of destinations of amino acid residues.

Fig. 14C is a diagram showing a lattice point set of destinations of amino acid residues (part 3).

Fig. 14D is a diagram showing a lattice point set of destinations of amino acid residues (part 4).

Fig. 15 is a diagram showing S₁, S₂, and S₃ in three dimensions.

Fig. 16A is a diagram (part 1) showing an example of a state in which spatial information is assigned to each of bits X₁ to X_n.

Fig. 16B is a diagram (part 2) showing an example of a state in which spatial information is assigned to each of bits X₁ to X_n.

Fig. 16C is a diagram (part 3) showing an example of a state in which spatial information is assigned to each of bits X₁ to X_n.

Fig. 17 is a diagram for explaining H_one.

Fig. 18 is a diagram for explaining H_olap.

Fig. 19A is a diagram (part 1) for explaining H_pair.

Fig. 19B is a diagram (part 2) for explaining H_pair.

Fig. 20 is a diagram showing an example of the weight file.

Fig. 21 is a diagram showing an example of the functional configuration of an optimization apparatus (control unit) used in the annealing method.

Fig. 22 is a block diagram showing an example of a circuit level of the transition control unit.

Fig. 23 is a diagram showing an example of the operation flow of the transition control unit.

Fig. 24 is a data configuration example of a storage unit of the structure search device.

Fig. 25 is a process flow corresponding to the data configuration example of fig. 24.

Detailed Description

First, before describing details of the technology disclosed herein, a method of obtaining a folding structure of a protein by a diamond encoding method, which is one of the technologies using lattice proteins, will be described.

When searching for the structure of a protein (or peptide) using a lattice protein, the protein is first coarse-grained. As shown in Fig. 1A, for example, the atoms 2 constituting the protein are grouped into coarse-grained particles 1A, 1B, and 1C, each of which corresponds to one amino acid residue, thereby creating a coarse-grained model.

The created coarse-grained model is then used to search for a stable binding structure. Fig. 1B shows an example in which the binding structure with the coarse-grained particle 1C at the end point of the arrow is stable. The stable binding structure is searched for by the diamond encoding method described later.

As shown in Fig. 1C, the coarse-grained model is restored to an all-atom model based on the stable binding structure found by using the diamond encoding method.

The diamond encoding method embeds the chain of coarse-grained particles (the coarse-grained model) representing the amino acids of a protein into the lattice points of a diamond lattice, and can express a three-dimensional protein structure.

In the following description, for the sake of simplifying the explanation, a diamond encoding method for a two-dimensional case will be described as an example.

Fig. 2A shows an example of a structure in which a linear pentapeptide having five amino acid residues bound to each other has a linear structure. In fig. 2A to 2E, the numbers in the circles indicate the numbers of amino acid residues in the linear pentapeptide.

In the diamond encoding method, when the amino acid residue numbered 1 is first arranged at the center of the diamond lattice, as shown in Fig. 2B, the positions at which the amino acid residue numbered 2 can be arranged are limited to the positions adjacent to the center (the positions given the number 2).

Subsequently, the position at which the amino acid residue numbered 3 to be bound to the amino acid residue numbered 2 can be disposed is limited to the position adjacent to the position given the number 2 in fig. 2B (the position given the number 3 in fig. 2C).

The position at which the amino acid residue numbered 4 to be bound to the amino acid residue numbered 3 can be disposed is limited to a position adjacent to the position given the number 3 in FIG. 2C (the position given the number 4 in FIG. 2D).

The position at which the amino acid residue numbered 5 to be bound to the amino acid residue numbered 4 can be disposed is limited to a position adjacent to the position given the number 4 in FIG. 2D (the position given the number 5 in FIG. 2E).

By linking the positions thus designated as arrangeable positions in the order of the numbering of the amino acid residues, the coarse-grained structure of the protein can be expressed.
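The step-by-step restriction of arrangeable positions described above can be sketched as follows. This is only an illustration under an assumption that is not stated in the text: the two-dimensional case is modeled as a square lattice in which every point has four neighbors. The function names are hypothetical.

```python
# Sketch only: the two-dimensional case is modeled here as a square lattice
# in which every point has four neighbours (an assumption of this example).
NEIGHBOR_STEPS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def neighbors(point):
    """Return the four lattice points adjacent to `point`."""
    x, y = point
    return {(x + dx, y + dy) for dx, dy in NEIGHBOR_STEPS}


def candidate_positions(n_residues):
    """candidates[k] holds every point where residue k + 1 could be placed."""
    candidates = [{(0, 0)}]  # residue 1 sits at the centre of the lattice
    for _ in range(n_residues - 1):
        reachable = set()
        for point in candidates[-1]:
            reachable |= neighbors(point)  # next residue must be adjacent
        candidates.append(reachable)
    return candidates


candidates = candidate_positions(5)
for number, points in enumerate(candidates, start=1):
    print(f"residue {number}: {len(points)} candidate positions")
```

In this sketch the candidate set of residue 3 still contains the center, which residue 1 already occupies; excluding such overlaps is left to the constraints, not to the enumeration.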

The minimum energy of the Ising model is calculated by performing a ground state search using an annealing method on the Ising model transformed based on the constraint conditions on the coarse-grained structure of the protein. By doing so, a stable structure of the protein can be obtained.

When H_one, H_olap, and H_conn are set as constraint conditions and H_pair is set as a cost function, the total energy in the diamond encoding method can be expressed as follows.

E(x) = H = H_one + H_olap + H_conn + H_pair

H_one denotes the constraint that the protein includes only one instance of each of the first to nth amino acid residues.

H_olap denotes the constraint that the first to nth amino acid residues do not overlap with each other.

H_conn denotes the constraint that the first to nth amino acid residues are linked to each other.

H_pair denotes a cost function expressing the interaction between amino acid residues.

Regarding the Hamiltonian H_one, in the case where two instances of the same amino acid residue are present as shown in Fig. 3, the Hamiltonian H_one represented by the following formula (A) is positive. In other words, in that case, the Hamiltonian H_one for this constraint increases the total energy.

H_one += C₁ q_i q_j   Formula (A)

C₁ is a weighting coefficient and is a positive integer. q_i takes "1" or "0", and q_j takes "1" or "0".

Regarding the Hamiltonian H_olap, in the case where specific amino acid residues overlap at a specific lattice point as shown in Fig. 4, the Hamiltonian H_olap represented by the following formula (B) is positive. In other words, in that case, the Hamiltonian H_olap for this constraint increases the total energy.

H_olap += C₂ q_i q_j   Formula (B)

C₂ is a weighting coefficient and is a positive integer. q_i takes "1" or "0", and q_j takes "1" or "0".

Regarding the Hamiltonian H_conn, in the case where two adjacent amino acid residues are linked as shown in Fig. 5, the Hamiltonian H_conn represented by the following formula (C) is negative. In other words, in that case, the Hamiltonian H_conn for this constraint reduces the total energy.

H_conn -= C₃ q_i q_j   Formula (C)

C₃ is a weighting coefficient and is a positive integer. q_i takes "1" or "0", and q_j takes "1" or "0".

Regarding the Hamiltonian H_pair, in the case where two adjacent amino acid residues interact with each other as shown in Fig. 6, the Hamiltonian H_pair represented by the following formula (D) is positive. In other words, in that case, the Hamiltonian H_pair increases the total energy.

H_pair += E₁₄ q_i q_j   Formula (D)

E₁₄ is a coefficient related to the interaction and is a positive integer. q_i takes "1" or "0", and q_j takes "1" or "0". The interaction is determined by the combination of the two amino acid residues, for example with reference to the Miyazawa-Jernigan (MJ) matrix.
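Formulas (A) to (D) all add or subtract a weighted product q_i q_j, so they can be accumulated into a single table of pair coefficients. The following minimal sketch shows that bookkeeping. The weight values, the bit indices, and the helper names `add_pair` and `energy` are hypothetical; the text only requires the coefficients to be positive integers and does not say which bit pairs each term applies to.

```python
from collections import defaultdict

# Hypothetical weights; the text only requires them to be positive integers.
C1, C2, C3, E14 = 10, 10, 5, 2


def add_pair(qubo, i, j, weight):
    """Accumulate `weight` on the coefficient of the product q_i * q_j."""
    key = (min(i, j), max(i, j))
    qubo[key] += weight


def energy(qubo, q):
    """Evaluate the sum of weight * q_i * q_j over all stored pairs."""
    return sum(w * q[i] * q[j] for (i, j), w in qubo.items())


qubo = defaultdict(int)
add_pair(qubo, 0, 1, +C1)   # formula (A): H_one  += C1 * q_i * q_j
add_pair(qubo, 0, 2, +C2)   # formula (B): H_olap += C2 * q_i * q_j
add_pair(qubo, 1, 2, -C3)   # formula (C): H_conn -= C3 * q_i * q_j
add_pair(qubo, 1, 3, +E14)  # formula (D): H_pair += E14 * q_i * q_j

print(energy(qubo, [1, 1, 0, 0]))  # only the formula (A) pair is active
print(energy(qubo, [0, 1, 1, 0]))  # only the formula (C) pair is active
```

With these illustrative pairs, a state that activates only the H_conn pair has negative energy, while a state that activates only the H_one pair has positive energy, which is the sign behavior described for formulas (A) and (C).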

H_one and H_olap increase the total energy when the corresponding constraint is not satisfied. In other words, H_one and H_olap are constraints that destabilize the protein structure when they are not satisfied.

In contrast, H_conn generally reduces the total energy when its constraint is satisfied. In other words, H_conn is a constraint that stabilizes the protein structure when it is satisfied.

Thus, H_one and H_olap on the one hand and H_conn on the other are dependent on each other, and when one constraint is satisfied, the other is unlikely to be satisfied. As a result, it is difficult to efficiently search for a stable structure.

With the disclosed technique, by contrast, the relationships among H_one, H_olap, and H_conn are independent, and all of the constraints can be satisfied. In other words, a plurality of constraints can be satisfied simultaneously, so that a stable structure can be searched for efficiently.

(Structure search method and Structure search apparatus)

The structure search method disclosed herein is a method for searching for a stable structure of a compound in which n compound groups are coupled.

The structure search method is a method using a computer.

The structure search method includes a process of creating a three-dimensional structure and a process of calculating minimum energy, and further includes additional processes according to requirements.

The structure search apparatus disclosed herein includes a unit to create a three-dimensional structure and a unit to calculate minimum energy, and further includes additional units according to requirements.

The structure search apparatus includes, for example, a memory and a processor, and further includes additional units as necessary.

The processor is coupled to the memory.

The processor is configured to perform a process of creating a three-dimensional structure.

The processor is configured to perform a process of calculating a minimum energy.

The processor is, for example, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), or a combination of a central processing unit and a graphics processing unit.

In the process of creating a three-dimensional structure, n compound groups are sequentially arranged at each lattice point of a three-dimensional lattice space as a lattice set to create a three-dimensional structure of a compound in the three-dimensional lattice space.

The unit for creating a three-dimensional structure sequentially arranges n compound groups at each lattice point of a three-dimensional lattice space as a lattice set, and creates a three-dimensional structure of a compound in the three-dimensional lattice space.

The compound group is, for example, an amino acid residue.

In the case where the compound group is an amino acid residue, examples of the compound include proteins.

The amino acid to be the source of the amino acid residue may be a natural amino acid or an artificial amino acid. Examples of natural amino acids include alanine, arginine, asparagine, aspartic acid, cysteine, glutamine, glutamic acid, glycine, histidine, isoleucine, leucine, lysine, methionine, phenylalanine, proline, serine, threonine, tryptophan, tyrosine, valine, β -alanine, β -phenylalanine, and the like. Examples of the artificial amino acid include p-hydroxybenzoylphenylalanine and the like.

The number of amino acid residues in the protein is not particularly limited and may be appropriately selected according to the purpose, and for example, may be about 10 to 30, or may be several hundred.

For example, when the protein is a target of middle-molecule drug discovery, the number may be about 10 to 30.

Next, in the process of calculating the minimum energy, a ground state search using an annealing method is performed on the Ising model transformed based on the constraint condition for each lattice point, thereby calculating the minimum energy of the Ising model.

The unit for calculating the minimum energy performs a ground state search using an annealing method on the Ising model transformed based on the constraint condition for each lattice point, thereby calculating the minimum energy of the Ising model.
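As a rough illustration of a ground state search by annealing, the sketch below runs classical simulated annealing on a small table of pair coefficients over 0/1 bits. It is a software stand-in chosen for this example and does not model the annealing machine of the disclosure; the cooling schedule, parameter values, and function names are all assumptions.

```python
import math
import random


def qubo_energy(qubo, state):
    """Energy of a 0/1 state; a key (i, i) acts as a linear term on bit i."""
    return sum(w * state[i] * state[j] for (i, j), w in qubo.items())


def simulated_annealing(qubo, n_bits, sweeps=200, t_start=5.0, t_end=0.05, seed=0):
    """Single-bit-flip simulated annealing; returns the best state seen."""
    rng = random.Random(seed)
    state = [0] * n_bits
    current = qubo_energy(qubo, state)
    best_state, best = state[:], current
    for sweep in range(sweeps):
        # Geometric cooling from t_start down to t_end.
        temperature = t_start * (t_end / t_start) ** (sweep / max(sweeps - 1, 1))
        for k in range(n_bits):
            state[k] ^= 1  # propose flipping bit k
            proposed = qubo_energy(qubo, state)
            delta = proposed - current
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                current = proposed
                if current < best:
                    best_state, best = state[:], current
            else:
                state[k] ^= 1  # reject: undo the flip
    return best_state, best


# Toy problem: each bit lowers the energy alone, but setting both is penalised.
toy_qubo = {(0, 0): -1, (1, 1): -1, (0, 1): 3}
best_state, best_energy = simulated_annealing(toy_qubo, n_bits=2)
print(best_state, best_energy)
```

Recomputing the full energy after every flip keeps the sketch short; a practical implementation would update only the energy difference of the flipped bit.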

The constraint conditions include a first constraint, a second constraint, and a third constraint.

The first constraint is that each of the n compound groups is disposed at only one lattice point.

The second constraint is that the n compound groups do not overlap each other at each lattice point.

The third constraint is a constraint related to the coupling of the n compound groups, and is a constraint that increases the calculated energy of the Ising model when the constraint is not satisfied.

The third constraint is, for example, a constraint represented by the following (1) and (2).

(1) In the case where a compound group is present at a specific lattice point, the compound group is present at only one lattice point among all lattice points adjacent to the lattice point.

(2) In the case where no compound group is present at a specific lattice point, no compound group is present at all lattice points adjacent to the lattice point, or a compound group is present at only one lattice point among all lattice points adjacent to the lattice point.

An example of the third constraint may be represented by the following formula (E). This example is for the two-dimensional case using the diamond encoding method.

H += C(Q - q₀)(Q - 1)   Formula (E)

Here, Q is the sum of q_i over the positions i in η(q₀).

In this formula, C is a weighting coefficient and is a positive integer. Each of q₀, q₁, q₂, q₃, and q₄ takes "1" or "0". q₀, q₁, q₂, q₃, and q₄ are in the positional relationship shown in Fig. 7.

η(q₀) denotes the set of positions of compound groups that are adjacent to q₀ and coupled to q₀.

The case where q₀ is "1" is the case where a compound group is present at a specific lattice point. When q₀ is "1", H is "0" only when Q is "1". In the positional relationship shown in Fig. 7, Q is "1" when q₁ + q₂ + q₃ + q₄ = 1, in other words, when only one of q₁, q₂, q₃, and q₄ is "1". Therefore, H is "0" in the case where a compound group is present at only one lattice point among all lattice points adjacent to the specific lattice point.

The case where q₀ is "0" is the case where no compound group is present at a specific lattice point. When q₀ is "0", H is "0" when Q is "0" or when Q is "1". In the positional relationship shown in Fig. 7, H is "0" when q₁ + q₂ + q₃ + q₄ is 0 or 1, in other words, when q₁, q₂, q₃, and q₄ are all "0" or when only one of q₁, q₂, q₃, and q₄ is "1". Thus, H is "0" in the case where no compound group is present at any lattice point adjacent to the specific lattice point, or in the case where a compound group is present at only one lattice point among all lattice points adjacent to the specific lattice point.
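The case analysis above can be checked exhaustively for the five bits of Fig. 7. The sketch below assumes, as the text implies, that Q is the sum q₁ + q₂ + q₃ + q₄ of the neighboring bits; the function names are hypothetical.

```python
from itertools import product


def third_constraint_penalty(q0, neighbor_bits, c=1):
    """Formula (E): C * (Q - q0) * (Q - 1), with Q the sum of the neighbour bits."""
    big_q = sum(neighbor_bits)
    return c * (big_q - q0) * (big_q - 1)


def constraint_holds(q0, neighbor_bits):
    """Conditions (1) and (2) of the third constraint, stated directly."""
    occupied = sum(neighbor_bits)
    if q0 == 1:
        return occupied == 1       # exactly one occupied neighbour
    return occupied in (0, 1)      # no neighbour, or exactly one


# Enumerate all 2^5 assignments of q0, q1, q2, q3, q4.
for q0, *rest in product((0, 1), repeat=5):
    penalty = third_constraint_penalty(q0, rest)
    assert penalty >= 0
    assert (penalty == 0) == constraint_holds(q0, rest)

print(third_constraint_penalty(1, [0, 1, 0, 0]))  # constraint satisfied
print(third_constraint_penalty(1, [1, 1, 0, 0]))  # two occupied neighbours
```

The loop confirms that, under this reading of Q, the penalty is never negative and vanishes exactly when condition (1) or (2) holds.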

FIG. 8 shows a block diagram of an example of the disclosed structure search apparatus.

The structure search apparatus 10 of fig. 8 includes a unit 51 for creating a three-dimensional structure and a unit 52 for calculating minimum energy.

Fig. 9 shows a configuration example of the disclosed structure search device.

The structure search device 10 is configured, for example, in such a manner that the control unit 11, the memory 12, the storage unit 13, the display unit 14, the input unit 15, the output unit 16, the I/O interface unit 17, and the like are coupled via the system bus 18.

The control unit 11 is a processor that performs arithmetic operations (four arithmetic operations, comparison operations, and the like), operation control of hardware and software, and the like.

The memory 12 is a memory such as a Random Access Memory (RAM), a Read Only Memory (ROM), or the like. The RAM stores an Operating System (OS), application programs, and the like read out from the ROM and storage unit 13, and serves as a main memory and a work area of the control unit 11.

The storage unit 13 is a device for storing various programs and data, and is, for example, a hard disk. The storage unit 13 stores a program to be executed by the control unit 11, data to be used for executing the program, an OS, and the like.

The program is stored in the storage unit 13, loaded into a RAM (main memory) of the memory 12, and executed by the control unit 11.

The display unit 14 is a display device, and is, for example, a display apparatus such as a CRT monitor, a liquid crystal panel, or the like.

The input unit 15 is an input device for various data, and is, for example, a keyboard, a pointing device (e.g., a mouse, etc.), or the like.

The output unit 16 is an output device for various data, and is, for example, a printer.

The I/O interface unit 17 is an interface for coupling various external devices. For example, the I/O interface unit 17 allows input and output of data of a compact disc read only memory (CD-ROM), a digital versatile disc read only memory (DVD-ROM), a magneto-optical (MO) disc, a Universal Serial Bus (USB) memory, and the like.

Fig. 10 shows another configuration example of the disclosed structure search device.

The configuration example of Fig. 10 is a cloud-type configuration example that includes the elements 11 to 18 described with reference to Fig. 9, in which the control unit 11 is independent of the storage unit 13 and the like. In this configuration example, a computer housing the control unit 11 is coupled, via the network interface units 19 and 20, to the computer 30 housing the storage unit 13 and the like.

The network interface units 19 and 20 are hardware that performs communication using the internet.

Fig. 11 shows another configuration example of the disclosed structure search device.

The configuration example of fig. 11 is a cloud type configuration example, and includes the elements 11 to 18 described with reference to fig. 9, and the storage unit 13 is independent of the control unit 11 and the like. In this configuration example, the computer 30 storing the control unit 11 and the like and the computer 40 storing the storage unit 13 are coupled via the network interface units 19 and 20.

Hereinafter, examples of the disclosed technology will be described with reference to flowcharts and the like.

FIG. 12 is a flow chart for searching for stable structures of proteins.

< step S101>

First, a three-dimensional lattice space, which is a lattice set in which a plurality of amino acid residues are sequentially arranged, is defined based on the number (n) of amino acid residues (S101).

Examples of defining a three-dimensional lattice space will now be described. The lattice space is three-dimensional, but hereinafter, the case of two-dimensional is described for the sake of simplicity.

First, a lattice set having a radius r in the diamond lattice space is called a shell, and the lattice points of each shell are denoted S_r. Each S_r can be represented as shown in Fig. 13.

For example, the sets V₁ to V₅ of lattice points of the destinations of the first to fifth amino acid residues are as shown in Figs. 14A to 14D. In Figs. 14A to 14D, the character V of V₁ to V₅ is omitted and only the numerical subscripts are shown.

In Fig. 14A, V₁ = S₁ and V₂ = S₂.

In Fig. 14B, V₃ = S₃.

In Fig. 14C, V₄ = S₂ or S₄.

In Fig. 14D, V₅ = S₃ or S₅.

S₁, S₂, and S₃ are represented in three dimensions as shown in Fig. 15. In Fig. 15, A = S₁, B = S₂, and C = S₃.

The space V_i required for the i-th amino acid residue in a protein having n amino acid residues is expressed by the following formula.

V_i = ∪_{r∈J} S_r

Here, J = {1, 3, ..., i} in the case of odd-numbered amino acid residues (i is odd), and J = {2, 4, ..., i} in the case of even-numbered amino acid residues (i is even).
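The relationship between the shells S_r and the destination sets V_i can be illustrated with a short sketch. It assumes that V_i is the union of the shells S_r over r in J, and it models the simplified two-dimensional case as a square lattice in which S_r is the set of points at Manhattan distance r - 1 from the center, so that S₁ is the center itself. Both the lattice model and the function names are assumptions of this example.

```python
def shell(r):
    """S_r: points at Manhattan distance r - 1 from the origin (S_1 = centre)."""
    d = r - 1
    return {
        (x, y)
        for x in range(-d, d + 1)
        for y in range(-d, d + 1)
        if abs(x) + abs(y) == d
    }


def destination_set(i):
    """V_i: union of S_r over r in J, where J has the same parity as i."""
    start = 1 if i % 2 == 1 else 2  # J = {1, 3, ..., i} or {2, 4, ..., i}
    points = set()
    for r in range(start, i + 1, 2):
        points |= shell(r)
    return points


for i in range(1, 6):
    print(f"V_{i}: {len(destination_set(i))} lattice points")
```

Under these assumptions V₄ is the union of S₂ and S₄, which corresponds to the description of Fig. 14C.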

< step S102>

Next, the set of lattice points of the destination of the ith amino acid residue is set to V_i(S102)。

This defines the space into which each amino acid residue can enter.

< step S103>

Next, a bit is assigned to each lattice point. In other words, spatial information is assigned to each of bits X₁ to X_n (S103). Specifically, as shown in Figs. 16A to 16C, bits are assigned to the space into which each amino acid residue can enter, where a bit is "1" in the case where the amino acid residue is present at the position and "0" in the case where it is not. In Figs. 16A to 16C, a plurality of X_i are assigned to each of the amino acid residues 2 to 4, but in practice one bit X_i is assigned to one amino acid residue.

< step S104>

Next, H_one, H_olap, H_conn, and H_pair are set to create an Ising model transformed based on the constraint conditions for each lattice point (S104).

In the diamond coding method, the entire energy can be expressed as follows.

E(x)＝H＝H_one+H_olap+H_conn+H_pair

H_one represents the constraint (first constraint) that the protein includes exactly one instance of each of the first to n-th amino acid residues.

H_olap represents the constraint (second constraint) that the first to n-th amino acid residues do not overlap with one another.

H_conn represents the third constraint.

H_pair is a cost function representing the interaction between amino acid residues.

Examples of H_one, H_olap, and H_pair are as follows.

In Figs. 17, 18, 19A, and 19B described below, X₁ indicates the position at which the amino acid residue numbered 1 can be placed.

X₂ to X₅ indicate the positions at which the amino acid residue numbered 2 can be placed.

X₆ to X₁₃ indicate the positions at which the amino acid residue numbered 3 can be placed.

X₁₄ to X₂₉ indicate the positions at which the amino acid residue numbered 4 can be placed.
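The bit numbering above follows from counting candidate positions per residue. The following is a minimal sketch of that bookkeeping; the counts 1, 4, 8, and 16 are read off the index ranges listed above, and the helper name `bit_ranges` is hypothetical.

```python
from itertools import accumulate

# Candidate-position counts per residue, read off the ranges above
# (residue 1: X1; residue 2: X2..X5; residue 3: X6..X13; residue 4: X14..X29).
candidates_per_residue = [1, 4, 8, 16]

def bit_ranges(counts):
    """Return 1-based inclusive (first, last) bit indices for each residue."""
    ends = list(accumulate(counts))
    starts = [1] + [end + 1 for end in ends[:-1]]
    return list(zip(starts, ends))

ranges = bit_ranges(candidates_per_residue)
for residue, (first, last) in enumerate(ranges, start=1):
    print(f"residue {residue}: X{first}..X{last}")
```

The total number of bits for these four residues is the last end index, 29.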

An example of H_one is described below.

In the function H_one, X_a and X_b each take the value "1" or "0". In Fig. 17, only one of X₂, X₃, X₄, and X₅ may be "1", so H_one is a function whose energy increases when any two or more of them are "1" and whose penalty is 0 when only one of X₂, X₃, X₄, and X₅ is "1".

In this function, λ_one is a weighting coefficient.
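The behaviour described for H_one can be illustrated with a pairwise penalty. This is only a sketch: the function for H_one is not reproduced in this text, so the form used here, λ_one times the sum of X_a·X_b over pairs of bits belonging to the same residue, is an assumption that matches the description, and the names `h_one` and `groups` are hypothetical.

```python
from itertools import combinations

def h_one(bits, groups, lam_one=1.0):
    """Assumed form: lam_one for every pair of set bits within one residue's group."""
    return lam_one * sum(
        bits[a] * bits[b]
        for group in groups
        for a, b in combinations(group, 2)
    )

# Bits are keyed by their 1-based index; residue 2 owns X2..X5 (Fig. 17).
groups = [[2, 3, 4, 5]]
one_hot = {2: 0, 3: 1, 4: 0, 5: 0}   # exactly one position chosen: no penalty
two_set = {2: 1, 3: 1, 4: 0, 5: 0}   # two positions chosen: one penalised pair

print(h_one(one_hot, groups), h_one(two_set, groups))
```

Note that this assumed form also gives zero when no bit in a group is set, so by itself it only penalises multiple placements of the same residue.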

An example of H_olap is described below.

In the function H_olap, X_a and X_b each take the value "1" or "0". In Fig. 18, H_olap is a term that generates a penalty when X₁₄ is "1" while X₂ is "1".

In this function, λ_olap is a weighting coefficient.
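The overlap penalty can be sketched in the same style. The function for H_olap is not reproduced in this text, so the form below (λ_olap for every pair of set bits that belong to different residues but refer to the same lattice point) is an assumption. The coordinates in `site_of` are made up for illustration; only the fact that X₂ and X₁₄ share a lattice point is taken from the Fig. 18 example.

```python
from itertools import combinations

def h_olap(bits, site_of, residue_of, lam_olap=1.0):
    """Assumed form: penalise set bits of different residues on the same lattice point."""
    return lam_olap * sum(
        bits[a] * bits[b]
        for a, b in combinations(sorted(bits), 2)
        if residue_of[a] != residue_of[b] and site_of[a] == site_of[b]
    )

# Hypothetical coordinates; X2 (residue 2) and X14 (residue 4) coincide.
site_of = {2: (1, 1, 1), 3: (1, -1, -1), 14: (1, 1, 1), 15: (2, 2, 0)}
residue_of = {2: 2, 3: 2, 14: 4, 15: 4}

overlapping = {2: 1, 3: 0, 14: 1, 15: 0}
separate = {2: 1, 3: 0, 14: 0, 15: 1}
print(h_olap(overlapping, site_of, residue_of), h_olap(separate, site_of, residue_of))
```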

An example of H_pair is described below.

In the function H_pair, X_a and X_b each take the value "1" or "0". In Figs. 19A and 19B, H_pair is a function in which, when X₁₅ is "1" while X₁ is "1", the interaction P_{ω(x1)ω(x15)} acts between the amino acid residue at X₁ and the amino acid residue at X₁₅ so that the energy decreases. The interaction P_{ω(x1)ω(x15)} is determined by the combination of the two amino acid residues, for example with reference to the Miyazawa-Jernigan (MJ) matrix or the like.
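A sketch of how such an interaction term could be evaluated follows. The function for H_pair is not reproduced in this text, so the form is an assumption: for each pair of bits whose lattice points are adjacent, the interaction value for the two residue types is added when both bits are "1". The residue type labels "A" and "B" and the value -1.0 are placeholders, not entries of the MJ matrix.

```python
def h_pair(bits, adjacent_pairs, residue_type, interaction, lam_pair=1.0):
    """Assumed form: add the interaction value for each adjacent pair of set bits."""
    total = 0.0
    for a, b in adjacent_pairs:
        p = interaction[frozenset((residue_type[a], residue_type[b]))]
        total += lam_pair * p * bits[a] * bits[b]
    return total

# Placeholder data: X1 and X15 are treated as adjacent, as in Figs. 19A and 19B.
adjacent_pairs = [(1, 15)]
residue_type = {1: "A", 15: "B"}
interaction = {frozenset(("A", "B")): -1.0}   # placeholder, not an MJ value

both_present = {1: 1, 15: 1}
one_absent = {1: 1, 15: 0}
print(h_pair(both_present, adjacent_pairs, residue_type, interaction))
print(h_pair(one_absent, adjacent_pairs, residue_type, interaction))
```

A negative interaction value lowers the energy only when both positions are occupied, which is the behaviour described for Figs. 19A and 19B.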

The third constraint (H_conn) is a constraint on the coupling of the n compound groups, and it increases the calculated energy of the Ising model when it is not satisfied.

The first constraint and the second constraint increase the total energy when the respective constraints are not satisfied. In other words, the first constraint and the second constraint are constraints that destabilize the structure of the protein when the respective constraints are not satisfied.

When the third constraint is not satisfied, the total energy is increased. The third constraint is a constraint that destabilizes the protein structure when the third constraint is not satisfied.

If the third constraint were instead a constraint that reduces the total energy when it is satisfied, the first and second constraints and the third constraint would depend on one another, so that when one constraint is satisfied the others are unlikely to be satisfied. In that case it is difficult to efficiently search for a stable structure.

In the disclosed technique, however, the third constraint is a constraint that increases the total energy when it is not satisfied. Therefore, the relationship between the first and second constraints and the third constraint becomes independent, and all the constraints can be satisfied. In other words, a stable structure can be searched for efficiently by satisfying a plurality of constraints simultaneously.

The third constraint is, for example, a constraint represented by the following (1) and (2).

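(1) In the case where a compound group is present at a specific lattice point, a compound group is present at only one lattice point among all lattice points adjacent to that lattice point.

(2) In the case where a compound group is not present at a specific lattice point, either no compound group is present at any lattice point adjacent to that lattice point, or a compound group is present at only one lattice point among all lattice points adjacent to that lattice point.

Subsequently, the weighting coefficients in the respective functions above (e.g., λ_one, λ_olap, λ_conn, and λ_pair) are extracted, and a corresponding weight file is created for optimization by calculation using the energy equation of the Ising model given below. The weight file is, for example, a matrix; for 2X₁X₂ + 4X₂X₃, it is a file of the matrix shown in Fig. 20.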
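As an illustration of how the coefficients of 2X₁X₂ + 4X₂X₃ can be held in a matrix, the sketch below stores each coefficient at the row and column of its two variables. The layout of Fig. 20 is not reproduced in this text, so the upper-triangular layout and the helper names are assumptions; a symmetric layout would be equally plausible.

```python
def weight_matrix(terms, n):
    """Build an n x n upper-triangular coefficient matrix from {(i, j): coeff},
    where i < j are 1-based variable indices."""
    w = [[0] * n for _ in range(n)]
    for (i, j), coeff in terms.items():
        w[i - 1][j - 1] = coeff
    return w

def evaluate(w, x):
    """Evaluate the quadratic form: sum of w[i][j] * x[i] * x[j] over i < j."""
    n = len(x)
    return sum(w[i][j] * x[i] * x[j] for i in range(n) for j in range(i + 1, n))

W = weight_matrix({(1, 2): 2, (2, 3): 4}, n=3)
for row in W:
    print(row)
print(evaluate(W, [1, 1, 1]))
```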

By using the created weight file, the following energy equation of the Ising model can be expressed.

In this equation, the states X_i and X_j are each "0" or "1", where "0" means that no amino acid residue is present and "1" means that an amino acid residue is present. W_ij in the first term on the right-hand side is a weighting coefficient.

The first term on the right-hand side is the sum, over all combinations of two circuits that can be selected from all circuits, without omission or duplication, of the product of the states of the two circuits and their weight value.

The second term on the right-hand side is the sum of the products of the state of each circuit and its bias value. b_i denotes the bias value of the i-th circuit.
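The two terms described above can be evaluated as in the sketch below. The energy equation itself is not reproduced in this text, so the overall sign is exposed as a parameter: `sign=1.0` adds the two sums as described, while some formulations place a minus sign in front of both terms. The function name and the sample values of `w` and `b` are hypothetical.

```python
def ising_energy(x, w, b, sign=1.0):
    """Pair term over all i < j (each pair counted once) plus bias term.
    The overall sign convention is an assumption."""
    n = len(x)
    pair_term = sum(w[i][j] * x[i] * x[j] for i in range(n) for j in range(i + 1, n))
    bias_term = sum(b[i] * x[i] for i in range(n))
    return sign * (pair_term + bias_term)

w = [[0, 2, 0], [0, 0, 4], [0, 0, 0]]   # weights W_ij for i < j
b = [1, 0, -1]                          # bias values b_i

print(ising_energy([1, 1, 0], w, b))
print(ising_energy([1, 1, 0], w, b, sign=-1.0))
```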

< step S105>

Next, in the annealing machine, a ground state search using an annealing method is performed on the Ising model transformed based on the constraint conditions for each lattice point, thereby calculating the minimum energy of the Ising model (S105).

The annealing machine is not particularly limited as long as it is a computer that performs a ground state search on an energy function represented by an Ising model using an annealing method, and may be appropriately selected according to purpose. Examples of the annealing machine include a quantum annealing machine, a semiconductor annealing machine using semiconductor technology, and a machine that performs simulated annealing in software using a CPU, a Graphics Processing Unit (GPU), or the like. As the annealing machine, for example, a Digital Annealer (registered trademark) can be used.

Examples of the annealing method and the annealing machine will be described below.

The annealing method is a method of obtaining a solution stochastically by using random numbers or the superposition of qubits. Hereinafter, the problem of minimizing the value of the evaluation function to be optimized will be described as an example, and the value of the evaluation function is referred to as energy. When the value of the evaluation function is to be maximized, the sign of the evaluation function may be changed.

First, starting from an initial state in which each variable is assigned one discrete value, a state close to the current state (the current combination of variable values), for example a state in which only one of the variables has been changed, is selected, and the transition to that state is considered. The energy change associated with the state transition is calculated, and whether to adopt the state transition and change the current state or to keep the original state without adopting it is determined stochastically according to the calculated value. When the adoption probability of a state transition that decreases the energy is set larger than that of a state transition that increases the energy, the state changes on average in the direction in which the energy decreases, so the state can be expected to move to a more appropriate state as time passes. Therefore, an optimal solution, or an approximate solution whose energy is close to the optimal value, may finally be obtained.

If state transitions that decrease the energy were adopted deterministically and state transitions that increase the energy were never adopted, the energy would decrease monotonically in a broad sense over time, but no further change would occur once a local solution is reached. Since a discrete optimization problem as described above has a very large number of local solutions, in many cases the state becomes stuck at a local solution that is not very close to the optimal value. Therefore, in solving a discrete optimization problem, it is important to determine stochastically whether to adopt a state transition.

In the annealing method, it has been proved that the state reaches the optimal solution in the limit of infinite time (number of iterations) when the adoption (acceptance) probability of a state transition is determined as follows.

Hereinafter, a method of determining an optimal solution using an annealing method will be described in order.

For the energy change (energy decrease) value (-ΔE) associated with a state transition, the acceptance probability p of the state transition is determined by any of the following functions f().

p(ΔE, T) = f(-ΔE/T) (Equation 1-1)

f_metro(x) = min(1, e^x) (Metropolis method) (Equation 1-2)
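Equations 1-1 and 1-2 translate directly into code. The following is a minimal sketch; the function names are hypothetical, and the early return for x >= 0 is only a guard against overflow of the exponential, not something stated in the text.

```python
import math
import random

def f_metro(x):
    """Equation 1-2: min(1, e^x), written so that large x cannot overflow."""
    return 1.0 if x >= 0 else math.exp(x)

def acceptance_probability(delta_e, temperature):
    """Equation 1-1 with f = f_metro: p = f(-delta_e / T)."""
    return f_metro(-delta_e / temperature)

def accept(delta_e, temperature, rng=random):
    """Accept the transition with probability p."""
    return rng.random() < acceptance_probability(delta_e, temperature)

print(acceptance_probability(-1.0, 1.0))   # energy decreases: always accepted
print(acceptance_probability(1.0, 1.0))    # energy increases: accepted with e^-1
```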

T is a parameter called a temperature value and may be changed, for example, as follows.

As expressed by the following Equation (2), the temperature value T is decreased logarithmically with respect to the number of iterations t.

T₀ denotes the initial temperature value, which is desirably set to a sufficiently large value depending on the problem.
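The schedule equation is not reproduced in this text. As an illustration only, the sketch below uses one logarithmic form, T = T₀·log(c)/log(t + c), of the kind that appears in convergence arguments for simulated annealing; both this exact form and the constant c are assumptions.

```python
import math

def log_temperature(t, t0, c=2.0):
    """Assumed logarithmic schedule: T = t0 * log(c) / log(t + c), so T(0) = t0."""
    return t0 * math.log(c) / math.log(t + c)

for t in (0, 10, 1000, 100000):
    print(t, log_temperature(t, t0=10.0))
```

With this form the temperature starts at T₀ and falls very slowly, which is why T₀ has to be chosen with the scale of the energy changes of the problem in mind.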

When the acceptance probability represented by formula (1) is used and a steady state is reached after a sufficient number of iterations, the occupancy probability of each state follows the Boltzmann distribution for the thermal equilibrium state in thermodynamics.

Since the occupancy probability of low-energy states increases as the temperature is gradually decreased from a high initial temperature, a low-energy state is expected to be obtained once the temperature has decreased sufficiently. This method is referred to as an annealing method (or simulated annealing method) because this behavior resembles the change of state that occurs when a material is annealed. The stochastic occurrence of state transitions that increase the energy corresponds to thermal excitation in physics.

Fig. 21 shows an example of a functional configuration of an optimization apparatus for performing an annealing method. Although a case where a plurality of candidates for state transition are generated will also be described in the following description, transition candidates are generated one by one in the basic annealing method.

The optimization device 100 includes a state holding unit 111 that holds the current state S (the values of a plurality of state variables). The optimization device 100 further includes an energy calculation unit 112 that calculates the energy change value {-ΔEi} of each state transition that occurs from the current state S when any one of the values of the plurality of state variables changes. The optimization device 100 further includes a temperature control unit 113 that controls the temperature value T and a transition control unit 114 that controls state changes.

Based on the temperature value T, the energy change values {-ΔEi}, and a random number value, the transition control unit 114 stochastically determines whether to accept any one of the plurality of state transitions according to the relative relationship between the energy change value {-ΔEi} and the thermal excitation energy.

The transition control unit 114 includes a candidate generation unit 114a that generates candidates for state transition, and an acceptance determination unit 114b that stochastically determines, for each candidate, whether to accept the state transition according to the energy change value {-ΔEi} of the candidate and the temperature value T. The transition control unit 114 further includes a transition determination unit 114c that determines the candidate to be adopted from among the accepted candidates, and a random number generation unit 114d that generates a probability variable.

The operation of the optimization device 100 in one iteration is as follows.

First, the candidate generation unit 114a generates one or more candidates (candidate numbers {Ni}) for a state transition from the current state S held by the state holding unit 111 to the next state. Next, the energy calculation unit 112 calculates the energy change value {-ΔEi} for each of the state transitions listed as candidates by using the current state S and the candidates for the state transition. The acceptance determination unit 114b accepts each state transition with the acceptance probability given by formula (1) above, according to the energy change value {-ΔEi} of that state transition, by using the temperature value T generated by the temperature control unit 113 and the probability variable (random number value) generated by the random number generation unit 114d.

The acceptance determination unit 114b outputs the acceptance {fi} of each state transition. When a plurality of state transitions are accepted, the transition determination unit 114c randomly selects one of them by using a random number value. The transition determination unit 114c then outputs the transition number N and the transition acceptance f of the selected state transition. When an accepted state transition exists, the values of the state variables stored in the state holding unit 111 are updated in accordance with the adopted state transition.

The above-described iterative process is repeated while causing the temperature control unit 113 to lower the temperature value from the initial state, and the operation ends when a certain number of iterations is reached or when an end determination condition, such as a condition that the energy becomes lower than a predetermined value, is satisfied. The solution output by the optimization device 100 is the state at the end of the operation.
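The iteration described above can be sketched in software as follows. This is a minimal sketch, not the circuit of Fig. 21: it assumes single-bit flips as candidates, Metropolis acceptance, a toy two-term energy, and a simple 1/(1 + t) cooling schedule, all chosen only for illustration. The comments indicate which unit of the optimization device each step loosely corresponds to.

```python
import math
import random

def energy(x, w):
    """Toy energy: sum of w[i][j] * x[i] * x[j] over i < j."""
    n = len(x)
    return sum(w[i][j] * x[i] * x[j] for i in range(n) for j in range(i + 1, n))

def iterate(state, w, temperature, rng):
    """One iteration: propose every single-bit flip, accept each stochastically,
    then adopt one accepted flip chosen at random (if any)."""
    current = energy(state, w)
    accepted = []
    for i in range(len(state)):                  # candidate generation (114a)
        trial = state[:]
        trial[i] ^= 1
        delta_e = energy(trial, w) - current     # energy calculation (112)
        p = 1.0 if delta_e <= 0 else math.exp(-delta_e / temperature)
        if rng.random() < p:                     # acceptance determination (114b)
            accepted.append(i)
    if accepted:                                 # transition determination (114c)
        state = state[:]
        state[rng.choice(accepted)] ^= 1
    return state

def anneal(w, n, iterations=2000, t0=5.0, seed=0):
    rng = random.Random(seed)
    state = [rng.randint(0, 1) for _ in range(n)]
    for t in range(iterations):
        temperature = t0 / (1 + t)               # cooling chosen for this sketch
        state = iterate(state, w, temperature, rng)
    return state, energy(state, w)               # the solution is the final state

w = [[0, -2, 0], [0, 0, -4], [0, 0, 0]]          # toy weights; minimum at x = [1, 1, 1]
final_state, final_energy = anneal(w, n=3)
print(final_state, final_energy)
```

For real problem sizes the energy change of a single flip would be computed incrementally rather than by re-evaluating the whole energy, as the energy calculation unit does.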

Fig. 22 is a circuit-level block diagram of a configuration example of the transition control unit in the normal annealing method, which generates candidates one by one, and in particular of the arithmetic portion required for the acceptance determination unit.

The transition control unit 114 includes a random number generation circuit 114b1, a selector 114b2, a noise table 114b3, a multiplier 114b4, and a comparator 114b5.

The selector 114b2 selects and outputs an energy change value corresponding to the transition number N, which is a random number value generated by the random number generation circuit 114b1, from among the energy change values { - Δ Ei } calculated for the candidates of the respective state transitions.

The function of the noise table 114b3 will be described later. For example, a memory such as a RAM or a flash memory may be used as the noise table 114b3.

The multiplier 114b4 outputs a product (corresponding to the thermal excitation energy described above) obtained by multiplying the value output from the noise table 114b3 by the temperature value T.

The comparator 114b5 outputs, as the transition acceptance f, the result of comparing the multiplication result output by the multiplier 114b4 with the energy change value -ΔE selected by the selector 114b2.

Although the transition control unit 114 shown in fig. 22 realizes the above-described functions substantially as it is, a mechanism of accepting a state transition with an acceptance probability represented by formula (1) will be described in more detail.

A circuit that outputs 1 with acceptance probability p and 0 with probability (1-p) can be realized by a comparator having two input terminals A and B that outputs 1 when A > B and 0 when A < B, with the acceptance probability p input to terminal A and a uniform random number taking a value in the interval [0, 1) input to terminal B. Therefore, the above function can be realized by inputting, to terminal A of the comparator, the value of the acceptance probability p calculated from the energy change value and the temperature value T using formula (1).

In other words, when f is the function used in formula (1) and u is a uniform random number taking a value in the interval [0, 1), the above function can be realized by a circuit that outputs 1 when f(-ΔE/T) is greater than u.

The same functions as described above can be realized by the following changes.

When the same monotonically increasing function is applied to two numbers, the two numbers keep the same magnitude relationship. Therefore, the same output is obtained even when the same monotonically increasing function is applied to both inputs of the comparator. When the inverse function f⁻¹ of f is used as this monotonically increasing function, a circuit that outputs 1 when -ΔE/T is greater than f⁻¹(u) suffices. Furthermore, since the temperature value T is positive, a circuit that outputs 1 when -ΔE is greater than T·f⁻¹(u) suffices.

The noise table 114b3 in Fig. 22 is a conversion table for implementing the inverse function f⁻¹(u), and outputs the values of the following functions for inputs obtained by discretizing the interval [0, 1).
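The equivalence of the two comparisons can be checked numerically for the Metropolis function of Equation 1-2, whose inverse on (0, 1) is the natural logarithm. The sketch below is an illustration only: the functions output by the noise table are not reproduced in this text, the function names are hypothetical, and u = 0 is mapped to negative infinity so that the transition is always accepted in that case.

```python
import math
import random

def f_metro(x):
    """Equation 1-2: min(1, e^x)."""
    return 1.0 if x >= 0 else math.exp(x)

def f_metro_inverse(u):
    """Inverse of f_metro on (0, 1): ln(u); -inf at u = 0 (always accept)."""
    return -math.inf if u == 0.0 else math.log(u)

def accept_direct(delta_e, temperature, u):
    """Comparator on probabilities: f(-dE/T) > u."""
    return f_metro(-delta_e / temperature) > u

def accept_via_threshold(delta_e, temperature, u):
    """Comparator on energies: -dE > T * f^-1(u), as in Fig. 22."""
    return -delta_e > temperature * f_metro_inverse(u)

rng = random.Random(0)
samples = [(rng.uniform(-5, 5), rng.uniform(0.1, 2.0), rng.random())
           for _ in range(10_000)]
agree = all(accept_direct(d, t, u) == accept_via_threshold(d, t, u)
            for d, t, u in samples)
print(agree)
```

The second form needs only one table lookup, one multiplication, and one comparison per candidate, which is what the multiplier 114b4 and the comparator 114b5 provide.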

Although the transition control unit 114 is provided with a latch that holds a determination result or the like, a state machine that generates corresponding timing, and the like, they are not shown in fig. 22 for the sake of simplicity of illustration.

Fig. 23 is a diagram showing an example of the operation flow of the transition control unit 114. The operation flow shown in fig. 23 includes a step of selecting one state transition as a candidate (S0001), a step of comparing the energy change value of the state transition with the product of the temperature value and the random number value to determine whether to accept the state transition (S0002), and a step of adopting the state transition when accepting the state transition and not adopting the state transition when not accepting the state transition (S0003).

< step S106>

In S106, the calculation result is output. The result can be output as a three-dimensional structural diagram of the protein or as coordinate information of each amino acid residue constituting the protein.

Fig. 24 is a diagram showing a data configuration example of a storage unit of the structure search device.

Fig. 25 shows a processing flow corresponding to the data configuration example.

In the processing flow shown in fig. 25, first, in S201, a first constraint is constructed. The first constraint here is expressed as follows.

< first constraint >

H[1][k][I] = L[1]*X[k]*X[I]

Subsequently, in S202, a second constraint is constructed. The second constraint here is expressed as follows.

< second constraint >

H[2][AB[i]+k, _{0<=k<VB[i], 0<i<=NP}][AB[j]+k, _{0<=k<VB[i], i<j<=NP}] = L[2]*X[AB[i]+k]*X[AB[j]+k]

Subsequently, in S203, a third constraint is constructed. The third constraint here is expressed as follows.

< third constraint >

H[3] += L[3]*∑_i[{∑_j(X[j]) - X[i]}{∑_j(X[j]) - 1}]
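The bracketed term of the third constraint can be checked case by case. The sketch below evaluates it for a single index i, assuming that the inner sum over j runs over the bits of the lattice points adjacent to lattice point i; the expression does not state the range of j, so that reading is an assumption, chosen because it reproduces conditions (1) and (2) of the third constraint. The function name is hypothetical.

```python
def conn_term(x_i, neighbour_bits, lam_conn=1):
    """One term of the third constraint: lam * (S - x_i) * (S - 1),
    where S is the number of occupied neighbouring lattice points."""
    s = sum(neighbour_bits)
    return lam_conn * (s - x_i) * (s - 1)

# Occupied lattice point (x_i = 1): zero penalty only with exactly one occupied neighbour.
print(conn_term(1, [0, 1, 0]), conn_term(1, [0, 0, 0]), conn_term(1, [1, 1, 0]))
# Empty lattice point (x_i = 0): zero penalty with zero or one occupied neighbour.
print(conn_term(0, [0, 0, 0]), conn_term(0, [1, 0, 0]), conn_term(0, [1, 1, 0]))
```

Every violating case gives a positive value, so the term only ever raises the energy, in line with the description of the third constraint.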

Subsequently, in S204, a cost function based on the interaction between two adjacent compound groups is constructed. The cost function here is expressed as follows.

< cost function >

H[4][AB[i]+k, _{0<=k<VB[i]}][AB[j]+I, _{0<=I<VB[j], i+2<j}] = EPQ[i][j]*ADJ[i][j]*X[AB[i]+k]*X[AB[j]+I]

Subsequently, in S205, in the annealing machine, a ground state search using an annealing method is performed on the Ising model transformed based on the constraint conditions for each lattice point, using the constructed first constraint, second constraint, third constraint, and cost function. The obtained energy is stored in the storage unit E and the obtained structure is stored in the storage unit.

For comparison with the third constraint in the disclosed structure search method, the Hamiltonian (H_conn) in the related art is expressed, for example, as follows.

H[3][AB[i]+k, _{0<=k<VB[i], 0<=i<NP}][AB[i+1]+I, _{0<=I<VB[i+1], 0<=i<NP}] = -L[3]*∑(ADJ(AB[i]+k, AB[i+1]+I)*X[AB[i]+k]*X[AB[i+1]+I])

(Program)

The disclosed structure search program is a program that causes a computer to execute the disclosed structure search method.

In the structure search program, the aspect of performing the structure search method is the same as that in the disclosed structure search method.

The program may be created using various known programming languages according to the configuration of the computer system to be used and the type, version, and the like of the operating system.

The program may be recorded on a recording medium such as an internal hard disk or an external hard disk, or on a recording medium such as a compact disc read only memory (CD-ROM), a digital versatile disc read only memory (DVD-ROM), a magneto-optical (MO) disc, or a Universal Serial Bus (USB) memory (USB flash drive). When the program is recorded on a recording medium such as a CD-ROM, a DVD-ROM, an MO disc, or a USB memory, the program may be used directly through a recording medium reading device included in the computer system, or may be used after being installed on a hard disk, according to purpose. The program may also be recorded in an external storage area (another computer or the like) accessible from the computer system through an information communication network, and may be used directly from the external storage area through the information communication network or after being installed on a hard disk, according to purpose.

The program can be recorded in a plurality of recording media by being divided for each arbitrary process.

(Recording medium)

The disclosed recording medium records the disclosed structure search program.

The disclosed recording medium is computer-readable.

The disclosed recording medium may be transitory or non-transitory.

The disclosed recording medium is, for example, a recording medium recording a program for causing a computer to execute the disclosed structure search method.

The recording medium is not particularly limited and may be appropriately selected according to purpose. Examples include an internal hard disk, an external hard disk, a CD-ROM, a DVD-ROM, an MO disc, and a USB memory.

The recording medium may be a plurality of recording media in which the program is divided and recorded for each arbitrary process.

[ Experimental example ]

Hereinafter, a specific experimental example of the present embodiment will be described.

(comparative example 1)

A stable structure search of the coarse-grained lattice model of the protein chignolin was performed according to the flowchart of Fig. 12.

H_one, H_olap, and H_conn were set as constraints, and H_pair was set as a cost function. The total energy in the diamond coding method can be expressed as follows.

E(x) = H = H_one + H_olap + H_conn + H_pair

H_one represents the constraint that the protein includes only one instance of each of the first to n-th amino acid residues.

H_olap represents the constraint that the first to n-th amino acid residues do not overlap with one another.

H_conn is the constraint that the first to n-th amino acid residues are linked to one another.

H_pair is a cost function representing the interaction between amino acid residues.

Of the 216 available patterns of the three parameters that set the constraints (each parameter taking a value that is an integer multiple of 5 from 5 to 30), only two patterns allowed the most stable structure to be found.
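The figure of 216 follows from six admissible values for each of three parameters. A minimal sketch of the enumeration; reading the three parameters as the weights of the three constraints (for example λ_one, λ_olap, and λ_conn) is an assumption.

```python
from itertools import product

values = range(5, 31, 5)                     # 5, 10, 15, 20, 25, 30
patterns = list(product(values, repeat=3))   # every combination of three parameters

print(len(values), len(patterns))
```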

For each of the two patterns, a search of 300,000 annealing iterations was performed 20 times using the annealing machine; the minimum value of E(x) was obtained, that is, the most stable structure was reached, only once.

(Experimental example 1)

A stable structure search of the coarse-grained lattice model of the protein chignolin was performed according to the flowchart of Fig. 12 in the same manner as in comparative example 1, except that H_conn in comparative example 1 was replaced by the third constraint of the disclosed technique.

The third constraint is a constraint represented by (1) and (2) below.

(1) In the case where an amino acid residue is present at a specific lattice point, an amino acid residue is present at only one lattice point among all lattice points adjacent to that lattice point.

(2) In the case where an amino acid residue is not present at a specific lattice point, either no amino acid residue is present at any lattice point adjacent to that lattice point, or an amino acid residue is present at only one lattice point among all lattice points adjacent to that lattice point.

The most stable structure was found in all 216 available patterns of the three parameters that set the constraints (each parameter taking a value that is an integer multiple of 5 from 5 to 30).

For each of the 216 patterns, a search of 300,000 annealing iterations was performed 20 times using the annealing machine, and in every pattern the minimum value of E(x) was obtained, that is, the most stable structure was reached.

From the comparison between comparative example 1 and experimental example 1, it was confirmed that the disclosed technology can efficiently search for a stable structure.
