Simulation prediction method for identifying different promoters through directed evolution of monomeric polymerase

Document number: 1939957. Publication date: 2021-12-07.

Abstract: This technology, "Simulation prediction method for identifying different promoters through directed evolution of monomeric polymerase", was created by 鄂超戴维江谢潇郭霞杨天森郑善喜 on 2021-09-13. The invention provides a simulation prediction method for the recognition of different promoters by a monomeric polymerase through directed evolution, and relates to the technical field of life science. The method comprises: docking with HDOCK and selecting the best complex model through structural comparison and MD simulation; introducing specific mutations into the complex model with reference to directed evolution experiments; running a 1 μs all-atom molecular dynamics simulation for each structure; and performing energy calculations and hydrogen bond and salt bridge analyses on each trajectory to obtain the energetic and structural-dynamic details of the evolution path. Using the method, a bacteriophage RNAP was successfully switched from recognizing its original promoter to recognizing another promoter.

1. A method for the recognition of different promoters by a viral RNA polymerase based on laboratory directed evolution, characterized by comprising the following steps:

(1) modeling the free structure of the wild-type viral RNA polymerase, and docking the free structure with dsDNA containing the originally recognized promoter to construct a closed initiation complex;

(2) performing all-atom molecular dynamics simulation on the closed initiation complex, and selecting the optimal complex model;

(3) introducing specific mutations into the optimal complex model with reference to a directed evolution experiment to obtain several different RNA polymerase/promoter complexes;

(4) performing all-atom molecular dynamics simulation on the RNA polymerase/promoter complexes, and analyzing the resulting simulation trajectories to obtain energetic and structural-dynamic data for the evolution path, thereby obtaining a viral RNA polymerase that recognizes the target promoter.

2. The method of claim 1, wherein the virus comprises a bacteriophage.

3. The method of claim 1 or 2, wherein the originally recognized promoter in step (1) comprises the T7 promoter, and the nucleotide sequence of the dsDNA is shown as SEQ ID NO. 1.

4. The method of claim 1, wherein step (1) further comprises, after the docking, screening the docked structures to obtain the closed initiation complex;

the screening comprises globally sampling the putative binding modes in the HDOCK server using an FFT-based global docking program, selecting the 10 best-scoring structural models using an improved shape-based pairwise scoring function, then comparing these 10 models with the open initiation complex of the RNA polymerase and its originally recognized promoter, and selecting the structures whose DNA promoter positioning is similar to that in the open initiation complex.

5. The method of claim 1 or 4, wherein each all-atom molecular dynamics simulation is 1 microsecond in duration.

6. The method of claim 1, wherein the specific mutation in step (3) comprises changing amino acids in the protein using the tleap program in AmberTools, and mutating DNA base pairs in the originally recognized promoter to the base pairs of the target promoter while keeping the nucleic acid backbone unchanged.

7. The method of claim 1 or 6, further comprising, after the specific mutation: subjecting the resulting mutated structures to energy minimization to yield several different RNA polymerase/promoter complexes.

8. The method of claim 1, wherein the data analysis of step (4) comprises energy calculation and hydrogen bond and salt bridge analysis.

9. The method of claim 8, wherein the energy calculation comprises: recalculating the electrostatic and van der Waals interactions from the simulation trajectory of the explicit-water model using the g_energy module in GROMACS to obtain energy data.

10. The method of claim 1, wherein the target promoter of step (4) comprises a T3 promoter.

Technical Field

The invention belongs to the technical field of life science, and particularly relates to a simulation prediction method for identifying different promoters by directed evolution of monomeric polymerase.

Background

In recent years, laboratory directed evolution has enabled major advances in the functional design or redesign of biomolecular systems, especially with respect to the activity and specificity of protein enzymes. In laboratory directed evolution, sequence mutation and recombination are driven intensively, and specific protein functions are then locked in by high-throughput screening or selection. For example, in phage-assisted continuous evolution (PACE), phage with a modified life cycle are used to transfer the evolving genes between bacterial host cells, favoring rapidly replicating phage populations that carry mutations conferring certain advantageous enzymatic activities. With the advancement of directed evolution technology, not only can individual enzymes be designed to have certain functions, but pathways or protein interaction networks can also be regulated or rewired. At the same time, rational design of protein function based on the molecular structure and biochemical properties of proteins has long been a sought-after goal; it generally requires physical understanding and computational exploration of optimal solutions in the high-dimensional space of protein sequence or conformational evolution. Since biological macromolecules and enzymes are inherently complex systems that have evolved highly complex structure-function relationships, purely physics-based approaches are often highly challenging and do not readily yield optimal solutions for such systems.

Disclosure of Invention

In view of the above, the present invention aims to provide a simulation prediction method for the recognition of different promoters by a monomeric polymerase through directed evolution. The method reveals the underlying physical mechanism and is expected to support knowledge/data learning or the rational redesign of the structure-function relationship of the enzyme so as to achieve rewired promoter recognition.

In order to achieve the above object, the present invention provides the following technical solutions:

The invention provides a method for the recognition of different promoters by a viral RNA polymerase based on laboratory directed evolution, which comprises the following steps: (1) modeling the free structure of the wild-type viral RNA polymerase, and docking the free structure with dsDNA containing the originally recognized promoter to construct a closed initiation complex;

(2) performing all-atom molecular dynamics simulation on the closed initiation complex, and selecting the optimal complex model;

(3) introducing specific mutations into the optimal complex model with reference to a directed evolution experiment to obtain several different RNA polymerase/promoter complexes;

(4) performing all-atom molecular dynamics simulation on the RNA polymerase/promoter complexes, and analyzing the resulting simulation trajectories to obtain energetic and structural-dynamic data for the evolution path, thereby obtaining a viral RNA polymerase that recognizes the target promoter.

Preferably, the virus comprises a bacteriophage.

Preferably, the originally recognized promoter in step (1) comprises a T7 promoter, and the nucleotide sequence of the dsDNA is shown as SEQ ID NO. 1.

Preferably, step (1) further comprises, after the docking, screening the docked structures to obtain the closed initiation complex.

Preferably, each all-atom molecular dynamics simulation has a duration of 1 microsecond.

Preferably, the specific mutation in step (3) comprises changing amino acids in the protein using the tleap program in AmberTools, and mutating DNA base pairs in the originally recognized promoter to the base pairs of the target promoter while keeping the nucleic acid backbone unchanged.

Preferably, the specific mutation is followed by energy minimization of the resulting mutated structures to obtain several different RNA polymerase/promoter complexes.

Preferably, the data analysis in step (4) includes energy calculation and hydrogen bond and salt bridge analysis.

Preferably, the energy calculation comprises: recalculating the electrostatic and van der Waals interactions from the simulation trajectory of the explicit-water model using the g_energy module in GROMACS to obtain energy data.

Preferably, the target promoter in step (4) comprises a T3 promoter.

Beneficial effects: the invention provides a method for the recognition of different promoters by a viral RNA polymerase based on laboratory directed evolution. Closed initiation complexes are first constructed for the wild-type viral RNA polymerase (RNAP) and for six mutant RNAPs discovered in phage-assisted continuous evolution experiments. The complexes of these RNAPs with the original and target promoters are each subjected to a 1 microsecond all-atom molecular dynamics (MD) simulation. The simulation results in the examples of the invention show that the promoter recognition preference of RNAP and its variants is well characterized by the protein-DNA electrostatic interaction, i.e. by the stability of the RNAP-DNA promoter interface. They also show that key residues and structural elements contribute substantially to the switch in promoter recognition: the first point mutation, N748D, in the specificity loop slightly detaches RNAP from the promoter, hindering its recognition; an auxiliary helix (206-225) then takes over the switching of promoter recognition after further mutations (E222K and E207K), orienting differently on the T7 and T3 promoters by forming additional charge interactions with the promoter DNA. Further mutations in the AT-rich loop and the specificity loop can completely switch RNAP-promoter recognition to the T3 promoter. The present invention is based on in silico mutagenesis, i.e. the use of molecular modeling and all-atom MD simulation to study promoter recognition by variants of a viral RNA polymerase obtained by laboratory directed evolution, and in particular by PACE, which switches a bacteriophage RNAP from recognizing its original promoter to recognizing another promoter.

Drawings

FIG. 1 is a flow chart of the method for the recognition of different promoters by a viral RNA polymerase based on laboratory directed evolution;

FIG. 2 shows the convergence of the energy calculations for wt-RNAP and the RNAP variants over simulation time;

FIG. 3 shows the contribution of individual residues to the electrostatic bias of RNAP toward a promoter;

FIG. 4 shows the corresponding dynamic changes of individual residues at the protein-DNA binding interface;

FIG. 5 shows the contribution of amino acids to RNAP-DNA promoter recognition.

Detailed Description

The invention provides a method for the recognition of different promoters by a viral RNA polymerase based on laboratory directed evolution, which comprises the following steps: (1) modeling the free structure of the wild-type viral RNA polymerase, and docking the free structure with dsDNA containing the originally recognized promoter to construct a closed initiation complex;

(2) performing all-atom molecular dynamics simulation on the closed initiation complex, and selecting the optimal complex model;

(3) introducing specific mutations into the optimal complex model with reference to a directed evolution experiment to obtain several different RNA polymerase/promoter complexes;

(4) performing all-atom molecular dynamics simulation on the RNA polymerase/promoter complexes, and analyzing the resulting simulation trajectories to obtain energetic and structural-dynamic data for the evolution path, thereby obtaining a viral RNA polymerase that recognizes the target promoter.

In the embodiments of the present invention, the method is preferably as shown in FIG. 1. The virus preferably comprises a bacteriophage, and T7 phage is used for illustration in the embodiments, but this should not be regarded as limiting the scope of protection of the present invention.

The invention models the free structure of the wild-type viral RNA polymerase and docks it with dsDNA containing the originally recognized promoter to construct a closed initiation complex. In the embodiment of the invention, the crystal structure of T7 RNAP is PDB: 1ARO, which also contains T7 lysozyme. To facilitate structural comparison, the lysozyme is preferably removed, and MODELLER is used to fill in the residues missing from the protein (residue ids 60-72, 165-182, 234-240, 345-384, 590-611). The resulting free structure (the apo T7 RNAP protein structure) is consistent with the apo T7 RNAP structure that contains only Cα atoms (PDB: 4RNP).

The apo T7 RNAP protein structure is docked to a double-stranded (ds) DNA promoter to construct a closed initiation complex; the nucleotide sequence of the dsDNA is preferably as shown in SEQ ID NO. 1. The dsDNA is preferably obtained by generating a standard B-form dsDNA containing the T7 promoter with version 2.0 of the web 3DNA interface (w3DNA 2.0), the template strand consisting of 28 nucleotides (SEQ ID NO. 1). The apo T7 RNAP structure is then docked to the 30 bp dsDNA containing the T7 promoter using the HDOCK server (http://hdock.phys.hust.edu.cn/).

After the closed initiation complex is obtained, the invention performs all-atom molecular dynamics simulation on it and selects the optimal complex model.

The invention preferably further comprises screening the closed initiation complexes before performing the all-atom molecular dynamics (MD) simulation. The screening preferably comprises globally sampling the putative binding modes in the HDOCK server using an FFT-based global docking program, selecting the 10 best-scoring structural models using an improved shape-based pairwise scoring function, and then comparing them with the open initiation complex of the RNA polymerase and its originally recognized promoter, so as to select three structures whose DNA promoter positioning is similar to that in the open initiation complex. Each all-atom molecular dynamics simulation of the present invention preferably has a duration of 1 microsecond.

The MD simulations described herein are preferably performed with the GROMACS-5.1 software package, and the system is described with the AMBER99sb-2012 force field and the PARMBSC0 nucleic acid parameters. To neutralize the system and maintain an ion concentration of 0.15 M, 163 Na⁺ ions and 119 Cl⁻ ions were added; the simulation system contains about 156,000 atoms in total. The cutoff distance for van der Waals (vdW) and short-range electrostatic interactions was set to 10 angstroms. Long-range electrostatic interactions were treated with the particle mesh Ewald (PME) method. The neighbor list was updated every five steps, with a time step of 2 fs. The following procedure was then carried out for each simulation: (i) 20,000 steps of energy minimization using the steepest descent algorithm; (ii) 200 ps of NVT equilibration, followed by 500 ps of NPT equilibration with heavy atoms position-restrained using a force constant of 1000 kJ mol⁻¹ nm⁻², the velocity-rescaling thermostat with a coupling constant of 0.1 ps⁻¹ being used to maintain the temperature at 310 K; (iii) a 1-μs NPT-ensemble MD simulation at 310 K and 1 atmosphere, using the velocity-rescaling thermostat and the Parrinello-Rahman barostat, respectively.

The present invention preferably also calculates the hydrogen bond (HB) interactions between the specificity loop (SPL) and the promoter in each structure and, with the existing open initiation complex structure of T7 RNAP as reference, selects the highest-scoring structure, which best reflects the protein-DNA HB interactions.

After the optimal complex model is obtained, the invention introduces specific mutations into it with reference to the directed evolution experiment to obtain several different RNA polymerase/promoter complexes. The specific mutation preferably comprises changing amino acids in the protein using the tleap program in AmberTools, and mutating DNA base pairs in the originally recognized promoter to the base pairs of the target promoter while keeping the nucleic acid backbone unchanged. In laboratory directed evolution, wild-type T7 RNAP (wt-T7 RNAP) evolves progressively into a series of mutant RNAPs (mt-RNAPs) that recognize the T7 promoter less and the T3 promoter more. Based on these mutants, the present invention preferably constructs 14 simulation systems from 7 T7 RNAPs, namely wt-RNAP and the variants (1M to 6M), and 2 dsDNAs (containing either the T7 or the T3 promoter). The structures constructed in this way are preferably subjected to extensive energy minimization, more preferably 20,000 steps of energy minimization, before MD simulation. The MD simulation is preferably the same as described above and is not described again here.

After the RNA polymerase/promoter complexes are obtained, the invention performs all-atom molecular dynamics simulation on them and analyzes the resulting simulation trajectories to obtain energetic and structural-dynamic data for the evolution path, thereby obtaining a viral RNA polymerase that recognizes the target promoter. The invention preferably records the HB and salt bridge (SB) interactions (occupancy > 10%) formed between the binding region of RNAP and the promoter DNA during the last 800 ns of each MD simulation. An HB is counted when the distance between the donor atom and the acceptor atom is less than 3.5 angstroms and the donor atom-hydrogen atom-acceptor atom angle is greater than 140 degrees. A salt bridge is counted when the distance from the most positively charged N atom of an Arg or Lys residue in the protein to the two most negatively charged oxygen atoms on the phosphate group of a nucleotide is less than 5 angstroms.

In the present invention, protein-DNA interactions are preferably calculated between two groups of residues. One is the promoter DNA (dsDNA positions -17 to -1) and the other is the protein core within 25 angstroms of the promoter dsDNA (-17 to -1). The invention uses the g_energy module in GROMACS to recalculate the electrostatic (ele) and vdW interactions from the simulation trajectory of the explicit-water model.
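To make the two energy terms concrete, the following is a minimal sketch of a pairwise electrostatic and Lennard-Jones sum between two atom groups. It is not the g_energy calculation itself: it assumes a plain cutoff with no long-range treatment, a Lorentz-Berthelot combination rule, and a hypothetical atom record with `pos`, `q`, `sigma` and `eps` fields. Only the electric conversion factor is the standard value used in GROMACS units.

```python
import math

# Electric conversion factor in GROMACS units (kJ mol^-1 nm e^-2).
COULOMB_CONST = 138.935458


def pair_energies(group_a, group_b, cutoff_nm=1.0):
    """Sum Coulomb and Lennard-Jones energies over all pairs between two groups.

    Each atom is a dict (hypothetical layout): pos in nm, q in e,
    sigma in nm, eps in kJ/mol. Pairs beyond cutoff_nm are skipped.
    """
    e_ele = 0.0
    e_vdw = 0.0
    for a in group_a:
        for b in group_b:
            r = math.dist(a["pos"], b["pos"])
            if r > cutoff_nm:
                continue
            e_ele += COULOMB_CONST * a["q"] * b["q"] / r
            # Assumed combination rule: arithmetic sigma, geometric epsilon.
            sigma = 0.5 * (a["sigma"] + b["sigma"])
            eps = math.sqrt(a["eps"] * b["eps"])
            sr6 = (sigma / r) ** 6
            e_vdw += 4.0 * eps * (sr6 * sr6 - sr6)
    return e_ele, e_vdw
```

In practice the per-frame values would be averaged over the trajectory; the sketch only shows what is being summed for one frame.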

Using all-atom MD simulation, the invention reveals the physical mechanism by which variants of the viral T7 RNAP rewire promoter recognition along the laboratory directed evolution path, switching the promoter recognition of the RNAP from the original T7 promoter to the T3 promoter. The first point mutation, N748D, occurs on the SPL (specificity loop) of T7 RNAP and is biased toward the T3 promoter; it severely perturbs the HB pattern at the protein-DNA interface, alters the balance between the SPL downstream and the ATL (AT-rich loop) upstream, and slightly detaches RNAP from the promoter, hindering recognition of the original promoter. Through the second and third mutations of RNAP (E222K and E207K), an auxiliary helix (AXH, 206-225) takes over the switching of RNAP-promoter recognition along the directed evolution path: after both mutations AXH interacts more closely with the promoter, mainly through charge interactions, and then orients differently on the T7 and T3 promoters to support further differentiation. Promoter specificity is finally switched after the mutations on the ATL (R96L + K98R), which adjust the protein-DNA HB and SB patterns and reset the balance between ATL and SPL. An additional mutation on the SPL (P759L or P759S) further modulates the interaction of RNAP with the promoter and maintains promoter specificity. Such structural-dynamic and energetic details discovered from simulation may support the learning of structure-function information for the system and facilitate further rational design of specific RNAP-promoter recognition.

The following examples are provided to illustrate the simulation and prediction method for identifying different promoters by directed evolution of monomeric polymerases in the present invention, but they should not be construed as limiting the scope of the present invention.

Example 1

1. Materials and methods

1.1 Obtaining the apo T7 RNAP protein structure

Starting from the crystal structure of T7 RNAP (PDB: 1ARO), the T7 lysozyme it contains was removed, and MODELLER was used to fill in the residues missing from the protein (residue ids 60-72, 165-182, 234-240, 345-384, 590-611). The resulting structure was then compared with the apo T7 RNAP structure containing only Cα atoms (PDB: 4RNP), and the two were found to be consistent.

1.2 Docking of apo T7 RNAP to the double-stranded (ds) DNA promoter to construct a closed initiation complex

A standard B-form dsDNA containing the T7 promoter was generated using version 2.0 of the web 3DNA interface (w3DNA 2.0), the template strand consisting of 28 nucleotides.

The apo T7 RNAP structure was then docked to the 30 bp dsDNA containing the T7 promoter using the HDOCK server. First, 100 complex structures were generated by HDOCK, which globally samples the putative binding modes with an FFT-based global docking program (HDOCKlite) and scores them with an improved shape-based pairwise scoring function. From the top 10 models, three structures were selected that showed DNA promoter positioning similar to that in the crystal structure of the T7 RNAP open initiation complex. Each was then subjected to a 1 μs all-atom MD simulation, and the HBs between the SPL and the promoter were calculated. Finally, with the existing open initiation complex structure of T7 RNAP as reference, the highest-scoring structure, which best reflects the protein-DNA HB interactions, was selected.

1.3 construction of structural models of mutant RNAPs from directed evolution

In laboratory directed evolution, wt-T7 RNAP evolved progressively into a series of mt-RNAPs that recognize the T7 promoter less and the T3 promoter more. Based on these mutants, 7 T7 RNAPs, namely wt-RNAP and the variants (1M to 6M), were combined with 2 dsDNAs (containing either the T7 or the T3 promoter) to construct 14 simulation systems. For the mutations, the tleap program in AmberTools was used to change amino acids in the protein and to mutate DNA base pairs in the T7 promoter to the corresponding base pairs of the T3 promoter while keeping the nucleic acid backbone unchanged. All constructed structures were subjected to extensive energy minimization (20,000 steps) before the MD simulations described below.
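The 14 systems are simply the 7 RNAPs crossed with the 2 promoters. A minimal sketch of that bookkeeping is given below; the variant labels and cumulative mutations follow those named in the Results of this document (N748D, E222K, E207K, R96L + K98R, P759L/S), while the dictionary layout and function name are illustrative assumptions.

```python
from itertools import product

# Cumulative mutations along the directed-evolution path, as named in this document.
BASE_5M = ["N748D", "E222K", "E207K", "R96L", "K98R"]
VARIANTS = {
    "wt": [],
    "1M": BASE_5M[:1],
    "2M": BASE_5M[:2],
    "3M": BASE_5M[:3],
    "5M": BASE_5M[:],
    "6M-1": BASE_5M + ["P759L"],
    "6M-2": BASE_5M + ["P759S"],
}
PROMOTERS = ("T7", "T3")


def build_systems():
    """Return one record per RNAP/promoter combination (7 x 2 = 14)."""
    return [
        {"rnap": name, "mutations": list(muts), "promoter": promoter}
        for (name, muts), promoter in product(VARIANTS.items(), PROMOTERS)
    ]
```

Each record would then drive one mutation, minimization and 1 μs simulation run.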

1.4 Setup of the all-atom MD simulations

All MD simulations were performed with the GROMACS-5.1 software package, using the AMBER99sb-2012 force field and the PARMBSC0 nucleic acid parameters. To neutralize the system and maintain an ion concentration of 0.15 M, 163 Na⁺ ions and 119 Cl⁻ ions were added. The simulation system contains about 156,000 atoms in total. The cutoff distance for van der Waals (vdW) and short-range electrostatic interactions was set to 10 angstroms. Long-range electrostatic interactions were treated with the particle mesh Ewald method (54). The neighbor list was updated every five steps, with a time step of 2 fs. The following procedure was then carried out for each simulation: (i) 20,000 steps of energy minimization using the steepest descent algorithm; (ii) 200 ps of NVT equilibration, followed by 500 ps of NPT equilibration with heavy atoms position-restrained using a force constant of 1000 kJ mol⁻¹ nm⁻², the velocity-rescaling thermostat with a coupling constant of 0.1 ps⁻¹ being used to maintain the temperature at 310 K; (iii) a 1-μs NPT-ensemble MD simulation at 310 K and 1 atmosphere, using the velocity-rescaling thermostat and the Parrinello-Rahman barostat, respectively.
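The production settings above map fairly directly onto GROMACS `.mdp` options. The sketch below collects them in a Python dictionary and renders them as text. The option names are standard GROMACS ones, but several values are assumptions not stated in this document and are marked as such: the single `System` coupling group, `tau_p`, `compressibility`, and the reading of the quoted 0.1 coupling constant as `tau_t` in ps.

```python
def production_mdp(total_ns=1000.0, dt_ps=0.002):
    """Collect production-run parameters as a dict of GROMACS .mdp options."""
    nsteps = round(total_ns * 1000.0 / dt_ps)  # 1 microsecond at 2 fs per step
    return {
        "integrator": "md",
        "dt": dt_ps,
        "nsteps": nsteps,
        "nstlist": 5,                  # neighbor list updated every five steps
        "coulombtype": "PME",          # particle mesh Ewald
        "rcoulomb": 1.0,               # nm (10 angstroms)
        "rvdw": 1.0,                   # nm (10 angstroms)
        "tcoupl": "V-rescale",         # velocity-rescaling thermostat
        "tc-grps": "System",           # assumption: one coupling group
        "tau_t": 0.1,                  # assumption: quoted coupling constant used as tau_t
        "ref_t": 310,                  # K
        "pcoupl": "Parrinello-Rahman",
        "pcoupltype": "isotropic",     # assumption
        "tau_p": 2.0,                  # assumption: not given in the text
        "ref_p": 1.01325,              # 1 atmosphere expressed in bar
        "compressibility": 4.5e-5,     # assumption: water, bar^-1
    }


def render_mdp(params):
    """Render the parameter dict in 'key = value' .mdp style."""
    return "\n".join(f"{key:<16}= {value}" for key, value in params.items())
```

Writing `render_mdp(production_mdp())` to a file gives a starting point that would still need output-control and constraint settings before use.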

1.5 Calculation of protein-DNA HBs and SBs

HB and SB interactions (occupancy > 10%) formed between the binding region of RNAP and the promoter DNA during the last 800 ns of each simulation were recorded. An HB is counted when the distance between the donor atom and the acceptor atom is less than 3.5 angstroms and the donor atom-hydrogen atom-acceptor atom angle is greater than 140 degrees. A salt bridge is counted when the distance from the most positively charged N atom of an Arg or Lys residue in the protein to the two most negatively charged oxygen atoms on the phosphate group of a nucleotide is less than 5 angstroms.
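The geometric criteria above can be written down in a few lines. The following is a minimal sketch using only the thresholds given in the text (3.5 angstroms and 140 degrees for HBs, 5 angstroms for SBs, 10% occupancy). Coordinates are plain (x, y, z) tuples in angstroms, and counting a salt bridge when either phosphate oxygen is within range is an assumption, since the text does not say whether one or both oxygens must satisfy the distance.

```python
import math


def angle_deg(donor, hydrogen, acceptor):
    """Donor-hydrogen-acceptor angle in degrees, measured at the hydrogen."""
    v1 = [d - h for d, h in zip(donor, hydrogen)]
    v2 = [a - h for a, h in zip(acceptor, hydrogen)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.sqrt(sum(x * x for x in v1)) * math.sqrt(sum(x * x for x in v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def is_hbond(donor, hydrogen, acceptor, max_dist=3.5, min_angle=140.0):
    """HB criterion: donor-acceptor distance and D-H-A angle thresholds."""
    return (math.dist(donor, acceptor) < max_dist
            and angle_deg(donor, hydrogen, acceptor) > min_angle)


def is_salt_bridge(nitrogen, phosphate_oxygens, max_dist=5.0):
    """SB criterion; 'any oxygen within range' is an assumed reading."""
    return any(math.dist(nitrogen, o) < max_dist for o in phosphate_oxygens)


def occupancy(flags):
    """Fraction of frames in which the interaction is present."""
    return sum(1 for f in flags if f) / len(flags)


def is_persistent(flags, threshold=0.10):
    """Keep interactions present in more than 10% of the analyzed frames."""
    return occupancy(flags) > threshold
```

Here `flags` would hold one boolean per frame of the last 800 ns for a given donor-acceptor or Arg/Lys-phosphate pair.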

1.6 Calculation of protein-DNA interaction energetics

Protein-DNA interactions were calculated between two groups of residues. One is the promoter DNA (dsDNA positions -17 to -1) and the other is the protein core within 25 angstroms of the promoter dsDNA (-17 to -1). The electrostatic (ele) and vdW interactions were recalculated from the simulation trajectories of the explicit-water model using the g_energy module in GROMACS.
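Defining the protein group requires a distance-based selection around the promoter DNA. A minimal sketch of such a selection is shown below. It assumes that a residue is included if any of its atoms lies within 25 angstroms of any promoter DNA atom, which is one plausible reading of the text, and it uses a hypothetical input layout of `(residue_id, (x, y, z))` tuples.

```python
import math


def residues_near_dna(protein_atoms, dna_atoms, cutoff=25.0):
    """Return sorted residue ids having at least one atom within cutoff of the DNA.

    protein_atoms: iterable of (residue_id, (x, y, z)) in angstroms.
    dna_atoms: list of (x, y, z) for promoter positions -17 to -1.
    """
    selected = set()
    for resid, xyz in protein_atoms:
        if resid in selected:
            continue  # residue already selected through another atom
        if any(math.dist(xyz, d) <= cutoff for d in dna_atoms):
            selected.add(resid)
    return sorted(selected)
```

The resulting residue list could then be written out as an index group for the energy recalculation.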

1.7 Construction of coarse-grained (CG) models and setup of CG simulations

CG simulations were performed with the CafeMol 3.0 software. The initial structure of T7 RNAP was taken from the crystal structure (PDB ID: 1ARO). The CG protein structure was built with an off-lattice Go model, in which each CG particle is placed on a Cα atom and represents one amino acid, and the conformation is biased toward the native structure by the Go model potential.

In the CG model of dsDNA (200 bp in length), each nucleotide is represented by three CG particles corresponding to the base, sugar and phosphate groups, using the 3SPN.1 model, which accounts for bond stretching, angle bending, dihedral torsion, base-base interactions, excluded-volume effects, solvation energy and electrostatic energy. Electrostatic interactions and excluded-volume effects are taken into account. All CG simulations were performed with Langevin dynamics under constant-temperature conditions with a Berendsen thermostat.

2. Results

2.1 Protein-DNA electrostatic interaction energetics provide a quantitative measure of RNAP-promoter recognition

To investigate whether the protein-DNA interactions that stabilize RNAP on the promoter also contribute to promoter recognition and differentiation, we calculated the electrostatic (ele, $E^{ele}$) and van der Waals (vdW) interaction energies between the RNAP protein and the promoter binding region of the DNA (-17 to -5) for wt T7 RNAP and all mutants (14 simulation systems). The interactions between protein and DNA atoms were calculated within a cut-off distance (the results tend to be consistent across cut-off distances). The energy calculations for wt-RNAP (A in FIG. 2) and the RNAP variants converged over simulation time. The energetics results were obtained by averaging over the one-microsecond simulation trajectory of each system (B in FIG. 2).

The results indicate that the difference in $E^{ele}$ between RNAP and the promoter binding region characterizes the recognition preference of RNAP well: the electrostatic interaction energy $E^{ele}$ of wt T7 RNAP with the T7 promoter is lower than that with the T3 promoter, that is, it binds the T7 promoter more stably in electrostatic terms, consistent with its higher activity on, or better recognition of, the T7 promoter than the T3 promoter. For the 1M, 2M and 3M RNAP variants, binding to the T7 and T3 promoters gives similar $E^{ele}$, consistent with their poor differentiation between the T7 and T3 promoters in activity. Notably, for the 5M and 6M-1/2 RNAPs, which show high activity on the T3 promoter but not on the T7 promoter, the protein-DNA $E^{ele}$ is significantly lower on the T3 promoter than on the T7 promoter. The vdW energetics show a similar trend, i.e. they stabilize the protein on the more active promoter (see FIG. 2).
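The convergence check and the T7-versus-T3 comparison described above amount to a running mean and a comparison of trajectory averages. A minimal sketch follows; the function names and the tolerance parameter are illustrative assumptions, and any numbers fed to it in an example are made up rather than taken from the simulations.

```python
def running_mean(values):
    """Cumulative average after each frame, used to judge convergence over time."""
    means = []
    total = 0.0
    for count, value in enumerate(values, start=1):
        total += value
        means.append(total / count)
    return means


def preferred_promoter(e_t7, e_t3, tol=0.0):
    """Compare trajectory-averaged energies; the lower (more negative) one is preferred.

    Returns "T7", "T3", or "none" when the averages differ by no more than tol.
    """
    mean_t7 = sum(e_t7) / len(e_t7)
    mean_t3 = sum(e_t3) / len(e_t3)
    if abs(mean_t7 - mean_t3) <= tol:
        return "none"
    return "T7" if mean_t7 < mean_t3 else "T3"
```

A non-zero `tol` is one way to express that variants such as 1M to 3M show similar energies on both promoters.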

2.2 Contribution of individual residues to the electrostatic bias of RNAP toward a promoter, and the corresponding dynamic changes at the protein-DNA binding interface

Since, for some RNAPs, the protein-DNA energy difference between the T7 and T3 promoters characterizes the promoter recognition preference well, we calculated for each system $\Delta E^{ele} = E^{ele}_{T7} - E^{ele}_{T3}$ (where $E^{ele}_{T7}$ and $E^{ele}_{T3}$ are the interaction energies of RNAP with the T7 and T3 promoters, respectively) and projected $\Delta E^{ele}$ onto individual amino acids (AA), giving $\Delta E^{ele}_i$ for the ith AA of RNAP (FIG. 3). The key AAs that contribute significantly to promoter recognition, i.e. those with large $|\Delta E^{ele}_i|$, are mainly located in ATL (AA 93-101), AXH (206-225), INB and SPL. For wt RNAP, ATL-R96, AXH-R215, INB-R231 and SPL-R746/756 stabilize both the T7 and T3 promoters, with a greater effect on T7 and a smaller effect on T3; Q135 (between INB and AXH) interacts significantly only with T7. Accordingly, ATL-R96, Q135, AXH-R215, INB-R231 and SPL-R746 have $\Delta E^{ele}_i < 0$, i.e. they favor RNAP being more stable on the T7 promoter (B in FIG. 3). In addition, SPL-N748 shows a preference for the T7 promoter, although its interaction with either the T7 or the T3 promoter is not particularly strong. On the other hand, AXH-E218 is biased toward the T3 promoter ($\Delta E^{ele}_i > 0$), again without significant interaction with either promoter.
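The per-residue projection can be sketched as a simple difference of per-residue energies. The sketch below assumes that the per-residue bias is the residue's electrostatic energy with the T7 promoter minus that with the T3 promoter, so that negative values favor T7 and positive values favor T3. The dictionary inputs, the residue labels used as keys and the `threshold` parameter are illustrative assumptions.

```python
def residue_bias(e_t7_by_res, e_t3_by_res, threshold=0.0):
    """Per-residue energy difference (T7 minus T3) with a promoter-bias label.

    Inputs map residue labels to trajectory-averaged electrostatic energies.
    Returns {residue: (delta, label)} with label "T7", "T3" or "neutral".
    """
    bias = {}
    for res in sorted(set(e_t7_by_res) & set(e_t3_by_res)):
        delta = e_t7_by_res[res] - e_t3_by_res[res]
        if delta < -threshold:
            label = "T7"       # more stable on the T7 promoter
        elif delta > threshold:
            label = "T3"       # more stable on the T3 promoter
        else:
            label = "neutral"
        bias[res] = (delta, label)
    return bias
```

Sorting the result by the magnitude of `delta` would pick out the residues that dominate the bias for a given variant.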

In 1M (N748D, C in FIG. 3), the mutation itself carries a direct energetic bias toward stabilizing the T3 promoter. However, the energy contribution of N748D to either the T7 or the T3 promoter is still negligible. Although ATL-G97 (&T101), Q135 and INB-R231 still have $\Delta E^{ele}_i < 0$, ATL-K95 and K98 (together with R96&R99), AXH-H211 and SPL-N748D (together with T745&R746) begin to show a clear $\Delta E^{ele}_i > 0$, i.e. a bias toward the T3 promoter. Thus, in the 1M variant, the N748D mutation directly introduces instability of the SPL, followed by disruption of the promoter bias of ATL and AXH (see A and B in FIG. 4).

2M (N748D + E222K, D in FIG. 3) carries an additional AXH-E222K mutation. Now ATL-R96 and T101, AXH-R215 and SPL-T745&R746 (together with K765) contribute $\Delta E^{ele}_i < 0$, while ATL-K98, Q135, AXH-H211, INB-R231 and SPL-N748D (along with Q758&P759&T760) have $\Delta E^{ele}_i > 0$. It appears that the AXH-E222K mutation switches the promoter preference of Q135 and INB-R231 (C in FIG. 4). Closer examination reveals that INB-R231 changes its side-chain orientation from 1M to 2M, since E222K brings its side chain toward the T7 promoter (K222 is still about 10 angstroms away) rather than the T3 promoter; accordingly, R231 binds much better to the T3 promoter than to the T7 promoter in 2M.

Next, for 3M (N748D + E222K + E207K, E in FIG. 3), which carries the third mutation AXH-E207K, ATL-K98 (along with R96&R99), H205, AXH-E207K and SPL-T745&R746&K765 have $\Delta E^{ele}_i < 0$ and stabilize the T7 promoter, while ATL-K95, AXH-H211&R215, INB-R231 (together with A234&G235) and SPL-N748D (along with R756) have $\Delta E^{ele}_i > 0$ and are biased toward the T3 promoter. N748D shows significant stabilizing interactions with the T3 promoter (greater than in 1M or 2M), whereas INB-R231 interacts closely with both the T7 and T3 promoters but remains biased toward T3. Although AXH-E207K itself is largely stabilizing and biased toward the T7 promoter, it promotes preferential binding of AXH-R215 to the T3 promoter. The competition between E207K and R215 for the DNA promoter indicates that E207K binds the T7 promoter more tightly than R215 does, while R215 binds the T3 promoter more tightly than E207K does (D in FIG. 4). Since the promoter interacts with E207K and R215 from the N- and C-terminal ends of AXH, respectively, the orientation angle of AXH relative to the long axis of the DNA changes considerably (D in FIG. 4): on T7 it decreases from 130°15′ in 2M to 104°21′ in 3M, and on T3 it increases from 97°17′ in 2M to 111°8′ in 3M. The change on T7 thus aligns AXH more perpendicularly to the DNA, and this trend continues to 5M and 6M; the trend on T3 is exactly the opposite. Overall, 2M and 3M do not differ much between the T7 and T3 promoters, but they provide the necessary or critical residue configuration for promoter recognition and differentiation in 5M/6M. In particular, the bias of R215 toward the T3 promoter and the concomitant change in the orientation of AXH relative to the promoter DNA appear to be critical.
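The orientation angle of AXH relative to the DNA long axis is an angle between two direction vectors. A minimal sketch is given below; how each axis is defined is not stated in the text, so building an axis from two end points (for example, atoms near the two ends of the helix or of the promoter duplex) is an assumption made here for illustration.

```python
import math


def axis_from_points(start, end):
    """Direction vector from start to end (assumed way of defining an axis)."""
    return tuple(e - s for s, e in zip(start, end))


def axis_angle_deg(axis_a, axis_b):
    """Angle in degrees between two direction vectors."""
    dot = sum(x * y for x, y in zip(axis_a, axis_b))
    norm_a = math.sqrt(sum(x * x for x in axis_a))
    norm_b = math.sqrt(sum(x * x for x in axis_b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("axis vectors must be non-zero")
    cos_theta = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cos_theta))
```

An angle near 90 degrees corresponds to the helix lying perpendicular to the DNA axis, which is the direction of change described for the T7 promoter.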

In contrast, promoter recognition and differentiation became prominent in 5M and 6M, where R96L + K98R on ATL appeared in addition (5M, F in FIG. 3), followed by SPL-P759L/S (6M, G and H in FIG. 3). In both cases, more residue pairs stabilized the T3 promoter. In 5M, AXH-E207K (along with E218) and SPL-T745 (along with R746 & K765) still favor the T7 promoter, while ATL-K93 (together with K98R & R99), AXH-R215 & E222K, INB-R231 and SPL-N748D are all biased toward the T3 promoter. Since R96 and K98 were the residues biased toward the T7 promoter in 2M and 3M, respectively, the mutations at both positions largely abrogated the bias of ATL toward the T7 promoter. The stabilization of the T7 promoter by SPL-K765 also disappeared compared with 3M.

In 6M-1 (P759L), SPL-T745 and SPL-K765 can still contribute to stabilizing the T7 promoter, while ATL-K98R, AXH-R215 & H211 and SPL-N748D (and several residues from Q744 to T760) are biased toward the T3 promoter. In 6M-2 (P759S), ATL-T101, AXH-K206 and SPL-K765 contribute to the T7 promoter, while ATL-K95 to R99, AXH-R215 & H211 and SPL-N748D (and residues from Q744 to T760) are biased toward the T3 promoter. It appears that the SPL-P759L/S mutation induces the Q744 to T760 region to further stabilize the T3 promoter, whereas the flexible INB-R231 does not necessarily favor the T3 promoter. The stabilization of the T7 promoter by AXH-E207K was no longer sustained in 6M. In 5M and 6M, a marginal stabilizing effect of ATL/AXH/SPL on the T7 promoter was still present. At the same time, SPL-N748D (starting from 1M) and AXH-R215 (triggered in 3M) contributed strongly to the T3 promoter.

2.3 Analysis of hydrogen bonds (HBs) at the RNAP-promoter DNA interface and further exploration of amino acid (AA) contributions to recognition

To investigate the specific recognition of the promoter by RNAP, we examined the hydrogen bond (HB) and salt bridge (SB) interactions at the RNAP-promoter interface of each simulated system. Since most hydrogen bonds fluctuate and are highly dynamic, we recorded hydrogen bonds with an occupancy of at least ~10% during the microsecond simulation (about 0.8 μs). The corresponding results are summarized in A of FIG. 5. Note that the DNA sequences of the T7 and T3 promoters differ only at positions -17, -15, -12, -11 and -10, and that positions -12 to -10 are critical to promoter specificity.
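The occupancy criterion can be sketched as follows. This is a minimal illustration, assuming each hydrogen bond has already been reduced to a per-frame presence flag by some geometric criterion that is not specified here; the function names, the labels `HB-1` and `HB-2`, and the toy data are hypothetical.

```python
import numpy as np

def hbond_occupancy(presence):
    """Fraction of frames in which each hydrogen bond is present.

    presence: dict mapping an HB label to a per-frame boolean sequence.
    """
    return {label: float(np.mean(frames)) for label, frames in presence.items()}

def filter_by_occupancy(presence, threshold=0.10):
    """Keep only hydrogen bonds whose occupancy reaches the threshold."""
    occupancy = hbond_occupancy(presence)
    return {label: value for label, value in occupancy.items() if value >= threshold}

# Toy trajectory of 20 frames (hypothetical data, not simulation results)
toy_presence = {
    "HB-1": [True] * 12 + [False] * 8,   # present in 12 of 20 frames
    "HB-2": [True] * 1 + [False] * 19,   # present in 1 of 20 frames
}
print(filter_by_occupancy(toy_presence))
```

With the default threshold of 0.10, only hydrogen bonds present in at least one tenth of the analyzed frames are retained for comparison between the T7 and T3 systems.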

HBs formed between ATL and DNA span widely from the upstream minor groove region (-17 to -15) to the middle major groove (-15 to -12). In wt-RNAP, ATL forms about 4 HBs (with T7) and 7 HBs (with T3) in the upstream region (-17 to -15), mainly with the template strand (denoted T); on both the T7 and T3 promoters, one HB forms at the most downstream position, T101:NT-12 (NT denotes the non-template strand). In the single-point mutant (1M), ATL forms about 6 HBs (with T7) and about 4 HBs (with T3) upstream; two HBs, T101:NT-12 and K98:NT-11, form on the T7 promoter, and these switch to T101:NT-11 and K98:NT-12 on the T3 promoter. In the double and triple mutants (2M and 3M), ATL retained about 6 HBs (with T7) but 4 to 7 HBs (with T3) upstream; the most downstream HBs formed as T101:NT-12/-11 on the T7 promoter or as T101:NT-12 on the T3 promoter. In the T3 promoter-preferring mutants (5M and 6M), the upstream ATL HBs were significantly reduced, to 1-2 HBs with T7 and 3-4 HBs with T3; on either promoter, the most downstream HB always remained T101:NT-12. Thus, in wt-RNAP, ATL formed fewer HBs with the upstream template strand on the T7 promoter than on the T3 promoter, whereas in the transition mutants (1M to 3M) this trend shifted, and in the directed mutants (5M and 6M, owing to the R96L and K98R mutations on ATL) the overall HB association of ATL with the promoter was reduced. In particular, the most downstream ATL association, T101:NT-12, can shift to NT-11 in the transition mutants (1M and 3M), but remains a stable T101:NT-12 in the directed mutants.

As for the AAs on AXH, all HBs were concentrated in the middle of the major groove on the non-template DNA strand (NT-13 to NT-11). In wt-RNAP, AXH formed 3 identical HBs with the T7 and T3 promoters (R215:NT-13, R215:NT-12 and H211:NT-11; B in FIG. 5). These HBs therefore do not seem to contribute to promoter differentiation. In 1M, AXH completely lost its HB association with the T7 promoter, while two AXH HBs (R215:NT-12 and H211:NT-12) remained associated with the T3 promoter (C in FIG. 5). The R215 and H211 HBs partially recovered in 2M (after E222K), with H211 remaining biased toward HB formation on the T3 promoter, and in 3M (after E207K) K207:NT-11 formed on both promoters (D and E in FIG. 5). In 5M and 6M, on both the T7 and T3 promoters, K207 continued to bind NT-11 and even NT-10, R215 formed HBs with NT-13/-12, and H211 remained preferentially bound to the T3 promoter. Thus, it appears that 1M (or SPL-N748D) critically breaks the balance of the AXH HB associations with the two promoters (T7 and T3), leaving AXH-H211 hydrogen-bonded to the T3 promoter but no longer associated with the T7 promoter; the mutations E222K and E207K, which then occur directly on AXH, enhance the binding of AXH to DNA and support the HB preference of H211 for the T3 promoter, which persists into the directed mutants.
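The comparison of AXH hydrogen bonds between the two promoters amounts to a set comparison. The sketch below uses only the wt-RNAP and 1M hydrogen bonds named in the paragraph above; the helper `compare_hbonds` and the dictionary layout are illustrative assumptions.

```python
def compare_hbonds(t7, t3):
    """Split hydrogen bonds into shared and promoter-specific groups."""
    t7, t3 = set(t7), set(t3)
    return {
        "shared": sorted(t7 & t3),    # formed on both promoters
        "t7_only": sorted(t7 - t3),   # formed only on the T7 promoter
        "t3_only": sorted(t3 - t7),   # formed only on the T3 promoter
    }

# AXH hydrogen bonds as listed in the text for wt-RNAP and 1M
axh_hbonds = {
    "wt": {
        "T7": {"R215:NT-13", "R215:NT-12", "H211:NT-11"},
        "T3": {"R215:NT-13", "R215:NT-12", "H211:NT-11"},
    },
    "1M": {
        "T7": set(),
        "T3": {"R215:NT-12", "H211:NT-12"},
    },
}

for variant, hb in axh_hbonds.items():
    print(variant, compare_hbonds(hb["T7"], hb["T3"]))
```

An empty promoter-specific group, as for wt-RNAP, corresponds to hydrogen bonds that do not help to differentiate the promoters, whereas the 1M entry shows hydrogen bonds retained only on the T3 promoter.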

HBs formed between SPL and promoter DNA were located predominantly far downstream (mainly on the template strand, T-10 to T-7), reaching up to NT-11 (only N748 in wt-RNAP with T7) or down to -6/-5 (T760 in wt-RNAP with T7, or in 6M with T3). Notably, N748D appeared as the first and most critical mutation, shifting the base-specific HBs N748:NT-11 (with T7) and N748:NT-10 (with T3) to D748:NT-10 in all RNAP variants (from 1M to 6M; FIG. 5). In the transition mutants (1M-3M) and the directed mutants (5M-6M), D748 was the only residue that formed an HB with the DNA base at NT-10 on both the T7 and T3 promoters; NT-10 additionally associated with E207K and/or T745 only on the T7 promoter, not on the T3 promoter. Thus, N748D appears critical for relocating the HB interactions involved in promoter recognition, and it supports AXH in rewiring promoter differentiation. Other HB-forming residues in SPL appear to maintain stable contacts in all systems, including wt-RNAP and the variants (R756:G-9, Q758:A-8, R746:G-7). Note that T760:T-6 is present in wt-RNAP with a preference for the T7 promoter; owing to the P759L/S mutation, it then turned into a preference for the T3 promoter (6M).

Finally, the INB region forms HBs primarily through R231, Q239 and S241 with template positions -7 and -6 (occasionally also R231 with -8 or S241 with -5). In wt-RNAP, INB formed more HBs with the T7 promoter than with the T3 promoter. In the RNAP variants, this bias may be reduced or even slightly reversed. It appears that INB may still play some role. In particular, INB-R231 showed a transient effect (in 2M and 5M) in promoting bias toward the T3 promoter, either energetically or through the formation of HBs with DNA, whereas in general the R231 side chain was highly flexible and swung frequently, without a sustained bias.

Furthermore, we investigated SB interactions at the RNAP-promoter DNA interface. Upstream ATL dominated the SB interactions and showed no significant difference between the T7 and T3 promoters. However, in wt-RNAP, the ATL SB R96-NT-16 with the T7 promoter does help to stabilize the T7 system energetically; in 2M/3M, the same is true of the SB K98-T-15/T-14 on the T7 promoter (see B in FIG. 2). Interestingly, this stabilization and bias was abolished in 5M/6M by R96L and K98R, indicating that the ATL mutations indeed promote specificity for the T3 promoter. Meanwhile, in the -12 to -10 region critical for promoter differentiation, there were SBs from AXH (e.g., R215-NT, in all RNAPs except 3M with T7; K207-NT, starting from 3M), from ATL (K98 or R98, in all mt-RNAPs except 3M with T7; occasionally K95-NT, in 1M with T3 and 3M with T7) and from SPL (occasionally K765-NT, in 1M/5M/6M with T7). In particular, AXH became involved in more SB interactions with DNA starting with the mutations in 1M-2M, and the AXH SBs extended further in 3M-6M.
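A salt bridge between a positively charged side chain and a DNA phosphate is commonly identified by a distance criterion. The sketch below is an illustration only: the 4.0 Å cutoff, the choice of atoms (charged side-chain nitrogen atoms and phosphate oxygen atoms), the function name `salt_bridge_pairs` and the toy coordinates are all assumptions, since the criterion used for the SB analysis is not stated in this section.

```python
import numpy as np

def salt_bridge_pairs(cation_xyz, anion_xyz, cutoff=4.0):
    """Return (cation index, anion index, distance) for pairs within the cutoff.

    cation_xyz: coordinates of charged side-chain nitrogen atoms (assumed choice).
    anion_xyz: coordinates of DNA phosphate oxygen atoms (assumed choice).
    cutoff: distance threshold in angstroms (assumed value).
    """
    cation_xyz = np.asarray(cation_xyz, dtype=float)
    anion_xyz = np.asarray(anion_xyz, dtype=float)
    # Pairwise distance matrix of shape (n_cations, n_anions)
    diff = cation_xyz[:, None, :] - anion_xyz[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    rows, cols = np.nonzero(dist <= cutoff)
    return [(int(i), int(j), float(dist[i, j])) for i, j in zip(rows, cols)]

# Toy single-frame coordinates in angstroms (hypothetical)
cations = [[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]]
anions = [[3.0, 0.0, 0.0], [20.0, 0.0, 0.0]]
print(salt_bridge_pairs(cations, anions))
```

Applying such a test frame by frame would give a per-frame presence flag for each residue-nucleotide pair, which can then be summarized over the trajectory in the same way as the hydrogen bonds.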

We have used all-atom MD simulations to reveal the physical mechanism by which viral T7 RNAP variants rewire promoter recognition along the laboratory-directed evolution pathway, as promoter recognition by RNAP switches from the original T7 promoter to the slightly different T3 promoter. The first point mutation, N748D, occurs on the SPL (specificity loop) of T7 RNAP and is biased toward the T3 promoter; it severely perturbs the HB pattern at the protein-DNA interface, alters the balance between the downstream SPL and the upstream ATL (AT-rich loop), and slightly decouples RNAP from the promoter, preventing recognition of the original promoter. Notably, the present study identified an auxiliary helix (AXH, residues 206-225) that takes over promoter recognition in the switching RNAP along the directed evolution pathway through the second and third mutations (E222K and E207K): after these two mutations, AXH interacts more tightly with the promoter, primarily through charge interactions, and then positions itself differently on the T7 and T3 promoters to support further differentiation. Promoter specificity is finally switched after the mutations on ATL (R96L + K98R), which adjust the HB and SB patterns at the protein-DNA interface and reset the balance between ATL and SPL. The additional mutation on SPL (P759L or P759S) further modulates the interaction of RNAP with the promoter and maintains promoter specificity. The structural dynamics and energetic details revealed by the simulations may inform structure-function learning for the system and facilitate further rational design of specific RNAP-promoter recognition.

The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the present invention, and these modifications and refinements should also be regarded as falling within the protection scope of the present invention.

Sequence listing

<110> Huzhou survey planning and design institute, Inc. of Nuclear industry

<120> simulation prediction method for identifying different promoters by directed evolution of monomeric polymerase

<160> 1

<170> SIPOSequenceListing 1.0

<210> 1

<211> 28

<212> DNA

<213> Artificial Sequence (Artificial Sequence)

<400> 1

attatgctga gtgatcccct cttgactg 28
