Modified terminal deoxynucleotidyl transferase (TdT)

Document No.: 54217; published: 2021-09-28

Abstract: Modified terminal deoxynucleotidyl transferase (TdT), created by Michael Chun Hao Chen, Gordon Ross McInroy, Ian Haston Cook and Sihong Chen on 2020-02-04. The present invention relates to engineered terminal deoxynucleotidyl transferase (TdT) enzymes, or homologous amino acid sequences of Pol μ, Pol β, Pol λ and Pol θ of any species, or homologous amino acid sequences of X family polymerases of any species, and their use in methods of nucleic acid synthesis; to methods of synthesizing nucleic acids; and to the use of kits comprising said enzymes in methods of nucleic acid synthesis. The invention also relates to the use of novel terminal deoxynucleotidyl transferases and 3'-blocked nucleoside triphosphates in methods of template-independent nucleic acid synthesis.

1. A modified terminal deoxynucleotidyl transferase (TdT) comprising an amino acid modification when compared to the wild-type sequence SEQ ID NO 1 or a truncated form thereof or the homologous amino acid sequence of a terminal deoxynucleotidyl transferase (TdT) in other species or the homologous amino acid sequence of Pol μ, Pol β, Pol λ and Pol θ of any species or the homologous amino acid sequence of an X family polymerase of any species, wherein the modification is at one or more of the following amino acids: V32, A33, I34, F35, A53, V68, V71, E97, I101, M108, G109, A110, Q115, V116, S125, T137, Q143, M152, E153, N154, H155, N156, Q157, I158, I165, N169, N173, S175, E176, G177, P178, C179, L180, A181, F182, M183, R184, A185, L188, H194, A195, I196, S197, S198, S199, K200, E203, G204, D210, Q211, T212, K213, A214, I216, E217, D218, L220, Y222, V228, D230, Q238, T239, L242, L251, K260, G261, F262, H263, S264, L265, E267, Q269, A270, D271, N272, A273, H275, F276, T277, K278, M279, Q280, K281, S291, A292, A293, V294, C295, K296, E298, A299, Q300, A301, Q304, I305, T309, V310, R311, L312, I313, A314, I318, V319, T320, G328, K329, E330, C331, L338, T341, P342, E343, M344, G345, K346, W349, L350, L351, N352, R353, L354, I355, N356, R357, L358, Q359, N360, Q361, G362, I363, L364, L365, Y366, Y367, D368, I369, V370, K376, T377, C381, K383, D388, H389, F390, Q391, K392, F394, I397, K398, K400, K401, E402, L403, A404, A405, G406, R407, D411, A421, P422, P423, V424, D425, N426, F427, A430, R438, F447, A448, R449, H450, E451, R452, K453, M454, L455, L456, D457, N458, H459, A460, L461, Y462, D463, K464, T465, K466, K467, T474, D477, D485, Y486, I487, D488, P489.

2. The modified terminal deoxynucleotidyl transferase (TdT) enzyme of claim 1, which comprises at least one amino acid modification when compared to the wild-type sequence or a truncated form thereof, wherein said modification is selected from one or more of amino acid region WLLNRLINRLQNQGILLYYDIV, VAIF, MGA, MENHNQI, SEGPCLAFMRA, HAISSS, DQTKA, KGFHS, QADNA, HFTKMQK, SAAVCK, EAQA, TVRLI, GKEC, TPEMGK, DHFQK, LAAG, APPVDNF, FARHERKMLLDNHALYDKTKK and DYIDP of the sequence of SEQ ID NO1 or a homologous region in other species or a homologous region of Pol μ, Pol β, Pol λ and Pol θ of any species or a homologous region of the X family polymerase of any species.

3. The modified terminal deoxynucleotidyl transferase (TdT) according to claim 1, which comprises at least one amino acid modification when compared to the wild-type sequence SEQ ID NO1 or a homologous amino acid sequence of a terminal deoxynucleotidyl transferase (TdT) in other species, wherein said modification is selected from one or more of amino acid region WLLNRLINRLQNQGILLYYDI, VAIF, MGA, ENHNQ, FMRA, HAI, TKA, FHS, QADNA, MQK, SAAVCK, EAQA, TVR, KEC, TPEMGK, DHFQ, LAAG, APPVDN, FARHERKMLLDNHA and YIDP of the sequence of SEQ ID NO1 or a homologous region in other species.

4. The modified terminal deoxynucleotidyl transferase (TdT) according to any of claims 1 to 3, which comprises at least the following sequence:

TVSQYACQRRTTMENHNQIFTDAFAILAENAEFNESEGPCLAFMRAASLLKSLPHAISSSKDLEGLPCLGDQTKAVIEDILEYGQCSKVQDVLCDDRYQTIKLFTSVFGVGLKTAEKWYRKGFHSLEEVQADNAIHFTKMQKAGFLYYDDISAAVCKAEAQAIGQIVEETVRLIAPDAIVTLTGGFRRGKECGHDVDFLITTPEMGKEVWLLNRLTNRLQNQGILLYYDIVESTFDKTRLPCRKFEAMDHFQKCFAIIKLKKELAAGRVQKDWKAIRVDFVAPPVDNFAFALLGWTGSRQFERDLRRFARHERKMLLDNHALYDKTKKIFLPAKTEEDIFAHLGLDYIDPWQRNA

or homologous regions in other species or homologous regions of Pol μ, Pol β, Pol λ and Pol θ of any species or homologous regions of X family polymerase of any species, wherein said sequence has one or more amino acid modifications in one or more of the following amino acid regions of said sequence: WLLNRLINRLQNQGILLYYDIV, MENHNQI, SEGPCLAFMRA, HAISSS, DQTKA, KGFHS, QADNA, HFTKMQK, SAAVCK, EAQA, TVRLI, GKEC, TPEMGK, DHFQK, LAAG, APPVDNF, FARHERKMLLDNHALYDKTKK and DYIDP.

5. The modified terminal deoxynucleotidyl transferase (TdT) according to any one of claims 1 to 4, comprising at least two amino acid modifications when compared to the wild-type sequence SEQ ID NO1 or a homologous amino acid sequence of a terminal deoxynucleotidyl transferase (TdT) in other species, wherein:

a. the first modification is in amino acid region WLLNRLINRLQNQGILLYYDIV of the sequence of SEQ ID NO 1 or in a homologous region in other species; and

b. the second modification is selected from one or more of the amino acid regions VAIF, MGA, ENHNQ, FMRA, HAI, TKA, FHS, QADNA, MQK, SAAVCK, EAQA, TVR, KEC, TPEMGK, DHFQ, LAAG, APPVDN, FARHERKMLLDNHA and YIDP of the sequence of SEQ ID NO 1 or homologous regions in other species.

6. The modified terminal deoxynucleotidyl transferase (TdT) of claim 5, wherein the modification in amino acid region WLLNRLINRLQNQGILLYYDIV improves the solubility of the enzyme compared to the wild-type sequence, and the one or more modifications in VAIF, MGA, ENHNQ, FMRA, HAI, TKA, FHS, QADNA, MQK, SAAVCK, EAQA, TVR, KEC, TPEMGK, DHFQ, LAAG, APPVDN, FARHERKMLLDNHA and YIDP improve the incorporation of a nucleotide having a modification at the 3' position.

7. The modified terminal deoxynucleotidyl transferase (TdT) of any one of claims 1 to 6, wherein the wild-type sequence is selected from the group consisting of:

8. The modified terminal deoxynucleotidyl transferase (TdT) of any one of claims 1 to 7, wherein the modification within amino acid region WLLNRLINRLQNQGILLYYDIV is at one or more of the underlined amino acids.

9. The modified terminal deoxynucleotidyl transferase (TdT) of claim 8, wherein the modification within amino acid region WLLNRLINRLQNQGILLYYDIV is a modification of QLLPKVINLWEKKGLLLYYDLV.

10. The modified terminal deoxynucleotidyl transferase (TdT) of any one of claims 1 to 9, wherein the modification is selected from one or more of the amino acid regions VAIF, MGA, ENHNQ, FMRA, HAI, TKA, FHS, QADNA, MQK, SAAVCK, EAQA, TVR, KEC, TPEMGK, DHFQ, LAAG, APPVDN, FARHERKMLLDNHA and YIDP.

11. The modified terminal deoxynucleotidyl transferase (TdT) of any one of claims 1 to 9, wherein the modification is selected from one or more of amino acids A53, V68, V71, E97, I101, G109, Q115, V116, S125, T137, Q143, N154, H155, Q157, I158, I165, G177, L180, A181, M183, A195, K200, T212, K213, A214, E217, T239, F262, S264, Q269, N272, A273, K281, S291, K296, Q300, T309, R311, E330, T341, E343, G345, N352, N360, Q361, I363, Y367, H389, L403, G406, D411, A421, P422, V424, N, R426, F438, R447, L455, and/or D488.

12. The modified terminal deoxynucleotidyl transferase (TdT) according to any one of claims 1 to 9, wherein the modification is selected from the group consisting of amino acid changes a53, V68, V71, E97, I101, G109, Q115, V116, S125, T137, Q143, M152, N154, H155, Q157, I158, I165, N169, N173, S175, G177, C179, L180, a181, M183, L188, a195, S197, S198, S199, K200, E203, G204, D210, Q211, T212, K213, a214, I216, E217, D218, L220, Y291, V222, V228, D230, T262, D210, Q211, T212, T213, T312, K304, L312, Q309, L312, Q309, G271, Q309, L312, Q309, G312, L312, Q309, Q312, Q320, Q309, Q320, a 311, Q309, Q312, Q309, a 311, Q309, Q320, Q309, Q320, Q309, L311, Q309, Q320, L304, a 311, Q320, L304, a 311, Q309, Q320, a 311, Q309, a 311, Q320, Q309, Q320, a 311, Q309, Q320, a 311, Q309, Q320, a 311, Q320, a 304, a 311, Q320, D304, a 311, a 304, a31, Q320, a31, Q309, D304, Q309, a 311, Q309, Q320, a31, Q320, a31, Q320, a31, D304, Q320, D304, a31, G320, Q320, G304, D304, a 311, a 309, D304, a 311, D304, C320, C304, a 311, C304, C320, T341S, P342A, E343G, E343Q, M344A, M344Q, M344K, G345R, K346R, N352Q, K353R, K353D, V354I, V354L, I355V, I355M, N356R, N356D, W358L, N360K, Q361K, G362E, I363L, Y366L, Y367L, D368L, D L, V370L, K376L, T377L, C381L, K383L, H389L, K392L, F36394, I427, K L, K36400, K L, N421, N L, N401, N53, N L, N401, N53, N L, N53, N L, N401, N53, N L, N401, N53, N L, N53, N L, N401, N L, N53, N L, N401, N53, N L, N53, N L, N53, N L, N53, N L, N53, N L, N53, N L, N53, N L, N53, N.

13. The modified terminal deoxynucleotidyl transferase (TdT) of claim 10, wherein the modification is selected from the amino acid regions FMRA, QADNA, EAQA, APPVDN, FARHERKMLLDNHA and YIDP, the modification being at the underlined amino acids.

14. The modified terminal deoxynucleotidyl transferase (TdT) enzyme of claim 13, wherein the modification is selected from one or more of the following sequences: FRRA, QADKA, EADA, MPPVDN, FARHERKMLLDRHA and YIPP.

15. The modified terminal deoxynucleotidyl transferase (TdT) enzyme of claim 13, wherein the modification is selected from two or more of the following sequences: FRRA, QADKA, EADA, MPPVDN, FARHERKMLLDRHA and YIPP.

16. The modified terminal deoxynucleotidyl transferase (TdT) of claim 15, wherein the modification comprises each of the following sequences: FRRA, QADKA, EADA, MPPVDN, FARHERKMLLDRHA and YIPP.

17. The modified terminal deoxynucleotidyl transferase (TdT) according to any of claims 1 to 16 comprising a sequence selected from SEQ ID NO 4 to SEQ ID NO 173 or a truncated form thereof.

18. The modified terminal deoxynucleotidyl transferase (TdT) according to any of claims 1 to 16 comprising a sequence selected from SEQ ID NO174 to SEQ ID NO 343.

19. The modified terminal deoxynucleotidyl transferase (TdT) according to any of claims 1 to 16 comprising a sequence selected from SEQ ID NO344 to SEQ ID NO 727.

20. A method of nucleic acid synthesis, the method comprising the steps of:

(a) providing an initiator oligonucleotide;

(b) adding 3'-blocked nucleoside triphosphates to the initiator oligonucleotide in the presence of a terminal deoxynucleotidyl transferase (TdT) as defined in any one of claims 1 to 19;

(c) removing all reagents from the initiator oligonucleotide;

(d) cleaving the blocking group in the presence of a cleaving agent; and

(e) removing the cleaving agent.

21. The method defined in claim 20, wherein more than 1 nucleotide is added by repeating steps (b) to (e).

22. The method as defined in claim 20 or claim 21, wherein the 3'-blocked nucleoside triphosphate is blocked by a group selected from: 3'-O-azidomethyl, 3'-aminooxy, 3'-O-(N-oxime), 3'-O-allyl, 3'-O-cyanoethyl, 3'-O-acetyl, 3'-O-nitrate, 3'-phosphate, 3'-O-acetyl levulinic ester, 3'-O-tert-butyldimethylsilane, 3'-O-trimethyl(silyl)ethoxymethyl, 3'-O-ortho-nitrobenzyl or 3'-O-para-nitrobenzyl.

23. The method defined in claim 22, wherein the 3'-blocked nucleoside triphosphate is blocked with a 3'-O-azidomethyl, 3'-aminooxy or 3'-O-allyl group.

24. A kit comprising a terminal deoxynucleotidyl transferase (TdT) as defined in any one of claims 1 to 19 in combination with an initiator oligonucleotide and one or more 3' -blocked nucleoside triphosphates.
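The cycle recited in claims 20 and 21 can be pictured as a simple loop. The following Python sketch is a toy model only: the `Strand` class and all function names are hypothetical illustrations, no chemistry or enzyme kinetics is modeled, and the wash steps (c) and (e) appear only as comments because they change no state in this model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strand:
    """Toy model of a growing single strand (hypothetical, for illustration)."""
    bases: str
    blocked: bool = False  # True while the 3'-end carries a blocking group


def add_blocked_nucleotide(strand: Strand, base: str) -> Strand:
    """Step (b): add one 3'-blocked nucleotide; a blocked end cannot be extended."""
    if strand.blocked:
        raise RuntimeError("3'-end is blocked; cleave the blocking group first")
    return Strand(strand.bases + base, blocked=True)


def cleave_blocking_group(strand: Strand) -> Strand:
    """Step (d): remove the 3'-blocking group so the next cycle can proceed."""
    return Strand(strand.bases, blocked=False)


def synthesize(initiator: str, target: str) -> str:
    """Repeat steps (b) to (e) once per nucleotide of the target."""
    strand = Strand(initiator)               # step (a): initiator oligonucleotide
    for base in target:
        strand = add_blocked_nucleotide(strand, base)   # step (b)
        # step (c): remove reagents (no state change in this toy model)
        strand = cleave_blocking_group(strand)          # step (d)
        # step (e): remove the cleaving agent (no state change in this toy model)
    return strand.bases


if __name__ == "__main__":
    print(synthesize("TTTT", "ACGT"))
```

The `blocked` flag is what makes the addition controlled in this model: exactly one nucleotide can be added per cycle, because a second addition fails until the blocking group has been cleaved.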

Technical Field

The present invention relates to the use of a specific terminal deoxynucleotidyl transferase (TdT) enzyme or a homologous amino acid sequence of Pol μ, Pol β, Pol λ and Pol θ of any species or a homologous amino acid sequence of an X family polymerase of any species in a method for nucleic acid synthesis, to a method for synthesizing nucleic acids, and to the use of a kit comprising said enzymes in a method for nucleic acid synthesis. The invention also relates to the use of terminal deoxynucleotidyl transferase or homologous enzyme and 3' -blocked nucleoside triphosphate in a method for template-independent nucleic acid synthesis.

Background

Nucleic acid synthesis is crucial to modern biotechnology. The ability of the scientific community to artificially synthesize DNA, RNA and proteins has made possible the rapid development of the biotechnology field.

Artificial DNA synthesis allows biotechnology and pharmaceutical companies to develop a range of peptide therapeutics, such as insulin for the treatment of diabetes. It allows researchers to characterize cellular proteins in order to develop new small-molecule therapies for diseases such as heart disease and cancer, which our aging population faces today. It even paves the way to creating life, as demonstrated by the Venter Institute in 2010 when an artificially synthesized genome was placed into a bacterial cell.

However, current DNA synthesis technology does not meet the needs of the biotechnology industry. Although the technology is mature, it is practically impossible to synthesize a DNA strand longer than 200 nucleotides, and most DNA synthesis companies offer only up to 120 nucleotides. By comparison, an average protein-coding gene is on the order of 2000-3000 contiguous nucleotides, a chromosome is at least one million contiguous nucleotides long, and an average eukaryotic genome numbers in the billions of nucleotides. To produce nucleic acid strands thousands of base pairs long, all major gene synthesis companies today rely on variations of a "synthesize and stitch" technique, in which overlapping 40-60-mer fragments are synthesized and stitched together by enzymatic copying and extension. Current methods generally allow lengths of up to 3 kb for routine production.

DNA of more than 120-200 nucleotides cannot be synthesized in one go because current methods use synthetic chemistry (i.e., phosphoramidite technology), which builds DNA by coupling one nucleotide at a time. Even if each nucleotide coupling step is 99% efficient, it is mathematically impossible to synthesize DNA longer than 200 nucleotides in acceptable yields. The Venter Institute illustrated this laborious process by spending 4 years and $20 million to synthesize a relatively small bacterial genome.
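The yield argument is simple compounding: if every coupling step succeeds with the same probability, the fraction of full-length product is that probability raised to the number of steps. The short Python sketch below illustrates this under the simplifying assumptions that the steps are independent and equally efficient; the function name is hypothetical and other loss mechanisms are ignored.

```python
def full_length_yield(coupling_efficiency: float, coupling_steps: int) -> float:
    """Fraction of strands that are full length after the given number of
    coupling steps, assuming each step independently succeeds with the
    same efficiency (an idealization)."""
    if not 0.0 <= coupling_efficiency <= 1.0:
        raise ValueError("coupling_efficiency must be between 0 and 1")
    if coupling_steps < 0:
        raise ValueError("coupling_steps must be non-negative")
    return coupling_efficiency ** coupling_steps


if __name__ == "__main__":
    # Compare a few strand lengths at 99% efficiency per step.
    for steps in (120, 200, 1000, 3000):
        print(f"{steps:>5} steps: {full_length_yield(0.99, steps):.2e}")
```

At 99% per step, roughly 13% of strands are full length after 200 couplings, and the fraction falls off exponentially from there, which is why gene-length strands are out of reach for stepwise chemical coupling.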

Known methods of DNA sequencing use a template-dependent DNA polymerase to add 3'-reversibly terminated nucleotides to a growing double-stranded substrate. In the "sequencing-by-synthesis" method, each added nucleotide carries a dye, allowing the user to identify the exact sequence of the template strand. This technique can generate strands 500-1000 bp long, albeit as double-stranded DNA. However, it is not suitable for de novo nucleic acid synthesis because an existing nucleic acid strand is required as a template.

Various attempts have been made to use terminal deoxynucleotidyl transferase for de novo single-stranded DNA synthesis. Uncontrolled de novo single-stranded DNA synthesis, as opposed to controlled synthesis, exploits the ability of TdT to tail the 3'-end of single-stranded DNA with deoxynucleoside 5'-triphosphates (dNTPs) to generate, for example, homopolymeric adaptor sequences for next-generation sequencing library preparation. In controlled extension, a reversible deoxynucleoside 5'-triphosphate termination technique is needed to prevent uncontrolled addition of dNTPs to the 3'-end of the growing DNA strand. A controlled single-stranded DNA synthesis method based on TdT would be invaluable for in situ DNA synthesis for gene assembly or hybridization microarrays, because it removes the need for a non-aqueous environment and allows the use of various polymers that are incompatible with organic solvents.

However, TdT has not been shown to efficiently add nucleoside triphosphates containing a 3'-O-reversible terminating moiety to build the nascent single-stranded DNA strand required for a de novo synthesis cycle. A 3'-O-reversible terminating moiety would prevent a terminal transferase such as TdT from catalyzing the nucleotidyl transferase reaction between the 3'-end of the growing DNA strand and the 5'-triphosphate of the incoming nucleoside triphosphate.

Thus, there is a need to identify modified terminal deoxynucleotidyl transferases that readily incorporate 3'-O-reversibly terminated nucleotides. Such modified enzymes can be used to incorporate 3'-O-reversibly terminated nucleotides in a manner useful for biotechnology and for single-stranded DNA synthesis, so as to provide improved nucleic acid synthesis methods that overcome the problems associated with currently available methods.

Summary of The Invention

Described herein are modified terminal deoxynucleotidyl transferase (TdT) enzymes or homologous amino acid sequences of Pol μ, Pol β, Pol λ and Pol θ of any species or of X family polymerase of any species. Terminal transferases are ubiquitous in nature and are found in many species. Many known TdT sequences have been reported in the NCBI database (http://www.ncbi.nlm.nih.gov/).

GI number and species (from http://www.ncbi.nlm.nih.gov/):

gi|768 Cattle (Bos taurus)
gi|460163 Chicken (Gallus gallus)
gi|494987 African clawed frog (Xenopus laevis)
gi|1354475 Rainbow trout (Oncorhynchus mykiss)
gi|2149634 Short-tailed opossum (Monodelphis domestica)
gi|12802441 House mouse (Mus musculus)
gi|28852989 Axolotl (Ambystoma mexicanum)
gi|38603668 Japanese pufferfish (Takifugu rubripes)
gi|40037389 Clearnose skate (Raja eglanteria)
gi|40218593 Nurse shark (Ginglymostoma cirratum)
gi|46369889 Zebrafish (Danio rerio)
gi|73998101 Dog (Canis lupus familiaris)
gi|139001476 Ring-tailed lemur (Lemur catta)
gi|139001490 Gray mouse lemur (Microcebus murinus)
gi|139001511 Small-eared galago (Otolemur garnettii)
gi|148708614 Brown rat (Rattus norvegicus)
gi|149040157 Brown rat (Rattus norvegicus)
gi|149704611 Horse (Equus caballus)
gi|164451472 Cattle
gi|169642654 Western clawed frog (Xenopus (Silurana) tropicalis)
gi|291394899 European rabbit (Oryctolagus cuniculus)
gi|291404551 European rabbit
gi|301763246 Giant panda (Ailuropoda melanoleuca)
gi|311271684 Wild boar (Sus scrofa)
gi|327280070 Green anole (Anolis carolinensis)
gi|334313404 Short-tailed opossum
gi|344274915 African elephant (Loxodonta africana)
gi|345330196 Platypus (Ornithorhynchus anatinus)
gi|348588114 Guinea pig (Cavia porcellus)
gi|351697151 Naked mole-rat (Heterocephalus glaber)
gi|355562663 Rhesus macaque (Macaca mulatta)
gi|395501816 Tasmanian devil (Sarcophilus harrisii)
gi|395508711 Tasmanian devil
gi|395850042 Small-eared galago
gi|397467153 Bonobo (Pan paniscus)
gi|403278452 Bolivian squirrel monkey (Saimiri boliviensis boliviensis)
gi|410903980 Japanese pufferfish
gi|410975770 Domestic cat (Felis catus)
gi|432092624 David's myotis (Myotis davidii)
gi|432113117 David's myotis
gi|444708211 Chinese tree shrew (Tupaia chinensis)
gi|460417122 Iberian ribbed newt (Pleurodeles waltl)
gi|466001476 Killer whale (Orcinus orca)
gi|471358897 Florida manatee (Trichechus manatus)
gi|478507321 White rhinoceros (Ceratotherium simum)
gi|478528402 White rhinoceros
gi|488530524 Nine-banded armadillo (Dasypus novemcinctus)
gi|499037612 Zebra mbuna (Maylandia zebra)
gi|504135178 American pika (Ochotona princeps)
gi|505844004 Common shrew (Sorex araneus)
gi|505845913 Common shrew
gi|507537868 Lesser Egyptian jerboa (Jaculus jaculus)
gi|507572662 Lesser Egyptian jerboa
gi|507622751 Degu (Octodon degus)
gi|507640406 Lesser hedgehog tenrec (Echinops telfairi)
gi|507669049 Lesser hedgehog tenrec
gi|507930719 Star-nosed mole (Condylura cristata)
gi|507940587 Star-nosed mole
gi|511850623 Ferret (Mustela putorius furo)
gi|512856623 Western clawed frog
gi|512952456 Naked mole-rat
gi|524918754 Golden hamster (Mesocricetus auratus)
gi|527251632 Budgerigar (Melopsittacus undulatus)
gi|528493137 Zebrafish
gi|528493139 Zebrafish
gi|529438486 Peregrine falcon (Falco peregrinus)
gi|530565557 Western painted turtle (Chrysemys picta bellii)
gi|532017142 Prairie vole (Microtus ochrogaster)
gi|532099471 Thirteen-lined ground squirrel (Ictidomys tridecemlineatus)
gi|533166077 Long-tailed chinchilla (Chinchilla lanigera)
gi|533189443 Long-tailed chinchilla
gi|537205041 Chinese hamster (Cricetulus griseus)
gi|537263119 Chinese hamster
gi|543247043 Medium ground finch (Geospiza fortis)
gi|543351492 Ground tit (Pseudopodoces humilis)
gi|543731985 Rock pigeon (Columba livia)
gi|544420267 Cynomolgus monkey (Macaca fascicularis)
gi|545193630 Horse
gi|548384565 Pundamilia nyererei
gi|551487466 Southern platyfish (Xiphophorus maculatus)
gi|551523268 Southern platyfish
gi|554582962 Brandt's bat (Myotis brandtii)
gi|554588252 Brandt's bat
gi|556778822 Tibetan antelope (Pantholops hodgsonii)
gi|556990133 Coelacanth (Latimeria chalumnae)
gi|557297894 Chinese alligator (Alligator sinensis)
gi|558116760 Chinese softshell turtle (Pelodiscus sinensis)
gi|558207237 Little brown bat (Myotis lucifugus)
gi|560895997 Wild Bactrian camel (Camelus ferus)
gi|560897502 Wild Bactrian camel
gi|562857949 Chinese tree shrew
gi|562876575 Chinese tree shrew
gi|564229057 American alligator (Alligator mississippiensis)
gi|564236372 American alligator
gi|564384286 Brown rat

The sequences of the various terminal transferases described show some highly conserved regions, as well as some regions that differ greatly between species. A sequence alignment of sequences from selected species is shown in Figure 2.

The present inventors have modified the terminal transferase from the spotted gar (Lepisosteus oculatus) (shown as SEQ ID NO 1). However, corresponding modifications may be introduced into analogous terminal transferase sequences from any other species, including the sequences listed above in the various NCBI entries and those shown in Figure 2, or truncated forms thereof.

The amino acid sequence of the spotted gar (Lepisosteus oculatus) TdT is shown below (SEQ ID NO 1):

An engineered variant of this sequence was previously identified as SEQ ID NO 8 in publication WO 2016/128731. Further engineering improvements to the published sequence are described herein. The modified sequences disclosed herein differ from SEQ ID NO 8 disclosed in the prior art. SEQ ID NO 2 of WO 2016/128731 is a "mis-annotated" wild-type spotted gar sequence.

SEQ ID NO 8 of publication WO 2016/128731 is shown below, with the engineered mutations identified:

The present inventors have identified various amino acid modifications that give the amino acid sequence improved properties. Modifications in certain regions improve enzyme solubility and handling. Modifications in certain other regions improve the ability to incorporate modified nucleotides; these nucleotide modifications include modifications at the 3'-position of the sugar and modifications to the base.

Described herein are modified terminal deoxynucleotidyl transferases (TdT) comprising amino acid modifications when compared to the wild-type sequence SEQ ID NO1 or truncated forms thereof or the homologous amino acid sequence of a terminal deoxynucleotidyl transferase (TdT) in other species or the homologous amino acid sequence of Pol μ, Pol β, Pol λ and Pol θ of any species or the homologous amino acid sequence of a family X polymerase of any species, wherein the amino acid is modified at one or more of the following amino acids:

V32, A33, I34, F35, A53, V68, V71, E97, I101, M108, G109, A110, Q115, V116, S125, T137, Q143, M152, E153, N154, H155, N156, Q157, I158, I165, N169, N173, S175, E176, G177, P178, C179, L180, A181, F182, M183, R184, A185, L188, H194, A195, I196, S197, S198, S199, K200, E203, G204, D210, Q211, T212, K213, A214, I216, E217, D218, L220, Y222, V228, D230, Q238, T239, L242, L251, K260, G261, F262, H263, S264, L265, E267, Q269, A270, D271, N272, A273, H275, F276, T277, K278, M279, Q280, K281, S291, A292, A293, V294, C295, K296, E298, A299, Q300, A301, Q304, I305, T309, V310, R311, L312, I313, A314, I318, V319, T320, G328, K329, E330, C331, L338, T341, P342, E343, M344, G345, K346, W349, L350, L351, N352, R353, L354, I355, N356, R357, L358, Q359, N360, Q361, G362, I363, L364, L365, Y366, Y367, D368, I369, V370, K376, T377, C381, K383, D388, H389, F390, Q391, K392, F394, I397, K398, K400, K401, E402, L403, A404, A405, G406, R407, D411, A421, P422, P423, V424, D425, N426, F427, A430, R438, F447, A448, R449, H450, E451, R452, K453, M454, L455, L456, D457, N458, H459, A460, L461, Y462, D463, K464, T465, K466, K467, T474, D477, D485, Y486, I487, D488, P489.

Modifications to improve solubility include modifications within the amino acid region WLLNRLINRLQNQGILLYYDIV highlighted in the sequence below.

Modifications that improve the incorporation of modified nucleotides can be at one or more of the selected regions shown below. Regions were selected based on mutation data (Figures 1 and 3), sequence alignment (Figure 2) and structural data obtained from spotted gar TdT co-crystallized with DNA and 3'-modified dNTPs (Figures 4-14). The second modification may be selected from one or more of the amino acid regions VAIF, MGA, MENHNQI, SEGPCLAFMRA, HAISSS, DQTKA, KGFHS, QADNA, HFTKMQK, SAAVCK, EAQA, TVRLI, GKEC, TPEMGK, YYDIV, DHFQK, LAAG, APPVDNF, FARHERKMLLDNHALYDKTKK and DYIDP, as highlighted in the sequences below.

Specific modifications that improve the incorporation of modified nucleotides may be at one or more of the selected regions shown below. The second modification may be selected from one or more of the amino acid regions VAIF, MGA, ENHNQ, FMRA, HAI, TKA, FHS, QADNA, MQK, SAAVCK, EAQA, TVR, KEC, TPEMGK, DHFQ, LAAG, APPVDN, FARHERKMLLDNHA and YIDP, as highlighted in the sequences below.

Described herein is a modified terminal deoxynucleotidyl transferase (TdT) comprising at least one amino acid modification when compared to the wild-type sequence of SEQ ID NO1 or a homologous amino acid sequence of a terminal deoxynucleotidyl transferase (TdT) in another species, wherein the modification is selected from one or more of amino acid region WLLNRLINRLQNQGILLYYDIV, VAIF, MGA, MENHNQI, SEGPCLAFMRA, HAISSS, DQTKA, KGFHS, QADNA, HFTKMQK, SAAVCK, EAQA, TVRLI, GKEC, TPEMGK, DHFQK, LAAG, APPVDNF, FARHERKMLLDNHALYDKTKK and DYIDP of the sequence of SEQ ID NO1 or a homologous region in another species.

Reference to a particular sequence includes truncated forms thereof. Included herein are modified terminal deoxynucleotidyl transferases (TdT) comprising at least one amino acid modification when compared to the wild-type sequence of SEQ ID NO 1 or a truncated form thereof or the homologous amino acid sequence of a terminal deoxynucleotidyl transferase (TdT) in other species, wherein said modification is selected from one or more of the amino acid regions WLLNRLINRLQNQGILLYYDIV, VAIF, MGA, MENHNQI, SEGPCLAFMRA, HAISSS, DQTKA, KGFHS, QADNA, HFTKMQK, SAAVCK, EAQA, TVRLI, GKEC, TPEMGK, DHFQK, LAAG, APPVDNF, FARHERKMLLDNHALYDKTKK and DYIDP of the sequence of SEQ ID NO 1 or the homologous regions in other species.

The truncated protein may comprise at least the regions shown below:

Described herein are modified terminal deoxynucleotidyl transferases (TdT) comprising at least the following sequence:

or homologous regions in other species, wherein the sequence has one or more amino acid modifications in one or more of the following amino acid regions of the sequence: WLLNRLINRLQNQGILLYYDI, MENHNQI, SEGPCLAFMRA, HAISSS, DQTKA, KGFHS, QADNA, HFTKMQK, SAAVCK, EAQA, TVRLI, GKEC, TPEMGK, DHFQK, LAAG, APPVDNF, FARHERKMLLDNHALYDKTKK and DYIDP.

Disclosed herein is a modified terminal deoxynucleotidyl transferase (TdT) comprising at least one amino acid modification when compared to the wild-type sequence, wherein the modification is selected from one or more of amino acid region WLLNRLINRLQNQGILLYYDI, VAIF, MGA, MENHNQI, SEGPCLAFMRA, HAISSS, DQTKA, KGFHS, QADNA, HFTKMQK, SAAVCK, EAQA, TVRLI, GKEC, TPEMGK, DHFQK, LAAG, APPVDNF, FARHERKMLLDNHALYDKTKK and DYIDP of the sequence of SEQ ID NO1 or a region of homology in other species.

Disclosed herein is a modified terminal deoxynucleotidyl transferase (TdT) comprising at least one amino acid modification when compared to the wild-type sequence, wherein the modification is selected from one or more of amino acid region WLLNRLINRLQNQGILLYYDI, VAIF, MGA, ENHNQ, FMRA, HAI, TKA, FHS, QADNA, MQK, SAAVCK, EAQA, TVR, KEC, TPEMGK, DHFQ, LAAG, APPVDN, FARHERKMLLDNHA and YIDP of the sequence of SEQ ID NO1 or a region of homology in other species.

Described herein is a modified terminal deoxynucleotidyl transferase (TdT) comprising at least one amino acid modification when compared to the wild-type sequence of SEQ ID NO1 or a homologous amino acid sequence of a terminal deoxynucleotidyl transferase (TdT) in another species, wherein the modification is selected from one or more of amino acid region WLLNRLINRLQNQGILLYYDI, VAIF, MGA, ENHNQ, FMRA, HAI, TKA, FHS, QADNA, MQK, SAAVCK, EAQA, TVR, KEC, TPEMGK, DHFQ, LAAG, APPVDN, FARHERKMLLDNHA and YIDP of the sequence of SEQ ID NO1 or a homologous region in another species.

Homologous refers to a protein sequence having a common evolutionary origin between two or more proteins (including proteins from a superfamily in the same organism species as well as homologous proteins from different species). Such proteins (and their encoding nucleic acids) have sequence homology as reflected by their sequence similarity, whether in terms of percent identity or by the presence of specific residues or motifs and conserved positions. Sequence homology can be determined using various means of protein (and nucleic acid encoding it) sequence alignment. For example, the Clustal Omega multiple sequence alignment program provided by the European Molecular Biology Laboratory (EMBL) can be used to determine sequence homology or regions of homology.

The improved sequences as described herein may comprise two modifications, i.e.

a. the first modification is within amino acid region WLLNRLINRLQNQGILLYYDIV of the sequence of SEQ ID NO 1 or within a homologous region in other species; and

b. the second modification is selected from one or more of the amino acid regions VAIF, MGA, MENHNQI, SEGPCLAFMRA, HAISSS, DQTKA, KGFHS, QADNA, HFTKMQK, SAAVCK, EAQA, TVRLI, GKEC, TPEMGK, DHFQK, LAAG, APPVDNF, FARHERKMLLDNHALYDKTKK and DYIDP of the sequence of SEQ ID NO 1 or homologous regions in other species.

The sequence may be truncated.

The improved sequences as described herein may comprise two modifications, i.e.

a. the first modification is within amino acid region WLLNRLINRLQNQGILLYYDI of the sequence of SEQ ID NO 1 or within a homologous region in other species; and

b. the second modification is selected from one or more of the amino acid regions VAIF, MGA, ENHNQ, FMRA, HAI, TKA, FHS, QADNA, MQK, SAAVCK, EAQA, TVR, KEC, TPEMGK, DHFQ, LAAG, APPVDN, FARHERKMLLDNHA and YIDP of the sequence of SEQ ID NO 1 or homologous regions in other species.
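The two-modification condition described above lends itself to a simple check. The following Python sketch assumes substitution-only variants (wild type and variant of equal length) and locates each region by its first occurrence in the wild-type sequence; the function names are hypothetical. The demonstration fragment is not a real TdT sequence: it is a toy concatenation of two regions quoted in this document, paired with the modified forms QLLPKVINLWEKKGLLLYYDLV and FRRA recited in the claims.

```python
from typing import List, Sequence


def modified_regions(wild_type: str, variant: str, regions: Sequence[str]) -> List[str]:
    """Return the regions (located in the wild type) that contain at least one
    substituted position in the variant. Assumes substitutions only."""
    if len(wild_type) != len(variant):
        raise ValueError("this sketch handles equal-length sequences only")
    changed = {i for i, (a, b) in enumerate(zip(wild_type, variant)) if a != b}
    hits = []
    for region in regions:
        start = wild_type.find(region)  # first occurrence only
        if start < 0:
            continue  # region not present in this (possibly truncated) sequence
        if any(start <= i < start + len(region) for i in changed):
            hits.append(region)
    return hits


def has_two_modifications(wild_type: str, variant: str,
                          first_region: str, second_regions: Sequence[str]) -> bool:
    """True if the variant is modified in the first region and in at least one
    of the second regions."""
    first = modified_regions(wild_type, variant, [first_region])
    second = modified_regions(wild_type, variant, second_regions)
    return bool(first) and bool(second)


if __name__ == "__main__":
    # Toy fragments (hypothetical concatenation, for illustration only).
    wt = "WLLNRLINRLQNQGILLYYDIV" + "FMRA"
    var = "QLLPKVINLWEKKGLLLYYDLV" + "FRRA"
    print(modified_regions(wt, var, ["FMRA", "QADNA"]))
    print(has_two_modifications(wt, var, "WLLNRLINRLQNQGILLYYDIV", ["FMRA", "QADNA"]))
```

For sequences from other species, the region strings would first need to be replaced by their homologous counterparts, since the lookup here is by exact match against the wild-type sequence.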

As a comparison with other species, the sequence of bovine (Bos taurus) TdT is shown below:

Homologous regions in the sequences are highlighted below.

Modifications to improve solubility include modifications within the amino acid region QLLPKVINLWEKKGLLLYYDLV highlighted in the sequence below.

Modifications that improve the incorporation of modified nucleotides can be at one or more of the selected regions shown below. The second modification may be selected from one or more of the amino acid regions LVLF, MGA, LNNYNHI, NEVSYVTFMRA, FTIISM, DKVKC, MGFRS, MSDKT, KFTKMQK, VSCVTR, EAEA, AVWAFL, GKKI, SPGSAE, YYDLV, DHFQK, MCPYENR, YATHERKMMLDNHALYDKTKR and DYIEP highlighted in the sequence below.

Modifications that improve the incorporation of modified nucleotides can be at one or more of the selected regions shown below. The second modification may be selected from one or more of the amino acid regions LVLF, MGA, NNYNH, FMRA, FTI, VKC, FRS, MSDKT, MQK, EAEA, AVW, KKI, SPGSAE, DHFQ, MCPYEN, YATHERKMMLDNHA and YIEP as highlighted in the sequences below.

For comparison with other species, the sequence of mouse TdT is shown below:

Modifications to improve solubility include modifications within the amino acid region QLLHKVTDFWKQQGLLLYCDIL highlighted in the sequence below.

Modifications that improve the incorporation of modified nucleotides can be at one or more of the selected regions shown below. The second modification may be selected from one or more of the amino acid regions LVLF, MGA, LNNYNQL, negscharfmra, FPITSM, DKVKS, MGFRT, QSDKS, RFTQMQK, VSCVNR, EAEA, AVVTFL, GKMT, SPEATE, DHFQK, SGQ, MCPYDRR, YATHERKMMLDNHALYDRT, R and DYIEP highlighted in the sequences below.

Modifications that improve the incorporation of modified nucleotides can be at one or more of the selected regions shown below. The second modification may be selected from one or more of the amino acid regions LVLF, MGA, LNNYNQ, negstscharmr, FPI, VKS, FRT, SKIQSDKS, MQK, VSCVNR, EAEA, AVV, KMT, SPEATE, DHFQK, MCPYDR, YATHERKMMLDNHA and YIEP highlighted in the sequences below.

Modifications that improve the incorporation of modified nucleotides can be at one or more of the selected regions shown below. The second modification may be selected from one or more of the amino acid regions LVLF, MGA, NNYNQ, FMRA, FPI, VKS, FRT, QSDKS, MQK, VSCVNR, EAEA, AVV, KMT, SPEATE, DHFQ, MCPYDR, YATHERKMMLDNHA and YIEP highlighted in the sequences below.

Thus, by aligning the sequences, it is immediately apparent which regions in the terminal transferase sequences from other species correspond to the regions described herein for the spotted gar sequence shown in SEQ ID NO 1.
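The correspondence obtained from an alignment can be sketched in Python as follows. This is only an illustrative sketch: the function name `homologous_segment` is hypothetical, the inputs are assumed to be two rows of the same alignment with '-' as the gap character, and the example uses the short region fragments quoted in this description rather than full-length sequences.

```python
def homologous_segment(ref_aligned: str, hom_aligned: str, motif: str) -> str:
    """Return the residues of the homolog aligned to `motif` in the reference.

    Assumptions: both strings are rows of one alignment (equal length, '-' for
    gaps) and `motif` occurs in the ungapped reference sequence.
    """
    if len(ref_aligned) != len(hom_aligned):
        raise ValueError("aligned sequences must have equal length")
    ungapped_ref = ref_aligned.replace("-", "")
    start = ungapped_ref.find(motif)
    if start < 0:
        raise ValueError("motif not found in reference sequence")
    end = start + len(motif)
    # Alignment column index of every non-gap residue in the reference.
    columns = [i for i, residue in enumerate(ref_aligned) if residue != "-"]
    first_col = columns[start]
    last_col = columns[end - 1]
    # Take the homolog residues in those columns, dropping any gaps.
    return hom_aligned[first_col:last_col + 1].replace("-", "")


# Fragments quoted in this description: SEQ ID NO 1 region and bovine region.
ref = "WLLNRLINRLQNQGILLYYDIV"
bovine = "QLLPKVINLWEKKGLLLYYDLV"
print(homologous_segment(ref, bovine, "YYDIV"))  # bovine residues aligned to YYDIV
```

With these two fragments, the YYDIV motif of SEQ ID NO 1 maps to YYDLV, which is one of the bovine regions listed above.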

Sequence homology extends to all modified or wild-type members of the X family of polymerases, such as DNA Pol μ (also known as DNA polymerase μ or POLM), DNA Pol β (also known as DNA polymerase β or POLB), and DNA Pol λ (also known as DNA polymerase λ or POLL). It is well known in the art that all X family polymerases, of which TdT is a member, either have terminal transferase activity or can be engineered to obtain terminal transferase activity similar to that of terminal deoxynucleotidyl transferase (Biochim Biophys Acta. 2010 May; 1804(5): 1136-). For example, when the following human TdT loop 1 amino acid sequence

...ESTFEKLRLPSRKVDALDHF...

was engineered to replace the following human Pol μ amino acid residues

...HSCCESPTRLAQQSHMDAF...,

the chimeric human Pol μ comprising human TdT loop 1 achieved robust terminal transferase activity (Nucleic Acids Res. 2006 Sep; 34(16): 4572-4582).

Furthermore, it is generally demonstrated in U.S. patent application No. 2019/0078065 that a family X polymerase can obtain robust terminal transferase activity when engineered to comprise TdT loop 1 chimeras. Additionally, it was also shown that TdT can be converted into a template-dependent polymerase by specific mutations in the loop 1 motif (Nucleic Acids Research, Jun 2009,37(14): 4642-4656). As has been shown in the art, X family polymerases can be generally modified to exhibit template-dependent or non-template-dependent nucleotidyl transferase activity. Thus, all of the motifs, regions and mutations set forth in this patent can be extended generally to modified X family polymerases such that the modified X family polymerases are generally capable of incorporating 3' -modified nucleotides, reversible terminator nucleotides and modified nucleotides to effect a method of nucleic acid synthesis.

For comparison with other X family polymerases, the human Pol μ sequence is shown below:

Modifications to improve solubility include modifications within the amino acid region GLLPRVMCRLQDQGLILYHQHQ highlighted in the sequence below.

Modifications that improve the incorporation of modified nucleotides can be at one or more of the selected regions shown below. The second modification may be selected from one or more of the amino acid regions VAIY, LGA, LTHHNTG, SEGRLLTFCRAA, SPVTTL, EHSSR, EGLRT, REQP, KLTQQQKA, STPVLR, DVDA, AVGQA, GKLQ, HPKEGQ, YHQHQ, DAFER, VAPVSQ, FSEKGLWLNSHGLFDPEQK and EYLPP highlighted in the sequence below.

Thus, by aligning the sequences, it is immediately apparent which regions of the sequences of all X family polymerases from any species correspond to the regions described herein for the spotted gar sequence shown in SEQ ID NO 1.

Furthermore, the family A polymerase DNA Pol θ (also known as DNA polymerase θ or POLQ) has been demonstrated to show robust terminal transferase activity (eLife 2016; 5: e13740). DNA Pol θ has also been shown to be useful in methods of nucleic acid synthesis (UK patent application No. 2553274). U.S. patent application No. 2019/0078065 shows that chimeras of DNA Pol θ and a family X polymerase can be engineered to obtain robust terminal transferase activity and become capable of use in methods of nucleic acid synthesis. Thus, all motifs, regions and mutations set forth in this patent can be extended to modified family A polymerases in general, and to DNA Pol θ in particular, such that the modified family A polymerases are generally able to incorporate 3'-modified nucleotides, reversible terminator nucleotides and modified nucleotides to effect a method of nucleic acid synthesis.

Brief Description of Drawings

FIG. 1. Incorporation of 3'-O-CH2-N3 nucleoside 5'-triphosphates by terminal deoxynucleotidyl transferase (TdT) SEQ ID NO 1-173. The incorporation rate is defined as the amount of reversibly terminating nucleoside 5'-triphosphate (pmol) incorporated per nanogram of TdT per minute onto the 3'-end of a single-stranded DNA molecule. Wild-type bovine TdT activity is represented by a dashed line near 0 on the y-axis. The dynamic range of the assay saturated at 2.0 pmol incorporated per nanogram of TdT per minute.
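The rate definition used in FIG. 1 can be written out as a short Python sketch. The function names and the example quantities are hypothetical, and capping the reported value at the stated saturation point of 2.0 is an assumption about how a saturated reading would be represented, not a description of the actual analysis.

```python
ASSAY_CEILING = 2.0  # pmol per ng TdT per minute, saturation point stated in the caption


def incorporation_rate(pmol_incorporated: float, tdt_ng: float, minutes: float) -> float:
    """Incorporation rate in pmol nucleotide per nanogram of TdT per minute."""
    if tdt_ng <= 0 or minutes <= 0:
        raise ValueError("enzyme amount and time must be positive")
    return pmol_incorporated / (tdt_ng * minutes)


def reported_rate(pmol_incorporated: float, tdt_ng: float, minutes: float) -> float:
    """Rate as it would appear on a plot whose dynamic range saturates at the ceiling.

    Assumption: values above the ceiling are shown at the ceiling.
    """
    return min(incorporation_rate(pmol_incorporated, tdt_ng, minutes), ASSAY_CEILING)


# Hypothetical numbers, for illustration only.
print(incorporation_rate(10.0, 5.0, 2.0))
print(reported_rate(50.0, 5.0, 2.0))
```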

FIG. 2. Sequence alignment of interspecies homologs of selected wild-type terminal deoxynucleotidyl transferases, generated using the Clustal Omega multiple sequence alignment program provided on the European Molecular Biology Laboratory (EMBL) multiple sequence alignment site.

FIG. 3. Upper panel: In single nucleotide incorporation assays, base-modified substrates A, C, G and T were provided to engineered variants. A denotes 3'-aminooxy 6-azido 2'-deoxyadenosine 5'-triphosphate; C denotes 3'-aminooxy 4-azido 5-methyl 2'-deoxycytidine 5'-triphosphate; G denotes 3'-aminooxy 2-azido 2'-deoxyguanosine 5'-triphosphate; T denotes 3'-aminooxy 5-(-hydroxy-1-butynyl) 2'-deoxyuridine 5'-triphosphate. A pool of DNA initiators with degenerate ends was immobilized on a solid support and exposed to a nucleotide addition mixture (NAM) solution comprising (1) a TdT variant, (2) a neutral pH buffer, (3) a monovalent salt, (4) cobalt chloride, and (5) 3'-ONH2-dXTP, wherein X is a modified nucleobase yielding A, C, G or T. The incubation temperature was 37 °C and the reaction time was 2.5 minutes. The solid support was then washed with a high salt solution at neutral pH followed by a low salt solution at neutral pH. The initiator pool was converted into a sequencing library and analyzed by next generation sequencing (NGS) with PE30 reads on an Illumina NextSeq 500. The bcl files were converted to fastq files using Illumina's bcl2fastq conversion software and analyzed in R. The incorporation efficiency for each modified base was calculated for all possible initiator contexts as [reads containing N+1 product] / ([reads containing N initiator] + [reads containing N+1 product]). The average incorporation efficiency across all contexts is shown in FIG. 3 (upper panel). Single nucleotide incorporation of the base-modified reversible terminator nucleotides A, C, G and T by wild-type bovine and wild-type spotted gar TdT is indicated by the dashed line. Since neither wild-type bovine nor wild-type spotted gar TdT was able to incorporate these base-modified reversible terminator nucleotides in the assay, the dashed line appears at y = 0.
Mutations in all of the presented TdT variants resulted in improved incorporation of modified bases relative to wild-type bovine and wild-type spotted gar TdT.
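The incorporation efficiency calculation described for the upper panel can be sketched in Python as shown below. The original analysis was carried out in R; this sketch is not that code. The function names, the dictionary layout and the read counts are hypothetical and serve only to illustrate the ratio of N+1 reads to the sum of N and N+1 reads, averaged over initiator contexts.

```python
from typing import Mapping, Tuple


def incorporation_efficiency(n_reads: int, n_plus_1_reads: int) -> float:
    """Fraction of reads carrying the N+1 product for one initiator context."""
    total = n_reads + n_plus_1_reads
    if total == 0:
        raise ValueError("no reads for this initiator context")
    return n_plus_1_reads / total


def mean_efficiency(counts: Mapping[str, Tuple[int, int]]) -> float:
    """Unweighted mean efficiency across contexts.

    `counts` maps an initiator context label to (N reads, N+1 reads).
    Assumption: every context contributes equally to the average.
    """
    if not counts:
        raise ValueError("no contexts supplied")
    values = [incorporation_efficiency(n, n1) for n, n1 in counts.values()]
    return sum(values) / len(values)


# Hypothetical read counts for two initiator contexts.
example_counts = {"...A": (25, 75), "...C": (50, 50)}
print(mean_efficiency(example_counts))
```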

Lower panel: Synthesis of an 8-nt nucleic acid sequence on a solid support using 3'-ONH2 nucleoside 5'-triphosphates with terminal deoxynucleotidyl transferase (TdT) SEQ ID NO 1 & 344-727. The DNA sequence 5'-ATCGATCG-3' was synthesized by repeatedly exposing a solid support-bound DNA initiator to a nucleotide addition mixture (NAM) solution comprising (1) TdT, (2) a neutral pH buffer, (3) a monovalent salt, (4) cobalt chloride, and (5) 3'-ONH2-dNTP, wherein N is selected from adenine, thymine, cytosine or guanine. One cycle consisted of: (A) incubating the NAM solution containing the indicated A, T, C or G reversible terminator nucleotide on the solid support for 5 minutes at 37 °C; (B) washing the solid support with a high salt solution at neutral pH; (C) exposing the solid support to an aqueous acidic sodium nitrite solution; and (D) washing the solid support with the same high salt, neutral pH solution as in (B). Steps (A)-(D) were then repeated 7 more times to synthesize the desired 8-nt sequence. The synthesized DNA was analyzed by running the reaction on a denaturing polyacrylamide gel and quantified by means of a fluorophore covalently attached to the DNA initiator. The fraction of full-length product (8-nt species) was determined by dividing the fluorescence intensity of the 8-nt band by the total lane intensity. The TdT activity of wild-type bovine and wild-type spotted gar TdT, as it relates to the full-length fraction, is indicated by the dotted line. All mutations contained in the TdT variants resulted in an improvement relative to wild-type bovine and wild-type spotted gar TdT.
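The full-length fraction described for the lower panel can be illustrated with a minimal Python sketch. The function name, the band-intensity dictionary and the intensity values are hypothetical; the sketch only assumes that the total lane intensity is the sum of the quantified band intensities.

```python
from typing import Mapping


def full_length_fraction(band_intensities: Mapping[int, float], full_length: int = 8) -> float:
    """Intensity of the full-length band divided by the total lane intensity.

    `band_intensities` maps product length in nucleotides added to the
    fluorescence intensity of that band. Assumption: the lane total is the
    sum of all quantified bands.
    """
    total = sum(band_intensities.values())
    if total <= 0:
        raise ValueError("total lane intensity must be positive")
    return band_intensities.get(full_length, 0.0) / total


# Hypothetical lane: most signal in the 8-nt band, the rest in shorter products.
lane = {8: 60.0, 7: 30.0, 6: 10.0}
print(full_length_fraction(lane))
```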

Figure 4. Spotted gar TdT homology structure based on co-crystal structures of a preferred engineered Nuclera TdT variant bound to a DNA initiator and a reversibly terminating dNTP. An engineered TdT variant with a BRCT domain truncation was expressed in Escherichia coli and purified by immobilized metal affinity chromatography followed by size exclusion chromatography. The TdT (60 μM) was then mixed with DNA oligonucleotide (5'-TTTTT[ddC]-3'; 120 μM) and cobalt chloride (1 mM) in the absence (upper) or presence (lower) of dATP-ONH2 (1 mM), and co-crystallized by sitting drop vapor diffusion against the following reservoir solutions, respectively: Bis-Tris HCl (100 mM), NaCl (27 mM) and 22% w/v PEG 3350; or Bis-Tris HCl (10 mM) and 27.5% w/v PEG MME 500. Plate-like crystals appeared and grew to 400 × 50 × 5 μm³ over 1 week and diffracted to 2.6 Å and 2.5 Å, respectively, with space group P212121 and unit cell dimensions of, respectively, [...]. Data were indexed, integrated and scaled using an automated software pipeline at Diamond Light Source. Molecular replacement with a homology model of spotted gar TdT, followed by cycles of manual model building interspersed with rigid-body refinement, simulated annealing, energy minimization and individual isotropic B-factor refinement, yielded complete structural models (Rfree of 27.8% and 28.6%, respectively). Cobalt ions were identified by their anomalous scattering. The spotted gar TdT wild-type sequence with BRCT domain truncation (>95% sequence identity) was modeled onto both structures.

Figure 5. Surface representation of the spotted gar TdT homology structure, as modeled on the crystal structure of the engineered TdT variant depicted in Figure 4, with the MENHNQI motif, which contains the ENHNQ motif, highlighted in black. TdT contains two entrances to the nucleotidyl transferase active site through which a DNA initiator and reversibly terminating dNTPs can enter and bind. As determined by modeling and visual inspection of the active site, the MENHNQI and ENHNQ motifs control access to the nucleotidyl transferase active site, and the mutations shown in this patent demonstrate that mutating this motif results in increased TdT incorporation activity of the reversible terminator nucleotide.

Figure 6. Surface representation of the spotted gar TdT homology structure, as modeled on the crystal structure of the engineered TdT variant depicted in Figure 4 (the right view is a 180 degree rotation of the left view), with the SEGPCLAFMRA motif, which comprises the FMRA motif, highlighted in black. TdT contains two entrances to the nucleotidyl transferase active site through which a DNA initiator and reversibly terminating dNTPs can enter and bind. As determined by modeling and visual inspection of the active site, the SEGPCLAFMRA and FMRA motifs (1) control the entry of the reversible terminator nucleotide into the nucleotidyl transferase active site through steric effects and electrostatic interactions, and (2) pack directly against the critical TGSR motif, which is in direct contact with the reversible terminator nucleotide. Specifically, the SEGPCLAFMRA and FMRA motifs directly contact and regulate the positioning of R438, and R438 directly contacts the reversible terminator nucleotide, in contrast to the TGSR motif. The mutations shown in this patent demonstrate that mutating these motifs results in increased TdT incorporation activity of the reversible terminator nucleotide.

Figure 7. Surface representation of the spotted gar TdT homology structure, as modeled on the crystal structure of the engineered TdT variant depicted in Figure 4, with the HFTKMQK motif, which contains the MQK motif, highlighted in black. TdT contains two entrances to the nucleotidyl transferase active site through which a DNA initiator and reversibly terminating dNTPs can enter and bind. As determined by modeling and visual inspection of the active site, the HFTKMQK and MQK motifs bind the nucleic acid initiator and help position the 3'-end of the initiator for nucleophilic attack on the incoming nucleotide. The mutations shown in this patent demonstrate that mutating this motif results in increased TdT incorporation activity of the reversible terminator nucleotide.

Figure 8. Surface representation of the spotted gar TdT homology structure, as modeled on the crystal structure of the engineered TdT variant depicted in Figure 4, with the SAAVCK motif highlighted in black. TdT contains two entrances to the nucleotidyl transferase active site through which a DNA initiator and reversibly terminating dNTPs can enter and bind. As determined by modeling and visual inspection of the active site, the SAAVCK motif controls the entry of the reversible terminator nucleotide into the nucleotidyl transferase active site. The mutations shown in this patent demonstrate that mutating this motif results in increased TdT incorporation activity of the reversible terminator nucleotide.

Figure 9. Surface (left) and cartoon/sphere (right) representations of the spotted gar TdT homology structure, as modeled on the crystal structure of the engineered TdT variant depicted in Figure 4, with the GKEC motif, which contains the KEC motif, highlighted in black. TdT contains two entrances to the nucleotidyl transferase active site through which a DNA initiator and reversibly terminating dNTPs can enter and bind. As determined by modeling and visual inspection of the active site, the GKEC and KEC motifs control the entry of the reversible terminator nucleotide into, and its direct binding within, the nucleotidyl transferase active site. The mutations shown in this patent demonstrate that mutating this motif results in increased TdT incorporation activity of the reversible terminator nucleotide.

Figure 10. Surface representation of the spotted gar TdT homology structure, as modeled on the crystal structure of the engineered TdT variant depicted in Figure 4, with the DHFQK motif, which comprises the DHFQ motif, highlighted in black. TdT contains two entrances to the nucleotidyl transferase active site through which a DNA initiator and reversibly terminating dNTPs can enter and bind. As determined by modeling and visual inspection of the active site, the DHFQK and DHFQ motifs control the entry of the nucleic acid initiator into the nucleotidyl transferase active site by sterically impinging on the nucleic acid initiator. The mutations shown in this patent demonstrate that mutating this motif results in increased TdT incorporation activity of the reversible terminator nucleotide.

FIG. 11. Cartoon representation of the crystal structure of the engineered TdT variant bound to DNA initiator and dATP-ONH2, as depicted in FIG. 4, with the FARHERKMLLDNHA motif highlighted as black spheres. TdT contains two entrances to the nucleotidyl transferase active site through which a DNA initiator and reversibly terminating dNTPs can enter and bind. As determined by modeling and visual inspection of the active site, the FARHERKMLLDNHA motif controls nucleotide and nucleic acid initiator binding in the nucleotidyl transferase active site by sterically impinging on the nucleotide and the initiator. Additionally, the motif FARHERKMLLDNHA was mutated to FARHERKMLLDRHA. The Asn to Arg mutation results in direct binding of the Arg residue to one of the two catalytic cobalt ions, which are essential for activity. The mutations shown in this patent demonstrate that the mutated residues within this motif mediate contacts with Co2+, the reversible terminator nucleotide and/or the nucleic acid initiator, resulting in increased TdT incorporation activity of the reversible terminator nucleotide.

FIG. 12. Cartoon representation of the crystal structure of the engineered TdT variant bound to DNA initiator and dATP-ONH2, as depicted in FIG. 4, with the FARHERKMLLDNHALYDKTKK motif highlighted as black spheres. TdT contains two entrances to the nucleotidyl transferase active site through which a DNA initiator and reversibly terminating dNTPs can enter and bind. As determined by modeling and visual inspection of the active site, the FARHERKMLLDNHALYDKTKK motif controls nucleotide and nucleic acid initiator binding in the nucleotidyl transferase active site by sterically impinging on the nucleotide and the initiator. Additionally, the motif FARHERKMLLDNHALYDKTKK was mutated to FARHERKMLLDRHALYDKTKK. The Asn to Arg mutation results in direct binding of the Arg residue to one of the two catalytic cobalt ions, which are essential for activity. The mutations shown in this patent demonstrate that the mutated residues within this motif mediate the positioning of critical amino acids that contact Co2+, the reversible terminator nucleotide and/or the nucleic acid initiator, resulting in increased TdT incorporation activity of the reversible terminator nucleotide.

FIG. 13. Cartoon representation of the crystal structure of the engineered TdT variant bound to DNA initiator and dATP-ONH2, as depicted in FIG. 4, with the DYIDP motif, which contains the YIDP motif, highlighted as black spheres. TdT contains two entrances to the nucleotidyl transferase active site through which a DNA initiator and reversibly terminating dNTPs can enter and bind. As determined by modeling and visual inspection of the active site, the DYIDP and YIDP motifs control nucleotide binding in the nucleotidyl transferase active site by modulating the positioning of A494 within the active site. This most C-terminal residue (Ala494) sterically impinges on the incoming nucleotide. Mutating the YIDP motif modulates the positioning of this most C-terminal Ala residue, resulting in increased TdT incorporation activity of the reversible terminator nucleotide.

FIG. 14. Cartoon representation of the crystal structure of the engineered TdT variant bound to DNA initiator and dATP-ONH2, as depicted in FIG. 4, with the YYDIV motif highlighted as black spheres. TdT contains two entrances to the nucleotidyl transferase active site through which a DNA initiator and reversibly terminating dNTPs can enter and bind. As determined by modeling and visual inspection of the active site, the YYDIV motif controls the positioning of the nucleic acid initiator by sterically impinging on the third-to-last (III), second-to-last (II) and last (I) 3'-nucleotides of the nucleic acid initiator. In particular, this motif impinges on pyrimidines (left, as crystallized; I & II shown in sphere representation), but especially on purines (right, II modeled into the structure as a purine). Mutations in the YYDIV motif regulate the positioning and entry of the nucleic acid initiator into the nucleotidyl transferase active site, resulting in increased TdT incorporation activity of the reversible terminator nucleotide.

FIG. 15. Cartoon representation of the crystal structure of the engineered TdT variant bound to DNA initiator and dATP-ONH2, as depicted in FIG. 4, with the APPVDNF motif, which comprises the APPVDN motif, highlighted as black spheres. In this structure, the APPVDNF motif is shown with the Ala to Met mutation, i.e. MPPVDNF. Modeling determined that the APPVDNF and APPVDN motifs control the positioning of the following motifs by packing directly against them: the YYDIV (upper right) and FARHERKMLLDNHALYDKTKK (lower left) motifs (both shown as light grey spheres). Additionally, the APPVDNF and APPVDN motifs control the positioning of the following motifs by determining their loop conformation: the TGSR motif (top left) and the aspartate catalytic triad (bottom right), both critical to the nucleotidyl transferase mechanism and both shown as light grey spheres. Mutations in the APPVDNF and APPVDN motifs directly affect how the above-mentioned key protein motifs regulate their contact with reversible terminator nucleotides and nucleic acid initiators. It has been shown in this patent that mutations in the APPVDNF and APPVDN motifs result in increased TdT incorporation activity of the reversible terminator nucleotide.

Detailed Description

Modified terminal deoxynucleotidyl transferases (TdTs) are described herein. Terminal transferases are ubiquitous in nature and are found in many species. Many known TdT sequences have been reported in the NCBI database. The sequences described herein are modified from that of the spotted gar, but corresponding changes may be introduced into homologous sequences from other species. The homologous amino acid sequences of Pol μ, Pol β, Pol λ and Pol θ, or the homologous amino acid sequences of X family polymerases, also have terminal transferase activity. Reference to terminal transferase also includes the homologous amino acid sequences of Pol μ, Pol β, Pol λ and Pol θ, or the homologous amino acid sequences of X family polymerases, wherein such sequences have terminal transferase activity.

Disclosed herein is a modified terminal deoxynucleotidyl transferase (TdT) comprising at least one amino acid modification when compared to the wild-type sequence, wherein the modification is selected from one or more of amino acid region WLLNRLINRLQNQGILLYYDIV, VAIF, MGA, MENHNQI, SEGPCLAFMRA, HAISSS, DQTKA, KGFHS, QADNA, HFTKMQK, SAAVCK, EAQA, TVRLI, GKEC, TPEMGK, DHFQK, LAAG, APPVDNF, FARHERKMLLDNHALYDKTKK and DYIDP of the sequence of SEQ ID NO1 or a region of homology in other species.

Described herein are modified terminal deoxynucleotidyl transferases (TdT) comprising at least the sequence ID 729:

or equivalent homologous regions in other species, wherein the sequence has one or more amino acid modifications in one or more of the following amino acid regions of the sequence: WLLNRLINRLQNQGILLYYDIV, MENHNQI, SEGPCLAFMRA, HAISSS, DQTKA, KGFHS, QADNA, HFTKMQK, SAAVCK, EAQA, TVRLI, GKEC, TPEMGK, DHFQK, LAAG, APPVDNF, FARHERKMLLDNHALYDKTKK and DYIDP. The above 355 amino acid sequence can be attached to other amino acids without affecting the function of the enzyme. For example, there may be an additional N-terminal sequence incorporated simply as a protease cleavage site, such as the sequence MENLYFQG.

Disclosed is a modified terminal deoxynucleotidyl transferase (TdT) comprising at least one amino acid modification when compared to the wild-type sequence of SEQ ID NO1 or a homologous amino acid sequence of a terminal deoxynucleotidyl transferase (TdT) in another species, wherein the modification is selected from one or more of amino acid region WLLNRLINRLQNQGILLYYDI, VAIF, MGA, ENHNQ, FMRA, HAI, TKA, FHS, QADNA, MQK, SAAVCK, EAQA, TVR, KEC, TPEMGK, DHFQ, LAAG, APPVDN, FARHERKMLLDNHA and YIDP of the sequence of SEQ ID NO1 or a homologous region in another species.

Also disclosed is a modified terminal deoxynucleotidyl transferase (TdT) comprising at least two amino acid modifications when compared to the wild-type sequence of SEQ ID NO 1 or a homologous amino acid sequence of the terminal deoxynucleotidyl transferase (TdT) in other species, wherein:

a. the first modification is within amino acid region WLLNRLINRLQNQGILLYYDIV of the sequence of SEQ ID NO 1 or within a region of homology in other species; and

b. the second modification is selected from one or more of the amino acid regions VAIF, MGA, ENHNQ, FMRA, HAI, TKA, FHS, QADNA, MQK, SAAVCK, EAQA, TVR, KEC, TPEMGK, DHFQ, LAAG, APPVDN, FARHERKMLLDNHA and YIDP of the sequence of SEQ ID NO 1 or homologous regions in other species.

When compared to the sequence SEQ ID NO 2 of bovine TdT:

a. the first modification is within amino acid region QLLPKVINLWEKKGLLLYYDLV of the sequence of SEQ ID NO 2 or within a region of homology in other species; and

b. The second modification is selected from one or more of the amino acid regions LVLF, MGA, NNYNH, FMRA, FTI, VKC, FRS, MSDKT, MQK, EAEA, AVW, KKI, SPGSAE, MCP, YATHERMLKMDNHA and YIEP of the sequence of SEQ ID NO 2 or homologous regions in other species.

When compared to the sequence SEQ ID NO 3 of mouse TdT:

a. the first modification is within amino acid region QLLHKVTDFWKQQGLLLYCDIL of the sequence of SEQ ID NO 3 or within a region of homology in other species; and

b. The second modification is selected from one or more of the amino acid regions LVLF, MGA, NNYNQ, FMRA, FPI, VKS, FRT, QSDKS, MQK, VSCVNR, EAEA, AVV, KMT, SPEATE, DHFQ, MCPYDR, YATHERKMMLDNHA and YIEP of the sequence of SEQ ID NO3 or homologous regions in other species.

The modification may be selected from any amino acid other than the wild type sequence. The amino acid may be a naturally occurring amino acid. The modified amino acid may be selected from ala, arg, asn, asp, cys, gln, glu, gly, his, ile, leu, lys, met, phe, pro, ser, thr, trp, val, and sec.

For the sake of brevity, the modifications are described with reference to SEQ ID NO 1, but they also apply to sequences from other species, such as those listed above with sequences in the NCBI database. The sequence modifications are also applicable to truncated forms of SEQ ID NO 1.

The sequences may also be modified at positions other than those described. Embodiments of the invention may include, for example, sequences having modifications to amino acids outside of defined positions, provided that the sequences retain terminal transferase activity. Embodiments of the invention may include, for example, sequences having amino acid truncations outside of defined positions, provided that the sequences retain terminal transferase activity. For example, the sequence may be BRCT truncated as described in application WO2018215803, wherein amino acids are removed from the N-terminus while retaining or improving activity. Thus, alterations, additions, insertions or deletions or truncations of amino acid positions outside the claimed regions are within the scope of the invention, provided that the claimed regions as defined are modified as claimed. The sequences described herein refer to TdT enzymes, which are typically at least 300 amino acids in length. All sequences described herein can be considered to have at least 300 amino acids. The claims do not include peptide fragments or sequences that do not function as terminal transferases.

Modifications in region WLLNRLINRLQNQGILLYYDIV, or in the corresponding regions from other species, help to improve the solubility of the enzyme. The modification within amino acid region WLLNRLINRLQNQGILLYYDIV may be at one or more of the underlined amino acids.

Specific variations may be selected from W-Q, N-P, R-K, L-V, R-L, L-W, Q-E, N-K, Q-K or I-L.

The sequence WLLNRLINRLQNQGILLYYDIV may be changed to QLLPKVINLWEKKGLLLYYDLV.
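The individual substitutions implied by changing WLLNRLINRLQNQGILLYYDIV to QLLPKVINLWEKKGLLLYYDLV can be listed with a short Python sketch. The function name `substitutions` is hypothetical, and the start position of 349 is taken from the region numbering given in this description (349 to 370); the sketch simply compares the two equal-length sequences position by position.

```python
REGION_START = 349  # first residue of the region, per the numbering in this description


def substitutions(wild_type: str, variant: str, start: int = REGION_START) -> list:
    """List point substitutions in the form <wild-type residue><position><new residue>."""
    if len(wild_type) != len(variant):
        raise ValueError("sequences must have equal length")
    return [
        f"{wt}{start + offset}{new}"
        for offset, (wt, new) in enumerate(zip(wild_type, variant))
        if wt != new
    ]


changes = substitutions("WLLNRLINRLQNQGILLYYDIV", "QLLPKVINLWEKKGLLLYYDLV")
print(changes)
```

The residue types returned (W to Q, N to P, R to K, L to V, R to L, L to W, Q to E, N to K, Q to K and I to L) match the specific variations listed above.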

The second modification improves incorporation of nucleotides having a modification at the 3' position compared to the wild type sequence. The second modification may be selected from one or more of the amino acid regions VAIF, MGA, ENHNQ, FMRA, HAI, TKA, FHS, QADNA, MQK, SAAVCK, EAQA, TVR, KEC, TPEMGK, DHFQ, LAAG, APPVDN, FARHERKMLLDNHA and YIDP of the sequence of SEQ ID NO 1 or homologous regions in other species. The second modification may be selected from two or more of the amino acid regions VAIF, EDN, MGA, ENHNQ, FMRA, HAI, TKA, FHS, QADNA, MQK, SAAVCK, EAQA, TVR, KEC, TPEMGK, DHFQ, LAAG, APPVDN, FARHERKMLLDNHA and YIDP of the sequence of SEQ ID NO 1 or homologous regions in other species, shown highlighted in the sequence below.

The identified positions start at positions V32, M108, F182, T212, D271, M279, E298, A421, L456 and Y486. The modifications disclosed herein comprise at least one modification at a defined position.

In the sequences below, the modified regions are numbered as such

WLLNRLINRLQNQGILLYYDIV, 349 to 370

VAIF, 32 to 35

MGA, 108 to 110

MENHNQI, 152 to 158

SEGPCLAFMRA, 175 to 185

HAISSS, 194 to 199

DQTKA, 210 to 214

KGFHS, 260 to 264

QADNA, 269 to 273

HFTKMQK, 275 to 281

SAAVCK, 291 to 296

EAQA, 298 to 301

TVRLI, 309 to 313

GKEC, 328 to 331

TPEMGK, 341 to 346

DHFQK, 388 to 392

LAAG, 403 to 406

APPVDNF, 421 to 427

FARHERKMLLDNHALYDKTKK, 447 to 467

DYIDP, 485 to 489
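The region numbering above can be checked for internal consistency, and expanded into individual residue labels, with the following Python sketch. The table is transcribed from the list above; the names `REGIONS`, `inconsistent_spans` and `residue_labels` are hypothetical and used only for this illustration.

```python
# Region motif -> (first position, last position), transcribed from the list above.
REGIONS = {
    "WLLNRLINRLQNQGILLYYDIV": (349, 370),
    "VAIF": (32, 35),
    "MGA": (108, 110),
    "MENHNQI": (152, 158),
    "SEGPCLAFMRA": (175, 185),
    "HAISSS": (194, 199),
    "DQTKA": (210, 214),
    "KGFHS": (260, 264),
    "QADNA": (269, 273),
    "HFTKMQK": (275, 281),
    "SAAVCK": (291, 296),
    "EAQA": (298, 301),
    "TVRLI": (309, 313),
    "GKEC": (328, 331),
    "TPEMGK": (341, 346),
    "DHFQK": (388, 392),
    "LAAG": (403, 406),
    "APPVDNF": (421, 427),
    "FARHERKMLLDNHALYDKTKK": (447, 467),
    "DYIDP": (485, 489),
}


def inconsistent_spans(regions: dict) -> list:
    """Return motifs whose length does not match the stated position span."""
    return [
        motif
        for motif, (first, last) in regions.items()
        if last - first + 1 != len(motif)
    ]


def residue_labels(motif: str, first: int) -> list:
    """Expand a region into residue labels such as V32, A33, I34, F35."""
    return [f"{residue}{first + offset}" for offset, residue in enumerate(motif)]


print(inconsistent_spans(REGIONS))                     # motifs with mismatched spans, if any
print(residue_labels("VAIF", REGIONS["VAIF"][0]))      # labels for the VAIF region
```

Expanding a region in this way gives the same position labels that are used in the lists of individual modified amino acids later in this description.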

The modified amino acids may be in the region FMRA. The modified amino acids may be in the region QADNA. The modified amino acids may be in the region EAQA. The modified amino acids may be in the region APP. The modified amino acids may be in the region LDNHA. The modified amino acid may be in the region YIDP. Region FARHERKMLLDNHA facilitates removal of substrate bias in modification. FARHERKMLLDNHA regions appear to be highly conserved across species.

The modification selected from one or more of the amino acid regions FMRA, QADNA, EAQA, APP, FARHERKMLLDNHA and YIDP may be at one or more of the underlined amino acids.

The present invention may be described by modifications at certain specified amino acids rather than by modifications in specified domains. Described herein are modified terminal deoxynucleotidyl transferases (TdT) comprising amino acid modifications when compared to the wild-type sequence SEQ ID NO1 or truncated forms thereof or the homologous amino acid sequence of the terminal deoxynucleotidyl transferase (TdT) in other species, wherein the amino acids are modified at one or more of the following amino acids: a53, V68, V71, E97, I101, G109, Q115, V116, S125, T137, Q143, M152, N154, H155, Q157, I158, I165, N169, S175, G177, C179, L180, a181, M183, a195, S197, S198, S199, K200, D210, Q211, T212, K213, a214, E217, T239, K260, F262, S264, Q269, D271, N272, a273, H275, T277, K281, S291, K296, Q488, Q300, T309, R311, L312, I313, G328, E330, C331, T341, P342, E343, M344, G345, K346, N352, R406, L354, I355, N392, N360, L358, N447, G362, Y363, Y424, Y438, K405, K414, R392, N354, L392, L358, N464, N414, Y363, Y62, Y363, Y414, F485, F411, K414, R411, R414, and K414.

Described herein are modified terminal deoxynucleotidyl transferases (TdT) comprising amino acid modifications when compared to the wild-type sequence SEQ ID NO1 or the homologous amino acid sequence of the terminal deoxynucleotidyl transferase (TdT) in other species, wherein the amino acids are modified at one or more of the following amino acids:

V32, A33, I34, F35, A53, V68, V71, E97, I101, M108, G109, A110, Q115, V116, S125, T137, Q143, M152, E153, N154, H155, N156, Q157, I158, I165, N169, N173, S175, E176, G177, P178, C179, L180, A181, F182, M183, R184, A185, L188, H194, A195, I196, S197, S198, S199, K200, E203, G204, D210, Q211, T212, K213, A214, I216, E217, D218, L220, Y222, V228, D230, Q238, T239, L242, L251, K260, G261, F262, H263, S264, L265, E267, Q269, A270, D271, N272, A273, H275, F276, T277, K278, M279, Q280, K281, S291, A292, A293, V294, C295, K296, E298, A299, Q300, A301, Q304, I305, T309, V310, R311, L312, I313, A314, I318, V319, T320, G328, K329, E330, C331, L338, T341, P342, E343, M344, G345, K346, W349, L350, L351, N352, R353, L354, I355, N356, R357, L358, Q359, N360, Q361, G362, I363, L364, L365, Y366, Y367, D368, I369, V370, K376, T377, C381, K383, D388, H389, F390, Q391, K392, F394, I397, K398, K400, K401, E402, L403, A404, A405, G406, R407, D411, A421, P422, P423, V424, D425, N426, F427, A430, R438, F447, A448, R449, H450, E451, R452, K453, M454, L455, L456, D457, N458, H459, A460, L461, Y462, D463, K464, T465, K466, K467, T474, D477, D485, Y486, I487, D488, P489.

specific amino acid changes may include any of the following: a53, V68, V71, E97, I101, G109, Q115, V116, S125, T137, Q143, M152, N154, H155, Q157, I158, I165, N169, N173, S175, G177, C179, L180, a181, M183, L188, a195, S197, S198, S199, K200, E203, G204, D210, Q211, T212, K213, a214, I216, E217, D218, L220, Y222, V228, D230, Q238, T239, L242, L251, K260, F262, L291, S271, E271, Q354, L312, K353, N312, N320, N181, M353, M344, N312, N320, G344, N320, G210, G211, G320, G210, T211, T213, T320, K304, K320, L320, K320, L320, K320, L320, K320, L18, L320, L18, L320, L181, L320, K353, L320, y367, D368, V370, K376, T377, C381, K383, H389, K392, F394, I397, K398, K400, K401, E402, L403, a405, G406, R407, D411, a421, P422, V424, N426, F427, a430, R438, F447, R452, L455, K453, Y462, K464, T465, K467, T474, D477, D485, I487, and/or D488.

Amino acid changes include any two or more of the following: a53, V68, V71, E97, I101, G109, Q115, V116, S125, T137, Q143, M152, N154, H155, Q157, I158, I165, N169, N173, S175, G177, C179, L180, a181, M183, L188, a195, S197, S198, S199, K200, E203, G204, D210, Q211, T212, K213, a214, I216, E217, D218, L220, Y222, V228, D230, Q238, T239, L242, L251, K260, F262, L291, S271, E271, Q354, L312, K353, N312, N320, N181, M353, M344, N312, N320, G344, N320, G210, G211, G320, G210, T211, T213, T320, K304, K320, L320, K320, L320, K320, L320, K320, L18, L320, L18, L320, L181, L320, K353, L320, y367, D368, V370, K376, T377, C381, K383, H389, K392, F394, I397, K398, K400, K401, E402, L403, a405, G406, R407, D411, a421, P422, V424, N426, F427, a430, R438, F447, R452, L455, K453, Y462, K464, T465, K467, T474, D477, D485, I487, and/or D488.

Modification of QADNA to KADKA, QADKA, KADNA, QADNS, KADNT or QADNT facilitates incorporation of 3'-O-modified nucleoside triphosphates at the 3'-end of a nucleic acid and removes substrate bias during incorporation of modified nucleoside triphosphates. Modification of APPVDN to MCPVDN, MPPVDN, ACPVDR, VPPVDN, LPPVDR, ACPYDN, LCPVDN or MAPVDN facilitates incorporation of 3'-O-modified nucleoside triphosphates at the 3'-end of a nucleic acid and removes substrate bias during incorporation of modified nucleoside triphosphates. Modification of FARHERKMLLDNHA to WARHERKMILDNHA, FARHERKMILDNHA, WARHERKMLLDNHA, FARHERKMLLDRHA or FARHEKKMLLDNHA also facilitates incorporation of 3'-O-modified nucleoside triphosphates at the 3'-end of a nucleic acid and removes substrate bias during incorporation of modified nucleoside triphosphates.
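The motif substitutions above can be restated as individual point mutations by combining each motif with its residue range from the region list (QADNA, 269 to 273; FARHERKMLLDNHA, 447 to 460). The following Python snippet is a minimal sketch of that bookkeeping; the function name `point_mutations` is hypothetical and is not part of the disclosure.

```python
from typing import List


def point_mutations(wild_type: str, variant: str, start: int) -> List[str]:
    """List substitutions (e.g. 'Q269K') between two equal-length motifs.

    `start` is the residue number of the first position of the motif.
    """
    if len(wild_type) != len(variant):
        raise ValueError("motifs must have the same length")
    return [
        f"{wt}{start + offset}{var}"
        for offset, (wt, var) in enumerate(zip(wild_type, variant))
        if wt != var
    ]


# QADNA occupies residues 269 to 273 in the region list.
qadna_variants = ["KADKA", "QADKA", "KADNA", "QADNS", "KADNT", "QADNT"]
qadna_mutations = {v: point_mutations("QADNA", v, 269) for v in qadna_variants}

# FARHERKMLLDNHA occupies residues 447 to 460.
farher_mutations = point_mutations("FARHERKMLLDNHA", "FARHERKMLLDRHA", 447)
```

Under these assumptions, KADKA corresponds to substitutions at Q269 and N272, and FARHERKMLLDRHA corresponds to a substitution at N458, all of which appear in the lists of modifiable residues.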

The modification may be selected from one or more of the following sequences: FRRA, QADKA, EADA, MPP, FARHERKMLLDRHA and YIPP. Included is a modified terminal deoxynucleotidyl transferase (TdT), wherein the second modification is selected from two or more of the following sequences: FRRA, QADKA, EADA, MPP, FARHERKMLLDRHA and YIPP. Included is a modified terminal deoxynucleotidyl transferase (TdT), wherein the second modification comprises each of the following sequences: FRRA, QADKA, EADA, MPP, FARHERKMLLDRHA and YIPP.

The crystal structures shown herein show the following domains, which may be preferred as the domains to be modified:

Figure 5. MENHNQI motif (152 to 158), which comprises the ENHNQ motif.

Figure 6. SEGPCLAFMRA motif (175 to 185), which comprises the FMRA motif, highlighted in black.

Figure 7. HFTKMQK motif (275 to 281), which comprises the MQK motif, highlighted in black.

Figure 8. SAAVCK motif (291 to 296), highlighted in black.

Figure 9. GKEC motif (328 to 331), which comprises the KEC motif, highlighted in black.

Figure 10. DHFQK motif (388 to 392), which comprises the DHFQ motif, highlighted in black.

Figure 11. FARHERKMLLDNHA motif (447 to 460), highlighted as black spheres. Additionally, motif FARHERKMLLDNHA was mutated to FARHERKMLLDRHA.

Figure 12. FARHERKMLLDNHALYDKTKK motif (447 to 467), highlighted as black spheres. Additionally, motif FARHERKMLLDNHALYDKTKK was mutated to FARHERKMLLDRHALYDKTKK.

Figure 13. DYIDP motif (485 to 489), which comprises the YIDP motif, highlighted as black spheres.

Figure 14. YYDIV motif (366 to 370), highlighted as black spheres.

Figure 15. APPVDNF motif (421 to 427), which comprises the APPVDN motif, highlighted as black spheres. In this structure, the APPVDNF motif is shown with Ala mutated to Met, giving MPPVDNF.

To facilitate purification of the expressed sequence, the amino acid sequence may be further modified. For example, the amino acid sequence may comprise one or more additional histidine residues at the terminus. Included is a modified terminal deoxynucleotidyl transferase (TdT) comprising any one of SEQ ID NO 4 to SEQ ID NO 173 or a truncated form thereof. Sequences 4 to 173 are full-length sequences derived from spotted gar. Included is a modified terminal deoxynucleotidyl transferase (TdT) comprising any one of SEQ ID NO 174 to SEQ ID NO 343. Sequences 174 to 343 are N-truncated sequences in the form of spotted gar/bovine chimeras. Sequences 344 to 727 are truncated forms of spotted gar sequences. Additionally, these sequences carry an N-terminal sequence (MENLYFQG) that is incorporated simply as a protease cleavage site.

Also disclosed is a method of nucleic acid synthesis, comprising the steps of:

(a) providing an initiator oligonucleotide;

(b) adding a 3'-blocked nucleoside triphosphate to the initiator oligonucleotide in the presence of a terminal deoxynucleotidyl transferase (TdT) as defined herein;

(c) removing all reagents from the initiator oligonucleotide;

(d) cleaving the blocking group in the presence of a cleaving agent; and

(e) removing the cleaving agent.

The method may add more than 1 nucleotide by repeating steps (b) to (e).
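The cycle of steps (a) to (e) can be pictured with a small bookkeeping model. The Python sketch below is illustrative only: the `Strand` class, the function names and the 12-nt initiator sequence are hypothetical, and the model tracks only the sequence and the blocked state of the 3'-end, not any chemistry.

```python
from dataclasses import dataclass, field


@dataclass
class Strand:
    """Toy model of a support-bound initiator extended at its 3' end."""
    sequence: str
    blocked: bool = False  # True while the 3' end carries a blocking group
    log: list = field(default_factory=list)


def add_blocked_nucleotide(strand: Strand, base: str) -> None:
    # Step (b): TdT adds one 3'-blocked nucleotide; a blocked end cannot be extended.
    if strand.blocked:
        raise RuntimeError("3' end is still blocked")
    strand.sequence += base
    strand.blocked = True
    strand.log.append(f"(b) add {base}")


def wash(strand: Strand, step: str) -> None:
    # Steps (c) and (e): reagents are removed; the strand itself is unchanged.
    strand.log.append(f"({step}) remove reagents")


def deblock(strand: Strand) -> None:
    # Step (d): the cleaving agent removes the blocking group, freeing the 3' end.
    strand.blocked = False
    strand.log.append("(d) cleave blocking group")


def synthesize(initiator: str, target: str) -> Strand:
    strand = Strand(initiator)  # step (a): provide the initiator
    for base in target:  # one pass through (b) to (e) per added nucleotide
        add_blocked_nucleotide(strand, base)
        wash(strand, "c")
        deblock(strand)
        wash(strand, "e")
    return strand


# Hypothetical 12-nt initiator extended by the 8-nt sequence ATCGATCG.
product = synthesize("ACGTACGTACGT", "ATCGATCG")
```

The guard in `add_blocked_nucleotide` reflects the point of the blocking group: a second addition is impossible until step (d) has been carried out, so each repeat of steps (b) to (e) adds exactly one nucleotide.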

Reference herein to "nucleoside triphosphates" refers to molecules comprising a nucleoside (i.e., a base attached to a deoxyribose or ribose sugar molecule) bound to three phosphate groups. Examples of nucleoside triphosphates comprising deoxyribose are: deoxyadenosine triphosphate (dATP), deoxyguanosine triphosphate (dGTP), deoxycytidine triphosphate (dCTP) or deoxythymidine triphosphate (dTTP). Examples of nucleoside triphosphates comprising ribose are: adenosine triphosphate (ATP), guanosine triphosphate (GTP), cytidine triphosphate (CTP) or uridine triphosphate (UTP). Other types of nucleosides (such as naturally occurring modified nucleosides and artificial nucleosides) can be combined with three phosphates to form nucleoside triphosphates.

Thus, reference herein to a "3'-blocked nucleoside triphosphate" refers to a nucleoside triphosphate (e.g., dATP, dGTP, dCTP or dTTP) having an additional group at the 3' end that prevents further addition of nucleotides, i.e., a nucleoside triphosphate in which the 3'-OH group has been replaced with a protecting group.

It is to be understood that reference herein to "3'-blocked", a "3'-blocking group" or a "3'-protecting group" refers to a group attached to the 3' end of a nucleoside triphosphate that prevents further nucleotide addition. The methods of the invention use reversible 3'-blocking groups that can be removed by cleavage to allow the addition of further nucleotides. In contrast, an irreversible 3'-blocking group refers to a dNTP in which the 3'-OH group is neither exposed nor can be exposed by cleavage.

The 3'-blocked nucleoside 5'-triphosphate can be blocked by any chemical group that can be unmasked to reveal the 3'-OH. 3'-blocked nucleoside triphosphates can be blocked by: 3'-O-azidomethyl, 3'-aminooxy, 3'-O-(N-oxime) (3'-O-N=CR1R2, wherein R1 and R2 are each a C1-C3 alkyl group, for example CH3, such that the oxime can be O-N=C(CH3)2 (N-acetoxime)), 3'-O-allyl, 3'-O-cyanoethyl, 3'-O-acetyl, 3'-O-nitrate, 3'-phosphate, 3'-O-acetyllevulinate, 3'-O-tert-butyldimethylsilane, 3'-O-trimethyl(silyl)ethoxymethyl, 3'-O-nitrobenzyl and 3'-O-p-nitrobenzyl.

The 3'-blocked nucleoside 5'-triphosphates can also be blocked by any chemical group that can be used directly for chemical ligation, such as copper-catalyzed or copper-free azide-alkyne and tetrazine-alkene click reactions. The 3'-blocked nucleoside triphosphates can include chemical moieties comprising azides, alkynes, alkenes and tetrazines.

Reference herein to a "cleaving agent" is to a substance capable of cleaving a 3'-blocking group from a 3'-blocked nucleoside triphosphate. In one embodiment, the cleaving agent is a chemical cleaving agent. In an alternative embodiment, the cleaving agent is an enzymatic cleaving agent. Cleavage may be accomplished in a single step or may be a multi-step process, for example the conversion of an oxime (such as 3'-O-(N-oxime), 3'-O-N=C(CH3)2) to aminooxy (O-NH2), followed by cleavage of the aminooxy group to OH.

Those skilled in the art will appreciate that the choice of cleaving agent will depend on the type of 3'-nucleotide blocking group used. For example, tris(2-carboxyethyl)phosphine (TCEP) or tris(hydroxypropyl)phosphine (THPP) can be used to cleave a 3'-O-azidomethyl group, a palladium complex can be used to cleave a 3'-O-allyl group, or sodium nitrite can be used to cleave a 3'-aminooxy group. Thus, in one embodiment, the cleaving agent is selected from: tris(2-carboxyethyl)phosphine (TCEP), palladium complexes or sodium nitrite.
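The pairing of blocking groups with cleaving agents given above lends itself to a simple lookup. The Python sketch below encodes only the three pairings named in the preceding paragraph; the dictionary and function names are hypothetical, and other blocking groups are deliberately left unmapped rather than guessed.

```python
# Pairings taken from the paragraph above; nothing else is assumed.
CLEAVING_AGENTS = {
    "3'-O-azidomethyl": ("TCEP", "THPP"),
    "3'-O-allyl": ("palladium complex",),
    "3'-aminooxy": ("sodium nitrite",),
}


def cleaving_agents_for(blocking_group: str) -> tuple:
    """Return the cleaving agents listed for a given 3'-blocking group."""
    try:
        return CLEAVING_AGENTS[blocking_group]
    except KeyError:
        raise ValueError(
            f"no cleaving agent listed for {blocking_group!r}"
        ) from None
```

For example, `cleaving_agents_for("3'-aminooxy")` returns `("sodium nitrite",)`, while an unlisted group such as 3'-O-acetyl raises a `ValueError`.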

In one embodiment, the cleaving agent is added in the presence of a cleavage solution comprising a denaturant such as urea, guanidinium chloride, formamide or betaine. The addition of a denaturant has the advantage of being able to disrupt any undesirable secondary structure in the DNA. In a further embodiment, the cleavage solution comprises one or more buffers. Those skilled in the art will appreciate that the choice of buffer depends on the exact cleavage chemistry and cleaving agent required.

Reference herein to an "initiator oligonucleotide" or "initiator sequence" refers to a short oligonucleotide having a free 3'-end to which a 3'-blocked nucleoside triphosphate may be attached. In one embodiment, the initiator sequence is a DNA initiator sequence. In an alternative embodiment, the initiator sequence is an RNA initiator sequence.

Reference herein to a "DNA initiator sequence" refers to a short DNA sequence to which a 3'-blocked nucleoside triphosphate can be attached (i.e., DNA will be synthesized from the end of the DNA initiator sequence).

In one embodiment, the initiator sequence is between 5 and 50 nucleotides in length, such as between 5 and 30 nucleotides in length (i.e. between 10 and 30 nucleotides in length), in particular between 5 and 20 nucleotides in length (i.e. about 20 nucleotides in length), more in particular between 5 and 15 nucleotides in length, for example between 10 and 15 nucleotides in length, in particular 12 nucleotides in length.

In one embodiment, the initiator sequence is single-stranded. In an alternative embodiment, the initiator sequence is double-stranded. It will be appreciated by those skilled in the art that a 3'-overhang (i.e., a free 3'-end) allows for efficient addition.

In one embodiment, the initiator sequence is immobilized on a solid support. This allows TdT and the cleaving agent to be removed (in steps (c) and (e), respectively) without washing away the synthesized nucleic acid. The initiator sequence may be attached to a solid support that is stable under aqueous conditions, such that the method can be readily performed via a flow device.

In one embodiment, the initiator sequence is immobilized on the solid support via a reversible interacting moiety such as a chemically cleavable linker, an antibody/immunogenic epitope, a biotin/biotin binding protein (such as avidin or streptavidin), or a glutathione-GST tag. Thus, in a further embodiment, the method additionally comprises extracting the resulting nucleic acid by removing the reversible interaction moiety in the initiator sequence, such as by incubation with proteinase K.

In one embodiment, the initiator sequence comprises an enzymatically recognizable base or sequence of bases. Bases recognized by enzymes, such as glycosylases, can be removed to produce base-free sites that can be cleaved by chemical or enzymatic means. The base sequence can be recognized and cleaved by restriction enzymes.

In a further embodiment, the initiator sequence is immobilized on the solid support by a chemically cleavable linker, such as a disulfide, allyl, or azide-masked hemiaminal ether linker. Thus, in one embodiment, the method additionally comprises cleaving the chemical linker by adding: tris(2-carboxyethyl)phosphine (TCEP) or dithiothreitol (DTT) for a disulfide linker; a palladium complex for an allyl linker; or TCEP for an azide-masked hemiaminal ether linker.

In one embodiment, the resulting nucleic acids are extracted and amplified by polymerase chain reaction using the nucleic acids bound to a solid support as a template. Thus, the initiator sequence may comprise an appropriate forward primer sequence and an appropriate reverse primer that can be synthesized.

In one embodiment, the terminal deoxynucleotidyl transferase (TdT) of the present invention is added in the presence of an extension solution comprising one or more buffers (e.g., Tris or cacodylate), one or more salts (e.g., Na+, K+, Mg2+, Mn2+, Cu2+, Zn2+, Co2+, etc., all with appropriate counterions, such as Cl-) and inorganic pyrophosphatase (e.g., the Saccharomyces cerevisiae homolog). It will be appreciated that the choice of buffers and salts depends on the optimal enzyme activity and stability. The use of an inorganic pyrophosphatase helps to reduce the accumulation of pyrophosphate due to hydrolysis of nucleoside triphosphates by TdT. Thus, the use of an inorganic pyrophosphatase has the advantage of reducing the rates of (1) the reverse reaction and (2) TdT strand dismutation.

In one embodiment, step (b) is carried out at a pH in the range of 5 to 10. Thus, it will be appreciated that any buffer with a buffering range of pH 5 to 10 may be used, for example cacodylate, Tris, HEPES or Tricine, in particular cacodylate or Tris.

In one embodiment, step (d) is performed at a temperature of less than 99 °C, such as less than 95 °C, 90 °C, 85 °C, 80 °C, 75 °C, 70 °C, 65 °C, 60 °C, 55 °C, 50 °C, 45 °C, 40 °C, 35 °C or 30 °C. It will be appreciated that the optimum temperature will depend on the cleaving agent used. The temperature used helps to assist cleavage and to disrupt any secondary structure formed during nucleotide addition.

In one embodiment, steps (c) and (e) are performed by applying a wash solution. In one embodiment, the wash solution comprises the same buffers and salts as used in the extension solution described herein. This has the following advantages: allowing the wash solution to be collected after step (c) and recycled as an extension solution in step (b) when repeating the method steps.

Also disclosed is a kit comprising a terminal deoxynucleotidyl transferase (TdT) enzyme as defined herein in combination with an initiator sequence and one or more 3'-blocked nucleoside triphosphates.

The invention includes nucleic acid sequences for expressing modified terminal transferases. The invention includes codon optimized cDNA sequences expressing modified terminal transferases. Codon optimized cDNA sequences for each protein variant (SEQ ID NO 4-727) were included.

The nucleic acid sequence may be the sequence (ID 728) below:

The invention includes cell lines that produce modified terminal transferases.

Examples

Expression of TdT variants

Briefly, a plasmid containing a gene encoding terminal transferase was transformed into BL21 Escherichia coli (E. coli). Starter Luria broth (LB) cultures were grown overnight at 37 °C and used to inoculate LB expression cultures. Expression cultures were grown to an optical density of 0.6 at 600 nm and induced by addition of IPTG to 1 mM. Induced cultures were grown overnight at 25 °C. The next morning, the cells were lysed in detergent lysis buffer and the enzyme was purified to homogeneity by immobilized metal affinity chromatography (IMAC).

Determination of incorporation of reversible terminators by TdT variants

173 terminal transferases were expressed, purified, and compared to wild-type bovine TdT (SEQ ID NO 2). The purified engineered TdT was then used in the following assay: fluorescently labeled 15-nt ssDNA primers were incubated with 1× TdT buffer (Thermo Fisher Scientific), yeast inorganic pyrophosphatase (Sigma-Aldrich, 0.1 mU/µl), 3'-azidomethyl dTTP or 3'-aminooxy dATP, and engineered TdT (24 µg/µl) for 10 min at 37 °C. The reaction was then quenched with formamide (Fisher Scientific), and the samples were loaded directly onto a denaturing polyacrylamide gel and analyzed by electrophoresis. The gel was imaged and the resulting bands were quantified using a Typhoon scanner (GE).

The results from 173 TdT enzymes (SEQ ID NO 1-173) are shown in FIG. 1.

Determination of incorporation of base-modified reversible terminators by TdT variants

A describes 3' -aminooxy 6-azido 2 ' -deoxyadenosine 5 ' -triphosphate; c describes 3' -aminooxy 4-azido 5-methyl 2 ' -deoxycytidine 5 ' -triphosphate; g describes 3' -aminooxy 2-azido 2 ' -deoxyguanosine 5 ' -triphosphate; t describes 3' -aminooxy 5- (-hydroxy-1-butynyl) 2 ' -deoxyuridine 5 ' -triphosphate.

192 terminal deoxynucleotidyl transferase (TdT) variants were expressed and purified as described above. The expressed variants have SEQ ID NO 345, SEQ ID NO 347, SEQ ID NO 352, SEQ ID NO 357, SEQ ID NO 359, SEQ ID NO 360, SEQ ID NO 361, SEQ ID NO 362, SEQ ID NO 364, SEQ ID NO 365, SEQ ID NO 366, SEQ ID NO 367, SEQ ID NO 368, SEQ ID NO 370, SEQ ID NO 371, SEQ ID NO 372, SEQ ID NO 375, SEQ ID NO 376, SEQ ID NO 377, SEQ ID NO 378, SEQ ID NO 380, SEQ ID NO 382, SEQ ID NO 383, SEQ ID NO 384, SEQ ID NO 385, SEQ ID NO 387, SEQ ID NO 388, SEQ ID NO 392, SEQ ID NO 393, SEQ ID NO 394, SEQ ID NO 395, SEQ ID NO 397, SEQ ID NO 393, SEQ ID NO 395, SEQ ID NO, SEQ ID NO 398, SEQ ID NO 399, SEQ ID NO 401, SEQ ID NO 405, SEQ ID NO 406, SEQ ID NO 410, SEQ ID NO 411, SEQ ID NO 416, SEQ ID NO 418, SEQ ID NO 422, SEQ ID NO 426, SEQ ID NO 427, SEQ ID NO 430, SEQ ID NO 433, SEQ ID NO 436, SEQ ID NO 439, SEQ ID NO 440, SEQ ID NO 442, SEQ ID NO 444, SEQ ID NO 445, SEQ ID NO 446, SEQ ID NO 447, SEQ ID NO 450, SEQ ID NO 453, SEQ ID NO 454, SEQ ID NO 455, SEQ ID NO 457, SEQ ID NO 460, SEQ ID NO 461, SEQ ID NO 462, SEQ ID NO 463, SEQ ID NO 464, SEQ ID NO 467, SEQ ID NO 447, SEQ ID NO 460, SEQ ID NO 461, SEQ ID NO 462, SEQ ID NO 463, SEQ ID NO, SEQ ID NO 472, SEQ ID NO 473, SEQ ID NO 475, SEQ ID NO 476, SEQ ID NO 477, SEQ ID NO 478, SEQ ID NO 479, SEQ ID NO 480, SEQ ID NO 485, SEQ ID NO 486, SEQ ID NO 487, SEQ ID NO 489, SEQ ID NO 492, SEQ ID NO 494, SEQ ID NO 495, SEQ ID NO 497, SEQ ID NO 499, SEQ ID NO 500, SEQ ID NO 503, SEQ ID NO 505, SEQ ID NO 506, SEQ ID NO 507, SEQ ID NO 509, SEQ ID NO 510, SEQ ID NO 514, SEQ ID NO 516, SEQ ID NO 517, SEQ ID NO 519, SEQ ID NO 524, SEQ ID NO 525, SEQ ID NO 526, SEQ ID NO 527, SEQ ID NO 528, SEQ ID NO, SEQ ID NO. 529, SEQ ID NO. 531, SEQ ID NO. 532, SEQ ID NO. 533, SEQ ID NO. 535, SEQ ID NO. 543, SEQ ID NO. 544, SEQ ID NO. 546, SEQ ID NO. 550, SEQ ID NO. 553, SEQ ID NO. 555, SEQ ID NO. 557, SEQ ID NO. 559, SEQ ID NO. 560, SEQ ID NO. 561, SEQ ID NO. 
562, SEQ ID NO. 564, SEQ ID NO. 565, SEQ ID NO. 567, SEQ ID NO. 568, SEQ ID NO. 570, SEQ ID NO. 572, SEQ ID NO. 573, SEQ ID NO. 575, SEQ ID NO. 580, SEQ ID NO. 582, SEQ ID NO. 584, SEQ ID NO. 589, SEQ ID NO. 593, SEQ ID NO. 595, SEQ ID NO. 598, SEQ ID NO. 599, SEQ ID NO. 600, 601, 604, 605, 606, 609, 611, 612, 614, 618, 619, 620, 623, 629, 637, 638, 639, 641, 643, 644, 646, 648, 649, 651, 652, 654, 657, 658, 660, 661, 662, 664, 665, 664, 665, 652, 666, 657, 658, 660, 661, 662, 664, 665, 666, 664, 667 SEQ ID NO, 670 SEQ ID NO, 673 SEQ ID NO, 678 SEQ ID NO, 679 SEQ ID NO, 681 SEQ ID NO, 684 SEQ ID NO, 685 SEQ ID NO, 687 SEQ ID NO, 690 SEQ ID NO, 692 SEQ ID NO, 698 SEQ ID NO, 699 SEQ ID NO, 700 SEQ ID NO, 703 SEQ ID NO, 706 SEQ ID NO, 707 SEQ ID NO, 708 SEQ ID NO, 711 SEQ ID NO, 712 SEQ ID NO, 715 SEQ ID NO, 716 SEQ ID NO, 717 SEQ ID NO, 718 SEQ ID NO, 720 SEQ ID NO, 721 SEQ ID NO, 725 SEQ ID NO.

In single nucleotide incorporation assays, the base-modified reversible terminators A, C, G and T were provided as substrates to the engineered variants. A pool of DNA initiators with degenerate ends (…NNN, where N = A, C, G or T) was immobilized on a solid support and exposed to a nucleotide addition mixture (NAM) solution comprising (1) a TdT variant, (2) a neutral pH buffer, (3) a monovalent salt, (4) cobalt chloride, and (5) 3'-ONH2-dXTP, where X is a modified nucleobase yielding A, C, G or T. The incubation temperature was 37 °C and the reaction time was 2.5 minutes. The solid support was then washed with a high salt solution at neutral pH followed by a low salt solution at neutral pH. The initiator pool was converted to a sequencing library and analyzed by next-generation sequencing (NGS) on an Illumina NextSeq 500 using paired-end 30-cycle reads (PE30 reads). The bcl files were converted to fastq files using Illumina's bcl2fastq conversion software and analyzed in R. The incorporation efficiency for the addition of each modified base was calculated for every possible initiator context as [reads containing N+1 product] / ([reads containing N initiator] + [reads containing N+1 product]). The average incorporation efficiency across all contexts is shown in Figure 3 (upper panel). The activity of wild-type bovine and wild-type spotted gar TdT in single nucleotide incorporation of the base-modified reversible terminator nucleotides A, C, G and T is indicated by a dotted line. Since neither wild-type bovine nor wild-type spotted gar TdT was able to incorporate these base-modified reversible terminator nucleotides in our assay, the dotted line appears at y = 0. The mutations in all of the presented TdT variants resulted in improved incorporation of modified bases relative to wild-type bovine and wild-type spotted gar TdT.
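The incorporation efficiency described above is a ratio of read counts, computed per initiator context and then averaged. The analysis itself was carried out in R; the Python sketch below is only an illustration of the arithmetic, and the function names and read counts are hypothetical.

```python
def incorporation_efficiency(n_reads: int, n_plus_1_reads: int) -> float:
    """Fraction of reads for one initiator context that carry the N+1 product."""
    total = n_reads + n_plus_1_reads
    if total == 0:
        raise ValueError("no reads for this initiator context")
    return n_plus_1_reads / total


def mean_efficiency(counts: dict) -> float:
    """Average efficiency over contexts.

    `counts` maps an initiator 3' context (e.g. 'ACG') to a tuple of
    (reads containing N initiator, reads containing N+1 product).
    """
    efficiencies = [
        incorporation_efficiency(n, n1) for n, n1 in counts.values()
    ]
    return sum(efficiencies) / len(efficiencies)


# Hypothetical read counts for three of the possible ...NNN contexts.
example_counts = {"AAA": (50, 50), "CGT": (25, 75), "TTG": (100, 0)}
example_mean = mean_efficiency(example_counts)
```

Note that this sketch weights every context equally regardless of read depth, which matches an average "across all contexts"; weighting by read count would be a different design choice.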

Determination of the multi-cycling capability of TdT variants

Synthesis of an 8-nt nucleic acid sequence was performed on a solid support using 3'-ONH2 nucleoside 5'-triphosphates with terminal deoxynucleotidyl transferases (TdT) SEQ ID NO 1 and SEQ ID NO 344 to 727. TdT variants were expressed and purified as described above. The DNA sequence 5'-ATCGATCG-3' was synthesized by repeatedly exposing a solid support-bound DNA initiator to a nucleotide addition mixture (NAM) solution comprising (1) TdT, (2) a neutral pH buffer, (3) a monovalent salt, (4) cobalt chloride, and (5) 3'-ONH2-dNTP, where N is selected from adenine, thymine, cytosine or guanine. One cycle consisted of: (A) incubating the NAM solution containing the indicated A, T, C or G reversible terminator nucleotide on the solid support for 5 minutes at 37 °C; (B) washing the solid support with a high salt solution at neutral pH; (C) exposing the solid support to an acidic aqueous sodium nitrite solution; and (D) washing the solid support with the same high salt solution at neutral pH as in (B). Steps (A) to (D) were then repeated 7 more times to synthesize the desired 8-nt sequence. The synthesized DNA was analyzed by running the reactions on a denaturing polyacrylamide gel and quantified by means of a fluorophore covalently attached to the DNA initiator. The full-length fraction (8-nt species) was determined by taking the fluorescence intensity of the 8-nt band and dividing it by the total lane intensity. The activity of wild-type bovine and wild-type spotted gar TdT, expressed as full-length fraction, is represented by the dashed line at y = 0, indicating that they were not able to synthesize any 8-nt product. All mutations contained in the TdT variants resulted in an improvement relative to wild-type bovine and wild-type spotted gar TdT. The results are shown in Figure 3 (lower panel).
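The full-length fraction is the intensity of the 8-nt band divided by the total intensity of the lane. The Python sketch below illustrates that calculation with hypothetical band intensities. The second function, `average_stepwise_yield`, is an added assumption that is not part of the described assay: it treats every cycle as having the same yield and failure to extend as the only loss.

```python
def full_length_fraction(band_intensities: dict, full_length: int = 8) -> float:
    """Intensity of the full-length band divided by total lane intensity.

    `band_intensities` maps the number of added nucleotides to the
    fluorescence intensity of that band.
    """
    total = sum(band_intensities.values())
    if total <= 0:
        raise ValueError("lane has no signal")
    return band_intensities.get(full_length, 0.0) / total


def average_stepwise_yield(fraction: float, cycles: int = 8) -> float:
    # Assumption (not from the source): equal yield in every cycle, so the
    # full-length fraction is the per-cycle yield raised to the cycle count.
    return fraction ** (1.0 / cycles)


# Hypothetical lane: keys are nucleotides added, values are band intensities.
lane = {8: 60.0, 7: 20.0, 6: 10.0, 0: 10.0}
fraction = full_length_fraction(lane)
per_cycle = average_stepwise_yield(fraction)
```

A lane with no 8-nt band gives a fraction of 0, which corresponds to the dashed line at y = 0 reported for the wild-type enzymes.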
