Enzymes with RUVC domains

Document No.: 1909397 Publication date: 2021-11-30

Reading note: This technology, "Enzymes with RUVC domains", was designed and created by Brian Thomas, Christopher Brown, Rose Kantor, Audra Devoto, Cristina Butterfield, and others on 2020-02-14. Main content: The present disclosure provides endonucleases having distinguishing domain features, as well as methods of using such enzymes or variants thereof.

1. An engineered nuclease system comprising:

(a) an endonuclease comprising a RuvC_III domain and an HNH domain, wherein the endonuclease is derived from an uncultured microorganism, wherein the endonuclease is a class 2, type II Cas endonuclease; and

(b) an engineered guide ribonucleic acid structure configured to form a complex with the endonuclease, the complex comprising:

(i) a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and

(ii) a tracr ribonucleic acid sequence configured to bind to the endonuclease.

2. The engineered nuclease system of claim 1, wherein the RuvC_III domain comprises a sequence having at least 70%, at least 75%, at least 80%, or at least 90% sequence identity to any one of SEQ ID NOs: 1827-3637.
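The identity thresholds recited above can be illustrated with a minimal sketch. The claims do not say how sequence identity is computed, so the definition used here (identical residues divided by aligned, gap-free columns of a pre-computed pairwise alignment) is an assumption, and the function names and toy sequences are hypothetical, not taken from the sequence listing.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: the inputs are already aligned (equal length, '-' for gaps).
    Other denominators (alignment length, shorter sequence) are also in use.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned residue pairs to compare")
    return 100.0 * matches / compared


def thresholds_met(identity: float, thresholds=(70, 75, 80, 90)) -> list:
    """Return the recited 'at least N%' thresholds that the identity satisfies."""
    return [t for t in thresholds if identity >= t]


# Toy, made-up peptides: 9 of 10 positions agree.
identity = percent_identity("ACDEFGHIKL", "ACDEFGHIKV")
print(identity, thresholds_met(identity))
```

In practice an alignment tool would produce the aligned strings; this sketch only shows how one identity value maps onto the tiered thresholds.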

3. An engineered nuclease system comprising:

(a) an endonuclease comprising a RuvC_III domain having at least 75% sequence identity to any one of SEQ ID NOs: 1827-3637; and

(b) an engineered guide ribonucleic acid structure configured to form a complex with the endonuclease, the complex comprising:

(i) a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and

(ii) a tracr ribonucleic acid sequence configured to bind to the endonuclease.

4. An engineered nuclease system comprising:

(a) an endonuclease configured to bind to a protospacer adjacent motif (PAM) sequence comprising SEQ ID NOs: 5512-5537, wherein the endonuclease is a class 2, type II Cas endonuclease; and

(b) an engineered guide ribonucleic acid structure configured to form a complex with the endonuclease, the complex comprising:

(i) a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and

(ii) a tracr ribonucleic acid sequence configured to bind to the endonuclease.

5. The engineered nuclease system of claim 4, wherein the endonuclease is derived from an uncultured microorganism.

6. The engineered nuclease system of any of claims 4-5, wherein the endonuclease is not engineered to bind different PAM sequences.
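How a PAM requirement constrains which sites can be targeted can be sketched as follows. This is only an illustration under stated assumptions: the PAM is taken to lie immediately 3' of the protospacer on the scanned strand, the motif `NNRC` and the DNA string are hypothetical placeholders (the actual PAM sequences are in the sequence listing, which is not reproduced here), and `find_protospacers` is an invented helper name.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_protospacers(dna: str, pam: str, spacer_len: int = 20) -> list:
    """List (start, protospacer, pam_match) for each PAM preceded by a full spacer window.

    Assumption: the PAM sits directly 3' of the protospacer on the given strand.
    """
    dna = dna.upper()
    pam_regex = "".join(IUPAC[c] for c in pam.upper())
    # A lookahead lets overlapping PAM occurrences be reported as well.
    pattern = re.compile("(?=(" + pam_regex + "))")
    hits = []
    for match in pattern.finditer(dna):
        pam_start = match.start()
        if pam_start >= spacer_len:
            start = pam_start - spacer_len
            hits.append((start, dna[start:pam_start], match.group(1)))
    return hits


# Hypothetical 10-nt window followed by a hypothetical PAM.
print(find_protospacers("TTTTTTTTTTAAGC", "NNRC", spacer_len=10))
```

Only the given strand is scanned; a fuller search would also scan the reverse complement.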

7. The engineered nuclease system of claim 4, wherein the endonuclease is not a Cas9 endonuclease, a Cas14 endonuclease, a Cas12a endonuclease, a Cas12b endonuclease, a Cas12c endonuclease, a Cas12d endonuclease, a Cas12e endonuclease, a Cas13a endonuclease, a Cas13b endonuclease, a Cas13c endonuclease, or a Cas13d endonuclease.

8. The engineered nuclease system of claim 4, wherein the endonuclease has less than 80% identity to a Cas9 endonuclease.

9. The engineered nuclease system of any of claims 3-8, wherein the endonuclease further comprises an HNH domain.

10. The engineered nuclease system of any of claims 1-9, wherein the tracr ribonucleic acid sequence comprises a sequence having at least 80% sequence identity to about 60 to 90 consecutive nucleotides selected from any of SEQ ID NO 5476-5511 and SEQ ID NO 5538.

11. An engineered nuclease system comprising:

(a) an engineered guide ribonucleic acid structure comprising:

(i) a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and

(ii) a tracr ribonucleic acid sequence configured to bind to an endonuclease,

wherein the tracr ribonucleic acid sequence comprises a sequence having at least 80% sequence identity to about 60 to 90 consecutive nucleotides selected from any one of SEQ ID NO 5476-5511 and SEQ ID NO 5538; and

(b) a class 2, type II Cas endonuclease configured to bind to the engineered guide ribonucleic acid.

12. The engineered nuclease system of any of claims 1-3 or 11, wherein the endonuclease is configured to bind to a protospacer adjacent motif (PAM) sequence selected from the group comprising SEQ ID NOs: 5512-5537.

13. The engineered nuclease system of any of claims 1-11, wherein the engineered guide ribonucleic acid structure comprises at least two ribonucleic acid polynucleotides.

14. The engineered nuclease system of any of claims 1-11, wherein the engineered guide ribonucleic acid structure comprises a ribonucleic acid polynucleotide comprising the guide ribonucleic acid sequence and the tracr ribonucleic acid sequence.

15. The engineered nuclease system of any of claims 1-14, wherein the guide ribonucleic acid sequence is complementary to a prokaryotic, bacterial, archaeal, eukaryotic, fungal, plant, mammalian or human genomic sequence.

16. The engineered nuclease system of any of claims 1-15, wherein the guide ribonucleic acid sequence is 15-24 nucleotides in length.
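The relationship between a guide sequence and the strand it hybridizes to, together with the 15-24 nucleotide length range, can be sketched as below. The function names and the example target are hypothetical; the sketch assumes perfect Watson-Crick complementarity, which the claims do not require.

```python
# DNA base -> RNA base that pairs with it.
DNA_TO_RNA_COMPLEMENT = {"A": "U", "C": "G", "G": "C", "T": "A"}


def guide_for_target(target_strand_dna: str) -> str:
    """Return the RNA (written 5'->3') that base-pairs with a DNA strand given 5'->3'.

    Pairing is antiparallel, so the strand is read in reverse while complementing.
    """
    dna = target_strand_dna.upper()
    return "".join(DNA_TO_RNA_COMPLEMENT[base] for base in reversed(dna))


def length_in_claimed_range(guide: str, low: int = 15, high: int = 24) -> bool:
    """Check the 15-24 nucleotide guide length recited in claim 16."""
    return low <= len(guide) <= high


# Hypothetical 20-nt target strand.
guide = guide_for_target("ACGTACGTACGTACGTACGT")
print(guide, length_in_claimed_range(guide))
```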

17. The engineered nuclease system of any of claims 1-16, wherein the endonuclease comprises one or more Nuclear Localization Sequences (NLS) proximal to the N-terminus or C-terminus of the endonuclease.

18. The engineered nuclease system of any of claims 1-17, wherein the NLS comprises a sequence selected from SEQ ID NO 5597-5612.

19. The engineered nuclease system of any of claims 1-18, further comprising a single-stranded or double-stranded DNA repair template comprising, from 5' to 3': a first homology arm comprising a sequence of at least 20 nucleotides 5' to the target deoxyribonucleic acid sequence, a synthetic DNA sequence of at least 10 nucleotides, and a second homology arm comprising a sequence of at least 20 nucleotides 3' to the target sequence.

20. The engineered nuclease system of claim 19, wherein the first homology arm or the second homology arm comprises a sequence of at least 40, 80, 120, 150, 200, 300, 500, or 1,000 nucleotides.
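The 5'-to-3' layout of the repair template in claims 19 and 20 can be sketched as a simple assembly check. The helper names and the toy sequences are hypothetical; the minimum lengths are the ones recited in the claims.

```python
# Arm lengths listed in claim 20.
CLAIM_20_ARM_LENGTHS = (40, 80, 120, 150, 200, 300, 500, 1000)


def build_repair_template(left_arm: str, insert: str, right_arm: str,
                          min_arm: int = 20, min_insert: int = 10) -> str:
    """Concatenate first arm + synthetic sequence + second arm (5'->3') after length checks."""
    for name, arm in (("first", left_arm), ("second", right_arm)):
        if len(arm) < min_arm:
            raise ValueError(f"{name} homology arm is shorter than {min_arm} nt")
    if len(insert) < min_insert:
        raise ValueError(f"synthetic sequence is shorter than {min_insert} nt")
    return left_arm + insert + right_arm


def arm_length_tiers_met(arm: str) -> list:
    """Return the claim 20 'at least N nucleotides' tiers that an arm satisfies."""
    return [n for n in CLAIM_20_ARM_LENGTHS if len(arm) >= n]


# Toy example: 25-nt first arm, 12-nt synthetic sequence, 130-nt second arm.
template = build_repair_template("A" * 25, "GATTACAGATTA", "C" * 130)
print(len(template), arm_length_tiers_met("C" * 130))
```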

21. The engineered nuclease system of any of claims 1-20, wherein the system further comprises a source of Mg2+.

22. The engineered nuclease system of any of claims 1-21, wherein the endonuclease and the tracr ribonucleic acid sequence are derived from different bacterial species within the same phylum.

23. The engineered nuclease system of any of claims 1-22, wherein the endonuclease is derived from a bacterium belonging to the genus Dermabacter.

24. The engineered nuclease system of any of claims 1-22, wherein the endonuclease is derived from a bacterium belonging to the phylum Verrucomicrobia, the phylum Candidatus Peregrinibacteria, or the phylum Candidatus Melainabacteria.

25. The engineered nuclease system of any of claims 1-22, wherein the endonuclease is derived from a bacterium comprising a 16S rRNA gene that is at least 90% identical to any of SEQ ID NOs: 5592-5595.

26. The engineered nuclease system of any of claims 1-25, wherein the HNH domain comprises a sequence having at least 70% or at least 80% identity to any of SEQ ID NOs: 3638-5460.

27. The engineered nuclease system of any of claims 1-26, wherein the endonuclease comprises SEQ ID NOs 1-1826 or variants having at least 55% identity thereto.

28. The engineered nuclease system of any of claims 1-27, wherein the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NOs 1827-1830 and SEQ ID NOs 1827-2140.

29. The engineered nuclease system of any of claims 1-28, wherein the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NOs: 3638-3641 or SEQ ID NOs: 3638-3954.

30. The engineered nuclease system of any of claims 1-29, wherein the endonuclease comprises at least 1, at least 2, at least 3, at least 4, or at least 5 peptide motifs selected from the group consisting of SEQ ID NOS 5615-5632.
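The "at least N peptide motifs" language can be illustrated by counting how many motifs from a list occur in a protein sequence. This is a minimal sketch assuming exact substring matching; the claims do not state how motif presence is determined, and the protein and motif strings below are made up rather than taken from SEQ ID NOs: 5615-5632.

```python
def motifs_present(protein: str, motifs) -> list:
    """Return the motifs (in the order supplied) that occur exactly in the protein."""
    seq = protein.upper()
    return [m for m in motifs if m.upper() in seq]


def has_at_least(protein: str, motifs, minimum: int) -> bool:
    """True if at least `minimum` distinct motifs from the list are found."""
    return len(set(motifs_present(protein, motifs))) >= minimum


# Hypothetical protein fragment and motif list.
found = motifs_present("MKRIDHNHLLG", ["HNH", "RID", "WWW"])
print(found, has_at_least("MKRIDHNHLLG", ["HNH", "RID", "WWW"], 2))
```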

31. The engineered nuclease system of any of claims 1-30, wherein the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from SEQ ID NOs 1-4 or SEQ ID NOs 1-319.

32. The engineered nuclease system of any of claims 1-31, wherein the guide RNA structure comprises a sequence that is at least 70%, 80% or 90% identical to a sequence selected from the group consisting of SEQ ID NO 5461-5464, SEQ ID NO 5476-5479, or SEQ ID NO 5476-5489.

33. The engineered nuclease system of any of claims 1-32, wherein the guide RNA structure comprises an RNA sequence predicted to comprise a hairpin consisting of a stem and a loop, the stem comprising at least 10, at least 12, or at least 14 base-paired ribonucleotides, and an asymmetric bulge within 4 base pairs of the loop.
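The structural features in claim 33 can be checked on a predicted secondary structure written in dot-bracket notation. The sketch below rests on several assumptions: the structure describes a single hairpin, "base-paired ribonucleotides" is read as the number of paired nucleotides (two per base pair), an asymmetric bulge is any interruption with unequal numbers of unpaired nucleotides on the two strands, and "within 4 base pairs of the loop" means at most four base pairs lie between the interruption and the loop. The function names and example structures are hypothetical.

```python
def parse_pairs(dot_bracket: str) -> list:
    """Return base pairs as (i, j) index tuples, in the order the pairs close."""
    stack, pairs = [], []
    for pos, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def paired_nucleotides(dot_bracket: str) -> int:
    """Count paired nucleotides (two per base pair)."""
    return 2 * len(parse_pairs(dot_bracket))


def asymmetric_bulge_near_loop(dot_bracket: str, max_pairs_from_loop: int = 4) -> bool:
    """Detect an asymmetric interruption within the given number of pairs from the loop."""
    pairs = parse_pairs(dot_bracket)  # innermost pair first for a single hairpin
    for k in range(1, len(pairs)):
        (i_in, j_in), (i_out, j_out) = pairs[k - 1], pairs[k]
        if not (i_out < i_in and j_in < j_out):
            raise ValueError("this sketch expects a single hairpin")
        unpaired_5p = i_in - i_out - 1   # unpaired nucleotides on the 5' strand
        unpaired_3p = j_out - j_in - 1   # unpaired nucleotides on the 3' strand
        if unpaired_5p != unpaired_3p and k <= max_pairs_from_loop:
            return True
    return False


# Hypothetical hairpin: one unpaired nucleotide on the 5' strand, two pairs from the loop.
structure = "((((.((....))))))"
print(paired_nucleotides(structure), asymmetric_bulge_near_loop(structure))
```

The structure string would normally come from an RNA folding program; the sketch only interprets its output.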

34. The engineered nuclease system of any of claims 1-33, wherein the endonuclease is configured to bind to a PAM comprising a sequence selected from the group consisting of SEQ ID NO:5512-5515 or SEQ ID NO: 5527-5530.

35. The engineered nuclease system of any one of claims 1-34, wherein:

a) the endonuclease comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to SEQ ID NO 1827;

b) the guide RNA structure comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to at least one of SEQ ID NO 5461 or SEQ ID NO 5476; and

c) the endonuclease is configured to bind to a PAM comprising SEQ ID NO:5512 or SEQ ID NO: 5527.

36. The engineered nuclease system of any one of claims 1-34, wherein:

a) the endonuclease comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to SEQ ID NO 1828;

b) the guide RNA structure comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to at least one of SEQ ID NO 5462 or SEQ ID NO 5477; and

c) the endonuclease is configured to bind to a PAM comprising SEQ ID NO:5513 or SEQ ID NO: 5528.

37. The engineered nuclease system of any one of claims 1-34, wherein:

a) the endonuclease comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to SEQ ID NO 1829;

b) the guide RNA structure comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to at least one of SEQ ID NO 5463 or SEQ ID NO 5478; and

c) the endonuclease is configured to bind to a PAM comprising SEQ ID NO:5514 or SEQ ID NO: 5529.

38. The engineered nuclease system of any one of claims 1-34, wherein:

a) the endonuclease comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to SEQ ID NO: 1830;

b) the guide RNA structure comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to at least one of SEQ ID NO 5464 or SEQ ID NO 5479; and

c) the endonuclease is configured to bind to a PAM comprising SEQ ID NO:5515 or SEQ ID NO: 5530.

39. The engineered nuclease system of any of claims 1-27, wherein the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NO:2141-2142, or SEQ ID NO: 2141-2241.

40. The engineered nuclease system of any of claims 1-27 or claim 39, wherein the endonuclease comprises a sequence that is at least 70%, 80% or 90% identical to a sequence selected from the group consisting of SEQ ID NO 3955-3956 or SEQ ID NO 3955-4055.

41. The engineered nuclease system of any of claims 1-27 or claims 39-40, wherein the endonuclease comprises at least 1, at least 2, at least 3, at least 4, or at least 5 peptide motifs selected from the group consisting of SEQ ID NO 5632-5638.

42. The engineered nuclease system of any of claims 1-27 or claims 39-41, wherein the endonuclease comprises a sequence that is at least 70%, 80% or 90% identical to a sequence selected from the group consisting of SEQ ID NO 320-321 or SEQ ID NO 320-420.

43. The engineered nuclease system of any of claims 1-27 or claims 39-42, wherein the guide RNA structure comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from SEQ ID NO: 5465, SEQ ID NOs: 5490-5491, or SEQ ID NOs: 5490-5494.

44. The engineered nuclease system of any of claims 1-27 or claims 39-43, wherein the guide RNA structure comprises a tracr ribonucleic acid sequence comprising a hairpin comprising at least 8, at least 10, or at least 12 base-paired ribonucleotides.

45. The engineered nuclease system of any of claims 1-27 or claims 39-44, wherein the endonuclease is configured to bind to a PAM comprising a sequence selected from SEQ ID NO:5516 and SEQ ID NO: 5531.

46. The engineered nuclease system of any of claims 1-27 or claims 39-45, wherein:

a) the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO 2141;

b) the guide RNA structure comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO 5490; and

c) the endonuclease is configured to bind to a PAM comprising SEQ ID NO: 5531.

47. The engineered nuclease system of any of claims 1-27 or claims 39-45, wherein:

a) the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO: 2142;

b) the guide RNA structure comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO 5465 or SEQ ID NO 5491; and

c) the endonuclease is configured to bind to a PAM comprising SEQ ID NO: 5516.

48. The engineered nuclease system of any of claims 1-27, wherein the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NO 2245 and 2246.

49. The engineered nuclease system of any of claims 1-27 or claim 48, wherein the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NO 4059 and 4060.

50. The engineered nuclease system of any of claims 1-27 or claims 48-49, wherein the endonuclease comprises at least 1, at least 2, at least 3, at least 4, or at least 5 peptide motifs selected from the group consisting of SEQ ID NO 5639-5648.

51. The engineered nuclease system of any of claims 1-27 or claims 48-50, wherein the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NO 424-425.

52. The engineered nuclease system of any of claims 1-27 or claims 48-51, wherein the guide RNA structure comprises a sequence that is at least 70%, 80% or 90% identical to a sequence selected from the group consisting of SEQ ID NOs: 5498-5499 and SEQ ID NO: 5539.

53. The engineered nuclease system of any of claims 1-27 or claims 48-52, wherein the guide RNA structure comprises a guide ribonucleic acid sequence predicted to comprise a hairpin with an uninterrupted base-pairing region, the guide ribonucleic acid structure comprising a guide ribonucleic acid sequence of at least 8 nucleotides and a tracr ribonucleic acid sequence of at least 8 nucleotides, and wherein the tracr ribonucleic acid sequence comprises, from 5' to 3', a first hairpin and a second hairpin, wherein the first hairpin has a longer stem than the second hairpin.

54. The engineered nuclease system of any of claims 1-27, wherein the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NO 2242-2244 or SEQ ID NO 2247-2249.

55. The engineered nuclease system of any of claims 1-27 or claim 54, wherein the endonuclease comprises a sequence that is at least 70%, 80% or 90% identical to a sequence selected from the group consisting of SEQ ID NOs: 4056-4058 and SEQ ID NOs: 4061-4063.

56. The engineered nuclease system of any of claims 1-27 or 54-55, wherein the endonuclease comprises at least 1, at least 2, at least 3, at least 4, or at least 5 peptide motifs selected from the group consisting of SEQ ID NO 5639-5648.

57. The engineered nuclease system of any of claims 1-27 or claims 54-56, wherein the endonuclease comprises a sequence that is at least 70%, 80% or 90% identical to a sequence selected from the group consisting of SEQ ID NOs: 421-423 or SEQ ID NOs: 426-428.

58. The engineered nuclease system of any of claims 1-27 or claims 54-57, wherein the guide RNA structure comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from SEQ ID NOs: 5466-5467, SEQ ID NOs: 5495-5497, SEQ ID NOs: 5500-5502, and SEQ ID NO: 5539.

59. The engineered nuclease system of any of claims 1-27 or claims 54-58, wherein the guide RNA structure comprises a guide ribonucleic acid sequence predicted to comprise a hairpin with an uninterrupted base-pairing region, the guide ribonucleic acid structure comprising a guide ribonucleic acid sequence of at least 8 nucleotides and a tracr ribonucleic acid sequence of at least 8 nucleotides, and wherein the tracr ribonucleic acid sequence comprises, from 5' to 3', a first hairpin and a second hairpin, wherein the first hairpin has a longer stem than the second hairpin.

60. The engineered nuclease system of any of claims 1-27 or 54-59, wherein the endonuclease is configured to bind to a PAM comprising a sequence selected from SEQ ID NO 5517-5518 or SEQ ID NO 5532-5534.

61. The engineered nuclease system of any of claims 1-27 or claims 54-60, wherein:

a) the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO 2247;

b) the guide RNA structure comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO 5500; and

c) the endonuclease is configured to bind to a PAM comprising SEQ ID NO:5517 or SEQ ID NO: 5532.

62. The engineered nuclease system of any of claims 1-27 or claims 54-60, wherein:

a) the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO 2248;

b) the guide RNA structure comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO 5501; and

c) the endonuclease is configured to bind to a PAM comprising SEQ ID NO:5518 or SEQ ID NO: 5533.

63. The engineered nuclease system of any of claims 1-27 or claims 54-60, wherein:

a) the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO 2249;

b) the guide RNA structure comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO 5502; and

c) the endonuclease is configured to bind to a PAM comprising SEQ ID NO: 5534.

64. The engineered nuclease system of any of claims 1-27, wherein the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NO 2253 or SEQ ID NO 2253-2481.

65. The engineered nuclease system of any of claims 1-27 or claim 64, wherein the endonuclease comprises a sequence that is at least 70%, 80% or 90% identical to a sequence selected from the group consisting of SEQ ID NO 4067 or SEQ ID NO 4067-4295.

66. The engineered nuclease system of any of claims 1-27 or claims 64-65, wherein the endonuclease comprises a peptide motif according to SEQ ID NO 5649.

67. The engineered nuclease system of any of claims 1-27 or claims 64-66, wherein the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NO 432 or SEQ ID NO 432-660.

68. The engineered nuclease system of any of claims 1-27 or claims 64-67, wherein the guide RNA structure comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from SEQ ID NO 5468 or SEQ ID NO 5503.

69. The engineered nuclease system of any of claims 1-27 or claims 64-68, wherein the endonuclease is configured to bind to a PAM comprising a sequence selected from SEQ ID NO 5519.

70. The engineered nuclease system of any of claims 1-27 or claims 64-69, wherein:

a) the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO 2253;

b) the guide RNA structure comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO 5468 or SEQ ID NO 5503; and

c) the endonuclease is configured to bind to a PAM comprising SEQ ID NO: 5519.

71. The engineered nuclease system of any of claims 1-27, wherein the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NOs: 2482-2489.

72. The engineered nuclease system of any of claims 1-27 or claim 71, wherein the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NO 4296-.

73. The engineered nuclease system of any of claims 1-27 or 71-72, wherein the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NO 661-668.

74. The engineered nuclease system of any of claims 1-27, wherein the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NOs: 2490-2498.

75. The engineered nuclease system of any of claims 1-27 or claim 74, wherein the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NO 4304-4312.

76. The engineered nuclease system of any of claims 1-27 or claims 74-75, wherein the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NOs: 669-677.

77. The engineered nuclease system of any of claims 1-27 or claims 74-76, wherein the guide RNA structure comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from SEQ ID NO 5504.

78. The engineered nuclease system of any of claims 1-27, wherein the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NO: 2499 or SEQ ID NOs: 2499-2750.

79. The engineered nuclease system of any of claims 1-27 or claim 78, wherein the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NO 4313 or SEQ ID NO 4313-4564.

80. The engineered nuclease system of any of claims 1-27 or claims 78-79, wherein the endonuclease comprises at least 1, at least 2, at least 3, at least 4, or at least 5 peptide motifs selected from the group consisting of SEQ ID NO 5650-5667.

81. The engineered nuclease system of any of claims 1-27 or claims 78-80, wherein the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NO 678 or SEQ ID NO 678-929.

82. The engineered nuclease system of any of claims 1-27 or claims 78-81, wherein the guide RNA structure comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO 5469 or SEQ ID NO 5505.

83. The engineered nuclease system of any of claims 1-27 or claims 78-82, wherein the endonuclease is configured to bind to a PAM comprising SEQ ID NO:5520 or SEQ ID NO: 5535.

84. The engineered nuclease system of any of claims 1-27 or claims 78-83, wherein:

a) the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO 2499;

b) the guide RNA structure comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO 5469 or SEQ ID NO 5505; and

c) the endonuclease is configured to bind to a PAM comprising SEQ ID NO:5520 or SEQ ID NO: 5535.

85. The engineered nuclease system of any of claims 1-27, wherein the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NO 2751 or SEQ ID NO 2751-2913.

86. The engineered nuclease system of any of claims 1-27 or claim 85, wherein the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from SEQ ID NO:4565 or SEQ ID NO: 4565-4727.

87. The engineered nuclease system of any of claims 1-27 or claims 85-86, wherein the endonuclease comprises at least 1, at least 2, at least 3, at least 4, or at least 5 peptide motifs selected from the group consisting of SEQ ID NOs: 5668-5678.

88. The engineered nuclease system of any of claims 1-27 or claims 85-87, wherein the endonuclease comprises a sequence that is at least 70%, 80% or 90% identical to a sequence selected from the group consisting of SEQ ID NO 930 and SEQ ID NO 930-1092.

89. The engineered nuclease system of any of claims 1-27 or claims 85-88, wherein the guide RNA structure comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO 5470 or SEQ ID NO 5506.

90. The engineered nuclease system of any of claims 1-27 or claims 85-89, wherein the endonuclease is configured to bind to a PAM comprising a sequence selected from SEQ ID NO:5521 or SEQ ID NO: 5536.

91. The engineered nuclease system of any of claims 1-27 or claims 85-90, wherein:

a) the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO: 2751;

b) the guide RNA structure comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO 5470 or SEQ ID NO 5506; and

c) the endonuclease is configured to bind to a PAM comprising SEQ ID NO:5521 or SEQ ID NO: 5536.

92. The engineered nuclease system of any of claims 1-27, wherein the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NOs: 2914-3174.

93. The engineered nuclease system of any of claims 1-27 or claim 92, wherein the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NO 4728 or SEQ ID NO 4728-4988.

94. The engineered nuclease system of any of claims 1-27 or claims 92-93, wherein the endonuclease comprises at least 1, at least 2, or at least 3 peptide motifs selected from the group consisting of SEQ ID NOs 5676-5678.

95. The engineered nuclease system of any of claims 1-27 or claims 92-94, wherein the endonuclease comprises a sequence that is at least 70%, 80% or 90% identical to a sequence selected from the group consisting of SEQ ID NO 1093 or SEQ ID NO 1093-1353.

96. The engineered nuclease system of any of claims 1-27 or claims 92-95, wherein the guide RNA structure comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from SEQ ID NO: 5471, SEQ ID NO: 5507, and SEQ ID NOs: 5540-5542.

97. The engineered nuclease system of any of claims 1-27 or claims 92-96, wherein the guide RNA structure comprises a tracr ribonucleic acid sequence predicted to comprise at least two hairpins comprising less than 5 base-paired ribonucleotides.

98. The engineered nuclease system of any of claims 1-27 or claims 92-97, wherein the endonuclease is configured to bind to a PAM comprising SEQ ID NO 5522.

99. The engineered nuclease system of any of claims 1-27 or claims 92-98, wherein:

a) the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO 2914;

b) the guide RNA structure comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO 5471 or SEQ ID NO 5507; and

c) the endonuclease is configured to bind to a PAM comprising SEQ ID NO: 5522.

100. The engineered nuclease system of any of claims 1-27, wherein the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NO: 3175 or SEQ ID NOs: 3175-3330.

101. The engineered nuclease system of any of claims 1-27 or claim 100, wherein the endonuclease comprises a sequence that is at least 70%, 80% or 90% identical to a sequence selected from the group consisting of SEQ ID NO: 4989 or SEQ ID NOs: 4989-5146.

102. The engineered nuclease system of any of claims 1-27 or 100-101, wherein the endonuclease comprises at least 1, at least 2, at least 3, at least 4, or at least 5 peptide motifs selected from the group consisting of SEQ ID NOs 5679-5686.

103. The engineered nuclease system of any of claims 1-27 or 100-102, wherein the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NO 1354 and SEQ ID NO 1354-1511.

104. The engineered nuclease system of any of claims 1-27 or 100-103, wherein the guide RNA structure comprises a sequence that is at least 70%, 80% or 90% identical to a sequence selected from SEQ ID NO 5472 or SEQ ID NO 5508.

105. The engineered nuclease system of any of claims 1-27 or 100-104, wherein the endonuclease is configured to bind to a PAM comprising a sequence selected from SEQ ID NO:5523 or SEQ ID NO: 5537.

106. The engineered nuclease system of any of claims 1-27 or claims 100-105, wherein:

a) the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO 3175;

b) the guide RNA structure comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO 5472 or SEQ ID NO 5508; and

c) the endonuclease is configured to bind to a PAM comprising SEQ ID NO:5523 or SEQ ID NO: 5537.

107. The engineered nuclease system of any of claims 1-27, wherein the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NO 3331 or SEQ ID NO 3331-3474.

108. The engineered nuclease system of any of claims 1-27 or claim 107, wherein the endonuclease comprises a sequence that is at least 70%, 80% or 90% identical to a sequence selected from the group consisting of SEQ ID NOs 5147 or 5147-5290.

109. The engineered nuclease system of any of claims 1-27 or claims 107-108, wherein the endonuclease comprises at least 1, at least 2, at least 3, at least 4, or at least 5 peptide motifs selected from the group consisting of SEQ ID NOs 5674-5675 and 5687-5693.

110. The engineered nuclease system of any of claims 1-27 or 107-109, wherein the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from SEQ ID NO 1512 or SEQ ID NO 1512-1655.

111. The engineered nuclease system of any of claims 1-27 or claims 107-110, wherein the guide RNA structure comprises a sequence that is at least 70%, 80% or 90% identical to a sequence selected from SEQ ID NO 5473 or SEQ ID NO 5509.

112. The engineered nuclease system of any of claims 1-27 or 107-111, wherein the endonuclease is configured to bind to a PAM comprising SEQ ID NO 5524.

113. The engineered nuclease system of any of claims 1-27 or claims 107-112, wherein:

a) the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO 3331;

b) the guide RNA structure comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO 5473 or SEQ ID NO 5509; and

c) the endonuclease is configured to bind to a PAM comprising SEQ ID NO: 5524.

114. The engineered nuclease system of any of claims 1-27, wherein the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NO 3475 or SEQ ID NO 3475-3568.

115. The engineered nuclease system of any of claims 1-27 or claim 114, wherein the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NO 5291 or SEQ ID NO 5291-5389.

116. The engineered nuclease system of any of claims 1-27 or 114-115, wherein the endonuclease comprises at least 1, at least 2, at least 3, at least 4, or at least 5 peptide motifs selected from the group consisting of SEQ ID NOs 5694-5699.

117. The engineered nuclease system of any of claims 1-27 or 114-116, wherein the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NO 1656 or SEQ ID NO 1656-1755.

118. The engineered nuclease system of any of claims 1-27 or claims 114-117, wherein the guide RNA structure comprises a sequence that is at least 70%, 80% or 90% identical to SEQ ID NO 5474 or SEQ ID NO 5510.

119. The engineered nuclease system of any of claims 1-27 or 114-118, wherein the endonuclease is configured to bind to a PAM comprising SEQ ID NO 5525.

120. The engineered nuclease system of any one of claims 1-27 or claims 114-119, wherein:

a) the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO 3475;

b) the guide RNA structure comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO 5474 or SEQ ID NO 5510; and

c) the endonuclease is configured to bind to a PAM comprising SEQ ID NO: 5525.

121. The engineered nuclease system of any of claims 1-27, wherein the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NO 3569 or SEQ ID NO 3569-3637.

122. The engineered nuclease system of any of claims 1-27 or claim 121, wherein the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NO 5390 or SEQ ID NO 5390-5460.

123. The engineered nuclease system of any of claims 1-27 or claims 121-122, wherein the endonuclease comprises at least 1, at least 2, at least 3, at least 4, or at least 5 peptide motifs selected from the group consisting of SEQ ID NOs 5700-5717.

124. The engineered nuclease system of any of claims 1-27 or claims 121-123, wherein the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from SEQ ID No. 1756 or SEQ ID No. 1756-1826.

125. The engineered nuclease system of any of claims 1-27 or claims 121-124, wherein the guide RNA structure comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID No. 5475 or SEQ ID No. 5511.

126. The engineered nuclease system of any of claims 1-27 or 121-125, wherein the endonuclease is configured to bind to a PAM comprising SEQ ID NO 5526.

127. The engineered nuclease system of any of claims 1-27 or claims 121-126, wherein:

a) the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO 3569;

b) the guide RNA structure comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO 5475 or SEQ ID NO 5511; and

c) the endonuclease is configured to bind to a PAM comprising SEQ ID NO: 5526.

128. The engineered nuclease system of any one of claims 1-127, wherein the sequence identity is determined by BLASTP, CLUSTALW, MUSCLE, MAFFT, or Smith-Waterman homology search algorithm.

129. The engineered nuclease system of claim 128, wherein the sequence identity is determined by the BLASTP homology search algorithm using parameters of a word length (W) of 3, an expectation (E) of 10, and a BLOSUM62 scoring matrix, with gap costs set at an existence penalty of 11 and an extension penalty of 1, and using a conditional compositional score matrix adjustment.
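By way of illustration only, the following minimal sketch shows how a percent identity could be derived from a Smith-Waterman local alignment. It is an assumption-laden simplification: it uses a flat match/mismatch score and a linear gap penalty rather than the BLOSUM62 matrix and gap costs recited above, and the function name `smith_waterman_identity` and its default scores are hypothetical. Different tools also define the identity denominator differently; here it is the number of aligned columns, gaps included.

```python
def smith_waterman_identity(a: str, b: str, match: int = 2,
                            mismatch: int = -1, gap: int = -2) -> float:
    """Percent identity over the best local alignment of a and b.

    Simplified scoring (assumption): flat match/mismatch, linear gaps.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the dynamic-programming matrix; local alignment floors scores at 0.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until the score drops to 0.
    i, j = best_pos
    identical = columns = 0
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            identical += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        columns += 1

    return 100.0 * identical / columns if columns else 0.0


if __name__ == "__main__":
    # Hypothetical peptide fragments differing at one position.
    print(smith_waterman_identity("ACDEFGHIKL", "ACDEFAHIKL"))
```

A result from such a routine could then be compared against a recited threshold (for example, at least 70%), bearing in mind that BLASTP with the stated parameters may report a different value for the same pair of sequences.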

130. An engineered guide ribonucleic acid polynucleotide comprising:

a) a DNA targeting segment comprising a nucleotide sequence complementary to a target sequence in a target DNA molecule; and

b) a protein binding segment comprising two complementary stretches of nucleotides that hybridize to form a double-stranded RNA (dsRNA) duplex,

wherein the two complementary stretches of nucleotides are covalently linked to each other through intervening nucleotides, and

wherein the engineered guide ribonucleic acid polynucleotide is configured to form a complex with an endonuclease comprising a RuvC_III domain having at least 75% sequence identity to any one of SEQ ID NO: 1827-3637 and to target the complex to the target sequence of the target DNA molecule.

131. The engineered guide ribonucleic acid polynucleotide of claim 130, wherein the DNA targeting segment is located 5' to the two complementary stretches of nucleotides.

132. The engineered guide ribonucleic acid polynucleotide of any one of claims 130-131, wherein:

a) the protein binding segment comprises a sequence having at least 70%, at least 80% or at least 90% identity to a sequence selected from the group consisting of SEQ ID NO 5476-5479 or SEQ ID NO 5476-5489;

b) the protein binding segment comprises a sequence having at least 70%, at least 80% or at least 90% identity to a sequence selected from (SEQ ID NO:5490-5491 or SEQ ID NO:5490-5494) and SEQ ID NO: 5538;

c) the protein binding segment comprises a sequence having at least 70%, at least 80% or at least 90% identity to a sequence selected from the group consisting of SEQ ID NO 5498-5499;

d) the protein binding segment comprises a sequence having at least 70%, at least 80% or at least 90% identity to a sequence selected from the group consisting of SEQ ID NO 5495-5497 and SEQ ID NO 5500-5502;

e) the protein binding segment comprises a sequence having at least 70%, at least 80%, or at least 90% identity to SEQ ID No. 5503;

f) the protein binding segment comprises a sequence having at least 70%, at least 80%, or at least 90% identity to SEQ ID No. 5504;

g) the protein binding segment comprises a sequence having at least 70%, at least 80%, or at least 90% identity to SEQ ID No. 5505;

h) the protein binding segment comprises a sequence having at least 70%, at least 80%, or at least 90% identity to SEQ ID No. 5506;

i) the protein binding segment comprises a sequence having at least 70%, at least 80%, or at least 90% identity to SEQ ID No. 5507;

j) the protein binding segment comprises a sequence having at least 70%, at least 80%, or at least 90% identity to SEQ ID No. 5508;

k) the protein binding segment comprises a sequence having at least 70%, at least 80%, or at least 90% identity to SEQ ID No. 5509;

l) the protein binding segment comprises a sequence having at least 70%, at least 80%, or at least 90% identity to SEQ ID NO 5510; or

m) the protein binding segment comprises a sequence having at least 70%, at least 80%, or at least 90% identity to SEQ ID NO: 5511.

133. The engineered guide ribonucleic acid polynucleotide of any one of claims 130-132, wherein:

a) the guide ribonucleic acid polynucleotide comprises an RNA sequence comprising a hairpin comprising a stem and a loop, wherein the stem comprises at least 10, at least 12, or at least 14 base-paired ribonucleotides and an asymmetric bulge within 4 base pairs of the loop;

b) the guide ribonucleic acid polynucleotide comprises a tracr ribonucleic acid sequence predicted to comprise a hairpin comprising at least 8, at least 10, or at least 12 base-paired ribonucleotides;

c) the guide RNA structure comprises a guide ribonucleic acid sequence predicted to comprise a hairpin with an uninterrupted base-pairing region, the guide ribonucleic acid structure comprising a guide ribonucleic acid sequence of at least 8 nucleotides and a tracr ribonucleic acid sequence of at least 8 nucleotides, and wherein the tracr ribonucleic acid sequence from 5 'to 3' comprises a first hairpin and a second hairpin, the first hairpin having a longer stem than the second hairpin; or

d) the guide ribonucleic acid polynucleotide comprises a tracr ribonucleic acid sequence predicted to comprise at least two hairpins comprising fewer than 5 base-paired ribonucleotides.
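As a non-limiting illustration of how the hairpin features recited above might be checked, the sketch below counts the base pairs in each hairpin stem of a predicted secondary structure written in dot-bracket notation. The dot-bracket input is assumed to come from a separate structure-prediction step that is not shown; the function name `hairpin_stems` and the example structures are hypothetical. Stems are followed outward across unpaired bulges, and the count is in base pairs, so the number of base-paired ribonucleotides would be twice that value.

```python
def hairpin_stems(dot_bracket: str) -> list[int]:
    """Return the number of base pairs in each hairpin stem, 5' to 3'."""
    partner: dict[int, int] = {}
    stack: list[int] = []
    for pos, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            open_pos = stack.pop()
            partner[open_pos] = pos
            partner[pos] = open_pos
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unbalanced structure")

    n = len(dot_bracket)
    stems = []
    for i, ch in enumerate(dot_bracket):
        if ch != "(":
            continue
        j = partner[i]
        # A hairpin loop is closed by a pair with only unpaired bases inside.
        if any(c != "." for c in dot_bracket[i + 1:j]):
            continue
        count, left, right = 1, i - 1, j + 1
        while True:
            # Skip unpaired bases (bulges) on either side of the stem.
            while left >= 0 and dot_bracket[left] == ".":
                left -= 1
            while right < n and dot_bracket[right] == ".":
                right += 1
            if (left >= 0 and right < n
                    and dot_bracket[left] == "(" and partner[left] == right):
                count += 1
                left, right = left - 1, right + 1
            else:
                break
        stems.append(count)
    return stems


if __name__ == "__main__":
    stems = hairpin_stems("((((....))))..((...))")
    print(stems)                 # base pairs per hairpin, 5' to 3'
    print(stems[0] > stems[1])   # first hairpin has the longer stem
```

With this representation, a condition such as "the first hairpin having a longer stem than the second hairpin" reduces to comparing the first two entries of the returned list.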

134. A deoxyribonucleic acid polynucleotide encoding the engineered guide ribonucleic acid polynucleotide of any one of claims 130-133.

135. A nucleic acid comprising an engineered nucleic acid sequence optimized for expression in an organism, wherein the nucleic acid encodes a class 2 type II Cas endonuclease comprising a RuvC III domain and an HNH domain, and wherein the endonuclease is derived from an uncultured microorganism.

136. A nucleic acid comprising an engineered nucleic acid sequence optimized for expression in an organism, wherein said nucleic acid encodes an endonuclease comprising a RuvC III domain having at least 70% sequence identity to any one of SEQ ID NOs 1827-3637.

137. The nucleic acid according to any one of claims 135-136, wherein the endonuclease comprises a HNH domain that has at least 70% or at least 80% sequence identity to any one of SEQ ID NOs 3638-5460.

138. The nucleic acid of any one of claims 135-137, wherein the endonuclease comprises SEQ ID NO 5572-5591 or a variant having at least 70% sequence identity thereto.

139. The nucleic acid of any one of claims 135-138, wherein the endonuclease comprises a sequence encoding one or more Nuclear Localization Sequences (NLS) proximal to the N-terminus or C-terminus of the endonuclease.

140. The nucleic acid of claim 139, wherein the NLS comprises a sequence selected from the group consisting of SEQ ID NO 5597-5612.

141. The nucleic acid of any one of claims 135-140, wherein the organism is a prokaryotic organism, a bacterial organism, a eukaryotic organism, a fungal organism, a plant organism, a mammalian organism, a rodent organism, or a human.

142. The nucleic acid of claim 141, wherein the organism is Escherichia coli, and wherein:

a) the nucleic acid sequence has at least 70%, 80% or 90% identity to a sequence selected from the group consisting of SEQ ID NO 5572-5575;

b) the nucleic acid sequence has at least 70%, 80% or 90% identity to a sequence selected from the group consisting of SEQ ID NO 5576-5577;

c) the nucleic acid sequence has at least 70%, 80% or 90% identity to a sequence selected from SEQ ID NO 5578-5580;

d) the nucleic acid sequence has at least 70%, 80% or 90% identity to SEQ ID NO 5581;

e) the nucleic acid sequence has at least 70%, 80% or 90% identity to SEQ ID NO 5582;

f) the nucleic acid sequence has at least 70%, 80% or 90% identity to SEQ ID NO 5583;

g) the nucleic acid sequence has at least 70%, 80% or 90% identity to SEQ ID NO 5584;

h) the nucleic acid sequence has at least 70%, 80% or 90% identity to SEQ ID NO 5585;

i) the nucleic acid sequence has at least 70%, 80% or 90% identity to SEQ ID NO 5586; or

j) the nucleic acid sequence has at least 70%, 80% or 90% identity to SEQ ID NO 5587.

143. The nucleic acid of claim 141, wherein the organism is a human, and wherein:

a) the nucleic acid sequence has at least 70%, 80% or 90% identity to SEQ ID NO 5588 or SEQ ID NO 5589; or

b) the nucleic acid sequence has at least 70%, 80% or 90% identity to SEQ ID NO 5590 or SEQ ID NO 5591.

144. A vector comprising a nucleic acid sequence encoding a class 2, type II Cas endonuclease comprising a RuvC III domain and an HNH domain, wherein said endonuclease is derived from an uncultured microorganism.

145. A vector comprising the nucleic acid of any one of claims 135-143.

146. The vector of any one of claims 144-145, further comprising a nucleic acid encoding an engineered guide ribonucleic acid structure configured to form a complex with the endonuclease, the complex comprising:

a) a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and

b) a tracr ribonucleic acid sequence configured to bind to the endonuclease.

147. The vector of any one of claims 144-146, wherein the vector is a plasmid, a minicircle, CELiD, an adeno-associated virus (AAV) -derived virion, or a lentivirus.

148. A cell comprising the vector of any one of claims 144-147.

149. A method of making an endonuclease, comprising culturing the cell of claim 146.

150. A method for binding, cleaving, labeling or modifying a double-stranded deoxyribonucleic acid polynucleotide, the method comprising:

(a) contacting the double-stranded deoxyribonucleic acid polynucleotide with a class 2, type II Cas endonuclease complexed with an engineered guide ribonucleic acid structure configured to bind to the endonuclease and the double-stranded deoxyribonucleic acid polynucleotide;

(b) wherein the double-stranded deoxyribonucleic acid polynucleotide comprises a protospacer adjacent motif (PAM); and

(c) wherein the PAM comprises a sequence selected from the group consisting of SEQ ID NO: 5512-5526 or SEQ ID NO: 5527-5537.

151. The method of claim 149, wherein the double-stranded deoxyribonucleic acid polynucleotide comprises a first strand comprising a sequence complementary to the sequence of the engineered guide ribonucleic acid structure and a second strand comprising the PAM.

152. The method of claim 151, wherein said PAM is directly adjacent to the 3' terminus of said sequence complementary to said sequence of said engineered guide ribonucleic acid structure.
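For illustration only, the sketch below scans one strand of a double-stranded DNA for candidate protospacers that are immediately followed, on their 3' side, by a PAM. The PAM sequences of the present disclosure are defined in the sequence listing and are not reproduced here, so the pattern "NGG" used in the example is a hypothetical placeholder, as are the function name `find_protospacers` and the default protospacer length of 20 nucleotides. Degenerate positions are expanded using standard IUPAC nucleotide codes.

```python
import re

# Standard IUPAC nucleotide codes mapped to regular-expression classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_protospacers(strand: str, pam: str,
                      length: int = 20) -> list[tuple[int, str, str]]:
    """Return (start, protospacer, pam) for each PAM found directly 3' of a
    full-length protospacer on the given strand (read 5' to 3')."""
    seq = strand.upper()
    # A lookahead keeps overlapping PAM occurrences from being skipped.
    pattern = re.compile("(?=(" + "".join(IUPAC[b] for b in pam.upper()) + "))")
    hits = []
    for m in pattern.finditer(seq):
        pam_start = m.start()
        if pam_start >= length:
            start = pam_start - length
            hits.append((start, seq[start:pam_start], m.group(1)))
    return hits


if __name__ == "__main__":
    # Hypothetical 25-nt strand: a 20-nt protospacer followed by "TGG".
    strand = "ACGT" * 5 + "TGGAA"
    for start, protospacer, found_pam in find_protospacers(strand, "NGG"):
        print(start, protospacer, found_pam)
```

The guide sequence for a hit would then be chosen to hybridize with the strand complementary to the reported protospacer; searching the opposite strand requires running the same scan on the reverse complement.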

153. The method of any one of claims 149-152, wherein the class 2 type II Cas endonuclease is not a Cas9 endonuclease, a Cas14 endonuclease, a Cas12a endonuclease, a Cas12b endonuclease, a Cas12c endonuclease, a Cas12d endonuclease, a Cas12e endonuclease, a Cas13a endonuclease, a Cas13b endonuclease, a Cas13c endonuclease, or a Cas13d endonuclease.

154. The method of any one of claims 149-153, wherein the class 2 type II Cas endonuclease is derived from an uncultured microorganism.

155. The method of any one of claims 149-154, wherein the double-stranded deoxyribonucleic acid polynucleotide is a eukaryotic, plant, fungal, mammalian, rodent or human double-stranded deoxyribonucleic acid polynucleotide.

156. The method of any one of claims 149-155, wherein:

a) the PAM comprises a sequence selected from the group consisting of SEQ ID NO:5512-5515 and SEQ ID NO: 5527-5530;

b) the PAM comprises SEQ ID NO 5516 or SEQ ID NO 5531;

c) the PAM comprises SEQ ID NO 5539;

d) the PAM comprises SEQ ID NO 5517 or SEQ ID NO 5518;

e) the PAM comprises SEQ ID NO: 5519;

f) the PAM comprises SEQ ID NO: 5520 or SEQ ID NO: 5535;

g) the PAM comprises SEQ ID NO 5521 or SEQ ID NO 5536;

h) the PAM comprises SEQ ID NO 5522;

i) the PAM comprises SEQ ID NO:5523 or SEQ ID NO: 5537;

j) the PAM comprises SEQ ID NO: 5524;

k) the PAM comprises SEQ ID NO 5525; or

l) the PAM comprises SEQ ID NO 5526.

157. A method of modifying a target nucleic acid locus, the method comprising delivering the engineered nuclease system of any of claims 1-129 to the target nucleic acid locus, wherein the endonuclease is configured to form a complex with the engineered guide ribonucleic acid structure, and wherein the complex is configured such that upon binding of the complex to the target nucleic acid locus, the complex modifies the target nucleic acid locus.

158. The method of claim 156, wherein modifying the target nucleic acid locus comprises binding, nicking, cleaving, or labeling the target nucleic acid locus.

159. The method of any one of claims 156-158, wherein the target nucleic acid locus comprises deoxyribonucleic acid (DNA) or ribonucleic acid (RNA).

160. The method of claim 159, wherein the target nucleic acid comprises genomic DNA, viral RNA, or bacterial DNA.

161. The method of any one of claims 156-160, wherein the target nucleic acid locus is in vitro.

162. The method of any one of claims 156-160, wherein the target nucleic acid locus is within a cell.

163. The method of claim 162, wherein the cell is a prokaryotic cell, a bacterial cell, a eukaryotic cell, a fungal cell, a plant cell, an animal cell, a mammalian cell, a rodent cell, a primate cell, or a human cell.

164. The method of any one of claims 162-163, wherein delivering the engineered nuclease system to the target nucleic acid locus comprises delivering the nucleic acid of any one of claims 135-140 or the vector of any one of claims 142-146.

165. The method of any one of claims 162-163, wherein delivering the engineered nuclease system to the target nucleic acid locus comprises delivering nucleic acid comprising an open reading frame encoding the endonuclease.

166. The method of claim 164, wherein said nucleic acid comprises a promoter to which said open reading frame encoding said endonuclease is operably linked.

167. The method of any one of claims 162-163, wherein delivering the engineered nuclease system to the target nucleic acid locus comprises delivering a capped mRNA comprising the open reading frame encoding the endonuclease.

168. The method of any one of claims 162-163, wherein delivering the engineered nuclease system to the target nucleic acid locus comprises delivering a translated polypeptide.

169. The method of any one of claims 162-163, wherein delivering the engineered nuclease system to the target nucleic acid locus comprises delivering deoxyribonucleic acid (DNA) encoding the engineered guide ribonucleic acid structure operably linked to a ribonucleic acid (RNA) pol III promoter.

170. The method of any one of claims 156-169, wherein the endonuclease induces a single strand break or a double strand break at or proximal to the target locus.

Background

Cas enzymes, together with their associated Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) guide ribonucleic acids (RNAs), appear to be a prevalent component of prokaryotic immune systems (found in about 45% of bacteria and about 84% of archaea), protecting such microorganisms against non-self nucleic acids, such as infectious viruses and plasmids, by CRISPR-RNA-guided nucleic acid cleavage. Although the deoxyribonucleic acid (DNA) elements encoding the CRISPR RNA elements may be relatively conserved in structure and length, their CRISPR-associated (Cas) proteins are highly diverse, comprising a wide variety of nucleic acid-interacting domains. Although CRISPR DNA elements were observed as early as 1987, the programmable endonuclease cleavage capability of CRISPR/Cas complexes was not recognized until relatively recently, leading to the use of recombinant CRISPR/Cas systems in a variety of DNA manipulation and gene editing applications.

Sequence listing

This application contains a sequence listing that has been submitted electronically in ASCII format and is incorporated by reference herein in its entirety. The ASCII copy, created on February 13, 2020, is named 55921-703_601_sl.txt and is 23,363,113 bytes in size.

Disclosure of Invention

In some aspects, the present disclosure provides an engineered nuclease system comprising: (a) an endonuclease comprising a RuvC_III domain and an HNH domain, wherein the endonuclease is derived from an uncultured microorganism, wherein the endonuclease is a class 2, type II Cas endonuclease; and (b) an engineered guide ribonucleic acid structure configured to form a complex with the endonuclease, the complex comprising: (i) a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and (ii) a tracr ribonucleic acid sequence configured to bind to the endonuclease. In some embodiments, the RuvC_III domain comprises a sequence having at least 70%, at least 75%, at least 80%, or at least 90% sequence identity to any one of SEQ ID NO: 1827-3637.

In some aspects, the present disclosure provides an engineered nuclease system comprising: (a) an endonuclease comprising a RuvC_III domain having at least 75% sequence identity to any one of SEQ ID NO: 1827-3637; and (b) an engineered guide ribonucleic acid structure configured to form a complex with the endonuclease, the complex comprising: (i) a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and (ii) a tracr ribonucleic acid sequence configured to bind to the endonuclease.

In some aspects, the present disclosure provides an engineered nuclease system comprising: (a) an endonuclease configured to bind to a Protospacer Adjacent Motif (PAM) sequence comprising SEQ ID NO: 5512-5537, wherein the endonuclease is a class 2, type II Cas endonuclease; and (b) an engineered guide ribonucleic acid structure configured to form a complex with the endonuclease, the complex comprising: (i) a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and (ii) a tracr ribonucleic acid sequence configured to bind to the endonuclease.

In some embodiments, the endonuclease is derived from an uncultured microorganism. In some embodiments, the endonuclease is not engineered to bind different PAM sequences. In some embodiments, the endonuclease is not a Cas9 endonuclease, a Cas14 endonuclease, a Cas12a endonuclease, a Cas12b endonuclease, a Cas12c endonuclease, a Cas12d endonuclease, a Cas12e endonuclease, a Cas13a endonuclease, a Cas13b endonuclease, a Cas13c endonuclease, or a Cas13d endonuclease. In some embodiments, the endonuclease has less than 80% identity to a Cas9 endonuclease. In some embodiments, the endonuclease further comprises an HNH domain. In some embodiments, the tracr ribonucleic acid sequence comprises a sequence having at least 80% sequence identity to about 60 to 90 consecutive nucleotides selected from any one of SEQ ID NO:5476-5511 and SEQ ID NO: 5538.

In some aspects, the present disclosure provides an engineered nuclease system comprising: (a) an engineered guide ribonucleic acid structure comprising: (i) a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and (ii) a tracr ribonucleic acid sequence configured to bind to an endonuclease, wherein the tracr ribonucleic acid sequence comprises a sequence having at least 80% sequence identity to about 60 to 90 consecutive nucleotides selected from any one of SEQ ID NO: 5476-5511 and SEQ ID NO: 5538; and (b) a class 2, type II Cas endonuclease configured to bind to the engineered guide ribonucleic acid. In some embodiments, the endonuclease is configured to bind to a Protospacer Adjacent Motif (PAM) sequence selected from the group comprising SEQ ID NO: 5512-5537.

In some embodiments, the engineered guide ribonucleic acid structure comprises at least two ribonucleic acid polynucleotides. In some embodiments, the engineered guide ribonucleic acid structure comprises one ribonucleic acid polynucleotide comprising the guide ribonucleic acid sequence and the tracr ribonucleic acid sequence.

In some embodiments, the guide ribonucleic acid sequence is complementary to a prokaryotic, bacterial, eukaryotic, archaeal, fungal, plant, mammalian or human genomic sequence. In some embodiments, the guide ribonucleic acid sequence is 15 to 24 nucleotides in length. In some embodiments, the endonuclease comprises one or more Nuclear Localization Sequences (NLS) proximal to the N-terminus or C-terminus of the endonuclease. In some embodiments, the NLS comprises a sequence selected from SEQ ID NO 5597-5612.

In some embodiments, the engineered nuclease system further comprises a single-stranded or double-stranded DNA repair template comprising, from 5 'to 3': a first homology arm comprising a sequence of at least 20 nucleotides 5 'to the target deoxyribonucleic acid sequence, a synthetic DNA sequence of at least 10 nucleotides, and a second homology arm comprising a sequence of at least 20 nucleotides 3' to the target sequence. In some embodiments, the first homology arm or the second homology arm comprises a sequence of at least 40, 80, 120, 150, 200, 300, 500, or 1,000 nucleotides.
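The 5'-to-3' arrangement of such a repair template can be sketched as follows. This is a minimal, hypothetical example rather than a prescribed design procedure: the function name `build_repair_template`, its parameters, and the toy sequences are assumptions introduced for illustration. The homology arms are copied from the sequence flanking the target site, and the minimum lengths recited above (at least 20 nucleotides per arm and at least 10 nucleotides of synthetic sequence) are enforced as simple checks.

```python
def build_repair_template(genome: str, target_start: int, target_end: int,
                          insert: str, arm_length: int = 20) -> str:
    """Assemble first homology arm + synthetic insert + second homology arm.

    target_start/target_end are 0-based, end-exclusive coordinates of the
    target sequence within `genome` (hypothetical convention).
    """
    if arm_length < 20:
        raise ValueError("each homology arm must be at least 20 nucleotides")
    if len(insert) < 10:
        raise ValueError("synthetic DNA sequence must be at least 10 nucleotides")
    if not 0 <= target_start <= target_end <= len(genome):
        raise ValueError("target coordinates fall outside the sequence")
    if target_start - arm_length < 0 or target_end + arm_length > len(genome):
        raise ValueError("not enough flanking sequence for the requested arms")

    # First arm: sequence 5' of the target; second arm: sequence 3' of it.
    first_arm = genome[target_start - arm_length:target_start]
    second_arm = genome[target_end:target_end + arm_length]
    return first_arm + insert + second_arm


if __name__ == "__main__":
    # Toy locus: 20 nt of flank, a 5-nt target, and 20 nt of flank.
    locus = "A" * 20 + "CCCCC" + "G" * 20
    print(build_repair_template(locus, 20, 25, "T" * 10))
```

Longer arms, such as the 40 to 1,000 nucleotide lengths mentioned above, would be requested by passing a larger `arm_length`, provided sufficient flanking sequence is available.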

In some embodiments, the system further comprises a source of Mg2+.

In some embodiments, the endonuclease and tracr ribonucleic acid sequences are derived from different bacterial species within the same phylum. In some embodiments, the endonuclease is derived from a bacterium belonging to the genus Dermabacter. In some embodiments, the endonuclease is derived from a bacterium belonging to Phylum Verrucomicrobia, Phylum Candidatus Peregrinibacteria, or Phylum Candidatus Melainabacteria. In some embodiments, the endonuclease is derived from a bacterium that comprises a 16S rRNA gene that is at least 90% identical to any one of SEQ ID NO: 5592-5595.

In some embodiments, the HNH domain comprises a sequence that is at least 70% or at least 80% identical to any one of SEQ ID NO: 3638-5460. In some embodiments, the endonuclease comprises SEQ ID NOs 1-1826 or variants having at least 55% identity thereto. In some embodiments, the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from SEQ ID NO 1827-1830 or SEQ ID NO 1827-2140.

In some embodiments, the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from SEQ ID NO 3638-3641 or SEQ ID NO 3638-3954. In some embodiments, the endonuclease comprises at least 1, at least 2, at least 3, at least 4, or at least 5 peptide motifs selected from the group consisting of SEQ ID NO 5615-5632. In some embodiments, the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from SEQ ID NOS 1-4 or SEQ ID NOS 1-319.

In some embodiments, the guide RNA structure comprises a sequence that is at least 70%, 80% or 90% identical to a sequence selected from the group consisting of SEQ ID NO 5461-5464, SEQ ID NO 5476-5479 or SEQ ID NO 5476-5489. In some embodiments, the guide RNA structure comprises an RNA sequence predicted to comprise a hairpin consisting of a stem and a loop, the stem comprising at least 10, at least 12, or at least 14 base-paired ribonucleotides, and an asymmetric bulge within 4 base pairs of the loop.

In some embodiments, the endonuclease is configured to bind to a PAM comprising a sequence selected from the group consisting of SEQ ID NO:5512-5515 or SEQ ID NO: 5527-5530.

In some embodiments: (a) the endonuclease comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to SEQ ID NO 1827; (b) the guide RNA structure comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to at least one of SEQ ID NO 5461 or SEQ ID NO 5476; and (c) the endonuclease is configured to bind to a PAM comprising SEQ ID NO:5512 or SEQ ID NO: 5527. In some embodiments: (a) the endonuclease comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to SEQ ID NO 1828; (b) the guide RNA structure comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to at least one of SEQ ID NO 5462 or SEQ ID NO 5477; and (c) the endonuclease is configured to bind to a PAM comprising SEQ ID NO:5513 or SEQ ID NO: 5528. In some embodiments: (a) the endonuclease comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to SEQ ID NO 1829; (b) the guide RNA structure comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to at least one of SEQ ID NO 5463 or SEQ ID NO 5478; and (c) the endonuclease is configured to bind to a PAM comprising SEQ ID NO:5514 or SEQ ID NO: 5529. In some embodiments: (a) the endonuclease comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to SEQ ID NO: 1830; (b) the guide RNA structure comprises a sequence that is at least 70%, at least 80%, or at least 90% identical to at least one of SEQ ID NO 5464 or SEQ ID NO 5479; and (c) the endonuclease is configured to bind to a PAM comprising SEQ ID NO:5515 or SEQ ID NO: 5530.

In some embodiments, the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NO: 2141-2142 or SEQ ID NO: 2141-2241. In some embodiments, the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NO 3955-3956 or SEQ ID NO 3955-4055. In some embodiments, the endonuclease comprises at least 1, at least 2, at least 3, at least 4, or at least 5 peptide motifs selected from SEQ ID NO 5632-5638. In some embodiments, the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from SEQ ID NO 320-321 or SEQ ID NO 320-420. In some embodiments, the guide RNA structure comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NO: 5465, SEQ ID NO: 5490-5491, or SEQ ID NO: 5490-5494. In some embodiments, the guide RNA structure comprises a tracr ribonucleic acid sequence comprising a hairpin comprising at least 8, at least 10, or at least 12 base-paired ribonucleotides. In some embodiments, the endonuclease is configured to bind to a PAM comprising a sequence selected from the group consisting of SEQ ID NO: 5516 and SEQ ID NO: 5531. In some embodiments: (a) the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO 2141; (b) the guide RNA structure comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO 5490; and (c) the endonuclease is configured to bind to a PAM comprising SEQ ID NO: 5531. In some embodiments: (a) the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO: 2142; (b) the guide RNA structure comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO 5465 or SEQ ID NO 5491; and (c) the endonuclease is configured to bind to a PAM comprising SEQ ID NO: 5516.

In some embodiments, the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NO 2245-2246. In some embodiments, the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NO 4059-4060. In some embodiments, the endonuclease comprises at least 1, at least 2, at least 3, at least 4, or at least 5 peptide motifs selected from SEQ ID NO 5639-5648. In some embodiments, the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NO 424-425. In some embodiments, the guide RNA structure comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NO 5498-5499 and SEQ ID NO 5539. In some embodiments, a guide RNA structure comprises a guide ribonucleic acid sequence predicted to comprise a hairpin with an uninterrupted base-pairing region, the guide ribonucleic acid structure comprising a guide ribonucleic acid sequence of at least 8 nucleotides and a tracr ribonucleic acid sequence of at least 8 nucleotides, and wherein the tracr ribonucleic acid sequence comprises, from 5 'to 3', a first hairpin and a second hairpin, wherein the first hairpin has a longer stem than the second hairpin.

In some embodiments, the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NO 2242-2244 or SEQ ID NO 2247-2249. In some embodiments, the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NO 4056-4058 and SEQ ID NO 4061-4063. In some embodiments, the endonuclease comprises at least 1, at least 2, at least 3, at least 4, or at least 5 peptide motifs selected from SEQ ID NO 5639-5648. In some embodiments, the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from SEQ ID NO 421-423 or SEQ ID NO 426-428. In some embodiments, the guide RNA structure comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NO: 5466-5467, SEQ ID NO: 5495-5497, SEQ ID NO: 5500-5502, and SEQ ID NO: 5539. In some embodiments, a guide RNA structure comprises a guide ribonucleic acid sequence predicted to comprise a hairpin with an uninterrupted base-pairing region, the guide ribonucleic acid structure comprising a guide ribonucleic acid sequence of at least 8 nucleotides and a tracr ribonucleic acid sequence of at least 8 nucleotides, and wherein the tracr ribonucleic acid sequence comprises, from 5' to 3', a first hairpin and a second hairpin, wherein the first hairpin has a longer stem than the second hairpin. In some embodiments, the endonuclease is configured to bind to a PAM comprising a sequence selected from the group consisting of SEQ ID NO: 5517-5518 or SEQ ID NO: 5532-5534.
In some embodiments: (a) the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO 2247; (b) the guide RNA structure comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO 5500; and (c) the endonuclease is configured to bind to a PAM comprising SEQ ID NO:5517 or SEQ ID NO: 5532. In some embodiments: (a) the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO 2248; (b) the guide RNA structure comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO 5501; and (c) the endonuclease is configured to bind to a PAM comprising SEQ ID NO:5518 or SEQ ID NO: 5533. In some embodiments: (a) the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO 2249; (b) the guide RNA structure comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO 5502; and (c) the endonuclease is configured to bind to a PAM comprising SEQ ID NO: 5534.

In some embodiments, the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from SEQ ID NO 2253 or SEQ ID NO 2253-2481. In some embodiments, the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NO 4067 or SEQ ID NO 4067-4295. In some embodiments, the endonuclease comprises a peptide motif according to SEQ ID NO: 5649. In some embodiments, the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from SEQ ID NO 432 or SEQ ID NO 432-660. In some embodiments, the guide RNA structure comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from SEQ ID NO 5468 or SEQ ID NO 5503. In some embodiments, the endonuclease is configured to bind to a PAM comprising a sequence selected from SEQ ID NO: 5519. In some embodiments: (a) the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO 2253; (b) the guide RNA structure comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO 5468 or SEQ ID NO 5503; and (c) the endonuclease is configured to bind to a PAM comprising SEQ ID NO: 5519.

In some embodiments, the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NO 2482-2489. In some embodiments, the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NO 4296-4303. In some embodiments, the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NO 661-668. In some embodiments, the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NO 2490-2498. In some embodiments, the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NO 4304-4312. In some embodiments, the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NO 669-677. In some embodiments, the guide RNA structure comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from SEQ ID NO 5504.

In some embodiments, the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from SEQ ID NO:2499 or SEQ ID NO: 2499-2750. In some embodiments, the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NO 4313 or SEQ ID NO 4313-4564. In some embodiments, the endonuclease comprises at least 1, at least 2, at least 3, at least 4, or at least 5 peptide motifs selected from the group consisting of SEQ ID NO 5650-5667. In some embodiments, the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from SEQ ID NO 678 or SEQ ID NO 678-929. In some embodiments, the guide RNA structure comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO 5469 or SEQ ID NO 5505. In some embodiments, the endonuclease is configured to bind to a PAM comprising SEQ ID NO:5520 or SEQ ID NO: 5535. In some embodiments: (a) the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO 2499; (b) the guide RNA structure comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO 5469 or SEQ ID NO 5505; and (c) the endonuclease is configured to bind to a PAM comprising SEQ ID NO:5520 or SEQ ID NO: 5535.

In some embodiments, the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from SEQ ID NO:2751 or SEQ ID NO: 2751-2913. In some embodiments, the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NO:4565 or SEQ ID NO: 4565-4727. In some embodiments, the endonuclease comprises at least 1, at least 2, at least 3, at least 4, or at least 5 peptide motifs selected from the group consisting of SEQ ID NO 5668-5678. In some embodiments, the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from SEQ ID NO 930 or SEQ ID NO 930-1092. In some embodiments, the guide RNA structure comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO 5470 or SEQ ID NO 5506. In some embodiments, the endonuclease is configured to bind to a PAM comprising a sequence selected from SEQ ID NO:5521 or SEQ ID NO: 5536. In some embodiments: (a) the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO: 2751; (b) the guide RNA structure comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO 5470 or SEQ ID NO 5506; and (c) the endonuclease is configured to bind to a PAM comprising SEQ ID NO:5521 or SEQ ID NO: 5536.

In some embodiments, the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NO 2914 and SEQ ID NO 2914-3174. In some embodiments, the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NO 4728 or SEQ ID NO 4728-4988. In some embodiments, the endonuclease comprises at least 1, at least 2, or at least 3 peptide motifs selected from the group consisting of SEQ ID NO 5676-5678. In some embodiments, the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NO 1093 or SEQ ID NO 1093-1353. In some embodiments, the guide RNA structure comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NO: 5471, SEQ ID NO: 5507, and SEQ ID NO: 5540-5542. In some embodiments, the guide RNA structure comprises a tracr ribonucleic acid sequence predicted to comprise at least two hairpins comprising fewer than 5 base-paired ribonucleotides. In some embodiments, the endonuclease is configured to bind to a PAM comprising SEQ ID NO: 5522. In some embodiments: (a) the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO 2914; (b) the guide RNA structure comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO 5471 or SEQ ID NO 5507; and (c) the endonuclease is configured to bind to a PAM comprising SEQ ID NO: 5522.

In some embodiments, the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NO 3175 or SEQ ID NO 3175-3330. In some embodiments, the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NO 4989 or SEQ ID NO 4989-5146. In some embodiments, the endonuclease comprises at least 1, at least 2, at least 3, at least 4, or at least 5 peptide motifs selected from the group consisting of SEQ ID NO 5679-5686. In some embodiments, the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NO 1354 and SEQ ID NO 1354-1511. In some embodiments, the guide RNA structure comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from SEQ ID NO 5472 or SEQ ID NO 5508. In some embodiments, the endonuclease is configured to bind to a PAM comprising a sequence selected from SEQ ID NO:5523 or SEQ ID NO: 5537. In some embodiments: (a) the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO 3175; (b) the guide RNA structure comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO 5472 or SEQ ID NO 5508; and (c) the endonuclease is configured to bind to a PAM comprising SEQ ID NO:5523 or SEQ ID NO: 5537.

In some embodiments, the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NO 3331 or SEQ ID NO 3331-3474. In some embodiments, the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NO:5147 or SEQ ID NO: 5147-5290. In some embodiments, the endonuclease comprises at least 1, at least 2, at least 3, at least 4, or at least 5 peptide motifs selected from the group consisting of SEQ ID NO:5674-5675 and SEQ ID NO: 5687-5693. In some embodiments, the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from SEQ ID NO:1512 or SEQ ID NO: 1512-1655. In some embodiments, the guide RNA structure comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from SEQ ID NO:5473 or SEQ ID NO: 5509. In some embodiments, the endonuclease is configured to bind to a PAM comprising SEQ ID NO 5524. In some embodiments: (a) the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO 3331; (b) the guide RNA structure comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO 5473 or SEQ ID NO 5509; and (c) the endonuclease is configured to bind to a PAM comprising SEQ ID NO: 5524.

In some embodiments, the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from SEQ ID NO 3475 or SEQ ID NO 3475-3568. In some embodiments, the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from SEQ ID NO 5291 or SEQ ID NO 5291-5389. In some embodiments, the endonuclease comprises at least 1, at least 2, at least 3, at least 4, or at least 5 peptide motifs selected from the group consisting of SEQ ID NO 5694-5699. In some embodiments, the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from the group consisting of SEQ ID NO 1656 or SEQ ID NO 1656-1755. In some embodiments, the guide RNA structure comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO 5474 or SEQ ID NO 5510. In some embodiments, the endonuclease is configured to bind to a PAM comprising SEQ ID NO 5525. In some embodiments: (a) the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO 3475; (b) the guide RNA structure comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO 5474 or SEQ ID NO 5510; and (c) the endonuclease is configured to bind to a PAM comprising SEQ ID NO: 5525.

In some embodiments, the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from SEQ ID NO 3569 or SEQ ID NO 3569-3637. In some embodiments, the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from SEQ ID NO:5390 or SEQ ID NO: 5390-5460. In some embodiments, the endonuclease comprises at least 1, at least 2, at least 3, at least 4, or at least 5 peptide motifs selected from the group consisting of SEQ ID NO 5700-5717. In some embodiments, the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to a sequence selected from SEQ ID NO 1756 or SEQ ID NO 1756-1826. In some embodiments, the guide RNA structure comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO 5475 or SEQ ID NO 5511. In some embodiments, the endonuclease is configured to bind to a PAM comprising SEQ ID NO 5526. In some embodiments: (a) the endonuclease comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO 3569; (b) the guide RNA structure comprises a sequence that is at least 70%, 80%, or 90% identical to SEQ ID NO 5475 or SEQ ID NO 5511; and (c) the endonuclease is configured to bind to a PAM comprising SEQ ID NO: 5526. In some embodiments, sequence identity is determined by a BLASTP, CLUSTALW, MUSCLE, MAFFT, or Smith-Waterman homology search algorithm. In some embodiments, sequence identity is determined by the BLASTP homology search algorithm using parameters of a word length (W) of 3, an expectation (E) of 10, and a BLOSUM62 scoring matrix, with gap costs set at 11 for existence and 1 for extension, and using conditional compositional score matrix adjustment.
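As a rough illustration of the BLASTP parameters named above, the following Python sketch builds a command line for the NCBI BLAST+ `blastp` program and computes percent identity from a pair of already aligned sequences. The helper names `blastp_args` and `percent_identity`, the file names, and the choice to count identity over all aligned columns are assumptions made for this example; the disclosure does not prescribe a particular script or identity denominator.

```python
def blastp_args(query_fasta, subject_fasta):
    """Return a blastp argument list (hypothetical helper; paths are placeholders)."""
    return [
        "blastp",
        "-query", query_fasta,
        "-subject", subject_fasta,
        "-word_size", "3",         # word length (W) of 3
        "-evalue", "10",           # expectation (E) of 10
        "-matrix", "BLOSUM62",     # BLOSUM62 scoring matrix
        "-gapopen", "11",          # gap existence cost of 11
        "-gapextend", "1",         # gap extension cost of 1
        "-comp_based_stats", "2",  # conditional compositional score matrix adjustment
        "-outfmt", "6 qseqid sseqid pident length",
    ]


def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns; '-' marks a gap.

    Assumption: identical residues are divided by the number of aligned
    columns, gaps included. Other denominators are also in common use.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where both sequences have a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)
```

The argument list could be passed to `subprocess.run` on a machine where BLAST+ is installed; the `pident` column of the tabular output would then be compared against the 70%, 80%, or 90% thresholds.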

In some aspects, the present disclosure provides an engineered guide ribonucleic acid polynucleotide comprising: (a) a DNA targeting segment comprising a nucleotide sequence complementary to a target sequence in a target DNA molecule; and (b) a protein-binding segment comprising two complementary stretches of nucleotides that hybridize to form a double-stranded RNA (dsRNA) duplex, wherein the two complementary stretches of nucleotides are covalently linked to each other by intervening nucleotides, and wherein the engineered guide ribonucleic acid polynucleotide is configured to form a complex with an endonuclease comprising a RuvC_III domain having at least 75% sequence identity to any one of SEQ ID NO:1827-3637 and to target the complex to the target sequence of the target DNA molecule. In some embodiments, the DNA targeting segment is located 5' to the two complementary stretches of nucleotides.

In some embodiments: (a) the protein binding segment comprises a sequence having at least 70%, at least 80% or at least 90% identity to a sequence selected from the group consisting of SEQ ID NO 5476-5479 or SEQ ID NO 5476-5489; (b) the protein binding segment comprises a sequence having at least 70%, at least 80% or at least 90% identity to a sequence selected from (SEQ ID NO:5490-5491 or SEQ ID NO:5490-5494) and SEQ ID NO: 5538; (c) the protein binding segment comprises a sequence having at least 70%, at least 80% or at least 90% identity to a sequence selected from the group consisting of SEQ ID NO 5498-5499; (d) the protein binding segment comprises a sequence having at least 70%, at least 80% or at least 90% identity to a sequence selected from the group consisting of SEQ ID NO 5495-5497 and SEQ ID NO 5500-5502; (e) the protein binding segment comprises a sequence having at least 70%, at least 80%, or at least 90% identity to SEQ ID NO: 5503; (f) the protein binding segment comprises a sequence having at least 70%, at least 80%, or at least 90% identity to SEQ ID NO: 5504; (g) the protein binding segment comprises a sequence having at least 70%, at least 80%, or at least 90% identity to SEQ ID NO: 5505; (h) the protein binding segment comprises a sequence having at least 70%, at least 80%, or at least 90% identity to SEQ ID NO: 5506; (i) the protein binding segment comprises a sequence having at least 70%, at least 80%, or at least 90% identity to SEQ ID NO: 5507; (j) the protein binding segment comprises a sequence having at least 70%, at least 80%, or at least 90% identity to SEQ ID NO: 5508; (k) the protein binding segment comprises a sequence having at least 70%, at least 80%, or at least 90% identity to SEQ ID NO: 5509; (l) the protein binding segment comprises a sequence having at least 70%, at least 80%, or at least 90% identity to SEQ ID NO: 5510; or (m) the protein binding segment comprises a sequence having at least 70%, at least 80%, or at least 90% identity to SEQ ID NO: 5511.

In some embodiments: (a) the guide ribonucleic acid polynucleotide comprises an RNA sequence comprising a hairpin comprising a stem and a loop, wherein the stem comprises at least 10, at least 12, or at least 14 base-paired ribonucleotides and an asymmetric bulge within 4 base pairs of the loop; (b) the guide ribonucleic acid polynucleotide comprises a tracr ribonucleic acid sequence predicted to comprise a hairpin comprising at least 8, at least 10, or at least 12 base-paired ribonucleotides; (c) the guide RNA structure comprises a guide ribonucleic acid sequence predicted to comprise a hairpin with an uninterrupted base-pairing region, the guide ribonucleic acid structure comprising a guide ribonucleic acid sequence of at least 8 nucleotides and a tracr ribonucleic acid sequence of at least 8 nucleotides, and wherein the tracr ribonucleic acid sequence comprises, from 5' to 3', a first hairpin and a second hairpin, the first hairpin having a longer stem than the second hairpin; or (d) the guide ribonucleic acid polynucleotide comprises a tracr ribonucleic acid sequence that is predicted to comprise at least two hairpins comprising fewer than 5 base-paired ribonucleotides.
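The hairpin criteria above can be checked against a predicted secondary structure. The sketch below assumes the structure is supplied in dot-bracket notation (as produced by common RNA folding programs), and the function names are hypothetical. It counts the base pairs in each hairpin stem, walking outward across bulges and internal loops, which is one simplifying reading of "stem" and of "base-paired ribonucleotides"; it does not test for an asymmetric bulge.

```python
def pair_table(dot_bracket):
    """Map each paired position to its partner in a dot-bracket string."""
    stack, pairs = [], {}
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            j = stack.pop()
            pairs[i], pairs[j] = j, i
        elif ch != ".":
            raise ValueError("unexpected character: " + ch)
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def hairpin_stem_lengths(dot_bracket):
    """Return base-pair counts per hairpin stem, ordered 5' to 3'."""
    pairs = pair_table(dot_bracket)
    n = len(dot_bracket)
    lengths = []
    for i, j in sorted(pairs.items()):
        # A hairpin loop is closed by a pair with no paired bases inside it.
        if i < j and all(k not in pairs for k in range(i + 1, j)):
            count, a, b = 1, i - 1, j + 1
            while True:
                # Skip unpaired bases (bulges, internal loops) on both sides.
                while a >= 0 and dot_bracket[a] == ".":
                    a -= 1
                while b < n and dot_bracket[b] == ".":
                    b += 1
                if a >= 0 and b < n and pairs.get(a) == b:
                    count, a, b = count + 1, a - 1, b + 1
                else:
                    break  # reached the end of the stem or a branch point
            lengths.append(count)
    return lengths


def first_hairpin_has_longer_stem(tracr_structure):
    """True if the first (5') hairpin stem is longer than the second."""
    lengths = hairpin_stem_lengths(tracr_structure)
    return len(lengths) >= 2 and lengths[0] > lengths[1]


def has_two_short_hairpins(tracr_structure, limit=5):
    """True if at least two hairpins have fewer than `limit` base pairs."""
    return sum(1 for x in hairpin_stem_lengths(tracr_structure) if x < limit) >= 2
```

For example, the made-up structure `((((....))))..((...))` yields stems of 4 and 2 base pairs, so the first hairpin has the longer stem.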

In some aspects, the present disclosure provides a deoxyribonucleic acid polynucleotide encoding any of the engineered guide ribonucleic acid polynucleotides described herein.

In some aspects, the disclosure provides a nucleic acid comprising an engineered nucleic acid sequence optimized for expression in an organism, wherein the nucleic acid encodes a class 2 type II Cas endonuclease comprising a RuvC III domain and an HNH domain, and wherein the endonuclease is derived from an uncultured microorganism.

In some aspects, the present disclosure provides a nucleic acid comprising an engineered nucleic acid sequence optimized for expression in an organism, wherein the nucleic acid encodes an endonuclease comprising a RuvC III domain having at least 70% sequence identity to any one of SEQ ID NOs 1827-3637. In some embodiments, the endonuclease comprises an HNH domain that has at least 70% or at least 80% sequence identity to any one of SEQ ID NO 3638-5460. In some embodiments, the endonuclease comprises SEQ ID NO 5572-5591 or a variant having at least 70% sequence identity thereto. In some embodiments, the endonuclease comprises a sequence encoding one or more Nuclear Localization Sequences (NLS) proximal to the N-terminus or C-terminus of the endonuclease. In some embodiments, the NLS comprises a sequence selected from SEQ ID NO 5597-5612.

In some embodiments, the organism is a prokaryotic organism, a bacterial organism, a eukaryotic organism, a fungal organism, a plant organism, a mammalian organism, a rodent organism, or a human. In some embodiments, the organism is Escherichia coli, and: (a) the nucleic acid sequence has at least 70%, 80% or 90% identity to a sequence selected from the group consisting of SEQ ID NO 5572-5575; (b) the nucleic acid sequence has at least 70%, 80% or 90% identity to a sequence selected from the group consisting of SEQ ID NO 5576-5577; (c) the nucleic acid sequence has at least 70%, 80% or 90% identity to a sequence selected from SEQ ID NO 5578-5580; (d) the nucleic acid sequence has at least 70%, 80% or 90% identity to SEQ ID NO 5581; (e) the nucleic acid sequence has at least 70%, 80% or 90% identity to SEQ ID NO 5582; (f) the nucleic acid sequence has at least 70%, 80% or 90% identity to SEQ ID NO 5583; (g) the nucleic acid sequence has at least 70%, 80% or 90% identity to SEQ ID NO 5584; (h) the nucleic acid sequence has at least 70%, 80% or 90% identity to SEQ ID NO 5585; (i) the nucleic acid sequence has at least 70%, 80% or 90% identity to SEQ ID NO 5586; or (j) the nucleic acid sequence has at least 70%, 80% or 90% identity to SEQ ID NO: 5587. In some embodiments, the organism is a human, and: (a) the nucleic acid sequence has at least 70%, 80% or 90% identity to SEQ ID NO 5588 or SEQ ID NO 5589; or (b) the nucleic acid sequence has at least 70%, 80% or 90% identity to SEQ ID NO:5590 or SEQ ID NO: 5591.

In some aspects, the present disclosure provides a vector comprising a nucleic acid sequence encoding a class 2 type II Cas endonuclease comprising a RuvC III domain and an HNH domain, wherein the endonuclease is derived from an uncultured microorganism.

In some aspects, the present disclosure provides a vector comprising any of the nucleic acids described herein. In some embodiments, the vector further comprises a nucleic acid encoding an engineered guide ribonucleic acid structure configured to form a complex with the endonuclease, the complex comprising: a) a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and b) a tracr ribonucleic acid sequence configured to bind to the endonuclease. In some embodiments, the vector is a plasmid, a minicircle, CELiD, an adeno-associated virus (AAV) -derived virion, or a lentivirus.

In some aspects, the present disclosure provides a cell comprising any vector as described herein.

In some aspects, the present disclosure provides a method of making an endonuclease comprising culturing any of the cells as described herein.

In some aspects, the present disclosure provides a method for binding, cleaving, labeling or modifying a double-stranded deoxyribonucleic acid polynucleotide, the method comprising: (a) contacting the double-stranded deoxyribonucleic acid polynucleotide with a class 2 type II Cas endonuclease complexed with an engineered guide ribonucleic acid structure configured to bind to the endonuclease and the double-stranded deoxyribonucleic acid polynucleotide; (b) wherein the double-stranded deoxyribonucleic acid polynucleotide comprises a protospacer adjacent motif (PAM); and (c) wherein the PAM comprises a sequence selected from the group consisting of SEQ ID NO:5512-5526 or SEQ ID NO: 5527-5537. In some embodiments, the double-stranded deoxyribonucleic acid polynucleotide comprises a first strand comprising a sequence complementary to the sequence of the engineered guide ribonucleic acid structure and a second strand comprising the PAM. In some embodiments, the PAM is directly adjacent to the 3' terminus of the sequence that is complementary to the sequence of the engineered guide ribonucleic acid structure.

In some embodiments, the class 2 type II Cas endonuclease is not a Cas9 endonuclease, a Cas14 endonuclease, a Cas12a endonuclease, a Cas12b endonuclease, a Cas12c endonuclease, a Cas12d endonuclease, a Cas12e endonuclease, a Cas13a endonuclease, a Cas13b endonuclease, a Cas13c endonuclease, or a Cas13d endonuclease. In some embodiments, the class 2 type II Cas endonuclease is derived from an uncultured microorganism. In some embodiments, the double-stranded deoxyribonucleic acid polynucleotide is a eukaryotic, plant, fungal, mammalian, rodent, or human double-stranded deoxyribonucleic acid polynucleotide.

In some embodiments: (a) the PAM comprises a sequence selected from the group consisting of SEQ ID NO:5512-5515 and SEQ ID NO: 5527-5530; (b) the PAM comprises SEQ ID NO 5516 or SEQ ID NO 5531; (c) the PAM comprises SEQ ID NO 5539; (d) the PAM comprises SEQ ID NO 5517 or SEQ ID NO 5518; (e) the PAM comprises SEQ ID NO: 5519; (f) the PAM comprises SEQ ID NO 5520 or SEQ ID NO 5535; (g) the PAM comprises SEQ ID NO 5521 or SEQ ID NO 5536; (h) the PAM comprises SEQ ID NO 5522; (i) the PAM comprises SEQ ID NO:5523 or SEQ ID NO: 5537; (j) the PAM comprises SEQ ID NO: 5524; (k) the PAM comprises SEQ ID NO 5525; or (l) the PAM comprises SEQ ID NO: 5526.
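To make the PAM adjacency concrete, the following Python sketch scans one DNA strand for candidate target sites in which a PAM lies directly 3' of the protospacer. The PAM pattern `NGG` in the usage note is purely hypothetical and is not one of the PAM sequences of SEQ ID NO:5512-5537, which are not reproduced in this section; the function names and the default spacer length of 20 are likewise assumptions for illustration.

```python
# IUPAC nucleotide codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches_pam(site, pam):
    """True if a DNA site matches a PAM pattern written in IUPAC codes."""
    return len(site) == len(pam) and all(
        base in IUPAC[code] for base, code in zip(site, pam)
    )


def find_target_sites(dna, pam, spacer_len=20):
    """Return (start, protospacer) pairs with the PAM directly 3' of the protospacer.

    Only the strand given is scanned; the reverse complement would need a
    second pass.
    """
    dna = dna.upper()
    pam = pam.upper()
    hits = []
    for start in range(len(dna) - spacer_len - len(pam) + 1):
        pam_start = start + spacer_len
        if matches_pam(dna[pam_start:pam_start + len(pam)], pam):
            hits.append((start, dna[start:pam_start]))
    return hits
```

Calling `find_target_sites("ACGTACGTACTGG", "NGG", spacer_len=10)` returns the single site starting at position 0, because the last three bases `TGG` match the placeholder pattern.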

In some aspects, the disclosure provides a method of modifying a target nucleic acid locus, the method comprising delivering any of the engineered nuclease systems described herein to the target nucleic acid locus, wherein the endonuclease is configured to form a complex with the engineered guide ribonucleic acid structure, and wherein the complex is configured such that upon binding of the complex to the target nucleic acid locus, the complex modifies the target nucleic acid locus. In some embodiments, modifying the target nucleic acid locus comprises binding, nicking, cleaving, or labeling the target nucleic acid locus. In some embodiments, the target nucleic acid locus comprises deoxyribonucleic acid (DNA) or ribonucleic acid (RNA). In some embodiments, the target nucleic acid comprises genomic DNA, viral RNA, or bacterial DNA. In some embodiments, the target nucleic acid locus is in vitro. In some embodiments, the target nucleic acid locus is within a cell. In some embodiments, the cell is a prokaryotic cell, a bacterial cell, a eukaryotic cell, a fungal cell, a plant cell, an animal cell, a mammalian cell, a rodent cell, a primate cell, or a human cell.

In some embodiments, delivering the engineered nuclease system to the target nucleic acid locus comprises delivering the nucleic acid of any one of claims 135-140 or the vector of any one of claims 142-146. In some embodiments, delivering the engineered nuclease system to the target nucleic acid locus comprises delivering a nucleic acid comprising an open reading frame encoding the endonuclease. In some embodiments, the nucleic acid comprises a promoter to which the open reading frame encoding the endonuclease is operably linked. In some embodiments, delivering the engineered nuclease system to the target nucleic acid locus comprises delivering a capped mRNA comprising the open reading frame encoding the endonuclease. In some embodiments, delivering the engineered nuclease system to the target nucleic acid locus comprises delivering a translated polypeptide. In some embodiments, delivering the engineered nuclease system to the target nucleic acid locus comprises delivering deoxyribonucleic acid (DNA) encoding the engineered guide ribonucleic acid structure operably linked to a ribonucleic acid (RNA) pol III promoter. In some embodiments, the endonuclease induces a single strand break or a double strand break at or proximal to the target locus.

Other aspects and advantages of the present disclosure will become apparent to those skilled in the art from the following detailed description, wherein only illustrative embodiments of the present disclosure are shown and described. As will be realized, the disclosure is capable of other and different embodiments, and its several details are capable of modifications in various obvious respects, all without departing from the disclosure. Accordingly, the drawings and description are to be regarded as illustrative in nature, and not as restrictive.

Incorporation by reference

All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference.

Brief description of the drawings

The novel features of the invention are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings (also "Figure", "Fig." and "FIG." herein), of which:

Fig. 1 depicts typical structures of different classes and types of CRISPR/Cas loci.

Fig. 2 depicts the architecture of a native class 2/type II crRNA/tracrRNA pair, compared to a hybrid sgRNA in which both are linked.

Fig. 3 depicts a schematic showing the structure of a CRISPR locus encoding an enzyme from the MG1 family.

Fig. 4 depicts a schematic showing the structure of a CRISPR locus encoding an enzyme from the MG2 family.

Fig. 5 depicts a schematic showing the structure of a CRISPR locus encoding an enzyme from the MG3 family.

FIG. 6 depicts a structure-based alignment of an enzyme of the present disclosure (MG1-1) with Cas9 (SEQ ID NO:5613) from Staphylococcus aureus.

FIG. 7 depicts a structure-based alignment of an enzyme of the present disclosure (MG2-1) with Cas9 (SEQ ID NO:5613) from Staphylococcus aureus.

FIG. 8 depicts a structure-based alignment of an enzyme of the present disclosure (MG3-1) with Cas9 (SEQ ID NO:5614) from Actinomyces naeslundii.

FIGS. 9A, 9B, 9C, 9D, 9E, 9F, 9G and 9H depict structure-based alignments of MG1 family enzymes MG1-1 to MG1-6 (SEQ ID NOS: 5, 6, 9, 1, 2 and 3).

Fig. 10 depicts in vitro cleavage of DNA by MG1-4 and the corresponding sgRNA complexes containing targeting sequences of different lengths.

Fig. 11 depicts in-cell cleavage of E. coli genomic DNA using MG1-4 with its corresponding sgRNA. Shown are dilution series of cells transformed with MG1-4 and a target or non-target spacer (top); the bottom panel shows the quantitative data, with the left bar representing non-target sgRNAs and the right bar representing target sgRNAs.

Fig. 12 depicts in-cell indel formation generated by transfecting HEK cells with the MG1-4 or MG1-6 constructs described in Example 11 along with their corresponding sgRNAs containing various targeting sequences that target different locations in the human genome.

Fig. 13 depicts in vitro cleavage of DNA by MG3-6 and the corresponding sgRNA complexes containing targeting sequences of different lengths.

Fig. 14 depicts in-cell cleavage of E. coli genomic DNA using MG3-7 with its corresponding sgRNA. Shown are dilution series of cells transformed with MG3-7 and a target or non-target spacer (top); the bottom panel shows the quantitative data, with the left bar representing non-target sgRNAs and the right bar representing target sgRNAs.

Fig. 15 depicts in-cell indel formation generated by transfecting HEK cells with the MG3-7 construct described in Example 13 along with its corresponding sgRNA containing various targeting sequences that target different locations in the human genome.

Fig. 16 depicts in vitro cleavage of DNA by complexes of MG15-1 and its corresponding sgRNAs containing targeting sequences of varying lengths.

Fig. 17, Fig. 18, Fig. 19 and Fig. 20 depict agarose gels showing the results of PAM vector library cleavage in the presence of TXTL extracts containing various MG family nucleases and their corresponding tracrRNAs or sgRNAs.

Fig. 21, 22, 23, 24, 25, and 26 depict predicted structures of corresponding sgRNAs of MG enzymes described herein (e.g., as predicted in Example 7).

Fig. 27, 28, 29, 30, 31, 32 and 33 depict seqLogo representations of PAM sequences derived by NGS as described herein (e.g., as described in Example 6).

Fig. 34 depicts in-cell cleavage of E. coli genomic DNA using MG2-7 with its corresponding sgRNA. Shown are dilution series of cells transformed with MG2-7 and a target or non-target spacer (top); the bottom panel shows the quantitative data, with the right bar representing non-target sgRNAs and the left bar representing target sgRNAs.

Fig. 35 depicts in-cell cleavage of E. coli genomic DNA using MG14-1 with its corresponding sgRNA. Shown are dilution series of cells transformed with MG14-1 and a target or non-target spacer (top); the bottom panel shows the quantitative data, with the right bar representing non-target sgRNAs and the left bar representing target sgRNAs.

Fig. 36 depicts in-cell cleavage of E. coli genomic DNA using MG15-1 with its corresponding sgRNA. Shown are dilution series of cells transformed with MG15-1 and a target or non-target spacer (top); the bottom panel shows the quantitative data, with the right bar representing non-target sgRNAs and the left bar representing target sgRNAs.

Brief description of the sequence listing

The sequence listing filed herewith provides exemplary polynucleotide and polypeptide sequences for use in the methods, compositions, and systems of the present disclosure. The following is an exemplary description of the sequences therein.

MG1

SEQ ID NO 1-319 show the full-length peptide sequence of MG1 nuclease.

SEQ ID NO:1827-2140 show the peptide sequences of the RuvC_III domains of the MG1 nucleases described above.

SEQ ID NO:3638-3955 show the peptide sequences of the HNH domains of the MG1 nucleases described above.

SEQ ID NO:5476-5479 shows the nucleotide sequence of MG1 tracrRNA derived from the same locus as the above-mentioned MG1 nuclease (e.g., the same loci as SEQ ID NO:1-4, respectively).

SEQ ID NO:5461-5464 shows the nucleotide sequence of a sgRNA engineered to function with an MG1 nuclease (e.g., SEQ ID NO:1-4, respectively), where N represents the nucleotide of the targeting sequence.

SEQ ID NO:5572-5575 shows the nucleotide sequence of the coding sequence of the E.coli codon-optimized MG1 family enzyme (SEQ ID NO: 1-4).

SEQ ID NO:5588-5589 shows the nucleotide sequence of the coding sequence of the human codon-optimized MG1 family enzyme (SEQ ID NO:1 and SEQ ID NO: 3).

SEQ ID NO 5616-5632 shows the peptide motif characteristic of an MG1 family enzyme.

MG2

SEQ ID NOs: 320-420 show the full-length peptide sequences of MG2 nucleases.

SEQ ID NOs: 2141-2241 show the peptide sequences of the RuvC_III domains of the MG2 nucleases above.

SEQ ID NOs: 3955-4055 show the peptide sequences of the HNH domains of the MG2 nucleases above.

SEQ ID NOs: 5490-5494 show the nucleotide sequences of MG2 tracrRNAs derived from the same loci as the MG2 nucleases above (e.g., the same loci as SEQ ID NOs: 320, 321, 323, 325, and 326, respectively).

SEQ ID NO: 5465 shows the nucleotide sequence of an sgRNA engineered to function with an MG2 nuclease (e.g., SEQ ID NO: 321 above).

SEQ ID NOs: 5572-5575 show the nucleotide sequences of E. coli codon-optimized coding sequences for MG2 family enzymes.

SEQ ID NOs: 5631-5638 show peptide sequences characteristic of MG2 family enzymes.

MG3

SEQ ID NOs: 421-431 show the full-length peptide sequences of MG3 nucleases.

SEQ ID NOs: 2242-2251 show the peptide sequences of the RuvC_III domains of the MG3 nucleases above.

SEQ ID NOs: 4056-4066 show the peptide sequences of the HNH domains of the MG3 nucleases above.

SEQ ID NOs: 5495-5502 show the nucleotide sequences of MG3 tracrRNAs derived from the same loci as the MG3 nucleases above (e.g., the same loci as SEQ ID NOs: 421-428, respectively).

SEQ ID NOs: 5466-5467 show the nucleotide sequences of sgRNAs engineered to function with an MG3 nuclease (e.g., SEQ ID NOs: 421-423).

SEQ ID NOs: 5578-5580 show the nucleotide sequences of E. coli codon-optimized coding sequences for MG3 family enzymes.

SEQ ID NOs: 5639-5648 show peptide sequences characteristic of MG3 family enzymes.

MG4

SEQ ID NOs: 432-660 show the full-length peptide sequences of MG4 nucleases.

SEQ ID NOs: 2253-2481 show the peptide sequences of the RuvC_III domains of the MG4 nucleases above.

SEQ ID NOs: 4067-4295 show the peptide sequences of the HNH domains of the MG4 nucleases above.

SEQ ID NO: 5503 shows the nucleotide sequence of an MG4 tracrRNA derived from the same locus as an MG4 nuclease above.

SEQ ID NO: 5468 shows the nucleotide sequence of an sgRNA engineered to function with an MG4 nuclease.

SEQ ID NO: 5649 shows a peptide sequence characteristic of MG4 family enzymes.

MG6

SEQ ID NOs: 661-668 show the full-length peptide sequences of MG6 nucleases.

SEQ ID NOs: 2482-2489 show the peptide sequences of the RuvC_III domains of the MG6 nucleases above.

SEQ ID NOs: 4296-4303 show the peptide sequences of the HNH domains of the MG6 nucleases above.

MG7

SEQ ID NOs: 669-677 show the full-length peptide sequences of MG7 nucleases.

SEQ ID NOs: 2490-2498 show the peptide sequences of the RuvC_III domains of the MG7 nucleases above.

SEQ ID NOs: 4304-4312 show the peptide sequences of the HNH domains of the MG7 nucleases above.

SEQ ID NO: 5504 shows the nucleotide sequence of an MG7 tracrRNA derived from the same locus as an MG7 nuclease above.

MG14

SEQ ID NOs: 678-929 show the full-length peptide sequences of MG14 nucleases.

SEQ ID NOs: 2499-2750 show the peptide sequences of the RuvC_III domains of the MG14 nucleases above.

SEQ ID NOs: 4313-4564 show the peptide sequences of the HNH domains of the MG14 nucleases above.

SEQ ID NO: 5505 shows the nucleotide sequence of an MG14 tracrRNA derived from the same locus as an MG14 nuclease above.

SEQ ID NO: 5581 shows the nucleotide sequence of an E. coli codon-optimized coding sequence for an MG14 family enzyme.

SEQ ID NOs: 5650-5667 show peptide sequences characteristic of MG14 family enzymes.

MG15

SEQ ID NOs: 930-1092 show the full-length peptide sequences of MG15 nucleases.

SEQ ID NOs: 2751-2913 show the peptide sequences of the RuvC_III domains of the MG15 nucleases above.

SEQ ID NOs: 4565-4727 show the peptide sequences of the HNH domains of the MG15 nucleases above.

SEQ ID NO: 5506 shows the nucleotide sequence of an MG15 tracrRNA derived from the same locus as an MG15 nuclease above.

SEQ ID NO: 5470 shows the nucleotide sequence of an sgRNA engineered to function with an MG15 nuclease.

SEQ ID NO: 5582 shows the nucleotide sequence of an E. coli codon-optimized coding sequence for an MG15 family enzyme.

SEQ ID NOs: 5668-5675 show peptide sequences characteristic of MG15 family enzymes.

MG16

SEQ ID NOs: 1093-1353 show the full-length peptide sequences of MG16 nucleases.

SEQ ID NOs: 2914-3174 show the peptide sequences of the RuvC_III domains of the MG16 nucleases above.

SEQ ID NOs: 4728-4988 show the peptide sequences of the HNH domains of the MG16 nucleases above.

SEQ ID NO: 5507 shows the nucleotide sequence of an MG16 tracrRNA derived from the same locus as an MG16 nuclease above.

SEQ ID NO: 5471 shows the nucleotide sequence of an sgRNA engineered to function with an MG16 nuclease.

SEQ ID NO: 5583 shows the nucleotide sequence of an E. coli codon-optimized coding sequence for an MG16 family enzyme.

SEQ ID NOs: 5676-5678 show peptide sequences characteristic of MG16 family enzymes.

MG18

SEQ ID NOs: 1354-1511 show the full-length peptide sequences of MG18 nucleases.

SEQ ID NOs: 3175-3330 show the peptide sequences of the RuvC_III domains of the MG18 nucleases above.

SEQ ID NOs: 4989-5146 show the peptide sequences of the HNH domains of the MG18 nucleases above.

SEQ ID NO: 5508 shows the nucleotide sequence of an MG18 tracrRNA derived from the same locus as an MG18 nuclease above.

SEQ ID NO: 5472 shows the nucleotide sequence of an sgRNA engineered to function with an MG18 nuclease.

SEQ ID NO: 5584 shows the nucleotide sequence of an E. coli codon-optimized coding sequence for an MG18 family enzyme.

SEQ ID NOs: 5679-5686 show peptide sequences characteristic of MG18 family enzymes.

MG21

SEQ ID NOs: 1512-1655 show the full-length peptide sequences of MG21 nucleases.

SEQ ID NOs: 3331-3474 show the peptide sequences of the RuvC_III domains of the MG21 nucleases above.

SEQ ID NOs: 5147-5290 show the peptide sequences of the HNH domains of the MG21 nucleases above.

SEQ ID NO: 5509 shows the nucleotide sequence of an MG21 tracrRNA derived from the same locus as an MG21 nuclease above.

SEQ ID NO: 5473 shows the nucleotide sequence of an sgRNA engineered to function with an MG21 nuclease.

SEQ ID NO: 5585 shows the nucleotide sequence of an E. coli codon-optimized coding sequence for an MG21 family enzyme.

SEQ ID NOs: 5687-5692 and 5674-5675 show peptide sequences characteristic of MG21 family enzymes.

MG22

SEQ ID NOs: 1656-1755 show the full-length peptide sequences of MG22 nucleases.

SEQ ID NOs: 3475-3568 show the peptide sequences of the RuvC_III domains of the MG22 nucleases above.

SEQ ID NOs: 5291-5389 show the peptide sequences of the HNH domains of the MG22 nucleases above.

SEQ ID NO: 5510 shows the nucleotide sequence of an MG22 tracrRNA derived from the same locus as an MG22 nuclease above.

SEQ ID NO: 5474 shows the nucleotide sequence of an sgRNA engineered to function with an MG22 nuclease.

SEQ ID NO: 5586 shows the nucleotide sequence of an E. coli codon-optimized coding sequence for an MG22 family enzyme.

SEQ ID NOs: 5694-5699 show peptide sequences characteristic of MG22 family enzymes.

MG23

SEQ ID NOs: 1756-1826 show the full-length peptide sequences of MG23 nucleases.

SEQ ID NOs: 3569-3637 show the peptide sequences of the RuvC_III domains of the MG23 nucleases above.

SEQ ID NOs: 5390-5460 show the peptide sequences of the HNH domains of the MG23 nucleases above.

SEQ ID NO: 5511 shows the nucleotide sequence of an MG23 tracrRNA derived from the same locus as an MG23 nuclease above.

SEQ ID NO: 5475 shows the nucleotide sequence of an sgRNA engineered to function with an MG23 nuclease.

SEQ ID NO: 5587 shows the nucleotide sequence of an E. coli codon-optimized coding sequence for an MG23 family enzyme.

SEQ ID NOs: 5700-5717 show peptide sequences characteristic of MG23 family enzymes.

Detailed Description

While various embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed.

Unless otherwise indicated, the practice of some methods disclosed herein employs techniques of immunology, biochemistry, chemistry, molecular biology, microbiology, cell biology, genomics, and recombinant DNA. See, e.g., Sambrook and Green, Molecular Cloning: A Laboratory Manual, 4th Edition (2012); the series Current Protocols in Molecular Biology (F. M. Ausubel et al., eds.); the series Methods in Enzymology (Academic Press, Inc.); PCR 2: A Practical Approach (M. J. MacPherson, B. D. Hames, and G. R. Taylor, eds. (1995)); Harlow and Lane, eds. (1988), Antibodies, A Laboratory Manual; and Culture of Animal Cells: A Manual of Basic Technique and Specialized Applications, 6th Edition (R. I. Freshney, ed. (2010)), each of which is incorporated herein by reference in its entirety.

As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. Furthermore, to the extent that the terms "including", "having", "with", or variants thereof are used in the detailed description and/or the claims, such terms are intended to be inclusive in a manner similar to the term "comprising".

The term "about" or "approximately" means within an acceptable error range for the particular value as determined by one of ordinary skill in the art, which will depend in part on how the value is measured or determined, i.e., on the limitations of the measurement system. For example, "about" can mean within one or more than one standard deviation, per the practice in the art. Alternatively, "about" can mean a range of up to 20%, up to 15%, up to 10%, up to 5%, or up to 1% of a given value.

As used herein, "cell" generally refers to a biological cell. A cell may be the basic structural, functional, and/or biological unit of a living organism. A cell may originate from any organism having one or more cells. Some non-limiting examples include: prokaryotic cells, eukaryotic cells, bacterial cells, archaeal cells, cells of unicellular eukaryotes, protozoal cells, cells from plants (e.g., cells from plant crops, fruits, vegetables, cereals, soybeans, corn, maize, wheat, seeds, tomatoes, rice, cassava, sugarcane, pumpkin, hay, potatoes, cotton, hemp, tobacco, flowering plants, conifers, gymnosperms, ferns, clubmosses, hornworts, liverworts, mosses), algal cells (e.g., Botryococcus braunii, Chlamydomonas reinhardtii, Nannochloropsis gaditana, Chlorella pyrenoidosa, Sargassum patens, C. agardh, etc.), seaweeds (e.g., kelp), fungal cells (e.g., yeast cells, cells from mushrooms), animal cells, cells from invertebrates (e.g., fruit flies, cnidarians, echinoderms, nematodes, etc.), cells from vertebrates (e.g., fish, amphibians, reptiles, birds, mammals), cells from mammals (e.g., pigs, cows, goats, sheep, rodents, rats, mice, non-human primates, humans, etc.), and the like. Sometimes a cell is not derived from a natural organism (e.g., a cell may be synthetically made, sometimes termed an artificial cell).

As used herein, the term "nucleotide" generally refers to a base-sugar-phosphate combination. A nucleotide may comprise a synthetic nucleotide. A nucleotide may comprise a synthetic nucleotide analog. Nucleotides may be monomeric units of a nucleic acid sequence (e.g., deoxyribonucleic acid (DNA) and ribonucleic acid (RNA)). The term nucleotide may include ribonucleoside triphosphates, such as adenosine triphosphate (ATP), uridine triphosphate (UTP), cytidine triphosphate (CTP), and guanosine triphosphate (GTP), and deoxyribonucleoside triphosphates, such as dATP, dCTP, dITP, dUTP, dGTP, dTTP, or derivatives thereof. Such derivatives may include, for example, [αS]dATP, 7-deaza-dGTP, and 7-deaza-dATP, as well as nucleotide derivatives that confer nuclease resistance on the nucleic acid molecules containing them. As used herein, the term nucleotide may also refer to dideoxyribonucleoside triphosphates (ddNTPs) and derivatives thereof. Illustrative examples of dideoxyribonucleoside triphosphates include, but are not limited to, ddATP, ddCTP, ddGTP, ddITP, and ddTTP. A nucleotide may be unlabeled or detectably labeled, for example using a moiety comprising an optically detectable moiety (e.g., a fluorophore). Labeling may also be carried out with quantum dots. Detectable labels may include, for example, radioisotopes, fluorescent labels, chemiluminescent labels, bioluminescent labels, and enzyme labels. Fluorescent labels for nucleotides may include, but are not limited to, fluorescein, 5-carboxyfluorescein (FAM), 2',7'-dimethoxy-4',5'-dichloro-6-carboxyfluorescein (JOE), rhodamine, 6-carboxyrhodamine (R6G), N,N,N',N'-tetramethyl-6-carboxyrhodamine (TAMRA), 6-carboxy-X-rhodamine (ROX), 4-(4'-dimethylaminophenylazo)benzoic acid (DABCYL), Cascade Blue, Oregon Green, Texas Red, cyanine, and 5-(2'-aminoethyl)aminonaphthalene-1-sulfonic acid (EDANS). Specific examples of fluorescently labeled nucleotides include [R6G]dUTP, [TAMRA]dUTP, [R110]dCTP, [R6G]dCTP, [TAMRA]dCTP, [JOE]ddATP, [R6G]ddATP, [FAM]ddCTP, [R110]ddCTP, [TAMRA]ddGTP, [ROX]ddTTP, [dR6G]ddATP, [dR110]ddCTP, [dTAMRA]ddGTP, and [dROX]ddTTP (available from Perkin Elmer, Foster City, Calif.); FluoroLink deoxynucleotides, FluoroLink Cy3-dCTP, FluoroLink Cy5-dCTP, FluoroLink FluorX-dCTP, FluoroLink Cy3-dUTP, and FluoroLink Cy5-dUTP (available from Amersham, Arlington Heights, Ill.); fluorescein-15-dATP, fluorescein-12-dUTP, tetramethyl-rhodamine-6-dUTP, IR770-9-dATP, fluorescein-12-ddUTP, fluorescein-12-UTP, and fluorescein-15-2'-dATP (available from Boehringer Mannheim, Indianapolis, Ind.); and Chromosome Labeled Nucleotides, BODIPY-FL-14-UTP, BODIPY-FL-4-UTP, BODIPY-TMR-14-dUTP, BODIPY-TR-14-UTP, BODIPY-TR-14-dUTP, Cascade Blue-7-UTP, Cascade Blue-7-dUTP, fluorescein-12-UTP, fluorescein-12-dUTP, Oregon Green 488-5-dUTP, Rhodamine Green-5-UTP, Rhodamine Green-5-dUTP, tetramethylrhodamine-6-UTP, tetramethylrhodamine-6-dUTP, Texas Red-5-UTP, Texas Red-5-dUTP, and Texas Red-12-dUTP (available from Molecular Probes, Eugene, Oreg.). Nucleotides may also be labeled or marked by chemical modification. A chemically modified single nucleotide may be biotin-dNTP. Some non-limiting examples of biotinylated dNTPs include biotin-dATP (e.g., bio-N6-ddATP, biotin-14-dATP), biotin-dCTP (e.g., biotin-11-dCTP, biotin-14-dCTP), and biotin-dUTP (e.g., biotin-11-dUTP, biotin-16-dUTP, biotin-20-dUTP).

The terms "polynucleotide", "oligonucleotide" and "nucleic acid" are used interchangeably and generally refer to a polymeric form of nucleotides of any length, either deoxyribonucleotides or ribonucleotides or analogs thereof, whether in single-stranded, double-stranded, or multi-stranded form. A polynucleotide may be exogenous or endogenous to a cell. A polynucleotide may exist in a cell-free environment. A polynucleotide may be a gene or a fragment thereof. A polynucleotide may be DNA. A polynucleotide may be RNA. A polynucleotide may have any three-dimensional structure and may perform any function. A polynucleotide may comprise one or more analogs (e.g., an altered backbone, sugar, or nucleobase). Modifications to the nucleotide structure, if present, may be imparted before or after assembly of the polymer. Some non-limiting examples of analogs include: 5-bromouracil, peptide nucleic acids, xeno nucleic acids, morpholinos, locked nucleic acids, glycol nucleic acids, threose nucleic acids, dideoxynucleotides, cordycepin, 7-deaza-GTP, fluorophores (e.g., rhodamine or fluorescein linked to the sugar), thiol-containing nucleotides, biotin-linked nucleotides, fluorescent base analogs, CpG islands, methyl-7-guanosine, methylated nucleotides, inosine, thiouridine, pseudouridine, dihydrouridine, queuosine, and wyosine. Non-limiting examples of polynucleotides include coding or non-coding regions of a gene or gene fragment, loci defined by linkage analysis, exons, introns, messenger RNA (mRNA), transfer RNA (tRNA), ribosomal RNA (rRNA), short interfering RNA (siRNA), short hairpin RNA (shRNA), micro-RNA (miRNA), ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, cell-free polynucleotides (including cell-free DNA (cfDNA) and cell-free RNA (cfRNA)), nucleic acid probes, and primers. The sequence of nucleotides may be interrupted by non-nucleotide components.

The term "transfection" or "transfected" generally refers to the introduction of nucleic acids into cells by non-viral or virus-based methods. The nucleic acid molecule may be a gene sequence encoding the complete protein or a functional part thereof. See, e.g., Sambrook et al, 1989, Molecular Cloning: A Laboratory Manual, 18.1-18.88.

The terms "peptide", "polypeptide" and "protein" are used interchangeably herein to generally refer to a polymer of at least two amino acid residues joined by one or more peptide bonds. The term does not imply a polymer of a particular length, nor is it intended to imply or distinguish whether the peptide is produced using recombinant techniques, chemical or enzymatic synthesis, or naturally occurring. The term applies to naturally occurring amino acid polymers as well as amino acid polymers comprising at least one modified amino acid. In some cases, the polymer may be interrupted by non-amino acids. The term includes amino acid chains of any length, including full-length proteins and proteins with or without secondary and/or tertiary structures (e.g., domains). The term also includes amino acid polymers that have been modified, for example, by disulfide bond formation, glycosylation, lipidation, acetylation, phosphorylation, oxidation, and any other manipulation (e.g., conjugation to a labeling component). As used herein, the term "amino acid" generally refers to natural and unnatural amino acids, including, but not limited to, modified amino acids and amino acid analogs. Modified amino acids may include natural amino acids and unnatural amino acids that have been chemically modified to include groups or chemical moieties on the amino acid that do not naturally occur. Amino acid analogs can refer to amino acid derivatives. The term "amino acid" includes D-amino acids and L-amino acids.

As used herein, "non-native" generally refers to a nucleic acid or polypeptide sequence that is not found in a native nucleic acid or protein. Non-native may refer to affinity tags. Non-native may refer to fusions. Non-native may refer to a naturally occurring nucleic acid or polypeptide sequence that comprises mutations, insertions, and/or deletions. A non-native sequence may exhibit and/or encode an activity (e.g., an enzymatic activity, a methyltransferase activity, an acetyltransferase activity, a kinase activity, a ubiquitinating activity, etc.) that may also be exhibited by the nucleic acid and/or polypeptide sequence to which the non-native sequence is fused. A non-native nucleic acid or polypeptide sequence may be linked to a naturally occurring nucleic acid or polypeptide sequence (or a variant thereof) by genetic engineering to generate a chimeric nucleic acid and/or polypeptide sequence encoding a chimeric nucleic acid and/or polypeptide.

As used herein, the term "promoter" generally refers to a regulatory DNA region that controls the transcription or expression of a gene, and which may be located adjacent to or overlapping the nucleotide or region of nucleotides at which RNA transcription is initiated. A promoter may contain specific DNA sequences that bind protein factors, commonly referred to as transcription factors, which facilitate the binding of RNA polymerase to the DNA, leading to transcription of the gene. A "basal promoter", also referred to as a "core promoter", generally refers to a promoter that contains all the basic elements necessary to promote transcriptional expression of an operably linked polynucleotide. Eukaryotic basal promoters typically, though not necessarily, contain a TATA box and/or a CAAT box.

As used herein, the term "expression" generally refers to the process of transcription (e.g., transcription into mRNA or other RNA transcript) of a nucleic acid sequence or polynucleotide from a DNA template and/or the subsequent translation of the transcribed mRNA into a peptide, polypeptide, or protein. The transcripts and encoded polypeptides may be collectively referred to as "gene products". If the polynucleotide is derived from genomic DNA, expression may include splicing of the mRNA in a eukaryotic cell.

As used herein, "operably linked", "operable linkage", "operatively linked", or grammatical equivalents thereof generally refer to the juxtaposition of genetic elements, e.g., a promoter, an enhancer, a polyadenylation sequence, etc., wherein the elements are in a relationship permitting them to operate in the expected manner. For instance, a regulatory element, which may comprise promoter and/or enhancer sequences, is operably linked to a coding region if the regulatory element helps initiate transcription of the coding sequence. Intervening residues may be present between the regulatory element and the coding region, so long as this functional relationship is maintained.

As used herein, "vector" generally refers to a macromolecule or macromolecular association that comprises or is associated with a polynucleotide and that can be used to mediate the delivery of the polynucleotide to a cell. Examples of vectors include plasmids, viral vectors, liposomes, and other gene delivery vehicles. Vectors typically comprise genetic elements, such as regulatory elements, operably linked to a gene to facilitate expression of the gene in a target.

As used herein, "expression cassette" and "nucleic acid cassette" are used interchangeably and generally refer to a combination of nucleic acid sequences or elements that are expressed together or operably linked for expression. In some cases, an expression cassette refers to a combination of regulatory elements and one or more genes to which they are operably linked for expression.

A "functional fragment" of a DNA or protein sequence generally refers to a fragment that retains a biological activity (either functional or structural) that is substantially similar to a biological activity of the full-length DNA or protein sequence. A biological activity of a DNA sequence may be its ability to influence expression in a manner known to be attributed to the full-length sequence.

As used herein, an "engineered" object generally indicates that the object has been modified by human intervention. According to non-limiting examples: a nucleic acid may be modified by changing its sequence to a sequence that does not occur in nature; a nucleic acid may be modified by ligating it to a nucleic acid with which it is not associated in nature, such that the ligated product possesses a function not present in the original nucleic acid; an engineered nucleic acid may be synthesized in vitro with a sequence that does not exist in nature; a protein may be modified by changing its amino acid sequence to a sequence that does not exist in nature; an engineered protein may acquire a new function or property. An "engineered" system comprises at least one engineered component.

As used herein, "synthetic" and "artificial" are used interchangeably to refer to a protein or domain thereof having low sequence identity (e.g., less than 50% sequence identity, less than 25% sequence identity, less than 10% sequence identity, less than 5% sequence identity, less than 1% sequence identity) to a naturally occurring human protein. For example, the VPR and VP64 domains are synthetic transactivation domains.

As used herein, the term "tracrRNA" or "tracr sequence" generally refers to a nucleic acid with at least about 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, or 100% sequence identity and/or sequence similarity to a wild-type exemplary tracrRNA sequence (e.g., a tracrRNA from S. pyogenes, S. aureus, etc., or SEQ ID NOs: 5476-5511). A tracrRNA can refer to a nucleic acid with at most about 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 100% sequence identity and/or sequence similarity to a wild-type exemplary tracrRNA sequence (e.g., a tracrRNA from S. pyogenes, S. aureus, etc.). A tracrRNA can refer to a modified form of a tracrRNA that comprises a nucleotide change such as a deletion, insertion, substitution, variant, mutation, or chimera. A tracrRNA can refer to a nucleic acid that is at least about 60% identical to a wild-type exemplary tracrRNA (e.g., a tracrRNA from S. pyogenes, S. aureus, etc.) over a stretch of at least 6 contiguous nucleotides. For example, a tracrRNA sequence can be at least about 60% identical, at least about 65% identical, at least about 70% identical, at least about 75% identical, at least about 80% identical, at least about 85% identical, at least about 90% identical, at least about 95% identical, at least about 98% identical, at least about 99% identical, or 100% identical to a wild-type exemplary tracrRNA (e.g., a tracrRNA from S. pyogenes, S. aureus, etc.) over a stretch of at least 6 contiguous nucleotides. Type II tracrRNA sequences can be predicted on a genome sequence by identifying regions with complementarity to part of the repeat sequence in an adjacent CRISPR array.
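The idea of locating a candidate tracrRNA by its complementarity to part of the CRISPR repeat can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not a method stated in this disclosure: the function names, the window length `min_len`, the identity threshold `min_identity`, the restriction to Watson-Crick pairs (G-U wobble is ignored), and the scan of only the strand supplied.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T/N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def find_anti_repeat_candidates(genome: str, repeat: str,
                                min_len: int = 10,
                                min_identity: float = 0.8):
    """Scan `genome` for windows complementary to part of `repeat`.

    Hypothetical helper: a window is reported when it matches some
    `min_len`-long piece of the repeat's reverse complement at or above
    `min_identity`. Returns (genome_pos, repeat_rc_offset, identity) tuples.
    """
    genome = genome.upper()
    rc = reverse_complement(repeat)
    hits = []
    for offset in range(len(rc) - min_len + 1):
        probe = rc[offset:offset + min_len]
        for pos in range(len(genome) - min_len + 1):
            window = genome[pos:pos + min_len]
            matches = sum(a == b for a, b in zip(probe, window))
            identity = matches / min_len
            if identity >= min_identity:
                hits.append((pos, offset, identity))
    return hits
```

In practice the hits would be restricted to the neighborhood of the CRISPR array and the effector gene, and both strands would be scanned; the brute-force loop above is only meant to make the complementarity criterion concrete.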

As used herein, "guide nucleic acid" generally refers to a nucleic acid that can hybridize to another nucleic acid. A guide nucleic acid may be RNA. A guide nucleic acid may be DNA. A guide nucleic acid may be programmed to bind to a nucleic acid sequence site-specifically. The nucleic acid to be targeted, or target nucleic acid, may comprise nucleotides. The guide nucleic acid may comprise nucleotides. A portion of the target nucleic acid may be complementary to a portion of the guide nucleic acid. The strand of a double-stranded target polynucleotide that is complementary to and hybridizes with the guide nucleic acid may be referred to as the complementary strand. The strand of the double-stranded target polynucleotide that is complementary to the complementary strand, and therefore may not be complementary to the guide nucleic acid, may be referred to as the non-complementary strand. A guide nucleic acid may comprise one polynucleotide chain and may be referred to as a "single guide nucleic acid". A guide nucleic acid may comprise two polynucleotide chains and may be referred to as a "dual guide nucleic acid". If not otherwise specified, the term "guide nucleic acid" is inclusive, referring to both single guide nucleic acids and dual guide nucleic acids. A guide nucleic acid may comprise a segment that may be referred to as a "nucleic acid targeting segment" or a "nucleic acid targeting sequence". A nucleic acid targeting segment may comprise a sub-segment that may be referred to as a "protein binding segment" or a "protein binding sequence" or a "Cas protein binding segment".

The term "sequence identity" or "percent identity" in the context of two or more nucleic acid or polypeptide sequences generally refers to two (e.g., in a pairwise alignment) or more (e.g., in a multiple sequence alignment) sequences that are the same or have a specified percentage of amino acid residues or nucleotides that are the same, when compared and aligned for maximum correspondence over a local or global comparison window, as measured using a sequence comparison algorithm. Suitable sequence comparison algorithms for polypeptide sequences include, e.g., BLASTP using parameters of a word length (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix, with gap costs set at an existence of 11 and an extension of 1, and using a conditional compositional score matrix adjustment, for polypeptide sequences longer than 30 residues; BLASTP using parameters of a word length (W) of 2, an expectation (E) of 1000000, and the PAM30 scoring matrix, with gap costs set at 9 to open gaps and 1 to extend gaps, for sequences of less than 30 residues (these are the default parameters for BLASTP in the BLAST suite available at https://blast.ncbi.nlm.nih.gov); CLUSTALW with parameters of; the Smith-Waterman homology search algorithm with parameters of a match of 2, a mismatch of -1, and a gap of -1; MUSCLE with default parameters; MAFFT with parameters of a retree of 2 and maximum iterations of 1000; Novafold with default parameters; and HMMER hmmalign with default parameters.
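As an illustration of one of the algorithms named above, the following is a minimal sketch of Smith-Waterman local alignment using the stated scores (match 2, mismatch -1, gap -1), followed by a percent-identity calculation over the aligned region. The function names, the linear (non-affine) gap model, and the choice to divide identical positions by the alignment length including gaps are assumptions made for this example; production tools differ in these details.

```python
def smith_waterman(a: str, b: str, match: int = 2,
                   mismatch: int = -1, gap: int = -1):
    """Local alignment of a and b; returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    # Fill the scoring matrix, clamping at zero so alignments can restart.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,      # gap in b
                          H[i][j - 1] + gap)      # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical, non-gap columns as a percentage of alignment length."""
    if not aligned_a:
        return 0.0
    same = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * same / len(aligned_a)
```

A threshold such as "at least 75% sequence identity" would then be checked by comparing the value returned by `percent_identity` against 75.0, bearing in mind that the result depends on which algorithm, parameters, and comparison window are chosen.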

As used herein, the term "RuvC_III domain" generally refers to the third discontinuous segment of a RuvC endonuclease domain (the RuvC nuclease domain being composed of three discontinuous segments, RuvC_I, RuvC_II, and RuvC_III). A RuvC domain or segments thereof can generally be identified by alignment to known domain sequences, by structural alignment to proteins with annotated domains, or by comparison to Hidden Markov Models (HMMs) built from known domain sequences (e.g., Pfam HMM PF18541 for RuvC_III).

As used herein, the term "HNH domain" generally refers to an endonuclease domain having characteristic histidine and asparagine residues. An HNH domain can generally be identified by alignment to known domain sequences, by structural alignment to proteins with annotated domains, or by comparison to Hidden Markov Models (HMMs) built from known domain sequences (e.g., Pfam HMM PF01844 for the HNH domain).

Overview

The discovery of new Cas enzymes with unique functions and structures may offer the potential to further disrupt deoxyribonucleic acid (DNA) editing technologies, improving speed, specificity, functionality, and ease of use. Relative to the predicted prevalence of Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) systems in microbes and the sheer diversity of microbial species, relatively few functionally characterized CRISPR/Cas enzymes exist in the literature. This is partly because a huge number of microbial species may not be readily cultivated under laboratory conditions. Metagenomic sequencing from natural environmental niches that represent large numbers of microbial species may offer the potential to drastically increase the number of new CRISPR/Cas systems known and to speed the discovery of new oligonucleotide editing functionalities. A recent example of the fruitfulness of such an approach is the 2016 discovery of the CasX/CasY CRISPR systems from metagenomic analysis of natural microbial communities.

CRISPR/Cas systems are RNA-guided nuclease complexes that have been described as functioning as an adaptive immune system in microorganisms. In their natural context, CRISPR/Cas systems occur in CRISPR (clustered regularly interspaced short palindromic repeats) operons or loci, which generally comprise two parts: (i) an array of short repetitive sequences (30-40 bp) separated by equally short spacer sequences, which encode the RNA-based targeting element; and (ii) ORFs encoding the Cas nuclease polypeptide guided by the RNA-based targeting element, alongside accessory proteins/enzymes. Efficient nuclease targeting of a particular target nucleic acid sequence generally requires both (i) complementary hybridization between the first 6-8 nucleotides of the target (the target seed) and the crRNA guide; and (ii) the presence of a protospacer-adjacent motif (PAM) sequence within a defined vicinity of the target seed (the PAM usually being a sequence not commonly represented within the host genome). Depending on the exact function and organization of the system, CRISPR-Cas systems are commonly organized into 2 classes, 5 types, and 16 subtypes based on shared functional characteristics and evolutionary similarity.
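The two targeting requirements described above (seed complementarity and an adjacent PAM) can be sketched as a simple site scan. This is an illustrative sketch only, under several assumptions not taken from this disclosure: the PAM is taken to lie immediately 3' of the protospacer on the non-target strand (as for the well-known SpCas9 "NGG" case), the seed is taken to be the PAM-proximal 8 nucleotides, the spacer is written in DNA sense so that complementarity to the target strand is tested as identity with the non-target strand, and all function names are hypothetical.

```python
# IUPAC nucleotide codes used to express degenerate PAM motifs.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def pam_matches(site: str, pam: str) -> bool:
    """True if `site` is compatible with the (possibly degenerate) PAM."""
    return len(site) == len(pam) and all(
        base in IUPAC[code] for base, code in zip(site.upper(), pam.upper())
    )


def find_target_sites(dna: str, spacer: str, pam: str = "NGG",
                      seed_len: int = 8):
    """Return (start, mismatches) for sites with a perfect seed and a PAM.

    Assumed layout on the non-target strand:
        5'-[protospacer][PAM]-3'
    with the seed being the last `seed_len` bases of the protospacer.
    """
    dna = dna.upper()
    spacer = spacer.upper().replace("U", "T")  # accept RNA-style input
    n = len(spacer)
    seed = spacer[-seed_len:]
    sites = []
    for start in range(len(dna) - n - len(pam) + 1):
        proto = dna[start:start + n]
        pam_site = dna[start + n:start + n + len(pam)]
        if proto[-seed_len:] == seed and pam_matches(pam_site, pam):
            mismatches = sum(x != y for x, y in zip(proto, spacer))
            sites.append((start, mismatches))
    return sites
```

Different effectors place the PAM on a different side of the protospacer and tolerate different seed lengths, so the `pam` and `seed_len` arguments and the strand layout would need to be adapted to the enzyme in question.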

Class I CRISPR-Cas systems have large, multi-subunit effector complexes and comprise types I, III, and IV.

Type I CRISPR-Cas systems are considered to be of moderate complexity in terms of components. In type I CRISPR-Cas systems, the array of RNA-targeting elements is transcribed as a long precursor crRNA (pre-crRNA) that is processed at the repeat elements to liberate short, mature crRNAs, which direct the nuclease complex to nucleic acid targets when they are followed by a suitable short consensus sequence called a protospacer-adjacent motif (PAM). This processing occurs via an endoribonuclease subunit (Cas6) of a large endonuclease complex called Cascade, which also comprises a nuclease (Cas3) protein component of the crRNA-guided nuclease complex. Cas I nucleases function primarily as DNA nucleases.

Type III CRISPR systems may be characterized by the presence of a central nuclease called Cas10, together with a repeat-associated mysterious protein (RAMP) module comprising Csm or Cmr protein subunits. As in type I systems, the mature crRNA is processed from the pre-crRNA by a Cas6-like enzyme. Unlike type I and type II systems, type III systems appear to target and cleave DNA-RNA duplexes (e.g., DNA strands that serve as templates for RNA polymerase).

Type IV CRISPR-Cas systems have an effector complex consisting of a highly reduced large subunit nuclease (csf1), two genes for RAMP proteins of the Cas5 (csf3) and Cas7 (csf2) groups, and, in some cases, a gene for a predicted small subunit; such systems are commonly found on endogenous plasmids.

Class II CRISPR-Cas systems typically have single polypeptide multidomain nuclease effectors and include types II, V and VI.

Type II CRISPR-Cas systems are considered the simplest in terms of components. In type II CRISPR-Cas systems, processing of the CRISPR array into mature crRNAs does not require a dedicated endonuclease subunit, but rather a small trans-encoded crRNA (tracrRNA) with a region complementary to the array repeat sequence; the tracrRNA interacts with both its corresponding effector nuclease (e.g., Cas9) and the repeat sequence to form a precursor dsRNA structure, which is cleaved by endogenous RNase III to generate a mature effector enzyme loaded with both tracrRNA and crRNA. Type II nucleases are known as DNA nucleases. Type II effectors typically exhibit a structure consisting of a RuvC-like endonuclease domain that adopts the RNase H fold, with an unrelated HNH nuclease domain inserted within the fold of the RuvC-like nuclease domain. The RuvC-like domain is responsible for cleaving the target (e.g., crRNA-complementary) DNA strand, while the HNH domain is responsible for cleaving the displaced DNA strand.

A feature of type V CRISPR-Cas systems is that the structure of the nuclease effector (e.g., Cas12) is similar to that of type II effectors, comprising a RuvC-like domain. Similar to type II, most (but not all) type V CRISPR systems use a tracrRNA to process pre-crRNA into mature crRNAs; however, unlike type II systems, which require RNase III to cleave the pre-crRNA into multiple crRNAs, type V systems are capable of cleaving the pre-crRNA using the effector nuclease itself. Like type II CRISPR-Cas systems, type V CRISPR-Cas systems are known as DNA nucleases. Unlike type II CRISPR-Cas systems, some type V enzymes (e.g., Cas12a) appear to have robust single-stranded non-specific deoxyribonuclease activity that is activated by the first crRNA-directed cleavage of a double-stranded target sequence.

Type VI CRISPR-Cas systems have RNA-guided RNA endonucleases. The single polypeptide effector of a type VI system (e.g., Cas13) comprises two HEPN ribonuclease domains instead of a RuvC-like domain. Unlike type II and type V systems, type VI systems also do not appear to require a tracrRNA to process pre-crRNA into crRNA. However, similar to type V systems, some type VI systems (e.g., C2c2) appear to have robust single-stranded non-specific nuclease (ribonuclease) activity that is activated by the first crRNA-directed cleavage of the target RNA.

Due to its simpler architecture, class II CRISPR-Cas has been most widely used for engineering and development into designer nuclease/genome editing applications.

One of the early adaptations of such a system for in vitro use can be found in Jinek et al. (Science. 2012 Aug 17; 337(6096):816-21, which is incorporated herein by reference in its entirety). The Jinek study first described a system involving (i) recombinantly expressed, purified full-length Cas9 (e.g., a class II, type II Cas enzyme) isolated from Streptococcus pyogenes SF370; (ii) a purified mature crRNA of about 42 nt, with an about 20 nt 5' sequence complementary to the target DNA sequence to be cleaved, followed by a 3' tracr-binding sequence (the entire crRNA was transcribed in vitro from a synthetic DNA template carrying a T7 promoter sequence); (iii) purified tracrRNA transcribed in vitro from a synthetic DNA template carrying a T7 promoter sequence; and (iv) Mg2+. Jinek later described an improved, engineered system in which the crRNA of (ii) is joined to the 5' end of the tracrRNA of (iii) by a linker (e.g., GAAA) to form a single fused synthetic guide RNA (sgRNA) capable of directing Cas9 to a target by itself (compare the upper and lower panels of FIG. 2).
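The single fused guide described above can be sketched as a simple concatenation: targeting sequence, tracr-binding sequence, linker, then tracrRNA, written 5' to 3'. The function name and the short sequences in the usage example are hypothetical placeholders, not sequences from this disclosure; only the GAAA linker is taken from the text.

```python
def build_sgrna(targeting, tracr_binding, tracr, linker="GAAA"):
    """Assemble a single guide RNA, 5' to 3', as:
    targeting sequence + tracr-binding sequence + linker + tracrRNA.

    Inputs may be written as DNA or RNA; T is converted to U.
    """
    parts = [targeting, tracr_binding, linker, tracr]
    rna = "".join(part.upper().replace("T", "U") for part in parts)
    unexpected = set(rna) - set("ACGU")
    if unexpected:
        raise ValueError("unexpected characters: %s" % sorted(unexpected))
    return rna


if __name__ == "__main__":
    # Placeholder sequences for illustration only.
    print(build_sgrna("ACGT", "GUUU", "AAAC"))
```

Keeping the linker as a parameter makes it easy to compare alternative linkers without changing the rest of the assembly.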

Mali et al. (Science. 2013 Feb 15; 339(6121):823-) described a related system for use in mammalian cells, based on DNA encoding: (i) an ORF encoding a codon-optimized Cas9 (e.g., a class II, type II Cas enzyme) under a suitable mammalian promoter, with a C-terminal nuclear localization sequence (e.g., SV40 NLS) and a suitable polyadenylation signal (e.g., TK pA signal); and (ii) an ORF encoding an sgRNA (with a 5' sequence beginning with G, followed by a 20 nt complementary targeting nucleic acid sequence joined to a 3' tracr-binding sequence, a linker, and a tracrRNA sequence) under a suitable polymerase III promoter (e.g., the U6 promoter).

MG1 enzyme

In one aspect, the present disclosure provides an engineered nuclease system comprising (a) an endonuclease. In some cases, the endonuclease is a Cas endonuclease. In some cases, the endonuclease is a type II, class II Cas endonuclease. The endonuclease can comprise a RuvC _ III domain, wherein the RuvC _ III domain has at least about 70% sequence identity to any one of SEQ ID NO 1827-2140. In some cases, an endonuclease can comprise a RuvC _ III domain, wherein the RuvC _ III domain has at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99% identity to any one of SEQ ID NO 1827-2140. In some cases, the endonuclease can comprise a RuvC _ III domain that is substantially identical to any one of SEQ ID NOs 1827-2140. The endonuclease may comprise a RuvC _ III domain having at least about 70% sequence identity to any one of SEQ ID NO 1827-1831. In some cases, the endonuclease can comprise a RuvC III domain that is at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99% identical to any of SEQ ID NOs 1827-1831. 
In some cases, the endonuclease can comprise a RuvC _ III domain that is substantially identical to any one of SEQ ID NO 1827-1831. In some cases, an endonuclease can comprise a RuvC III domain that is at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99% identical to SEQ ID No. 1827. In some cases, the endonuclease can comprise a RuvC III domain that is at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99% identical to SEQ ID No. 1828. In some cases, the endonuclease can comprise a RuvC III domain that is at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99% identical to SEQ ID No. 1829. In some cases, the endonuclease can comprise a RuvC III domain that is at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99% identical to SEQ ID No. 1830. 
In some cases, the endonuclease can comprise a RuvC III domain that is at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99% identical to SEQ ID No. 1831.

The endonuclease may comprise an HNH domain having at least about 70% identity to any one of SEQ ID NO 3638-3955. In some cases, the endonuclease can comprise an HNH domain that is at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to any one of SEQ ID NO 3638-. The endonuclease may comprise an HNH domain substantially identical to any one of SEQ ID NO 3638-3955. The endonuclease can comprise an HNH domain that is at least about 70% identical to any one of SEQ ID NO 3638-3641. In some cases, the endonuclease can comprise an HNH domain that is at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to any one of SEQ ID NO 3638-3641. The endonuclease may comprise an HNH domain substantially identical to any one of SEQ ID NO 3638-3641. The endonuclease can comprise an HNH domain having at least about 70% identity to SEQ ID NO 3638. In some cases, the endonuclease can comprise an HNH domain that is at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to SEQ ID NO 3638. The endonuclease may comprise an HNH domain substantially identical to SEQ ID NO 3638. The endonuclease may comprise an HNH domain having at least about 70% identity to SEQ ID NO 3639. In some cases, the endonuclease can comprise an HNH domain that is at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to SEQ ID NO 3639. The endonuclease may comprise an HNH domain substantially identical to SEQ ID NO 3639. The endonuclease can comprise an HNH domain having at least about 70% identity to SEQ ID NO 3640. In some cases, the endonuclease can comprise an HNH domain that is at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to SEQ ID NO 3640. The endonuclease may comprise an HNH domain substantially identical to SEQ ID NO 3640. The endonuclease can comprise an HNH domain having at least about 70% identity to SEQ ID NO 3641. In some cases, the endonuclease can comprise an HNH domain that is at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to SEQ ID NO 3641. The endonuclease may comprise an HNH domain substantially identical to SEQ ID NO 3641.

In some cases, the endonuclease can comprise a variant that is at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to any of SEQ ID NOs 1-6 or 9-319. In some cases, the endonuclease can be substantially identical to any of SEQ ID NOs 1-6 or 9-319. In some cases, the endonuclease can comprise a variant that is at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to any of SEQ ID NOs 1-4. In some cases, the endonuclease can be substantially identical to any of SEQ ID NOs 1-4. In some cases, the endonuclease can comprise a peptide motif substantially identical to any of SEQ ID NOs 5615, 5616, or 5617.

In some cases, the endonuclease can comprise a variant having one or more nuclear localization sequences (NLSs). The NLS can be proximal to the N-terminus or C-terminus of the endonuclease. The NLS can be attached to the N-terminus or C-terminus of any one of SEQ ID NO 1-6 or SEQ ID NO 9-319, or to the N-terminus or C-terminus of a variant having at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity to any one of SEQ ID NO 1-319. The NLS can be an SV40 large T antigen NLS. The NLS can be a c-myc NLS. The NLS may comprise a sequence having at least about 80%, at least about 85%, at least about 90%, at least about 95%, or at least about 99% identity to any one of SEQ ID NO 5593-5608. The NLS may comprise a sequence substantially identical to any one of SEQ ID NO 5593-. The NLS may comprise any of the sequences in Table 1 below, or a combination thereof:

Table 1: Exemplary NLS sequences that can be used with Cas effectors of the present disclosure

In some cases, the endonuclease may be recombinant (e.g., cloned, expressed and purified by a suitable method, such as expression in E. coli followed by epitope-tag purification). In some cases, the endonuclease can be derived from a bacterium having a 16S rRNA gene that is at least about 90% identical to any one of SEQ ID NO 5592-5595. The endonuclease can be derived from a species having a 16S rRNA gene that is at least about 80%, at least about 82%, at least about 83%, at least about 84%, at least about 85%, at least about 86%, at least about 87%, at least about 88%, at least about 89%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to any one of SEQ ID NO 5592-5595. The endonuclease may be derived from a species having a 16S rRNA gene that is substantially identical to any one of SEQ ID NO 5592-5595. The endonuclease may be derived from a bacterium belonging to the phylum Verrucomicrobia or the phylum Candidatus Peregrinibacteria.

In some cases, sequence identity may be determined by the BLASTP, CLUSTALW, MUSCLE, MAFFT, Novafold, or Smith-Waterman homology search algorithms. Sequence identity can be determined by the BLASTP algorithm using parameters of a word length (W) of 3 and an expectation (E) of 10, with the BLOSUM62 scoring matrix, gap costs set at an existence penalty of 11 and an extension penalty of 1, and a conditional compositional score matrix adjustment.
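As a rough illustration of the identity calculation referred to above, the sketch below computes percent identity from two already-aligned sequences and builds a BLAST+ `blastp` argument list mirroring the stated settings. The file names are hypothetical, and the mapping of "conditional compositional score matrix adjustment" to `-comp_based_stats 2` reflects the editor's understanding of the BLAST+ command line, not a statement from this disclosure.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over a pairwise alignment given as two equal-length
    strings in which '-' marks a gap. Identity is computed as identical
    columns divided by alignment length (gap columns count as mismatches).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == "-" and b == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def blastp_args(query_fasta, subject_fasta):
    """Build a blastp argument list using W=3, E=10, BLOSUM62,
    gap existence 11, gap extension 1, and composition-based
    statistics mode 2 (assumed to correspond to the conditional
    compositional score matrix adjustment)."""
    return ["blastp",
            "-query", query_fasta, "-subject", subject_fasta,
            "-word_size", "3", "-evalue", "10", "-matrix", "BLOSUM62",
            "-gapopen", "11", "-gapextend", "1",
            "-comp_based_stats", "2"]


if __name__ == "__main__":
    print(percent_identity("MKT-LV", "MKSALV"))
    # Hypothetical file names; pass the list to subprocess.run to execute.
    print(" ".join(blastp_args("query.faa", "reference.faa")))
```

The argument list is only constructed here, not executed, so the sketch runs without BLAST+ installed.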

In some cases, the tracr sequence may have a specific sequence. The tracr sequence may have at least about 80% sequence identity to at least about 60-100 (e.g., at least about 60, at least about 65, at least about 70, at least about 75, at least about 80, at least about 85, or at least about 90) consecutive nucleotides of a native tracrRNA sequence. The tracr sequence may have at least about 80% sequence identity to at least about 60-100 (e.g., at least about 60, at least about 65, at least about 70, at least about 75, at least about 80, at least about 85, or at least about 90) consecutive nucleotides of any one of SEQ ID NO 5476-5489. In some cases, the tracrRNA may have at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity to at least about 60-90 (e.g., at least about 60, at least about 65, at least about 70, at least about 75, at least about 80, at least about 85, or at least about 90) consecutive nucleotides of any one of SEQ ID NO 5476-5489. In some cases, the tracrRNA may be substantially identical to at least about 60-100 (e.g., at least about 60, at least about 65, at least about 70, at least about 75, at least about 80, at least about 85, or at least about 90) consecutive nucleotides of any one of SEQ ID NO 5476-5489. The tracrRNA may comprise any one of SEQ ID NO 5476-5489.
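One way to read the "identity over consecutive nucleotides" criterion above is as a sliding-window comparison. The sketch below takes that reading under a strong simplifying assumption: windows are compared position by position without gaps, whereas an alignment tool such as BLAST would allow insertions and deletions. Function names, the default window of 60 and the 80% threshold in the helper are illustrative choices.

```python
def best_window_identity(candidate, reference, window=60):
    """Highest ungapped percent identity between any `window`-nt stretch of
    `reference` and any equally long stretch of `candidate`.

    Returns 0.0 if either sequence is shorter than the window.
    """
    cand = candidate.upper().replace("T", "U")
    ref = reference.upper().replace("T", "U")
    if len(cand) < window or len(ref) < window:
        return 0.0
    best = 0
    for i in range(len(ref) - window + 1):
        ref_win = ref[i:i + window]
        for j in range(len(cand) - window + 1):
            cand_win = cand[j:j + window]
            matches = sum(1 for a, b in zip(ref_win, cand_win) if a == b)
            best = max(best, matches)
    return 100.0 * best / window


def meets_threshold(candidate, reference, window=60, min_identity=80.0):
    """True if some window pair reaches at least `min_identity` percent."""
    return best_window_identity(candidate, reference, window) >= min_identity


if __name__ == "__main__":
    reference = "A" * 60             # made-up sequences
    candidate = "A" * 48 + "C" * 12
    print(best_window_identity(candidate, reference))
    print(meets_threshold(candidate, reference))
```

The nested loops are quadratic in sequence length, which is acceptable for tracrRNA-sized inputs of roughly a hundred nucleotides.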

In some cases, the at least one engineered synthetic guide ribonucleic acid (sgRNA) capable of forming a complex with the endonuclease can comprise a sequence having at least about 80% identity to any one of SEQ ID NOs 5461-5464. The sgRNA can comprise a sequence that is at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to any of SEQ ID NO 5461-5464. The sgRNA may comprise a sequence substantially identical to any of SEQ ID NO 5461-5464.

In some cases, the system described above may comprise two different sgRNAs targeting a first region and a second region for cleavage in a target DNA locus, wherein the second region is located 3' to the first region. In some cases, the above system may comprise a single-stranded or double-stranded DNA repair template comprising, from 5' to 3': a first homology arm comprising a sequence of at least about 20 (e.g., at least about 40, 80, 120, 150, 200, 300, 500, or 1 kb) nucleotides 5' to the first region, a synthetic DNA sequence of at least about 10 nucleotides, and a second homology arm comprising a sequence of at least about 20 (e.g., at least about 40, 80, 120, 150, 200, 300, 500, or 1 kb) nucleotides 3' to the second region.
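The repair template layout described above (5' homology arm, synthetic insert, 3' homology arm) can be sketched as string slicing on a locus sequence. The coordinate convention (0-based, end-exclusive), the function name and the example locus are assumptions made for illustration.

```python
def build_repair_template(locus, first_region_start, second_region_end,
                          insert, arm_len=20):
    """Assemble a repair template, 5' to 3':
    left arm (sequence 5' of the first region) + insert +
    right arm (sequence 3' of the second region).

    `first_region_start` is the 0-based index of the first base of the first
    targeted region; `second_region_end` is one past the last base of the
    second targeted region.
    """
    if len(insert) < 10:
        raise ValueError("insert should be at least about 10 nucleotides")
    if second_region_end <= first_region_start:
        raise ValueError("second region must lie 3' of the first region")
    if first_region_start < arm_len:
        raise ValueError("not enough sequence 5' of the first region")
    if second_region_end + arm_len > len(locus):
        raise ValueError("not enough sequence 3' of the second region")
    left_arm = locus[first_region_start - arm_len:first_region_start]
    right_arm = locus[second_region_end:second_region_end + arm_len]
    return left_arm + insert + right_arm


if __name__ == "__main__":
    # Made-up locus: 30 nt flank, two 10-nt target regions, 30 nt flank.
    locus = "A" * 30 + "C" * 10 + "G" * 10 + "T" * 30
    print(build_repair_template(locus, 30, 50, "ACGTACGTAC"))
```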

In another aspect, the present disclosure provides a method of modifying a target nucleic acid locus. The method can include delivering any of the non-natural systems disclosed herein (including an enzyme disclosed herein and at least one synthetic guide RNA (sgRNA)) to the target nucleic acid locus. The enzyme can form a complex with the at least one sgRNA, and upon binding of the complex to the target nucleic acid locus, can modify the target nucleic acid locus. Delivering the enzyme to the locus can comprise transfecting a cell with the system or a nucleic acid encoding the system. Delivering the nuclease to the locus can comprise electroporating a cell with the system or a nucleic acid encoding the system. Delivering the nuclease to the locus can include incubating the system in a buffer with a nucleic acid comprising the locus of interest. In some cases, the target nucleic acid locus comprises deoxyribonucleic acid (DNA) or ribonucleic acid (RNA). The target nucleic acid locus may comprise genomic DNA, viral RNA, or bacterial DNA. The target nucleic acid locus can be within a cell. The target nucleic acid locus can be in vitro. The target nucleic acid locus can be in a eukaryotic cell or a prokaryotic cell. The cell may be an animal cell, a human cell, a bacterial cell, an archaeal cell, or a plant cell. The enzyme may induce a single-strand or double-strand break at or proximal to the target locus of interest.

Where the target nucleic acid locus can be intracellular, the enzyme can be provided as a nucleic acid comprising an open reading frame encoding an enzyme having a RuvC III domain at least about 75% (e.g., at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%) identical to any of SEQ ID NOs 1827-2140. A deoxyribonucleic acid (DNA) comprising an open reading frame encoding the endonuclease may comprise a sequence substantially identical to any one of SEQ ID NO:5572-5575 or a variant having at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity to any one of SEQ ID NO: 5572-5575. In some cases, the nucleic acid comprises a promoter to which an open reading frame encoding an endonuclease is operably linked. The promoter may be a CMV, EF1a, SV40, PGK1, Ubc, human beta actin, CAG, TRE or CaMKIIa promoter. The endonuclease can be provided as a capped mRNA comprising the open reading frame encoding the endonuclease. The endonuclease may be provided as a translated polypeptide. The at least one engineered sgRNA can be provided as a deoxyribonucleic acid (DNA) comprising a gene sequence encoding the at least one engineered sgRNA operably linked to a ribonucleic acid (RNA) pol III promoter. In some cases, the organism may be a eukaryote. In some cases, the organism may be a fungus. In some cases, the organism may be a human.

In some cases, the present disclosure can provide an expression cassette comprising a system disclosed herein or a nucleic acid described herein. In some cases, the expression cassette or nucleic acid may be provided as a vector. In some cases, an expression cassette, nucleic acid, or vector may be provided in a cell. In some cases, the cell is a bacterial cell having a 16S rRNA gene that is at least about 90% (e.g., at least about 99%) identical to any one of SEQ ID NO 5592-5595.

MG2 enzyme

In one aspect, the present disclosure provides an engineered nuclease system comprising (a) an endonuclease. In some cases, the endonuclease is a Cas endonuclease. In some cases, the endonuclease is a type II, class II Cas endonuclease. The endonuclease can comprise a RuvC _ III domain, wherein the RuvC _ III domain has at least about 70% sequence identity to any one of SEQ ID NO 2141-2241. In some cases, an endonuclease can comprise a RuvC _ III domain, wherein the RuvC _ III domain has at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99% identity to any one of SEQ ID NO 2141-2241. In some cases, the endonuclease can comprise a RuvC _ III domain that is substantially identical to any one of SEQ ID NOs 2141-2142. The endonuclease may comprise a RuvC _ III domain having at least about 70% sequence identity to any one of SEQ ID NOs: 2141-2142. In some cases, an endonuclease can comprise a RuvC _ III domain that is at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99% identical to any of SEQ ID NO 2141-2142. 
In some cases, the endonuclease may comprise a RuvC _ III domain that is substantially identical to any one of SEQ ID NOs 2141-2142.

The endonuclease may comprise an HNH domain that is at least about 70% identical to any one of SEQ ID NO 3955-4055. In some cases, the endonuclease can comprise an HNH domain that is at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to any one of SEQ ID NO 3955-4055. The endonuclease may comprise an HNH domain substantially identical to any one of SEQ ID NO 3955-4055. The endonuclease may comprise an HNH domain that is at least about 70% identical to any one of SEQ ID NO 3955-3956. In some cases, the endonuclease can comprise an HNH domain that is at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to any one of SEQ ID NO 3955-3956. The endonuclease may comprise an HNH domain substantially identical to any one of SEQ ID NO 3955-3956.

In some cases, the endonuclease can comprise a variant that is at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to any of SEQ ID NO 320-420. In some cases, the endonuclease can be substantially identical to any one of SEQ ID NOs 320-420. In some cases, the endonuclease can comprise a variant that is at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to any of SEQ ID NO 320-321. In some cases, the endonuclease can be substantially identical to any of SEQ ID NO 320-321.

In some cases, the endonuclease can comprise a variant having one or more nuclear localization sequences (NLSs). The NLS can be proximal to the N-terminus or C-terminus of the endonuclease. The NLS can be attached to the N-terminus or C-terminus of any one of SEQ ID NO 320-420, or of a variant having at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity to any one of SEQ ID NO 320-420. The NLS can be an SV40 large T antigen NLS. The NLS can be a c-myc NLS. The NLS may comprise a sequence having at least about 80%, at least about 85%, at least about 90%, at least about 95%, or at least about 99% identity to any one of SEQ ID NO 5593-5608. The NLS may comprise a sequence substantially identical to any one of SEQ ID NO 5593-. The NLS may comprise any of the sequences in Table 1 or a combination thereof.

In some cases, sequence identity may be determined by the BLASTP, CLUSTALW, MUSCLE, MAFFT, Novafold, or Smith-Waterman homology search algorithms. Sequence identity can be determined by the BLASTP algorithm using parameters of a word length (W) of 3 and an expectation (E) of 10, with the BLOSUM62 scoring matrix, gap costs set at an existence penalty of 11 and an extension penalty of 1, and a conditional compositional score matrix adjustment.

In some cases, the system can comprise (b) at least one engineered synthetic guide ribonucleic acid (sgRNA) capable of forming a complex with an endonuclease, with a 5' targeting region complementary to a desired cleavage sequence. In some cases, the 5' targeting region may comprise a PAM sequence compatible with the endonuclease. In some cases, the most 5' nucleotide of the targeting region may be G. In some cases, the 5' targeting region can be 15-23 nucleotides in length. The guide sequence and tracr sequence may be provided as separate ribonucleic acids (RNAs) or a single ribonucleic acid (RNA). The guide RNA may comprise a crRNA tracrRNA binding sequence 3' of the targeting region. The guide RNA may comprise a tracrRNA sequence preceded by a 4-nucleotide linker 3' to the tracrRNA binding region of the crRNA. The sgRNA can comprise, from 5 'to 3': a non-native guide nucleic acid sequence capable of hybridizing to a target sequence in a cell; and tracr sequences. In some cases, the non-native guide nucleic acid sequence and tracr sequence are covalently linked.
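The constraints on the 5' targeting region stated above (a length of 15-23 nucleotides and, in some cases, G as the most 5' nucleotide) lend themselves to a simple validation helper. The sketch below is illustrative only; the function name and the choice to return a list of problems rather than raise an error are assumptions.

```python
def check_targeting_region(targeting, min_len=15, max_len=23,
                           require_5p_g=True):
    """Return a list of human-readable problems with a candidate 5' targeting
    region; an empty list means all checks passed.

    Input may be written as DNA or RNA; T is converted to U.
    """
    seq = targeting.upper().replace("T", "U")
    problems = []
    if set(seq) - set("ACGU"):
        problems.append("contains characters other than A, C, G, U/T")
    if not min_len <= len(seq) <= max_len:
        problems.append("length %d is outside %d-%d nt"
                        % (len(seq), min_len, max_len))
    if require_5p_g and not seq.startswith("G"):
        problems.append("most 5' nucleotide is not G")
    return problems


if __name__ == "__main__":
    # Made-up 20-nt targeting sequence beginning with G.
    print(check_targeting_region("GACGUACGUACGUACGUACG"))
    print(check_targeting_region("ACGUACGUACGU"))
```

Setting `require_5p_g=False` skips the 5' G check for cases where that preference does not apply.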

In some cases, the tracr sequence may have a specific sequence. The tracr sequence may have at least about 80% sequence identity to at least about 60-100 (e.g., at least about 60, at least about 65, at least about 70, at least about 75, at least about 80, at least about 85, or at least about 90) consecutive nucleotides of a native tracrRNA sequence. The tracr sequence may have at least about 80% sequence identity to at least about 60-100 (e.g., at least about 60, at least about 65, at least about 70, at least about 75, at least about 80, at least about 85, or at least about 90) consecutive nucleotides of any one of SEQ ID NO 5490-5494. In some cases, the tracrRNA may be at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to at least about 60-90 (e.g., at least about 60, at least about 65, at least about 70, at least about 75, at least about 80, at least about 85, or at least about 90) consecutive nucleotides of any one of SEQ ID NO 5490-5494. In some cases, the tracrRNA may be substantially identical to at least about 60-100 (e.g., at least about 60, at least about 65, at least about 70, at least about 75, at least about 80, at least about 85, or at least about 90) consecutive nucleotides of any one of SEQ ID NO 5490-5494. The tracrRNA may comprise any one of SEQ ID NO 5490-5494.

In some cases, at least one engineered synthetic guide ribonucleic acid (sgRNA) capable of forming a complex with the endonuclease can comprise a sequence having at least about 80% identity to SEQ ID NO: 5465. The sgRNA can comprise a sequence having at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity to SEQ ID NO: 5465. The sgRNA can comprise a sequence substantially identical to SEQ ID NO: 5465.

In some cases, the system can comprise two different sgRNAs that target a first region and a second region for cleavage in a target DNA locus, wherein the second region is located 3' to the first region. In some cases, the above system may comprise a single-stranded or double-stranded DNA repair template comprising, from 5' to 3': a first homology arm comprising a sequence of at least about 20 (e.g., at least about 40, 80, 120, 150, 200, 300, 500, or 1 kb) nucleotides 5' to the first region, a synthetic DNA sequence of at least about 10 nucleotides, and a second homology arm comprising a sequence of at least about 20 (e.g., at least about 40, 80, 120, 150, 200, 300, 500, or 1 kb) nucleotides 3' to the second region.
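The repair template layout described above can be illustrated with a short Python sketch. It assumes, for illustration only, that the two regions are given as 0-based half-open coordinates on one strand of the locus and that each homology arm is taken immediately adjacent to its region; the function name, parameters, and default arm length are hypothetical.

```python
def build_repair_template(locus: str, first_region: tuple, second_region: tuple,
                          insert: str, arm_len: int = 40) -> str:
    """Return first homology arm + synthetic insert + second homology arm, 5' to 3'.

    Regions are (start, end) half-open, 0-based coordinates on `locus`;
    the second region must lie 3' of the first.
    """
    f_start, f_end = first_region
    s_start, s_end = second_region
    if not (0 <= f_start < f_end <= s_start < s_end <= len(locus)):
        raise ValueError("second region must lie 3' of the first, within the locus")
    if arm_len < 20:
        raise ValueError("homology arms should be at least about 20 nucleotides")
    if len(insert) < 10:
        raise ValueError("synthetic sequence should be at least about 10 nucleotides")
    if f_start < arm_len or len(locus) - s_end < arm_len:
        raise ValueError("locus too short to supply both homology arms")
    first_arm = locus[f_start - arm_len:f_start]   # sequence 5' to the first region
    second_arm = locus[s_end:s_end + arm_len]      # sequence 3' to the second region
    return first_arm + insert + second_arm
```

In this sketch the sequence between the two regions is replaced by the synthetic insert; other placements of the arms relative to the cleavage sites would also be consistent with the description.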

In another aspect, the present disclosure provides a method of modifying a target nucleic acid locus of interest. The methods can include delivering any of the non-natural systems disclosed herein (including the enzymes disclosed herein and at least one synthetic guide RNA (sgRNA)) to a target nucleic acid locus. The enzyme can form a complex with at least one sgRNA and can modify a target nucleic acid locus of interest upon binding of the complex to the target nucleic acid locus of interest. Delivering an enzyme to the locus can comprise transfecting a cell with the system or a nucleic acid encoding the system. Delivering a nuclease to the locus can comprise electroporating the cell with the system or a nucleic acid encoding the system. Delivering the nuclease to the locus can include incubating the system in a buffer with a nucleic acid comprising the locus of interest. In some cases, the target nucleic acid locus comprises deoxyribonucleic acid (DNA) or ribonucleic acid (RNA). The target nucleic acid locus may comprise genomic DNA, viral RNA, or bacterial DNA. The target nucleic acid locus can be within a cell. The target nucleic acid locus can be in vitro. The target nucleic acid locus can be in a eukaryotic cell or a prokaryotic cell. The cell may be an animal cell, a human cell, a bacterial cell, an archaeal cell, or a plant cell. The enzyme may induce a single-strand or double-strand break at or proximal to the target locus of interest.

Where the target nucleic acid locus is intracellular, the enzyme can be provided as a nucleic acid comprising an open reading frame encoding an enzyme having a RuvC_III domain at least about 75% (e.g., at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99%) identical to any one of SEQ ID NO 2141-2241. A deoxyribonucleic acid (DNA) comprising an open reading frame encoding the endonuclease may comprise a sequence substantially identical to any of SEQ ID NO: 5576-5577 or a variant having at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity to any of SEQ ID NO: 5576-5577. In some cases, the nucleic acid comprises a promoter to which an open reading frame encoding an endonuclease is operably linked. The promoter may be a CMV, EF1a, SV40, PGK1, Ubc, human beta actin, CAG, TRE or CaMKIIa promoter. The endonuclease can be provided as a capped mRNA comprising the open reading frame encoding the endonuclease. The endonuclease may be provided as a translated polypeptide. The at least one engineered sgRNA can be provided as a deoxyribonucleic acid (DNA) comprising a gene sequence encoding the at least one engineered sgRNA operably linked to a ribonucleic acid (RNA) pol III promoter. In some cases, the organism may be a eukaryote. In some cases, the organism may be a fungus. In some cases, the organism may be a human.

MG3 enzyme

In one aspect, the present disclosure provides an engineered nuclease system comprising (a) an endonuclease. In some cases, the endonuclease is a Cas endonuclease. In some cases, the endonuclease is a class 2, type II Cas endonuclease. The endonuclease can comprise a RuvC_III domain, wherein the RuvC_III domain has at least about 70% sequence identity to any one of SEQ ID NO 2242-2251. In some cases, the endonuclease can comprise a RuvC_III domain, wherein the RuvC_III domain has at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity to any one of SEQ ID NO 2242-2251. In some cases, the endonuclease can comprise a RuvC_III domain that is substantially identical to any one of SEQ ID NO 2242-2251. The endonuclease may comprise a RuvC_III domain having at least about 70% sequence identity to any one of SEQ ID NO 2242-2244. In some cases, the endonuclease can comprise a RuvC_III domain that is at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to any of SEQ ID NO 2242-2244. In some cases, the endonuclease can comprise a RuvC_III domain that is substantially identical to any one of SEQ ID NO 2242-2244.

The endonuclease may comprise an HNH domain that is at least about 70% identical to any of SEQ ID NO 4056-4066. In some cases, the endonuclease can comprise an HNH domain that is at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to any one of SEQ ID NO 4056-4066. The endonuclease may comprise an HNH domain substantially identical to any one of SEQ ID NO 4056-4066. The endonuclease may comprise an HNH domain that is at least about 70% identical to any one of SEQ ID NO 4056-4058. In some cases, the endonuclease can comprise an HNH domain that is at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to any one of SEQ ID NO 4056-4058. The endonuclease may comprise an HNH domain substantially identical to any one of SEQ ID NO 4056-4058.

In some cases, the endonuclease can comprise a variant that is at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to any of SEQ ID NO 421-431. In some cases, the endonuclease can be substantially identical to any of SEQ ID NO 421-431. In some cases, the endonuclease can comprise a variant that is at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to any of SEQ ID NO 421-423. In some cases, the endonuclease can be substantially identical to any of SEQ ID NO 421-423.

In some cases, the endonuclease can comprise a variant having one or more Nuclear Localization Sequences (NLS). The NLS can be proximal to the N-terminus or C-terminus of the endonuclease. The NLS can be attached to the N-terminus or C-terminus of any of SEQ ID NO 421-431 or to the N-terminus or C-terminus of a variant having at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity to any of SEQ ID NO 421-431. The NLS can be a SV40 large T antigen NLS. The NLS can be a c-myc NLS. The NLS may comprise a sequence having at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 99% identity to any of SEQ ID NO 5593-5608. The NLS may comprise a sequence substantially identical to any one of SEQ ID NO: 5593-. The NLS may comprise any of the sequences in table 1 or a combination thereof:

In some cases, sequence identity may be determined by the BLASTP, CLUSTALW, MUSCLE, MAFFT, Novafold, or Smith-Waterman homology search algorithms. Sequence identity can be determined by the BLASTP algorithm using a word length (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix, with gap costs set at an existence penalty of 11 and an extension penalty of 1, and using conditional compositional score matrix adjustment.
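To illustrate how a percent identity value can be obtained from a local alignment, the following Python is a minimal Smith-Waterman sketch. It is a simplification and not the BLASTP procedure: it assumes a flat match/mismatch score and a linear gap penalty in place of the BLOSUM62 matrix and the existence/extension gap costs, and it reports identity over the columns of the single best local alignment. The function name and scoring defaults are hypothetical.

```python
def smith_waterman_identity(a: str, b: str, match: int = 2,
                            mismatch: int = -1, gap: int = -2) -> float:
    """Percent identity over the best local alignment of a and b.

    Simplified scoring (flat match/mismatch, linear gap) is an assumption
    of this sketch, not the BLOSUM62/affine-gap scheme used by BLASTP.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    # Fill the dynamic-programming matrix, tracking the highest-scoring cell.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            score[i][j] = cell
            if cell > best:
                best, best_pos = cell, (i, j)
    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    identical = columns = 0
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            identical += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1   # gap in b
        else:
            j -= 1   # gap in a
        columns += 1
    return 100.0 * identical / columns if columns else 0.0
```

Because identity is computed only over the locally aligned columns, a short, perfectly matching stretch can score 100%; in practice an identity threshold would be paired with a minimum aligned length or coverage requirement.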

In some cases, the system can comprise (b) at least one engineered synthetic guide ribonucleic acid (sgRNA) capable of forming a complex with an endonuclease, with a 5' targeting region complementary to a desired cleavage sequence. In some cases, the 5' targeting region may comprise a PAM sequence compatible with the endonuclease. In some cases, the most 5' nucleotide of the targeting region may be G. In some cases, the 5' targeting region can be 15-23 nucleotides in length. The guide sequence and tracr sequence may be provided as separate ribonucleic acids (RNAs) or a single ribonucleic acid (RNA). The guide RNA may comprise a crRNA tracrRNA-binding sequence 3' of the targeting region. The guide RNA may comprise a tracrRNA sequence preceded by a 4-nucleotide linker 3' to the tracrRNA-binding region of the crRNA. The sgRNA can comprise, from 5' to 3': a non-native guide nucleic acid sequence capable of hybridizing to a target sequence in a cell; and a tracr sequence. In some cases, the non-native guide nucleic acid sequence and tracr sequence are covalently linked.

In some cases, the tracr sequence may have a specific sequence. The tracr sequence may have at least about 80% identity to at least about 60-100 (e.g., at least about 60, at least about 65, at least about 70, at least about 75, at least about 80, at least about 85, or at least about 90) consecutive nucleotides of the native tracrRNA sequence. The tracr sequence may have at least about 80% sequence identity to at least about 60-100 (e.g., at least about 60, at least about 65, at least about 70, at least about 75, at least about 80, at least about 85, or at least about 90) consecutive nucleotides of any one of SEQ ID NO 5495-5502. In some cases, the tracrRNA may be at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to at least about 60-90 (e.g., at least about 60, at least about 65, at least about 70, at least about 75, at least about 80, at least about 85, or at least about 90) consecutive nucleotides of any of SEQ ID NO 5495-5502. In some cases, the tracrRNA may be substantially identical to at least about 60-100 (e.g., at least about 60, at least about 65, at least about 70, at least about 75, at least about 80, at least about 85, or at least about 90) consecutive nucleotides of any one of SEQ ID NO 5495-5502. The tracrRNA may comprise any one of SEQ ID NO 5495-5502.

In some cases, the at least one engineered synthetic guide ribonucleic acid (sgRNA) capable of forming a complex with an endonuclease can comprise a sequence having at least about 80% identity to any one of SEQ ID NO: 5466-5467. The sgRNA can comprise a sequence that is at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to any of SEQ ID NO 5466-5467. The sgRNA may comprise a sequence that is substantially identical to any of SEQ ID NO 5466-5467.

In some cases, the system can comprise two different sgRNAs that target a first region and a second region for cleavage in a target DNA locus, wherein the second region is located 3' to the first region. In some cases, the above system may comprise a single-stranded or double-stranded DNA repair template comprising, from 5' to 3': a first homology arm comprising a sequence of at least about 20 (e.g., at least about 40, 80, 120, 150, 200, 300, 500, or 1 kb) nucleotides 5' to the first region, a synthetic DNA sequence of at least about 10 nucleotides, and a second homology arm comprising a sequence of at least about 20 (e.g., at least about 40, 80, 120, 150, 200, 300, 500, or 1 kb) nucleotides 3' to the second region.

In another aspect, the present disclosure provides a method of modifying a target nucleic acid locus of interest. The methods can include delivering any of the non-natural systems disclosed herein (including the enzymes disclosed herein and at least one synthetic guide RNA (sgRNA)) to a target nucleic acid locus. The enzyme can form a complex with at least one sgRNA and can modify a target nucleic acid locus of interest upon binding of the complex to the target nucleic acid locus of interest. Delivering an enzyme to the locus can comprise transfecting a cell with the system or a nucleic acid encoding the system. Delivering a nuclease to the locus can comprise electroporating the cell with the system or a nucleic acid encoding the system. Delivering the nuclease to the locus can include incubating the system in a buffer with a nucleic acid comprising the locus of interest. In some cases, the target nucleic acid locus comprises deoxyribonucleic acid (DNA) or ribonucleic acid (RNA). The target nucleic acid locus may comprise genomic DNA, viral RNA, or bacterial DNA. The target nucleic acid locus can be within a cell. The target nucleic acid locus can be in vitro. The target nucleic acid locus can be in a eukaryotic cell or a prokaryotic cell. The cell may be an animal cell, a human cell, a bacterial cell, an archaeal cell, or a plant cell. The enzyme may induce a single-strand or double-strand break at or proximal to the target locus of interest.

Where the target nucleic acid locus is intracellular, the enzyme can be provided as a nucleic acid comprising an open reading frame encoding an enzyme having a RuvC_III domain at least about 75% (e.g., at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99%) identical to any of SEQ ID NO 2242-2251. A deoxyribonucleic acid (DNA) comprising an open reading frame encoding the endonuclease may comprise a sequence substantially identical to any of SEQ ID NO 5578-5580 or a variant having at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity to any of SEQ ID NO 5578-5580. In some cases, the nucleic acid comprises a promoter to which an open reading frame encoding an endonuclease is operably linked. The promoter may be a CMV, EF1a, SV40, PGK1, Ubc, human beta actin, CAG, TRE or CaMKIIa promoter. The endonuclease can be provided as a capped mRNA comprising the open reading frame encoding the endonuclease. The endonuclease may be provided as a translated polypeptide. The at least one engineered sgRNA can be provided as a deoxyribonucleic acid (DNA) comprising a gene sequence encoding the at least one engineered sgRNA operably linked to a ribonucleic acid (RNA) pol III promoter. In some cases, the organism may be a eukaryote. In some cases, the organism may be a fungus. In some cases, the organism may be a human.

MG4 enzyme

In one aspect, the present disclosure provides an engineered nuclease system comprising (a) an endonuclease. In some cases, the endonuclease is a Cas endonuclease. In some cases, the endonuclease is a class 2, type II Cas endonuclease. The endonuclease can comprise a RuvC_III domain, wherein the RuvC_III domain has at least about 70% sequence identity to any one of SEQ ID NO 2253-2481. In some cases, the endonuclease can comprise a RuvC_III domain, wherein the RuvC_III domain has at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity to any one of SEQ ID NO 2253-2481. In some cases, the endonuclease can comprise a RuvC_III domain that is substantially identical to any one of SEQ ID NO 2253-2481. The endonuclease can comprise a RuvC_III domain having at least about 70% sequence identity to any one of SEQ ID NO 2253-2481. In some cases, the endonuclease can comprise a RuvC_III domain that is at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to any of SEQ ID NO 2253-2481. In some cases, the endonuclease can comprise a RuvC_III domain that is substantially identical to any one of SEQ ID NO 2253-2481.

The endonuclease may comprise an HNH domain that is at least about 70% identical to any one of SEQ ID NO 4067-4295. In some cases, the endonuclease can comprise an HNH domain that is at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to any one of SEQ ID NO 4067-4295. The endonuclease may comprise an HNH domain substantially identical to any one of SEQ ID NO 4067-4295. The endonuclease may comprise an HNH domain that is at least about 70% identical to any one of SEQ ID NO 4067-4295. In some cases, the endonuclease can comprise an HNH domain that is at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to any one of SEQ ID NO 4067-4295. The endonuclease may comprise an HNH domain substantially identical to any one of SEQ ID NO 4067-4295.

In some cases, the endonuclease can comprise a variant that is at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to any of SEQ ID NO 432-660. In some cases, the endonuclease can be substantially identical to any of SEQ ID NO 432-660. In some cases, the endonuclease can comprise a variant that is at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to any of SEQ ID NO 432-660. In some cases, the endonuclease can be substantially identical to any of SEQ ID NO 432-660.

In some cases, the endonuclease can comprise a variant having one or more Nuclear Localization Sequences (NLS). The NLS can be proximal to the N-terminus or C-terminus of the endonuclease. The NLS can be attached to the N-terminus or C-terminus of any of SEQ ID NO 432-660 or to the N-terminus or C-terminus of a variant having at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity to any of SEQ ID NO 432-660. The NLS can be a SV40 large T antigen NLS. The NLS can be a c-myc NLS. The NLS may comprise a sequence having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% identity to any of SEQ ID NO 5593-5608. The NLS may comprise a sequence substantially identical to any one of SEQ ID NO: 5593-. The NLS may comprise any of the sequences in Table 1 or a combination thereof:

In some cases, sequence identity may be determined by the BLASTP, CLUSTALW, MUSCLE, MAFFT, Novafold, or Smith-Waterman homology search algorithms. Sequence identity can be determined by the BLASTP algorithm using a word length (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix, with gap costs set at an existence penalty of 11 and an extension penalty of 1, and using conditional compositional score matrix adjustment.

In some cases, the system can comprise (b) at least one engineered synthetic guide ribonucleic acid (sgRNA) capable of forming a complex with an endonuclease, with a 5' targeting region complementary to a desired cleavage sequence. In some cases, the 5' targeting region may comprise a PAM sequence compatible with the endonuclease. In some cases, the most 5' nucleotide of the targeting region may be G. In some cases, the 5' targeting region can be 15-23 nucleotides in length. The guide sequence and tracr sequence may be provided as separate ribonucleic acids (RNAs) or a single ribonucleic acid (RNA). The guide RNA may comprise a crRNA tracrRNA-binding sequence 3' of the targeting region. The guide RNA may comprise a tracrRNA sequence preceded by a 4-nucleotide linker 3' to the tracrRNA-binding region of the crRNA. The sgRNA can comprise, from 5' to 3': a non-native guide nucleic acid sequence capable of hybridizing to a target sequence in a cell; and a tracr sequence. In some cases, the non-native guide nucleic acid sequence and tracr sequence are covalently linked.

In some cases, the tracr sequence may have a specific sequence. The tracr sequence may have at least about 80% identity to at least about 60-100 (e.g., at least about 60, at least about 65, at least about 70, at least about 75, at least about 80, at least about 85, or at least about 90) consecutive nucleotides of the native tracrRNA sequence. The tracr sequence may have at least about 80% identity to at least about 60-100 (e.g., at least about 60, at least about 65, at least about 70, at least about 75, at least about 80, at least about 85, or at least about 90) consecutive nucleotides of SEQ ID NO: 5503. In some cases, a tracrRNA may have at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity to at least about 60-90 (e.g., at least about 60, at least about 65, at least about 70, at least about 75, at least about 80, at least about 85, or at least about 90) consecutive nucleotides of SEQ ID NO: 5503. In some cases, the tracrRNA can be substantially identical to at least about 60-90 (e.g., at least about 60, at least about 65, at least about 70, at least about 75, at least about 80, at least about 85, or at least about 90) consecutive nucleotides of SEQ ID NO: 5503. The tracrRNA may comprise SEQ ID NO: 5503.

In some cases, at least one engineered synthetic guide ribonucleic acid (sgRNA) capable of forming a complex with the endonuclease can comprise a sequence having at least about 80% identity to SEQ ID NO: 5468. The sgRNA can comprise a sequence having at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity to SEQ ID NO: 5468. The sgRNA can comprise a sequence substantially identical to SEQ ID NO: 5468.

In some cases, the system can comprise two different sgRNAs that target a first region and a second region for cleavage in a target DNA locus, wherein the second region is located 3' to the first region. In some cases, the above system may comprise a single-stranded or double-stranded DNA repair template comprising, from 5' to 3': a first homology arm comprising a sequence of at least about 20 (e.g., at least about 40, 80, 120, 150, 200, 300, 500, or 1 kb) nucleotides 5' to the first region, a synthetic DNA sequence of at least about 10 nucleotides, and a second homology arm comprising a sequence of at least about 20 (e.g., at least about 40, 80, 120, 150, 200, 300, 500, or 1 kb) nucleotides 3' to the second region.

In another aspect, the present disclosure provides a method of modifying a target nucleic acid locus of interest. The methods can include delivering any of the non-natural systems disclosed herein (including the enzymes disclosed herein and at least one synthetic guide RNA (sgRNA)) to a target nucleic acid locus. The enzyme can form a complex with at least one sgRNA and can modify a target nucleic acid locus of interest upon binding of the complex to the target nucleic acid locus of interest. Delivering an enzyme to the locus can comprise transfecting a cell with the system or a nucleic acid encoding the system. Delivering a nuclease to the locus can comprise electroporating the cell with the system or a nucleic acid encoding the system. Delivering the nuclease to the locus can include incubating the system in a buffer with a nucleic acid comprising the locus of interest. In some cases, the target nucleic acid locus comprises deoxyribonucleic acid (DNA) or ribonucleic acid (RNA). The target nucleic acid locus may comprise genomic DNA, viral RNA, or bacterial DNA. The target nucleic acid locus can be within a cell. The target nucleic acid locus can be in vitro. The target nucleic acid locus can be in a eukaryotic cell or a prokaryotic cell. The cell may be an animal cell, a human cell, a bacterial cell, an archaeal cell, or a plant cell. The enzyme may induce a single-strand or double-strand break at or proximal to the target locus of interest.

Where the target nucleic acid locus is intracellular, the enzyme can be provided as a nucleic acid comprising an open reading frame encoding an enzyme having a RuvC_III domain at least about 75% (e.g., at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99%) identical to any of SEQ ID NO 2253-2481. In some cases, the nucleic acid comprises a promoter to which an open reading frame encoding an endonuclease is operably linked. The promoter may be a CMV, EF1a, SV40, PGK1, Ubc, human beta actin, CAG, TRE or CaMKIIa promoter. The endonuclease can be provided as a capped mRNA comprising the open reading frame encoding the endonuclease. The endonuclease may be provided as a translated polypeptide. The at least one engineered sgRNA can be provided as a deoxyribonucleic acid (DNA) comprising a gene sequence encoding the at least one engineered sgRNA operably linked to a ribonucleic acid (RNA) pol III promoter. In some cases, the organism may be a eukaryote. In some cases, the organism may be a fungus. In some cases, the organism may be a human.

MG6 enzyme

In one aspect, the present disclosure provides an engineered nuclease system comprising (a) an endonuclease. In some cases, the endonuclease is a Cas endonuclease. In some cases, the endonuclease is a class 2, type II Cas endonuclease. The endonuclease can comprise a RuvC_III domain, wherein the RuvC_III domain has at least about 70% sequence identity to any one of SEQ ID NO: 2482-2489. In some cases, the endonuclease can comprise a RuvC_III domain, wherein the RuvC_III domain has at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity to any one of SEQ ID NO: 2482-2489. In some cases, the endonuclease can comprise a RuvC_III domain, wherein the RuvC_III domain is substantially identical to any one of SEQ ID NO: 2482-2489.

The endonuclease may comprise an HNH domain that is at least about 70% identical to any of SEQ ID NO 4296-4303. In some cases, the endonuclease can comprise an HNH domain that is at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to any one of SEQ ID NO 4296-4303. The endonuclease may comprise an HNH domain substantially identical to any one of SEQ ID NO 4296-4303.

In some cases, the endonuclease may comprise a variant that is at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to any of SEQ ID NO 661-668. In some cases, the endonuclease may be substantially identical to any one of SEQ ID NO 661-668.

In some cases, the endonuclease can comprise a variant having one or more Nuclear Localization Sequences (NLS). The NLS can be proximal to the N-terminus or C-terminus of the endonuclease. NLS can be attached to the N-terminus or C-terminus of any of SEQ ID NO 661-668 or a variant having at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity to SEQ ID NO 661-668. The NLS can be a SV40 large T antigen NLS. The NLS can be a c-myc NLS. The NLS may comprise a sequence having at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 99% identity to any of SEQ ID NO 5593 and 5608. The NLS may comprise a sequence substantially identical to any one of SEQ ID NO: 5593-. The NLS may comprise any of the sequences in table 1, or a combination thereof:

In some cases, sequence identity may be determined by the BLASTP, CLUSTALW, MUSCLE, MAFFT, Novafold, or Smith-Waterman homology search algorithm. Sequence identity can be determined by the BLASTP algorithm using a word length (W) of 3 and an expectation (E) of 10, with the BLOSUM62 scoring matrix, gap costs of 11 for existence and 1 for extension, and a conditional compositional score matrix adjustment.
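By way of illustration only, the BLASTP parameters recited above may be expressed as an NCBI BLAST+ `blastp` invocation. The following is a minimal sketch, assuming a local BLAST+ installation; the file names are hypothetical placeholders, and the mapping of the conditional compositional score matrix adjustment to `-comp_based_stats 2` is an assumption about how the recited setting would be chosen on the command line. The sketch only builds and prints the argument list and does not run the search.

```python
import shlex


def blastp_command(query_fasta, subject_fasta, out_path):
    """Return a blastp argument list using the parameters recited in the text."""
    return [
        "blastp",
        "-query", query_fasta,        # hypothetical candidate protein FASTA
        "-subject", subject_fasta,    # hypothetical reference protein FASTA
        "-word_size", "3",            # word length (W) of 3
        "-evalue", "10",              # expectation (E) of 10
        "-matrix", "BLOSUM62",        # BLOSUM62 scoring matrix
        "-gapopen", "11",             # gap existence cost of 11
        "-gapextend", "1",            # gap extension cost of 1
        "-comp_based_stats", "2",     # assumed: conditional compositional adjustment
        "-outfmt", "6 qseqid sseqid pident length evalue bitscore",
        "-out", out_path,
    ]


cmd = blastp_command("candidate.faa", "reference.faa", "hits.tsv")
print(shlex.join(cmd))
```

In such a sketch, the `pident` column of the tabular output would report percent identity over the aligned region, which may then be compared against the identity thresholds recited herein.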

In some cases, the system can comprise (b) at least one engineered synthetic guide ribonucleic acid (sgRNA) capable of forming a complex with an endonuclease, with a 5' targeting region complementary to a desired cleavage sequence. In some cases, the 5' targeting region may comprise a PAM sequence compatible with the endonuclease. In some cases, the most 5' nucleotide of the targeting region may be G. In some cases, the 5' targeting region can be 15-23 nucleotides in length. The guide sequence and tracr sequence may be provided as separate ribonucleic acids (RNAs) or a single ribonucleic acid (RNA). The guide RNA may comprise a crRNA tracrRNA binding sequence 3' of the targeting region. The guide RNA may comprise a tracrRNA sequence preceded by a 4-nucleotide linker 3' to the tracrRNA binding region of the crRNA. The sgRNA can comprise, from 5 'to 3': a non-native guide nucleic acid sequence capable of hybridizing to a target sequence in a cell; and tracr sequences. In some cases, the non-native guide nucleic acid sequence and tracr sequence are covalently linked.
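As a non-limiting illustration of the 5' to 3' arrangement described above, the following sketch concatenates a targeting region, a crRNA tracrRNA-binding sequence, a 4-nucleotide linker, and a tracr sequence into a single guide RNA string. The function name, the example sequences, and the default `GAAA` linker are hypothetical placeholders and are not sequences of the present disclosure; the optional check for a 5' G reflects only the "in some cases" language above.

```python
VALID_RNA = set("ACGU")

# Hypothetical placeholder sequences, for illustration only.
EXAMPLE_REPEAT = "GUUUUAGAGCUA"
EXAMPLE_TRACR = "UAGCAAGUUAAAAU"


def build_sgrna(targeting, crrna_repeat, tracr, linker="GAAA", require_5prime_g=False):
    """Assemble an sgRNA 5'->3': targeting region, crRNA repeat, linker, tracr."""
    parts = {"targeting": targeting, "crrna_repeat": crrna_repeat,
             "linker": linker, "tracr": tracr}
    for name, seq in parts.items():
        if not seq or set(seq) - VALID_RNA:
            raise ValueError(f"{name} must be a non-empty RNA string (A, C, G, U)")
    if not 15 <= len(targeting) <= 23:
        raise ValueError("targeting region must be 15-23 nucleotides long")
    if len(linker) != 4:
        raise ValueError("linker must be exactly 4 nucleotides")
    if require_5prime_g and targeting[0] != "G":
        raise ValueError("most 5' nucleotide of the targeting region must be G")
    return targeting + crrna_repeat + linker + tracr


targeting = "G" + "ACGU" * 4 + "AC"  # 19-nt hypothetical targeting region
sgrna = build_sgrna(targeting, EXAMPLE_REPEAT, EXAMPLE_TRACR, require_5prime_g=True)
print(len(sgrna), sgrna)
```

Where the guide sequence and tracr sequence are instead provided as separate RNAs, the linker would be omitted and the two molecules would be handled as distinct strings.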

In some cases, the tracr sequence may have a specific sequence. The tracr sequence may have at least about 80% sequence identity to at least about 60-100 (e.g., at least about 60, at least about 65, at least about 70, at least about 75, at least about 80, at least about 85, or at least about 90) consecutive nucleotides of the native tracrRNA sequence.

In some cases, the above system may comprise two different guide RNAs that target a first region and a second region for cleavage in a target DNA locus, wherein the second region is located 3' to the first region. In some cases, the above system may comprise a single-stranded or double-stranded DNA repair template comprising, from 5 'to 3': a first homology arm comprising a sequence of at least about 20 (e.g., at least about 40, 80, 120, 150, 200, 300, 500, or 1kb) nucleotides 5 'to the first region, a synthetic DNA sequence of at least about 10 nucleotides, and a second homology arm comprising a sequence of at least about 20 (e.g., at least about 40, 80, 120, 150, 200, 300, 500, or 1kb) nucleotides 3' to the second region.
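The repair template layout described above may be sketched as follows, purely as an example. The function name, the coordinate convention (0-based, half-open intervals on a single strand of the locus), and the test sequences are assumptions introduced for illustration; the minimum lengths of about 20 nucleotides per homology arm and about 10 nucleotides for the synthetic DNA sequence are taken from the text.

```python
def build_repair_template(locus, first_region, second_region, insert, arm_len=40):
    """Return first homology arm + synthetic insert + second homology arm (5'->3').

    first_region and second_region are (start, end) intervals on `locus`,
    0-based and half-open, with the second region located 3' of the first.
    """
    f_start, f_end = first_region
    s_start, s_end = second_region
    if not (0 <= f_start < f_end <= s_start < s_end <= len(locus)):
        raise ValueError("regions must be ordered 5'->3' and lie within the locus")
    if arm_len < 20:
        raise ValueError("homology arms must be at least about 20 nucleotides")
    if len(insert) < 10:
        raise ValueError("synthetic DNA sequence must be at least about 10 nucleotides")
    if f_start < arm_len or len(locus) - s_end < arm_len:
        raise ValueError("locus does not contain enough flanking sequence for the arms")
    first_arm = locus[f_start - arm_len:f_start]   # sequence 5' to the first region
    second_arm = locus[s_end:s_end + arm_len]      # sequence 3' to the second region
    return first_arm + insert + second_arm


# Hypothetical 100-nt locus with two target regions at positions 40-45 and 55-60.
locus = "ACGT" * 10 + "CCCCC" + "GGGGGGGGGG" + "TTTTT" + "TGCA" * 10
template = build_repair_template(locus, (40, 45), (55, 60), "GATTACAGATTACA", arm_len=20)
print(template)
```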

In another aspect, the present disclosure provides a method of modifying a target nucleic acid locus of interest. The methods can include delivering any of the non-natural systems disclosed herein (including the enzymes disclosed herein and at least one synthetic guide RNA (sgRNA)) to a target nucleic acid locus. The enzyme can form a complex with at least one sgRNA and can modify a target nucleic acid locus of interest upon binding of the complex to the target nucleic acid locus of interest. Delivering an enzyme to the locus can comprise transfecting a cell with the system or a nucleic acid encoding the system. Delivering a nuclease to the locus can comprise electroporating the cell with the system or a nucleic acid encoding the system. Delivering the nuclease to the locus can include incubating the system in a buffer with a nucleic acid comprising the locus of interest. In some cases, the target nucleic acid locus comprises deoxyribonucleic acid (DNA) or ribonucleic acid (RNA). The target nucleic acid locus may comprise genomic DNA, viral RNA, or bacterial DNA. The target nucleic acid locus can be within a cell. The target nucleic acid locus can be in vitro. The target nucleic acid locus can be in a eukaryotic cell or a prokaryotic cell. The cell may be an animal cell, a human cell, a bacterial cell, an archaeal cell, or a plant cell. The enzyme may induce a single-strand or double-strand break at or proximal to the target locus of interest.

Where the target nucleic acid locus can be intracellular, the enzyme can be provided as a nucleic acid comprising an open reading frame encoding an enzyme having a RuvC _ III domain at least about 75% (e.g., at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%) identical to any of SEQ ID NO 2482-2489. In some cases, the nucleic acid comprises a promoter to which an open reading frame encoding an endonuclease is operably linked. The promoter may be a CMV, EF1a, SV40, PGK1, Ubc, human beta actin, CAG, TRE or CaMKIIa promoter. The endonuclease can be provided as a capped mRNA comprising the open reading frame encoding the endonuclease. The endonuclease may be provided as a translated polypeptide. The at least one engineered sgRNA can be provided as a deoxyribonucleic acid (DNA) comprising a gene sequence encoding the at least one engineered sgRNA operably linked to a ribonucleic acid (RNA) pol III promoter. In some cases, the organism may be a eukaryote. In some cases, the organism may be a fungus. In some cases, the organism may be a human.

MG7 enzyme

In one aspect, the present disclosure provides an engineered nuclease system comprising (a) an endonuclease. In some cases, the endonuclease is a Cas endonuclease. In some cases, the endonuclease is a type II, class II Cas endonuclease. The endonuclease can comprise a RuvC_III domain, wherein the RuvC_III domain has at least about 70% sequence identity to any one of SEQ ID NOs: 2490-2498. In some cases, the endonuclease can comprise a RuvC_III domain, wherein the RuvC_III domain has at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity to any one of SEQ ID NOs: 2490-2498. In some cases, the endonuclease can comprise a RuvC_III domain, wherein the RuvC_III domain is substantially identical to any one of SEQ ID NOs: 2490-2498.

The endonuclease can comprise an HNH domain that is at least about 70% identical to any one of SEQ ID NOs: 4304-4312. In some cases, the endonuclease can comprise an HNH domain that is at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to any one of SEQ ID NOs: 4304-4312. The endonuclease may comprise an HNH domain substantially identical to any one of SEQ ID NOs: 4304-4312. In some cases, the endonuclease can comprise a variant that is at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to any of SEQ ID NOs: 4304-4312.

In some cases, the endonuclease can comprise a variant that is at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to any of SEQ ID NOs: 669-677. In some cases, the endonuclease can be substantially identical to any of SEQ ID NOs: 669-677.

In some cases, the endonuclease can comprise a variant having one or more Nuclear Localization Sequences (NLS). The NLS can be proximal to the N-terminus or C-terminus of the endonuclease. The NLS can be attached to the N-terminus or C-terminus of any of SEQ ID NO 669-677, or to the N-terminus or C-terminus of a variant having at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity to any of SEQ ID NO 669-677. The NLS can be a SV40 large T antigen NLS. The NLS can be a c-myc NLS. The NLS may comprise a sequence having at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 99% identity to any of SEQ ID NO 5593 and 5608. The NLS may comprise a sequence substantially identical to any one of SEQ ID NO: 5593-. The NLS may comprise any of the sequences in table 1, or a combination thereof:

In some cases, sequence identity may be determined by the BLASTP, CLUSTALW, MUSCLE, MAFFT, Novafold, or Smith-Waterman homology search algorithm. Sequence identity can be determined by the BLASTP algorithm using a word length (W) of 3 and an expectation (E) of 10, with the BLOSUM62 scoring matrix, gap costs of 11 for existence and 1 for extension, and a conditional compositional score matrix adjustment.
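Once two sequences have been aligned by any of the algorithms named above, a percent identity value may be computed from the alignment and compared against a recited threshold. The following is a minimal sketch under stated assumptions: the two inputs are already-aligned strings of equal length using `-` for gaps, identity is taken as matching columns divided by total alignment columns, and the example peptide fragments are hypothetical. Other denominators (for example, the length of the shorter sequence) are also in common use and would give different values.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned sequences (matches / alignment columns)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns where both sequences carry a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a, aligned_b, threshold):
    """True when identity is at least `threshold` percent."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical aligned fragments: 6 matching columns out of 8.
identity = percent_identity("MKRLA-GT", "MKKLAQGT")
print(identity)
```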

In some cases, the system can comprise (b) at least one engineered synthetic guide ribonucleic acid (sgRNA) capable of forming a complex with an endonuclease, with a 5' targeting region complementary to a desired cleavage sequence. In some cases, the 5' targeting region may comprise a PAM sequence compatible with the endonuclease. In some cases, the most 5' nucleotide of the targeting region may be G. In some cases, the 5' targeting region can be 15-23 nucleotides in length. The guide sequence and tracr sequence may be provided as separate ribonucleic acids (RNAs) or a single ribonucleic acid (RNA). The guide RNA may comprise a crRNA tracrRNA binding sequence 3' of the targeting region. The guide RNA may comprise a tracrRNA sequence preceded by a 4-nucleotide linker 3' to the tracrRNA binding region of the crRNA. The sgRNA can comprise, from 5 'to 3': a non-native guide nucleic acid sequence capable of hybridizing to a target sequence in a cell; and tracr sequences. In some cases, the non-native guide nucleic acid sequence and tracr sequence are covalently linked.

In some cases, the tracr sequence may have a specific sequence. The tracr sequence may have at least about 80% sequence identity to at least about 60-100 (e.g., at least about 60, at least about 65, at least about 70, at least about 75, at least about 80, at least about 85, or at least about 90) consecutive nucleotides of the native tracrRNA sequence. The tracr sequence may have at least about 80% sequence identity to at least about 60-100 (e.g., at least about 60, at least about 65, at least about 70, at least about 75, at least about 80, at least about 85, or at least about 90) consecutive nucleotides of SEQ ID NO: 5504. In some cases, a tracrRNA can be at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to at least about 60-90 (e.g., at least about 60, at least about 65, at least about 70, at least about 75, at least about 80, at least about 85, or at least about 90) consecutive nucleotides of SEQ ID NO: 5504. In some cases, the tracrRNA can be substantially identical to at least about 60-100 (e.g., at least about 60, at least about 65, at least about 70, at least about 75, at least about 80, at least about 85, or at least about 90) consecutive nucleotides of SEQ ID NO: 5504. The tracrRNA may comprise SEQ ID NO: 5504.
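One way to read the "consecutive nucleotides" language above is as a windowed comparison against the reference tracr sequence. The sketch below is illustrative only and makes simplifying assumptions: windows are compared without gaps, the function name is hypothetical, and the test sequences are synthetic placeholders rather than SEQ ID NO: 5504 or any sequence of the present disclosure. A gapped alignment, as produced by the algorithms named elsewhere herein, could report a higher identity than this ungapped scan.

```python
def best_window_identity(candidate, reference, window=60):
    """Best ungapped percent identity between any `window`-nt stretch of
    `reference` and any `window`-nt stretch of `candidate`."""
    if window <= 0 or window > len(reference) or window > len(candidate):
        raise ValueError("window must fit within both sequences")
    best = 0.0
    for i in range(len(reference) - window + 1):
        ref_win = reference[i:i + window]
        for j in range(len(candidate) - window + 1):
            cand_win = candidate[j:j + window]
            matches = sum(a == b for a, b in zip(ref_win, cand_win))
            best = max(best, 100.0 * matches / window)
    return best


# Synthetic 80-nt placeholder reference; not a disclosed tracr sequence.
reference = "ACGU" * 20
same = best_window_identity(reference, reference, window=60)
poly_a = best_window_identity("A" * 80, reference, window=60)
print(same, poly_a, same >= 80.0, poly_a >= 80.0)
```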

In some cases, the system can comprise two different sgRNAs that target a first region and a second region for cleavage in a target DNA locus, wherein the second region is located 3' to the first region. In some cases, the above system may comprise a single-stranded or double-stranded DNA repair template comprising, from 5' to 3': a first homology arm comprising a sequence of at least about 20 (e.g., at least about 40, 80, 120, 150, 200, 300, 500, or 1kb) nucleotides 5' to the first region, a synthetic DNA sequence of at least about 10 nucleotides, and a second homology arm comprising a sequence of at least about 20 (e.g., at least about 40, 80, 120, 150, 200, 300, 500, or 1kb) nucleotides 3' to the second region.

In another aspect, the present disclosure provides a method of modifying a target nucleic acid locus of interest. The methods can include delivering any of the non-natural systems disclosed herein (including the enzymes disclosed herein and at least one synthetic guide RNA (sgRNA)) to a target nucleic acid locus. The enzyme can form a complex with at least one sgRNA and can modify a target nucleic acid locus of interest upon binding of the complex to the target nucleic acid locus of interest. Delivering an enzyme to the locus can comprise transfecting a cell with the system or a nucleic acid encoding the system. Delivering a nuclease to the locus can comprise electroporating the cell with the system or a nucleic acid encoding the system. Delivering the nuclease to the locus can include incubating the system in a buffer with a nucleic acid comprising the locus of interest. In some cases, the target nucleic acid locus comprises deoxyribonucleic acid (DNA) or ribonucleic acid (RNA). The target nucleic acid locus may comprise genomic DNA, viral RNA, or bacterial DNA. The target nucleic acid locus can be within a cell. The target nucleic acid locus can be in vitro. The target nucleic acid locus can be in a eukaryotic cell or a prokaryotic cell. The cell may be an animal cell, a human cell, a bacterial cell, an archaeal cell, or a plant cell. The enzyme may induce a single-strand or double-strand break at or proximal to the target locus of interest.

Where the target nucleic acid locus can be intracellular, the enzyme can be provided as a nucleic acid comprising an open reading frame encoding an enzyme having a RuvC _ III domain at least about 75% (e.g., at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%) identical to any of SEQ ID NO: 2490-2498. In some cases, the nucleic acid comprises a promoter to which an open reading frame encoding an endonuclease is operably linked. The promoter may be a CMV, EF1a, SV40, PGK1, Ubc, human beta actin, CAG, TRE or CaMKIIa promoter. The endonuclease can be provided as a capped mRNA comprising the open reading frame encoding the endonuclease. The endonuclease may be provided as a translated polypeptide. The at least one engineered sgRNA can be provided as a deoxyribonucleic acid (DNA) comprising a gene sequence encoding the at least one engineered sgRNA operably linked to a ribonucleic acid (RNA) pol III promoter. In some cases, the organism may be a eukaryote. In some cases, the organism may be a fungus. In some cases, the organism may be a human.

MG14 enzyme

In one aspect, the present disclosure provides an engineered nuclease system comprising (a) an endonuclease. In some cases, the endonuclease is a Cas endonuclease. In some cases, the endonuclease is a type II, class II Cas endonuclease. The endonuclease can comprise a RuvC_III domain, wherein the RuvC_III domain has at least about 70% sequence identity to any one of SEQ ID NOs: 2499-2750. In some cases, the endonuclease can comprise a RuvC_III domain, wherein the RuvC_III domain has at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity to any one of SEQ ID NOs: 2499-2750. In some cases, the endonuclease can comprise a RuvC_III domain, wherein the RuvC_III domain is substantially identical to any one of SEQ ID NOs: 2499-2750.

The endonuclease may comprise an HNH domain that is at least about 70% identical to any one of SEQ ID NOs: 4313-4564. In some cases, the endonuclease can comprise an HNH domain that is at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to any one of SEQ ID NOs: 4313-4564. The endonuclease may comprise an HNH domain that is substantially identical to any one of SEQ ID NOs: 4313-4564. In some cases, the endonuclease can comprise an HNH domain that is at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to any one of SEQ ID NOs: 4067-4295.

In some cases, the endonuclease can comprise a variant that is at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to any of SEQ ID NOs: 678-929. In some cases, the endonuclease can be substantially identical to any of SEQ ID NOs: 678-929.

In some cases, the endonuclease can comprise a variant having one or more Nuclear Localization Sequences (NLS). The NLS can be proximal to the N-terminus or C-terminus of the endonuclease. NLS can be attached to the N-terminus or C-terminus of any of SEQ ID NO 678-929 or to the N-terminus or C-terminus of a variant having at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity to any of SEQ ID NO 678-929. The NLS can be a SV40 large T antigen NLS. The NLS can be a c-myc NLS. The NLS may comprise a sequence having at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 99% identity to any of SEQ ID NO 5593 and 5608. The NLS may comprise a sequence substantially identical to any one of SEQ ID NO: 5593-. The NLS may comprise any of the sequences in table 1, or a combination thereof:

In some cases, sequence identity may be determined by the BLASTP, CLUSTALW, MUSCLE, MAFFT, Novafold, or Smith-Waterman homology search algorithm. Sequence identity can be determined by the BLASTP algorithm using a word length (W) of 3 and an expectation (E) of 10, with the BLOSUM62 scoring matrix, gap costs of 11 for existence and 1 for extension, and a conditional compositional score matrix adjustment.

In some cases, the system can comprise (b) at least one engineered synthetic guide ribonucleic acid (sgRNA) capable of forming a complex with an endonuclease, with a 5' targeting region complementary to a desired cleavage sequence. In some cases, the 5' targeting region may comprise a PAM sequence compatible with the endonuclease. In some cases, the most 5' nucleotide of the targeting region may be G. In some cases, the 5' targeting region can be 15-23 nucleotides in length. The guide sequence and tracr sequence may be provided as separate ribonucleic acids (RNAs) or a single ribonucleic acid (RNA). The guide RNA may comprise a crRNA tracrRNA binding sequence 3' of the targeting region. The guide RNA may comprise a tracrRNA sequence preceded by a 4-nucleotide linker 3' to the tracrRNA binding region of the crRNA. The sgRNA can comprise, from 5 'to 3': a non-native guide nucleic acid sequence capable of hybridizing to a target sequence in a cell; and tracr sequences. In some cases, the non-native guide nucleic acid sequence and tracr sequence are covalently linked.

In some cases, the tracr sequence may have a specific sequence. The tracr sequence may have at least about 80% sequence identity to at least about 60-100 (e.g., at least about 60, at least about 65, at least about 70, at least about 75, at least about 80, at least about 85, or at least about 90) consecutive nucleotides of the native tracrRNA sequence. The tracr sequence may have at least about 80% sequence identity to at least about 60-100 (e.g., at least about 60, at least about 65, at least about 70, at least about 75, at least about 80, at least about 85, or at least about 90) consecutive nucleotides of SEQ ID NO: 5505. In some cases, a tracrRNA can have at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity to at least about 60-90 (e.g., at least about 60, at least about 65, at least about 70, at least about 75, at least about 80, at least about 85, or at least about 90) consecutive nucleotides of SEQ ID NO: 5505. In some cases, the tracrRNA can be substantially identical to at least about 60-100 (e.g., at least about 60, at least about 65, at least about 70, at least about 75, at least about 80, at least about 85, or at least about 90) consecutive nucleotides of SEQ ID NO: 5505. The tracrRNA may comprise SEQ ID NO: 5505.

In some cases, at least one engineered synthetic guide ribonucleic acid (sgRNA) capable of forming a complex with an endonuclease can comprise a sequence having at least about 80% identity to SEQ ID NO: 5469. The sgRNA can comprise a sequence having at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity to SEQ ID No. 5469. The sgRNA can comprise a sequence substantially identical to SEQ ID NO 5469.

In some cases, the system can comprise two different sgRNAs that target a first region and a second region for cleavage in a target DNA locus, wherein the second region is located 3' to the first region. In some cases, the above system may comprise a single-stranded or double-stranded DNA repair template comprising, from 5' to 3': a first homology arm comprising a sequence of at least about 20 (e.g., at least about 40, 80, 120, 150, 200, 300, 500, or 1kb) nucleotides 5' to the first region, a synthetic DNA sequence of at least about 10 nucleotides, and a second homology arm comprising a sequence of at least about 20 (e.g., at least about 40, 80, 120, 150, 200, 300, 500, or 1kb) nucleotides 3' to the second region.

In another aspect, the present disclosure provides a method of modifying a target nucleic acid locus of interest. The methods can include delivering any of the non-natural systems disclosed herein (including the enzymes disclosed herein and at least one synthetic guide RNA (sgRNA)) to a target nucleic acid locus. The enzyme can form a complex with at least one sgRNA and can modify a target nucleic acid locus of interest upon binding of the complex to the target nucleic acid locus of interest. Delivering an enzyme to the locus can comprise transfecting a cell with the system or a nucleic acid encoding the system. Delivering a nuclease to the locus can comprise electroporating the cell with the system or a nucleic acid encoding the system. Delivering the nuclease to the locus can include incubating the system in a buffer with a nucleic acid comprising the locus of interest. In some cases, the target nucleic acid locus comprises deoxyribonucleic acid (DNA) or ribonucleic acid (RNA). The target nucleic acid locus may comprise genomic DNA, viral RNA, or bacterial DNA. The target nucleic acid locus can be within a cell. The target nucleic acid locus can be in vitro. The target nucleic acid locus can be in a eukaryotic cell or a prokaryotic cell. The cell may be an animal cell, a human cell, a bacterial cell, an archaeal cell, or a plant cell. The enzyme may induce a single-strand or double-strand break at or proximal to the target locus of interest.

Where the target nucleic acid locus can be intracellular, the enzyme can be provided as a nucleic acid comprising an open reading frame encoding an enzyme having a RuvC _ III domain at least about 75% (e.g., at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%) identical to any of SEQ ID NO 2499-2750. A deoxyribonucleic acid (DNA) comprising an open reading frame encoding the endonuclease can comprise a sequence substantially identical to SEQ ID No. 5581 or a variant having at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity to SEQ ID No. 5581. In some cases, the nucleic acid comprises a promoter to which an open reading frame encoding an endonuclease is operably linked. The promoter may be a CMV, EF1a, SV40, PGK1, Ubc, human beta actin, CAG, TRE or CaMKIIa promoter. The endonuclease can be provided as a capped mRNA comprising the open reading frame encoding the endonuclease. The endonuclease may be provided as a translated polypeptide. The at least one engineered sgRNA can be provided as a deoxyribonucleic acid (DNA) comprising a gene sequence encoding the at least one engineered sgRNA operably linked to a ribonucleic acid (RNA) pol III promoter. In some cases, the organism may be a eukaryote. In some cases, the organism may be a fungus. In some cases, the organism may be a human.

MG15 enzyme

In one aspect, the present disclosure provides an engineered nuclease system comprising (a) an endonuclease. In some cases, the endonuclease is a Cas endonuclease. In some cases, the endonuclease is a type II, class II Cas endonuclease. The endonuclease can comprise a RuvC_III domain, wherein the RuvC_III domain has at least about 70% sequence identity to any one of SEQ ID NOs: 2751-2913. In some cases, the endonuclease can comprise a RuvC_III domain, wherein the RuvC_III domain has at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity to any one of SEQ ID NOs: 2751-2913. In some cases, the endonuclease can comprise a RuvC_III domain, wherein the RuvC_III domain is substantially identical to any one of SEQ ID NOs: 2751-2913.

The endonuclease may comprise an HNH domain that is at least about 70% identical to any one of SEQ ID NOs: 4565-4727. In some cases, the endonuclease can comprise an HNH domain that is at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to any one of SEQ ID NOs: 4565-4727. The endonuclease may comprise an HNH domain substantially identical to any one of SEQ ID NOs: 4565-4727.

In some cases, the endonuclease can comprise a variant that is at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to any one of SEQ ID NOs: 930-1092. In some cases, the endonuclease can be substantially identical to any one of SEQ ID NOs: 930-1092.

In some cases, the endonuclease can comprise a variant having one or more nuclear localization sequences (NLSs). The NLS can be proximal to the N-terminus or C-terminus of the endonuclease. The NLS can be attached to the N-terminus or C-terminus of any one of SEQ ID NOs: 930-1092. The NLS can be an SV40 large T antigen NLS. The NLS can be a c-myc NLS. The NLS may comprise a sequence having at least about 80%, at least about 85%, at least about 90%, at least about 95%, or at least about 99% identity to any one of SEQ ID NOs: 5593-5608. The NLS may comprise a sequence substantially identical to any one of SEQ ID NOs: 5593-5608. The NLS may comprise any of the sequences in Table 1, or a combination thereof:

In some cases, sequence identity may be determined by the BLASTP, CLUSTALW, MUSCLE, MAFFT, Novafold, or Smith-Waterman homology search algorithm. Sequence identity can be determined by the BLASTP algorithm using parameters of a word length (W) of 3 and an expectation (E) of 10, with the BLOSUM62 scoring matrix, gap costs set at existence of 11 and extension of 1, and a conditional compositional score matrix adjustment.
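As an illustration of how a percent-identity threshold such as "at least about 70%" might be checked once two sequences have been aligned, the following is a minimal Python sketch. It is not the BLASTP procedure described above: it assumes the alignment has already been produced by one of the listed tools, and it counts identity over all informative alignment columns, which is only one of several conventions for the denominator. The function names and the example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two gapped sequences of equal aligned length.

    Assumption: '-' marks a gap, and identity is counted over every column
    in which at least one sequence has a residue.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Drop columns where both sequences carry a gap; they hold no information.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    # A column is a match only when both residues are present and equal.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 70.0) -> bool:
    """True when the aligned pair reaches the given percent-identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Hypothetical aligned peptide fragments, for illustration only.
    print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))
    print(meets_threshold("ACDEFGHIKL", "ACDEFGHIKV", 95.0))
```

Because different tools report identity over different denominators (alignment length, shorter sequence, or query length), a value computed this way may differ slightly from the figure a given program reports.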

In some cases, the system can comprise (b) at least one engineered synthetic guide ribonucleic acid (sgRNA) capable of forming a complex with the endonuclease and having a 5' targeting region complementary to a desired cleavage sequence. In some cases, the 5' targeting region may comprise a PAM sequence compatible with the endonuclease. In some cases, the most 5' nucleotide of the targeting region may be G. In some cases, the 5' targeting region can be 15-23 nucleotides in length. The guide sequence and tracr sequence may be provided as separate ribonucleic acids (RNAs) or as a single ribonucleic acid (RNA). The guide RNA may comprise a crRNA tracrRNA-binding sequence 3' of the targeting region. The guide RNA may comprise a tracrRNA sequence preceded by a 4-nucleotide linker 3' to the tracrRNA-binding region of the crRNA. The sgRNA can comprise, from 5' to 3': a non-native guide nucleic acid sequence capable of hybridizing to a target sequence in a cell; and a tracr sequence. In some cases, the non-native guide nucleic acid sequence and the tracr sequence are covalently linked.
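The 5'-to-3' layout of a single guide RNA described above (targeting region, tracrRNA-binding sequence of the crRNA, 4-nucleotide linker, tracr sequence) can be sketched in Python as follows. This is an illustrative sketch only: the function name is hypothetical, the default linker "GAAA" is an assumed tetraloop that the disclosure does not specify, and the repeat and tracr strings in the usage example are short placeholders rather than any sequence from the sequence listing.

```python
RNA_ALPHABET = set("ACGU")


def build_sgrna(targeting_region: str, crrna_repeat: str, tracr: str,
                linker: str = "GAAA") -> str:
    """Concatenate sgRNA parts 5' to 3': targeting region, crRNA repeat, linker, tracr.

    Assumptions: all parts are RNA strings over A/C/G/U; the linker default
    is a placeholder chosen for illustration.
    """
    parts = {"targeting_region": targeting_region, "crrna_repeat": crrna_repeat,
             "linker": linker, "tracr": tracr}
    for name, seq in parts.items():
        if not seq or not set(seq) <= RNA_ALPHABET:
            raise ValueError(f"{name} must be a non-empty RNA string (A, C, G, U)")
    # Length window for the targeting region stated in the text: 15-23 nucleotides.
    if not 15 <= len(targeting_region) <= 23:
        raise ValueError("targeting region must be 15-23 nucleotides long")
    # The text describes a 4-nucleotide linker between the repeat and the tracr.
    if len(linker) != 4:
        raise ValueError("linker must be exactly 4 nucleotides long")
    return targeting_region + crrna_repeat + linker + tracr


if __name__ == "__main__":
    spacer = "G" + "ACGU" * 4 + "ACG"          # 20-nt placeholder starting with G
    sgrna = build_sgrna(spacer, "GUUUUAGA", "UCUAAAAC")
    print(len(sgrna), sgrna)
```

The check for a 5' G is deliberately left out of the validation, since the text presents it as optional ("may be G").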

In some cases, the tracr sequence may have a specific sequence. The tracr sequence may have at least about 80% identity to at least about 60-100 (e.g., at least about 60, at least about 65, at least about 70, at least about 75, at least about 80, at least about 85, or at least about 90) consecutive nucleotides of a native tracrRNA sequence. The tracr sequence may have at least about 80% sequence identity to at least about 60-100 (e.g., at least about 60, at least about 65, at least about 70, at least about 75, at least about 80, at least about 85, or at least about 90) consecutive nucleotides of SEQ ID NO: 5506. In some cases, the tracrRNA can be at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to at least about 60-90 (e.g., at least about 60, at least about 65, at least about 70, at least about 75, at least about 80, at least about 85, or at least about 90) consecutive nucleotides of SEQ ID NO: 5506. In some cases, the tracrRNA can be substantially identical to at least about 60-100 (e.g., at least about 60, at least about 65, at least about 70, at least about 75, at least about 80, at least about 85, or at least about 90) consecutive nucleotides of SEQ ID NO: 5506. The tracrRNA may comprise SEQ ID NO: 5506.

In some cases, the at least one engineered synthetic guide ribonucleic acid (sgRNA) capable of forming a complex with the endonuclease can comprise a sequence having at least about 80% identity to SEQ ID NO: 5470. The sgRNA can comprise a sequence that is at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to SEQ ID NO: 5470. The sgRNA can comprise a sequence substantially identical to SEQ ID NO: 5470.

In some cases, the system can comprise two different sgRNAs that target a first region and a second region for cleavage in a target DNA locus, wherein the second region is located 3' to the first region. In some cases, the above system may comprise a single-stranded or double-stranded DNA repair template comprising, from 5' to 3': a first homology arm comprising a sequence of at least about 20 nucleotides (e.g., at least about 40, 80, 120, 150, 200, 300, or 500 nucleotides, or 1 kb) 5' to the first region, a synthetic DNA sequence of at least about 10 nucleotides, and a second homology arm comprising a sequence of at least about 20 nucleotides (e.g., at least about 40, 80, 120, 150, 200, 300, or 500 nucleotides, or 1 kb) 3' to the second region.
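The repair-template layout described above (first homology arm, synthetic insert, second homology arm) can be illustrated with a short Python sketch. The function name, the 0-based half-open coordinate convention, the requirement that the two regions do not overlap, and the toy locus in the usage example are all assumptions made for illustration; they are not taken from the disclosure.

```python
def build_repair_template(locus: str, first_region: tuple, second_region: tuple,
                          insert: str, arm_length: int = 40) -> str:
    """Return left arm + insert + right arm for a two-cut editing design.

    locus          : top-strand DNA sequence of the target locus
    first_region   : (start, end) of the first cleavage region, 0-based half-open
    second_region  : (start, end) of the second region, located 3' of the first
    insert         : synthetic DNA sequence placed between the homology arms
    arm_length     : homology arm length in nucleotides (text: at least about 20)
    """
    f_start, f_end = first_region
    s_start, s_end = second_region
    if not (0 <= f_start < f_end <= s_start < s_end <= len(locus)):
        raise ValueError("second region must lie 3' of the first, inside the locus")
    if arm_length < 20:
        raise ValueError("homology arms should be at least about 20 nucleotides")
    if len(insert) < 10:
        raise ValueError("synthetic insert should be at least about 10 nucleotides")
    if f_start < arm_length or len(locus) - s_end < arm_length:
        raise ValueError("locus is too short to supply both homology arms")
    left_arm = locus[f_start - arm_length:f_start]    # sequence 5' to the first region
    right_arm = locus[s_end:s_end + arm_length]       # sequence 3' to the second region
    return left_arm + insert + right_arm


if __name__ == "__main__":
    # Toy 80-nt locus; the regions and insert are placeholders.
    locus = "AC" * 15 + "GGGGG" + "T" * 10 + "CCCCC" + "GT" * 15
    print(build_repair_template(locus, (30, 35), (45, 50), "ACGTACGTACGT", 20))
```

For a single-stranded template the same string could be used directly or reverse-complemented, depending on which strand is chosen; that choice is outside this sketch.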

In another aspect, the present disclosure provides a method of modifying a target nucleic acid locus of interest. The method can include delivering any of the non-natural systems disclosed herein (including an enzyme disclosed herein and at least one synthetic guide RNA (sgRNA)) to the target nucleic acid locus. The enzyme can form a complex with the at least one sgRNA and can modify the target nucleic acid locus of interest upon binding of the complex to the locus. Delivering the enzyme to the locus can comprise transfecting a cell with the system or a nucleic acid encoding the system. Delivering the enzyme to the locus can comprise electroporating the cell with the system or a nucleic acid encoding the system. Delivering the enzyme to the locus can include incubating the system in a buffer with a nucleic acid comprising the locus of interest. In some cases, the target nucleic acid locus comprises deoxyribonucleic acid (DNA) or ribonucleic acid (RNA). The target nucleic acid locus may comprise genomic DNA, viral RNA, or bacterial DNA. The target nucleic acid locus can be within a cell. The target nucleic acid locus can be in vitro. The target nucleic acid locus can be in a eukaryotic cell or a prokaryotic cell. The cell may be an animal cell, a human cell, a bacterial cell, an archaeal cell, or a plant cell. The enzyme may induce a single-strand or double-strand break at or proximal to the target locus of interest.

Where the target nucleic acid locus is within a cell, the enzyme can be provided as a nucleic acid comprising an open reading frame encoding an enzyme having a RuvC_III domain at least about 75% (e.g., at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99%) identical to any one of SEQ ID NOs: 2751-2913. A deoxyribonucleic acid (DNA) comprising an open reading frame encoding the endonuclease can comprise a sequence substantially identical to SEQ ID NO: 5582 or a variant having at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity to SEQ ID NO: 5582. In some cases, the nucleic acid comprises a promoter to which the open reading frame encoding the endonuclease is operably linked. The promoter may be a CMV, EF1a, SV40, PGK1, Ubc, human beta actin, CAG, TRE, or CaMKIIa promoter. The endonuclease can be provided as a capped mRNA comprising the open reading frame encoding the endonuclease. The endonuclease may be provided as a translated polypeptide. The at least one engineered sgRNA can be provided as a deoxyribonucleic acid (DNA) comprising a gene sequence encoding the at least one engineered sgRNA operably linked to a ribonucleic acid (RNA) pol III promoter. In some cases, the organism may be a eukaryote. In some cases, the organism may be a fungus. In some cases, the organism may be a human.

MG16 enzyme

In one aspect, the present disclosure provides an engineered nuclease system comprising (a) an endonuclease. In some cases, the endonuclease is a Cas endonuclease. In some cases, the endonuclease is a class 2, type II Cas endonuclease. The endonuclease can comprise a RuvC_III domain having at least about 70% sequence identity to any one of SEQ ID NOs: 2914-3174. In some cases, the endonuclease can comprise a RuvC_III domain having at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity to any one of SEQ ID NOs: 2914-3174. In some cases, the endonuclease can comprise a RuvC_III domain that is substantially identical to any one of SEQ ID NOs: 2914-3174.

The endonuclease may comprise an HNH domain that is at least about 70% identical to any one of SEQ ID NOs: 4728-4988. In some cases, the endonuclease can comprise an HNH domain that is at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to any one of SEQ ID NOs: 4728-4988. The endonuclease may comprise an HNH domain substantially identical to any one of SEQ ID NOs: 4728-4988.

In some cases, the endonuclease can comprise a variant that is at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to any one of SEQ ID NOs: 1093-1353. In some cases, the endonuclease can be substantially identical to any one of SEQ ID NOs: 1093-1353.

In some cases, the endonuclease can comprise a variant having one or more nuclear localization sequences (NLSs). The NLS can be proximal to the N-terminus or C-terminus of the endonuclease. The NLS can be attached to the N-terminus or C-terminus of any one of SEQ ID NOs: 1093-1353, or to the N-terminus or C-terminus of a variant having at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity to any one of SEQ ID NOs: 1093-1353. The NLS can be an SV40 large T antigen NLS. The NLS can be a c-myc NLS. The NLS may comprise a sequence having at least about 80%, at least about 85%, at least about 90%, at least about 95%, or at least about 99% identity to any one of SEQ ID NOs: 5593-5608. The NLS may comprise a sequence substantially identical to any one of SEQ ID NOs: 5593-5608. The NLS may comprise any of the sequences in Table 1, or a combination thereof:

In some cases, sequence identity may be determined by the BLASTP, CLUSTALW, MUSCLE, MAFFT, Novafold, or Smith-Waterman homology search algorithm. Sequence identity can be determined by the BLASTP algorithm using parameters of a word length (W) of 3 and an expectation (E) of 10, with the BLOSUM62 scoring matrix, gap costs set at existence of 11 and extension of 1, and a conditional compositional score matrix adjustment.

In some cases, the system can comprise (b) at least one engineered synthetic guide ribonucleic acid (sgRNA) capable of forming a complex with the endonuclease and having a 5' targeting region complementary to a desired cleavage sequence. In some cases, the 5' targeting region may comprise a PAM sequence compatible with the endonuclease. In some cases, the most 5' nucleotide of the targeting region may be G. In some cases, the 5' targeting region can be 15-23 nucleotides in length. The guide sequence and tracr sequence may be provided as separate ribonucleic acids (RNAs) or as a single ribonucleic acid (RNA). The guide RNA may comprise a crRNA tracrRNA-binding sequence 3' of the targeting region. The guide RNA may comprise a tracrRNA sequence preceded by a 4-nucleotide linker 3' to the tracrRNA-binding region of the crRNA. The sgRNA can comprise, from 5' to 3': a non-native guide nucleic acid sequence capable of hybridizing to a target sequence in a cell; and a tracr sequence. In some cases, the non-native guide nucleic acid sequence and the tracr sequence are covalently linked.

In some cases, the tracr sequence may have a specific sequence. The tracr sequence may have at least about 80% identity to at least about 60-100 (e.g., at least about 60, at least about 65, at least about 70, at least about 75, at least about 80, at least about 85, or at least about 90) consecutive nucleotides of a native tracrRNA sequence. The tracr sequence may have at least about 80% sequence identity to at least about 60-100 (e.g., at least about 60, at least about 65, at least about 70, at least about 75, at least about 80, at least about 85, or at least about 90) consecutive nucleotides of SEQ ID NO: 5507. In some cases, the tracrRNA can have at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity to at least about 60-90 (e.g., at least about 60, at least about 65, at least about 70, at least about 75, at least about 80, at least about 85, or at least about 90) consecutive nucleotides of SEQ ID NO: 5507. In some cases, the tracrRNA can be substantially identical to at least about 60-100 (e.g., at least about 60, at least about 65, at least about 70, at least about 75, at least about 80, at least about 85, or at least about 90) consecutive nucleotides of SEQ ID NO: 5507. The tracrRNA may comprise SEQ ID NO: 5507.
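The criterion of identity "to at least about 60 consecutive nucleotides" of a reference tracrRNA can be pictured as a sliding-window comparison. The Python sketch below is one simplified reading of that criterion: it compares ungapped windows only, whereas an alignment program of the kind named elsewhere in this disclosure would also allow gaps. The function name, the default window of 60, and the placeholder sequences in the usage example are assumptions; no sequence from the sequence listing is reproduced.

```python
def best_window_identity(candidate: str, reference: str, window: int = 60) -> float:
    """Highest ungapped percent identity between any `window`-nt stretch of
    `reference` and any equally long stretch of `candidate`.

    Returns 0.0 when either sequence is shorter than the window.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if len(candidate) < window or len(reference) < window:
        return 0.0
    best = 0.0
    for i in range(len(reference) - window + 1):
        ref_win = reference[i:i + window]
        for j in range(len(candidate) - window + 1):
            cand_win = candidate[j:j + window]
            # Count positions where the two windows carry the same nucleotide.
            matches = sum(1 for a, b in zip(ref_win, cand_win) if a == b)
            best = max(best, 100.0 * matches / window)
    return best


if __name__ == "__main__":
    reference = "ACGU" * 15                      # 60-nt placeholder reference
    candidate = "CAUGCA" + reference[6:]         # six substitutions at the 5' end
    print(best_window_identity(candidate, reference) >= 80.0)
```

The nested loop is quadratic in sequence length, which is acceptable for tracr-sized sequences of roughly a hundred nucleotides but would be a poor choice for long inputs.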

In some cases, the at least one engineered synthetic guide ribonucleic acid (sgRNA) capable of forming a complex with the endonuclease can comprise a sequence having at least about 80% identity to SEQ ID NO: 5471. The sgRNA can comprise a sequence that is at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to SEQ ID NO: 5471. The sgRNA can comprise a sequence substantially identical to SEQ ID NO: 5471.

In some cases, the system can comprise two different sgRNAs that target a first region and a second region for cleavage in a target DNA locus, wherein the second region is located 3' to the first region. In some cases, the above system may comprise a single-stranded or double-stranded DNA repair template comprising, from 5' to 3': a first homology arm comprising a sequence of at least about 20 nucleotides (e.g., at least about 40, 80, 120, 150, 200, 300, or 500 nucleotides, or 1 kb) 5' to the first region, a synthetic DNA sequence of at least about 10 nucleotides, and a second homology arm comprising a sequence of at least about 20 nucleotides (e.g., at least about 40, 80, 120, 150, 200, 300, or 500 nucleotides, or 1 kb) 3' to the second region.

In another aspect, the present disclosure provides a method of modifying a target nucleic acid locus of interest. The method can include delivering any of the non-natural systems disclosed herein (including an enzyme disclosed herein and at least one synthetic guide RNA (sgRNA)) to the target nucleic acid locus. The enzyme can form a complex with the at least one sgRNA and can modify the target nucleic acid locus of interest upon binding of the complex to the locus. Delivering the enzyme to the locus can comprise transfecting a cell with the system or a nucleic acid encoding the system. Delivering the enzyme to the locus can comprise electroporating the cell with the system or a nucleic acid encoding the system. Delivering the enzyme to the locus can include incubating the system in a buffer with a nucleic acid comprising the locus of interest. In some cases, the target nucleic acid locus comprises deoxyribonucleic acid (DNA) or ribonucleic acid (RNA). The target nucleic acid locus may comprise genomic DNA, viral RNA, or bacterial DNA. The target nucleic acid locus can be within a cell. The target nucleic acid locus can be in vitro. The target nucleic acid locus can be in a eukaryotic cell or a prokaryotic cell. The cell may be an animal cell, a human cell, a bacterial cell, an archaeal cell, or a plant cell. The enzyme may induce a single-strand or double-strand break at or proximal to the target locus of interest.

Where the target nucleic acid locus is within a cell, the enzyme can be provided as a nucleic acid comprising an open reading frame encoding an enzyme having a RuvC_III domain at least about 75% (e.g., at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99%) identical to any one of SEQ ID NOs: 2914-3174. A deoxyribonucleic acid (DNA) comprising an open reading frame encoding the endonuclease can comprise a sequence substantially identical to SEQ ID NO: 5583 or a variant having at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity to SEQ ID NO: 5583. In some cases, the nucleic acid comprises a promoter to which the open reading frame encoding the endonuclease is operably linked. The promoter may be a CMV, EF1a, SV40, PGK1, Ubc, human beta actin, CAG, TRE, or CaMKIIa promoter. The endonuclease can be provided as a capped mRNA comprising the open reading frame encoding the endonuclease. The endonuclease may be provided as a translated polypeptide. The at least one engineered sgRNA can be provided as a deoxyribonucleic acid (DNA) comprising a gene sequence encoding the at least one engineered sgRNA operably linked to a ribonucleic acid (RNA) pol III promoter. In some cases, the organism may be a eukaryote. In some cases, the organism may be a fungus. In some cases, the organism may be a human.

MG18 enzyme

In one aspect, the present disclosure provides an engineered nuclease system comprising (a) an endonuclease. In some cases, the endonuclease is a Cas endonuclease. In some cases, the endonuclease is a class 2, type II Cas endonuclease. The endonuclease can comprise a RuvC_III domain having at least about 70% sequence identity to any one of SEQ ID NOs: 3175-3300. In some cases, the endonuclease can comprise a RuvC_III domain having at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity to any one of SEQ ID NOs: 3175-3300. In some cases, the endonuclease can comprise a RuvC_III domain that is substantially identical to any one of SEQ ID NOs: 3175-3300.

The endonuclease may comprise an HNH domain that is at least about 70% identical to any one of SEQ ID NOs: 4989-5146. In some cases, the endonuclease can comprise an HNH domain that is at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to any one of SEQ ID NOs: 4989-5146. The endonuclease may comprise an HNH domain substantially identical to any one of SEQ ID NOs: 4989-5146.

In some cases, the endonuclease can comprise a variant that is at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to any one of SEQ ID NOs: 1354-1511. In some cases, the endonuclease can be substantially identical to any one of SEQ ID NOs: 1354-1511.

In some cases, the endonuclease can comprise a variant having one or more nuclear localization sequences (NLSs). The NLS can be proximal to the N-terminus or C-terminus of the endonuclease. The NLS can be attached to the N-terminus or C-terminus of any one of SEQ ID NOs: 1354-1511, or to the N-terminus or C-terminus of a variant having at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity to any one of SEQ ID NOs: 1354-1511. The NLS can be an SV40 large T antigen NLS. The NLS can be a c-myc NLS. The NLS may comprise a sequence having at least about 80%, at least about 85%, at least about 90%, at least about 95%, or at least about 99% identity to any one of SEQ ID NOs: 5593-5608. The NLS may comprise a sequence substantially identical to any one of SEQ ID NOs: 5593-5608. The NLS may comprise any of the sequences in Table 1, or a combination thereof:

In some cases, sequence identity may be determined by the BLASTP, CLUSTALW, MUSCLE, MAFFT, Novafold, or Smith-Waterman homology search algorithm. Sequence identity can be determined by the BLASTP algorithm using parameters of a word length (W) of 3 and an expectation (E) of 10, with the BLOSUM62 scoring matrix, gap costs set at existence of 11 and extension of 1, and a conditional compositional score matrix adjustment.
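Of the algorithms listed, Smith-Waterman is compact enough to sketch directly. The Python example below is a simplified illustration under stated assumptions: it uses a flat match/mismatch score in place of the BLOSUM62 matrix and a linear gap penalty in place of the existence/extension costs given above, so its scores are not comparable to BLASTP output. The function name, scoring defaults, and test sequences are hypothetical.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2):
    """Local alignment of a and b; returns (best_score, aligned_a, aligned_b).

    Assumptions: flat match/mismatch scoring and a linear gap penalty,
    chosen for brevity rather than biological realism.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    # Fill the dynamic-programming matrix, flooring every cell at zero.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(smith_waterman("GGGACDEFKKK", "ACDEF"))
```

The two aligned strings returned here have equal length and use "-" for gaps, so percent identity can be read off by counting matching columns.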

In some cases, the system can comprise (b) at least one engineered synthetic guide ribonucleic acid (sgRNA) capable of forming a complex with the endonuclease and having a 5' targeting region complementary to a desired cleavage sequence. In some cases, the 5' targeting region may comprise a PAM sequence compatible with the endonuclease. In some cases, the most 5' nucleotide of the targeting region may be G. In some cases, the 5' targeting region can be 15-23 nucleotides in length. The guide sequence and tracr sequence may be provided as separate ribonucleic acids (RNAs) or as a single ribonucleic acid (RNA). The guide RNA may comprise a crRNA tracrRNA-binding sequence 3' of the targeting region. The guide RNA may comprise a tracrRNA sequence preceded by a 4-nucleotide linker 3' to the tracrRNA-binding region of the crRNA. The sgRNA can comprise, from 5' to 3': a non-native guide nucleic acid sequence capable of hybridizing to a target sequence in a cell; and a tracr sequence. In some cases, the non-native guide nucleic acid sequence and the tracr sequence are covalently linked.

In some cases, the tracr sequence may have a specific sequence. The tracr sequence may have at least about 80% identity to at least about 60-100 (e.g., at least about 60, at least about 65, at least about 70, at least about 75, at least about 80, at least about 85, or at least about 90) consecutive nucleotides of a native tracrRNA sequence. The tracr sequence may have at least about 80% sequence identity to at least about 60-100 (e.g., at least about 60, at least about 65, at least about 70, at least about 75, at least about 80, at least about 85, or at least about 90) consecutive nucleotides of SEQ ID NO: 5508. In some cases, the tracrRNA can be at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to at least about 60-90 (e.g., at least about 60, at least about 65, at least about 70, at least about 75, at least about 80, at least about 85, or at least about 90) consecutive nucleotides of SEQ ID NO: 5508. In some cases, the tracrRNA can be substantially identical to at least about 60-100 (e.g., at least about 60, at least about 65, at least about 70, at least about 75, at least about 80, at least about 85, or at least about 90) consecutive nucleotides of SEQ ID NO: 5508. The tracrRNA may comprise SEQ ID NO: 5508.

In some cases, at least one engineered synthetic guide ribonucleic acid (sgRNA) capable of forming a complex with an endonuclease can comprise a sequence having at least about 80% identity to SEQ ID NO: 5472. The sgRNA can comprise a sequence that is at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to SEQ ID No. 5472. The sgRNA can comprise a sequence substantially identical to SEQ ID NO: 5472.

In some cases, the system can comprise two different sgRNAs that target a first region and a second region for cleavage in a target DNA locus, wherein the second region is located 3' to the first region. In some cases, the above system may comprise a single-stranded or double-stranded DNA repair template comprising, from 5' to 3': a first homology arm comprising a sequence of at least about 20 (e.g., at least about 40, 80, 120, 150, 200, 300, 500, or 1,000) nucleotides 5' to the first region, a synthetic DNA sequence of at least about 10 nucleotides, and a second homology arm comprising a sequence of at least about 20 (e.g., at least about 40, 80, 120, 150, 200, 300, 500, or 1,000) nucleotides 3' to the second region.
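As a non-limiting sketch, the repair template layout described above (first homology arm, synthetic DNA sequence, second homology arm, from 5' to 3') can be expressed in Python. The function name `build_repair_template`, its parameters, and the convention that each region is given as half-open `(start, end)` coordinates on a single strand of the locus are assumptions made for illustration only.

```python
def build_repair_template(locus, first_region, second_region, insert,
                          arm_len=40):
    """Return left arm + synthetic insert + right arm, read 5' to 3'.

    locus         : DNA string for one strand of the target locus
    first_region  : (start, end) half-open indices of the first cut region
    second_region : (start, end) half-open indices, located 3' of the first
    insert        : synthetic DNA sequence placed between the arms
    arm_len       : homology arm length in nucleotides (assumed equal arms)
    """
    f_start, f_end = first_region
    s_start, s_end = second_region
    if f_end > s_start:
        raise ValueError("second region must lie 3' of the first region")
    if arm_len < 20:
        raise ValueError("homology arms should be at least about 20 nt")
    if len(insert) < 10:
        raise ValueError("synthetic insert should be at least about 10 nt")
    if f_start < arm_len or s_end + arm_len > len(locus):
        raise ValueError("locus is too short for the requested arm length")
    # First arm: sequence immediately 5' of the first region.
    left_arm = locus[f_start - arm_len:f_start]
    # Second arm: sequence immediately 3' of the second region.
    right_arm = locus[s_end:s_end + arm_len]
    return left_arm + insert + right_arm
```

In this sketch the sequence between the two regions is replaced by the synthetic insert; whether a given design keeps or removes that intervening sequence is a choice the text leaves open.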

In another aspect, the present disclosure provides a method of modifying a target nucleic acid locus of interest. The methods can include delivering any of the non-natural systems disclosed herein (including the enzymes disclosed herein and at least one synthetic guide RNA (sgRNA)) to a target nucleic acid locus. The enzyme can form a complex with at least one sgRNA and can modify a target nucleic acid locus of interest upon binding of the complex to the target nucleic acid locus of interest. Delivering an enzyme to the locus can comprise transfecting a cell with the system or a nucleic acid encoding the system. Delivering a nuclease to the locus can comprise electroporating the cell with the system or a nucleic acid encoding the system. Delivering the nuclease to the locus can include incubating the system in a buffer with a nucleic acid comprising the locus of interest. In some cases, the target nucleic acid locus comprises deoxyribonucleic acid (DNA) or ribonucleic acid (RNA). The target nucleic acid locus may comprise genomic DNA, viral RNA, or bacterial DNA. The target nucleic acid locus can be within a cell. The target nucleic acid locus can be in vitro. The target nucleic acid locus can be in a eukaryotic cell or a prokaryotic cell. The cell may be an animal cell, a human cell, a bacterial cell, an archaeal cell, or a plant cell. The enzyme may induce a single-strand or double-strand break at or proximal to the target locus of interest.

Where the target nucleic acid locus is intracellular, the enzyme can be provided as a nucleic acid comprising an open reading frame encoding an enzyme having a RuvC_III domain at least about 75% (e.g., at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99%) identical to any of SEQ ID NOs: 3175-3300. A deoxyribonucleic acid (DNA) comprising an open reading frame encoding the endonuclease can comprise a sequence substantially identical to SEQ ID NO: 5584 or a variant having at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity to SEQ ID NO: 5584. In some cases, the nucleic acid comprises a promoter to which an open reading frame encoding an endonuclease is operably linked. The promoter may be a CMV, EF1a, SV40, PGK1, Ubc, human beta-actin, CAG, TRE, or CaMKIIa promoter. The endonuclease can be provided as a capped mRNA comprising the open reading frame encoding the endonuclease. The endonuclease may be provided as a translated polypeptide. The at least one engineered sgRNA can be provided as a deoxyribonucleic acid (DNA) comprising a gene sequence encoding the at least one engineered sgRNA operably linked to a ribonucleic acid (RNA) pol III promoter. In some cases, the organism may be a eukaryote. In some cases, the organism may be a fungus. In some cases, the organism may be a human.

MG21 enzyme

In one aspect, the present disclosure provides an engineered nuclease system comprising (a) an endonuclease. In some cases, the endonuclease is a Cas endonuclease. In some cases, the endonuclease is a class 2, type II Cas endonuclease. The endonuclease can comprise a RuvC_III domain, wherein the RuvC_III domain has at least about 70% sequence identity to any one of SEQ ID NOs: 3331-3474. In some cases, the endonuclease can comprise a RuvC_III domain, wherein the RuvC_III domain has at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity to any one of SEQ ID NOs: 3331-3474. In some cases, the endonuclease can comprise a RuvC_III domain that is substantially identical to any one of SEQ ID NOs: 3331-3474.

The endonuclease may comprise an HNH domain having at least about 70% identity to any one of SEQ ID NOs: 5147-5290. In some cases, the endonuclease can comprise an HNH domain that is at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to any one of SEQ ID NOs: 5147-5290. The endonuclease may comprise an HNH domain substantially identical to any one of SEQ ID NOs: 5147-5290.

In some cases, the endonuclease can comprise a variant that is at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to any one of SEQ ID NOs: 1512-1655. In some cases, the endonuclease can be substantially identical to any one of SEQ ID NOs: 1512-1655.

In some cases, the endonuclease can comprise a variant having one or more nuclear localization sequences (NLSs). The NLS can be proximal to the N-terminus or C-terminus of the endonuclease. The NLS can be attached to the N-terminus or C-terminus of any one of SEQ ID NOs: 1512-1655, or to the N-terminus or C-terminus of a variant having at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity to any one of SEQ ID NOs: 1512-1655. The NLS can be an SV40 large T antigen NLS. The NLS can be a c-myc NLS. The NLS may comprise a sequence having at least about 80%, at least about 85%, at least about 90%, at least about 95%, or at least about 99% identity to any one of SEQ ID NOs: 5593-5608. The NLS may comprise a sequence substantially identical to any one of SEQ ID NO: 5593-. The NLS may comprise any of the sequences in Table 1 or a combination thereof:

In some cases, sequence identity may be determined by BLASTP, CLUSTALW, MUSCLE, MAFFT, Novafold, or Smith-Waterman homology search algorithms. Sequence identity can be determined by the BLASTP algorithm using parameters of a word length (W) of 3 and an expectation (E) of 10, with the BLOSUM62 scoring matrix, setting gap costs at an existence penalty of 11 and an extension penalty of 1, and using a conditional compositional score matrix adjustment.
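For illustration only, once two sequences have been aligned by one of the algorithms named above, a percent identity figure can be computed from the alignment and compared against a threshold such as 70%. The Python sketch below does not run BLASTP or any other aligner; it assumes the aligned strings (with `-` marking gaps) are already available, and it counts identical columns over the full alignment length, which is one common convention among several. The function names and the toy sequences in the usage note are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns holding the same residue in both rows.

    Both inputs are rows of a pairwise alignment with '-' for gaps.
    Columns that are gaps in both rows are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment contains no residues")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a, aligned_b, threshold=70.0):
    """True if the aligned pair reaches at least `threshold` percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

With the made-up rows `"HNHAG-K"` and `"HNHSGEK"`, five of seven columns match, so the pair satisfies a 70% threshold but not a 75% threshold.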

In some cases, the system can comprise (b) at least one engineered synthetic guide ribonucleic acid (sgRNA) capable of forming a complex with an endonuclease, the sgRNA having a 5' targeting region complementary to a desired cleavage sequence. In some cases, the 5' targeting region may comprise a PAM sequence compatible with the endonuclease. In some cases, the most 5' nucleotide of the targeting region may be G. In some cases, the 5' targeting region can be 15-23 nucleotides in length. The guide sequence and tracr sequence may be provided as separate ribonucleic acids (RNAs) or a single ribonucleic acid (RNA). The guide RNA may comprise a crRNA tracrRNA-binding sequence 3' of the targeting region. The guide RNA may comprise a tracrRNA sequence preceded by a 4-nucleotide linker 3' to the tracrRNA-binding region of the crRNA. The sgRNA can comprise, from 5' to 3': a non-native guide nucleic acid sequence capable of hybridizing to a target sequence in a cell; and a tracr sequence. In some cases, the non-native guide nucleic acid sequence and tracr sequence are covalently linked.

In some cases, the tracr sequence may have a specific sequence. The tracr sequence may have at least about 80% sequence identity to at least about 60-100 (e.g., at least about 60, at least about 65, at least about 70, at least about 75, at least about 80, at least about 85, or at least about 90) consecutive nucleotides of the native tracrRNA sequence. The tracr sequence may be substantially identical to at least about 60-100 (e.g., at least about 60, at least about 65, at least about 70, at least about 75, at least about 80, at least about 85, or at least about 90) consecutive nucleotides of SEQ ID NO: 5509. In some cases, a tracrRNA can be at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to at least about 60-90 (e.g., at least about 60, at least about 65, at least about 70, at least about 75, at least about 80, at least about 85, or at least about 90) consecutive nucleotides of SEQ ID NO: 5509. In some cases, the tracrRNA can be substantially identical to at least about 60-100 (e.g., at least about 60, at least about 65, at least about 70, at least about 75, at least about 80, at least about 85, or at least about 90) consecutive nucleotides of SEQ ID NO: 5509. The tracrRNA may comprise SEQ ID NO: 5509.

In some cases, at least one engineered synthetic guide ribonucleic acid (sgRNA) capable of forming a complex with the endonuclease can comprise a sequence having at least about 80% identity to SEQ ID NO: 5473. The sgRNA can comprise a sequence that is at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to SEQ ID No. 5473. The sgRNA can comprise a sequence substantially identical to SEQ ID NO: 5473.

In some cases, the system can comprise two different sgRNAs that target a first region and a second region for cleavage in a target DNA locus, wherein the second region is located 3' to the first region. In some cases, the above system may comprise a single-stranded or double-stranded DNA repair template comprising, from 5' to 3': a first homology arm comprising a sequence of at least about 20 (e.g., at least about 40, 80, 120, 150, 200, 300, 500, or 1,000) nucleotides 5' to the first region, a synthetic DNA sequence of at least about 10 nucleotides, and a second homology arm comprising a sequence of at least about 20 (e.g., at least about 40, 80, 120, 150, 200, 300, 500, or 1,000) nucleotides 3' to the second region.

In another aspect, the present disclosure provides a method of modifying a target nucleic acid locus of interest. The methods can include delivering any of the non-natural systems disclosed herein (including the enzymes disclosed herein and at least one synthetic guide RNA (sgRNA)) to a target nucleic acid locus. The enzyme can form a complex with at least one sgRNA and can modify a target nucleic acid locus of interest upon binding of the complex to the target nucleic acid locus of interest. Delivering an enzyme to the locus can comprise transfecting a cell with the system or a nucleic acid encoding the system. Delivering a nuclease to the locus can comprise electroporating the cell with the system or a nucleic acid encoding the system. Delivering the nuclease to the locus can include incubating the system in a buffer with a nucleic acid comprising the locus of interest. In some cases, the target nucleic acid locus comprises deoxyribonucleic acid (DNA) or ribonucleic acid (RNA). The target nucleic acid locus may comprise genomic DNA, viral RNA, or bacterial DNA. The target nucleic acid locus can be within a cell. The target nucleic acid locus can be in vitro. The target nucleic acid locus can be in a eukaryotic cell or a prokaryotic cell. The cell may be an animal cell, a human cell, a bacterial cell, an archaeal cell, or a plant cell. The enzyme may induce a single-strand or double-strand break at or proximal to the target locus of interest.

Where the target nucleic acid locus is intracellular, the enzyme can be provided as a nucleic acid comprising an open reading frame encoding an enzyme having a RuvC_III domain at least about 75% (e.g., at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99%) identical to any of SEQ ID NOs: 3331-3474. A deoxyribonucleic acid (DNA) comprising an open reading frame encoding the endonuclease can comprise a sequence substantially identical to SEQ ID NO: 5585 or a variant having at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity to SEQ ID NO: 5585. In some cases, the nucleic acid comprises a promoter to which an open reading frame encoding an endonuclease is operably linked. The promoter may be a CMV, EF1a, SV40, PGK1, Ubc, human beta-actin, CAG, TRE, or CaMKIIa promoter. The endonuclease can be provided as a capped mRNA comprising the open reading frame encoding the endonuclease. The endonuclease may be provided as a translated polypeptide. The at least one engineered sgRNA can be provided as a deoxyribonucleic acid (DNA) comprising a gene sequence encoding the at least one engineered sgRNA operably linked to a ribonucleic acid (RNA) pol III promoter. In some cases, the organism may be a eukaryote. In some cases, the organism may be a fungus. In some cases, the organism may be a human.

MG22 enzyme

In one aspect, the present disclosure provides an engineered nuclease system comprising (a) an endonuclease. In some cases, the endonuclease is a Cas endonuclease. In some cases, the endonuclease is a class 2, type II Cas endonuclease. The endonuclease can comprise a RuvC_III domain, wherein the RuvC_III domain has at least about 70% sequence identity to any one of SEQ ID NOs: 3475-3568. In some cases, the endonuclease can comprise a RuvC_III domain, wherein the RuvC_III domain has at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity to any one of SEQ ID NOs: 3475-3568. In some cases, the endonuclease can comprise a RuvC_III domain that is substantially identical to any one of SEQ ID NOs: 3475-3568.

The endonuclease can comprise an HNH domain that is at least about 70% identical to any one of SEQ ID NOs: 5291-5389. In some cases, the endonuclease can comprise an HNH domain that is at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to any one of SEQ ID NOs: 5291-5389. The endonuclease may comprise an HNH domain substantially identical to any one of SEQ ID NOs: 5291-5389.

In some cases, the endonuclease can comprise a variant that is at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to any one of SEQ ID NOs: 1656-1755. In some cases, the endonuclease can be substantially identical to any one of SEQ ID NOs: 1656-1755.

In some cases, the endonuclease can comprise a variant having one or more nuclear localization sequences (NLSs). The NLS can be proximal to the N-terminus or C-terminus of the endonuclease. The NLS can be attached to the N-terminus or C-terminus of any one of SEQ ID NOs: 432-660, or to the N-terminus or C-terminus of a variant having at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity to any one of SEQ ID NOs: 1656-1755. The NLS can be an SV40 large T antigen NLS. The NLS can be a c-myc NLS. The NLS may comprise a sequence having at least about 80%, at least about 85%, at least about 90%, at least about 95%, or at least about 99% identity to any one of SEQ ID NOs: 5593-5608. The NLS may comprise a sequence substantially identical to any one of SEQ ID NO: 5593-. The NLS may comprise any of the sequences in Table 1 or a combination thereof:

In some cases, sequence identity may be determined by BLASTP, CLUSTALW, MUSCLE, MAFFT, Novafold, or Smith-Waterman homology search algorithms. Sequence identity can be determined by the BLASTP algorithm using parameters of a word length (W) of 3 and an expectation (E) of 10, with the BLOSUM62 scoring matrix, setting gap costs at an existence penalty of 11 and an extension penalty of 1, and using a conditional compositional score matrix adjustment.

In some cases, the system can comprise (b) at least one engineered synthetic guide ribonucleic acid (sgRNA) capable of forming a complex with an endonuclease, the sgRNA having a 5' targeting region complementary to a desired cleavage sequence. In some cases, the 5' targeting region may comprise a PAM sequence compatible with the endonuclease. In some cases, the most 5' nucleotide of the targeting region may be G. In some cases, the 5' targeting region can be 15-23 nucleotides in length. The guide sequence and tracr sequence may be provided as separate ribonucleic acids (RNAs) or a single ribonucleic acid (RNA). The guide RNA may comprise a crRNA tracrRNA-binding sequence 3' of the targeting region. The guide RNA may comprise a tracrRNA sequence preceded by a 4-nucleotide linker 3' to the tracrRNA-binding region of the crRNA. The sgRNA can comprise, from 5' to 3': a non-native guide nucleic acid sequence capable of hybridizing to a target sequence in a cell; and a tracr sequence. In some cases, the non-native guide nucleic acid sequence and tracr sequence are covalently linked.

In some cases, the tracr sequence may have a specific sequence. The tracr sequence may have at least about 80% sequence identity to at least about 60-100 (e.g., at least about 60, at least about 65, at least about 70, at least about 75, at least about 80, at least about 85, or at least about 90) consecutive nucleotides of the native tracrRNA sequence. The tracr sequence may have at least about 80% sequence identity to at least about 60-100 (e.g., at least about 60, at least about 65, at least about 70, at least about 75, at least about 80, at least about 85, or at least about 90) consecutive nucleotides of SEQ ID NO: 5510. In some cases, the tracrRNA may be at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to at least about 60-90 (e.g., at least about 60, at least about 65, at least about 70, at least about 75, at least about 80, at least about 85, or at least about 90) consecutive nucleotides of SEQ ID NO: 5510. In some cases, the tracrRNA can be substantially identical to at least about 60-100 (e.g., at least about 60, at least about 65, at least about 70, at least about 75, at least about 80, at least about 85, or at least about 90) consecutive nucleotides of SEQ ID NO: 5510. The tracrRNA may comprise SEQ ID NO: 5510.
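As a rough, non-limiting sketch of the comparison described above, a candidate tracr sequence can be scanned against a reference for the best-matching stretch of consecutive nucleotides. The Python function below uses a simplified ungapped sliding-window comparison, which is an assumption made for brevity; it is not one of the alignment algorithms named in this disclosure, and a gapped alignment may give a different figure. The function name `best_window_identity` and the short toy sequences in the usage note are hypothetical.

```python
def best_window_identity(candidate, reference, window=60):
    """Best percent identity between any `window`-length stretch of the
    reference and any `window`-length stretch of the candidate (no gaps).

    Returns 0.0 if either sequence is shorter than the window.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if len(candidate) < window or len(reference) < window:
        return 0.0
    best = 0.0
    for i in range(len(reference) - window + 1):
        ref_win = reference[i:i + window]
        for j in range(len(candidate) - window + 1):
            cand_win = candidate[j:j + window]
            # Count positions where the two windows carry the same base.
            matches = sum(a == b for a, b in zip(ref_win, cand_win))
            best = max(best, 100.0 * matches / window)
    return best
```

A candidate could then be screened with, for example, `best_window_identity(candidate, reference, window=60) >= 80.0`. On the toy inputs `best_window_identity("ACCGGUAA", "AACCGGUUAC", window=8)`, the best window matches at seven of eight positions.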

In some cases, at least one engineered synthetic guide ribonucleic acid (sgRNA) capable of forming a complex with the endonuclease can comprise a sequence having at least about 80% identity to SEQ ID NO: 5474. The sgRNA can comprise a sequence that is at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to SEQ ID No. 5474. The sgRNA can comprise a sequence substantially identical to SEQ ID NO: 5474.

In some cases, the system can comprise two different sgRNAs that target a first region and a second region for cleavage in a target DNA locus, wherein the second region is located 3' to the first region. In some cases, the above system may comprise a single-stranded or double-stranded DNA repair template comprising, from 5' to 3': a first homology arm comprising a sequence of at least about 20 (e.g., at least about 40, 80, 120, 150, 200, 300, 500, or 1,000) nucleotides 5' to the first region, a synthetic DNA sequence of at least about 10 nucleotides, and a second homology arm comprising a sequence of at least about 20 (e.g., at least about 40, 80, 120, 150, 200, 300, 500, or 1,000) nucleotides 3' to the second region.

In another aspect, the present disclosure provides a method of modifying a target nucleic acid locus of interest. The methods can include delivering any of the non-natural systems disclosed herein (including the enzymes disclosed herein and at least one synthetic guide RNA (sgRNA)) to a target nucleic acid locus. The enzyme can form a complex with at least one sgRNA and can modify a target nucleic acid locus of interest upon binding of the complex to the target nucleic acid locus of interest. Delivering an enzyme to the locus can comprise transfecting a cell with the system or a nucleic acid encoding the system. Delivering a nuclease to the locus can comprise electroporating the cell with the system or a nucleic acid encoding the system. Delivering the nuclease to the locus can include incubating the system in a buffer with a nucleic acid comprising the locus of interest. In some cases, the target nucleic acid locus comprises deoxyribonucleic acid (DNA) or ribonucleic acid (RNA). The target nucleic acid locus may comprise genomic DNA, viral RNA, or bacterial DNA. The target nucleic acid locus can be within a cell. The target nucleic acid locus can be in vitro. The target nucleic acid locus can be in a eukaryotic cell or a prokaryotic cell. The cell may be an animal cell, a human cell, a bacterial cell, an archaeal cell, or a plant cell. The enzyme may induce a single-strand or double-strand break at or proximal to the target locus of interest.

Where the target nucleic acid locus is intracellular, the enzyme can be provided as a nucleic acid comprising an open reading frame encoding an enzyme having a RuvC_III domain at least about 75% (e.g., at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99%) identical to any of SEQ ID NOs: 3475-3568. A deoxyribonucleic acid (DNA) comprising an open reading frame encoding the endonuclease can comprise a sequence substantially identical to SEQ ID NO: 5586 or a variant having at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity to SEQ ID NO: 5586. In some cases, the nucleic acid comprises a promoter to which an open reading frame encoding an endonuclease is operably linked. The promoter may be a CMV, EF1a, SV40, PGK1, Ubc, human beta-actin, CAG, TRE, or CaMKIIa promoter. The endonuclease can be provided as a capped mRNA comprising the open reading frame encoding the endonuclease. The endonuclease may be provided as a translated polypeptide. The at least one engineered sgRNA can be provided as a deoxyribonucleic acid (DNA) comprising a gene sequence encoding the at least one engineered sgRNA operably linked to a ribonucleic acid (RNA) pol III promoter. In some cases, the organism may be a eukaryote. In some cases, the organism may be a fungus. In some cases, the organism may be a human.

MG23 enzyme

In one aspect, the present disclosure provides an engineered nuclease system comprising (a) an endonuclease. In some cases, the endonuclease is a Cas endonuclease. In some cases, the endonuclease is a class 2, type II Cas endonuclease. The endonuclease can comprise a RuvC_III domain, wherein the RuvC_III domain has at least about 70% sequence identity to any one of SEQ ID NOs: 3569-3637. In some cases, the endonuclease can comprise a RuvC_III domain, wherein the RuvC_III domain has at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity to any one of SEQ ID NOs: 3569-3637. In some cases, the endonuclease can comprise a RuvC_III domain that is substantially identical to any one of SEQ ID NOs: 3569-3637.

The endonuclease can comprise an HNH domain that is at least about 70% identical to any one of SEQ ID NO 5390-5460. In some cases, the endonuclease can comprise an HNH domain that is at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to any one of SEQ ID NO 5390-5460. The endonuclease may comprise an HNH domain substantially identical to any one of SEQ ID NO 5390-5460.

In some cases, the endonuclease can comprise a variant that is at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to any one of SEQ ID NO 1756-1826. In some cases, the endonuclease can be substantially identical to any one of SEQ ID NO 1756-1826.

In some cases, the endonuclease can comprise a variant having one or more nuclear localization sequences (NLSs). The NLS can be proximal to the N-terminus or C-terminus of the endonuclease. The NLS can be attached to the N-terminus or C-terminus of any one of SEQ ID NO 1756-1826, or to the N-terminus or C-terminus of a variant having at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity to any one of SEQ ID NO 1756-1826. The NLS can be an SV40 large T antigen NLS. The NLS can be a c-myc NLS. The NLS may comprise a sequence having at least about 80%, at least about 85%, at least about 90%, at least about 95%, or at least about 99% identity to any one of SEQ ID NO 5593-5608. The NLS may comprise a sequence substantially identical to any one of SEQ ID NO: 5593-. The NLS may comprise any of the sequences in Table 1 or a combination thereof:

In some cases, sequence identity may be determined by the BLASTP, CLUSTALW, MUSCLE, MAFFT, Novafold, or Smith-Waterman homology search algorithm. Sequence identity may be determined by the BLASTP algorithm using parameters of a word length (W) of 3 and an expectation (E) of 10, with a BLOSUM62 scoring matrix setting gap costs at existence of 11 and extension of 1, and using conditional compositional score matrix adjustment.
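As a minimal illustration of what a percent-identity threshold such as "at least about 70%" means in practice, the following sketch computes identity from two sequences that have already been aligned by one of the tools named above. The function name `percent_identity` and the choice of denominator (the full alignment length, gap columns included) are assumptions made for illustration; other conventions for the denominator exist and give different values.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Assumption: identity = identical columns / alignment length,
    with gap columns counted in the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    # A column counts as a match only if both residues are equal and not gaps.
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)


if __name__ == "__main__":
    # Two of the nine-residue RuvC_III motifs from the disclosure, aligned without gaps.
    print(round(percent_identity("KHHALDAMC", "RHHALDAMV"), 1))
```

The sketch does not perform the alignment itself; in practice the aligned strings would come from the output of an alignment program.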

In some cases, the system can comprise (b) at least one engineered synthetic guide ribonucleic acid (sgRNA) capable of forming a complex with the endonuclease and having a 5' targeting region complementary to a desired cleavage sequence. In some cases, the 5' targeting region may comprise a PAM sequence compatible with the endonuclease. In some cases, the most 5' nucleotide of the targeting region may be G. In some cases, the 5' targeting region can be 15-23 nucleotides in length. The guide sequence and tracr sequence may be provided as separate ribonucleic acids (RNAs) or as a single ribonucleic acid (RNA). The guide RNA may comprise a crRNA tracrRNA binding sequence 3' of the targeting region. The guide RNA may comprise a tracrRNA sequence preceded by a 4-nucleotide linker 3' to the tracrRNA binding region of the crRNA. The sgRNA can comprise, from 5' to 3': a non-native guide nucleic acid sequence capable of hybridizing to a target sequence in a cell; and a tracr sequence. In some cases, the non-native guide nucleic acid sequence and the tracr sequence are covalently linked.

In some cases, the tracr sequence may have a specific sequence. The tracr sequence may have at least about 80% sequence identity to at least about 60-100 (e.g., at least about 60, at least about 65, at least about 70, at least about 75, at least about 80, at least about 85, or at least about 90) consecutive nucleotides of the native tracrRNA sequence. The tracr sequence may have at least about 80% sequence identity to at least about 60-100 (e.g., at least about 60, at least about 65, at least about 70, at least about 75, at least about 80, at least about 85, or at least about 90) consecutive nucleotides of SEQ ID NO: 5511. In some cases, the tracrRNA may be at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to at least about 60-90 (e.g., at least about 60, at least about 65, at least about 70, at least about 75, at least about 80, at least about 85, or at least about 90) consecutive nucleotides of SEQ ID NO: 5511. In some cases, the tracrRNA can be substantially identical to at least about 60-100 (e.g., at least about 60, at least about 65, at least about 70, at least about 75, at least about 80, at least about 85, or at least about 90) consecutive nucleotides of SEQ ID NO: 5511. The tracrRNA may comprise SEQ ID NO: 5511.

In some cases, at least one engineered synthetic guide ribonucleic acid (sgRNA) capable of forming a complex with the endonuclease can comprise a sequence having at least about 80% identity to SEQ ID NO: 5475. The sgRNA can comprise a sequence that is at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to SEQ ID No. 5475. The sgRNA can comprise a sequence substantially identical to SEQ ID NO: 5475.

In some cases, the system can comprise two different sgRNAs that target a first region and a second region for cleavage in a target DNA locus, wherein the second region is located 3' to the first region. In some cases, the above system may comprise a single-stranded or double-stranded DNA repair template comprising, from 5' to 3': a first homology arm comprising a sequence of at least about 20 (e.g., at least about 40, 80, 120, 150, 200, 300, 500, or 1 kb) nucleotides 5' to the first region, a synthetic DNA sequence of at least about 10 nucleotides, and a second homology arm comprising a sequence of at least about 20 (e.g., at least about 40, 80, 120, 150, 200, 300, 500, or 1 kb) nucleotides 3' to the second region.

In another aspect, the present disclosure provides a method of modifying a target nucleic acid locus of interest. The method can include delivering any of the non-natural systems disclosed herein (including an enzyme disclosed herein and at least one synthetic guide RNA (sgRNA)) to the target nucleic acid locus. The enzyme can form a complex with the at least one sgRNA and can modify the target nucleic acid locus of interest upon binding of the complex to that locus. Delivering the enzyme to the locus can comprise transfecting a cell with the system or a nucleic acid encoding the system. Delivering the nuclease to the locus can comprise electroporating a cell with the system or a nucleic acid encoding the system. Delivering the nuclease to the locus can include incubating the system in a buffer with a nucleic acid comprising the locus of interest. In some cases, the target nucleic acid locus comprises deoxyribonucleic acid (DNA) or ribonucleic acid (RNA). The target nucleic acid locus may comprise genomic DNA, viral RNA, or bacterial DNA. The target nucleic acid locus can be within a cell. The target nucleic acid locus can be in vitro. The target nucleic acid locus can be in a eukaryotic cell or a prokaryotic cell. The cell may be an animal cell, a human cell, a bacterial cell, an archaeal cell, or a plant cell. The enzyme may induce a single-strand or double-strand break at or proximal to the target locus of interest.

Where the target nucleic acid locus can be intracellular, the enzyme can be provided as a nucleic acid comprising an open reading frame encoding an enzyme having a RuvC _ III domain at least about 75% (e.g., at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%) identical to any of SEQ ID NO 3569-3637. A deoxyribonucleic acid (DNA) comprising an open reading frame encoding the endonuclease can comprise a sequence substantially identical to SEQ ID No. 5587 or a variant having at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity to SEQ ID No. 5587. In some cases, the nucleic acid comprises a promoter to which an open reading frame encoding an endonuclease is operably linked. The promoter may be a CMV, EF1a, SV40, PGK1, Ubc, human beta actin, CAG, TRE or CaMKIIa promoter. The endonuclease can be provided as a capped mRNA comprising the open reading frame encoding the endonuclease. The endonuclease may be provided as a translated polypeptide. The at least one engineered sgRNA can be provided as a deoxyribonucleic acid (DNA) comprising a gene sequence encoding the at least one engineered sgRNA operably linked to a ribonucleic acid (RNA) pol III promoter. In some cases, the organism may be a eukaryote. In some cases, the organism may be a fungus. In some cases, the organism may be a human.

Examples

Example 1 metagenomic analysis of novel proteins

Metagenomic samples were collected from sediments, soil and animals. Deoxyribonucleic acid (DNA) was extracted with a Zymobiomics DNA micro-preparation kit and sequenced on an Illumina HiSeq 2500. Samples were collected with the consent of the property owners. Additional raw sequence data from public sources included animal microbiome, sediment, soil, hot spring, hydrothermal vent, marine, peat bog, permafrost, and sewage sequences. Metagenomic sequence data were searched using hidden Markov models generated from known Cas protein sequences, including type II Cas effector proteins. Novel effector proteins identified by the search were aligned with known proteins to identify potential active sites. This metagenomic workflow resulted in the delineation of the MG1, MG2, MG3, MG4, MG6, MG14, MG15, MG16, MG18, MG21, MG22, and MG23 families of class II, type II CRISPR endonucleases described herein.

Example 2a. -discovery of the MG1 family of CRISPR systems

Analysis of the data from the metagenomic analysis of Example 1 revealed a new cluster of previously undescribed putative CRISPR systems, initially comprising six members (MG1-1, MG1-2, MG1-3, MG1-4, MG1-5 and MG1-6, identified as SEQ ID NOs 5, 6, 1, 2 and 3, respectively). This family is characterized by enzymes having HNH and RuvC domains. The RuvC domain of this family has a RuvC_III portion with low homology to previously described Cas9 family members. Although the initial family members share at most 56.8% identity with one another, all six enzymes have a distinct RuvC_III portion of the RuvC domain containing the common motif RHHALDAMV (SEQ ID NO:5615), KHHALDAMC (SEQ ID NO:5616) or KHHALDAIC (SEQ ID NO:5617). These motifs are not found in other described Cas9-like enzymes. The corresponding protein and nucleic acid sequences of these novel enzymes and their related subdomains are provided in the sequence listing. The putative tracrRNA sequences were identified based on their position relative to other genes and are shown as SEQ ID NO: 5476-5479. Based on the 16S rRNA sequences from the genome bins containing the CRISPR systems, the enzyme systems appear to be derived from organisms of the phylum Verrucomicrobia, the phylum Candidatus Peregrinibacteria, or the phylum Candidatus Melainabacteria. The 16S rRNA sequences are presented as SEQ ID NO 5592-5596. A detailed domain-level alignment of the CRISPR system sequences, calling out the features described by Shmakov et al. (Mol Cell. 2015 Nov 5; 60(3):385-97), the entire contents of which are incorporated by reference, is depicted in FIGs. 9A, 9B, 9C, 9D, 9E, 9F, 9G and 9H. Comparison of MG1-1, 1-2, and 1-3 with additional proprietary protein datasets revealed additional protein sequences with similar structures, presented as SEQ ID NOs: 7-319. These MG1 protein sequences led to the discovery of additional MG1 motifs, shown as SEQ ID NO: 5618-5632.
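The three RuvC_III motifs named in this example lend themselves to a simple screen of candidate protein sequences. The sketch below is illustrative only: the function name `find_ruvc_iii_motifs` and the dictionary layout are hypothetical, exact string matching is assumed (the disclosure does not state how the motifs were searched), and the sequence in the usage line is a made-up toy string rather than any sequence from the sequence listing.

```python
import re

# Motifs and identifiers as given in Example 2a.
RUVC_III_MOTIFS = {
    "SEQ ID NO:5615": "RHHALDAMV",
    "SEQ ID NO:5616": "KHHALDAMC",
    "SEQ ID NO:5617": "KHHALDAIC",
}


def find_ruvc_iii_motifs(protein: str) -> list[tuple[str, str, int]]:
    """Return (identifier, motif, 0-based start) for every exact motif hit."""
    seq = protein.upper()
    hits = []
    for seq_id, motif in RUVC_III_MOTIFS.items():
        # A lookahead pattern reports overlapping occurrences as well.
        for match in re.finditer(f"(?={motif})", seq):
            hits.append((seq_id, motif, match.start()))
    return sorted(hits, key=lambda hit: hit[2])


if __name__ == "__main__":
    # Toy sequence, not taken from the sequence listing.
    print(find_ruvc_iii_motifs("MAAKHHALDAICGG"))
```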

Example 2b. -discovery of the MG2 family of CRISPR systems

Analysis of the data from the metagenomic analysis of Example 1 revealed a new cluster of previously undescribed putative CRISPR systems comprising six members (MG2-1, MG2-2, MG2-3, MG2-5 and MG2-6). The corresponding protein and nucleic acid sequences of these novel enzymes and exemplary subdomains are presented as SEQ ID NOs 320, 322-325. Based on their position relative to other genes, putative tracrRNA sequences were identified in the operon and are presented as SEQ ID NOs: 5490, 5492-5494 and 5538. A detailed domain-level alignment of these sequences with Cas9, calling out the features described by Shmakov et al. (Mol Cell. 2015 Nov 5; 60(3):385-97), is depicted in FIG. 7.

Comparison of MG2-1, MG2-2, MG2-3, MG2-5 and MG2-6 with additional proprietary protein datasets reveals additional protein sequences with similar structures, presented as SEQ NO 321 and 326-420. The motif commonly found in members of the MG2 family is represented by SEQ ID NO 5631-5638.

Example 2c. -discovery of the MG3 family of CRISPR systems

Analysis of the data from the metagenomic analysis of Example 1 revealed a new cluster of previously undescribed putative CRISPR systems comprising one member (MG3-1). The corresponding amino acid sequences of this novel enzyme and its exemplary subdomains are presented as SEQ ID NOs 424, 2245 and 4059. Based on proximity to other elements in the operon, a putative tracrRNA-containing sequence was identified and is included as SEQ ID NO 5498. A detailed domain-level alignment of this sequence with Cas9 from Actinomyces naeslundii is depicted in FIG. 8.

Comparison of MG3-1 with additional proprietary protein datasets revealed additional protein sequences with similar structures, presented as SEQ NO: 421-.

Example 2d. -discovery of MG4, 7, 14, 15, 16, 18, 21, 22, 23 families of CRISPR systems

Analysis of the data from the metagenomic analysis of example 1 revealed a new cluster of putative CRISPR systems not previously described, the systems comprising 9 families, each family comprising one member (MG 4-5, MG7-2, MG14-1, MG15-1, MG16-2, MG18-1, MG21-1, MG22-1, MG 23-1). The corresponding protein and nucleic acid sequences of these novel enzymes and exemplary subdomains thereof are presented as SEQ ID NOs 432, 669, 678, 930, 1093, 1354, 1512, 1656, 1756. Putative tracr-containing sequences were identified for each family based on proximity to other elements in the operon. These sequences are provided in the sequence listing as SEQ ID NO 5503-5511, respectively.

Comparison of MG 4-5, MG7-2, MG14-1, MG15-1, MG16-2, MG18-1, MG21-1, MG22-1, MG23-1 with the other proprietary protein data sets reveals additional protein sequences with similar structures presented as SEQ NO: 433-. For MG4, the motif common to the nucleases of these CRISPR systems groups is represented as SEQ ID NO: 5649; represented as SEQ ID NO 5650-5667 for MG 14; 5668-; represented as SEQ ID NO 5676-5678 for MG 16; 5679-5686 for MG 18; represented by SEQ ID NO:5687-5693 and SEQ ID NO:5674-5675 for MG 21; represented as SEQ ID NO:5694-5699 for MG 22; and is represented by SEQ ID NO 5700-5717 for MG 23.

Example 3. -prophetic-determination of protospacer adjacent motifs

Experiments are performed as in any of the examples in Karvelis et al. Methods. 2017 May 15; 121-122:3-8 (which is incorporated herein by reference in its entirety) to identify the protospacer adjacent motif (PAM) sequence specificity of the novel enzymes described herein, to allow for optimal synthetic sequence targeting.

In one example (in vivo screening), cells carrying plasmids encoding any of the enzymes described herein and a protospacer-targeting guide RNA are co-transformed with a plasmid library containing an antibiotic resistance gene and a protospacer sequence flanked by a randomized PAM sequence. Plasmids containing a functional PAM are cleaved by the enzyme, resulting in cell death. Deep sequencing of the pool of enzyme-resistant plasmids isolated from surviving cells reveals which plasmids have been depleted; the depleted set contains the functional PAMs that permit cleavage.
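The depletion readout of the in vivo screen can be sketched as a comparison of PAM frequencies before and after selection. This is a minimal illustration and not the analysis used in Karvelis et al.: the function name `pam_depletion`, the log2 ratio score, and the pseudocount are all assumptions, and the inputs are assumed to be lists of PAM strings already extracted from sequencing reads.

```python
import math
from collections import Counter


def pam_depletion(input_pams, surviving_pams, pseudocount=1.0):
    """log2(frequency among survivors / frequency in input library) per PAM.

    Strongly negative scores mark PAMs depleted from surviving cells,
    i.e. candidates for functional PAMs that permitted cleavage.
    """
    before = Counter(input_pams)
    after = Counter(surviving_pams)
    n_before = sum(before.values())
    n_after = sum(after.values())
    n_distinct = len(before)
    scores = {}
    for pam, count_before in before.items():
        # Pseudocounts keep the ratio finite when a PAM is absent after selection.
        f_before = (count_before + pseudocount) / (n_before + pseudocount * n_distinct)
        f_after = (after.get(pam, 0) + pseudocount) / (n_after + pseudocount * n_distinct)
        scores[pam] = math.log2(f_after / f_before)
    return scores


if __name__ == "__main__":
    library = ["AGG"] * 10 + ["TTT"] * 10   # toy input library
    survivors = ["TTT"] * 10                # AGG-bearing plasmids were lost
    for pam, score in sorted(pam_depletion(library, survivors).items(), key=lambda kv: kv[1]):
        print(pam, round(score, 2))
```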

In another example (in vitro screening), a PAM library in the form of a DNA plasmid or tandem repeat is cleaved by an RNP complex (e.g., including an enzyme, tracrRNA and crRNA or an enzyme and a hybrid sgRNA) assembled in vitro or in cell lysate. The free DNA ends resulting from successful cleavage events were captured by adaptor ligation, and then the PAM side product was PCR amplified. The amplified functional PAM library was deep sequenced and the PAM that permitted DNA cleavage was identified.

Example 4 prophetic-use of synthetic CRISPR systems as described herein for genome editing in mammalian cells

DNA/RNA sequences are prepared that encode (i) an ORF encoding a codon-optimized enzyme under a cell-compatible promoter, with a cell-compatible C-terminal nuclear localization sequence (e.g., the SV40 NLS in the case of human cells) and a suitable polyadenylation signal (e.g., the TK pA signal in the case of human cells); and (ii) an ORF encoding an sgRNA (having a 5' sequence beginning with G, followed by 20 nt of a complementary targeting nucleic acid sequence targeting genomic DNA, followed by a corresponding compatible PAM identified via Example 3 and a 3' tracr-binding sequence, a linker, and a tracrRNA sequence) under a suitable polymerase III promoter (e.g., the U6 promoter in mammalian cells). In some embodiments, these sequences are prepared on the same or separate plasmid vectors, which are transfected into eukaryotic cells by a suitable technique. In some embodiments, these sequences are prepared as separate DNA sequences, which are transfected or microinjected into cells. In some embodiments, these sequences are prepared as synthetic RNA or in vitro transcribed RNA, which is transfected or microinjected into cells. In some embodiments, these sequences are translated into proteins, which are transfected or microinjected into cells.

Whichever transfection method is selected, (i) and (ii) are introduced into the cells. A period of incubation is allowed so that the enzyme and/or sgRNA can be transcribed and/or translated into active form. After the incubation period, genomic DNA in the vicinity of the target sequence is analyzed (e.g., by sequencing). Indels are introduced into the genomic DNA in the vicinity of the target sequence as a result of enzyme-mediated cleavage and non-homologous end joining.

In some embodiments, (i) and (ii) are introduced into cells together with a third, repair nucleotide that encodes regions of the genome flanking the cleavage site, 25 bp or larger in size, which promote homology-directed repair. Contained within these flanking sequences may be a single base pair mutation, a functional gene fragment, a foreign or native gene for expression, or several genes that make up a biochemical pathway.

Example 5. -prophetic-in vitro use of synthetic CRISPR systems as described herein

Any of the enzymes described herein is cloned into a suitable E. coli expression plasmid containing a purification tag, recombinantly expressed in E. coli, and purified using the recombinant tag. An RNA comprising a 5' G followed by a 20 nt targeting sequence and PAM sequence, a crRNA-compatible tracrRNA binding region, a GAAA linker and a compatible tracrRNA is synthesized by a suitable solid-phase RNA synthesis method. The recombinant enzyme and sgRNA are combined in a suitable cleavage buffer containing Mg2+ (e.g., 20 mM HEPES pH 7.5, 100 mM KCl, 5 mM MgCl2, 1 mM DTT, 5% glycerol), and the reaction is initiated by introducing a target DNA comprising a sequence complementary to the targeting sequence and the PAM sequence. Cleavage of the DNA is monitored by a suitable assay (e.g., agarose gel electrophoresis followed by ethidium bromide (or similarly acting DNA intercalating agent) staining and UV visualization).

EXAMPLE 6- (general protocol) identification/validation of PAM sequences of the endonucleases described herein

The PAM sequence was determined by sequencing a plasmid containing randomly generated PAM sequences that could be cleaved by a putative endonuclease expressed in an e.coli lysate based expression system (myTXTL, Arbor Biosciences). In this system, E.coli codon-optimized nucleotide sequences were transcribed and translated from the PCR fragment under the control of the T7 promoter. A second PCR fragment with tracr sequence under the T7 promoter and a minimal CRISPR array consisting of T7 promoter followed by repeat-spacer-repeat sequence was transcribed in the same reaction. Successful expression of the endonuclease and tracr sequences in the TXTL system followed by CRISPR array processing provides active in vitro CRISPR nuclease complexes.

A library of target plasmids containing a spacer sequence matching the spacer in the minimal array, followed by 8N mixed bases (the putative PAM sequences), was incubated with the output of the TXTL reaction. After 1-3 hours, the reaction was stopped and the DNA was recovered with a DNA purification kit (e.g., Zymo DCC, AMPure XP beads, QiaQuick, etc.). Adapter sequences were blunt-end ligated to DNA with active PAM sequences that had been cleaved by the endonuclease, whereas uncut DNA was not ligated. DNA fragments containing active PAM sequences were then amplified by PCR using primers specific for the library and the adapter sequence. The PCR amplification products were resolved on a gel to identify amplicons corresponding to cleavage events. The amplified segments of the cleavage reaction were also used as templates for the preparation of NGS libraries. Sequencing of the resulting library, which is a subset of the starting 8N library, revealed the sequences containing the correct PAM for the active CRISPR complex. For PAM testing using a single RNA construct, the same procedure was repeated except that in vitro transcribed RNA was added along with the plasmid library and the tracr/minimal CRISPR array template was omitted. For endonucleases for which NGS libraries were prepared, seqLogo (see, e.g., Huber et al. Nat Methods. 2015 Feb; 12(2):115-21) representations were constructed and are shown in FIG. 27, FIG. 28, FIG. 29, FIG. 30, FIG. 31, FIG. 32, FIG. 33, FIG. 34 and FIG. 35. The seqLogo module used to construct these representations takes the position weight matrix of a DNA sequence motif (e.g., a PAM sequence) and plots the corresponding sequence logo as introduced by Schneider and Stephens (see, e.g., Schneider et al. Nucleic Acids Res. 1990 Oct 25; 18(20):6097-). The characters representing the sequence in the seqLogo representations are stacked on top of each other for each position in the aligned sequences (e.g., PAM sequences). The height of each letter is proportional to its frequency of occurrence, and the letters are sorted so that the most common one is on top.
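The letter stacking described above can be sketched from a set of recovered PAM sequences. The code below is an illustration in Python and is not the seqLogo module itself (seqLogo is an R/Bioconductor package); the function names `position_frequencies` and `stack_order` are hypothetical, and an A/C/G/T alphabet is assumed.

```python
from collections import Counter


def position_frequencies(pams):
    """Per-position base frequencies for equal-length PAM sequences."""
    if not pams:
        raise ValueError("no sequences given")
    length = len(pams[0])
    if any(len(p) != length for p in pams):
        raise ValueError("all PAM sequences must have the same length")
    matrix = []
    for i in range(length):
        counts = Counter(p[i] for p in pams)
        total = sum(counts.values())
        # Only A, C, G and T are reported; any other symbol lowers the column sum.
        matrix.append({base: counts.get(base, 0) / total for base in "ACGT"})
    return matrix


def stack_order(column):
    """Letters of one position from bottom to top; the most frequent is last."""
    ranked = sorted(column.items(), key=lambda item: item[1])
    return [base for base, freq in ranked if freq > 0]


if __name__ == "__main__":
    toy_pams = ["AGG", "TGG", "CGA", "GGG"]  # toy data, not experimental results
    for position, column in enumerate(position_frequencies(toy_pams), start=1):
        print(position, stack_order(column), column)
```

In a full sequence logo, the total height of each stack is usually scaled by the information content of the position; that scaling is omitted here for brevity.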

Example 7- (general protocol) RNA folding of tracrRNA and sgRNA structures

Folded structures of guide RNA sequences at 37 °C were computed using the method of Andronescu et al. Bioinformatics. 2007 Jul 1; 23(13):i19-28 (the entire contents of which are incorporated by reference). Predicted structures of exemplary sgRNAs described herein are shown in FIGs. 21, 22, 23, 24, 25, and 26.

Example 8- (general protocol) efficiency of in vitro cleavage of MG CRISPR complex

In the protease deficient E.coli B strain, the endonuclease was expressed as a His-tagged fusion protein from the inducible T7 promoter. Cells expressing the His-tagged protein were lysed by sonication, and the His-tagged protein was purified by Ni-NTA affinity chromatography on a HisTrap FF column (GE Lifescience) on AKTA Avant FPLC (GE Lifescience). The eluate was resolved on an acrylamide gel (Bio-Rad) by SDS-PAGE and stained with InstantBlue Ultrafast Coomassie Brilliant blue (Sigma-Aldrich). The purity was determined by densitometry of the protein bands using ImageLab software (Bio-Rad). The purified endonuclease was dialyzed into a storage buffer (pH7.5) composed of 50mM Tris-HCl, 300mM NaCl, 1mM TCEP, 5% glycerol and stored at-80 ℃.

Target DNA containing spacer and PAM sequences (e.g., as determined in Example 6) was constructed by DNA synthesis. When the PAM had degenerate bases, a single representative PAM was selected for testing. The target DNA comprised 2200 bp of linear DNA amplified by PCR from a plasmid, with the PAM and spacer located 700 bp from one end. Successful cleavage yielded fragments of 700 and 1500 bp. Target DNA, in vitro transcribed single RNA, and purified recombinant protein were combined in cleavage buffer (10 mM Tris, 100 mM NaCl, 10 mM MgCl2) with an excess of protein and RNA and incubated for 5 minutes to 3 hours, typically 1 hour. The reaction was stopped by adding RNase A and incubating for 60 min. The reaction was then resolved on a 1.2% TAE agarose gel, and the fraction of cleaved target DNA was quantified in ImageLab software.
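The arithmetic behind this assay can be sketched as follows. The fragment sizes follow directly from the 2200 bp substrate with its target site 700 bp from one end. The cleaved-fraction formula is an assumption made for illustration (the text says only that the fraction was quantified in ImageLab), and both function names are hypothetical.

```python
def expected_fragments(total_bp: int, cut_offset_bp: int) -> tuple[int, int]:
    """Sizes of the two products when a linear substrate is cut once."""
    if not 0 < cut_offset_bp < total_bp:
        raise ValueError("cut site must lie inside the substrate")
    return cut_offset_bp, total_bp - cut_offset_bp


def fraction_cleaved(uncut: float, product_a: float, product_b: float) -> float:
    """Cleaved fraction from band intensities.

    Assumption: stain intensity scales with DNA mass, so the two product
    bands together stand for the cleaved share of the substrate.
    """
    total = uncut + product_a + product_b
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    return (product_a + product_b) / total


if __name__ == "__main__":
    print(expected_fragments(2200, 700))
    print(fraction_cleaved(uncut=50.0, product_a=20.0, product_b=30.0))
```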

Example 9- (general protocol) testing of genome cleavage Activity of MG CRISPR Complex in E.coli

E. coli lacks the ability to repair double-stranded DNA breaks efficiently. Thus, cleavage of genomic DNA can be a lethal event. Exploiting this phenomenon, endonuclease activity was tested in E. coli by recombinantly expressing the endonuclease and tracrRNA in a target strain with the spacer/target and PAM sequences integrated into its genomic DNA.

In this assay, the PAM sequence is specific to the endonuclease being tested, as determined by the method described in Example 6. The sgRNA sequence was determined based on the sequence and predicted structure of the tracrRNA. Repeat-anti-repeat pairings of 8-12 bp (usually 10 bp) were selected, starting from the 5' end of the repeat. The remaining 3' end of the repeat and the 5' end of the tracrRNA were replaced with a tetraloop. Typically, the tetraloop was GAAA, but other tetraloops may also be used, particularly if the GAAA sequence is predicted to interfere with folding. In these cases, a TTCG tetraloop was used.
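The sgRNA construction rule above can be sketched as a string operation. This is a minimal sketch under stated assumptions: the function name `design_sgrna` is hypothetical, and the parameter `tracr_trim` (the number of 5' tracrRNA nucleotides removed) is an assumed input that would in practice be read from the predicted repeat-anti-repeat structure rather than computed here.

```python
def design_sgrna(spacer: str, repeat: str, tracr: str,
                 paired_len: int = 10, tracr_trim: int = 0,
                 tetraloop: str = "GAAA") -> str:
    """Join spacer, truncated repeat, tetraloop and truncated tracrRNA (5' to 3')."""
    if not 8 <= paired_len <= 12:
        raise ValueError("paired_len is expected to be 8-12 nt")
    if len(repeat) < paired_len:
        raise ValueError("repeat is shorter than paired_len")
    if not 0 <= tracr_trim < len(tracr):
        raise ValueError("tracr_trim must leave part of the tracrRNA")
    if len(tetraloop) != 4:
        raise ValueError("a tetraloop has exactly four nucleotides")
    # Keep the 5' end of the repeat; the loop replaces the repeat's 3' end
    # and the trimmed 5' end of the tracrRNA.
    return spacer + repeat[:paired_len] + tetraloop + tracr[tracr_trim:]


if __name__ == "__main__":
    # Toy sequences for illustration only; not sequences from the disclosure.
    print(design_sgrna("N" * 20, "AAAACCCCGGGGUUUU", "AAAACCCCGGGGUUUUACGU",
                       paired_len=10, tracr_trim=6))
```

Passing a different `tetraloop` argument covers the case in which GAAA is predicted to interfere with folding.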

Engineered strains with PAM sequences integrated into their genomic DNA were transformed with DNA encoding the endonuclease. The transformants were then made chemically competent and transformed with 50 ng of a single guide RNA either specific for the target sequence ("on-target") or non-specific for the target ("off-target"). After heat shock, the transformations were recovered in SOC for 2 hours at 37 °C. Nuclease efficiency was then determined by a 5-fold dilution series grown on induction medium. Colonies were quantified from the dilution series in triplicate.
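One way to turn triplicate colony counts from a 5-fold dilution series into a single measure of nuclease efficiency is sketched below. The back-calculation to an undiluted titer and the use of the off-target to on-target ratio as the readout are assumptions made for illustration; the function names and the toy counts are hypothetical.

```python
from statistics import mean


def mean_titer(replicates, fold: int = 5) -> float:
    """Mean undiluted-equivalent colony number.

    replicates: iterable of (colony_count, dilution_step) pairs, where
    dilution_step 0 is undiluted and each step is a further `fold` dilution.
    """
    return mean(count * fold ** step for count, step in replicates)


def fold_reduction(on_target, off_target, fold: int = 5) -> float:
    """Ratio of off-target to on-target titers; larger means more killing."""
    on = mean_titer(on_target, fold)
    off = mean_titer(off_target, fold)
    if on == 0:
        raise ValueError("no on-target colonies; reduction is unbounded")
    return off / on


if __name__ == "__main__":
    on_target = [(4, 0), (5, 0), (6, 0)]       # toy counts, undiluted plates
    off_target = [(10, 2), (10, 2), (10, 2)]   # toy counts, second 5-fold dilution
    print(fold_reduction(on_target, off_target))
```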

Example 10- (general protocol) testing of genome cleavage Activity of MG CRISPR complexes in mammalian cells

To demonstrate targeting and cleavage activity in mammalian cells, the MG Cas effector protein sequences were tested in two mammalian expression vectors: (a) one with a C-terminal SV40 NLS and a 2A-GFP tag, and (b) one with no GFP tag and two SV40 NLS sequences, one at the N-terminus and one at the C-terminus. In some cases, the nucleotide sequence encoding the endonuclease was codon optimized for expression in mammalian cells.

The corresponding single guide RNA (sgRNA) sequence with the targeting sequence was cloned into a second mammalian expression vector. The two plasmids were co-transfected into HEK293T cells. 72 hours after co-transfection of the expression plasmid and the sgRNA targeting plasmid, DNA was extracted and used to prepare NGS libraries. Percent NHEJ was measured from the indels detected by sequencing of the target sites, demonstrating the targeting efficiency of the enzyme in mammalian cells. At least 10 different target sites were selected to test the activity of each protein.

Example 11 characterization of members of the MG1 family

PAM specificity, tracrRNA/sgRNA validation

The targeted endonuclease activity of the MG1 family endonuclease system was demonstrated using the myTXTL system as described in example 6. In this assay, PCR amplification of the cleaved target plasmid produced products that migrated about 170bp in the gel, as shown in FIGS. 17-20. Amplification products of MG1-4 (double guide: see lane 3 of gel 1 and single guide: see lane 2 of gel 6), MG1-5 (see lane 10 of gel 2), MG1-6 (double guide: see lane 6 of gel 5 and single guide: see lane 5 of gel 6) and MG1-7 (double guide: see lane 13 of gel 3 and single guide: see lane 2 of gel 3) were observed (proteins SEQ ID NO:1-4, respectively). Sequencing of the PCR products revealed active PAM sequences for these enzymes as shown in table 2.

Table 2: PAM sequence specificity of MG1 enzyme and related data

Synthetic single guide RNA (sgRNA) was designed based on the sequence and predicted structure of tracrRNA and is shown as SEQ ID NO 5461-5464. The PAM sequence screen of example 6 was repeated with sgRNA. The results of this experiment are also shown in table 2, revealing that PAM specificity was slightly changed when sgRNA was used.

In vitro targeting of endonuclease activity

The in vitro activity of the MG1-4 endonuclease system (protein SEQ ID NO:1, sgRNA SEQ ID NO:5461) on target DNA having the PAM sequence CAGGAAGG was verified using the method of Example 8. The single guide sequence reported above (SEQ ID NO:5461) was used with spacer/targeting sequences of varying length from 18 to 24 nt (replacing the Ns of the sequence). The results are shown in FIG. 10, where the left panel shows a gel demonstrating DNA cleavage by MG1-4 in combination with the corresponding sgRNAs having different targeting sequence lengths (18-24 nt), and the right panel shows the same data quantified as a bar graph. The data indicate that targeting sequences of 18-24 nucleotides are functional with the MG1-4/sgRNA system.

Targeted endonuclease activity in bacterial cells

The in vivo activity of the MG1-4 endonuclease system (protein SEQ ID NO:1, sgRNA SEQ ID NO:5461) was tested as in Example 9 using the PAM sequence CAGGAAGG. Transformed E. coli were plated in serial dilution, and the results are shown in FIG. 11 (left panel: E. coli serial dilutions; right panel: quantified growth). The significant reduction in growth of E. coli expressing the on-target sgRNA compared with E. coli expressing the non-target sgRNA indicates that genomic DNA was specifically cleaved by the endonuclease in the E. coli cells.

Targeted endonuclease activity in mammalian cells

The method of example 10 was used to demonstrate targeting and cleavage activity in mammalian cells. The open reading frames encoding the MG1-4 (protein SEQ ID NO:5527) and MG1-6 (protein SEQ ID NO:5529) sequences were cloned into 2 mammalian expression vectors: one with a C-terminal SV40 NLS and a 2A-GFP tag (E.coli MG-BB), and the other without a GFP tag and with 2 NLS sequences, one at the N-terminus and one at the C-terminus (E.coli pMG 5-BB). For MG1-6, the open reading frame was also codon optimized for mammalian expression (SEQ ID NO:5589) and cloned into the 2-NLS plasmid backbone (MG-16 hs). The endonuclease expression vector was co-transfected into HEK293T cells with a second vector for expressing a sgRNA (e.g., SEQ ID NO:5512 or 5515) having a tracr sequence specific for the endonuclease and a guide sequence selected from tables 3-4. 72 hours after co-transfection, DNA was extracted and used to prepare an NGS library. Cleavage activity was detected by the appearance of internal deletions (NHEJ remnants) in the sequence proximal to the target site. The percentage of NHEJ was measured from the indels observed in target-site sequencing to demonstrate the targeting efficiency of the enzyme in mammalian cells, and is shown in FIG. 12.
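As a hedged illustration of the percentage-NHEJ readout described above, the sketch below counts the fraction of aligned amplicon reads whose alignment contains an insertion or deletion. It assumes each read has already been aligned to the reference amplicon and is represented by a SAM-style CIGAR string; the function name `percent_indel` and the toy CIGAR strings are assumptions, and the disclosure does not describe its analysis pipeline at this level of detail.

```python
import re

# Matches an insertion (I) or deletion (D) operation in a SAM-style CIGAR string.
_INDEL_OP = re.compile(r"\d+[ID]")


def percent_indel(cigars):
    """Percentage of reads whose CIGAR contains at least one indel operation."""
    if not cigars:
        raise ValueError("no reads supplied")
    edited = sum(1 for cigar in cigars if _INDEL_OP.search(cigar))
    return 100.0 * edited / len(cigars)


# Invented alignments: two unedited reads, one deletion, one insertion.
toy_cigars = ["50M", "20M2D28M", "25M1I24M", "50M"]
print(percent_indel(toy_cigars))
```

In practice such counts are usually restricted to a window around the expected cut site and compared against an untreated control to exclude sequencing errors; neither step is shown here.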

Table 3: MG1-4 mammal targeting sequence

Table 4: MG1-6 mammal targeting sequence

Example 12 characterization of members of the MG2 family

PAM specificity, tracrRNA/sgRNA validation

The targeted endonuclease activity of MG2 family members was demonstrated in the myTXTL system as described in example 6. In this assay, successful cleavage of the library by active proteins produced a band of approximately 170bp in the gel, as shown in FIGS. 17-20. Amplification products of MG2-1 (see gel 2 lane 11 and gel 4 lane 6) and MG2-7 (see gel 11 lane 10) were observed (SEQ ID NOS: 320 and 321, respectively). Sequencing of the PCR products revealed the active PAM sequences shown in table 5 below:

Table 5: PAM sequence specificity of MG2 enzyme and related data

Targeted endonuclease activity in bacterial cells

The in vivo activity of the MG2-7 endonuclease system (endonuclease SEQ ID NO:321; sgRNA SEQ ID NO:5465) with the PAM sequence AGCGTAAG was demonstrated using the method described in example 9. Transformed E. coli were plated in serial dilutions, and the results are shown in FIG. 34 (left panel: E. coli serial dilutions; right panel: quantified growth). A significant reduction in growth of E. coli expressing the target sgRNA compared to E. coli expressing non-target sgRNAs indicates that genomic DNA is specifically cleaved by the MG2-7 endonuclease in E. coli cells.

Example 13 characterization of members of the MG3 family

PAM specificity, tracrRNA/sgRNA validation

The targeted endonuclease activity of MG3 family members was confirmed using the myTXTL system as described in example 6 using tracr sequences and CRISPR arrays. In this assay, PCR amplification of the cleaved target plasmid produced products that migrated about 170bp in the gel, as shown in FIGS. 17-20. Amplification products of MG3-6 (double guide: see lane 8 on gel 2 and single guide: see lane 3 on gel 3), MG3-7 (double guide: see lane 3 on gel 2 and single guide: see lane 4 on gel 3) and MG3-8 (double guide: see lane 5 on gel 9) were observed (proteins SEQ ID NO:421, 422 and 423, respectively). Sequencing of the PCR products revealed active PAM sequences in table 6 below:

Table 6: PAM sequence specificity of MG3 enzyme and related data

Synthetic single guide RNA (sgRNA) was designed based on the sequence and predicted structure of tracrRNA and is shown as SEQ ID NO 5466-5467. The PAM sequence screen of example 6 was repeated with sgRNA. The results of this experiment are also shown in table 6, indicating that PAM specificity was slightly changed when sgRNA was used.

In vitro targeting of endonuclease activity

The in vitro activity of MG3-6 (endonuclease SEQ ID NO:421) was verified using the method of example 8 with the PAM sequence GTGGGTTA. The single guide sequence reported above (SEQ ID NO:5466) was used with spacer/targeting sequences of different lengths (18-24nt) replacing the N positions of the sequence. The results are shown in FIG. 13, where the upper panel shows a gel demonstrating DNA cleavage by MG3-6 bound to sgRNAs with different targeting sequence lengths (18-24nt), and the lower panel shows the same data quantified as a bar graph. The data indicate that targeting sequences of 18-24 nucleotides work with the MG3-6/sgRNA system.
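The bar graphs described above quantify the cleavage gels, but the disclosure does not state the formula used. A common densitometry convention, given here only as an assumed sketch, expresses cleavage as the summed intensity of the product bands divided by the total intensity of product and uncut substrate bands. The function name `percent_cleaved` and the intensity values are hypothetical.

```python
def percent_cleaved(uncut_intensity, product_intensities):
    """Percent cleavage = 100 x sum(products) / (uncut + sum(products))."""
    cut = sum(product_intensities)
    total = uncut_intensity + cut
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return 100.0 * cut / total


# Invented band intensities (arbitrary units) for a single lane.
print(percent_cleaved(25.0, [50.0, 25.0]))
```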

Targeted endonuclease activity in bacterial cells

The in vivo activity of the MG3-7 endonuclease system (protein SEQ ID NO:422; sgRNA SEQ ID NO:5467) was tested with the PAM sequence TGGACCTG using the method of example 9. Transformed E. coli were plated in serial dilutions, and the results are shown in FIG. 14 (upper panel: E. coli serial dilutions; lower panel: quantified growth). A significant reduction in growth of E. coli expressing the target sgRNA compared to E. coli expressing non-target sgRNAs reveals specific cleavage of genomic DNA by the MG3-7 endonuclease system.

Targeted endonuclease activity in mammalian cells

The method of example 10 was used to demonstrate targeting and cleavage activity in mammalian cells. The open reading frame encoding MG3-7 (protein SEQ ID NO:422) was cloned into 2 mammalian expression vectors: one with a C-terminal SV40 NLS and a 2A-GFP tag (E.coli MG-BB), and the other without a GFP tag and with 2 NLS sequences, one at the N-terminus and one at the C-terminus (E.coli pMG 5-BB). The endonuclease expression vector was co-transfected into HEK293T cells with a second vector for expression of the sgRNAs described above with the guide sequences selected from table 7. The results of this experiment are shown in fig. 12. DNA was extracted 72 hours after co-transfection and used to prepare NGS libraries. Cleavage activity was detected by the appearance of internal deletions (NHEJ remnants) near the target site. The results are shown in FIG. 15.

The target sites encoded on the sgRNA plasmids are shown in table 7 below.

Table 7: MG3-7 mammal targeting sequence

Example 13 characterization of members of the MG4 family

PAM specificity, tracrRNA/sgRNA validation

The targeted endonuclease activity of the MG4 family endonuclease system was demonstrated using the myTXTL system as described in example 6. In this assay, PCR amplification of the cleaved target plasmid produced products that migrated about 170bp in the gel, as shown in FIGS. 17-20. Amplification product of MG4-2 was observed (double guide: see lane 9 on gel 2 and single guide: see lane 7 on gel 10) (SEQ ID NO: 432). Sequencing of the PCR products revealed active PAM sequences as in table 8 below.

Table 8: PAM sequence specificity of MG4 enzyme and related data

Example 14 characterization of members of the MG14 family

PAM specificity, tracrRNA/sgRNA validation

The targeted endonuclease activity of MG14 family members was demonstrated using the myTXTL system as described in example 6. In this assay, PCR amplification of the cleaved target plasmid produced products that migrated about 170bp in the gel, as shown in FIGS. 17-20. Amplification product of MG14-1 was observed (double guide: see lane 4 on gel 1 and single guide: see lane 8 on gel 3) (SEQ ID NO: 678). Sequencing of the PCR products revealed the active PAM sequence specificity shown in table 9 below.

Targeted endonuclease activity in bacterial cells

The in vivo activity of the MG14-1 endonuclease system (endonuclease SEQ ID NO:678; sgRNA SEQ ID NO:5469) with the PAM sequence GGCGGGGA was confirmed using the method described in example 9. Transformed E. coli were plated in serial dilutions, and the results are shown in FIG. 35 (left panel: E. coli serial dilutions; right panel: quantified growth). A significant reduction in growth of E. coli expressing the target sgRNA compared to E. coli expressing non-target sgRNAs indicates that genomic DNA is specifically cleaved by the MG14-1 endonuclease in E. coli cells.

Example 15 characterization of members of the MG15 family

PAM specificity, tracrRNA/sgRNA validation

The targeted endonuclease activity of MG15 family members was demonstrated using the myTXTL system as described in example 6. In this assay, PCR amplification of the cleaved target plasmid produced products that migrated about 170bp in the gel, as shown in FIGS. 17-20. Amplification product of MG15-1 was observed (double guide: see lane 7 on gel 7 and single guide: see lane 9 on gel 3) (SEQ ID NO: 930). Sequencing of the PCR products revealed the active PAM sequence specificity detailed in table 10 below.

Table 10:

In vitro activity

The in vitro activity of the MG15-1 endonuclease system (protein SEQ ID NO:930; sgRNA SEQ ID NO:5470) was tested with the PAM sequence GGGTCAAA using the method of example 8. The single guide sequence reported above (SEQ ID NO:5470) was used with spacer/targeting sequences of different lengths (18-24nt) replacing the N positions of the sequence. The results are shown in FIG. 16, where the upper panel shows a gel demonstrating DNA cleavage by MG15-1 bound to sgRNAs with different targeting sequence lengths (18-24nt), and the lower panel shows the same data quantified as a bar graph. The data indicate that targeting sequences of 18-24 nucleotides work with the MG15-1/sgRNA system.

Targeted endonuclease activity in bacterial cells

The in vivo activity of the MG15-1 endonuclease system (endonuclease SEQ ID NO:930; sgRNA SEQ ID NO:5470) with the PAM sequence GGGTCAAA was confirmed using the method described in example 9. Transformed E. coli were plated in serial dilutions, and the results are shown in FIG. 35 (left panel: E. coli serial dilutions; right panel: quantified growth). A significant reduction in growth of E. coli expressing the target sgRNA compared to E. coli expressing non-target sgRNAs indicates that genomic DNA is specifically cleaved by the MG15-1 endonuclease in E. coli cells.

Example 16 characterization of members of the MG16 family

PAM specificity, tracrRNA/sgRNA validation

The targeted endonuclease activity of MG16 family members was demonstrated using the myTXTL system as described in example 6. In this assay, PCR amplification of the cleaved target plasmid produced products that migrated about 170bp in the gel, as shown in FIGS. 17-20. Amplification product of MG16-2 was observed (see gel 11, lane 17) (SEQ ID NO: 1093). Sequencing of the PCR products revealed the active PAM sequence specificity detailed in table 11 below.

Table 11:

Example 17 characterization of members of the MG18 family

PAM specificity, tracrRNA/sgRNA validation

The targeted endonuclease activity of MG18 family members was demonstrated using the myTXTL system as described in example 6. In this assay, PCR amplification of the cleaved target plasmid produced products that migrated about 170bp in the gel, as shown in FIGS. 17-20. Amplification product of MG18-1 was observed (double guide: see lane 9 on gel 9 and single guide: see lane 12 on gel 11) (SEQ ID NO: 1354). Sequencing of the PCR products revealed the active PAM sequence specificity detailed in table 12 below.

Table 12:

Example 18 characterization of members of the MG21 family

PAM specificity, tracrRNA/sgRNA validation

The targeted endonuclease activity of MG21 family members was demonstrated using the myTXTL system as described in example 6. In this assay, PCR amplification of the cleaved target plasmid produced products that migrated about 170bp in the gel, as shown in FIGS. 17-20. Amplification product of MG21-1 was observed (see lane 2 of gel 11) (SEQ ID NO: 1512). Sequencing of the PCR products revealed the active PAM sequence specificity detailed in table 13 below.

Table 13:

Example 19 characterization of members of the MG22 family

PAM specificity, tracrRNA/sgRNA validation

The targeted endonuclease activity of MG22 family members was demonstrated using the myTXTL system as described in example 6. In this assay, PCR amplification of the cleaved target plasmid produced products that migrated about 170bp in the gel, as shown in FIGS. 17-20. Amplification product of MG22-1 was observed (see lane 3 of gel 11) (protein SEQ ID NO: 1656). Sequencing of the PCR products revealed the active PAM sequence specificity detailed in table 14 below.

Table 14:

Example 20 characterization of members of the MG23 family

PAM specificity, tracrRNA/sgRNA validation

The targeted endonuclease activity of MG23 family members was demonstrated using the myTXTL system as described in example 6. In this assay, PCR amplification of the cleaved target plasmid produced products that migrated about 170bp in the gel, as shown in FIGS. 17-20. Amplification product of MG23-1 was observed (see lane 4 of gel 11) (SEQ ID NO: 1756). Sequencing of the PCR products revealed the active PAM sequence specificity of these enzymes, detailed in table 15 below.

Table 15:

The systems of the present disclosure can be used for a variety of applications, such as nucleic acid editing (e.g., gene editing) and binding to nucleic acid molecules (e.g., sequence-specific binding). For example, such systems can be used to address (e.g., remove or replace) genetic mutations that may cause disease in a subject; to inactivate a gene in order to determine its function in the cell; as a diagnostic tool for the detection of pathogenic genetic elements (e.g., by cleavage of retroviral RNA or amplified DNA sequences encoding pathogenic mutations); as an inactive enzyme to target and detect a particular nucleotide sequence (e.g., a sequence encoding antibiotic resistance in bacteria); to inactivate a virus or render it unable to infect a host cell by targeting the viral genome; to add genes or modify metabolic pathways in order to engineer organisms that produce valuable small molecules, macromolecules or secondary metabolites; to establish a gene drive element for evolutionary selection; and as a biosensor to detect cell perturbations caused by foreign small molecules and nucleotides.

While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. It is not intended that the invention be limited to the specific embodiments provided in the specification. While the invention has been described with reference to the foregoing specification, the descriptions and illustrations of the embodiments herein are not meant to be construed in a limiting sense. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. Further, it is to be understood that all aspects of the present invention are not limited to the specific depictions, configurations or relative proportions set forth herein which depend upon a variety of conditions and variables. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is therefore contemplated that the present invention shall also cover any such alternatives, modifications, variations or equivalents. It is intended that the following claims define the scope of the present disclosure and that methods and structures within the scope of these claims and their equivalents be covered thereby.
