VI-E type and VI-F type CRISPR-Cas system and application

Document No. 900156, published 2021-02-26.

Summary: This technology, VI-E type and VI-F type CRISPR-Cas system and application, was designed and created by Yang Hui, Xu Chunlong, Zhou Yingsi, and Xiao Qingquan on 2020-05-11. Its main content is as follows. The present invention provides novel CRISPR/Cas compositions and their use in targeting nucleic acids. In particular, the invention provides non-naturally occurring or engineered RNA-targeting systems comprising a novel RNA-targeting Cas13e or Cas13f effector protein and at least one targeting nucleic acid component, such as a guide RNA (gRNA) or crRNA. The novel Cas effector protein is the smallest of the known Cas effector proteins, about 800 amino acids in size, and is therefore particularly suitable for delivery in small-capacity vectors (e.g., AAV vectors).

1. A clustered regularly interspaced short palindromic repeats (CRISPR)-Cas complex comprising:

(1) an RNA guide sequence comprising a spacer sequence capable of hybridizing to a target RNA and a direct repeat (DR) sequence 3' to the spacer sequence; and

(2) a CRISPR-associated protein (Cas) having the amino acid sequence of any one of SEQ ID NOs: 1-7, or a derivative or functional fragment of said Cas;

wherein the Cas, the derivative, and the functional fragment are capable of (i) binding to the RNA guide sequence and (ii) targeting the target RNA,

with the proviso that, when said complex comprises any one of SEQ ID NOs: 1-7, the spacer sequence is not 100% complementary to a naturally occurring phage nucleic acid.

2. The CRISPR-Cas complex of claim 1, wherein said DR sequence has a secondary structure substantially identical to the secondary structure of any one of SEQ ID NOs: 8-14.

3. The CRISPR-Cas complex of claim 1, wherein said DR sequence consists of any one of SEQ ID NOs: 8-14.

4. The CRISPR-Cas complex of claim 1, 2 or 3, wherein said target RNA is encoded by a eukaryotic DNA.

5. The CRISPR-Cas complex of claim 4, wherein the eukaryotic DNA is non-human mammalian DNA, non-human primate DNA, human DNA, plant DNA, insect DNA, bird DNA, reptile DNA, rodent DNA, fish DNA, worm/nematode DNA, or yeast DNA.

6. The CRISPR-Cas complex of any of claims 1-5, wherein said target RNA is an mRNA.

7. The CRISPR-Cas complex of any of claims 1-6, wherein said spacer sequence is 15-60 nucleotides, 25-50 nucleotides, or about 30 nucleotides in length.

8. The CRISPR-Cas complex of any of claims 1-7, wherein said spacer sequence is 90-100% complementary to said target RNA.

9. The CRISPR-Cas complex of any of claims 1-8, wherein said derivative comprises a conservative amino acid substitution at one or more residues of the amino acid sequence of any one of SEQ ID NOs: 1-7.

10. The CRISPR-Cas complex of claim 9, wherein said derivative comprises only conservative amino acid substitutions.

11. The CRISPR-Cas complex of any of claims 1-10, wherein said derivative comprises a sequence identical to that of any one of SEQ ID NOs: 1-7 or of a wild-type Cas.

12. The CRISPR-Cas complex of any of claims 1-9, wherein said derivative is capable of binding to said RNA guide sequence hybridized to a target RNA, but lacks RNase catalytic activity owing to a mutation in the RNase catalytic site.

13. The CRISPR-Cas complex of claim 12, wherein said derivative has no more than 210 residues deleted at the N-terminus, and/or no more than 180 residues deleted at the C-terminus.

14. The CRISPR-Cas complex of claim 13, wherein said derivative has about 180 residues deleted at the N-terminus and/or about 150 residues deleted at the C-terminus.

15. The CRISPR-Cas complex of any of claims 12-14, wherein said derivative further comprises an RNA base editing domain.

16. The CRISPR-Cas complex of claim 15, wherein said RNA base editing domain is an adenosine deaminase, such as a double-stranded RNA-specific adenosine deaminase (e.g., ADAR1 or ADAR2); an apolipoprotein B mRNA-editing enzyme, catalytic polypeptide-like (APOBEC); or an activation-induced cytidine deaminase (AID).

17. The CRISPR-Cas complex of claim 16, wherein said ADAR2 carries the E488Q/T375G double mutation, or said ADAR2 is the ADAR2 deaminase domain (ADAR2DD).

18. The CRISPR-Cas complex of any of claims 15-17, wherein said base editing domain is further fused to an RNA-binding domain (e.g., MS2).

19. The CRISPR-Cas complex of any of claims 12-14, wherein said derivative further comprises an RNA methyltransferase, an RNA demethylase, an RNA splicing modifier, a localization factor, or a translation modifier.

20. The CRISPR-Cas complex of any of claims 1-19, wherein said Cas, said derivative or functional fragment comprises a Nuclear Localization Signal (NLS) sequence or a Nuclear Export Signal (NES).

21. The CRISPR-Cas complex of any of claims 1-20, wherein targeting said target RNA results in modification of the target RNA.

22. The CRISPR-Cas complex of claim 21, wherein said modification of the target RNA is cleavage of the target RNA.

23. The CRISPR-Cas complex of claim 21, wherein the modification of the target RNA is deamination of an adenosine (A) to an inosine (I).

24. The CRISPR-Cas complex of any of claims 1-23, further comprising a target RNA comprising a sequence capable of hybridizing to said spacer sequence.

25. A fusion protein comprising (1) a Cas, a derivative thereof, or a functional fragment thereof of any one of claims 1-24, and (2) a heterologous functional domain.

26. The fusion protein of claim 25, wherein the heterologous functional domain comprises: a Nuclear Localization Signal (NLS), a reporter protein or detection tag (e.g., GST, HRP, CAT, GFP, HcRed, DsRed, CFP, YFP, BFP), a localization signal, a protein targeting moiety, a DNA binding domain (e.g., MBP, LexA DBD, Gal4 DBD), an epitope tag (e.g., His, myc, V5, FLAG, HA, VSV-G, Trx), a transcriptional activation domain (e.g., VP64 or VPR), a transcriptional repression domain (e.g., KRAB or SID), a nuclease (e.g., FokI), a deamination domain (e.g., ADAR1, ADAR2, APOBEC, AID, or TAD), a methylase, a demethylase, a transcriptional release factor, an HDAC, a polypeptide having ssRNA cleavage activity, a polypeptide having ssDNA cleavage activity, a polypeptide having dsDNA cleavage activity, a DNA or RNA ligase, or a combination of any of the above.

27. The fusion protein of claim 25 or 26, wherein the heterologous functional domain is fused at the N-terminus, C-terminus, or within the fusion protein.

28. A conjugate comprising (1) a Cas, a derivative thereof, or a functional fragment thereof according to any one of claims 1-24, conjugated to (2) a heterologous functional moiety.

29. The conjugate of claim 28, wherein the heterologous functional moiety comprises: a Nuclear Localization Signal (NLS), a reporter protein or detection tag (e.g., GST, HRP, CAT, GFP, HcRed, DsRed, CFP, YFP, BFP), a localization signal, a protein targeting moiety, a DNA binding domain (e.g., MBP, LexA DBD, Gal4 DBD), an epitope tag (e.g., His, myc, V5, FLAG, HA, VSV-G, Trx), a transcription activation domain (e.g., VP64 or VPR), a transcription inhibition domain (e.g., KRAB moiety or SID moiety), a nuclease (e.g., FokI), a deamination domain (e.g., ADAR1, ADAR2, APOBEC, AID, or TAD), a methylase, a demethylase, a transcription release factor, an HDAC, a polypeptide having ssRNA cleavage activity, a polypeptide having ssDNA cleavage activity, a polypeptide having dsDNA cleavage activity, a DNA or RNA ligase, or any combination thereof.

30. The conjugate of claim 28 or 29, wherein the heterologous functional moiety is conjugated at the N-terminus, C-terminus or inside the Cas, the derivative thereof or the functional fragment thereof.

31. An isolated polynucleotide encoding any one of SEQ ID NOs: 1-7, a derivative thereof, a functional fragment thereof, or a fusion protein thereof, with the proviso that said polynucleotide is not any one of SEQ ID NOs: 15-21.

32. The polynucleotide of claim 31 which has been codon optimized for expression in a cell.

33. The polynucleotide of claim 32, wherein the cell is a eukaryotic cell.

34. A non-naturally occurring polynucleotide comprising a derivative of any one of SEQ ID NOs: 8-14, wherein the derivative (i) is substantially identical to any one of SEQ ID NOs: 8-14 but has one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10) nucleotide additions, deletions, or substitutions; (ii) has at least 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, or 97% sequence identity to any one of SEQ ID NOs: 8-14; (iii) hybridizes under stringent conditions to any one of SEQ ID NOs: 8-14, or to any one of (i) and (ii); or (iv) is the complement of any one of (i)-(iii); with the proviso that the derivative is not any one of SEQ ID NOs: 8-14, and that the derivative encodes an RNA, or is itself an RNA, that retains substantially the same secondary structure as any RNA encoded by SEQ ID NOs: 8-14.

35. The non-naturally occurring polynucleotide of claim 34, wherein the derivative is used as a DR sequence with the Cas, derivative thereof, or functional fragment thereof of any one of claims 1-24.

36. A vector comprising the polynucleotide of any one of claims 31-35.

37. The vector of claim 36, wherein said polynucleotide is operably linked to a promoter and, optionally, an enhancer.

38. The vector of claim 37, wherein said promoter is a constitutive promoter, an inducible promoter, a broad-spectrum expression promoter, or a tissue-specific promoter.

39. The vector of any one of claims 36-38 which is a plasmid.

40. The vector of any one of claims 36-38, which is a retroviral vector, a phage vector, an adenoviral vector, a Herpes Simplex Virus (HSV) vector, an AAV vector, or a lentiviral vector.

41. The vector of claim 40, wherein the AAV vector is a recombinant AAV vector of serotype AAV1, AAV2, AAV4, AAV5, AAV6, AAV7, AAVrh74, AAV8, AAV9, AAV10, AAV11, AAV12, or AAV13.

42. A delivery system comprising (1) a delivery vehicle, and (2) the CRISPR-Cas complex of any of claims 1-24, the fusion protein of any of claims 25-27, the conjugate of any of claims 28-30, the polynucleotide of any of claims 31-33, or the vector of any of claims 36-41.

43. The delivery system of claim 42, wherein the delivery vehicle is a nanoparticle, liposome, exosome, microvesicle, or gene gun.

44. A cell or progeny thereof comprising the CRISPR-Cas complex of any one of claims 1-24, the fusion protein of any one of claims 25-27, the conjugate of any one of claims 28-30, or the vector of any one of claims 31-33.

45. The cell of claim 44, or progeny thereof, which is a eukaryotic cell (e.g., a non-human mammalian cell, a human cell, or a plant cell) or a prokaryotic cell (e.g., a bacterial cell).

46. A non-human multicellular eukaryote comprising the cell of claim 44 or 45.

47. The non-human multicellular eukaryote of claim 46, which is an animal model for a human genetic disease (e.g., a rodent or primate).

48. A method of modifying a target RNA, the method comprising contacting a target RNA with the CRISPR-Cas complex of any of claims 1-24, wherein the spacer sequence is complementary to at least 15 nucleotides of the target RNA; wherein the Cas, the derivative, or the functional fragment thereof binds to the RNA guide sequence to form a complex; wherein the complex binds to the target RNA; and wherein the Cas, derivative, or functional fragment modifies the target RNA when the complex binds to the target RNA.

49. The method of claim 48, wherein the target RNA is modified by Cas cleavage.

50. The method of claim 48, wherein said target RNA is modified by deamination with a derivative comprising a double stranded RNA specific adenosine deaminase.

51. The method of any one of claims 48-50, wherein the target RNA is an mRNA, tRNA, rRNA, non-coding RNA, lncRNA, or nuclear RNA.

52. The method of any one of claims 48-51, wherein the Cas, derivative, and functional fragment do not exhibit substantial (or detectable) collateral RNase activity after the complex binds to the target RNA.

53. The method of any one of claims 48-52, wherein the target RNA is within a cell.

54. The method of claim 53, wherein said cell is a cancer cell.

55. The method of claim 53, wherein said cell is infected with an infectious agent.

56. The method of claim 55, wherein said infectious agent is a virus, prion, protozoan, fungus, or parasite.

57. The method of any of claims 53-56, wherein the CRISPR-Cas complex is encoded by a first polynucleotide encoding the sequence of any one of SEQ ID NOs: 1-7, or a derivative or functional fragment thereof, and a second polynucleotide comprising the sequence of any one of SEQ ID NOs: 8-14 and a sequence encoding a spacer RNA capable of binding to a target RNA, and wherein the first polynucleotide and the second polynucleotide are introduced into the cell.

58. The method of claim 57, wherein the first polynucleotide and the second polynucleotide are introduced into the cell via the same vector.

59. The method of any one of claims 53-58, comprising one or more of: (i) inducing cellular senescence in vitro or in vivo; (ii) inducing cell cycle arrest in vitro or in vivo; (iii) inhibiting cell growth in vitro or in vivo; (iv) inducing apoptosis in vitro or in vivo; (v) inducing apoptosis in vitro or in vivo; and (vi) inducing necrosis in vitro or in vivo.

60. A method of treating a disorder or disease in a subject in need thereof, the method comprising administering to the subject a composition comprising the CRISPR-Cas complex of any of claims 1-24 or a polynucleotide encoding said complex; wherein the spacer sequence is complementary to at least 15 nucleotides of a target RNA associated with the disorder or disease; wherein the Cas, derivative, or functional fragment binds to the RNA guide sequence to form a complex; wherein the complex binds to the target RNA; and wherein, when the complex binds to the target RNA, the Cas, derivative, or functional fragment cleaves the target RNA, thereby treating the disorder or disease in the subject.

61. The method of claim 60, wherein the disorder or disease is a cancer or an infectious disease.

62. The method of claim 61, wherein the cancer is Wilms' tumor, Ewing's sarcoma, neuroendocrine tumor, glioblastoma, neuroblastoma, melanoma, skin cancer, breast cancer, colon cancer, rectal cancer, prostate cancer, liver cancer, kidney cancer, pancreatic cancer, lung cancer, bile duct cancer, cervical cancer, endometrial cancer, esophageal cancer, stomach cancer, head and neck cancer, medullary thyroid cancer, ovarian cancer, glioma, lymphoma, leukemia, myeloma, acute lymphocytic leukemia, acute myelogenous leukemia, chronic lymphocytic leukemia, chronic myelogenous leukemia, Hodgkin's lymphoma, non-Hodgkin's lymphoma, or bladder cancer.

63. The method of any one of claims 60-62, which is an in vitro method, an in vivo method, or an ex vivo method.

64. A cell or progeny thereof obtained by the method of any of claims 48-59, wherein the cell and progeny thereof contain a non-naturally occurring modification (e.g., a non-naturally occurring modification in a transcribed RNA of the cell/progeny).

65. A method of detecting the presence of a target RNA, the method comprising contacting the target RNA with a composition comprising the fusion protein of any one of claims 25-27, the conjugate of any one of claims 28-30, or a polynucleotide encoding the fusion protein, wherein the fusion protein or conjugate comprises a detectable label (e.g., a label detectable by fluorescence, Northern blot, or FISH), and wherein the spacer sequence of the complex is capable of binding to the target RNA.

66. A eukaryotic cell comprising a clustered regularly interspaced short palindromic repeats (CRISPR)-Cas complex, said CRISPR-Cas complex comprising:

(1) an RNA guide sequence comprising a spacer sequence capable of hybridizing to a target RNA and a direct repeat (DR) sequence 3' to the spacer sequence; and

(2) a CRISPR-associated protein (Cas) having the amino acid sequence of any one of SEQ ID NOs: 1-7, or a derivative or functional fragment of said Cas;

wherein the Cas, the derivative, and the functional fragment are capable of (i) binding to the RNA guide sequence and (ii) targeting the target RNA.

Background

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA sequences found in the genomes of prokaryotes such as bacteria and archaea. According to current understanding, these sequences are derived from DNA fragments of bacteriophages that previously infected the prokaryote, and they are used to detect and destroy the DNA of similar phages during subsequent infections.

CRISPR-associated (Cas) genes are a set of homologous genes, some of which encode Cas proteins with helicase and nuclease activities. A Cas protein is an enzyme that uses RNA derived from the CRISPR sequence (crRNA) as a guide to recognize and cleave a specific polynucleotide strand (e.g., DNA) complementary to the crRNA.

Together, these components constitute the CRISPR-Cas system, a native prokaryotic "immune system" that confers resistance, or acquired immunity, to foreign genetic elements (such as those present in extrachromosomal DNA, e.g., plasmids, and in bacteriophages) and to foreign RNA encoded by foreign DNA.

The CRISPR/Cas system is a widespread prokaryotic defense mechanism against foreign genetic material, found in about 50% of sequenced bacterial genomes and nearly 90% of sequenced archaeal genomes. Research on this native prokaryotic system has given rise to CRISPR-Cas technology, which can be applied broadly to eukaryotes, including humans, in basic biological research, the development of biotechnology products, disease treatment, and other areas.

Prokaryotic CRISPR-Cas systems comprise highly diverse protein effectors, non-coding elements, and locus architectures, and several important biotechnologies have already been developed by engineering these protein effectors.

CRISPR locus structure has been studied in many different systems. In these systems, CRISPR arrays in genomic DNA typically comprise an AT-rich leader sequence followed by short DR sequences separated by unique spacer sequences. CRISPR DR sequences range in size from 23 to 55 bp, typically 28 to 37 bp. Some DR sequences contain inverted repeats, so that the transcribed RNA forms a secondary structure such as a stem-loop (often called a "hairpin"), whereas other DR sequences are unstructured. Spacer size varies between CRISPR arrays, ranging from 21 to 72 bp and usually from 32 to 38 bp. A CRISPR array typically contains fewer than 50 repeat-spacer units.
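The array layout described above (a leader, then repeats alternating with spacers) can be illustrated with a minimal Python sketch. It assumes the DR occurs as exact, identical copies, which real arrays do not always satisfy (terminal repeats are often degenerate), and it uses made-up toy sequences rather than any sequence from this disclosure. The length bounds are the typical spacer range quoted above; the function names are illustrative only.

```python
# Typical spacer length range quoted in the text, in bp.
TYPICAL_SPACER_RANGE = (21, 72)


def extract_spacers(array: str, direct_repeat: str) -> list[str]:
    """Return the spacers found between consecutive exact copies of the DR.

    Splitting on the repeat yields [leader, spacer1, ..., spacerN, trailer];
    the first and last pieces flank the repeats and are therefore not spacers.
    """
    pieces = array.upper().split(direct_repeat.upper())
    return [p for p in pieces[1:-1] if p]


def atypical_spacers(spacers: list[str],
                     bounds: tuple[int, int] = TYPICAL_SPACER_RANGE) -> list[str]:
    """List spacers whose length falls outside the typical range."""
    low, high = bounds
    return [s for s in spacers if not low <= len(s) <= high]


if __name__ == "__main__":
    # Hypothetical toy array: AT-rich leader, 3 repeats, 2 spacers (not real data).
    dr = "GTTGTAGAAC"
    array = "ATATTATA" + dr + "A" * 30 + dr + "C" * 34 + dr
    spacers = extract_spacers(array, dr)
    print([len(s) for s in spacers])  # [30, 34]
    print(atypical_spacers(spacers))  # []
```

Real annotation tools tolerate mismatches in repeats; exact splitting is used here only to make the repeat-spacer-repeat pattern explicit.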

Small clusters of cas genes are often located next to CRISPR repeat-spacer arrays. The 93 Cas genes discovered so far can be grouped into 35 families based on the sequence similarity of their encoded proteins; 11 of these families form the so-called Cas core, which includes the Cas1 through Cas9 protein families. An intact CRISPR-Cas locus contains at least one gene belonging to the Cas core.

CRISPR-Cas systems fall into two broad classes: class 1 systems degrade foreign nucleic acids using a complex of multiple Cas proteins, whereas class 2 systems do so using a single large Cas effector protein. Because class 2 systems rely on single-subunit effectors, they offer a simpler set of components for engineering and application, which has made them an important source for the discovery, engineering, and optimization of novel gene-editing tools.

Class 1 CRISPR-Cas systems are divided into types I, III, and IV, and class 2 systems into types II, V, and VI; these six types are further divided into 19 subtypes. Most CRISPR-Cas systems contain a Cas1 protein. Many prokaryotes harbor multiple CRISPR-Cas systems simultaneously, indicating that these systems can coexist and may share certain components.

Cas9, one of the most representative Cas proteins, was first found in Streptococcus pyogenes and belongs to class 2, type II. SpCas9, derived from Streptococcus pyogenes, is currently the most commonly used Cas9. Cas9 is a DNA endonuclease activated jointly by a small crRNA complementary to the target DNA sequence and a separate trans-activating CRISPR RNA (tracrRNA). The crRNA consists of a direct repeat (DR) sequence, which is responsible for binding the protein, and a spacer sequence. The spacer sequence can be designed to be complementary to any nucleic acid target sequence; in this way, a CRISPR system can be directed to a chosen DNA or RNA target by designing the spacer sequence of its crRNA. The crRNA can be fused to the tracrRNA to form a single guide RNA (sgRNA), which binds to Cas9, hybridizes to the target DNA, and directs Cas9 to cleave it. Corresponding Cas9 effector proteins have also been found in other species, such as the Cas9 of the Streptococcus thermophilus CRISPR system, and are used in a similar way. These CRISPR/Cas9 systems have been widely applied in many eukaryotes, including baker's yeast (Saccharomyces cerevisiae), the opportunistic pathogen Candida albicans, zebrafish (Danio rerio), fruit flies (Drosophila melanogaster), ants (Harpegnathos saltator, Ooceraea biroi), mosquitoes (Aedes aegypti), nematodes (Caenorhabditis elegans), plants, mice, monkeys, and human embryos.

Another recently discovered Cas effector protein is Cas12a (formerly known as Cpf1). Cas12a, C2c1, and C2c3 all belong to the class 2, type V Cas proteins, which lack an HNH nuclease domain but have RuvC nuclease activity. Cas12a was originally identified in Francisella novicida; its former name, Cpf1, reflects the CRISPR-Cas subtype found in the Prevotella and Francisella lineages. Cas12a differs from Cas9 in several major respects: Cas12a cuts double-stranded DNA to produce staggered (cohesive) ends, whereas Cas9 produces blunt ends; Cas12a recognizes a T-rich PAM sequence; and Cas12a requires only a crRNA, without a tracrRNA, for successful targeting. The small crRNA of Cas12a is better suited to multiplex genome editing than the Cas9 gRNA. Furthermore, the 5'-overhanging cohesive ends left by Cas12a can be used for DNA ligation, which offers greater target specificity than traditional restriction-enzyme cloning. Finally, Cas12a cleaves 18-23 bp downstream of its PAM, so indels introduced when a double-strand break (DSB) is repaired by non-homologous end joining (NHEJ) do not disrupt the nuclease recognition sequence; Cas12a can therefore perform multiple rounds of DNA cleavage, whereas Cas9 can perform only one. Cas9 cleaves only 3 bp upstream of its PAM, so NHEJ-induced indels often disrupt the recognition sequence and prevent further rounds of cleavage. In theory, repeated rounds of DNA cleavage increase the chance that the desired genome edit will occur.

Several class 2, type VI Cas proteins identified recently, including Cas13a (also known as C2c2), Cas13b, Cas13c, and Cas13d, are all RNA-guided RNases: they use a crRNA to recognize target RNA sequences, rather than target DNA sequences as Cas9 and Cas12a do. Overall, CRISPR/Cas13 systems can achieve more efficient RNA degradation than traditional RNAi and CRISPRi techniques, with less off-target cleavage than RNAi.

One drawback of the Cas13 proteins identified to date is their relatively large size. Cas13a, Cas13b, and Cas13c all have more than 1,100 amino acid residues, which makes it difficult to package their coding sequences (about 3.3 kb), together with the guide RNA and any required promoter and translational regulatory sequences, into small-capacity gene therapy vectors. For example, adeno-associated virus (AAV)-based vectors, currently regarded as the safest gene therapy vectors, have a packaging capacity of only about 4.7 kb. The smallest Cas13 protein found so far, Cas13d, has only about 920 amino acids (about 2.8 kb of coding sequence) and can in principle be packaged into an AAV vector, but its usefulness for single-base-editing gene therapy is limited, because such therapy relies on Cas13d-based fusion proteins with single-base-editing function, such as dCas13d-ADAR2DD, whose coding sequence is about 3.9 kb.
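The size argument above is simple arithmetic: three nucleotides per codon, compared against the roughly 4.7 kb AAV limit. The Python sketch below counts one stop codon, ignores promoter, guide RNA, polyA, and ITR sequences, and reproduces the approximate coding lengths quoted above; the ~800-aa entry uses the size stated for the Cas13e/Cas13f proteins of this disclosure. The function names are illustrative.

```python
# Approximate AAV packaging limit cited in the text (~4.7 kb).
AAV_CAPACITY_NT = 4700


def coding_length_nt(n_amino_acids: int, include_stop: bool = True) -> int:
    """Open-reading-frame length: 3 nt per codon, plus one stop codon if requested."""
    return 3 * (n_amino_acids + (1 if include_stop else 0))


def remaining_capacity(payload_nt: int, capacity_nt: int = AAV_CAPACITY_NT) -> int:
    """Nucleotides left for promoter, guide RNA cassette, polyA, etc.

    A negative value means the payload alone already exceeds the capacity.
    """
    return capacity_nt - payload_nt


if __name__ == "__main__":
    # Protein sizes are the approximate figures given in the text.
    payloads = {
        "Cas13a/b/c (~1100 aa)": coding_length_nt(1100),
        "Cas13d (~920 aa)": coding_length_nt(920),
        "Cas13e/f (~800 aa)": coding_length_nt(800),
        "dCas13d-ADAR2DD (~3.9 kb, as stated)": 3900,
    }
    for name, nt in payloads.items():
        print(f"{name}: {nt} nt coding, {remaining_capacity(nt)} nt left")
```

With these inputs the ~800-aa effector leaves roughly 2.3 kb for regulatory elements and the guide cassette, versus about 0.8 kb for dCas13d-ADAR2DD.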

Furthermore, all currently known Cas13 proteins/systems exhibit non-specific RNase activity once activated by crRNA-guided recognition of a target sequence. This activity is particularly strong in Cas13a and Cas13b and is also detectable in Cas13d. Although this property can be exploited in nucleic acid detection methods, the non-specific RNase activity of these Cas13 proteins poses a substantial potential risk for their use in gene therapy.

Disclosure of Invention

The invention provides a clustered regularly interspaced short palindromic repeats (CRISPR)-Cas complex comprising: (1) an RNA guide sequence comprising a spacer sequence capable of hybridizing to a target RNA and a direct repeat (DR) sequence 3' to the spacer sequence; and (2) a CRISPR-associated protein (Cas) having the amino acid sequence of any one of SEQ ID NOs: 1-7, or a derivative or functional fragment of the Cas protein; wherein the Cas protein, the derivative, and the functional fragment are capable of (i) binding to the RNA guide sequence and (ii) targeting the target RNA; and wherein the spacer sequence is not 100% complementary to a naturally occurring phage nucleic acid.
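As a minimal sketch of the guide architecture described above (spacer first, with the DR 3' to it), the Python code below derives a spacer as the reverse complement of a chosen window of the target RNA and appends a DR. The DR is a hypothetical placeholder of N's, because SEQ ID NOs: 8-14 are not reproduced in this text, and `design_guide` is an illustrative name, not an established tool.

```python
# Watson-Crick pairing for RNA bases (A-U, G-C).
_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")

# Hypothetical stand-in for a DR from SEQ ID NOs: 8-14 (not reproduced here).
PLACEHOLDER_DR = "N" * 36


def reverse_complement_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence (any DNA 'T' is converted to 'U' first)."""
    return seq.upper().replace("T", "U").translate(_RNA_COMPLEMENT)[::-1]


def design_guide(target_rna: str, start: int, spacer_len: int = 30,
                 direct_repeat: str = PLACEHOLDER_DR) -> str:
    """Return a 5'-spacer-DR-3' guide targeting target_rna[start:start + spacer_len]."""
    window = target_rna[start:start + spacer_len]
    if start < 0 or len(window) != spacer_len:
        raise ValueError("target window runs past the end of the target RNA")
    spacer = reverse_complement_rna(window)  # antiparallel, fully complementary
    return spacer + direct_repeat            # DR sits 3' of the spacer
```

For example, `design_guide("AUGCAAAA", 0, spacer_len=4, direct_repeat="NN")` returns `"GCAUNN"`. The default spacer length of 30 nt mirrors the "about 30 nucleotides" embodiment; real guide design would also weigh target accessibility and off-target matches.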

In some embodiments, the DR sequence has a secondary structure substantially identical to the secondary structure of any one of SEQ ID NOs: 8-14.

In some embodiments, the DR sequence consists of any one of SEQ ID NOs: 8-14.

In some embodiments, the target RNA is encoded by eukaryotic DNA.

In some embodiments, the eukaryotic DNA is non-human mammalian DNA, non-human primate DNA, human DNA, plant DNA, insect DNA, avian DNA, reptile DNA, rodent DNA, fish DNA, helminth/nematode DNA, or yeast DNA.

In some embodiments, the target RNA is mRNA.

In some embodiments, the spacer sequence is 15-55 nucleotides, 25-35 nucleotides, or about 30 nucleotides in length.

In some embodiments, the spacer sequence is 90-100% complementary to the target RNA.
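The percent complementarity in the embodiment above can be computed position by position once the antiparallel orientation is taken into account. The Python sketch below is a simplified model: it counts only Watson-Crick pairs (G-U wobble pairs are treated as mismatches, which is an assumption), does no gapped alignment, and uses an illustrative function name.

```python
# Watson-Crick pairs only; G-U wobble pairs are deliberately not counted.
_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def percent_complementarity(spacer: str, target_window: str) -> float:
    """Percentage of spacer positions that pair with the target window.

    Both sequences are written 5'->3'. Because the duplex is antiparallel,
    the spacer's first base pairs with the target window's last base.
    """
    if not spacer or len(spacer) != len(target_window):
        raise ValueError("spacer and target window must be non-empty and equal in length")
    spacer = spacer.upper().replace("T", "U")
    target = target_window.upper().replace("T", "U")
    matches = sum((s, t) in _PAIRS for s, t in zip(spacer, reversed(target)))
    return 100.0 * matches / len(spacer)
```

A 30-nt spacer with a single mismatch scores about 96.7%, which lies inside the 90-100% range described above.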

In some embodiments, the derivative comprises a conservative amino acid substitution at one or more residues of the amino acid sequence of any one of SEQ ID NOs: 1-7.

In some embodiments, the derivative comprises only conservative amino acid substitutions.

In some embodiments, the derivative has, in its HEPN domain or RXXXXH motif, a sequence identical to that of any one of SEQ ID NOs: 1-7, i.e., to that of the wild-type Cas.

In some embodiments, the derivative is capable of binding to an RNA guide sequence that has hybridized to a target RNA, but does not have RNase catalytic activity due to mutation of the RNase catalytic site of the Cas.
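The text names the HEPN domain and its RXXXXH motif. A common way to design a catalytically inactive ("dead") Cas13 derivative is to substitute the conserved arginine and histidine of such motifs, for example with alanine. The Python sketch below scans a protein sequence for RXXXXH motifs and applies R-to-A and H-to-A substitutions. Which residues the actual embodiments mutate is not stated here, so the substitution scheme, the function names, and the toy sequence are all assumptions.

```python
import re

# HEPN catalytic motif written in the text as RXXXXH: Arg, any four residues, His.
# The lookahead lets overlapping motifs all be reported.
_RXXXXH = re.compile(r"(?=(R.{4}H))")


def find_rxxxxh(protein: str) -> list[int]:
    """Return 0-based start indices of every RXXXXH motif in a protein sequence."""
    return [m.start() for m in _RXXXXH.finditer(protein.upper())]


def inactivate_motifs(protein: str, replacement: str = "A") -> str:
    """Substitute the R and H of every RXXXXH motif (hypothetical dead-Cas design).

    Motifs are located on the original sequence first, so one substitution
    cannot hide or create another motif during the loop.
    """
    residues = list(protein.upper())
    for start in find_rxxxxh(protein):
        residues[start] = replacement      # motif arginine
        residues[start + 5] = replacement  # motif histidine
    return "".join(residues)
```

On the toy sequence `"MAARQWERHKL"`, `find_rxxxxh` reports one motif at index 3 and `inactivate_motifs` returns `"MAAAQWERAKL"`. Whether a given mutant retains guide binding while losing RNase activity must of course be confirmed experimentally.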

In some embodiments, the derivative has no more than 210 residues deleted from the N-terminus, and/or no more than 180 residues deleted from the C-terminus.

In some embodiments, the derivative has about 180 residues deleted from the N-terminus and/or about 150 residues deleted from the C-terminus.

In some embodiments, the derivative further comprises an RNA base editing domain.

In some embodiments, the RNA base-editing domain is an adenosine deaminase, e.g., a double-stranded RNA-specific adenosine deaminase (e.g., ADAR1 or ADAR2); an apolipoprotein B mRNA-editing enzyme, catalytic polypeptide-like (APOBEC); or an activation-induced cytidine deaminase (AID).

In some embodiments, the ADAR2 carries the E488Q/T375G double mutation, or is ADAR2DD.

In some embodiments, the base-editing domain is further fused to an RNA-binding domain, such as MS2.

In some embodiments, the derivative further comprises an RNA methyltransferase, an RNA demethylase, an RNA splicing modifier, a localization factor, or a translation modifier.

In some embodiments, the Cas, derivative, or functional fragment comprises a nuclear localization signal (NLS) sequence or a nuclear export signal (NES).

In some embodiments, the targeting of the target RNA results in modification of the target RNA.

In some embodiments, the modification of the target RNA is cleavage of the target RNA.

In some embodiments, the modification of the target RNA is the deamination of adenosine (A) to inosine (I).
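Inosine is read as guanosine by the translation machinery (and by reverse transcriptase), so an A-to-I edit behaves like an A-to-G change in the read-out. The Python sketch below illustrates this on a hypothetical six-nucleotide mRNA in which editing the A of a UAG stop codon yields UGG (tryptophan). It models only the sequence consequence, not the deaminase itself, and the function names are illustrative.

```python
# Standard genetic code, with codons ordered by bases U, C, A, G at each position.
_BASES = "UCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}


def a_to_i(rna: str, position: int) -> str:
    """Deaminate the adenosine at 0-based `position`, writing inosine as 'I'."""
    if rna[position] != "A":
        raise ValueError(f"position {position} is {rna[position]!r}, not A")
    return rna[:position] + "I" + rna[position + 1:]


def translate(rna: str) -> str:
    """Translate complete codons ('*' = stop), reading inosine as guanosine."""
    seq = rna.replace("I", "G")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


if __name__ == "__main__":
    mrna = "AUGUAG"                 # hypothetical: Met followed by a UAG stop
    print(translate(mrna))          # M*
    edited = a_to_i(mrna, 4)        # edit the A of UAG
    print(edited, translate(edited))  # AUGUIG MW
```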

In some embodiments, the CRISPR-Cas complex of the invention further comprises a target RNA comprising a sequence capable of hybridizing to a spacer sequence.

Another aspect of the invention provides a fusion protein comprising (1) a Cas, a derivative thereof, or a functional fragment thereof of the invention, and (2) a heterologous functional domain.

In some embodiments, the heterologous functional domain comprises: a Nuclear Localization Signal (NLS), a reporter protein or detection tag (e.g., GST, HRP, CAT, GFP, HcRed, DsRed, CFP, YFP, BFP), a localization signal, a protein targeting moiety, a DNA binding domain (e.g., MBP, LexA DBD, Gal4 DBD), an epitope tag (e.g., His, myc, V5, FLAG, HA, VSV-G, Trx), a transcriptional activation domain (e.g., VP64 or VPR), a transcriptional repression domain (e.g., KRAB or SID moiety), a nuclease (e.g., FokI), a deamination domain (e.g., ADAR1, ADAR2, APOBEC, AID, or TAD), a methylase, a demethylase, a transcriptional release factor, an HDAC, a polypeptide having ssRNA cleaving activity, a polypeptide having ssDNA cleaving activity, a polypeptide having dsDNA cleaving activity, a DNA ligase or RNA ligase, or any combination thereof.

In some embodiments, the heterologous functional domain is fused at the N-terminus, C-terminus, or within the fusion protein.
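At the sequence level, N- or C-terminal fusion amounts to concatenating protein sequences, usually with a flexible linker and often with an NLS. The Python sketch below shows this for terminal fusions only. The SV40 large T-antigen NLS (PKKKRKV) and the (GGGGS)3 linker are common illustrative choices, not ones specified in this text, and the Cas and domain sequences in the example are placeholders because SEQ ID NOs: 1-7 are not reproduced here.

```python
from typing import Optional

SV40_NLS = "PKKKRKV"       # SV40 large T-antigen NLS, one commonly used example
G4S_LINKER = "GGGGS" * 3   # (G4S)3 flexible linker, an illustrative choice


def build_fusion(cas: str, domain: str, domain_at: str = "C",
                 nls: Optional[str] = SV40_NLS, linker: str = G4S_LINKER) -> str:
    """Join a Cas sequence and a heterologous domain with a linker.

    domain_at="N" places the domain before the Cas, "C" after it. The optional
    NLS is prepended at the very N-terminus; start-codon handling is omitted.
    """
    if domain_at not in ("N", "C"):
        raise ValueError("domain_at must be 'N' or 'C'")
    parts = [domain, cas] if domain_at == "N" else [cas, domain]
    return (nls or "") + linker.join(parts)
```

For example, `build_fusion("CAS", "DOM", linker="-")` returns `"PKKKRKVCAS-DOM"`. Internal fusions, also covered by the embodiment above, would instead insert the domain at a chosen position within the Cas sequence.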

Another aspect of the invention provides a conjugate comprising (1) a Cas, a derivative thereof, or a functional fragment thereof of the invention conjugated to (2) a heterologous functional moiety.

In some embodiments, the heterologous functional moiety comprises: a Nuclear Localization Signal (NLS), a reporter protein or detection tag (e.g., GST, HRP, CAT, GFP, HcRed, DsRed, CFP, YFP, BFP), a localization signal, a protein targeting moiety, a DNA binding domain (e.g., MBP, LexA DBD, Gal4 DBD), an epitope tag (e.g., His, myc, V5, FLAG, HA, VSV-G, Trx), a transcriptional activation domain (e.g., VP64 or VPR), a transcriptional repression domain (e.g., KRAB or SID effector region), a nuclease (e.g., FokI), a deamination domain (e.g., ADAR1, ADAR2, APOBEC, AID, or TAD), a methylase, a demethylase, a transcriptional release factor, an HDAC, a polypeptide having ssRNA cleaving activity, a polypeptide having ssDNA cleaving activity, a polypeptide having dsDNA cleaving activity, a DNA ligase or RNA ligase, or any combination thereof.

In some embodiments, the heterologous functional moiety is conjugated at the N-terminus, C-terminus, or internally with respect to the Cas, derivative thereof, or functional fragment thereof.

Another aspect of the invention provides a polynucleotide encoding any one of SEQ ID NOs: 1-7, a derivative thereof, a functional fragment thereof, or a fusion protein thereof, with the proviso that said polynucleotide is not any one of SEQ ID NOs: 15-21.

In some embodiments, the polynucleotide is codon optimized for expression in a cell.

In some embodiments, the cell is a eukaryotic cell.

In another aspect of the invention, there is provided a non-naturally occurring polynucleotide comprising a derivative of any one of SEQ ID NOs: 8-14, wherein the derivative (i) differs from any one of SEQ ID NOs: 8-14 by one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10) nucleotide additions, deletions, or substitutions; (ii) has at least 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, or 97% sequence identity to any one of SEQ ID NOs: 8-14; (iii) hybridizes under stringent conditions to any one of SEQ ID NOs: 8-14, or to any one of (i) and (ii); or (iv) is the complement of any one of (i)-(iii); with the proviso that the derivative is not any one of SEQ ID NOs: 8-14, and that the derivative encodes, or is itself, an RNA that retains substantially the same secondary structure as the RNA encoded by any one of SEQ ID NOs: 8-14.

In some embodiments, the derivative is used as a DR sequence for any one of Cas, a derivative thereof, or a functional fragment thereof of the present invention.

In another aspect of the invention there is provided a vector comprising a polynucleotide of the invention.

In some embodiments, the polynucleotide may be operably linked to a promoter and, optionally, an enhancer.

In some embodiments, the promoter is a constitutive promoter, an inducible promoter, a broad-spectrum expression promoter, or a tissue-specific promoter.

In some embodiments, the vector is a plasmid.

In some embodiments, the vector is a retroviral vector, a phage vector, an adenoviral vector, a Herpes Simplex Virus (HSV) vector, an AAV vector, or a lentiviral vector.

In some embodiments, the AAV vector is a recombinant AAV vector of serotype AAV1, AAV2, AAV4, AAV5, AAV6, AAV7, AAVrh74, AAV8, AAV9, AAV10, AAV11, AAV12, or AAV13.

Another aspect of the invention provides a delivery system comprising (1) a delivery vehicle and (2) a CRISPR-Cas complex of the invention, a fusion protein of the invention, a conjugate of the invention, a polynucleotide of the invention, or a vector of the invention.

In some embodiments, the delivery vehicle is a nanoparticle, liposome, exosome, microvesicle, or gene gun.

Another aspect of the invention provides a cell or progeny of such a cell comprising a CRISPR-Cas complex of the invention, a fusion protein of the invention, a conjugate of the invention, a polynucleotide of the invention or a vector of the invention.

In some embodiments, the cell or progeny thereof is a eukaryotic cell (e.g., a non-human mammalian cell, a human cell, or a plant cell) or a prokaryotic cell (e.g., a bacterial cell).

Another aspect of the invention provides a non-human multicellular eukaryote comprising the cell of the invention.

In some embodiments, the non-human multicellular eukaryote is an animal model of a human genetic disease (e.g., a rodent or primate).

Another aspect of the invention provides a method of modifying a target RNA, the method comprising contacting the target RNA with a CRISPR-Cas complex of the invention, wherein the spacer sequence is complementary to at least 15 nucleotides of the target RNA; wherein the Cas, derivative or functional fragment binds to the RNA guide sequence to form a complex; wherein the complex binds to the target RNA; wherein upon binding of the complex to the target RNA, the Cas, derivative or functional fragment modifies the target RNA.
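The requirement that the spacer be complementary to at least 15 nucleotides of the target RNA can be checked computationally. The sketch below is illustrative only: it assumes perfect Watson-Crick pairing (no G-U wobble, no mismatches), contiguous complementarity, and RNA sequences written 5' to 3'. The function names and the toy sequences are hypothetical and are not taken from the source.

```python
# Minimal sketch: longest contiguous stretch of a target RNA that is perfectly
# complementary (antiparallel Watson-Crick pairing) to a spacer sequence.
# Assumptions: A/C/G/U RNA strings, 5'->3'; G-U wobble pairs are NOT counted.

RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(rna.upper()))


def longest_complementary_stretch(spacer: str, target: str) -> int:
    """Length of the longest target stretch that pairs perfectly with the spacer.

    A target stretch pairs with the spacer exactly when it occurs in the
    spacer's reverse complement, so this is a longest-common-substring search
    by dynamic programming, O(len(spacer) * len(target)).
    """
    rc = reverse_complement(spacer)
    t = target.upper()
    best = 0
    prev = [0] * (len(t) + 1)
    for i in range(1, len(rc) + 1):
        curr = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            if rc[i - 1] == t[j - 1]:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def meets_min_complementarity(spacer: str, target: str, min_nt: int = 15) -> bool:
    """True if at least `min_nt` contiguous target nucleotides pair with the spacer."""
    return longest_complementary_stretch(spacer, target) >= min_nt


# Toy example (hypothetical sequences):
target = "AAAAAUGGUGAGCAAGGGCGAGGAAAAA"
spacer = reverse_complement("AUGGUGAGCAAGGGCGAGGA")
print(longest_complementary_stretch(spacer, target))  # 20
```

Mismatch-tolerant or wobble-aware scoring would require a different model; the 15-nt threshold is simply exposed as the `min_nt` parameter.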

In some embodiments, the target RNA is modified by being cleaved by the Cas protein.

In some embodiments, the target RNA is modified by deamination mediated by a derivative comprising a double-stranded RNA-specific adenosine deaminase (ADAR).

In some embodiments, the target RNA is mRNA, tRNA, rRNA, non-coding RNA, lncRNA, or nuclear RNA.

In some embodiments, the Cas, derivative, and functional fragment do not exhibit substantial (or detectable) collateral RNase activity when the complex binds to the target RNA.

In some embodiments, the target RNA is intracellular.

In some embodiments, the cell is a cancer cell.

In some embodiments, the cell is infected with an infectious agent.

In some embodiments, the infectious agent is a virus, prion, protozoan, fungus, or parasite.

In some embodiments, the CRISPR-Cas complex is encoded by a first polynucleotide and a second polynucleotide, wherein the first polynucleotide encodes any one of SEQ ID NOs: 1-7 or a derivative or functional fragment thereof; the second polynucleotide comprises any one of SEQ ID NOs: 8-14 and a sequence encoding a spacer RNA capable of binding to the target RNA; and the first and second polynucleotides are introduced into the cell.

In some embodiments, the first and second polynucleotides are introduced into the cell via the same vector.

In some embodiments, the method causes one or more of the following: (i) induction of cellular senescence in vitro or in vivo; (ii) cell cycle arrest in vitro or in vivo; (iii) inhibition of cell growth and/or cell proliferation in vitro or in vivo; (iv) induction of anergy in vitro or in vivo; (v) induction of apoptosis in vitro or in vivo; (vi) induction of necrosis in vitro or in vivo.

Another aspect of the invention provides a method of treating a disorder or disease in a subject in need thereof, the method comprising administering to the subject a composition comprising a CRISPR-Cas complex of the invention or a polynucleotide encoding the complex; wherein the spacer sequence is complementary to at least 15 nucleotides of a target RNA associated with the disease or disorder; wherein the Cas protein, Cas protein derivative, or Cas protein functional fragment binds to the RNA guide sequence to form a complex; wherein the complex binds to the target RNA; and wherein, when the complex binds to the target RNA, the Cas protein, Cas protein derivative, or Cas protein functional fragment cleaves the target RNA, thereby treating the disease or disorder in the subject.

In some embodiments, the disorder or disease is cancer or an infectious disease.

In some embodiments, the cancer is Wilms' tumor, Ewing's sarcoma, neuroendocrine tumor, glioblastoma, neuroblastoma, melanoma, skin cancer, breast cancer, colon cancer, rectal cancer, prostate cancer, liver cancer, kidney cancer, pancreatic cancer, lung cancer, biliary tract cancer, cervical cancer, endometrial cancer, esophageal cancer, stomach cancer, head and neck cancer, medullary thyroid cancer, ovarian cancer, glioma, lymphoma, leukemia, myeloma, acute lymphocytic leukemia, acute myelogenous leukemia, chronic lymphocytic leukemia, chronic myelogenous leukemia, Hodgkin's lymphoma, non-Hodgkin's lymphoma, or bladder cancer.

In some embodiments, the method is an in vitro method, an in vivo method, or an ex vivo method.

Another aspect of the invention provides a cell or progeny thereof obtained by the methods of the invention, wherein the cell or progeny comprises a non-naturally occurring modification (e.g., a non-naturally occurring modification in the transcribed RNA of the cell or progeny).

In another aspect of the invention, there is provided a method of detecting the presence of a target RNA, the method comprising contacting the target RNA with a composition comprising a fusion protein of the invention, a conjugate of the invention, or a polynucleotide encoding the fusion protein, wherein the fusion protein or conjugate comprises a detectable label (e.g., a label detectable by fluorescence, Northern blot, or FISH) and forms a complex with a guide RNA whose spacer sequence is capable of binding to the target RNA.

Another aspect of the invention provides a eukaryotic cell comprising a Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) -Cas complex comprising: (1) an RNA guide sequence comprising a spacer sequence capable of hybridizing to a target RNA and a Direct Repeat (DR) sequence 3' of the spacer sequence; (2) a CRISPR-associated protein (Cas) having the amino acid sequence of SEQ ID NO:1-7, or a derivative or functional fragment of the Cas; wherein the Cas, derivatives and functional fragments of Cas are capable of (i) binding to an RNA guide sequence, and (ii) targeting a target RNA.

Any embodiment of the invention described herein, including embodiments described only in the examples or claims, or only in one of the following aspects/sections, may be combined with any one or more other embodiments of the invention, unless explicitly disclaimed or unless the combination is inappropriate.

Drawings

Fig. 1 is a schematic (not to scale) of the genomic loci of representative members of the Cas13e and Cas13f families. The schematic shows the Cas coding sequence (long arrow), followed by multiple adjacent sets of direct repeats (DRs) (short bars) and spacer sequences (diamonds).

Figure 2 shows the predicted secondary structures of DR sequences associated with the respective Cas13e and Cas13f proteins. Their coding sequences are represented from left to right by SEQ ID NOs: 8-14.

Fig. 3 shows a phylogenetic tree of the newly discovered Cas13e and Cas13f effector proteins of the present invention and the previously discovered related effector proteins Cas13a, Cas13b, Cas13c, and Cas13d.

Figure 4 shows the domain structures of the Cas13a-Cas13f proteins, indicating the overall size of each representative member and the locations of its two RXXXXH motifs.

Figure 5 shows the predicted 3D structure of the Cas13e.1 effector protein.

Figure 6 is a schematic of three plasmids encoding, respectively, (1) the Cas13e effector protein, (2) a guide RNA (gRNA) that is complementary to mCherry mRNA and can form a complex with the Cas13e effector protein, and (3) the mCherry reporter gene. When transfected into cells, the plasmids express their respective gene products, resulting in degradation of the reporter mCherry mRNA.

Figure 7 shows that a guide RNA complementary to mCherry mRNA can knock down mCherry mRNA, with reduced mCherry expression observed under a fluorescent microscope. As a negative control, non-targeting (NT) guide RNAs that do not hybridize/bind to mCherry mRNA failed to knock down mCherry expression.

Figure 8 shows that the knockdown of mCherry expression in the figure 6 experiment was up to about 75%.
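The text does not state how the knockdown percentage was calculated. A common convention, assumed in this sketch, expresses knockdown as the fractional reduction of the reporter signal relative to the non-targeting (NT) control. The function name and the signal values are hypothetical and are not data from Figure 8.

```python
def percent_knockdown(targeting_signal: float, nontargeting_signal: float) -> float:
    """Percent reduction of a reporter signal relative to the NT-guide control.

    Assumed convention: knockdown = 100 * (1 - targeting / non-targeting),
    with both signals normalized the same way (e.g., per transfected cell).
    """
    if nontargeting_signal <= 0:
        raise ValueError("non-targeting control signal must be positive")
    return 100.0 * (1.0 - targeting_signal / nontargeting_signal)


# Hypothetical normalized signals: 25 units with a targeting gRNA vs 100 with NT.
print(percent_knockdown(25.0, 100.0))  # 75.0
```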

Fig. 9 shows that Cas13e utilizes a guide RNA with the DR sequence at its 3' end (compared with a guide RNA with the DR sequence at its 5' end).

Figure 10 shows the correlation between spacer sequence length and specific (guide RNA dependent) RNase activity on target RNA relative to non-target (NT) controls.

FIG. 11 shows the correlation between spacer length and nonspecific (non-guide RNA-dependent) RNase activity on target RNA relative to non-target (NT) control.

FIG. 12 shows that the dCas13e.1-ADAR2DD fusion has RNA base editing activity. Specifically, three plasmids encoding, respectively, (1) a dCas13e protein (RNase-dead) fused to the single-base RNA editor ADAR2DD, (2) a guide RNA (gRNA) that is complementary to a mutant mCherry mRNA carrying a G-to-A point mutation and can form a complex with the dCas13e effector protein, and (3) a mutant mCherry reporter gene encoding the mCherry mRNA with the G-to-A point mutation, were transfected into cells to express their respective gene products. Because of the point mutation, the mutant mCherry mRNA is normally unable to produce fluorescent mCherry protein. After the guide RNA bound to the mutant mCherry mRNA, the fused ADAR2DD base editor converted A to I (which is read as G), restoring the ability of the mRNA to encode fluorescent mCherry protein.

Figure 13 shows that successful RNA base editing restored mCherry expression. In the experiment of fig. 12, a plasmid encoding mutant mCherry (mCherry @) alone failed to express fluorescent mCherry. The plasmid encoding dCas13e-ADAR2DD base editor alone also failed to express fluorescent mCherry. Plasmids encoding gRNA-1 or gRNA-2 alone (also expressing GFP reporter) also failed to express fluorescent mCherry, although GFP expression was significant. However, when all three plasmids were transfected into the same cell, significant fluorescent mCherry expression was observed (GFP reporter was also expressed simultaneously).

FIG. 14 shows the relevant fragment of the mutant mCherry gene containing the premature stop codon TAG, the sequences of the two gRNAs capable of complexing with the dCas13e-ADAR2DD RNA base editor, and the "corrected" TGG codon.
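The TAG-to-TGG correction can be illustrated at the codon level. In the mRNA, the DNA codon TAG reads UAG (stop), and A-to-I editing of its middle adenosine gives UIG, which is decoded like UGG (Trp) because inosine is read as guanosine. The sketch below models only this decoding step. Its codon table deliberately covers just the few codons needed, and the function names are hypothetical.

```python
# Minimal sketch: A-to-I editing converts a premature UAG stop codon into a
# codon decoded as UGG (tryptophan), assuming inosine is read as guanosine.
# The codon table is a deliberately tiny subset of the standard genetic code.

CODON_TABLE = {"UAG": "*", "UAA": "*", "UGA": "*", "UGG": "W"}


def a_to_i_edit(mrna: str, position: int) -> str:
    """Return mRNA with the adenosine at 0-based `position` replaced by inosine."""
    if mrna[position] != "A":
        raise ValueError(f"position {position} is {mrna[position]!r}, not A")
    return mrna[:position] + "I" + mrna[position + 1:]


def decode_codon(codon: str) -> str:
    """Translate one codon, reading inosine (I) as guanosine (G)."""
    return CODON_TABLE[codon.replace("I", "G")]


mutant_codon = "UAG"                      # premature stop codon (TAG in DNA)
edited_codon = a_to_i_edit(mutant_codon, 1)
print(edited_codon, decode_codon(mutant_codon), decode_codon(edited_codon))  # UIG * W
```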

FIG. 15 is a schematic diagram (not drawn to scale) showing fusions of C-terminally truncated versions of the dCas13e.1 protein with the ADAR2DD RNA base editor (shown as "ADAR2").

Figure 16 shows the percentage of mutant mCherry reverted to wild-type mCherry by the RNA base editors constructed from ADAR2 and the series of C-terminally truncated dCas13e.1 mutants shown in Figure 15.

FIG. 17 is a schematic diagram (not drawn to scale) showing fusions of a series of C-terminally and, optionally, N-terminally truncated versions of the dCas13e.1 protein with the ADAR2DD RNA base editor.

Figure 18 shows the percentage of mutant mCherry reverted to wild-type mCherry by RNA base editors constructed from ADAR2DD and some of the C- and N-terminally truncated versions of Cas13e.1 shown in Figure 17.

Figure 19 shows a series of plasmids encoding Cas13a, Cas13b, Cas13d, Cas13e.1, Cas13f.1, and the mCherry reporter gene, together with a gRNA coding sequence targeting ANXA4 or, as a control, a non-targeting gRNA.

Fig. 20 shows efficient knockdown of ANXA4 expression by Cas13e.1, Cas13f.1, Cas13a, and Cas13d.

Detailed Description

SUMMARY

The invention described herein provides novel class 2 type VI Cas effector proteins, sometimes referred to herein as Cas13e and Cas13f. The novel Cas13 proteins of the invention are much smaller than the previously discovered Cas13 effector proteins (Cas13a-Cas13d), so their coding sequences, together with the crRNA coding sequences, can readily be packaged into small-capacity gene therapy vectors, such as AAV vectors. Furthermore, compared with the Cas13a, Cas13b, and Cas13d effector proteins, the newly discovered Cas13e and Cas13f effector proteins are more effective at knocking down target RNA sequences and more efficient at single-base RNA editing, exhibit negligible non-specific RNase activity after activation by crRNA-based target recognition, and use spacer sequences within a narrow length range (e.g., about 30 nucleotides). Thus, the novel Cas proteins are well suited for gene therapy.
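The packaging advantage follows from simple arithmetic: a coding sequence needs 3 nucleotides per residue plus a stop codon. The sketch below assumes an effector of about 800 amino acids and uses roughly 4.7 kb as an approximate, commonly cited AAV packaging limit. The sizes of other cassette elements (promoter, polyA signal, guide RNA cassette) are not given in the source, so they are left to the caller. The function names are hypothetical.

```python
# Back-of-the-envelope AAV cargo estimate for a ~800-aa effector protein.
# Assumption: ~4,700 nt is used as an approximate AAV packaging limit; other
# elements (promoter, polyA, guide cassette) are supplied by the caller.

AAV_LIMIT_NT = 4700  # approximate, commonly cited AAV genome capacity


def cds_length_nt(n_amino_acids: int, include_stop: bool = True) -> int:
    """Coding-sequence length: 3 nt per residue, plus 3 nt for a stop codon."""
    return 3 * (n_amino_acids + (1 if include_stop else 0))


def remaining_capacity_nt(n_amino_acids: int, other_elements_nt: int = 0) -> int:
    """Nucleotides left in the AAV genome after the CDS and other elements."""
    return AAV_LIMIT_NT - cds_length_nt(n_amino_acids) - other_elements_nt


print(cds_length_nt(800))          # 2403
print(remaining_capacity_nt(800))  # 2297
```

The remaining headroom is what would be available for regulatory elements and the crRNA expression cassette in a single vector.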

Thus, in a first aspect, the invention provides Cas13e and Cas13f effector proteins, e.g., having the amino acid sequence of any one of SEQ ID NOs: 1-7, or orthologs, homologs, derivatives (described below), or functional fragments (described below) thereof, wherein the ortholog, homolog, derivative, or functional fragment retains at least one function of the protein of any one of SEQ ID NOs: 1-7. Such functions include, but are not limited to: the ability to bind to a guide RNA/crRNA of the present invention (described below) to form a complex, RNase activity, and the ability to bind to and cleave a target RNA at a specific site under the guidance of a crRNA that is at least partially complementary to the target RNA.

In some embodiments, the Cas13e or Cas13f effector protein of the invention may be: (i) any one of SEQ ID NOs: 1-7; (ii) a derivative having one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10) amino acid residue additions, deletions, and/or substitutions (e.g., conservative substitutions) relative to any one of SEQ ID NOs: 1-7; or (iii) a derivative having at least about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% amino acid sequence identity to any one of SEQ ID NOs: 1-7.
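Percent sequence identity is computed from an alignment. The source does not specify the alignment tool or the identity convention, and conventions differ, for example in whether the denominator is the alignment length or the length of the shorter sequence. The sketch below assumes the two sequences have already been aligned (e.g., by a global aligner such as EMBOSS Needle) with `-` for gaps. It uses one convention: identical residues divided by alignment columns that contain at least one residue. The function name and toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Convention used here (one of several in use): identical, non-gap positions
    divided by the number of alignment columns in which at least one sequence
    has a residue.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if not (x == gap and y == gap)]
    if not columns:
        raise ValueError("alignment has no residues")
    matches = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * matches / len(columns)


# Toy aligned peptides (hypothetical): one substitution in ten columns.
print(percent_identity("MKTAYIAKQR", "MKTAHIAKQR"))  # 90.0
```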

In some embodiments, Cas13e and Cas13f effector proteins, orthologs, homologs, derivatives, and functional fragments thereof are not naturally occurring, e.g., there may be at least one amino acid difference compared to the naturally occurring sequence.

In a related aspect, the invention provides addition derivatives of the Cas13e and Cas13f effector proteins, based on the amino acid sequences of SEQ ID NOs: 1-7 or the above orthologs, homologs, derivatives, and functional fragments thereof, to which another protein, polypeptide, or other molecule (e.g., a detection reagent or drug/chemical moiety) is covalently or non-covalently linked. Such other proteins/polypeptides/molecules may be linked, for example, by chemical coupling, gene fusion, or non-covalent association (e.g., biotin-streptavidin binding). Such derivatization does not affect the function of the original protein, such as the ability to form a complex with a guide RNA/crRNA of the present invention (described below), RNase activity, and the ability to bind to and cleave the target RNA at a specific site under the guidance of a crRNA that is at least partially complementary to the target RNA.

Such derivations can be used, for example, to add a nuclear localization signal (NLS, e.g., SV40 large T antigen NLS), enhancing the ability of the subject Cas13e and Cas13f effector proteins to enter the nucleus. Such derivations can also be used to add targeting molecules or targeting moieties that direct the subject Cas13e and Cas13f effector proteins to specific cellular or subcellular locations. Such derivations can also be used to add detectable labels, facilitating detection, monitoring, or purification of the subject Cas13e and Cas13f effector proteins. Such derivations can also be used to add deaminase moieties (e.g., moieties containing adenine or cytosine deamination activity) to facilitate the editing of RNA bases.

The derivatization can be performed by adding the above-described moieties to the N- or C-terminus of the subject Cas13e and Cas13f effector proteins, or internally (e.g., by internal fusion or by bonding to an internal amino acid side chain).

In a second related aspect, the invention provides conjugates of the subject Cas13e and Cas13f effector proteins, based on the amino acid sequences set forth in SEQ ID NOs: 1-7 or the orthologs, homologs, derivatives, and functional fragments thereof described above, conjugated to moieties such as other proteins or polypeptides, detectable labels, or combinations thereof. Such conjugate moieties may include, but are not limited to: localization signals, reporter proteins (e.g., GST, HRP, CAT, GFP, HcRed, DsRed, CFP, YFP, BFP), labels (e.g., fluorescent dyes such as FITC or DAPI), NLSs, targeting moieties, DNA binding domains (e.g., MBP, Lex A DBD, Gal4 DBD), epitope tags (e.g., His, myc, V5, FLAG, HA, VSV-G, Trx, etc.), transcriptional activation domains (e.g., VP64 or VPR), transcriptional repression domains (e.g., KRAB or SID moieties), nucleases (e.g., FokI), deamination domains (e.g., ADAR1, ADAR2, APOBEC, AID, or TAD), methylases, demethylases, transcriptional release factors, HDACs, polypeptides having ssRNA, dsRNA, ssDNA, or dsDNA cleavage activity, DNA or RNA ligases, combinations of any of the above, and the like.

For example, the conjugate can include one or more NLSs, which can be located at or near the N-terminus, the C-terminus, an internal position, or a combination thereof. The linkage may be achieved through amino acid linkers (e.g., D or E, or S or T), amino acid derivatives (e.g., Ahx, β-Ala, GABA, or Ava), or PEG.

In some embodiments, conjugation does not affect the function of the original protein, such as the ability to form a complex with a guide RNA/crRNA of the present invention (described below), RNase activity, and the ability to bind to and cleave a target RNA at a specific site under the guidance of a crRNA that is at least partially complementary to the target RNA.

In a related third aspect, the invention provides fusions of the subject Cas13e and Cas13f effector proteins, based on the amino acid sequences of SEQ ID NOs: 1-7 or any of the above orthologs, homologs, derivatives, and functional fragments thereof, fused to, for example: localization signals, reporter proteins (e.g., GST, HRP, CAT, GFP, HcRed, DsRed, CFP, YFP, BFP), NLSs, protein targeting moieties, DNA binding domains (e.g., MBP, Lex A DBD, Gal4 DBD), epitope tags (e.g., His, myc, V5, FLAG, HA, VSV-G, Trx, etc.), transcription activation domains (e.g., VP64 or VPR), transcription repression domains (e.g., KRAB or SID moieties), nucleases (e.g., FokI), deamination domains (e.g., ADAR1, ADAR2, APOBEC, AID, or TAD), methylases, demethylases, transcription release factors, HDACs, polypeptides having ssRNA, dsRNA, ssDNA, or dsDNA cleavage activity, DNA or RNA ligases, any combination thereof, and the like.

For example, the fusion can include one or more NLSs, which can be located at or near the N-terminus, the C-terminus, or an internal position. In some embodiments, fusion does not affect the function of the original protein, such as the ability to form a complex with a guide RNA/crRNA of the present invention (described below), RNase activity, and the ability to bind to and cleave a target RNA at a specific site under the guidance of a crRNA that is at least partially complementary to the target RNA.

In a fourth aspect, the present invention provides an isolated polynucleotide comprising: (i) any one of SEQ ID NOs: 8-14; (ii) a polynucleotide having 1, 2, 3, 4, or 5 nucleotide deletions, additions, and/or substitutions compared with any one of SEQ ID NOs: 8-14; (iii) a polynucleotide having at least 80%, 85%, 90%, or 95% sequence identity to any one of SEQ ID NOs: 8-14; (iv) a polynucleotide that hybridizes under stringent conditions to any one of the polynucleotides of (i)-(iii) or to the complement thereof; or (v) the complement of any one of the polynucleotides of (i)-(iii).

Any polynucleotide of (ii)-(iv) retains the function of the original SEQ ID NOs: 8-14, namely, encoding the direct repeat (DR) sequence of the crRNA in the subject Cas13e or Cas13f system.

As used herein, "direct repeat" may refer to the DNA coding sequence in the CRISPR locus or to the RNA it encodes in the crRNA. Thus, when SEQ ID NOs: 8-14 are referred to in the context of an RNA molecule (e.g., a crRNA), each T should be understood to represent a U.
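The T-to-U reading convention can be applied mechanically when DR sequences listed as DNA are used to build crRNAs. The sketch below is a minimal illustration: the function name is hypothetical, and the example input is a toy sequence, not one of SEQ ID NOs: 8-14.

```python
def dr_dna_to_rna(dr_dna: str) -> str:
    """Read a DR sequence listed as DNA in its RNA form (each T becomes U).

    Rejects characters other than A, C, G, and T so that typos are not
    silently carried into a guide RNA design.
    """
    seq = dr_dna.upper()
    invalid = set(seq) - set("ACGT")
    if invalid:
        raise ValueError(f"unexpected characters: {sorted(invalid)}")
    return seq.replace("T", "U")


print(dr_dna_to_rna("gttgcaat"))  # GUUGCAAU  (toy sequence)
```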

Thus, in some embodiments, the isolated polynucleotide is DNA that encodes the DR sequence of the subject Cas13e and Cas13f system crRNA.

In some other embodiments, the isolated polynucleotide is an RNA that is the DR sequence of the subject Cas13e and Cas13f system crRNA.

In a fifth aspect, the present invention provides a complex comprising: (i) a protein component, which may be a subject Cas13e or Cas13f effector protein, or an ortholog, homolog, derivative, conjugate, functional fragment, or fusion thereof; and (ii) a polynucleotide component comprising an isolated polynucleotide according to the fourth aspect of the invention (e.g., a DR sequence) and a spacer sequence complementary to at least a portion of a target RNA. In some embodiments, the DR sequence is located 3' of the spacer sequence.

In some embodiments, the polynucleotide composition is a guide RNA/crRNA of the subject Cas13e or Cas13f system, which does not comprise a tracrRNA.

In some embodiments, for use with Cas13e and Cas13f effector proteins, or homologs, orthologs, derivatives, fusions, conjugates, or functional fragments thereof, that have RNase activity, the spacer sequence is at least about 10 nucleotides, or between 10-60, 15-50, 20-50, 25-40, 25-50, or 19-50 nucleotides. In some embodiments, for use with Cas13e and Cas13f effector proteins, or homologs, orthologs, derivatives, fusions, conjugates, or functional fragments thereof, that do not have RNase activity but are capable of binding to a guide RNA and to a target RNA complementary to the guide RNA, the spacer sequence is at least about 10 nucleotides, or between about 10-200, 15-180, 20-150, 25-125, 30-110, 35-100, 40-80, 45-60, or 50-55 nucleotides, or about 50 nucleotides.

In some embodiments, the DR sequence is between 15-36, 20-36, or 22-36 nucleotides, or about 36 nucleotides, in length. In some embodiments, the DR sequence in the guide RNA has a secondary structure substantially identical to that of any one of SEQ ID NOs: 8-14 (including stem, bulge, and loop structures).
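The source does not define a numerical test for "substantially identical" secondary structure. One simple comparison, assumed here, starts from structures written in dot-bracket notation (the format produced by folding tools such as ViennaRNA's RNAfold) and measures the fraction of a reference structure's base pairs that are also present in a query structure. The function names are hypothetical, and the example structures are toy hairpins rather than predictions for SEQ ID NOs: 8-14.

```python
def base_pairs(dot_bracket: str) -> set[tuple[int, int]]:
    """Parse dot-bracket notation into a set of 0-based (i, j) base pairs."""
    stack: list[int] = []
    pairs: set[tuple[int, int]] = set()
    for j, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {j}")
            pairs.add((stack.pop(), j))
        elif ch != ".":
            raise ValueError(f"unsupported character {ch!r}")
    if stack:
        raise ValueError("unbalanced '('")
    return pairs


def shared_pair_fraction(reference: str, query: str) -> float:
    """Fraction of the reference base pairs that also occur in the query."""
    ref_pairs = base_pairs(reference)
    if not ref_pairs:
        return 1.0
    return len(ref_pairs & base_pairs(query)) / len(ref_pairs)


# Toy hairpins: the query loses the innermost stem pair (3 of 4 pairs shared).
print(shared_pair_fraction("((((....))))", "(((......)))"))  # 0.75
```

Any cutoff for "substantially identical" (and whether bulges should be weighted separately) would be an additional assumption.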

In some embodiments, the guide RNA is about 36 nucleotides longer than any of the spacer sequences described above, e.g., between 45-96, 55-86, 60-86, 62-86, or 63-86 nucleotides in length.
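The guide-length figures above follow from adding a DR of about 36 nt, placed 3' of the spacer, to the spacer lengths listed earlier. The helper below is a hypothetical design sketch. It uses the broadest bounded ranges stated in the text (10-60 nt for nuclease-active effectors and 10-200 nt for RNase-dead, binding-only effectors) and a placeholder DR made of `N` characters instead of any actual SEQ ID NO: 8-14 sequence.

```python
# Hypothetical design helper: assemble a guide RNA as 5'-spacer-DR-3' and check
# the spacer length against the broadest bounded ranges stated in the text.

SPACER_RANGE_ACTIVE = (10, 60)   # effector with RNase activity
SPACER_RANGE_DEAD = (10, 200)    # RNase-dead effector used for binding only


def assemble_guide(spacer: str, dr: str, nuclease_active: bool = True) -> str:
    """Return the guide RNA 5'-spacer-DR-3' after validating the spacer length."""
    lo, hi = SPACER_RANGE_ACTIVE if nuclease_active else SPACER_RANGE_DEAD
    if not lo <= len(spacer) <= hi:
        raise ValueError(f"spacer length {len(spacer)} outside {lo}-{hi} nt")
    return spacer + dr  # DR sits 3' of the spacer


placeholder_dr = "N" * 36                   # stands in for a ~36-nt DR sequence
guide = assemble_guide("A" * 30, placeholder_dr)
print(len(guide))                           # 66
```

With a 36-nt DR, spacers of 10-60 nt give guides of 46-96 nt, consistent with the ranges above.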

In a sixth aspect, the present invention provides an isolated polynucleotide comprising: (i) a polynucleotide encoding any one of the Cas13e or Cas13f effector proteins of SEQ ID NOs: 1-7, or an ortholog, homolog, derivative, functional fragment, or fusion thereof; (ii) a polynucleotide of any one of SEQ ID NOs: 8-14; or (iii) a polynucleotide comprising (i) and (ii).

In some embodiments, the polynucleotide is non-naturally occurring, e.g., a polynucleotide other than any one of SEQ ID NOs: 15-21.

In some embodiments, the polynucleotide is codon optimized for expression in a prokaryote. In some embodiments, the polynucleotide is codon optimized for expression in a eukaryote, such as a human or human cell.
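The simplest form of codon optimization back-translates a protein by choosing one preferred codon per amino acid for the host. The sketch below shows only that idea. Its codon table is a small illustrative subset (each codon does encode the listed amino acid) and is not a real host codon-usage table. Practical optimization tools typically also weigh factors such as GC content, mRNA secondary structure, and unwanted motifs. The function name is hypothetical.

```python
# Naive "one preferred codon per amino acid" back-translation.
# The table is an ILLUSTRATIVE subset, not a real host codon-usage table.

PREFERRED_CODON = {
    "M": "ATG", "W": "TGG", "K": "AAG", "E": "GAG",
    "L": "CTG", "A": "GCC", "G": "GGC", "S": "AGC", "*": "TGA",
}


def back_translate(protein: str, table: dict[str, str] = PREFERRED_CODON) -> str:
    """Map each residue to a single preferred codon, appending a stop if absent."""
    if not protein.endswith("*"):
        protein += "*"
    try:
        return "".join(table[aa] for aa in protein)
    except KeyError as err:
        raise ValueError(f"no codon listed for residue {err.args[0]!r}") from None


print(back_translate("MKLE"))  # ATGAAGCTGGAGTGA
```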

In a seventh aspect, the invention provides a vector comprising any of the polynucleotides of the sixth aspect. The vector may be a cloning vector or an expression vector and may be, for example, a plasmid, phagemid, or cosmid. In some embodiments, the vector may be used to express, in a mammalian cell (e.g., a human cell), any one of the Cas13e or Cas13f effector proteins of SEQ ID NOs: 1-7, or an ortholog, homolog, derivative, functional fragment, or fusion thereof; any of the polynucleotides of the fourth aspect; or any complex described in the fifth aspect.

In an eighth aspect, the present invention provides a host cell comprising a polynucleotide according to the fourth or sixth aspect of the present invention and/or a vector according to the seventh aspect. The host cell may be a prokaryotic cell, such as E. coli, or a cell from a eukaryote, such as yeast, an insect, a plant, or an animal (e.g., a mammal such as a human or mouse). The host cell may be an isolated primary cell (e.g., a bone marrow cell for ex vivo therapy) or an established cell line, e.g., a tumor cell line, 293T cells, stem cells, iPSCs, etc.

In a related aspect, the invention provides a eukaryotic cell comprising a Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-Cas complex comprising: (1) an RNA guide sequence comprising a spacer sequence capable of hybridizing to the target RNA and a direct repeat (DR) sequence 3' of the spacer sequence; and (2) a CRISPR-associated protein (Cas) having the amino acid sequence of any one of SEQ ID NOs: 1-7, or a derivative or functional fragment of said Cas; wherein the Cas, derivatives, and functional fragments of Cas are capable of (i) binding to the RNA guide sequence and (ii) targeting the target RNA.

In a ninth aspect, the present invention provides a composition comprising: (i) a first (protein) component selected from any one of the Cas13e or Cas13f effector proteins of SEQ ID NOs: 1-7, or orthologs, homologs, derivatives, conjugates, functional fragments, or fusions thereof; and (ii) a second (nucleotide) component comprising a guide RNA/crRNA (in particular, one comprising a spacer sequence) or a sequence encoding said RNA. The guide RNA may comprise a DR sequence and a spacer sequence that may be complementary to, or hybridize with, the target RNA. The guide RNA may form a complex with the first (protein) component of (i). In some embodiments, the DR sequence may be a polynucleotide according to the fourth aspect of the invention. In some embodiments, the DR sequence may be located at the 3' end of the guide RNA. In some embodiments, the components (e.g., the components of (i) and/or (ii)) are non-naturally occurring or are engineered from naturally occurring components. In some embodiments, at least one component of the composition is non-naturally occurring or is modified from a naturally occurring component. In some embodiments, the target sequence is an RNA from a prokaryote or eukaryote, e.g., a non-naturally occurring RNA. The target RNA may be present inside a cell, e.g., in the cytoplasm or inside an organelle. In some embodiments, the protein component may have an NLS, which may be located at its N-terminus, its C-terminus, or an internal position.

In a tenth aspect, the present invention provides a composition comprising one or more vectors of the seventh aspect of the invention, said one or more vectors comprising: (i) a first polynucleotide encoding any one of the Cas13e or Cas13f effector proteins of SEQ ID NOs: 1-7, or an ortholog, homolog, derivative, functional fragment, or fusion thereof, operably linked to a first regulatory element; and (ii) a second polynucleotide encoding a guide RNA of the invention, operably linked to a second regulatory element. The first and second polynucleotides may be on different vectors or on the same vector. The guide RNA may form a complex with the protein product encoded by the first polynucleotide and comprises a DR sequence (e.g., any of the DR sequences described in the fourth aspect) and a spacer sequence that can bind to, or is complementary to, the target RNA. In some embodiments, the first regulatory element is a promoter, e.g., an inducible promoter. In some embodiments, the second regulatory element is a promoter, e.g., an inducible promoter. In some embodiments, the components (e.g., (i) and/or (ii)) are non-naturally occurring or are modified from naturally occurring components. In some embodiments, at least one component of the composition is non-naturally occurring or is modified from a naturally occurring component. In some embodiments, the target sequence is an RNA from a prokaryote or eukaryote, e.g., a non-naturally occurring RNA. The target RNA may be present inside a cell, e.g., in the cytoplasm or inside an organelle. In some embodiments, the encoded protein may have an NLS, which may be located at its N-terminus, its C-terminus, or an internal position.

In some embodiments, the vector is a plasmid. In some embodiments, the vector is a viral vector based on a retrovirus, a replication incompetent retrovirus, an adenovirus, a replication incompetent adenovirus, or an AAV. In some embodiments, the vector may be capable of autonomous replication in a host cell (e.g., having a bacterial origin of replication sequence). In some embodiments, the vector may integrate into the host genome and subsequently replicate. In some embodiments, the vector is a cloning vector. In some embodiments, the vector is an expression vector.

The present invention also provides a delivery composition for delivering any of the Cas13e or Cas13f effector proteins of SEQ ID NOs: 1-7, or orthologs, homologs, derivatives, conjugates, functional fragments, or fusions thereof; a polynucleotide according to the fourth and/or sixth aspect of the invention; a complex according to the fifth aspect of the invention; a vector according to the seventh aspect of the invention; a cell according to the eighth aspect of the invention; or a composition according to the ninth and/or tenth aspect of the invention. Delivery can be by any method known in the art, such as transfection, lipofection, electroporation, gene gun, microinjection, sonication, calcium phosphate transfection, cationic transfection, viral vector delivery, and the like, using a vehicle such as liposomes, nanoparticles, exosomes, microvesicles, gene guns, or one or more viral vectors.

The invention also provides a kit comprising any one or more of: any of the Cas13e or Cas13f effector proteins of SEQ ID NOs: 1-7, or an ortholog, homolog, derivative, conjugate, functional fragment, or fusion thereof; a polynucleotide according to the fourth and/or sixth aspect of the invention; a complex according to the fifth aspect of the invention; a vector according to the seventh aspect of the invention; a cell according to the eighth aspect of the invention; and a composition according to the ninth and/or tenth aspect of the invention. In some embodiments, the kit may further include instructions on how to use the components of the kit and/or how to obtain additional components from a third party for use with the components of the kit. Any of the components of the kit may be stored in any suitable container.

The foregoing is a general description of the invention; various aspects of the invention are described in detail below in conjunction with the accompanying drawings. For simplicity and to reduce redundancy, however, certain embodiments are described in only one part of this description, or only in the claims or examples. It should therefore be understood that any embodiment of the invention, including an embodiment described in only one aspect, part, or example, may be combined with any other embodiment described herein unless expressly stated otherwise or unless the embodiments are incompatible.

1. Novel class 2 type VI CRISPR RNA-directed RNases and derivatives thereof

In one aspect, the invention described herein provides two novel families of class 2 type VI CRISPR effector proteins, each having two strictly conserved RX4-6H (RXXXXH) motifs characteristic of higher eukaryotes and prokaryotes nucleotide-binding (HEPN) domains. Class 2 type VI effector proteins containing two HEPN domains have been described previously, including, for example, Cas13a (C2c2), Cas13b, Cas13c, and Cas13d.

The HEPN domain has been shown to be an RNase domain capable of binding and cleaving target RNA molecules. The target RNA can be any suitable form of RNA, including but not limited to mRNA, tRNA, ribosomal RNA, non-coding RNA, lncRNA (long non-coding RNA), and nuclear RNA. For example, in some embodiments, the Cas protein recognizes and cleaves an RNA target on the coding strand of an open reading frame (ORF).

In one embodiment, the disclosure herein provides two families of class 2 type VI CRISPR effector proteins, referred to herein as type VI-E and type VI-F CRISPR-Cas effector proteins, or Cas13e and Cas13f. Compared with the effector proteins of other systems, the VI-E and VI-F effector proteins are significantly smaller (e.g., about 20% fewer amino acids), smaller even than the smallest previously known effector protein, VI-D/Cas13d (see fig. 4), and they share less than 30% sequence similarity in one-to-one alignments with those other effector proteins (see fig. 3), including their closest phylogenetic relative, Cas13b.

These two newly discovered families of class 2 type VI CRISPR effector proteins are useful for a variety of applications and are particularly suited for therapeutic use. Because they are significantly smaller than other effector proteins (e.g., Cas13a, Cas13b, Cas13c, and Cas13d), the nucleic acids encoding them, together with the coding sequences of their guide RNAs, can be packaged into size-limited delivery systems such as AAV vectors. Furthermore, when these Cas effector proteins are activated to exhibit specific RNase activity, no additional (non-specific, collateral) RNase activity is detectable for spacer sequences within a certain length range (e.g., longer than about 30 nucleotides; see fig. 11). This makes them less susceptible to, or free from, the potential risk of broadly digesting off-target RNAs in target cells that should not be destroyed. On the other hand, at certain other spacer lengths (e.g., about 30 nucleotides), these Cas effector proteins exhibit significant collateral RNase activity, so they are also useful in applications that require such activity.

In bacteria, the VI-E and VI-F CRISPR-Cas systems each contain a single effector protein (approximately 775 and 790 residues in length, respectively) located near the CRISPR array (see fig. 1). The CRISPR array contains multiple direct repeat (DR) sequences, typically 36 nucleotides in length, which are generally well conserved in both sequence and secondary structure (see fig. 2).

The data presented herein indicate that the crRNA is processed from the 5' end; the DR sequence is therefore located at the 3' end of the mature crRNA.

The spacer sequences in Cas13e and Cas13f CRISPR arrays are most often 30 nucleotides long, and mostly 29-30 nucleotides long, but spacer length can vary widely. For example, for use with a functional Cas13e or Cas13f effector protein, or a homolog, ortholog, derivative, fusion, conjugate, or functional fragment thereof, the spacer may be 10-60, 20-50, 25-45, or 25-35 nucleotides long, or about 27, 28, 29, 30, 31, 32, or 33 nucleotides long. If used with any of the RNase-inactivated (dCas) versions described herein, however, the spacer may be 10-200, 20-150, 25-100, 25-85, 35-75, or 45-60 nucleotides long, or about 46, 47, 48, 49, 50, 51, 52, 53, 54, or 55 nucleotides long.
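The crRNA layout described above (spacer at the 5' end, DR at the 3' end of the mature crRNA) and the stated spacer-length windows can be illustrated with a minimal Python sketch. It is not the inventors' design procedure: the helper names (`to_rna`, `spacer_for_target`, `build_crrna`) are hypothetical, the spacer is assumed to be the perfect antiparallel complement of the target segment, and the default 10-60 nt window is simply the broadest range given above for an active effector (10-200 nt for dCas uses).

```python
# Minimal sketch (hypothetical helpers): assemble a mature Cas13e/Cas13f crRNA
# as spacer (5') + direct repeat (3'), with a spacer-length check.

# Cas13e.1 DR (SEQ ID NO: 8), written as DNA as in this description.
DR_CAS13E_1 = "GCTGGAGCAGCCCCCGATTTGTGGGGTGATTACAGC"

RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def to_rna(seq: str) -> str:
    """Convert a DNA (or RNA) sequence to RNA by replacing T with U."""
    return seq.upper().replace("T", "U")


def spacer_for_target(target_segment: str) -> str:
    """Return an RNA spacer fully complementary (antiparallel) to a target segment."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(to_rna(target_segment)))


def build_crrna(spacer: str, dr: str = DR_CAS13E_1,
                min_len: int = 10, max_len: int = 60) -> str:
    """Return the crRNA (as RNA) with the spacer placed 5' of the DR.

    The defaults reflect the broadest range stated for an active effector;
    pass max_len=200 for the wider range stated for dCas uses.
    """
    spacer_rna = to_rna(spacer)
    if set(spacer_rna) - set("ACGU"):
        raise ValueError("spacer contains characters other than A, C, G, T/U")
    if not min_len <= len(spacer_rna) <= max_len:
        raise ValueError(
            f"spacer length {len(spacer_rna)} nt is outside {min_len}-{max_len} nt")
    return spacer_rna + to_rna(dr)


if __name__ == "__main__":
    # Hypothetical 30-nt target segment, for illustration only.
    target = "AUGGCUACCGAUUCGGAUCCAAGCUUGCAU"
    crrna = build_crrna(spacer_for_target(target))
    print(len(crrna), crrna)  # 30-nt spacer + 36-nt DR
```

A real guide would of course also need the target choice and any DR variant to be validated experimentally; the sketch only encodes the layout and length rules.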

The sequences of CRISPR-Cas effector proteins of types VI-E and VI-F are provided in the table below.

In the above sequences, the two RX4-6H (RXXXXH) motifs in each effector protein are indicated by double underlining. In Cas13e.1, the C-terminal motif can be read in two registers because it is flanked by an RR and an HH dipeptide. Mutations in one or both of these domains can yield RNase-inactivated versions ("dCas") of the Cas13e and Cas13f effector proteins, or of homologs, orthologs, fusions, conjugates, derivatives, or functional fragments thereof, that substantially retain the ability to bind a guide RNA and a target RNA complementary to the guide RNA.
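The RX4-6H motif can be located with a regular expression. The Python sketch below is an illustration, not the annotation method used for the table: it reports every R-X(4-6)-H occurrence, including overlapping ones. In the hypothetical toy sequence, an RR...HH stretch yields two overlapping RX4H hits, the same two-register situation described for the Cas13e.1 C-terminal motif (R739/H744 and R740/H745).

```python
import re
from typing import List, Tuple


def find_hepn_motifs(protein: str, min_gap: int = 4,
                     max_gap: int = 6) -> List[Tuple[int, int]]:
    """Return 1-based (R position, H position) for every R-X(n)-H, min_gap <= n <= max_gap.

    A zero-width lookahead lets overlapping hits (e.g., RR...HH) all be reported,
    and each gap length is searched separately so one R can pair with several H.
    """
    seq = protein.upper()
    hits = []
    for gap in range(min_gap, max_gap + 1):
        pattern = re.compile(rf"(?=R[A-Z]{{{gap}}}H)")
        for match in pattern.finditer(seq):
            r_pos = match.start() + 1              # 1-based position of R
            hits.append((r_pos, r_pos + gap + 1))  # H sits gap + 1 residues after R
    return sorted(hits)


if __name__ == "__main__":
    toy = "MKRAAAAHLLRRGGGHHE"  # hypothetical sequence, not a SEQ ID NO
    print(find_hepn_motifs(toy, 4, 4))  # strict RXXXXH spacing
    print(find_hepn_motifs(toy))        # full RX4-6H range
```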

The corresponding DR coding sequences for the Cas effector proteins are listed below:

Cas13e.1 GCTGGAGCAGCCCCCGATTTGTGGGGTGATTACAGC(SEQ ID NO:8)
Cas13e.2 GCTGAAGAAGCCTCCGATTTGAGAGGTGATTACAGC(SEQ ID NO:9)
Cas13f.1 GCTGTGATAGACCTCGATTTGTGGGGTAGTAACAGC(SEQ ID NO:10)
Cas13f.2 GCTGTGATAGACCTCGATTTGTGGGGTAGTAACAGC(SEQ ID NO:11)
Cas13f.3 GCTGTGATAGACCTCGATTTGTGGGGTAGTAACAGC(SEQ ID NO:12)
Cas13f.4 GCTGTGATGGGCCTCAATTTGTGGGGAAGTAACAGC(SEQ ID NO:13)
Cas13f.5 GCTGTGATAGGCCTCGATTTGTGGGGTAGTAACAGC(SEQ ID NO:14)
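Because all seven DR sequences above are 36 nt long, they can be compared column by column without alignment. The Python sketch below (function names illustrative) counts mismatches against Cas13f.1 and lists positions shared by all seven. It shows, for example, that the Cas13f.1, Cas13f.2, and Cas13f.3 DRs are identical and that every DR begins with GCTG and ends with CAGC, which are reverse complements of each other. Whether those ends actually pair in the folded DR is not established by this comparison.

```python
# DR coding sequences (DNA) from SEQ ID NOs: 8-14, as listed above.
DR_SEQUENCES = {
    "Cas13e.1": "GCTGGAGCAGCCCCCGATTTGTGGGGTGATTACAGC",
    "Cas13e.2": "GCTGAAGAAGCCTCCGATTTGAGAGGTGATTACAGC",
    "Cas13f.1": "GCTGTGATAGACCTCGATTTGTGGGGTAGTAACAGC",
    "Cas13f.2": "GCTGTGATAGACCTCGATTTGTGGGGTAGTAACAGC",
    "Cas13f.3": "GCTGTGATAGACCTCGATTTGTGGGGTAGTAACAGC",
    "Cas13f.4": "GCTGTGATGGGCCTCAATTTGTGGGGAAGTAACAGC",
    "Cas13f.5": "GCTGTGATAGGCCTCGATTTGTGGGGTAGTAACAGC",
}

DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(DNA_COMPLEMENT)[::-1]


def conserved_positions(seqs) -> list:
    """1-based positions at which every sequence carries the same base."""
    return [i + 1 for i, column in enumerate(zip(*seqs)) if len(set(column)) == 1]


if __name__ == "__main__":
    ref = DR_SEQUENCES["Cas13f.1"]
    for name, seq in DR_SEQUENCES.items():
        print(f"{name}: {len(seq)} nt, {hamming(ref, seq)} mismatches vs Cas13f.1")
    print("conserved positions:", conserved_positions(DR_SEQUENCES.values()))
    # The 5' GCTG and 3' CAGC ends are reverse complements of each other.
    print(reverse_complement("GCTG") == "CAGC")
```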

Because the secondary structure of the DR sequences, including the location and size of the stem, bulge, and loop structures, is likely more important than the specific nucleotide sequence that forms it, alternative or derivative DR sequences can also be used in the systems and methods of the invention, provided that their secondary structure substantially resembles that of an RNA encoded by any one of SEQ ID NOs: 8-14. For example, a derivative DR sequence may have ±1 or 2 base pair(s) in one or both stems (see fig. 2), ±1, 2, or 3 bases in either or both single strands of the bulge, and/or ±1, 2, 3, or 4 bases in the loop region.

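The tolerance rule above (±1-2 bp per stem, ±1-3 nt per bulge strand, ±1-4 nt in the loop) can be written as a simple check on hairpin element sizes. In this hypothetical Python sketch the element sizes would have to come from fig. 2 or from an RNA folding tool; the reference numbers in the usage example are invented for illustration and are not read from the DRs of SEQ ID NOs: 8-14.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class HairpinSummary:
    """Hypothetical size summary of a DR hairpin (no sequence information)."""
    stem_bp: Tuple[int, ...]   # base pairs in each stem, 5' to 3'
    bulge_nt: Tuple[int, ...]  # unpaired bases on each single strand of the bulge
    loop_nt: int               # unpaired bases in the loop


def within_derivative_tolerance(ref: HairpinSummary, cand: HairpinSummary,
                                stem_tol: int = 2, bulge_tol: int = 3,
                                loop_tol: int = 4) -> bool:
    """True if cand differs from ref by no more than the per-element tolerances."""
    if len(ref.stem_bp) != len(cand.stem_bp) or len(ref.bulge_nt) != len(cand.bulge_nt):
        return False  # different topology, not a simple size variant
    stems_ok = all(abs(r - c) <= stem_tol for r, c in zip(ref.stem_bp, cand.stem_bp))
    bulges_ok = all(abs(r - c) <= bulge_tol for r, c in zip(ref.bulge_nt, cand.bulge_nt))
    loop_ok = abs(ref.loop_nt - cand.loop_nt) <= loop_tol
    return stems_ok and bulges_ok and loop_ok


if __name__ == "__main__":
    reference = HairpinSummary(stem_bp=(5, 4), bulge_nt=(1, 0), loop_nt=8)  # invented values
    variant = HairpinSummary(stem_bp=(6, 4), bulge_nt=(1, 2), loop_nt=11)
    print(within_derivative_tolerance(reference, variant))
```

Meeting these size tolerances is only a structural screen; whether a derivative DR still supports crRNA binding would need to be tested.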

In some embodiments, the VI-E and VI-F CRISPR-Cas effector proteins comprise a "derivative" whose amino acid sequence has a high degree of identity (e.g., 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%) to the amino acid sequence set forth in any of SEQ ID NOs: 1-7. Such derivative Cas effector proteins have significant protein sequence identity to any one of SEQ ID NOs: 1-7 while retaining the ability to bind to, and form a complex with, a crRNA comprising at least one DR sequence of SEQ ID NOs: 8-14. For example, a Cas13e.1 derivative may have 85% amino acid sequence identity to SEQ ID NO: 1, 2, 3, 4, 5, 6, or 7 while retaining the ability to bind to and form a complex with a crRNA comprising the DR sequence of SEQ ID NO: 8, 9, 10, 11, 12, 13, or 14.
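The identity percentages above presuppose a way of computing sequence identity. The Python sketch below shows one common convention, assumed here rather than taken from this description: identical aligned residues divided by the length of the reference (SEQ ID NO) sequence, computed over a pairwise alignment produced beforehand by any alignment tool. The function names are hypothetical, and identity alone does not establish a derivative, since crRNA binding would still have to be tested.

```python
def percent_identity(ref_aln: str, cand_aln: str) -> float:
    """Percent identity of a candidate to a reference over a precomputed alignment.

    Both strings come from the same pairwise alignment ('-' marks a gap).
    Identity = identical aligned residues / residues in the reference.
    """
    if len(ref_aln) != len(cand_aln):
        raise ValueError("aligned sequences must have equal length")
    ref_len = sum(1 for r in ref_aln if r != "-")
    if ref_len == 0:
        raise ValueError("reference contains no residues")
    matches = sum(1 for r, c in zip(ref_aln, cand_aln) if r != "-" and r == c)
    return 100.0 * matches / ref_len


def meets_identity_threshold(ref_aln: str, cand_aln: str,
                             threshold: float = 85.0) -> bool:
    """True if the candidate reaches the chosen identity threshold (e.g., 85%)."""
    return percent_identity(ref_aln, cand_aln) >= threshold


if __name__ == "__main__":
    # Hypothetical aligned fragments, for illustration only.
    print(percent_identity("MKRAAAAH", "MKR-AAAH"))  # 87.5
```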

In some embodiments, the derivative comprises conservative amino acid substitutions. In some embodiments, the derivative comprises only conservative amino acid substitutions (i.e., all amino acid substitutions in the derivative are conservative and there are no non-conservative substitutions).

In some embodiments, the derivative has no more than 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 amino acids inserted or deleted relative to any of the wild-type sequences of SEQ ID NOs: 1-7. The insertions and/or deletions may be clustered together or spread over the entire sequence, as long as the derivative retains at least one function of the wild-type protein. Such functions may include the ability to bind the guide RNA/crRNA, RNase activity, and the ability to bind and/or cleave a target RNA complementary to the guide RNA/crRNA. In some embodiments, no insertions or deletions are present in the RXXXXH motifs, or within 5, 10, 15, or 20 residues of the RXXXXH motifs.
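The insertion/deletion limits above can be checked on a pairwise alignment of a derivative against its wild-type reference. In this hypothetical Python sketch, an indel's position is expressed in reference coordinates (an insertion is assigned to the preceding reference residue, an assumed convention), `max_indels` counts inserted or deleted residues, and `motif_spans` are RXXXXH spans in reference coordinates (for SEQ ID NO: 1, e.g., (84, 89)).

```python
from typing import List, Sequence, Tuple


def indel_sites(ref_aln: str, cand_aln: str) -> List[int]:
    """Reference coordinates (1-based) of inserted or deleted residues.

    Both strings come from one pairwise alignment ('-' = gap). An insertion
    (gap in the reference) is assigned to the preceding reference residue.
    """
    if len(ref_aln) != len(cand_aln):
        raise ValueError("aligned sequences must have equal length")
    pos = 0
    sites = []
    for r, c in zip(ref_aln, cand_aln):
        if r != "-":
            pos += 1
        if r == "-" or c == "-":
            sites.append(pos)
    return sites


def passes_indel_rules(ref_aln: str, cand_aln: str,
                       motif_spans: Sequence[Tuple[int, int]],
                       max_indels: int = 10, window: int = 5) -> bool:
    """True if indels number at most max_indels and none fall within window of a motif."""
    sites = indel_sites(ref_aln, cand_aln)
    if len(sites) > max_indels:
        return False
    return not any(start - window <= s <= end + window
                   for start, end in motif_spans for s in sites)


if __name__ == "__main__":
    ref = "MKRAAAAH" + "L" * 20           # hypothetical reference, motif at 3-8
    cand = ref[:24] + "-" + ref[25:]      # one deletion at reference position 25
    print(indel_sites(ref, cand), passes_indel_rules(ref, cand, [(3, 8)]))
```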

In some embodiments, the derivative retains the ability to bind to guide RNA/crRNA.

In some embodiments, the derivative retains RNase activity activated by the guide RNA/crRNA.

In some embodiments, the derivative retains the ability to bind to and/or cleave the target RNA in the presence of the bound guide RNA/crRNA, which is complementary to at least a portion of the sequence of the target RNA.

In other embodiments, the derivative has completely or partially lost its guide RNA/crRNA-activated RNase activity, for example because of mutation of one or more catalytic residues of the RNA-guided RNase. Such derivatives are sometimes referred to as dCas, e.g., dCas13e.1.

Thus, in some embodiments, the derivative may be modified to reduce nuclease/RNase activity, e.g., by at least 50%, 60%, 70%, 80%, 90%, 95%, 97%, or 100% relative to the wild-type protein. Nuclease activity can be reduced by several methods known in the art, such as introducing mutations into the nuclease (catalytic) domain of the protein. In some embodiments, catalytic residues required for nuclease activity are identified, and these amino acid residues are substituted with different amino acid residues (e.g., glycine or alanine) to reduce nuclease activity. In some embodiments, the amino acid substitution is a conservative amino acid substitution. In some embodiments, the amino acid substitution is a non-conservative amino acid substitution.

In some embodiments, the modification comprises one or more mutations (e.g., amino acid deletions, insertions, or substitutions) in at least one HEPN domain. In some embodiments, there are 1, 2, 3, 4, 5, 6, 7, 8, 9, or more amino acid substitutions in at least one HEPN domain. For example, in some embodiments, the one or more mutations comprise a substitution (e.g., an alanine substitution) at an amino acid residue corresponding to R84, H89, R739, H744, R740, or H745 of SEQ ID NO: 1; R97, H102, R770, or H775 of SEQ ID NO: 2; R77, H82, R764, or H769 of SEQ ID NO: 3; R79, H84, R766, or H771 of SEQ ID NO: 4; R79, H84, R766, or H771 of SEQ ID NO: 5; R89, H94, R773, or H778 of SEQ ID NO: 6; or R89, H94, R777, or H782 of SEQ ID NO: 7.

In some embodiments, the one or more (or two or more) mutations can be in a catalytically active domain of the effector protein that comprises a HEPN domain or a domain homologous to a HEPN domain. In some embodiments, the effector protein comprises one or more of the following mutations: R84A, H89A, R739A, H744A, R740A, H745A (where amino acid positions correspond to those of Cas13e.1). Those skilled in the art will appreciate that the corresponding amino acid positions in other Cas13e and Cas13f proteins may be mutated to the same effect. In some embodiments, the one or more mutations completely or partially abolish the catalytic activity of the protein (e.g., by changing its cleavage rate or specificity).

Other examples of catalytic residue mutations include: R97A, H102A, R770A, and H775A of Cas13e.2; R77A, H82A, R764A, and H769A of Cas13f.1; R79A, H84A, R766A, and H771A of Cas13f.2; R79A, H84A, R766A, and H771A of Cas13f.3; R89A, H94A, R773A, and H778A of Cas13f.4; and R89A, H94A, R777A, and H782A of Cas13f.5. In some embodiments, any of these R and/or H residues may be substituted with G, V, or I instead of A.
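Mutation names such as R84A follow the usual convention of wild-type residue, 1-based position, and new residue. A small Python sketch can apply such substitutions while checking that the stated wild-type residue is actually present, which guards against numbering mismatches between homologs. Because the SEQ ID NO sequences are not reproduced here, the example uses a hypothetical toy sequence; with the real SEQ ID NO: 1, the same call could take, e.g., ["R84A", "H89A", "R739A", "H744A"]. The function name is illustrative.

```python
import re
from typing import Iterable

MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")  # e.g., "R84A"


def apply_substitutions(protein: str, mutations: Iterable[str]) -> str:
    """Return the protein sequence with each point substitution applied.

    Raises ValueError if a name is malformed, out of range, or if the stated
    wild-type residue does not match the sequence at that position.
    """
    residues = list(protein)
    for name in mutations:
        match = MUTATION.match(name)
        if match is None:
            raise ValueError(f"malformed mutation name: {name}")
        wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{name}: position outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(
                f"{name}: expected {wild_type} at {position}, found {residues[position - 1]}")
        residues[position - 1] = new
    return "".join(residues)


if __name__ == "__main__":
    toy = "MKRAAAAHLLRRGGGHHE"  # hypothetical sequence, not a SEQ ID NO
    print(apply_substitutions(toy, ["R3A", "H8A"]))
```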

The presence of at least one of the above mutations results in a derivative having a reduced or decreased RNase activity compared to the corresponding wild type protein lacking such mutation.

In some embodiments, the effector protein described herein is an "inactivated" effector protein, such as an inactivated Cas13e or Cas13f effector protein (i.e., dCas13e and dCas13f). In some embodiments, the effector protein has one or more mutations in HEPN domain 1 (N-terminal). In some embodiments, the effector protein has one or more mutations in HEPN domain 2 (C-terminal). In some embodiments, the effector protein has one or more mutations in both HEPN domain 1 and HEPN domain 2.

The inactivated Cas, or a derivative or functional fragment thereof, may be fused or associated with one or more heterologous/functional domains (e.g., via a fusion protein, a linker peptide, a "GS" linker, etc.). These domains may have various activities, such as methylase, demethylase, transcriptional activation, transcriptional repression, transcription release factor, histone modification, RNA cleavage, DNA cleavage, nucleic acid binding, base editing, or switching activity (e.g., photoinduction). In some embodiments, the functional domain is a Krüppel-associated box (KRAB), SID (e.g., SID4X), VP64, VPR, VP16, Fok1, P65, HSF1, MyoD1, an RNA-acting adenosine deaminase such as ADAR1 or ADAR2, APOBEC, activation-induced cytidine deaminase (AID), TAD, miniSOG, APEX, or biotin-APEX.

In some embodiments, the functional domain is a base editing domain, such as ADAR1 (wild-type or its ADAR1DD version, with or without an E1008Q mutation), ADAR2 (wild-type or its ADAR2DD version, with or without an E488Q mutation), APOBEC, or AID.

In some embodiments, the functional domain may comprise one or more nuclear localization signal (NLS) domains; for example, the one or more heterologous functional domains may comprise two or more NLS domains. The one or more NLS domains may be located at or near a terminus of the effector protein (e.g., a Cas13e/Cas13f effector protein) or within it, and when there are two or more NLSs, each may be located at, near, or adjacent to the effector protein.

In some embodiments, the one or more heterologous functional domains may be at or near the amino terminus of the effector protein, and/or at or near its carboxy terminus. The one or more heterologous functional domains may be fused to the effector protein, or linked to it by a linker moiety.

In some embodiments, there are multiple (e.g., 2, 3, 4, 5, 6, 7, 8, or more) functional domains that are the same or different.

In some embodiments, the functional domain (e.g., a base editing domain) is further fused to an RNA-binding domain (e.g., MS2).

In some embodiments, the functional domain is associated with or fused to a linker sequence (e.g., a flexible linker sequence or a rigid linker sequence). The following table provides examples of linker sequences and functional domain sequences.

Amino acid sequences of motifs and functional domains in artificially engineered VI-E and VI-F CRISPR Cas effector protein variants

The one or more functional domains may be positioned on the inactivated Cas protein so that they have the correct spatial orientation to exert their functional effect on the target. For example, if the functional domain is a transcriptional activator (e.g., VP16, VP64, or p65), it is placed in a spatial orientation that allows it to affect transcription of the target. Similarly, a transcriptional repressor can be positioned to affect transcription of the target, and a nuclease (e.g., Fok1) can be positioned to cleave or partially cleave the target. In some embodiments, the functional domain is N-terminal to the Cas/dCas. In some embodiments, the functional domain is C-terminal to the Cas/dCas. In some embodiments, the inactivated CRISPR-associated protein (dCas) is modified to comprise a first domain at the N-terminus and a second domain at the C-terminus.
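Placing a functional domain N- or C-terminal to dCas, with a linker and NLSs, amounts to concatenating sequences in a chosen order. The Python sketch below shows only that bookkeeping. The SV40 T-antigen NLS (PKKKRKV) is one of the NLSs listed in this description; the (GGGGS)3 linker is an assumed, commonly used flexible "GS" linker; and the dCas and domain strings passed in are placeholders, not SEQ ID NO sequences. The function name is hypothetical.

```python
from typing import Tuple

SV40_NLS = "PKKKRKV"      # SV40 T-antigen NLS named in this description
G4S_LINKER = "GGGGS" * 3  # assumption: a common flexible (G4S)3 "GS" linker


def build_fusion(dcas: str, domain: str, domain_terminus: str = "N",
                 nls_termini: Tuple[str, ...] = ("C",),
                 linker: str = G4S_LINKER, nls: str = SV40_NLS) -> str:
    """Return a fusion protein sequence (single-letter code), N- to C-terminus."""
    if domain_terminus == "N":
        core = domain + linker + dcas      # domain N-terminal to dCas
    elif domain_terminus == "C":
        core = dcas + linker + domain      # domain C-terminal to dCas
    else:
        raise ValueError("domain_terminus must be 'N' or 'C'")
    if "N" in nls_termini:
        core = nls + core
    if "C" in nls_termini:
        core = core + nls
    return core


if __name__ == "__main__":
    # Placeholder strings stand in for a dCas13e/dCas13f and a functional domain.
    print(build_fusion("DCASPLACEHOLDER", "DOMAINPLACEHOLDER", "C", ("N", "C")))
```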

Various examples and methods for the fusion of an inactivated CRISPR-associated protein with one or more functional domains have been described, for example, in international publication No. WO 2017/219027, which is hereby incorporated by reference in its entirety, especially with respect to the parts of the features described herein.

In some embodiments, the CRISPR-Cas effector proteins of types VI-E and VI-F comprise the amino acid sequences set forth in SEQ ID NOs: 1-7. In some embodiments, the CRISPR-Cas effector proteins of types VI-E and VI-F do not include the amino acid sequences set forth in SEQ ID NOs: 1-7.

In some embodiments, instead of full-length wild-type effector proteins (SEQ ID NOs: 1-7) or derivative VI-E and VI-F Cas effector proteins, their "functional fragments" may be used.

A "functional fragment", as used herein, refers to a portion of any of SEQ ID NOs: 1-7, or of a derivative thereof, that is shorter than the full-length sequence. The deleted residues may be at the N-terminus, at the C-terminus, and/or internal. A functional fragment retains at least one function of a wild-type VI-E or VI-F Cas, or at least one function of a derivative thereof. Functional fragments are therefore defined with respect to the particular function in question. For example, a fragment that is functional for binding crRNA and target RNA may not be a functional fragment for RNase activity: loss of an RXXXXH motif from the Cas may leave its ability to bind crRNA and target RNA intact while abolishing its RNase activity.

In some embodiments, compared with the full-length sequence of any of SEQ ID NOs: 1-7, the VI-E or VI-F CRISPR-Cas effector protein, or derivative or functional fragment thereof, lacks about 30, 60, 90, 120, 150, or 180 residues at the N-terminus.

In some embodiments, compared with the full-length sequence of any of SEQ ID NOs: 1-7, the VI-E or VI-F CRISPR-Cas effector protein, or derivative or functional fragment thereof, lacks about 30, 60, 90, 120, or 150 residues at the C-terminus.

In some embodiments, compared with the full-length sequence of any of SEQ ID NOs: 1-7, the VI-E or VI-F CRISPR-Cas effector protein, or derivative or functional fragment thereof, lacks about 30, 60, 90, 120, 150, or 180 residues at the N-terminus and about 30, 60, 90, 120, or 150 residues at the C-terminus.
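The N- and C-terminal deletions listed above can be enumerated programmatically, together with a check on whether each fragment still contains an RX4-6H motif. Motif retention is only a rough, assumed proxy for RNase competence (a fragment lacking a motif may still bind crRNA and target RNA, as noted above), and the synthetic 793-residue sequence below is a hypothetical stand-in for SEQ ID NOs: 1-7. Function names are illustrative.

```python
import re
from itertools import product

HEPN_MOTIF = re.compile(r"(?=R[A-Z]{4,6}H)")  # lookahead: count each motif start once


def truncate(protein: str, n_del: int = 0, c_del: int = 0) -> str:
    """Remove n_del residues from the N-terminus and c_del from the C-terminus."""
    if n_del < 0 or c_del < 0 or n_del + c_del >= len(protein):
        raise ValueError("invalid truncation")
    return protein[n_del:len(protein) - c_del]


def count_hepn_motifs(protein: str) -> int:
    """Number of R positions that start an R-X(4-6)-H motif."""
    return sum(1 for _ in HEPN_MOTIF.finditer(protein.upper()))


if __name__ == "__main__":
    # Hypothetical sequence with motifs at residues 82-87 and 688-693.
    toy = "M" + "A" * 80 + "RAAAAH" + "A" * 600 + "RAAAAH" + "A" * 100
    for n_del, c_del in product((0, 30, 60, 90, 120, 150, 180), (0, 30, 60, 90, 120, 150)):
        fragment = truncate(toy, n_del, c_del)
        print(f"N-{n_del:>3} C-{c_del:>3}: {len(fragment)} aa, "
              f"{count_hepn_motifs(fragment)} motif(s)")
```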

In some embodiments, the VI-E or VI-F type CRISPR-Cas effector protein or derivative or functional fragment thereof has RNase activity, e.g., specific RNase activity activated by guide RNA/crRNA.

In some embodiments, the VI-E or VI-F CRISPR-Cas effector protein, or derivative or functional fragment thereof, has no substantial/detectable accessory (collateral) RNase activity.

"Additional RNase activity" (also referred to herein as accessory, attendant, or collateral RNase activity) refers to the non-specific RNase activity observed in some other RNA-guided class 2 type VI RNases (e.g., Cas13a). For example, a Cas13a-containing complex undergoes a conformational change upon binding to, and activation by, a target nucleic acid (e.g., a target RNA), which in turn causes the complex to act as a non-specific RNase that cleaves and/or degrades nearby RNA molecules (e.g., ssRNA or dsRNA molecules), i.e., the "collateral effect".

In some embodiments, a complex comprising, but not limited to, a crRNA and the VI-E or VI-F CRISPR-Cas effector protein, or a derivative or functional fragment thereof, does not exhibit collateral RNase activity after target recognition. Such "non-collateral" embodiments may comprise wild-type or engineered/derivative effector proteins, or functional fragments thereof.

In some embodiments, the VI-E or VI-F CRISPR-Cas effector protein, or derivative or functional fragment thereof, recognizes and cleaves target RNA without any additional requirement on the sequence flanking the protospacer (i.e., no protospacer adjacent motif (PAM) or protospacer flanking sequence (PFS) requirement).

The present disclosure also provides a split form of the CRISPR-associated proteins described herein (e.g., the VI-E or VI-F CRISPR-Cas effector proteins). A split CRISPR-associated protein can facilitate delivery. In some embodiments, the CRISPR-associated protein is split into two portions that together substantially make up a functional CRISPR-associated protein.

The split can be made in a way that does not affect the catalytic domain. The CRISPR-associated protein can function as a nuclease or as an inactivated enzyme, the latter being essentially an RNA-binding protein with little or no catalytic activity (e.g., due to mutation of its catalytic domain). Split enzymes are described, for example, in Wright et al., "Rational design of a split-Cas9 enzyme complex," Proc. Natl. Acad. Sci. USA 112(10):2984-2989, 2015, which is incorporated herein by reference in its entirety.

For example, in some embodiments, the nuclease lobe and the α-helical lobe are expressed as separate polypeptides. Although these lobes do not interact on their own, the crRNA recruits them into a ternary complex that recapitulates the activity of the full-length CRISPR-associated protein and catalyzes site-specific DNA cleavage. Using a modified crRNA abolishes the activity of the split enzyme by preventing dimerization, which allows an inducible dimerization system to be developed.

In some embodiments, the split CRISPR-associated protein may be fused to a dimerization partner, for example by using rapamycin-sensitive dimerization domains. This yields a chemically inducible CRISPR-associated protein whose activity can be controlled temporally. Thus, the CRISPR-associated protein can be made chemically inducible by splitting it into two fragments, and rapamycin-sensitive dimerization domains can be used for controlled reassembly of the protein.

The split point is typically designed in silico and cloned into the construct. During this process, mutations can be introduced into the split CRISPR-associated protein and non-functional domains can be removed.

In some embodiments, the two portions or fragments of the split CRISPR-associated protein (i.e., the N-terminal and C-terminal fragments) can together form a complete CRISPR-associated protein comprising, for example, at least 70%, at least 80%, at least 90%, at least 95%, or at least 99% of the sequence of the wild-type CRISPR-associated protein.

The CRISPR-associated proteins described herein (e.g., the VI-E or VI-F CRISPR-Cas effector proteins) can be designed to be self-activating or self-inactivating. For example, a target sequence may be introduced into the construct encoding the CRISPR-associated protein. The CRISPR-associated protein can then cleave both its target sequence and the construct encoding it, so that its own expression is self-inactivating. Methods of constructing self-inactivating CRISPR systems are described, for example, in Epstein and Schaffer, Mol. Ther. 24:S50, 2016, which is incorporated herein by reference in its entirety.

In some other embodiments, an additional crRNA expressed under the control of a weak promoter (e.g., the 7SK promoter) targets the nucleic acid sequence encoding the CRISPR-associated protein to prevent and/or block its expression (e.g., by preventing transcription and/or translation of the nucleic acid). When a cell is transfected with vectors expressing the CRISPR-associated protein, the crRNA, and the crRNA that targets the nucleic acid encoding the CRISPR-associated protein, the latter crRNA can effectively block that nucleic acid and reduce the level of the CRISPR-associated protein, thereby limiting genome editing activity.

In some embodiments, the genome editing activity of the CRISPR-associated proteins described above can be modulated by endogenous RNA signatures (e.g., miRNAs) in mammalian cells. Placing miRNA-complementary sequences in the 5'-UTR of the mRNA encoding the CRISPR-associated protein creates a CRISPR-associated protein switch that responds selectively and efficiently to miRNAs in target cells. Such switches can therefore enable differential control of genome editing by sensing endogenous miRNA activity within a heterogeneous cell population, providing a framework for cell-type-selective genome editing and cell engineering based on intracellular miRNA information (see, e.g., Hirosawa et al., Nucleic Acids Res. 45(13):e118, 2017).
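The miRNA switch described above depends on a sequence complementary to the miRNA in the 5'-UTR. As a hypothetical Python sketch (not the construct of Hirosawa et al.), the helpers below derive the DNA sequence of a perfectly complementary target site from a mature miRNA and place one or more copies after a 5'-UTR leader; the leader, the miRNA, the copy number, and the spacer between sites are all placeholders.

```python
# Maps an RNA base to the complementary DNA base (A->T, C->G, G->C, U->A).
RNA_TO_DNA_COMPLEMENT = str.maketrans("ACGU", "TGCA")


def mirna_target_site(mature_mirna: str) -> str:
    """DNA sequence of a site perfectly complementary (antiparallel) to a mature miRNA."""
    mirna = mature_mirna.upper()
    if set(mirna) - set("ACGU"):
        raise ValueError("miRNA must be given as RNA (A, C, G, U)")
    return mirna.translate(RNA_TO_DNA_COMPLEMENT)[::-1]


def build_5utr(leader: str, mature_mirna: str, copies: int = 1, gap: str = "") -> str:
    """Append `copies` miRNA target sites, separated by `gap`, to a 5'-UTR leader (DNA)."""
    site = mirna_target_site(mature_mirna)
    return leader.upper() + gap.join([site] * copies)


if __name__ == "__main__":
    # Hypothetical leader and miRNA, for illustration only.
    print(build_5utr("GGGAGA", "UAGCAGCACGUAAAUAUUGGCG", copies=2, gap="AA"))
```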

The CRISPR-associated proteins (e.g., the VI-E and VI-F CRISPR-Cas effector proteins) can be expressed in an inducible manner, e.g., by light or chemical induction. This mechanism allows activation of the functional domains in the CRISPR-associated proteins. Light inducibility can be achieved by various methods known in the art, e.g., by designing a fusion complex in which the CRY2 PHR/CIBN pair is used in a split CRISPR-associated protein (see, e.g., Konermann et al., "Optical control of mammalian endogenous transcription and epigenetic states," Nature 500(7463), 2013).

Chemical inducibility can be achieved by various methods, e.g., by designing a fusion complex in which the FKBP/FRB (FK506-binding protein/FKBP-rapamycin-binding domain) pair is used in a split CRISPR-associated protein. Rapamycin is required to form this fusion complex and thereby activate the CRISPR-associated protein (see, e.g., Zetsche et al., "A split-Cas9 architecture for inducible genome editing and transcription modulation," Nat. Biotechnol. 33(2):139-142, 2015).

In addition, expression of the CRISPR-associated protein can be regulated by inducible promoters, e.g., tetracycline- or doxycycline-controlled transcriptional activation (the Tet-On and Tet-Off expression systems), hormone-inducible gene expression systems (e.g., the ecdysone-inducible gene expression system), and arabinose-inducible gene expression systems. When delivered as RNA, expression of the RNA-targeting effector protein can be regulated by a riboswitch that senses a small molecule such as tetracycline (see, e.g., Goldfless et al., "Direct and specific chemical control of eukaryotic translation with a synthetic RNA-protein interaction," Nucl. Acids Res. 40(9):e64, 2012).

Various embodiments of inducible CRISPR-associated proteins and inducible CRISPR systems have been described; see U.S. Patent No. 8,871,445, U.S. Patent Publication No. 2016/0208243, and International Patent Publication No. WO 2016/205764, which are incorporated herein by reference in their entirety.

In some embodiments, the CRISPR-associated protein comprises at least one (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10) nuclear localization signal (NLS) attached to the N-terminus or C-terminus. For example, NLSs may be derived from (non-exhaustively): the SV40 large T antigen NLS with the amino acid sequence PKKKRKV; the nucleoplasmin NLS (e.g., the nucleoplasmin bipartite NLS with the sequence KRPAATKKAGQAKKKK); the c-myc NLS with the amino acid sequence PAAKRVKLD or RQRRNELKRSP; the hnRNPA1 M9 NLS with the sequence NQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGY; the sequence RMRIZKNKGKDTAELRRVEVSVARRK of the IBB domain of importin-α; the sequences VSRKRPRP and PPKKARED of the myoma T protein; the sequence PQPKKKPL of human p53; the sequence SALIKKKKKMAP of mouse c-abl IV; the sequences DRLRR and PKQKKRK of influenza virus NS1; the sequence RKLKKKIKKL of the hepatitis delta virus antigen; the sequence REKKKFLKRR of the mouse Mx1 protein; the sequence KRKGDEVDGVDEVAKKKSKK of human poly(ADP-ribose) polymerase; and the sequence RKCLQAGMNLEARKTKK of the human glucocorticoid receptor. In some embodiments, the CRISPR-associated protein comprises at least one (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10) nuclear export signal (NES) attached to the N-terminus or C-terminus. In a preferred embodiment, an NLS or NES is attached to the C-terminus and/or N-terminus of the CRISPR-associated protein, allowing optimal expression and targeting of the protein in eukaryotic cells (e.g., human cells).

In some embodiments, the CRISPR-associated proteins described herein are mutated at one or more amino acid residues to alter one or more functional activities.

For example, in some embodiments, the CRISPR-associated protein is mutated at one or more amino acid residues to alter its helicase activity.

In some embodiments, the CRISPR-associated protein is mutated at one or more amino acid residues to alter its nuclease activity (e.g., endonuclease activity or exonuclease activity).

In some embodiments, the CRISPR-associated protein is mutated at one or more amino acid residues to alter its ability to functionally bind to a guide RNA.

In some embodiments, the CRISPR-associated protein is mutated at one or more amino acid residues to alter its ability to functionally associate with a target nucleic acid.

In some embodiments, a CRISPR-associated protein described herein is capable of cleaving a target RNA molecule.

In some embodiments, the CRISPR-associated protein has been mutated at one or more amino acid residues to alter its cleavage activity. For example, in some embodiments, one or more mutations may be present in a CRISPR-associated protein such that the enzyme is unable to cleave a target nucleic acid.

In some embodiments, the CRISPR-associated protein is capable of cleaving a target nucleic acid strand that is complementary to the strand hybridized to the guide RNA.

In some embodiments, CRISPR-associated proteins described herein can be engineered to delete one or more amino acid residues in a way that can reduce the size of the enzyme while retaining one or more desired functional activities (e.g., nuclease activity, ability to functionally interact with a guide RNA). The use of such truncated CRISPR-associated proteins has certain advantages when combined with a delivery system with load restrictions.

In some embodiments, the CRISPR-associated proteins described herein can be fused to one or more peptide tags, including a His tag, a GST tag, a V5 tag, a FLAG tag, an HA tag, a VSV-G tag, a Trx tag, or a myc tag.

In some embodiments, a CRISPR-associated protein described herein can be fused to a detectable moiety, for example GST, a fluorescent protein (such as GFP, HcRed, DsRed, CFP, YFP, or BFP), or an enzyme (such as HRP or CAT).

In some embodiments, the CRISPR-associated proteins described herein can be fused to MBP, a LexA DNA-binding domain, or a Gal4 DNA-binding domain.

In some embodiments, the CRISPR-associated proteins described herein can be linked or conjugated to a detectable label, such as a fluorescent dye (including FITC and DAPI).

In any of the embodiments herein, a CRISPR-associated protein described herein can be linked to any of the moieties described above at its N-terminus or C-terminus, or in some cases internally, by a covalent chemical bond. The linkage may be achieved by any chemical linkage known in the art, e.g., a peptide linkage, a PEG linkage, or a linkage through an amino acid side chain (such as that of D, E, S, or T) or through an amino acid derivative (Ahx, β-Ala, GABA, or Ava).
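A terminal fusion through a peptide linker can be illustrated at the sequence level with the minimal sketch below. It assumes a flexible (G4S) linker, which is one common choice and not one specified in the text; the FLAG and HA tag sequences are the standard epitopes, and the effector sequence `"MKIEE"` is a hypothetical placeholder. Internal insertion and start-methionine handling are deliberately left out.

```python
# Standard epitope-tag sequences (one-letter amino-acid code).
FLAG_TAG = "DYKDDDDK"
HA_TAG = "YPYDVPDYA"
# Assumed flexible (G4S) peptide linker; the text only specifies "peptide linkage".
GS_LINKER = "GGGGS"


def fuse(cas_protein: str, moiety: str, terminus: str = "C", linker: str = GS_LINKER) -> str:
    """Join `moiety` to the N- or C-terminus of `cas_protein` through `linker`.

    Internal insertion and start-methionine handling are not modelled here.
    """
    if terminus == "N":
        return moiety + linker + cas_protein
    if terminus == "C":
        return cas_protein + linker + moiety
    raise ValueError("terminus must be 'N' or 'C'")


# Usage with a hypothetical, truncated effector sequence.
print(fuse("MKIEE", FLAG_TAG, terminus="C"))  # MKIEEGGGGSDYKDDDDK
```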

2. Polynucleotide

The invention also provides nucleic acids encoding the proteins described herein (e.g., CRISPR-associated proteins or accessory proteins) and the guide RNAs described herein (e.g., crRNAs).

In some embodiments, the nucleic acid is a synthetic nucleic acid. In some embodiments, the nucleic acid is a DNA molecule. In some embodiments, the nucleic acid is an RNA molecule (e.g., an mRNA molecule encoding the Cas, a derivative thereof, or a functional fragment thereof). In some embodiments, the mRNA is capped, polyadenylated, substituted with 5-methylcytosine nucleosides, substituted with pseudouridine, or any combination thereof.

In some embodiments, the nucleic acid (e.g., DNA) is operably linked to regulatory elements (e.g., a promoter) so as to control expression of the nucleic acid. In some embodiments, the promoter is a constitutive promoter. In some embodiments, the promoter is an inducible promoter. In some embodiments, the promoter is a cell-specific promoter. In some embodiments, the promoter is an organism-specific promoter.

Suitable promoters are those known in the art and include, for example, the pol I promoter, pol II promoter, pol III promoter, T7 promoter, U6 promoter, H1 promoter, retroviral Rous sarcoma virus LTR promoter, cytomegalovirus (CMV) promoter, SV40 promoter, dihydrofolate reductase promoter, and β-actin promoter. For example, the U6 promoter can be used to regulate expression of the guide RNA molecules described herein.

In some embodiments, the one or more nucleic acids are present in a vector (e.g., a viral vector or a phage). The vector may be a cloning vector or an expression vector. The vector may be a plasmid, phagemid, cosmid, or the like. The vector may include one or more regulatory elements that allow the vector to replicate in the cell of interest (e.g., a bacterial cell or a mammalian cell). In some embodiments, the vector contains a nucleic acid encoding a single component of a CRISPR-associated (Cas) system described herein. In some embodiments, the vector comprises a plurality of nucleic acids, each nucleic acid encoding a component of a CRISPR-associated (Cas) system described herein.

In one aspect, the disclosure provides a nucleic acid sequence that is at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% identical to a nucleic acid sequence described herein, e.g., a nucleic acid sequence encoding the Cas protein, derivative, or functional fragment, or encoding a guide RNA/crRNA comprising a DR sequence of SEQ ID NOs: 8-14.

In another aspect, the disclosure also provides nucleic acid sequences encoding amino acid sequences that are at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequences described herein, e.g., the sequences of SEQ ID NOs: 1-7.

In some embodiments, the nucleic acid sequence contains at least a portion that is identical to a sequence described herein (e.g., at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 30, 40, 50, 60, 70, 80, 90, or 100 nucleotides, such as contiguous or non-contiguous nucleotides). In some embodiments, the nucleic acid sequence has at least a portion that differs from a sequence described herein (e.g., at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 30, 40, 50, 60, 70, 80, 90, or 100 nucleotides, such as contiguous or non-contiguous nucleotides).

In related embodiments, the invention provides amino acid sequences that are at least partially identical to the sequences described herein (e.g., identical over at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 30, 40, 50, 60, 70, 80, 90, or 100 amino acid residues, such as contiguous or non-contiguous amino acid residues). In some embodiments, the above-described amino acid sequences differ from a sequence described herein over at least a portion of that sequence (e.g., by at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 30, 40, 50, 60, 70, 80, 90, or 100 amino acid residues, e.g., contiguous or non-contiguous amino acid residues).

To determine the percent identity of two amino acid sequences or two nucleic acid sequences, the sequences are aligned for optimal comparison (e.g., gaps can be introduced in the first and/or second sequence for optimal alignment, and non-homologous sequences can be disregarded for comparison purposes). Typically, the aligned portion of the reference sequence is at least 80% of the length of the reference sequence; in some embodiments, it is at least 90%, 95%, or 100% of that length.

The amino acid residues or nucleotides at corresponding positions are then compared. When a position in the first sequence is occupied by the same amino acid residue or nucleotide as the corresponding position in the second sequence, the two molecules are identical at that position. The percent identity between the two sequences is a function of the number of identical positions they share, taking into account the number of gaps, and the length of each gap, that must be introduced for optimal alignment of the two sequences. For the purposes of this disclosure, sequences can be compared and their percent identity determined using the BLOSUM62 scoring matrix with a gap penalty of 12, a gap extension penalty of 4, and a frameshift gap penalty of 5.
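The counting step can be sketched as follows. This is a minimal illustration that assumes the two sequences have already been optimally aligned (for example, with an aligner configured for BLOSUM62 and the gap penalties above, such as Biopython's `PairwiseAligner`). It also assumes identity is reported over all alignment columns, with gap columns counted as non-identical; other denominators, such as the length of the shorter sequence, are also in use.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two sequences that have already been aligned.

    Identical residues are counted over all alignment columns, so gap
    columns lower the score. Columns where both sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if not (x == gap and y == gap)]
    if not columns:
        raise ValueError("alignment is empty")
    matches = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * matches / len(columns)


# Six alignment columns, one of which contains a gap: 5/6 identical.
print(round(percent_identity("ACGT-A", "ACGTTA"), 2))  # 83.33
```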

The proteins described herein (e.g., CRISPR-associated proteins or accessory proteins) can be delivered or used as nucleic acid molecules or polypeptides.

In some embodiments, the nucleic acid molecule encoding a CRISPR-associated protein, or a derivative or functional fragment thereof, is codon optimized for expression in a host cell or organism. The host cell may be an established cell line (e.g., 293T cells) or an isolated primary cell. The nucleic acids can be codon optimized for use in any organism of interest, in particular human cells or bacteria. For example, the nucleic acid can be codon optimized for any prokaryote (e.g., E. coli) or any eukaryote, such as humans, as well as other non-human eukaryotes, including yeast, worms, insects, plants and algae (including food crops, rice, corn, vegetables, fruits, trees, and grasses), vertebrates, fish, birds (e.g., chickens), and non-human mammals (e.g., mice, rats, rabbits, dogs, livestock (cows or cattle, pigs, horses, sheep, goats, etc.), or non-human primates). Codon usage tables can be found, for example, in the Codon Usage Database (www.kazusa.or.jp/codon/), and these tables can be adapted in a number of ways; see Nakamura et al., Nucl. Acids Res. 28:292, 2000, herein incorporated by reference in its entirety. Computer algorithms for codon-optimizing a sequence for expression in a particular host cell are also available, for example, from Gene Forge (Aptagen; Jacobus, Pa.).

One example of a codon-optimized sequence is the human codon-optimized SaCas9 sequence described in WO2014/093622 (PCT/US2013/074667), i.e., a sequence optimized for expression in a eukaryote (here, humans); sequences may likewise be optimized for other eukaryotes, animals, or mammals. While this is a preferred embodiment, other embodiments are possible, and codon optimization for host species other than humans, or for specific organs, is known. In general, codon optimization refers to modifying a nucleic acid sequence for enhanced expression in a target host cell by replacing at least one codon of the original sequence (e.g., about 1, 2, 3, 4, 5, 10, 15, 20, 25, 50 or more codons) with a codon that is more frequently or most frequently used in the genes of that host cell, while maintaining the native amino acid sequence. Many species exhibit particular biases for certain codons of a given amino acid. Codon bias (differences in codon usage between organisms) often correlates with the translation efficiency of messenger RNA (mRNA), which in turn is believed to depend on, among other things, the properties of the codons being translated and the availability of particular transfer RNA (tRNA) molecules. The predominance of selected tRNAs in a cell generally reflects the codons used most frequently in peptide synthesis. Accordingly, genes can be tailored through codon optimization for optimal expression in a given organism (see also Nakamura, Y., et al., "Codon usage tabulated from international DNA sequence databases: status for the year 2000," Nucl. Acids Res. 28:292 (2000)). In some embodiments, one or more codons (e.g., 1, 2, 3, 4, 5, 10, 15, 20, 25, 50 or more, or all codons) in the sequence encoding the Cas correspond to the codon most frequently used for a particular amino acid.
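The "most frequently used codon" strategy described above can be sketched as follows. This is a minimal illustration only: the codon-usage values are hypothetical placeholders, not data from the Codon Usage Database, and dedicated tools typically also weigh GC content, repeats, and mRNA structure, none of which is modelled here.

```python
# Hypothetical codon-usage table: codon -> (amino acid, relative usage).
# The numbers are illustrative placeholders, NOT values from the Codon Usage Database.
CODON_USAGE = {
    "ATG": ("M", 1.00),
    "TTA": ("L", 0.08), "TTG": ("L", 0.13), "CTG": ("L", 0.40), "CTC": ("L", 0.20),
    "AAA": ("K", 0.43), "AAG": ("K", 0.57),
    "GGA": ("G", 0.25), "GGC": ("G", 0.34), "GGT": ("G", 0.16), "GGG": ("G", 0.25),
    "TAA": ("*", 0.30), "TAG": ("*", 0.23), "TGA": ("*", 0.47),
}


def preferred_codons(table):
    """Map each amino acid (or stop, '*') to its most frequently used codon in `table`."""
    best = {}
    for codon, (aa, usage) in table.items():
        if aa not in best or usage > table[best[aa]][1]:
            best[aa] = codon
    return best


def translate(cds, table=CODON_USAGE):
    """Translate a coding sequence using only the codons present in `table`."""
    return "".join(table[cds[i:i + 3].upper()][0] for i in range(0, len(cds), 3))


def codon_optimize(cds, table=CODON_USAGE):
    """Swap each codon for the host's preferred synonymous codon.

    The encoded amino-acid sequence is preserved; codons absent from the
    table are left unchanged.
    """
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    best = preferred_codons(table)
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3].upper()
        entry = table.get(codon)
        out.append(best[entry[0]] if entry else codon)
    return "".join(out)


original = "ATGTTAAAAGGATAA"
optimized = codon_optimize(original)
print(optimized)  # ATGCTGAAGGGCTGA
```

Because each replacement is drawn from the same amino acid's codon set, `translate(original)` and `translate(optimized)` both give `MLKG*`.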

RNA guide or crRNA

In some embodiments, the CRISPR system described herein contains at least one RNA guide (e.g., a gRNA or crRNA).

The structures of various RNA guides are known in the art (see, e.g., International Publication Nos. WO2014/093622 and WO2015/070083, herein incorporated by reference in their entirety).

In some embodiments, the CRISPR systems described herein comprise one or more RNA guides (e.g., 1, 2, 3, 4, 5, 6, 7, 8, or more RNA guides).

In some embodiments, the RNA guide comprises a crRNA. In some embodiments, the RNA guide comprises a crRNA but no tracrRNA.

Sequences of guide RNAs from multiple CRISPR systems are generally known in the art; see, for example, Grissa et al., Nucleic Acids Res. 35 (Web Server issue): W52-7, 2007; Grissa et al., BMC Bioinformatics 8:172, 2007; Grissa et al., Nucleic Acids Res. 36 (Web Server issue): W145-8, 2008; Moller and Liang, PeerJ 5:e3788, 2017; the CRISPR database at crispr.i2bc.paris-saclay.fr/crispr/BLAST/CRISPRsBlast.php; and MetaCRAST, available at github.com/molleraj/MetaCRAST. All of the above are incorporated herein by reference.


In some embodiments, the crRNA includes a direct repeat (DR) sequence and a spacer sequence. In some embodiments, the crRNA comprises, consists essentially of, or consists of a direct repeat sequence linked to a guide or spacer sequence, preferably linked to the 3' end of the spacer sequence.

Typically, the Cas protein forms a complex with the mature crRNA, and a spacer sequence directs the complex to specifically bind to a target RNA sequence that is complementary to and/or hybridizes to the spacer sequence. The resulting complex comprises the Cas protein described above and the mature crRNA bound to the target RNA.
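A crRNA for a chosen target region can be assembled as sketched below, assuming the layout described above: the spacer is the reverse complement of the target region (so it base-pairs with it antiparallel), and the DR is attached to the spacer's 3' end. The function names and the DR string `"GCUGAAAACAGC"` are hypothetical placeholders, not one of SEQ ID NOs: 8-14.

```python
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def to_rna(seq: str) -> str:
    """DNA-style notation to RNA: each T is read as U."""
    return seq.upper().replace("T", "U")


def reverse_complement_rna(seq: str) -> str:
    return to_rna(seq).translate(_COMPLEMENT)[::-1]


def design_crrna(target_rna: str, start: int, spacer_len: int, direct_repeat: str) -> str:
    """Return 5'-spacer-DR-3', where the spacer pairs with target[start:start + spacer_len]."""
    protospacer = to_rna(target_rna)[start:start + spacer_len]
    if len(protospacer) != spacer_len:
        raise ValueError("target region is shorter than the requested spacer")
    return reverse_complement_rna(protospacer) + to_rna(direct_repeat)


# Hypothetical target and placeholder DR.
print(design_crrna("AUGGCAUCC", 0, 4, "GCUGAAAACAGC"))  # CCAUGCUGAAAACAGC
```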

The direct repeats of the Cas13e and Cas13f systems are generally well conserved, especially at the termini: the 5' end carries GCTG in Cas13e and GCTGT in Cas13f, which are reverse complementary to CAGC (Cas13e) and ACAGC (Cas13f), respectively, at the 3' end. This conservation implies strong base pairing in an RNA stem-loop structure, which may interact with proteins encoded in the locus.
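The terminal complementarity described above can be checked mechanically, as in this minimal sketch. The full DR-like strings used here are hypothetical placeholders built around the stated terminal motifs, not the actual SEQ ID NO sequences.

```python
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def to_rna(seq: str) -> str:
    """Read DNA-style notation as RNA (T -> U)."""
    return seq.upper().replace("T", "U")


def reverse_complement_rna(seq: str) -> str:
    return to_rna(seq).translate(_COMPLEMENT)[::-1]


def termini_pair(dr: str, stem_len: int) -> bool:
    """True if the first `stem_len` nt of a DR are reverse complementary to its last `stem_len` nt."""
    rna = to_rna(dr)
    return reverse_complement_rna(rna[:stem_len]) == rna[-stem_len:]


print(reverse_complement_rna("GCTG"))   # CAGC  (Cas13e 3' motif)
print(reverse_complement_rna("GCTGT"))  # ACAGC (Cas13f 3' motif)
```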

In some embodiments, when in RNA form, the direct repeat sequence has the general secondary structure 5'-S1a-Ba-S2a-L-S2b-Bb-S1b-3', wherein: fragments S1a and S1b are reverse complementary and form a first stem (S1) of 4 nucleotides in Cas13e and 5 nucleotides in Cas13f; fragments Ba and Bb lie opposite each other and form a symmetric or nearly symmetric bulge (B), with 5 nucleotides each in Cas13e, and 5 (Ba) and 4 (Bb) or 6 (Ba) and 5 (Bb) nucleotides in Cas13f; fragments S2a and S2b are reverse complementary and form a second stem (S2) of 5 base pairs in Cas13e and 6 or 5 base pairs in Cas13f; and L is a loop of 8 nucleotides in Cas13e and 5 nucleotides in Cas13f. See Fig. 2.
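Adding up the segment sizes gives the overall DR length, which can be compared with the DR length of about 36 nucleotides stated later in this section. The sketch counts each stem twice (once per strand). For Cas13f, the text does not say which bulge variant occurs with which S2 stem length, so all four combinations are enumerated.

```python
from itertools import product


def dr_length(s1_bp: int, ba: int, s2_bp: int, loop: int, bb: int) -> int:
    """Length of 5'-S1a-Ba-S2a-L-S2b-Bb-S1b-3'; each stem contributes two strands."""
    return 2 * s1_bp + ba + 2 * s2_bp + loop + bb


# Cas13e: S1 = 4, Ba = Bb = 5, S2 = 5, L = 8
CAS13E_LEN = dr_length(4, 5, 5, 8, 5)

# Cas13f: S1 = 5, L = 5; bulge (Ba, Bb) = (5, 4) or (6, 5); S2 = 6 or 5 bp
CAS13F_LENS = sorted({dr_length(5, ba, s2, 5, bb)
                      for (ba, bb), s2 in product([(5, 4), (6, 5)], [6, 5])})

print(CAS13E_LEN, CAS13F_LENS)  # 36 [34, 36, 38]
```

The Cas13e segments sum to exactly 36 nt. For Cas13f, the pairings (5/4 bulge with a 6-bp stem) and (6/5 bulge with a 5-bp stem) also give 36 nt, and the other two combinations stay within the 33-39 nt range given for the DR.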

In some embodiments, S1a has the sequence of GCUG in Cas13e and the sequence of GCUGU in Cas13f.

In some embodiments, S2a has the sequence GCCCC in Cas13e and the sequence (A/G)CCUC(G/A) in Cas13f (where the first A or G may be absent).

In some embodiments, the direct repeat sequence comprises or consists of any one of SEQ ID NOs: 8-14.

As used herein, "direct repeat" may refer to the DNA coding sequence in the CRISPR locus or to the corresponding RNA sequence in the crRNA. Thus, when any of SEQ ID NOs: 8-14 is referred to in the context of an RNA molecule (e.g., a crRNA), each T is understood to represent a U.

In some embodiments, the direct repeat sequence comprises or consists of a nucleic acid sequence having up to 1, 2, 3, 4, 5, 6, 7, or 8 nucleotide deletions, insertions, or substitutions relative to any one of SEQ ID NOs: 8-14. In some embodiments, the direct repeat sequence comprises or consists of a nucleic acid sequence having at least 80%, 85%, 90%, 95%, or 97% sequence identity to any one of SEQ ID NOs: 8-14 (e.g., due to deletion, insertion, or substitution of nucleotides in SEQ ID NOs: 8-14). In some embodiments, the direct repeat sequence comprises or consists of a nucleic acid sequence that differs from all of SEQ ID NOs: 8-14 but can hybridize, under stringent hybridization conditions, to any one of SEQ ID NOs: 8-14 or a complement thereof.
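One way to count "up to N deletions, insertions, or substitutions" is the Levenshtein edit distance, sketched below. This is an illustrative choice, not a metric the text prescribes. The sequences in the example are hypothetical placeholders; in practice the reference would be one of SEQ ID NOs: 8-14 (read with T as U).

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-nucleotide insertions, deletions, or substitutions turning a into b."""
    a, b = a.upper().replace("T", "U"), b.upper().replace("T", "U")
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute (or match)
        prev = cur
    return prev[-1]


def is_dr_variant(candidate: str, reference: str, max_edits: int = 8) -> bool:
    """True if `candidate` is within `max_edits` edits of the reference DR."""
    return edit_distance(candidate, reference) <= max_edits


print(edit_distance("GCUGAAA", "GCAGAA"))  # 2 (one substitution, one deletion)
```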

In some embodiments, the deletions, insertions, or substitutions do not substantially alter the secondary structure of SEQ ID NOs: 8-14 (e.g., the stems, bulge, and loop do not deviate significantly from the relative positions and/or sizes of the original stems, bulge, and loop). For example, deletions, insertions, or substitutions may occur in the bulge or loop regions such that the overall symmetry of the bulge remains largely the same. Deletions, insertions, or substitutions may occur in the stems such that the stem length does not deviate significantly from the original length (e.g., one base pair added or deleted in each of the two stems, for a total of 4 base changes).

In some embodiments, the deletion, insertion or substitution results in a derivative DR sequence that can have ± 1 or 2 base pairs in one or both of the stem structures (see fig. 2), ± 1, 2 or 3 bases in one or both of the single strands of the bulge structure, and/or ± 1, 2, 3 or 4 bases in the loop structure region.
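The size limits listed above can be expressed as a simple per-element tolerance check, sketched below with the Cas13e segment sizes as the reference. Encoding each structure as a dictionary of element sizes is an assumption made for illustration. Stems are counted in base pairs, and bulge strands and the loop in nucleotides.

```python
# Maximum allowed change per structural element, as listed in the text.
MAX_DELTA = {"S1": 2, "S2": 2, "Ba": 3, "Bb": 3, "L": 4}

# Reference Cas13e DR layout (stems in bp, bulge strands and loop in nt).
CAS13E_REF = {"S1": 4, "Ba": 5, "S2": 5, "L": 8, "Bb": 5}


def within_derivative_limits(reference: dict, derivative: dict) -> bool:
    """True if every structural element differs from the reference by no more than allowed."""
    return all(abs(derivative[k] - reference[k]) <= MAX_DELTA[k] for k in MAX_DELTA)


# One extra stem base pair, two extra loop bases, one fewer Bb base: allowed.
print(within_derivative_limits(CAS13E_REF, {"S1": 4, "Ba": 5, "S2": 6, "L": 10, "Bb": 4}))  # True
```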


In some embodiments, the direct repeat sequence is a variant of any one of SEQ ID NOs: 8-14 that retains the function of the DR sequences of SEQ ID NOs: 8-14 as a direct repeat for Cas13e or Cas13f proteins.

In some embodiments, the direct repeat sequence comprises or consists of the sequence of any one of SEQ ID NOs: 8-14 truncated by 3, 4, 5, 6, 7, or 8 nucleotides at the 3' end.

In some embodiments, the Cas protein comprises SEQ ID NO: 1, and the crRNA comprises a direct repeat sequence that comprises or consists of the nucleotide sequence of SEQ ID NO: 8.

In some embodiments, the Cas protein comprises SEQ ID NO: 2, and the crRNA comprises a direct repeat sequence that comprises or consists of the nucleotide sequence of SEQ ID NO: 9.

In some embodiments, the Cas protein comprises SEQ ID NO: 3, and the crRNA comprises a direct repeat sequence that comprises or consists of the nucleotide sequence of SEQ ID NO: 10.

In some embodiments, the Cas protein comprises SEQ ID NO: 4, and the crRNA comprises a direct repeat sequence that comprises or consists of the nucleotide sequence of SEQ ID NO: 11.

In some embodiments, the Cas protein comprises SEQ ID NO: 5, and the crRNA comprises a direct repeat sequence that comprises or consists of the nucleotide sequence of SEQ ID NO: 12.

In some embodiments, the Cas protein comprises SEQ ID NO: 6, and the crRNA comprises a direct repeat sequence that comprises or consists of the nucleotide sequence of SEQ ID NO: 13.

In some embodiments, the Cas protein comprises SEQ ID NO: 7, and the crRNA comprises a direct repeat sequence that comprises or consists of the nucleotide sequence of SEQ ID NO: 14.

In classical CRISPR systems, the degree of complementarity between a guide sequence (e.g., crRNA) and its corresponding target sequence can be about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or 100%. In some embodiments, this degree of complementarity is 90-100%.
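The degree of complementarity can be computed position by position, as in the minimal sketch below. It assumes the spacer and target region have equal length and are aligned antiparallel without gaps. Only Watson-Crick pairs are counted; treating G-U wobble pairs as complementary would be a different, equally possible convention.

```python
_WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def percent_complementarity(spacer: str, target: str) -> float:
    """Percent of positions forming Watson-Crick pairs when the spacer (5'->3')
    is aligned antiparallel to an equal-length target region (5'->3')."""
    s = spacer.upper().replace("T", "U")
    t = target.upper().replace("T", "U")[::-1]  # read target 3'->5' to align antiparallel
    if not s or len(s) != len(t):
        raise ValueError("spacer and target region must be non-empty and of equal length")
    pairs = sum((a, b) in _WATSON_CRICK for a, b in zip(s, t))
    return 100.0 * pairs / len(s)


print(percent_complementarity("CCAA", "AUGG"))  # 75.0 (last spacer base A faces target A)
```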

The guide RNA can be about 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 75, 100, 125, 150, 175, 200 or more nucleotides in length. For example, for use with a functional Cas13e or Cas13f effector protein or a homolog, ortholog, derivative, fusion, conjugate, or functional fragment thereof, the spacer can be 10-60, 20-50, 25-45, or 25-35 nucleotides in length, or about 27, 28, 29, 30, 31, 32, or 33 nucleotides in length. However, for use with a catalytically dead (dCas) version of any of the above, the spacer may be 10-200, 20-150, 25-100, 25-85, 35-75, or 45-60 nucleotides in length, or about 46, 47, 48, 49, 50, 51, 52, 53, 54, or 55 nucleotides in length.

To reduce off-target interactions, e.g., interactions of a guide with target sequences of low complementarity, mutations can be introduced into the CRISPR system that enable it to distinguish between target and off-target sequences having more than 80%, 85%, 90%, or 95% complementarity. In some embodiments, the degree of complementarity is 80%-95%, such as about 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, or 95% (e.g., an 18-nucleotide target can be distinguished from an 18-nucleotide off-target sequence having 1, 2, or 3 mismatches). Accordingly, in some embodiments, a guide sequence is more than 94.5%, 95%, 95.5%, 96%, 96.5%, 97%, 97.5%, 98%, 98.5%, 99%, 99.5%, or 99.9% complementary to its corresponding target sequence. In some embodiments, the degree of complementarity is 100%.
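For the 18-nucleotide example above, the complementarity that remains after k mismatches works out as follows. This is simple arithmetic, treating each mismatched position as non-complementary:

$$C(k) = \frac{18 - k}{18} \times 100\%,\qquad C(1) \approx 94.4\%,\quad C(2) \approx 88.9\%,\quad C(3) \approx 83.3\%$$

All three values fall within the 80%-95% window. Even a single mismatch over 18 nucleotides gives slightly less than 94.5% complementarity.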

It is known in the art that complete complementarity is not required, provided there is sufficient complementarity for the system to function. Cleavage efficiency can be modulated by introducing mismatches, e.g., one or more mismatches, such as 1 or 2 mismatches, between the spacer sequence and the target sequence, and by choosing the position of each mismatch along the spacer/target. A mismatch (e.g., a double mismatch) has a greater effect on cleavage efficiency the closer it is to the center of the spacer (i.e., away from the 3' and 5' ends). Thus, mismatches can be introduced at selected positions along the spacer sequence to modulate cleavage efficiency. For example, if less than 100% target cleavage is desired (e.g., in a population of cells), 1 or 2 mismatches between the spacer and the target sequence can be introduced into the spacer sequence.
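Introducing mismatches at chosen spacer positions can be sketched as below. The sketch assumes the starting spacer is fully complementary to its target. Each chosen base is swapped for one that cannot form a Watson-Crick or G-U pair with the original target base (A to C, C to A, G to U, U to G), and "central" is taken to mean the positions straddling the midpoint. Both are illustrative choices, not rules from the text.

```python
from typing import Iterable, List

# Swap that breaks pairing with the original (fully complementary) target base:
# A (paired U) -> C; C (paired G) -> A; G (paired C) -> U; U (paired A) -> G.
_MISMATCH = {"A": "C", "C": "A", "G": "U", "U": "G"}


def introduce_mismatches(spacer: str, positions: Iterable[int]) -> str:
    """Return the spacer with the bases at the given 0-based positions replaced by non-pairing ones."""
    bases = list(spacer.upper().replace("T", "U"))
    for p in positions:
        bases[p] = _MISMATCH[bases[p]]
    return "".join(bases)


def central_positions(length: int, count: int = 2) -> List[int]:
    """Pick `count` adjacent positions around the middle of a spacer of the given length."""
    start = (length - count) // 2
    return list(range(start, start + count))


print(central_positions(30))  # [14, 15] for a 30-nt spacer and a double mismatch
```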

Type VI CRISPR-Cas effector proteins have been shown to use multiple RNA guides, enabling these effector proteins, and the systems and complexes comprising them, to target multiple nucleic acids. In some embodiments, the CRISPR systems described herein comprise a plurality (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 30, 40, or more) of RNA guides. In some embodiments, the CRISPR systems described herein comprise a single RNA strand, or a nucleic acid encoding a single RNA strand, in which multiple RNA guides are arranged in tandem. The single RNA strand may include multiple copies of the same RNA guide, copies of different RNA guides, or both. The processing ability of the type VI-E and VI-F CRISPR-Cas effector proteins described herein enables them to target multiple target nucleic acids (e.g., target RNAs) without loss of activity. In some embodiments, such type VI-E and VI-F effector proteins can be delivered in complex with multiple RNA guides directed to different target RNAs. In some embodiments, the type VI-E and VI-F effector proteins can be co-delivered with multiple RNA guides, each specific for a different target nucleic acid. Methods of multiplexing CRISPR-associated proteins have been described, for example, in U.S. Patent No. 9,790,490 B2 and EP 3009511 B1, which are incorporated herein by reference in their entirety.
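A tandem guide array on a single RNA strand can be illustrated at the string level as below. The sketch assumes each unit repeats the single-crRNA layout described in this section (spacer followed by a 3' DR). The spacers and the DR `"GCUGCC"` are hypothetical placeholders. The splitting helper only reverses the string assembly and is not a model of how the effector processes the array.

```python
def _rna(seq: str) -> str:
    return seq.upper().replace("T", "U")


def build_crrna_array(spacers, direct_repeat):
    """Concatenate spacer-DR units 5'->3' into a single RNA strand."""
    if not spacers:
        raise ValueError("at least one spacer is required")
    dr = _rna(direct_repeat)
    return "".join(_rna(s) + dr for s in spacers)


def split_crrna_array(array, direct_repeat):
    """Recover the spacers (assumes the DR never occurs inside a spacer)."""
    return [part for part in _rna(array).split(_rna(direct_repeat)) if part]


array = build_crrna_array(["AAAC", "GGGU"], "GCTGCC")
print(array)  # AAACGCUGCCGGGUGCUGCC
```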

The spacer length of the crRNA may range between about 10-60 nucleotides, such as 15-50 nucleotides, 20-50 nucleotides, 25-50 nucleotides, or 19-50 nucleotides. In some embodiments, the spacer of the guide RNA is at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 21 nucleotides, or at least 22 nucleotides in length. In some embodiments, the spacer is 15-17 nucleotides (e.g., 15, 16, or 17 nucleotides), 17-20 nucleotides (e.g., 17, 18, 19, or 20 nucleotides), 20-24 nucleotides (e.g., 20, 21, 22, 23, or 24 nucleotides), 23-25 nucleotides (e.g., 23, 24, or 25 nucleotides), 24-27 nucleotides, 27-30 nucleotides, 30-45 nucleotides (e.g., 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, or 45 nucleotides), 30 or 35-40 nucleotides, 41-45 nucleotides, 45-50 nucleotides (e.g., 45, 46, 47, 48, 49, or 50 nucleotides), or longer in length. In some embodiments, the spacer is about 15 to about 42 nucleotides in length.

In some embodiments, the guide RNA has a direct repeat length of 15-36 nucleotides, at least 16 nucleotides, 16-20 nucleotides (e.g., 16, 17, 18, 19, or 20 nucleotides), 20-30 nucleotides (e.g., 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 nucleotides), 30-40 nucleotides (e.g., 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 nucleotides), or about 36 nucleotides (e.g., 33, 34, 35, 36, 37, 38, or 39 nucleotides). In some embodiments, the direct repeat length of the guide RNA is 36 nucleotides.

In some embodiments, the total length of the crRNA/guide RNA is about 36 nucleotides longer than any one of the spacer lengths described above. For example, the total length of the crRNA/guide RNA may be 45-86, 60-86, 62-86, or 63-86 nucleotides.
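The total crRNA length follows directly from spacer length plus DR length, as this short sketch shows. It assumes a fixed 36-nt DR, the DR length given above.

```python
DR_LENGTH = 36  # nt, the direct repeat length given in the text


def crrna_length(spacer_len: int, dr_len: int = DR_LENGTH) -> int:
    """Total crRNA length = spacer + direct repeat."""
    return spacer_len + dr_len


def crrna_length_range(spacer_min: int, spacer_max: int, dr_len: int = DR_LENGTH):
    """(shortest, longest) crRNA for a range of spacer lengths."""
    return crrna_length(spacer_min, dr_len), crrna_length(spacer_max, dr_len)


print(crrna_length_range(10, 50))  # (46, 86)
```

A 10-50 nt spacer therefore gives roughly the 45-86 nt range quoted. Lower spacer bounds of 24, 26, and 27 nt give the 60-, 62-, and 63-nt lower bounds.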

The crRNA sequence may be modified so that a complex of the crRNA and the CRISPR-associated protein still forms and binds successfully to the target, but without nuclease activity (i.e., no nuclease activity/no indels). Such modified guide sequences are referred to as "dead crRNAs," "dead guides," or "dead guide sequences." These dead guides may be catalytically or conformationally inactive with respect to nuclease activity. Dead guides are generally shorter than the corresponding guides that support active RNA cleavage. In some embodiments, a dead guide is 5%, 10%, 20%, 30%, 40%, or 50% shorter than the corresponding guide RNA having nuclease activity. The dead guide sequence of the guide RNA may be 13-15 nucleotides in length (e.g., 13, 14, or 15 nucleotides), 15-19 nucleotides in length, or 17-18 nucleotides in length (e.g., 17 nucleotides).

Accordingly, in one aspect of the present disclosure there is provided non-naturally occurring or engineered CRISPR systems comprising a functional CRISPR-associated protein as described herein and a crRNA comprising a dead crRNA sequence enabling the crRNA to hybridize to a target sequence such that said CRISPR system can be directed to a genomic site of interest in a cell without detectable nuclease activity (e.g., RNase activity).

Dead guides are described in detail, for example, in International Publication No. WO 2016/094872, which is incorporated herein by reference in its entirety.

Guide RNAs (e.g., crRNAs) may be generated as components of an inducible system. The inducibility of the system allows spatiotemporal control of gene editing or gene expression. In some embodiments, the inducible system is stimulated by electromagnetic radiation, acoustic energy, chemical energy, and/or thermal energy, among others.

In some embodiments, transcription of a guide RNA (e.g., crRNA) can be controlled by inducible promoters, such as tetracycline- or doxycycline-controlled transcriptional activation (the Tet-On and Tet-Off expression systems), hormone-inducible gene expression systems (e.g., ecdysone), or arabinose-inducible gene expression systems. Other examples of inducible systems include small-molecule two-hybrid transcriptional activation systems (FKBP, ABA, etc.), light-inducible systems (phytochrome, LOV domains, or cryptochrome), and light-inducible transcriptional effectors (LITE). Such inducible systems are described, for example, in WO2016205764 and U.S. Pat. No. 8,795,965, which are incorporated herein by reference in their entirety.

Chemical modifications may be applied to the phosphate backbone, sugars, and/or bases of the crRNA. Backbone modifications (e.g., phosphorothioates) modify the charge on the phosphate backbone and aid oligonucleotide delivery and nuclease resistance (see, e.g., Eckstein, "Phosphorothioates, essential components of therapeutic oligonucleotides," Nucleic Acid Ther. 24, pp. 374-387, 2014). Sugar modifications, such as 2'-O-methyl (2'-OMe), 2'-F, and locked nucleic acid (LNA) modifications, can enhance base pairing and nuclease resistance (see, e.g., Allerson et al., "Fully 2'-modified oligonucleotide duplexes with improved in vitro potency and stability compared to unmodified small interfering RNA," J. Med. Chem. 48(4):901-904, 2005). Chemically modified bases, such as 2-thiouridine or N6-methyladenosine, can make base pairing either stronger or weaker (see, e.g., Bramsen et al., "Development of therapeutic-grade small interfering RNAs by chemical engineering," Front.). In addition, the RNA can be conjugated at its 5' and 3' ends to various functional moieties, including fluorescent dyes, polyethylene glycol, or proteins.

Various modifications can be applied to chemically synthesized crRNA molecules. For example, modifying an oligonucleotide with 2'-OMe can increase its nuclease resistance and can also alter the binding energy of Watson-Crick base pairing. In addition, 2'-OMe modifications may affect the interaction of the oligonucleotide with transfection reagents, proteins, or any other molecule in the cell. The effects of these modifications can be determined by empirical testing.

In some embodiments, the crRNA comprises one or more phosphorothioate modifications. In some embodiments, to enhance base pairing and/or increase nuclease resistance, the crRNA comprises one or more locked nucleic acids.

A summary of these chemical modifications can be found, for example, in Kelley et al., "Versatility of chemically synthesized guide RNAs for CRISPR-Cas9 genome editing," J. Biotechnol. 233:74-83, 2016; WO2016205764; and U.S. Pat. No. 8,795,965 B2, each of which is incorporated herein in its entirety.

The sequence and length of the RNA guides (e.g., crRNAs) described herein can be optimized. In some embodiments, the optimal length of the RNA guide can be determined by identifying the processed crRNA (i.e., the mature crRNA) or through empirical length studies of the crRNA tetraloop.

These crRNAs may also contain one or more aptamer sequences. An aptamer is an oligonucleotide or peptide molecule that has a specific three-dimensional structure and can bind a specific target molecule. The aptamer may be specific for a gene effector, a gene activator, or a gene repressor. In some embodiments, the aptamer may be specific for a protein that is in turn specific for, and recruits and/or binds, a particular gene effector, gene activator, or gene repressor. The gene effector, gene activator, or gene repressor may be present as a fusion protein. In some embodiments, the guide RNA has two or more aptamer sequences specific for the same aptamer protein. In some embodiments, the two or more aptamer sequences are specific for different aptamer proteins. Aptamer proteins may include, for example, MS2, PP7, Qβ, F2, GA, fr, JP501, M12, R17, BZ13, JP34, JP500, KU1, M11, MX1, TW18, VK, SP, FI, ID2, NL95, TW19, AP205, φCb5, φCb8R, φCb12R, φCb23R, 7s, and PRR1. Accordingly, in some embodiments, the aptamer is selected from aptamers that specifically bind any of the aptamer proteins described herein. In some embodiments, the aptamer sequence is an MS2 binding loop (5'-ggcccAACAUGAGGAUCACCCAUGUCUGCAGgggcc-3'). In some embodiments, the aptamer sequence is a Qβ binding loop (5'-ggcccAUGCUGUCUAAGACAGCAUgggcc-3'). In some embodiments, the aptamer sequence is a PP7 binding loop (5'-ggcccUAAGGGUUUAUAUGGAAACCCUUAgggcc-3'). Aptamers are described in detail, for example, in "Guide RNA engineering for versatile Cas9 functionality" (Nowak et al., Nucl. Acids Res., 44(20): 9555-.
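The three binding loops quoted above share the same outer flanks (5'-ggccc...gggcc-3'), which are reverse complements of each other and can close the aptamer into a stem. The minimal sketch below checks this Watson-Crick closure of the terminal 5 nt. It does not attempt to predict the full aptamer fold. The dictionary keys simply name the loops; the first loop contains the well-known MS2 hairpin core (ACAUGAGGAUCACCCAUGU).

```python
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(seq: str) -> str:
    return seq.upper().replace("T", "U").translate(_COMPLEMENT)[::-1]


def has_closing_stem(aptamer: str, stem_len: int = 5) -> bool:
    """True if the first and last `stem_len` nt can pair into a Watson-Crick stem."""
    s = aptamer.upper().replace("T", "U")
    return reverse_complement_rna(s[:stem_len]) == s[-stem_len:]


APTAMER_LOOPS = {
    "MS2": "ggcccAACAUGAGGAUCACCCAUGUCUGCAGgggcc",
    "QBeta": "ggcccAUGCUGUCUAAGACAGCAUgggcc",
    "PP7": "ggcccUAAGGGUUUAUAUGGAAACCCUUAgggcc",
}

print({name: has_closing_stem(seq) for name, seq in APTAMER_LOOPS.items()})
# {'MS2': True, 'QBeta': True, 'PP7': True}
```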

In some embodiments, the methods use chemically modified guide RNAs. Examples of chemical modifications of guide RNAs include, but are not limited to, incorporation of 2'-O-methyl (M), 2'-O-methyl 3'-phosphorothioate (MS), or 2'-O-methyl 3'-thioPACE (MSP) at one or more terminal nucleotides. Chemically modified guide RNAs can have greater stability and activity than unmodified guide RNAs, although their on-target and off-target specificity cannot be predicted. See Hendel et al., Nat Biotechnol. 33(9): 985-. Chemically modified guide RNAs may also include, but are not limited to, RNAs containing phosphorothioate linkages and locked nucleic acid (LNA) nucleotides having a methylene bridge between the 2' and 4' carbons.

The invention also includes methods of modifying multiple target loci of interest by delivering multiple nucleic acid components, each specific for a different target locus of interest. The nucleic acid component of the complex may comprise one or more protein-binding RNA aptamers. One or more of these aptamers may be capable of binding a bacteriophage coat protein. The bacteriophage coat protein may be selected from Qβ, F2, GA, fr, JP501, MS2, M12, R17, BZ13, JP34, JP500, KU1, M11, MX1, TW18, VK, SP, FI, ID2, NL95, TW19, AP205, φCb5, φCb8R, φCb12R, φCb23R, 7s, and PRR1. In some embodiments, the bacteriophage coat protein is MS2.

3. Target RNA

The target RNA can be any RNA molecule of interest, including naturally occurring and engineered RNA molecules. The target RNA can be an mRNA, tRNA, ribosomal RNA (rRNA), microRNA (miRNA), small interfering RNA (siRNA), ribozyme, riboswitch, satellite RNA, microswitch, micro-enzyme, or viral RNA.

In some embodiments, the target nucleic acid is associated with a disorder or disease (e.g., infectious disease or cancer).

Thus, in some embodiments, the systems described herein can be used to target nucleic acids associated with these disorders or diseases in order to treat them. For example, a target nucleic acid associated with a disorder or disease can be an RNA molecule that is overexpressed in a diseased cell (e.g., a cancer cell or tumor cell). The target nucleic acid may also be a toxic RNA and/or a mutated RNA (e.g., an mRNA molecule having a splice defect or mutation). The target nucleic acid can also be an RNA specific for a particular microorganism (e.g., a pathogenic bacterium).

4. Complex and cell

One aspect of the invention provides a CRISPR/Cas13e or CRISPR/Cas13f complex comprising (1) any Cas13e/Cas13f effector protein, homolog, ortholog, fusion, derivative, conjugate, or functional fragment thereof described herein, and (2) any guide RNA described herein, each RNA comprising a spacer sequence designed to be at least partially complementary to a target RNA, and a DR sequence compatible with the Cas13e/Cas13f effector protein, homolog, ortholog, fusion, derivative, conjugate, or functional fragment thereof.

In some embodiments, the complex further comprises a target RNA that binds to the guide RNA.

In some embodiments, the complex is non-naturally occurring. For example, at least one component of the complex is non-naturally occurring. In some embodiments, the Cas13e/Cas13f effector protein, homolog, ortholog, fusion, derivative, conjugate, or functional fragment thereof is non-naturally occurring, e.g., it contains at least one amino acid mutation (deletion, insertion, and/or substitution). In some embodiments, the DR sequence is non-naturally occurring, i.e., it is not any of SEQ ID NOs: 8-14, e.g., it has at least one nucleotide addition, deletion, and/or substitution compared with the wild-type sequence. In some embodiments, the spacer sequence is non-naturally occurring in that it is not present in, or encoded by, any spacer sequence of a prokaryotic wild-type CRISPR locus in which the subject Cas13e or Cas13f is found. A spacer sequence that is not 100% complementary to a naturally occurring bacteriophage nucleic acid may be a non-naturally occurring sequence.
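The proviso that a spacer is not 100% complementary to a naturally occurring phage nucleic acid lends itself to a simple computational screen. The sketch below uses hypothetical helper names and assumes that "100% complementary" means an exact, gap-free Watson-Crick match over the whole spacer. Because many phage genomes are double-stranded DNA, it checks both strands by default (an assumption), and it treats T and U as equivalent.

```python
from __future__ import annotations

from typing import Iterable

_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def to_rna(seq: str) -> str:
    """Upper-case a sequence and write T as U so DNA and RNA compare alike."""
    return seq.upper().replace("T", "U")


def reverse_complement(seq: str) -> str:
    """Watson-Crick reverse complement, returned as RNA."""
    return "".join(_PAIR[b] for b in reversed(to_rna(seq)))


def fully_complementary(spacer: str, phage_seq: str, both_strands: bool = True) -> bool:
    """True if the whole spacer pairs perfectly with some stretch of phage_seq."""
    spacer_rna, phage_rna = to_rna(spacer), to_rna(phage_seq)
    # A spacer hybridizes antiparallel, so its target site reads as its reverse complement.
    if reverse_complement(spacer_rna) in phage_rna:
        return True
    # On a double-stranded genome, a site on the opposite strand shows up
    # as the spacer sequence itself on the strand we were given.
    return both_strands and spacer_rna in phage_rna


def satisfies_proviso(spacer: str, phage_genomes: Iterable[str]) -> bool:
    """True if the spacer is not fully complementary to any listed phage sequence."""
    return not any(fully_complementary(spacer, g) for g in phage_genomes)
```

Real phage collections are large, so an index-based search would replace the substring test in practice; the logic of the check stays the same.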

The invention also provides in a related aspect a cell comprising any of the complexes of the invention.

In some embodiments, the cell is a prokaryote.

In some embodiments, the cell is a eukaryote. When the cell is eukaryotic, the complex in the eukaryotic cell can be a Cas13e/Cas13f complex that occurs naturally in the prokaryote from which the Cas13e/Cas13f can be isolated.

5. Methods of using CRISPR systems

The CRISPR systems described herein have a variety of uses, including modifying (e.g., deleting, inserting, translocating, inactivating, or activating) a target polynucleotide or nucleic acid in a variety of cell types. The CRISPR systems are broadly applicable to: DNA/RNA detection (e.g., Specific High-sensitivity Enzymatic Reporter unLOCKing, i.e., SHERLOCK), nucleic acid tracking and labeling, enrichment assays (extracting desired sequences from background), control of interfering RNA or miRNA, detection of circulating tumor DNA, preparation of next-generation sequencing libraries, drug screening, disease diagnosis and prognosis, treatment of various genetic diseases, and the like.

DNA/RNA detection

In one aspect, the CRISPR systems described herein can be used for DNA or RNA detection. As shown in the examples, when the spacer sequence is about 30 nucleotides long, the Cas13e and Cas13f proteins of the present invention, once activated through their guide RNA-dependent, sequence-specific RNase activity, exhibit non-specific (collateral) RNase activity. Thus, the CRISPR-associated proteins of the invention can be reprogrammed with CRISPR RNA (crRNA) to provide a platform for specific RNA sensing. With a suitably chosen spacer length, a CRISPR-associated protein is activated upon recognizing its RNA target and then carries out "collateral" cleavage of nearby non-target RNAs. This crRNA-programmed collateral cleavage allows the CRISPR system to trigger programmed cell death or non-specific degradation of a labeled RNA, thereby detecting the presence of a specific RNA.

The SHERLOCK (Specific High-sensitivity Enzymatic Reporter unLOCKing) method provides an in vitro nucleic acid detection platform with attomolar sensitivity that allows real-time detection of targets through nucleic acid amplification followed by collateral cleavage of a reporter RNA. Various isothermal amplification steps can be combined with signal detection. For example, recombinase polymerase amplification (RPA) can be coupled with T7 transcription, so that the amplified DNA is converted into RNA before detection. The combination of RPA amplification, transcription of the amplified DNA into RNA by T7 RNA polymerase, and detection of the target RNA through reporter signal released by collateral RNA cleavage is referred to as SHERLOCK. The use of CRISPR in SHERLOCK is described in detail in, for example, Gootenberg et al., Nucleic acid detection with CRISPR-Cas13a/C2c2 (Science, 2017 Apr. 28; 356(6336):438-442), which is incorporated herein by reference in its entirety.

The CRISPR-associated proteins described herein can be used in Northern blot assays, in which RNA samples are separated by size by electrophoresis; the CRISPR-associated protein can specifically bind to and detect the target RNA sequence. The CRISPR-associated proteins can also be fused to fluorescent proteins (e.g., GFP) and used to track RNA localization in living cells. In particular, the CRISPR-associated proteins can be inactivated as described above so that they bind but do not cleave RNA. Thus, the CRISPR-associated proteins described herein can be used to determine the localization of an RNA or of particular splice variants, to determine mRNA transcript levels, to up-regulate or down-regulate transcript levels, and to provide disease-specific diagnostics. The CRISPR-associated proteins can be used to visualize RNA in (living) cells, e.g., by fluorescence microscopy or flow cytometry (e.g., fluorescence-activated cell sorting, i.e., FACS), enabling high-throughput screening of cells and recovery of living cells after sorting. A detailed description of how to detect DNA and RNA can be found, for example, in International Publication No. WO 2017/070605, which is incorporated herein by reference in its entirety.

In some embodiments, the CRISPR systems described herein can be used for multiplexed error-robust fluorescence in situ hybridization (MERFISH). The method is described, for example, in Chen et al., Spatially resolved, highly multiplexed RNA profiling in single cells (Science, 2015 Apr. 24; 348(6233): aaa6090), which is incorporated herein by reference in its entirety.

The ability to detect and quantify RNA specifically in a sample has a wide variety of applications, including diagnostic applications. In some embodiments, the method comprises a) contacting the sample with: (i) an RNA guide (e.g., crRNA) and/or a nucleic acid encoding the RNA guide, wherein the RNA guide consists of a direct repeat and a spacer sequence capable of hybridizing to a target RNA; (ii) a type VI-E or VI-F CRISPR-Cas effector protein (Cas13e or Cas13f) and/or a nucleic acid encoding the effector protein; and (iii) a labeled detector RNA; wherein the effector protein binds to the RNA guide to form a complex; wherein the RNA guide is capable of hybridizing to the target RNA; and wherein, upon binding of the complex to the target RNA, the effector protein exhibits collateral RNase activity and cleaves the labeled detector RNA; and b) measuring a detectable signal generated by cleavage of the labeled detector RNA, wherein the measurement detects single-stranded target RNA in the sample. In some embodiments, the CRISPR systems described herein can be used to detect a target RNA in a sample (e.g., a clinical sample, a cell, or a cell lysate). When the spacer sequence has a specifically selected length (e.g., about 30 nucleotides), the collateral RNase activity of the type VI-E and/or VI-F CRISPR-Cas effector proteins described herein is activated upon binding of the effector protein to a target nucleic acid. Binding of the effector protein to the target RNA of interest thus generates a signal (e.g., an increased or decreased signal) through cleavage of the labeled detector RNA, allowing qualitative and quantitative detection of the target RNA in the sample. In some embodiments, the method further comprises comparing the detectable signal with a reference signal to determine the amount of target RNA in the sample. In some embodiments, the detectable signal is measured by gold nanoparticle detection, fluorescence polarization, colloidal phase transition/dispersion, electrochemical detection, or semiconductor-based sensing.
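Comparing the detectable signal with reference signals is commonly done with a calibration (standard) curve. The sketch below assumes, purely for illustration, that the background-subtracted signal is linear in target amount over the calibrated range: it fits a least-squares line to reference samples of known amount and inverts it. A negative slope covers the case where cleavage decreases the signal. The function names and the numbers in the usage example are hypothetical.

```python
from __future__ import annotations

from typing import Sequence


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_amount(signal: float, ref_amounts: Sequence[float],
                    ref_signals: Sequence[float]) -> float:
    """Invert the calibration line; clamp at zero because amounts cannot be negative."""
    slope, intercept = fit_line(ref_amounts, ref_signals)
    if slope == 0:
        raise ValueError("reference signals do not depend on amount")
    return max(0.0, (signal - intercept) / slope)


# Hypothetical standards: amounts in arbitrary units, signals in fluorescence units.
amounts = [0.0, 1.0, 2.0, 4.0]
signals = [10.0, 30.0, 50.0, 90.0]
print(estimate_amount(70.0, amounts, signals))  # 3.0
```

Outside the range of the standards the linear assumption may not hold, so estimates are best restricted to the calibrated range.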
In some embodiments, the labeled detector RNA comprises a fluorescent emission dye pair, a fluorescence resonance energy transfer (FRET) pair, or a quencher/fluorophore pair. In some embodiments, the amount of detectable signal produced by the labeled detector RNA decreases or increases when the effector protein cleaves the labeled detector RNA. In some embodiments, the labeled detector RNA produces a first detectable signal before cleavage by the effector protein and a second detectable signal after cleavage by the effector protein. In some embodiments, a detectable signal is produced when the labeled detector RNA is cleaved by the effector protein. In some embodiments, the labeled detector RNA comprises a modified nucleobase, a modified sugar moiety, a modified nucleic acid linkage, or a combination thereof. In some embodiments, the methods comprise multiplexed detection of multiple (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 30, 40, or more) independent target RNAs in a sample using multiple type VI-E and/or VI-F CRISPR-Cas (Cas13e and/or Cas13f) systems, each comprising a different orthologous effector protein and a corresponding RNA guide, so that the multiple target RNAs in the sample can be distinguished. In some embodiments, the methods comprise performing a multiplexed assay on a plurality of independent target RNAs in a sample using a plurality of instances of type VI-E and/or VI-F CRISPR-Cas systems, each instance comprising an orthologous effector protein and a distinguishable collateral RNase substrate. Methods for detecting RNA in a sample using CRISPR-associated proteins are described, for example, in U.S. Patent Publication No. 2017/0362644, which is incorporated herein by reference in its entirety.
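In the multiplexed format, each orthologous effector is paired with its own distinguishable collateral substrate, so each readout channel reports on one target. A minimal decoding sketch follows. The channel names, target names, the assumption that cleavage increases the signal, and the mean-plus-three-standard-deviations threshold over no-target controls are all illustrative assumptions rather than part of the described method.

```python
from __future__ import annotations

from statistics import mean, stdev


def channel_threshold(blank_signals: list[float], k: float = 3.0) -> float:
    """Threshold = mean + k * SD of no-target controls (needs at least two values)."""
    return mean(blank_signals) + k * stdev(blank_signals)


def call_targets(readings: dict[str, float],
                 panel: dict[str, str],
                 blanks: dict[str, list[float]]) -> dict[str, bool]:
    """Map each channel's reading to a present/absent call for the target it encodes.

    panel:  channel -> target RNA name (one orthologue/substrate pair per channel)
    blanks: channel -> signals from no-target control reactions
    """
    return {target: readings[channel] > channel_threshold(blanks[channel])
            for channel, target in panel.items()}


# Hypothetical two-plex panel read on two fluorescence channels.
panel = {"FAM": "target_A", "HEX": "target_B"}
blanks = {"FAM": [100.0, 104.0, 96.0], "HEX": [50.0, 52.0, 48.0]}
readings = {"FAM": 400.0, "HEX": 51.0}
print(call_targets(readings, panel, blanks))
```

For a reporter whose signal drops on cleavage, the comparison would be reversed (reading below mean minus k times SD).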

Nucleic acid tracking and labeling

Cellular processes depend on a network of molecular interactions among proteins, RNA, and DNA, and understanding these processes requires accurate detection of protein-DNA and protein-RNA interactions. In vitro proximity labeling techniques use an affinity tag combined with a reporter group (e.g., a photoactivatable group) to label polypeptides and RNAs in the vicinity of a protein or RNA of interest in vitro. Upon UV irradiation, the photoactivatable groups react with, and thereby label, proteins and other molecules in the vicinity of the tagged molecule. The labeled interacting molecules can then be recovered and identified. The CRISPR-associated proteins can be used, for example, to target probes to selected RNA sequences. These applications can also be extended to animal models for in vivo imaging of diseases or of difficult-to-culture cell types. Methods for tracking and labeling nucleic acids are described, for example, in U.S. Pat. No. 8,795,965, WO2016205764, and WO 2017070605, which are incorporated herein by reference in their entirety.

Isolation, purification, enrichment and/or depletion of RNA

The CRISPR systems (e.g., CRISPR-associated proteins) described herein can be used to isolate and/or purify RNA. The CRISPR-associated protein can be fused to an affinity tag that can be used to isolate and/or purify RNA-CRISPR-associated protein complexes. These can be used, for example, for gene expression profiling in cells.

In some embodiments, CRISPR-associated proteins can be used to target specific non-coding RNAs (ncRNAs) to block their activity. In some embodiments, CRISPR-associated proteins can be used to specifically enrich a particular RNA (e.g., by increasing its stability) or to specifically deplete a particular RNA (e.g., particular splice variants or isoforms).

Such methods are described in U.S. Patent No. 8,795,965, WO2016205764, and WO 2017070605, which are incorporated herein by reference in their entirety.

High throughput screening

The CRISPR system described herein can be used to prepare next-generation sequencing (NGS) libraries. For example, to create a cost-effective NGS library, the CRISPR system can be used to disrupt the coding sequence of a target gene, and clones transfected with the CRISPR-associated protein can then be screened simultaneously by next-generation sequencing (e.g., on the Ion Torrent PGM system). For a detailed description of how to prepare NGS libraries, see, e.g., Bell et al., A high-throughput screening strategy for detecting CRISPR-Cas9 induced mutations using next-generation sequencing (BMC Genomics, 15.1 (2014): 1002), which is incorporated herein by reference in its entirety.

Engineering microorganisms

Microorganisms (e.g., E. coli, yeast, and microalgae) are widely used in synthetic biology, a field with a wide range of uses, including various clinical applications. For example, the CRISPR system can be programmed to split a protein with a toxic domain to achieve targeted cell death, e.g., using an RNA associated with cancer as the target transcript. In addition, pathways involving protein-protein interactions can be influenced in synthetic biological systems by suitable effectors (e.g., kinase or enzyme fusion complexes).

In some embodiments, crRNAs targeting phage sequences can be introduced into the microorganism. Accordingly, the present disclosure also provides methods of immunizing microorganisms (e.g., production strains) against phage infection.

In some embodiments, the CRISPR systems provided herein can be used to engineer microorganisms, e.g., to increase yield or fermentation efficiency. For example, the CRISPR systems described herein can be used to engineer yeast that converts fermentable sugars into biofuels or combustible biopolymers, or yeast that degrades plant lignocellulose from agricultural waste into a source of fermentable sugars. In particular, the methods described herein can be used to modify the expression of endogenous genes required for biofuel production and/or to modify endogenous genes that interfere with biofuel synthesis. These methods of engineering microorganisms are described, for example, in Verwaal et al., CRISPR/Cpf1 enables fast and simple genome editing of Saccharomyces cerevisiae (Yeast, doi:10.1002/yea.3278, 2017) and Hlavova et al., Improving microalgae for biotechnology: from genetics to synthetic biology (Biotechnol. Adv., 33:1194-203, 2015), both of which are incorporated herein by reference in their entirety.

In some embodiments, the CRISPR systems provided herein can be used to induce death or dormancy of a cell (e.g., a microorganism or engineered microorganism). These methods can be used to induce dormancy or death of a variety of cell types, including prokaryotic and eukaryotic cells, including but not limited to mammalian cells (e.g., cancer cells or tissue culture cells), protozoa, fungal cells, viruses, cells infected with intracellular bacteria, cells infected with intracellular protozoa, cells infected with prions, bacteria (e.g., pathogenic and non-pathogenic bacteria), and single-cell and multicellular parasites. In synthetic biology, for example, there is a great need for mechanisms to control engineered microorganisms (e.g., bacteria) and prevent their propagation or spread. The systems described herein can be used as "kill switches" to regulate and/or prevent the propagation or spread of engineered microorganisms. Furthermore, there is a need in the art for alternatives to antibiotic treatment. The systems described herein may also be used to kill or control a particular microbial population (e.g., a bacterial population). For example, the systems described herein can include an RNA guide (e.g., crRNA) that targets a genus-, species-, or strain-specific nucleic acid (e.g., RNA) and that can be delivered to a cell. After complex formation and binding to the target nucleic acid, the collateral RNase activity of the type VI-E and/or VI-F CRISPR-Cas effector protein is activated, leading to cleavage of non-target RNA within the microorganism and ultimately to dormancy or death.
In some embodiments, the methods comprise contacting the cell with a system described herein comprising a type VI-E and/or VI-F CRISPR-Cas effector protein or a nucleic acid encoding the effector protein, and an RNA guide (such as a crRNA) or a nucleic acid encoding the RNA guide, wherein the spacer sequence is complementary to at least 15 nucleotides (e.g., 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50 or more nucleotides) of a target nucleic acid (e.g., a genus-, strain-, or species-specific RNA). Without wishing to be bound by any particular theory, cleavage of non-target RNA by the type VI-E and/or VI-F CRISPR-Cas effector protein may induce programmed cell death, cytotoxicity, apoptosis, necrosis, necroptosis, cell death, cell cycle arrest, cell anergy, reduced cell growth, or reduced cell proliferation. In bacteria, for example, cleavage of non-target RNA by the type VI-E and/or VI-F CRISPR-Cas effector protein may have a bacteriostatic or bactericidal effect.
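Whether a spacer is complementary to at least 15 nucleotides of a target can be checked with a longest-common-substring search between the spacer's reverse complement and the target. This minimal sketch assumes contiguous, perfect Watson-Crick pairing (no mismatches or G·U wobble), which may be stricter than the text requires; the function names are hypothetical.

```python
from __future__ import annotations

_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def to_rna(seq: str) -> str:
    """Upper-case a sequence and write T as U."""
    return seq.upper().replace("T", "U")


def reverse_complement(seq: str) -> str:
    """Watson-Crick reverse complement, returned as RNA."""
    return "".join(_PAIR[b] for b in reversed(to_rna(seq)))


def longest_complementary_run(spacer: str, target: str) -> int:
    """Length of the longest contiguous stretch where spacer and target pair perfectly.

    Equal to the longest common substring of reverse_complement(spacer) and target,
    found with an O(len(spacer) * len(target)) dynamic program.
    """
    a, b = reverse_complement(spacer), to_rna(target)
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def spacer_qualifies(spacer: str, target: str, min_nt: int = 15) -> bool:
    """Hypothetical check: at least `min_nt` contiguous complementary nucleotides."""
    return longest_complementary_run(spacer, target) >= min_nt
```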

Application in plants

The CRISPR systems described herein have a variety of uses in plants. In some embodiments, the CRISPR system can be used to engineer the genome of a plant (e.g., to increase yield, to produce a product with desired post-translational modifications, or to introduce genes to achieve industrial production of a crop). In some embodiments, the CRISPR system can be used to introduce a desired trait into a plant (such as with or without genetic modification to the genome), or to modulate the expression of an endogenous gene in a plant cell or in the whole plant.

In some embodiments, the CRISPR system can be used to identify, edit, and/or silence genes encoding particular proteins, such as allergen proteins (e.g., allergen proteins in peanuts, soybeans, lentils, peas, beans, and mung beans). A detailed description of how to identify, edit, and/or silence genes encoding proteins can be found, for example, in Nicolaou et al., Molecular diagnosis of peanut and legume allergy (Curr. Opin. Allergy Clin. Immunol. 11(3):222-8, 2011) and WO2016205764A1, both of which are incorporated herein by reference in their entirety.

Gene drive

Gene drive is a phenomenon in which the inheritance of a particular gene or set of genes is favorably biased. The CRISPR system described herein can be used to construct gene drives. For example, the CRISPR system can be designed to target and disrupt a particular allele of a gene, so that the cell repairs the sequence by copying the second allele. This copying converts the first allele into the second allele, thereby increasing the chance that the second allele is passed on to offspring. A detailed method of constructing gene drives with a CRISPR system is described, for example, in Hammond et al., A CRISPR-Cas9 gene drive system targeting female reproduction in the malaria mosquito vector Anopheles gambiae (Nat. Biotechnol. 34(1):78-83, 2016), the entire disclosure of which is incorporated herein by reference.
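The biased inheritance described above can be illustrated with a textbook deterministic model. Assuming random mating, no fitness cost, and that a fraction e of wild-type alleles in heterozygotes is converted to the drive allele in the germline (all simplifying assumptions, not taken from the text), a heterozygote transmits the drive allele with probability (1 + e)/2 instead of 1/2, and the drive-allele frequency p changes each generation as p' = p + e*p*(1 - p).

```python
from __future__ import annotations


def next_generation(p: float, e: float) -> float:
    """Drive-allele frequency after one generation under the stated assumptions.

    p: current drive-allele frequency (0..1)
    e: homing (conversion) efficiency in heterozygotes (0..1); e = 0 is Mendelian
    Derivation: p' = p**2 + 2*p*(1 - p) * (1 + e) / 2 = p + e*p*(1 - p)
    """
    return p + e * p * (1.0 - p)


def trajectory(p0: float, e: float, generations: int) -> list[float]:
    """Frequencies for generations 0..generations."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_generation(freqs[-1], e))
    return freqs


for e in (0.0, 0.5, 1.0):
    print(f"e={e}:", [round(p, 3) for p in trajectory(0.01, e, 10)])
```

With e = 0 the frequency stays constant, as expected under Mendelian inheritance; any positive e pushes the drive allele toward fixation in this idealized model.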

Pooled screening

As described herein, pooled CRISPR screening is a powerful tool for identifying genes involved in particular biological mechanisms, such as cell proliferation, drug resistance, and viral infection. Cells are transduced en masse with a library of vectors encoding guide RNAs (gRNAs) as described herein, and the distribution of gRNAs is measured before and after a selective challenge is applied. Pooled CRISPR screens work very well for mechanisms that affect cell survival and proliferation, and can be extended to measure the activity of individual genes (e.g., by using engineered reporter cell lines). Arrayed CRISPR screens, which target only one gene at a time, make it possible to use RNA sequencing as a readout. In some embodiments, the CRISPR systems described herein can be used in single-cell CRISPR screening. A detailed description of pooled CRISPR screens can be found, for example, in Datlinger et al., Pooled CRISPR screening with single-cell transcriptome readout (Nat. Methods 14(3):297-301, 2017), which is incorporated herein by reference in its entirety.
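Comparing the gRNA distribution before and after the selective challenge usually comes down to a per-guide fold change of library-size-normalized read counts. The sketch below uses fraction-of-total normalization with a small pseudocount, an assumed but common choice; dedicated screen-analysis tools add statistical modeling that this sketch omits. Guide names and counts are hypothetical.

```python
from __future__ import annotations

import math


def log2_fold_changes(before: dict[str, int], after: dict[str, int],
                      pseudocount: float = 0.5) -> dict[str, float]:
    """Per-guide log2(after/before) of library-size-normalized counts."""
    guides = set(before) | set(after)
    total_before = sum(before.values())
    total_after = sum(after.values())
    lfc = {}
    for g in guides:
        # The pseudocount keeps guides that drop out (count 0) finite on the log scale.
        frac_before = (before.get(g, 0) + pseudocount) / total_before
        frac_after = (after.get(g, 0) + pseudocount) / total_after
        lfc[g] = math.log2(frac_after / frac_before)
    return lfc


before = {"gRNA_1": 100, "gRNA_2": 100, "gRNA_3": 100}
after = {"gRNA_1": 300, "gRNA_2": 100, "gRNA_3": 5}
for guide, value in sorted(log2_fold_changes(before, after).items(), key=lambda kv: kv[1]):
    print(guide, round(value, 2))
```

Positive values mark guides enriched after selection and negative values mark guides depleted by it.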

Saturation mutagenesis

The CRISPR system described herein can be used for in situ saturation mutagenesis. In some embodiments, a pooled guide RNA library can be used for in situ saturation mutagenesis of a particular gene or regulatory element. Such methods can reveal the critical minimal features and discrete vulnerabilities of these genes or regulatory elements (e.g., enhancers). These methods are described, for example, in Canver et al., BCL11A enhancer dissection by Cas9-mediated in situ saturating mutagenesis (Nature 527(7577):192-7, 2015), which is incorporated herein by reference in its entirety.
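A pooled library for in situ saturation mutagenesis is typically built by tiling guides densely across the element of interest. As a minimal sketch, assuming an RNA target, no placement constraint beyond spacer length, and a default spacer length of 30 nt (the approximate length mentioned in the detection discussion), the Python below enumerates every window at a chosen step and returns the corresponding spacer, i.e., the reverse complement of each window. The function name and default values are hypothetical.

```python
from __future__ import annotations

_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Watson-Crick reverse complement, returned as RNA."""
    rna = seq.upper().replace("T", "U")
    return "".join(_PAIR[b] for b in reversed(rna))


def tile_spacers(target: str, length: int = 30, step: int = 1) -> list[tuple[int, str]]:
    """Return (start, spacer) for every window of `length` nt, advancing by `step`.

    Start positions are 0-based on the target; each spacer is the reverse
    complement of its window, so it can base-pair with that window.
    """
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    rna = target.upper().replace("T", "U")
    return [(i, reverse_complement(rna[i:i + length]))
            for i in range(0, len(rna) - length + 1, step)]
```

A step of 1 gives the densest tiling; larger steps trade resolution for a smaller library.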

RNA related applications

The CRISPR systems described herein can have a variety of applications related to RNA, such as modulating gene expression, degrading RNA molecules, inhibiting RNA expression, screening for RNA or RNA products, determining the function of lincRNA or non-coding RNA, inducing cell dormancy, inducing cell cycle arrest, reducing cell growth and/or cell proliferation, inducing cell anergy, inducing apoptosis, inducing cell necrosis, inducing cell death, and/or inducing programmed cell death. A detailed description of these applications can be found, for example, in WO 2016/205764 A1, which is incorporated herein by reference in its entirety. In various embodiments, the methods described herein can be performed in vitro, in vivo, or ex vivo.

For example, the CRISPR system described herein can be used in a subject having a disease or disorder to target and induce death in cells in a diseased state (e.g., cancer cells or cells infected with an infectious agent). For example, in some embodiments, the CRISPR systems described herein can be used to target and induce cell death in cancer cells from a subject having Wilms' tumor, Ewing's sarcoma, neuroendocrine tumor, glioblastoma, neuroblastoma, melanoma, skin cancer, breast cancer, colon cancer, rectal cancer, prostate cancer, liver cancer, kidney cancer, pancreatic cancer, lung cancer, biliary tract cancer, cervical cancer, endometrial cancer, esophageal cancer, gastric cancer, head and neck cancer, medullary thyroid cancer, ovarian cancer, glioma, lymphoma, leukemia, myeloma, acute lymphocytic leukemia, acute myelogenous leukemia, chronic lymphocytic leukemia, chronic myelogenous leukemia, Hodgkin lymphoma, non-Hodgkin lymphoma, or bladder cancer.

Modulating gene expression

The CRISPR systems described herein can be used to modulate gene expression. Together with suitable guide RNAs, the CRISPR system can control gene expression by controlling RNA processing. Controlling RNA processing may include controlling RNA processing reactions such as RNA splicing (e.g., alternative splicing), viral replication, and tRNA biosynthesis. RNA-targeting proteins combined with suitable guide RNAs can also be used to control RNA activation (RNAa). RNA activation is a small RNA-guided, Argonaute (Ago)-dependent gene regulation phenomenon in which promoter-targeted short double-stranded RNAs (dsRNAs) induce expression of the target gene at the transcriptional/epigenetic level. Because RNAa promotes gene expression, gene expression can be controlled by disrupting or reducing RNAa. In some embodiments, such methods include using RNA-targeting CRISPR systems as substitutes for interfering RNAs (e.g., siRNAs, shRNAs, or dsRNAs). Methods for modulating gene expression are described, for example, in WO2016205764, which is incorporated herein by reference in its entirety.

Controlling RNA interference

Controlling interfering RNA or microRNA (miRNA) can help reduce off-target effects by shortening the lifetime of the interfering RNA or miRNA in vivo or in vitro. In some embodiments, the target RNA can include an interfering RNA, i.e., an RNA involved in an RNA interference pathway, such as a small hairpin RNA (shRNA) or small interfering RNA (siRNA). In some embodiments, the target RNA comprises, e.g., a miRNA or double-stranded RNA (dsRNA).

In some embodiments, if the RNA-targeting protein and a suitable guide RNA are selectively expressed (e.g., spatially or temporally under the control of a regulated promoter, such as a tissue- or cell cycle-specific promoter and/or enhancer), this can be used to protect a cell or system (in vivo or in vitro) from RNA interference (RNAi) in that cell. This may be useful in neighboring tissues or cells in which RNAi is not wanted, or for comparing cells or tissues in which the CRISPR-associated protein and a suitable crRNA are or are not expressed (i.e., in which RNAi is controlled or uncontrolled, respectively). The RNA-targeting proteins can also be used to control or bind molecules that contain or consist of RNA, such as ribozymes, ribosomes, or riboswitches. In some embodiments, the guide RNA can recruit the RNA-targeting protein to these molecules so that the protein binds to them. The above processes are described, for example, in WO2016205764 and WO 2017070605, both of which are incorporated herein by reference in their entirety.

Modified riboswitches and controlled metabolic regulation

Riboswitches are regulatory segments of messenger RNAs that bind small molecules and thereby regulate gene expression. This mechanism enables cells to sense the intracellular concentrations of these small molecules. A given riboswitch typically regulates its neighboring genes by altering their transcription, translation, or splicing. Thus, in some embodiments, a riboswitch can be targeted by an RNA-targeting protein combined with a suitable guide RNA in order to control riboswitch activity, either by cleaving or by binding the riboswitch. Methods of controlling riboswitches using CRISPR systems are described, for example, in WO2016205764 and WO 2017070605, both of which are incorporated herein by reference in their entirety.

RNA modification

In some embodiments, a CRISPR-associated protein described herein can be fused to a base-editing domain, such as ADAR1, ADAR2, APOBEC, or activation-induced cytidine deaminase (AID), for modifying an RNA sequence (e.g., mRNA). In some embodiments, the CRISPR-associated protein comprises one or more mutations (e.g., in the catalytic domain) such that the CRISPR-associated protein is unable to cleave RNA.
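At the sequence level, the deaminase domains named above make single-base changes: ADAR1 and ADAR2 deaminate adenosine to inosine, which is read as guanosine, while APOBEC-family enzymes and AID deaminate cytidine to uridine. The sketch below models only that outcome on a sequence string; it does not model editing efficiency, sequence context, or how the target base is selected. The mapping and function name are illustrative.

```python
from __future__ import annotations

# (base consumed, base read afterwards) for each deaminase family.
# "ADAR" stands for ADAR1/ADAR2; inosine is written as "G" because it is read as guanosine.
DEAMINATION = {"ADAR": ("A", "G"), "APOBEC": ("C", "U"), "AID": ("C", "U")}


def apply_edit(rna: str, position: int, enzyme: str) -> str:
    """Return the sequence as read after deaminating the base at 0-based `position`."""
    rna = rna.upper().replace("T", "U")
    substrate, product = DEAMINATION[enzyme]
    if rna[position] != substrate:
        raise ValueError(f"{enzyme} acts on {substrate}, found {rna[position]} at {position}")
    return rna[:position] + product + rna[position + 1:]


# Illustrative example: A-to-I editing of the middle base of a UAG stop codon reads as UGG (Trp).
print(apply_edit("AUGUAGCCC", 4, "ADAR"))
```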

In some embodiments, the CRISPR-associated protein can be used with an RNA-binding fusion polypeptide comprising a base editing domain (such as ADAR1, ADAR2, APOBEC, or AID) fused to an RNA-binding domain, such as MS2 (also known as MS2 coat protein), Qbeta (also known as Qbeta coat protein), or PP7 (also known as PP7 coat protein). The amino acid sequences of the RNA binding domains MS2, Qbeta and PP7 are as follows:

MS2(MS2 coat protein)

MASNFTQFVLVDNGGTGDVTVAPSNFANGVAEWISSNSRSQAYKVTCSVRQSSAQKRKYTIKVEVPKVATQTVGGVELPVAAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIY

Qbeta (Qbeta coat protein)

MAKLETVTLGNIGKDGKQTLVLNPRGVNPTNGVASLSQAGAVPALEKRVTVSVSQPSRNRKNYKVQVKIQNPTACTANGSCDPSVTRQAYADVTFSFTQYSTDEERAFVRTELAALLASPLLIDAIDQLNPAY

PP7(PP7 coat protein)

MSKTIVLSVGEATRTLTEIQSTADRQIFEEKVGPLVGRLRLTASLRQNGAKTAYRVNLKLDQADVVDCSTSVCGELPKVRYTQVWSHDVTIVANSTEASRKSLYDLTKSLVVQATSEDLVVNLVPLGR

In some embodiments, the RNA binding domain can bind to a specific sequence (e.g., an aptamer sequence) or secondary structure motif on a crRNA in a system described herein (e.g., when the crRNA is in an effector-crRNA complex), thereby recruiting the RNA binding fusion polypeptide (with its base editing domain) to the effector complex. For example, in some embodiments, the CRISPR system comprises a CRISPR-associated protein; a crRNA having an aptamer sequence (e.g., an MS2-binding loop, QBeta-binding loop, or PP7-binding loop); and an RNA-binding fusion polypeptide that has a base-editing domain and specifically binds to the aptamer sequence. In this system, the CRISPR-associated protein forms a complex with the aptamer-containing crRNA. In addition, the RNA binding fusion polypeptide binds to the crRNA (via the aptamer sequence), forming a tripartite complex that can modify the target RNA.

Methods of base editing using CRISPR systems are described, for example, in international publication No. WO 2017/219027, which is incorporated herein by reference in its entirety, particularly in the discussion of RNA modifications.

RNA splicing

In some embodiments, an inactivated CRISPR-associated protein described herein (e.g., a CRISPR-associated protein having one or more mutations in the catalytic domain) can be used to target and bind a particular splice site on an RNA transcript. The inactivated CRISPR-associated protein binds the RNA and sterically hinders the interaction of the spliceosome with the transcript, thereby altering the frequency with which particular transcript isoforms are produced. In this way, exon skipping can be used to treat disease, allowing a mutated exon to be skipped in the mature protein. Methods of altering splicing using the CRISPR system are described, for example, in International Publication No. WO 2017/219027, which is incorporated herein by reference in its entirety, particularly with respect to the discussion of RNA splicing.

Therapeutic applications

The CRISPR systems described herein can be used in a variety of therapeutic applications based on the ability of the CRISPR/Cas13e or Cas13f systems, in vitro and in vivo, to do one or more of the following: induce cell senescence, induce cell cycle arrest, inhibit cell growth and/or proliferation, induce apoptosis, induce necrosis, and the like.

In some embodiments, the novel CRISPR systems described herein can be used to treat a variety of diseases and disorders, such as genetic diseases (e.g., monogenic diseases), diseases treatable through nuclease activity (e.g., by targeting PCSK9, Duchenne muscular dystrophy (DMD), or BCL11A), and various cancers, among others.

In some embodiments, the CRISPR systems described herein can be used to edit a target nucleic acid to modify it (e.g., by inserting, deleting, or mutating one or more nucleic acid residues). For example, in some embodiments, the CRISPR systems described herein comprise an exogenous donor template nucleic acid (e.g., a DNA molecule or an RNA molecule) comprising a desired nucleic acid sequence. When the CRISPR system described herein induces a cleavage event, the cell's molecular machinery uses the exogenous donor template nucleic acid to repair and/or resolve the cleavage. Alternatively, the cell's molecular machinery may use an endogenous template to repair and/or resolve the cleavage event. In some embodiments, the CRISPR systems described herein can be used to alter a target nucleic acid, forming insertions, deletions, and/or point mutations. In some embodiments, the insertion is a scarless insertion (i.e., the desired nucleic acid sequence is inserted into the target nucleic acid without unintended insertion of additional nucleic acid sequence while the cleavage event is resolved). The donor template nucleic acid can be a double-stranded or single-stranded nucleic acid molecule (e.g., DNA or RNA). Methods for designing exogenous donor template nucleic acids are described, for example, in International Publication No. WO 2016/094874 A1, which is expressly incorporated herein in its entirety.

In one aspect, the CRISPR systems described herein are useful for treating diseases caused by overexpression of RNA, by toxic RNA, and/or by mutant RNA (e.g., RNA with splicing defects or truncations). For example, expression of toxic RNA may be associated with the formation of nuclear inclusions and with late-onset degenerative changes in the brain, heart, or skeletal muscle. In some embodiments, the disease is myotonic dystrophy. In myotonic dystrophy, the main pathogenic effect of the toxic RNA is to sequester binding proteins and impair the regulation of alternative splicing; see, e.g., Osborne et al., "RNA-dominant diseases," Hum. Mol. Genet., 2009 Apr. 15; 18(8):1471-81. Myotonic dystrophy (dystrophia myotonica, DM) is of particular interest to geneticists because it produces an extremely wide range of clinical features. Classical DM, now referred to as DM type 1 (DM1), is caused by expansion of CTG repeats in the 3' untranslated region (UTR) of DMPK, the gene encoding a cytosolic protein kinase. The CRISPR systems described herein can target the overexpressed or toxic RNA, such as DMPK transcripts, or can target mis-regulated alternative splicing in DM1 skeletal muscle, heart, or brain.
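The targeting logic described above can be illustrated with a short Python sketch. It measures the longest uninterrupted CTG run in a DNA fragment (the kind of expansion underlying DM1) and derives an RNA spacer complementary to the CUG-repeat transcript. This is a hypothetical illustration, not the disclosed design method: the function names are invented, and the sketch assumes simple Watson-Crick pairing while ignoring spacer length limits, flanking context, and any Cas13e/Cas13f-specific design rules.

```python
# Illustrative sketch only; function names are hypothetical.
# Assumes plain Watson-Crick pairing between spacer and target RNA and
# ignores spacer-length limits and flanking sequence.

RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def longest_repeat_run(dna: str, unit: str = "CTG") -> int:
    """Largest number of back-to-back copies of `unit` found in `dna`."""
    dna, k, best = dna.upper(), len(unit), 0
    for frame in range(k):  # scan every possible register
        run = 0
        for i in range(frame, len(dna) - k + 1, k):
            run = run + 1 if dna[i:i + k] == unit else 0
            best = max(best, run)
    return best


def spacer_for_target(target_rna: str) -> str:
    """RNA that base-pairs with `target_rna`, written 5'->3'."""
    return "".join(RNA_COMPLEMENT[b] for b in reversed(target_rna.upper()))


utr_fragment = "AACTGCTGCTGCTGTT"  # toy sequence, not a real DMPK allele
print(longest_repeat_run(utr_fragment))  # 4
print(spacer_for_target("CUGCUGCUG"))    # CAGCAGCAG
```

Because the spacer must pair antiparallel with the target, a (CUG)n target yields a (CAG)n spacer.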

The CRISPR systems described herein can also target trans-acting mutations that affect RNA-dependent functions and cause a variety of diseases, such as Prader-Willi syndrome, spinal muscular atrophy (SMA), and dyskeratosis congenita. Diseases that can be treated using the CRISPR systems described herein are summarized in Cooper et al., "RNA and disease," Cell, 136.4 (2009):777-793, and in WO 2016/205764 A1, the entire disclosures of which are incorporated herein by reference. Those skilled in the art will know how to use the novel CRISPR systems to treat the above diseases.

The CRISPR systems described herein can also be used to treat various tauopathies, including primary and secondary tauopathies such as primary age-related tauopathy (PART)/neurofibrillary tangle (NFT)-predominant senile dementia (in which NFTs similar to those seen in Alzheimer's disease (AD) are present but plaques are absent), dementia pugilistica (chronic traumatic encephalopathy), and progressive supranuclear palsy. A list of tauopathies and methods of treating these diseases are described, for example, in WO 2016/205764, which is incorporated herein by reference in its entirety.

The CRISPR systems described herein can also be used to target mutations that disrupt cis-acting splicing codes, which can cause splicing defects and disease. Such diseases include, for example, motor neuron degenerative diseases caused by deletion of the SMN1 gene (e.g., spinal muscular atrophy), Duchenne muscular dystrophy (DMD), frontotemporal dementia and parkinsonism linked to chromosome 17 (FTDP-17), and cystic fibrosis.

The CRISPR systems described herein can also be used for antiviral applications, particularly against RNA viruses. With a suitably selected guide RNA, the CRISPR-associated protein can be directed to target viral RNA sequences.
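One simple way to picture "selecting a suitable guide RNA" against a viral target is to tile candidate target sites along the viral RNA and discard any site that also occurs in host transcripts. The Python sketch below does exactly that; it is a hypothetical illustration, not the selection procedure of this disclosure. The window length and step are arbitrary assumptions, and the host screen checks exact matches only, whereas a realistic off-target analysis would also consider mismatched sites.

```python
# Hypothetical sketch: tile candidate spacers along a viral RNA and drop any
# candidate whose target site also occurs verbatim in a host transcript.
# Window length and step are illustrative assumptions, not values from the text.
from typing import Iterable, List, Tuple

RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    return "".join(RNA_COMPLEMENT[b] for b in reversed(rna.upper()))


def tile_spacers(viral_rna: str, length: int, step: int,
                 host_transcripts: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (start, spacer) pairs for windows absent from every host transcript."""
    host = [t.upper() for t in host_transcripts]
    viral_rna = viral_rna.upper()
    picks = []
    for start in range(0, len(viral_rna) - length + 1, step):
        site = viral_rna[start:start + length]
        if any(site in t for t in host):  # exact-match screen only
            continue
        picks.append((start, reverse_complement(site)))
    return picks


picks = tile_spacers("AUGGCUAGCUAGGCUUACGA", length=6, step=4,
                     host_transcripts=["CCCUAGCUCC"])
print([start for start, _ in picks])  # [0, 8, 12]
```

In this toy run, the window starting at position 4 (CUAGCU) is dropped because it also appears in the host transcript.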

The CRISPR systems described herein can also be used to treat cancer in a subject (e.g., a human subject). For example, a CRISPR-associated protein described herein can be programmed with a crRNA to target an aberrant RNA molecule found in cancer cells (e.g., one comprising a point mutation or an alternatively spliced transcript), thereby inducing death (e.g., apoptosis) of those cells.

The CRISPR systems described herein can also be used to treat an autoimmune disease or disorder in a subject (e.g., a human subject). For example, a crRNA can direct a CRISPR-associated protein described herein to an aberrant RNA molecule (e.g., one comprising a point mutation or an alternatively spliced transcript) present in cells that cause the autoimmune disease or disorder.

Furthermore, the CRISPR systems described herein can also be used to treat infectious diseases in a subject. For example, a CRISPR-associated protein described herein can be programmed with a crRNA to target an RNA molecule expressed by an infectious agent (e.g., a bacterium, virus, parasite, or protozoan), thereby inducing death of the infectious agent's cells. The CRISPR systems can also be used to treat a disease in a subject whose host cells are infected with an intracellular infectious agent: the CRISPR-associated protein is programmed to target an RNA molecule encoded by a gene of the infectious agent, so that infected cells can be targeted and their death induced.

In addition, in vitro RNA sensing assays can be used to detect specific RNA substrates. The CRISPR-associated proteins described herein can also be used to sense RNA-based targets in living cells. Some implementations of this application include diagnostic methods that sense disease-specific RNA.

Therapeutic applications of the CRISPR systems described herein are described in detail, for example, in U.S. Patent No. 8,795,965, EP3009511, WO2016205764, and WO2017070605, the entire contents of which are incorporated herein by reference.

Cells and progeny thereof

In some embodiments, the methods of the invention can be used to introduce a CRISPR system described herein into a cell such that the cell and/or its progeny alter the production of one or more cellular products, such as antibodies, starch, ethanol, or any other product whose production is desired to be altered. Such cells and their progeny are included within the scope of the present invention.

In some embodiments, the methods and/or CRISPR systems described herein modify the translation and/or transcription of one or more RNA products of a cell. For example, such modifications may increase transcription/translation/expression of the RNA product. In other embodiments, such modifications can reduce transcription/translation/expression of the RNA product.

In some embodiments, the cell is a prokaryotic cell.

In some embodiments, the cell is a eukaryotic cell, such as a mammalian cell, including a human cell (a human primary cell or an established human cell line). In some embodiments, the cell is a non-human mammalian cell, e.g., from a non-human primate (e.g., monkey), cattle (e.g., cow or bull), sheep, goat, pig, horse, dog, cat, rabbit, rodent (e.g., mouse, rat, hamster), and the like. In some embodiments, the cell is from a fish (e.g., salmon), a bird (e.g., poultry, including chickens, ducks, and geese), a reptile, a shellfish (e.g., oysters, clams, lobsters, shrimps), an insect, a worm, a yeast, and the like. In some embodiments, the cell is from a plant, such as a monocot or a dicot. In some embodiments, the plant is a food crop, such as barley, cassava, cotton, peanut, corn, millet, oil palm fruit, potato, beans, rapeseed or canola, rice, rye, sorghum, soybean, sugarcane, sugar beet, sunflower, or wheat. In some embodiments, the plant is a cereal (barley, maize, millet, rice, rye, sorghum, or wheat). In some embodiments, the plant is a tuber crop (cassava or potato). In some embodiments, the plant is a sugar crop (sugar beet or sugarcane). In some embodiments, the plant is an oil crop (soybean, peanut, rapeseed or canola, sunflower, or oil palm fruit). In some embodiments, the plant is a fiber crop (cotton). In some embodiments, the plant is a tree (e.g., a peach or nectarine tree, an apple or pear tree, a nut tree (e.g., an almond, walnut, or pistachio tree), or a citrus tree (e.g., an orange, grapefruit, or lemon tree)), a grass, a vegetable, a fruit, or an alga.

In a related aspect, provided herein are cells, or progeny thereof, that have been modified by a method using the CRISPR system of the invention.

In some embodiments, the cell is modified in vitro, in vivo, or ex vivo.

In some embodiments, the cell is a stem cell.

6. Delivery

According to the present disclosure and the knowledge in the art, the CRISPR system described herein or any component thereof (the Cas protein, its derivatives, functional fragments, or various fusions or adducts thereof, as well as the guide RNA/crRNA), nucleic acid molecules thereof, and/or nucleic acid molecules encoding or providing a component thereof, can be delivered by various delivery systems (e.g., vectors such as plasmids or viral delivery vectors) using any suitable means known in the art. Such means include, but are not limited to, electroporation, lipofection, microinjection, transfection, sonication, gene gun, and the like.

In some embodiments, the CRISPR-associated protein and/or any of the RNAs (e.g., guide RNA or crRNA) and/or helper proteins may be delivered using a suitable vector (e.g., a plasmid or viral vector), such as an adeno-associated virus (AAV) vector, a lentiviral vector, an adenoviral vector, a retroviral vector, another viral vector, or a combination thereof. The protein and one or more crRNAs may be packaged into one or more vectors, such as plasmids or viral vectors. For bacteria, nucleic acids encoding any component of the CRISPR system can be delivered using a bacteriophage. Examples of such phages include, but are not limited to, T4, Mu, lambda, T5, T7, T3, Φ29, M13, MS2, Qβ, and ΦX174.

In some embodiments, the vector (e.g., a plasmid or viral vector) is delivered to the target tissue by, for example, intramuscular injection, intravenous administration, transdermal administration, intranasal administration, oral administration, or mucosal administration. Such delivery may be a single dose or multiple doses. Those skilled in the art will understand that the actual dose delivered as described herein may vary greatly depending on a variety of factors, such as the choice of vector; the target cell, organism, or tissue; the general condition of the subject to be treated; the type and degree of conversion/modification sought; and the route and mode of administration.

In some embodiments, the delivery is accomplished by adenovirus, which can be administered in a single dose comprising at least 1 × 10⁵ adenovirus particles (also referred to as particle units, pu). In some embodiments, the desired dose is at least about 1 × 10⁶, at least about 1 × 10⁷, at least about 1 × 10⁸, or at least about 1 × 10⁹ adenovirus particles. Such delivery methods and doses are described, for example, in WO 2016/205764 A1 and U.S. Patent No. 8,454,972 B2, both of which are incorporated herein by reference in their entirety.

In some embodiments, the delivery is accomplished by plasmid. The dose may be a number of plasmids sufficient to elicit a response. In some cases, a suitable amount of plasmid DNA in the plasmid composition can be from about 0.1 to about 2 mg. The plasmid will typically include (i) a promoter; (ii) a sequence encoding a nucleic acid-targeting CRISPR-associated protein and/or an accessory protein, each operably linked to a promoter (the same promoter or different promoters); (iii) a selectable marker; (iv) an origin of replication; and (v) a transcription terminator downstream of and operably linked to (ii). The plasmid may also encode the RNA component(s) of a CRISPR complex, although one or more of these components may instead be encoded on a different vector. The frequency of administration is within the competence of a medical or veterinary practitioner (e.g., a physician or veterinarian) or another skilled professional.
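The plasmid layout listed above (promoter, coding sequence, selectable marker, origin of replication, downstream terminator) can be expressed as a simple checklist. The Python sketch below is a hypothetical model for illustration only: the element names are invented, and it treats the plasmid as a linear list, ignoring circular topology, spacing, and orientation.

```python
# Hypothetical model of the expression-plasmid layout (i)-(v) described above.
# Linear order only; circularity, distances and strand orientation are ignored.
from dataclasses import dataclass
from typing import List


@dataclass
class Element:
    kind: str   # "promoter", "cds", "terminator", "marker" or "ori"
    name: str


REQUIRED = {"promoter", "cds", "terminator", "marker", "ori"}


def check_layout(elements: List[Element]) -> List[str]:
    """Return a list of problems; an empty list means the layout looks complete."""
    kinds = [e.kind for e in elements]
    problems = [f"missing {k}" for k in sorted(REQUIRED - set(kinds))]
    for i, kind in enumerate(kinds):
        if kind != "cds":
            continue
        # Each coding sequence needs a promoter upstream and a terminator downstream.
        if "promoter" not in kinds[:i]:
            problems.append(f"{elements[i].name}: no upstream promoter")
        if "terminator" not in kinds[i + 1:]:
            problems.append(f"{elements[i].name}: no downstream terminator")
    return problems


plasmid = [  # element names are placeholders
    Element("promoter", "P1"),
    Element("cds", "cas13f_cds"),
    Element("terminator", "T1"),
    Element("marker", "resistance_gene"),
    Element("ori", "ori"),
]
print(check_layout(plasmid))  # []
```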

In another embodiment, the delivery is accomplished by liposomes, lipofectin formulations, and the like, which can be prepared by methods known to those skilled in the art. Such methods are described, for example, in WO 2016/205764 and U.S. Patent Nos. 5,593,972, 5,589,466, and 5,580,859, each of which is incorporated herein by reference in its entirety.

In some embodiments, the delivery is accomplished by nanoparticles or exosomes. For example, exosomes have proven to be particularly useful in delivering RNA.

In addition, one or more components of the novel CRISPR system can be introduced into cells via cell-penetrating peptides (CPPs). In some embodiments, a cell-penetrating peptide is linked to the CRISPR-associated protein. In some embodiments, the CRISPR-associated protein and/or guide RNA is coupled to one or more CPPs so that it is efficiently transported into a cell (e.g., a plant protoplast). In some embodiments, the CRISPR-associated protein and/or guide RNA is encoded by a nucleic acid molecule that is coupled to one or more CPPs for cellular delivery.

CPPs are short peptides of fewer than 35 amino acids, derived from proteins or from chimeric sequences, that are capable of transporting biomolecules across cell membranes in a receptor-independent manner. A CPP may be a cationic peptide, a peptide with a hydrophobic sequence, an amphipathic peptide, a peptide with a proline-rich or antimicrobial sequence, a chimeric peptide, or a bipartite peptide. Examples of CPPs include Tat (a nuclear transcriptional activator protein required for HIV-1 replication), penetratin, the Kaposi fibroblast growth factor (FGF) signal peptide sequence, the integrin β3 signal peptide sequence, polyarginine peptide sequences, guanine-rich molecular transporters, and the sweet arrow peptide. CPPs and methods of using them are described in "Prediction of cell-penetrating peptides" (Methods Mol. Biol., 2015; 1324:39-58); Ramakrishna et al., "Gene disruption by cell-penetrating peptide-mediated delivery of Cas9 protein and guide RNA" (Genome Res., 2014 June; 24(6):1020-7); and WO 2016/205764 A1, each of which is incorporated herein by reference in its entirety.
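Two properties mentioned above, a length under 35 residues and (for cationic CPPs) a positive charge, can be checked with a crude Python heuristic. This sketch is illustrative only and is not a validated CPP predictor: the net-charge estimate simply counts Lys/Arg minus Asp/Glu, and the `min_charge` cutoff is an arbitrary assumption. The example sequences are the commonly cited Tat 47-57 and penetratin peptides.

```python
# Rough, hypothetical screen reflecting two properties named in the text:
# length under 35 residues and, for cationic CPPs, a net positive charge.
CATIONIC = set("KR")
ANIONIC = set("DE")


def approx_net_charge(peptide: str) -> int:
    """Count of K/R minus count of D/E; ignores His, termini and pKa shifts."""
    seq = peptide.upper()
    return sum(aa in CATIONIC for aa in seq) - sum(aa in ANIONIC for aa in seq)


def looks_like_cationic_cpp(peptide: str, min_charge: int = 3) -> bool:
    """`min_charge` is an arbitrary illustrative cutoff, not a published rule."""
    return len(peptide) < 35 and approx_net_charge(peptide) >= min_charge


for name, seq in [("Tat 47-57", "YGRKKRRQRRR"), ("penetratin", "RQIKIWFQNRRMKWKK")]:
    print(name, len(seq), approx_net_charge(seq), looks_like_cationic_cpp(seq))
```

Amphipathic or hydrophobic CPPs would not pass this charge-based check, so it only reflects the cationic class.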

Various methods for delivering the CRISPR systems described herein are described, for example, in U.S. Patent No. 8,795,965, EP3009511, WO2016205764, and WO2017070605, each of which is incorporated herein by reference in its entirety.

7. Kits

Another aspect of the invention provides a kit comprising any two or more of the components of the subject CRISPR/Cas system described herein, e.g., Cas13e or Cas13f proteins, derivatives, functional fragments, or fusions or adducts thereof; guide RNAs/crRNAs; complexes thereof; vectors comprising any of the foregoing; or host cells comprising any of the foregoing.

In some embodiments, the kit further comprises instructions on how to use the components therein, and/or on how to use them in combination with other components obtained elsewhere.

In some embodiments, the kit further comprises one or more oligonucleotides, for example for inserting an RNA coding sequence into a vector such that the coding sequence is operably linked to one or more control elements.
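Assuming the kit's oligonucleotides are used to clone a short RNA coding sequence (e.g., a spacer) into an expression vector, a common approach is to order a top and bottom oligo that anneal into a duplex with vector-compatible sticky ends. The Python sketch below shows that general pattern; it is hypothetical, and the 4-nt overhangs used in the example are placeholders, because the actual overhangs depend on a vector and restriction enzyme that this text does not specify.

```python
# Hypothetical design of a top/bottom oligo pair whose annealed duplex carries
# 5' overhangs for ligation into a linearized vector. Overhang sequences are
# placeholders; real values depend on the (unspecified) vector.
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(dna: str) -> str:
    """Reverse complement of a DNA string."""
    return dna.upper().translate(DNA_COMPLEMENT)[::-1]


def cloning_oligos(insert: str, top_overhang: str, bottom_overhang: str):
    """Return (top, bottom) oligos, both written 5'->3'."""
    insert = insert.upper()
    top = top_overhang + insert                 # 5' overhang on the left end
    bottom = bottom_overhang + revcomp(insert)  # 5' overhang on the right end
    return top, bottom


top, bottom = cloning_oligos("GATTACA", top_overhang="CACC", bottom_overhang="AAAC")
print(top)     # CACCGATTACA
print(bottom)  # AAACTGTAATC
```

After annealing, the insert region of the two oligos pairs fully, leaving each overhang single-stranded at opposite ends of the duplex.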

In some embodiments, the kit further comprises one or more buffers that can be used to dissolve any of the components and/or to provide suitable reaction conditions for one or more of the components. The buffer may include one or more of PBS, HEPES, Tris, MOPS, Na₂CO₃, NaHCO₃, NaB, or any combination thereof. In some embodiments, the reaction conditions include a suitable pH, such as an alkaline pH. In some embodiments, the pH is between 7 and 10.
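For a target pH between 7 and 10, the Henderson-Hasselbalch relation, pH = pKa + log10([base]/[acid]), offers a quick way to see which of the listed buffer systems is suitable and in what base-to-acid ratio. The Python sketch below applies it. The pKa values are approximate 25 °C figures supplied here as assumptions, not values from this text, and the ±1 pH-unit "useful range" is a conventional rule of thumb.

```python
# Henderson-Hasselbalch: pH = pKa + log10([base]/[acid]).
# pKa values are approximate 25 °C figures (assumption); Tris in particular
# shifts noticeably with temperature.
APPROX_PKA_25C = {"HEPES": 7.5, "MOPS": 7.2, "Tris": 8.1, "carbonate/bicarbonate": 10.3}


def base_to_acid_ratio(target_ph: float, pka: float) -> float:
    return 10 ** (target_ph - pka)


def fraction_base(target_ph: float, pka: float) -> float:
    r = base_to_acid_ratio(target_ph, pka)
    return r / (1 + r)


def useful_buffers(target_ph: float, window: float = 1.0) -> list:
    """Buffers whose pKa lies within `window` pH units of the target (rule of thumb)."""
    return sorted(n for n, pka in APPROX_PKA_25C.items() if abs(target_ph - pka) <= window)


print(useful_buffers(7.5))                # ['HEPES', 'MOPS', 'Tris']
print(round(fraction_base(8.1, 8.1), 2))  # 0.5
```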

In some embodiments, any one or more of the components of the kit can be stored in a suitable container.

Examples

Example 1: Identification of novel Cas13e and Cas13f systems

To expand the known repertoire of class 2 CRISPR-Cas systems, we mined genomic and metagenomic data using a computational pipeline. The genomic and metagenomic sequences were downloaded from databases such as NCBI (Benson et al., 2013; Pruitt et al., 2012), NCBI Whole Genome Shotgun (WGS), and the DOE JGI Integrated Microbial Genomes system (Markowitz et al., 2012). Proteins were predicted on all contigs longer than 5 kb (Prodigal, anonymous mode; Hyatt et al., 2010) and de-duplicated (i.e., identical protein sequences were removed) to construct a complete protein database. Proteins longer than 600 residues were considered large proteins (LPs). Because most Cas13 proteins identified so far are larger than 900 residues, only large proteins were considered in subsequent steps, to reduce computational complexity.
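The database-construction step can be sketched in a few lines of Python. This is a hypothetical illustration: gene calling (Prodigal in the text) is assumed to have already produced protein sequences per contig, and the 5 kb threshold is read here as a contig-length filter, which is an interpretation of the text.

```python
# Hypothetical sketch of the protein-database construction step. Gene calling
# (Prodigal in the text) is assumed to be done already; `predicted` maps each
# contig ID to the protein sequences called on it.
from typing import Dict, List, Set

MIN_CONTIG_BP = 5_000  # read as a contig-length filter (interpretation)
MIN_LP_AA = 600        # "large protein" threshold from the text


def build_lp_set(contig_lengths: Dict[str, int],
                 predicted: Dict[str, List[str]]) -> Set[str]:
    """Keep proteins from contigs > 5 kb, de-duplicate, retain those > 600 aa."""
    unique: Set[str] = set()
    for contig, proteins in predicted.items():
        if contig_lengths.get(contig, 0) <= MIN_CONTIG_BP:
            continue
        unique.update(p.rstrip("*") for p in proteins)  # drop stop symbol
    return {p for p in unique if len(p) > MIN_LP_AA}


lengths = {"c1": 6_000, "c2": 4_000}
calls = {"c1": ["M" * 700 + "*", "M" * 700, "M" * 100], "c2": ["A" * 900]}
print(len(build_lp_set(lengths, calls)))  # 1
```

Using a set performs the de-duplication because identical sequences collapse to a single entry.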

CRISPR arrays were then identified using PILER-CR with all default parameters (see Edgar, "PILER-CR: fast and accurate identification of CRISPR repeats," BMC Bioinformatics 8:18, 2007). ORFs encoding non-redundant large proteins located within ±10 kb of a CRISPR array were grouped into CRISPR-adjacent large-protein-encoding clusters, and the encoded LPs were defined as Cas-LPs.
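The ±10 kb proximity rule can be sketched as an interval test in Python. This is a hypothetical illustration that assumes ORF and CRISPR-array coordinates are already available (for example, from gene-calling and PILER-CR output). Coordinates are treated as inclusive integers, and the distance is the difference between the nearest coordinates of the two features.

```python
# Hypothetical sketch: flag large-protein ORFs lying within ±10 kb of a CRISPR
# array on the same contig. Coordinates are inclusive integers (assumption).
from typing import List, NamedTuple


class Interval(NamedTuple):
    contig: str
    start: int
    end: int
    name: str = ""


FLANK_BP = 10_000


def gap(a: Interval, b: Interval) -> int:
    """Difference between the nearest coordinates of two features; 0 if they overlap."""
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def crispr_adjacent(orfs: List[Interval], arrays: List[Interval]) -> List[str]:
    """Names of ORFs within FLANK_BP of any CRISPR array on the same contig."""
    return [o.name for o in orfs
            if any(o.contig == a.contig and gap(o, a) <= FLANK_BP for a in arrays)]


arrays = [Interval("c1", 50_000, 51_000, "CRISPR1")]
orfs = [Interval("c1", 58_000, 60_500, "lpA"),
        Interval("c1", 70_000, 72_000, "lpB"),
        Interval("c2", 50_500, 52_000, "lpC")]
print(crispr_adjacent(orfs, arrays))  # ['lpA']
```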

First, the Cas-LPs were aligned pairwise using BLASTP, and alignments with E-value < 1E-10 were retained. Based on these BLASTP results, the Cas-LPs were then clustered using MCL to create Cas protein families.
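The filter-then-cluster step can be sketched in Python. The sketch reads BLASTP tabular output (`-outfmt 6`, whose 11th column is the E-value), keeps non-self hits with E < 1e-10, and clusters the resulting graph with a minimal NumPy version of the Markov Cluster iteration (expansion followed by inflation). This is a teaching re-implementation written for illustration, not the `mcl` program or its parameters; the inflation value and convergence tolerance are assumptions.

```python
# Hypothetical sketch: filter BLASTP tabular hits by E-value, then cluster the
# hit graph with a minimal NumPy version of the MCL iteration. Not the `mcl`
# program itself; inflation and tolerances are illustrative assumptions.
from typing import Dict, Iterable, List, Set, Tuple

import numpy as np

EVALUE_CUTOFF = 1e-10


def parse_hits(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Edges from -outfmt 6 lines (E-value is column 11), self-hits removed."""
    edges = []
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 12 and cols[0] != cols[1] and float(cols[10]) < EVALUE_CUTOFF:
            edges.append((cols[0], cols[1]))
    return edges


def mcl(edges: List[Tuple[str, str]], inflation: float = 2.0,
        max_iter: int = 100, tol: float = 1e-9) -> List[Set[str]]:
    nodes = sorted({n for e in edges for n in e})
    if not nodes:
        return []
    idx: Dict[str, int] = {n: i for i, n in enumerate(nodes)}
    m = np.eye(len(nodes))                      # add self-loops
    for a, b in edges:
        m[idx[a], idx[b]] = m[idx[b], idx[a]] = 1.0
    m /= m.sum(axis=0)                          # make columns stochastic
    for _ in range(max_iter):
        prev = m
        m = m @ m                               # expansion
        m = m ** inflation                      # inflation
        m /= m.sum(axis=0)                      # renormalize columns
        if np.abs(m - prev).max() < tol:
            break
    # Attractor rows (non-zero diagonal) list the members of their cluster.
    clusters = {frozenset(nodes[j] for j in np.nonzero(m[i] > 1e-6)[0])
                for i in range(len(nodes)) if m[i, i] > 1e-6}
    return [set(c) for c in clusters]
```

Hits above the E-value cutoff never become edges, so weakly similar proteins cannot join a family at this stage.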

Each Cas-LP was then aligned against all LPs using BLASTP, again retaining alignments with E-value < 1E-10. The Cas-LP families were expanded according to these BLASTP results, and only families that expanded by no more than one-fold were retained for further analysis.
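The family-expansion filter can be sketched in Python as follows. This is hypothetical: each family is extended in a single step with every LP that one of its members hits (hits assumed to be pre-filtered at E < 1e-10). "Amplified by no more than one fold" is read here as "the family at most doubles in size," which is an interpretation rather than a definition given in the text.

```python
# Hypothetical sketch of the family-expansion filter. "No more than one-fold"
# is read as: (expanded size - original size) / original size <= 1.
from typing import Dict, Iterable, List, Set, Tuple


def expand_and_filter(families: List[Set[str]],
                      lp_hits: Iterable[Tuple[str, str]],
                      max_fold_increase: float = 1.0) -> List[Set[str]]:
    neighbors: Dict[str, Set[str]] = {}
    for query, subject in lp_hits:  # hits assumed pre-filtered by E-value
        neighbors.setdefault(query, set()).add(subject)
    kept = []
    for fam in families:
        expanded = set(fam)
        for member in fam:  # single expansion step, not transitive
            expanded |= neighbors.get(member, set())
        growth = (len(expanded) - len(fam)) / len(fam)
        if growth <= max_fold_increase:
            kept.append(expanded)
    return kept


fams = [{"cas1", "cas2"}, {"x1"}]
hits = [("cas1", "lp9"), ("cas2", "lp9"), ("x1", "lp1"), ("x1", "lp2")]
print(expand_and_filter(fams, hits) == [{"cas1", "cas2", "lp9"}])  # True
```

In this toy example, the single-member family grows from 1 to 3 proteins (a two-fold increase) and is discarded.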

We used the Pfam protein family database (Finn et al., 2014) and the NCBI NR database to functionally annotate the candidate Cas proteins and to filter out proteins with known functions. Multiple sequence alignments were then performed for each candidate Cas effector protein family with MAFFT (Katoh and Standley, 2013). JPred and HHpred were then used to analyze the conserved regions of these proteins, to identify candidate Cas proteins/families containing two conserved RXXXXH motifs.
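As a first-pass screen before alignment-based analysis, the RXXXXH pattern named above can be found with a regular expression. The Python sketch below is hypothetical: it scans single sequences only and cannot assess conservation across a family, which is what the alignment, JPred, and HHpred steps address. A lookahead is used so overlapping matches are all reported. The example fragment is taken from SEQ ID NO: 1.

```python
# Hypothetical sketch: report every R-X4-H match ("RXXXXH") in a protein and
# flag proteins with at least two. Single-sequence scan only; conservation
# across a family is not assessed.
import re
from typing import List, Tuple

MOTIF = re.compile(r"(?=(R[A-Z]{4}H))")  # lookahead keeps overlapping hits


def find_rx4h(protein: str) -> List[Tuple[int, str]]:
    """Return (0-based position, motif) for each R-X4-H match."""
    return [(m.start(), m.group(1)) for m in MOTIF.finditer(protein.upper())]


def has_two_motifs(protein: str) -> bool:
    return len(find_rx4h(protein)) >= 2


print(find_rx4h("AEALRNYFSHYRHSPG"))  # [(4, 'RNYFSH')]
```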

The above analysis identified seven new Cas13 effector proteins belonging to two new Cas13 families that differ from all previously identified class 2 CRISPR-Cas systems: Cas13e.1 (SEQ ID NO: 1) and Cas13e.2 (SEQ ID NO: 2) in the new Cas13e family, and Cas13f.1 (SEQ ID NO: 3), Cas13f.2 (SEQ ID NO: 4), Cas13f.3 (SEQ ID NO: 5), Cas13f.4 (SEQ ID NO: 6), and Cas13f.5 (SEQ ID NO: 7) in the new Cas13f family.

MAQVSKQTSKKRELSIDEYQGARKWCFTIAFNKALVNRDKNDGLFVESLLRHEKYSKHDWYDEDTRALIKCSTQAANAKAEALRNYFSHYRHSPGCLTFTAEDELRTIMERAYERAIFECRRRETEVIIEFPSLFEGDRITTAGVVFFVSFFVERRVLDRLYGAVSGLKKNEGQYKLTRKALSMYCLKDSRFTKAWDKRVLLFRDILAQLGRIPAEAYEYYHGEQGDKKRANDNEGTNPKRHKDKFIEFALHYLEAQHSEICFGRRHIVREEAGAGDEHKKHRTKGKVVVDFSKKDEDQSYYISKNNVIVRIDKNAGPRSYRMGLNELKYLVLLSLQGKGDDAIAKLYRYRQHVENILDVVKVTDKDNHVFLPRFVLEQHGIGRKAFKQRIDGRVKHVRGVWEKKKAATNEMTLHEKARDILQYVNENCTRSFNPGEYNRLLVCLVGKDVENFQAGLKRLQLAERIDGRVYSIFAQTSTINEMHQVVCDQILNRLCRIGDQKLYDYVGLGKKDEIDYKQKVAWFKEHISIRRGFLRKKFWYDSKKGFAKLVEEHLESGGGQRDVGLDKKYYHIDAIGRFEGANPALYETLARDRLCLMMAQYFLGSVRKELGNKIVWSNDSIELPVEGSVGNEKSIVFSVSDYGKLYVLDDAEFLGRICEYFMPHEKGKIRYHTVYEKGFRAYNDLQKKCVEAVLAFEEKVVKAKKMSEKEGAHYIDFREILAQTMCKEAEKTAVNKVRRAFFHHHLKFVIDEFGLFSDVMKKYGIEKEWKFPVK*(SEQ ID NO:1)

MKVENIKEKSKKAMYLINHYEGPKKWCFAIVLNRACDNYEDNPHLFSKSLLEFEKTSRKDWFDEETRELVEQADTEIQPNPNLKPNTTANRKLKDIRNYFSHHYHKNECLYFKNDDPIRCIMEAAYEKSKIYIKGKQIEQSDIPLPELFESSGWITPAGILLLASFFVERGILHRLMGNIGGFKDNRGEYGLTHDIFTTYCLKGSYSIRAQDHDAVMFRDILGYLSRVPTESFQRIKQPQIRKEGQLSERKTDKFITFALNYLEDYGLKDLEGCKACFARSKIVREQENVESINDKEYKPHENKKKVEIHFDQSKEDRFYINRNNVILKIQKKDGHSNIVRMGVYELKYLVLMSLVGKAKEAVEKIDNYIQDLRDQLPYIEGKNKEEIKEYVRFFPRFIRSHLGLLQINDEEKIKARLDYVKTKWLDKKEKSKELELHKKGRDILRYINERCDRELNRNVYNRILELLVSKDLTGFYRELEELKRTRRIDKNIVQNLSGQKTINALHEKVCDLVLKEIESLDTENLRKYLGLIPKEEKEVTFKEKVDRILKQPVIYKGFLRYQFFKDDKKSFVLLVEDALKEKGGGCDVPLGKEYYKIVSLDKYDKENKTLCETLAMDRLCLMMARQYYLSLNAKLAQEAQQIEWKKEDSIELIIFTLKNPDQSKQSFSIRFSVRDFTKLYVTDDPEFLARLCSYFFPVEKEIEYHKLYSEGINKYTNLQKEGIEAILELEKKLIERNRIQSAKNYLSFNEIMNKSGYNKDEQDDLKKVRNSLLHYKLIFEKEHLKKFYEVMRGEGIEKKWSLIV*(SEQ ID NO:2)

MNGIELKKEEAAFYFNQAELNLKAIEDNIFDKERRKTLLNNPQILAKMENFIFNFRDVTKNAKGEIDCLLLKLRELRNFYSHYVHKRDVRELSKGEKPILEKYYQFAIESTGSENVKLEIIENDAWLADAGVLFFLCIFLKKSQANKLISGISGFKRNDDTGQPRRNLFTYFSIREGYKVVPEMQKHFLLFSLVNHLSNQDDYIEKAHQPYDIGEGLFFHRIASTFLNISGILRNMKFYTYQSKRLVEQRGELKREKDIFAWEEPFQGNSYFEINGHKGVIGEDELKELCYAFLIGNQDANKVEGRITQFLEKFRNANSVQQVKDDEMLKPEYFPANYFAESGVGRIKDRVLNRLNKAIKSNKAKKGEIIAYDKMREVMAFINNSLPVDEKLKPKDYKRYLGMVRFWDREKDNIKREFETKEWSKYLPSNFWTAKNLERVYGLAREKNAELFNKLKADVEKMDERELEKYQKINDAKDLANLRRLASDFGVKWEEKDWDEYSGQIKKQITDSQKLTIMKQRITAGLKKKHGIENLNLRITIDINKSRKAVLNRIAIPRGFVKRHILGWQESEKVSKKIREAECEILLSKEYEELSKQFFQSKDYDKMTRINGLYEKNKLIALMAVYLMGQLRILFKEHTKLDDITKTTVDFKISDKVTVKIPFSNYPSLVYTMSSKYVDNIGNYGFSNKDKDKPILGKIDVIEKQRMEFIKEVLGFEKYLFDDKIIDKSKFADTATHISFAEIVEELVEKGWDKDRLTKLKDARNKALHGEILTGTSFDETKSLINELKK*(SEQ ID NO:3)

MSPDFIKLEKQEAAFYFNQTELNLKAIESNILDKQQRMILLNNPRILAKVGNFIFNFRDVTKNAKGEIDCLLFKLEELRNFYSHYVHTDNVKELSNGEKPLLERYYQIAIQATRSEDVKFELFETRNENKITDAGVLFFLCMFLKKSQANKLISGISGFKRNDPTGQPRRNLFTYFSAREGYKALPDMQKHFLLFTLVNYLSNQDEYISELKQYGEIGQGAFFNRIASTFLNISGISGNTKFYSYQSKRIKEQRGELNSEKDSFEWIEPFQGNSYFEINGHKGVIGEDELKELCYALLVAKQDINAVEGKIMQFLKKFRNTGNLQQVKDDEMLEIEYFPASYFNESKKEDIKKEILGRLDKKIRSCSAKAEKAYDKMKEVMEFINNSLPAEEKLKRKDYRRYLKMVRFWSREKGNIEREFRTKEWSKYFSSDFWRKNNLEDVYKLATQKNAELFKNLKAAAEKMGETEFEKYQQINDVKDLASLRRLTQDFGLKWEEKDWEEYSEQIKKQITDRQKLTIMKQRVTAELKKKHGIENLNLRITIDSNKSRKAVLNRIAIPRGFVKKHILGWQGSEKISKNIREAECKILLSKKYEELSRQFFEAGNFDKLTQINGLYEKNKLTAFMSVYLMGRLNIQLNKHTELGNLKKTEVDFKISDKVTEKIPFSQYPSLVYAMSRKYVDNVDKYKFSHQDKKKPFLGKIDSIEKERIEFIKEVLDFEEYLFKNKVIDKSKFSDTATHISFKEICDEMGKKGCNRNKLTELNNARNAALHGEIPSETSFREAKPLINELKK*(SEQ ID NO:4)

MSPDFIKLEKQEAAFYFNQTELNLKAIESNIFDKQQRVILLNNPQILAKVGDFIFNFRDVTKNAKGEIDCLLLKLRELRNFYSHYVYTDDVKILSNGERPLLEKYYQFAIEATGSENVKLEIIESNNRLTEAGVLFFLCMFLKKSQANKLISGISGFKRNDPTGQPRRNLFTYFSVREGYKVVPDMQKHFLLFVLVNHLSGQDDYIEKAQKPYDIGEGLFFHRIASTFLNISGILRNMEFYIYQSKRLKEQQGELKREKDIFPWIEPFQGNSYFEINGNKGIIGEDELKELCYALLVAGKDVRAVEGKITQFLEKFKNADNAQQVEKDEMLDRNNFPANYFAESNIGSIKEKILNRLGKTDDSYNKTGTKIKPYDMMKEVMEFINNSLPADEKLKRKDYRRYLKMVRIWDSEKDNIKREFESKEWSKYFSSDFWMAKNLERVYGLAREKNAELFNKLKAVVEKMDEREFEKYRLINSAEDLASLRRLAKDFGLKWEEKDWQEYSGQIKKQISDRQKLTIMKQRITAELKKKHGIENLNLRITIDSNKSRKAVLNRIAVPRGFVKEHILGWQGSEKVSKKTREAKCKILLSKEYEELSKQFFQTRNYDKMTQVNGLYEKNKLLAFMVVYLMERLNILLNKPTELNELEKAEVDFKISDKVMAKIPFSQYPSLVYAMSSKYADSVGSYKFENDEKNKPFLGKIDTIEKQRMEFIKEVLGFEEYLFEKKIIDKSEFADTATHISFDEICNELIKKGWDKDKLTKLKDARNAALHGEIPAETSFREAKPLINGLKK*(SEQ ID NO:5)

MNIIKLKKEEAAFYFNQTILNLSGLDEIIEKQIPHIISNKENAKKVIDKIFNNRLLLKSVENYIYNFKDVAKNARTEIEAILLKLVELRNFYSHYVHNDTVKILSNGEKPILEKYYQIAIEATGSKNVKLVIIENNNCLTDSGVLFLLCMFLKKSQANKLISSVSGFKRNDKEGQPRRNLFTYYSVREGYKVVPDMQKHFLLFALVNHLSEQDDHIEKQQQSDELGKGLFFHRIASTFLNESGIFNKMQFYTYQSNRLKEKRGELKHEKDTFTWIEPFQGNSYFTLNGHKGVISEDQLKELCYTILIEKQNVDSLEGKIIQFLKKFQNVSSKQQVDEDELLKREYFPANYFGRAGTGTLKEKILNRLDKRMDPTSKVTDKAYDKMIEVMEFINMCLPSDEKLRQKDYRRYLKMVRFWNKEKHNIKREFDSKKWTRFLPTELWNKRNLEEAYQLARKENKKKLEDMRNQVRSLKENDLEKYQQINYVNDLENLRLLSQELGVKWQEKDWVEYSGQIKKQISDNQKLTIMKQRITAELKKMHGIENLNLRISIDTNKSRQTVMNRIALPKGFVKNHIQQNSSEKISKRIREDYCKIELSGKYEELSRQFFDKKNFDKMTLINGLCEKNKLIAFMVIYLLERLGFELKEKTKLGELKQTRMTYKISDKVKEDIPLSYYPKLVYAMNRKYVDNIDSYAFAAYESKKAILDKVDIIEKQRMEFIKQVLCFEEYIFENRIIEKSKFNDEETHISFTQIHDELIKKGRDTEKLSKLKHARNKALHGEIPDGTSFEKAKLLINEIKK*(SEQ ID NO:6)

MNAIELKKEEAAFYFNQARLNISGLDEIIEKQLPHIGSNRENAKKTVDMILDNPEVLKKMENYVFNSRDIAKNARGELEALLLKLVELRNFYSHYVHKDDVKTLSYGEKPLLDKYYEIAIEATGSKDVRLEIIDDKNKLTDAGVLFLLCMFLKKSEANKLISSIRGFKRNDKEGQPRRNLFTYYSVREGYKVVPDMQKHFLLFTLVNHLSNQDEYISNLRPNQEIGQGGFFHRIASKFLSDSGILHSMKFYTYRSKRLTEQRGELKPKKDHFTWIEPFQGNSYFSVQGQKGVIGEEQLKELCYVLLVAREDFRAVEGKVTQFLKKFQNANNVQQVEKDEVLEKEYFPANYFENRDVGRVKDKILNRLKKITESYKAKGREVKAYDKMKEVMEFINNCLPTDENLKLKDYRRYLKMVRFWGREKENIKREFDSKKWERFLPRELWQKRNLEDAYQLAKEKNTELFNKLKTTVERMNELEFEKYQQINDAKDLANLRQLARDFGVKWEEKDWQEYSGQIKKQITDRQKLTIMKQRITAALKKKQGIENLNLRITTDTNKSRKVVLNRIALPKGFVRKHILKTDIKISKQIRQSQCPIILSNNYMKLAKEFFEERNFDKMTQINGLFEKNVLIAFMIVYLMEQLNLRLGKNTELSNLKKTEVNFTITDKVTEKVQISQYPSLVFAINREYVDGISGYKLPPKKPKEPPYTFFEKIDAIEKERMEFIKQVLGFEEHLFEKNVIDKTRFTDTATHISFNEICDELIKKGWDENKIIKLKDARNAALHGKIPEDTSFDEAKVLINELKK*(SEQ ID NO:7)

The DNA sequences encoding the direct repeat (DR) sequences of the corresponding pre-crRNAs are SEQ ID NOs: 8-14.

GCTGGAGCAGCCCCCGATTTGTGGGGTGATTACAGC(SEQ ID NO:8)

GCTGAAGAAGCCTCCGATTTGAGAGGTGATTACAGC(SEQ ID NO:9)

GCTGTGATAGACCTCGATTTGTGGGGTAGTAACAGC(SEQ ID NO:10)

GCTGTGATAGACCTCGATTTGTGGGGTAGTAACAGC(SEQ ID NO:11)

GCTGTGATAGACCTCGATTTGTGGGGTAGTAACAGC(SEQ ID NO:12)

GCTGTGATGGGCCTCAATTTGTGGGGAAGTAACAGC(SEQ ID NO:13)

GCTGTGATAGGCCTCGATTTGTGGGGTAGTAACAGC(SEQ ID NO:14)
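The DR-encoding DNA sequences above can be handled programmatically. In the Python sketch below, identical entries are grouped (as listed, SEQ ID NOs: 10-12 are the same sequence), a DR is converted to RNA, and it is placed 3' of a spacer, following the spacer-then-DR order recited in claim 1. This is a hypothetical illustration: it ignores any processing of the pre-crRNA into mature crRNA, and the spacer used in the test is arbitrary.

```python
# Hypothetical sketch using the DR-encoding DNA sequences SEQ ID NOs: 8-14:
# group identical entries, convert a DR to RNA, and place it 3' of a spacer.
# Pre-crRNA processing is ignored.
from collections import defaultdict

DR_DNA = {
    8: "GCTGGAGCAGCCCCCGATTTGTGGGGTGATTACAGC",
    9: "GCTGAAGAAGCCTCCGATTTGAGAGGTGATTACAGC",
    10: "GCTGTGATAGACCTCGATTTGTGGGGTAGTAACAGC",
    11: "GCTGTGATAGACCTCGATTTGTGGGGTAGTAACAGC",
    12: "GCTGTGATAGACCTCGATTTGTGGGGTAGTAACAGC",
    13: "GCTGTGATGGGCCTCAATTTGTGGGGAAGTAACAGC",
    14: "GCTGTGATAGGCCTCGATTTGTGGGGTAGTAACAGC",
}


def group_identical(seqs):
    """Lists of SEQ ID numbers that share an identical sequence."""
    groups = defaultdict(list)
    for seq_id, seq in seqs.items():
        groups[seq].append(seq_id)
    return [ids for ids in groups.values() if len(ids) > 1]


def to_rna(dna: str) -> str:
    return dna.upper().replace("T", "U")


def assemble_crrna(spacer_rna: str, dr_dna: str) -> str:
    """Spacer followed by the DR, i.e. the DR lies 3' of the spacer."""
    return spacer_rna.upper() + to_rna(dr_dna)


print(group_identical(DR_DNA))  # [[10, 11, 12]]
```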

The natural (wild-type) DNA coding sequences of the Cas13e.1, Cas13e.2, Cas13f.1, Cas13f.2, Cas13f.3, Cas13f.4, and Cas13f.5 proteins are SEQ ID NOs: 15-21, respectively.

ATGGCGCAAGTGTCAAAGCAGACTTCGAAAAAGAGAGAGTTGTCTATCGATGAATATCAAGGTGCTCGGAAATGGTGTTTTACGATTGCCTTCAACAAGGCTCTTGTGAATCGAGATAAGAACGACGGGCTTTTTGTCGAGTCGCTGTTACGCCATGAAAAGTATTCAAAGCACGACTGGTACGATGAGGATACACGCGCTTTGATCAAGTGTAGCACACAAGCGGCCAATGCGAAGGCCGAGGCGTTAAGAAACTATTTCTCCCACTATCGACATTCGCCCGGGTGTCTGACATTTACAGCAGAAGATGAGTTGCGGACAATCATGGAAAGGGCGTATGAGCGGGCGATCTTTGAATGCAGGAGACGCGAAACTGAAGTGATCATCGAGTTTCCCAGCCTGTTCGAAGGCGACCGGATCACTACGGCGGGGGTTGTGTTTTTCGTTTCGTTCTTTGTTGAACGGCGGGTGCTGGATCGTTTGTACGGTGCGGTAAGTGGGCTTAAGAAAAACGAAGGACAGTACAAGCTGACTCGGAAGGCGCTTTCGATGTATTGCCTGAAAGACAGTCGTTTCACGAAGGCGTGGGACAAACGCGTGCTGCTTTTCAGGGATATACTCGCGCAGCTTGGACGCATCCCTGCGGAGGCGTATGAATACTACCACGGAGAGCAGGGCGACAAGAAAAGAGCAAACGACAATGAGGGGACGAATCCGAAACGCCATAAAGACAAGTTCATCGAGTTTGCACTGCATTATCTGGAGGCGCAACACAGTGAGATATGCTTCGGGCGGCGACACATTGTCAGGGAGGAGGCCGGGGCAGGCGACGAACACAAAAAGCACAGGACCAAAGGCAAGGTAGTTGTCGACTTTTCAAAAAAAGACGAAGATCAGTCATACTATATCAGTAAGAACAATGTTATCGTCAGGATTGATAAGAATGCCGGGCCTCGGAGTTATCGCATGGGGCTTAACGAATTGAAATACCTTGTATTGCTTAGCCTTCAGGGAAAGGGCGACGATGCGATTGCAAAACTGTACAGGTATCGGCAGCATGTGGAGAACATTCTGGATGTAGTGAAGGTCACAGATAAGGATAATCACGTCTTCCTGCCGCGATTTGTGCTGGAGCAACATGGGATTGGCAGGAAAGCTTTTAAGCAAAGAATAGACGGCAGAGTAAAGCATGTTCGAGGGGTGTGGGAAAAGAAGAAGGCGGCGACCAACGAGATGACACTTCACGAGAAGGCGCGGGACATTCTTCAATACGTAAATGAAAATTGCACGAGGTCTTTCAATCCCGGCGAGTACAACCGGCTGCTGGTGTGTCTGGTTGGCAAGGATGTTGAGAATTTTCAGGCGGGACTGAAACGCCTGCAACTGGCCGAGCGAATCGACGGGCGGGTATATTCAATTTTTGCGCAGACCTCCACAATAAACGAGATGCATCAGGTGGTGTGTGATCAGATTCTCAACAGACTTTGCCGAATCGGCGATCAGAAGCTCTACGATTATGTGGGGCTTGGGAAGAAGGATGAAATAGATTACAAGCAGAAGGTTGCATGGTTCAAGGAGCATATTTCTATCCGCAGGGGTTTCTTGCGCAAGAAGTTCTGGTATGACAGCAAGAAGGGATTCGCGAAGCTTGTGGAAGAGCATTTGGAAAGCGGCGGCGGACAGAGGGACGTTGGGCTGGATAAAAAGTATTATCATATTGATGCGATTGGGCGATTCGAGGGTGCTAATCCAGCCTTGTATGAAACGCTGGCGCGAGACCGTTTGTGTCTGATGATGGCGCAATACTTCCTGGGGAGTGTACGCAAGGAATTGGGTAATAAAATTGTGTGGTCGAATGATAGCATCGAGTTGCCCGTGGAGGGCTCAGTGGGTAACGAAAAAAGCATCGTCTTCTCAGTGAGTGATTACGGCAAGTTATATGTGTTGGATGACGCTGAGTTTCTTGGGCGGATATGTGAGTACTTTATGCCGCACGAAAA
AGGGAAGATACGGTATCATACAGTTTACGAAAAAGGGTTTAGGGCATATAATGATCTGCAGAAGAAATGTGTCGAGGCGGTGCTGGCGTTTGAAGAGAAGGTTGTCAAAGCCAAAAAGATGAGCGAGAAGGAAGGGGCGCATTATATTGATTTTCGTGAGATACTGGCACAAACAATGTGTAAAGAGGCGGAGAAGACCGCCGTGAATAAGGTGCGTAGAGCGTTTTTCCATCATCATTTAAAGTTTGTGATAGATGAATTTGGGTTGTTTAGTGATGTTATGAAGAAATATGGAATTGAAAAGGAGTGGAAGTTTCCTGTTAAATGA(SEQ ID NO:15)

ATGAAGGTTGAAAATATTAAAGAAAAAAGCAAAAAAGCAATGTATTTAATCAACCATTATGAGGGACCCAAAAAATGGTGTTTTGCAATAGTTCTGAATAGGGCATGTGATAATTACGAGGACAATCCACACTTGTTTTCCAAATCACTTTTGGAATTTGAAAAAACAAGTCGAAAAGATTGGTTTGACGAAGAAACACGAGAGCTTGTTGAGCAAGCAGATACAGAAATACAGCCAAATCCTAACCTGAAACCTAATACAACAGCTAACCGAAAACTCAAAGATATAAGAAACTATTTTTCGCATCATTATCACAAGAACGAATGCCTGTATTTTAAGAACGATGATCCCATACGCTGCATTATGGAAGCGGCGTATGAAAAATCTAAAATTTATATCAAAGGAAAGCAGATTGAGCAAAGCGATATACCATTGCCCGAATTGTTTGAAAGCAGCGGTTGGATTACACCGGCGGGGATTTTGTTACTGGCATCCTTTTTTGTTGAACGAGGGATTCTACATCGCTTGATGGGAAATATCGGAGGATTTAAAGATAATCGAGGCGAATACGGTCTTACACACGATATTTTTACCACCTATTGTCTTAAGGGTAGTTATTCAATTCGGGCGCAGGATCATGATGCGGTAATGTTCAGAGATATTCTCGGCTATCTGTCACGAGTTCCCACTGAGTCATTTCAGCGTATCAAGCAACCTCAAATACGAAAAGAAGGCCAATTAAGTGAAAGAAAGACGGACAAATTTATAACATTTGCACTAAATTATCTTGAGGATTATGGGCTGAAAGATTTGGAAGGCTGCAAAGCCTGTTTTGCCAGAAGTAAAATTGTAAGGGAACAAGAAAATGTTGAAAGCATAAATGATAAGGAATACAAACCTCACGAGAACAAAAAGAAAGTTGAAATTCACTTCGATCAGAGCAAAGAAGACCGATTTTATATTAATCGCAATAACGTTATTTTGAAGATTCAGAAGAAAGATGGACATTCCAACATAGTTAGGATGGGAGTATATGAACTTAAATATCTCGTTCTTATGAGTTTAGTGGGAAAAGCAAAAGAAGCAGTTGAAAAAATTGACAACTATATCCAGGATTTGCGAGACCAGTTGCCTTACATAGAGGGGAAAAATAAGGAAGAGATTAAAGAATACGTCAGGTTCTTTCCACGATTTATACGTTCTCACCTCGGTTTACTACAGATTAACGATGAAGAAAAGATAAAAGCTCGATTAGATTATGTTAAGACCAAGTGGTTAGATAAAAAGGAAAAATCGAAAGAGCTTGAACTTCATAAAAAAGGACGGGACATCCTCAGGTATATCAACGAGCGATGTGATAGAGAGCTTAACAGGAATGTATATAACCGTATTTTAGAGCTCCTGGTCAGCAAAGACCTCACTGGTTTTTATCGTGAGCTTGAAGAACTAAAAAGAACAAGGCGGATAGATAAAAATATTGTCCAGAATCTTTCTGGGCAAAAAACCATTAATGCACTGCATGAAAAGGTCTGTGATCTGGTGCTGAAGGAAATCGAAAGTCTCGATACAGAAAATCTCAGGAAATATCTTGGATTGATACCCAAAGAAGAAAAAGAGGTCACTTTCAAAGAAAAGGTCGATAGGATTTTGAAACAGCCAGTTATTTACAAAGGGTTTCTGAGATACCAATTCTTCAAAGATGACAAAAAGAGTTTTGTCTTACTTGTTGAAGACGCATTGAAGGAAAAAGGAGGAGGTTGTGATGTTCCTCTTGGGAAAGAGTATTATAAAATCGTGTCACTTGATAAGTATGATAAAGAAAATAAAACCCTGTGTGAAACTCTGGCGATGGATAGGCTTTGCCTTATGATGGCAAGACAATATTATCTCAGTCTGAATGCAAAACTTGCACAGGAAGCTCAGCAAATCGAATGGAAGAAAGAAGATAGTATAGAATTGATTATTTTCACCTTAAAAAATCCCGATCAATCAAAGCAGAG
TTTTTCTATACGGTTTTCGGTCAGAGATTTTACGAAGTTGTATGTAACGGATGATCCTGAATTTCTGGCCCGGCTTTGTTCCTACTTTTTCCCAGTTGAAAAAGAGATTGAATATCACAAGCTCTATTCAGAAGGGATAAATAAATACACAAACCTGCAAAAAGAGGGAATCGAAGCAATACTCGAGCTTGAAAAAAAGCTTATTGAACGAAATCGGATTCAATCTGCAAAAAATTATCTCTCATTTAATGAGATAATGAATAAAAGCGGTTATAATAAAGATGAGCAGGATGATCTAAAGAAGGTGCGAAATTCTCTTTTGCATTATAAGCTTATCTTTGAGAAAGAACATCTCAAGAAGTTCTATGAGGTTATGAGAGGAGAAGGGATAGAGAAAAAGTGGTCTTTAATAGTATGA(SEQ ID NO:16)

ATGAATGGCATTGAATTAAAAAAAGAAGAAGCAGCATTTTATTTTAATCAGGCAGAGCTTAATTTAAAAGCCATAGAAGACAATATTTTTGATAAAGAAAGACGAAAGACTCTGCTTAATAATCCACAGATACTTGCCAAAATGGAAAATTTCATTTTCAATTTCAGAGATGTAACAAAAAATGCAAAAGGGGAAATTGACTGCTTGCTGTTGAAACTAAGAGAGCTGAGAAACTTTTACTCGCATTATGTCCACAAACGAGATGTAAGAGAATTAAGCAAGGGCGAGAAACCTATACTTGAAAAGTATTACCAATTTGCGATTGAATCAACCGGAAGTGAAAATGTTAAACTTGAGATAATAGAAAACGACGCGTGGCTTGCAGATGCCGGTGTGTTGTTTTTCTTATGTATTTTTTTGAAGAAATCTCAGGCAAATAAGCTTATAAGCGGTATCAGCGGTTTTAAAAGAAACGATGATACCGGTCAGCCGAGAAGGAATTTATTTACCTATTTCAGTATAAGGGAGGGATACAAGGTTGTTCCGGAAATGCAGAAACATTTCCTTTTGTTTTCTCTTGTTAATCATCTCTCTAATCAAGATGATTATATTGAAAAAGCGCATCAGCCATACGATATAGGCGAGGGTTTATTTTTTCATCGAATAGCTTCTACATTTCTTAATATAAGTGGGATTTTAAGAAATATGAAATTCTATACCTATCAGAGTAAAAGGTTAGTAGAGCAGCGGGGAGAACTCAAACGAGAAAAGGATATTTTTGCGTGGGAAGAACCGTTTCAAGGAAATAGTTATTTTGAAATAAATGGTCATAAAGGAGTAATCGGTGAAGATGAATTGAAGGAACTATGTTATGCATTTCTGATTGGCAATCAAGATGCTAATAAAGTGGAAGGCAGGATTACACAATTTCTAGAAAAGTTTAGAAATGCGAACAGTGTGCAACAAGTTAAAGATGATGAAATGCTAAAACCAGAGTATTTTCCTGCAAATTATTTTGCTGAATCAGGCGTCGGAAGAATAAAGGATAGAGTGCTTAATCGTTTGAATAAAGCGATTAAAAGCAATAAGGCCAAGAAAGGAGAGATTATAGCATACGATAAGATGAGAGAGGTTATGGCGTTCATAAATAATTCTCTGCCGGTAGATGAAAAATTGAAACCAAAAGATTACAAACGATATCTGGGAATGGTTCGTTTCTGGGACAGGGAAAAAGATAACATAAAGCGGGAGTTCGAGACAAAAGAATGGTCTAAATATCTTCCATCTAATTTCTGGACGGCAAAAAACCTTGAAAGGGTCTATGGTCTGGCAAGAGAGAAAAACGCAGAATTATTCAATAAACTAAAAGCGGATGTAGAAAAAATGGACGAACGGGAACTTGAGAAGTATCAGAAGATAAATGATGCAAAGGATTTGGCAAATTTACGCCGGCTTGCAAGCGACTTTGGTGTGAAGTGGGAAGAAAAAGACTGGGATGAGTATTCAGGACAGATAAAAAAACAAATTACAGACAGCCAGAAACTAACAATAATGAAGCAGCGGATAACCGCAGGACTAAAGAAAAAGCACGGCATAGAAAATCTTAACCTGAGAATAACTATCGACATCAATAAAAGCAGAAAGGCAGTTTTGAACAGAATTGCGATTCCGAGGGGTTTTGTAAAAAGGCATATTTTAGGATGGCAAGAGTCTGAGAAGGTATCGAAAAAGATAAGAGAGGCAGAATGCGAAATTCTGCTGTCGAAAGAATACGAAGAACTATCGAAACAATTTTTCCAAAGCAAAGATTATGACAAAATGACACGGATAAATGGCCTTTATGAAAAAAACAAACTTATAGCCCTGATGGCAGTTTATCTAATGGGGCAATTGAGAATCCTGTTTAAAGAACACACAAAACTTGACGATATTACGAAAACAACTGTGGATTTCAAAATATCTGATAAGGTGACGGTAAAAATCCCCTTTTCAAATTATCC
TTCGCTCGTTTATACAATGTCCAGTAAGTATGTTGATAATATAGGGAATTATGGATTTTCCAACAAAGATAAAGACAAGCCGATTTTAGGTAAGATTGATGTAATAGAAAAACAGCGAATGGAATTTATAAAAGAGGTTCTTGGTTTTGAAAAATATCTTTTTGATGATAAAATAATAGATAAAAGCAAATTTGCTGATACAGCGACTCATATAAGTTTTGCAGAAATAGTTGAGGAGCTTGTTGAAAAAGGATGGGACAAAGACAGACTGACAAAACTTAAAGATGCAAGAAATAAAGCCCTGCATGGTGAAATACTGACGGGAACCAGCTTTGATGAAACAAAATCATTGATAAACGAATTAAAAAAATGA(SEQ ID NO:17)

ATGTCCCCAGATTTCATCAAATTAGAAAAACAGGAAGCAGCTTTTTACTTTAATCAGACAGAGCTTAATTTAAAAGCCATAGAAAGCAATATTTTAGACAAACAACAGCGAATGATTCTGCTTAATAATCCACGGATACTTGCCAAAGTAGGAAATTTCATTTTCAATTTCAGAGATGTAACAAAAAATGCAAAAGGAGAAATAGACTGTCTGCTATTTAAACTGGAAGAGCTAAGAAACTTTTACTCGCATTATGTTCATACCGACAATGTAAAGGAATTGAGTAACGGAGAAAAACCCCTACTGGAAAGATATTATCAAATCGCTATTCAGGCAACCAGGAGTGAGGATGTTAAGTTCGAATTGTTTGAAACAAGAAACGAGAATAAGATTACGGATGCCGGTGTATTGTTTTTCTTATGTATGTTTTTAAAAAAATCACAGGCAAACAAGCTTATAAGCGGTATCAGCGGCTTCAAAAGAAATGATCCAACAGGCCAGCCGAGAAGAAACTTATTTACCTATTTCAGTGCAAGAGAAGGATATAAGGCTTTGCCTGATATGCAGAAACATTTTCTTCTTTTTACTCTGGTTAATTATTTGTCGAATCAGGATGAGTATATCAGCGAGCTTAAACAATATGGAGAGATTGGTCAAGGAGCCTTTTTTAATCGAATAGCTTCAACATTTTTGAATATCAGCGGGATTTCAGGAAATACGAAATTCTATTCGTATCAAAGTAAAAGGATAAAAGAGCAGCGAGGCGAACTCAATAGCGAAAAGGACAGCTTTGAATGGATAGAGCCTTTCCAAGGAAACAGCTATTTTGAAATAAATGGGCATAAAGGAGTAATCGGCGAAGACGAATTAAAAGAACTTTGTTATGCATTGTTGGTTGCCAAGCAAGATATTAATGCCGTTGAAGGCAAAATTATGCAATTCCTGAAAAAGTTTAGAAATACTGGCAATTTGCAGCAAGTTAAAGATGATGAAATGCTGGAAATAGAATATTTTCCCGCAAGTTATTTTAATGAATCAAAAAAAGAGGACATAAAGAAAGAGATTCTTGGCCGGCTGGATAAAAAGATTCGCTCCTGCTCTGCAAAGGCAGAAAAAGCCTATGATAAGATGAAAGAGGTGATGGAGTTTATAAATAATTCTCTGCCGGCAGAGGAAAAATTGAAACGCAAAGATTATAGAAGATATCTAAAGATGGTTCGTTTCTGGAGCAGAGAAAAAGGCAATATAGAGCGGGAATTTAGAACAAAGGAATGGTCAAAATATTTTTCATCTGATTTTTGGCGGAAGAACAATCTTGAAGATGTGTACAAACTGGCAACACAAAAAAACGCTGAACTGTTCAAAAATCTAAAAGCGGCAGCAGAGAAAATGGGTGAAACGGAATTTGAAAAGTATCAGCAGATAAACGATGTAAAGGATTTGGCAAGTTTAAGGCGGCTTACGCAAGATTTTGGTTTGAAGTGGGAAGAAAAGGACTGGGAGGAGTATTCCGAGCAGATAAAAAAACAAATTACGGACAGGCAGAAACTGACAATAATGAAACAAAGGGTTACGGCTGAACTAAAGAAAAAGCACGGCATAGAAAATCTTAATCTGAGAATAACCATCGACAGCAATAAAAGCAGAAAGGCGGTTTTGAACAGAATAGCAATTCCAAGAGGATTTGTAAAAAAACATATTTTAGGCTGGCAGGGATCTGAGAAGATATCGAAAAATATAAGGGAAGCAGAATGCAAAATTCTGCTATCGAAAAAATATGAAGAGTTATCAAGGCAGTTTTTTGAAGCCGGTAATTTCGATAAGCTGACGCAGATAAATGGTCTTTATGAAAAGAATAAACTTACAGCTTTTATGTCAGTATATTTGATGGGTCGGTTGAATATTCAGCTTAATAAGCACACAGAACTTGGAAATCTTAAAAAAACAGAGGTGGATTTTAAGATATCTGATAAGGTGACTGAAAAAATACCGTTTTCTCA
GTATCCTTCGCTTGTCTATGCGATGTCTCGCAAATATGTTGACAATGTGGATAAATATAAATTTTCTCATCAAGATAAAAAGAAGCCATTTTTAGGTAAAATTGATTCAATTGAAAAAGAACGTATTGAATTCATAAAAGAGGTTCTCGATTTTGAAGAGTATCTTTTTAAAAATAAGGTAATAGATAAAAGCAAATTTTCCGATACAGCGACTCATATTAGCTTTAAGGAAATATGTGATGAAATGGGTAAAAAAGGATGTAACCGAAACAAACTAACCGAACTTAACAACGCAAGGAACGCAGCCCTGCATGGTGAAATACCGTCGGAGACCTCTTTTCGTGAAGCAAAACCGTTGATAAATGAATTGAAAAAATGA(SEQ ID NO:18)

ATGTCCCCAGATTTCATCAAATTAGAAAAACAAGAAGCAGCTTTTTACTTTAATCAGACAGAGCTTAATTTAAAAGCCATAGAAAGCAATATTTTCGACAAACAACAGCGAGTGATTCTGCTTAATAATCCACAGATACTTGCCAAAGTAGGAGATTTTATTTTCAATTTCAGAGATGTAACAAAAAACGCAAAAGGAGAAATAGACTGTTTGCTATTGAAACTAAGAGAGCTGAGAAACTTTTACTCACACTATGTCTATACCGATGACGTGAAGATATTGAGTAACGGCGAAAGACCTCTGCTGGAAAAATATTATCAATTTGCGATTGAAGCAACCGGAAGTGAAAATGTTAAACTTGAAATAATAGAAAGCAACAACCGACTTACGGAAGCGGGCGTGCTGTTTTTCTTGTGTATGTTTTTGAAAAAGTCTCAGGCAAATAAGCTTATAAGCGGTATCAGCGGTTTTAAAAGAAATGACCCGACAGGTCAGCCGAGAAGGAATTTATTTACCTACTTCAGTGTAAGGGAGGGATACAAGGTTGTGCCGGATATGCAGAAACATTTTCTTTTGTTTGTTCTTGTCAATCATCTCTCTGGTCAGGATGATTATATTGAAAAGGCGCAAAAGCCATACGATATAGGCGAGGGTTTATTTTTTCATCGAATAGCTTCTACATTTCTTAATATCAGTGGGATTTTAAGAAATATGGAATTCTATATTTACCAGAGCAAAAGACTAAAGGAGCAGCAAGGAGAGCTCAAACGTGAAAAGGATATTTTTCCATGGATAGAGCCTTTCCAGGGAAATAGTTATTTTGAAATAAATGGTAATAAAGGAATAATCGGCGAAGATGAATTGAAAGAGCTTTGTTATGCGTTGCTGGTTGCAGGAAAAGATGTCAGAGCCGTCGAAGGTAAAATAACACAATTTTTGGAAAAGTTTAAAAATGCGGACAATGCTCAGCAAGTTGAAAAAGATGAAATGCTGGACAGAAACAATTTTCCCGCCAATTATTTCGCCGAATCGAACATCGGCAGCATAAAGGAAAAAATACTTAATCGTTTGGGAAAAACTGATGATAGTTATAATAAGACGGGGACAAAGATTAAACCATACGACATGATGAAAGAGGTAATGGAGTTTATAAATAATTCTCTTCCGGCAGATGAAAAATTGAAACGCAAAGATTACAGAAGATATCTAAAGATGGTTCGTATCTGGGACAGTGAGAAAGATAATATAAAGCGGGAGTTTGAAAGCAAAGAATGGTCAAAATATTTTTCATCTGATTTCTGGATGGCAAAAAATCTTGAAAGGGTCTATGGGTTGGCAAGAGAGAAAAACGCCGAATTATTCAATAAGCTAAAAGCGGTTGTGGAGAAAATGGACGAGCGGGAATTTGAGAAGTATCGGCTGATAAATAGCGCAGAGGATTTGGCAAGTTTAAGACGGCTTGCGAAAGATTTTGGCCTGAAGTGGGAAGAAAAGGACTGGCAAGAGTATTCTGGGCAGATAAAAAAACAAATTTCTGACAGGCAGAAACTGACAATAATGAAACAAAGGATTACGGCTGAACTAAAGAAAAAGCACGGCATAGAAAATCTCAATCTTAGAATAACCATCGACAGCAATAAAAGCAGAAAGGCAGTTTTGAACAGAATCGCAGTTCCAAGAGGTTTTGTGAAAGAGCATATTTTAGGATGGCAGGGGTCTGAGAAGGTATCGAAAAAGACAAGAGAAGCAAAGTGCAAAATTCTGCTCTCGAAAGAATATGAAGAATTATCAAAGCAATTTTTCCAAACCAGAAATTACGACAAGATGACGCAGGTAAACGGTCTTTACGAAAAGAATAAACTCTTAGCATTTATGGTCGTTTATCTTATGGAGCGGTTGAATATCCTGCTTAATAAGCCCACAGAACTTAATGAACTTGAAAAAGCAGAGGTGGATTTCAAGATATCTGATAAGGTGATGGCCAAAATCCCGTTTTCACA
GTATCCTTCGCTTGTGTACGCGATGTCCAGCAAATATGCTGATAGTGTAGGCAGTTATAAATTTGAGAATGATGAAAAAAACAAGCCGTTTTTAGGCAAGATCGATACAATAGAAAAACAACGAATGGAGTTTATAAAAGAAGTCCTTGGTTTTGAAGAGTATCTTTTTGAAAAGAAGATAATAGATAAAAGCGAATTTGCCGACACAGCGACTCATATAAGTTTTGATGAAATATGTAATGAGCTTATTAAAAAAGGATGGGATAAAGACAAACTAACCAAACTTAAAGATGCCAGGAACGCGGCCCTGCATGGCGAAATACCGGCGGAGACCTCTTTTCGTGAAGCAAAACCGTTGATAAATGGATTGAAAAAATGA(SEQ ID NO:19)

ATGAACATCATTAAATTAAAAAAAGAAGAAGCTGCGTTTTATTTTAATCAGACGATCCTCAATCTTTCAGGGCTTGATGAAATTATTGAAAAACAAATTCCGCACATAATCAGCAACAAGGAAAATGCAAAGAAAGTGATTGATAAGATTTTCAATAACCGCTTATTATTAAAAAGTGTGGAGAATTATATCTACAACTTTAAAGATGTGGCTAAAAACGCAAGAACTGAAATTGAGGCTATATTGTTGAAATTAGTAGAGCTACGTAATTTTTACTCACATTACGTTCATAATGATACCGTCAAGATACTAAGTAACGGTGAAAAACCTATACTGGAAAAATATTATCAAATTGCTATAGAAGCAACCGGAAGTAAAAATGTTAAACTTGTAATCATAGAAAACAACAACTGTCTCACGGATTCTGGCGTGCTGTTTTTGCTGTGTATGTTCTTAAAAAAATCACAGGCAAACAAGCTTATAAGTTCCGTTAGTGGTTTTAAAAGGAATGATAAAGAAGGACAACCGAGAAGAAATCTATTCACTTATTATAGTGTGAGGGAGGGATATAAGGTTGTGCCTGATATGCAGAAGCATTTCCTTCTATTCGCTCTGGTCAATCATCTATCTGAGCAGGATGATCATATTGAGAAGCAGCAGCAGTCAGACGAGCTCGGTAAGGGTTTGTTTTTCCATCGTATAGCTTCGACTTTTTTAAACGAGAGCGGCATCTTCAATAAAATGCAATTTTATACATATCAGAGCAACAGGCTAAAAGAGAAAAGAGGAGAACTCAAACACGAAAAGGATACCTTTACATGGATAGAGCCTTTTCAAGGCAATAGTTATTTTACGTTAAATGGACATAAGGGAGTGATTAGTGAAGATCAATTGAAGGAGCTTTGTTACACAATTTTAATTGAGAAGCAAAACGTTGATTCCTTGGAAGGTAAAATTATACAATTTCTCAAAAAATTTCAGAATGTCAGCAGCAAGCAGCAAGTTGACGAAGATGAATTGCTTAAAAGAGAATATTTCCCTGCAAATTACTTTGGCCGGGCAGGAACAGGGACCCTAAAAGAAAAGATTCTAAACCGGCTTGATAAGAGGATGGATCCTACATCTAAAGTGACGGATAAAGCTTATGACAAAATGATTGAAGTGATGGAATTTATCAATATGTGCCTTCCGTCTGATGAGAAGTTGAGGCAAAAGGATTATAGACGATACTTAAAGATGGTTCGTTTCTGGAATAAGGAAAAGCATAACATTAAGCGCGAGTTTGACAGTAAAAAATGGACGAGGTTTTTGCCGACGGAATTGTGGAATAAAAGAAATCTAGAAGAAGCCTATCAATTAGCACGGAAAGAGAACAAAAAGAAACTTGAAGATATGAGAAATCAAGTACGAAGCCTTAAAGAAAATGACCTTGAAAAATATCAGCAGATTAATTACGTTAATGACCTGGAGAATTTAAGGCTTCTGTCACAGGAGTTAGGTGTGAAATGGCAGGAAAAGGACTGGGTTGAATATTCCGGGCAGATAAAGAAGCAGATATCAGACAATCAGAAACTTACAATCATGAAACAAAGGATTACCGCTGAACTAAAGAAAATGCACGGCATCGAGAATCTTAATCTTAGAATAAGCATTGACACGAATAAAAGCAGGCAGACGGTTATGAACAGGATAGCTTTGCCCAAAGGTTTTGTGAAGAATCATATCCAGCAAAATTCGTCTGAGAAAATATCGAAAAGAATAAGAGAGGATTATTGTAAAATTGAGCTATCGGGAAAATATGAAGAACTTTCAAGGCAATTTTTTGATAAAAAGAATTTCGATAAGATGACACTGATAAACGGCCTTTGTGAAAAGAACAAACTTATCGCATTTATGGTTATCTATCTTTTGGAGCGGCTTGGATTTGAATTAAAGGAGAAAACAAAATTAGGCGAGCTTAAACAAACAAGGATGACATATAAAATATCCGATAAGGTAAA
AGAAGATATCCCGCTTTCCTATTACCCCAAGCTTGTGTATGCAATGAACCGAAAATATGTTGACAATATCGATAGTTATGCATTTGCGGCTTACGAATCCAAAAAAGCTATTTTGGATAAAGTGGATATCATAGAAAAGCAACGTATGGAATTTATCAAACAAGTTCTCTGTTTTGAGGAATATATTTTCGAAAATAGGATTATCGAAAAAAGCAAATTTAATGACGAGGAGACTCATATAAGTTTTACACAAATACATGATGAGCTTATTAAAAAAGGACGGGACACAGAAAAACTCTCTAAACTCAAACATGCAAGGAATAAAGCCTTGCACGGCGAGATTCCTGATGGGACTTCTTTTGAAAAAGCAAAGCTATTGATAAATGAAATCAAAAAATGA(SEQ ID NO:20)

ATGAATGCTATCGAACTAAAAAAAGAGGAAGCAGCATTTTATTTTAATCAGGCAAGACTCAACATTTCAGGACTTGATGAAATTATTGAAAAGCAGTTACCACATATAGGTAGTAACAGGGAGAATGCGAAAAAAACTGTTGATATGATTTTGGATAATCCCGAAGTCTTGAAGAAGATGGAAAATTATGTCTTTAACTCACGAGATATAGCAAAGAACGCAAGAGGTGAACTTGAAGCATTGTTGTTGAAATTAGTAGAACTGCGTAATTTTTATTCACATTATGTTCATAAAGATGATGTTAAGACATTGAGTTACGGAGAAAAACCTTTACTGGATAAATATTATGAAATTGCGATTGAAGCGACCGGAAGTAAAGATGTCAGACTTGAGATAATAGATGATAAAAATAAGCTTACAGATGCCGGTGTGCTTTTTTTATTGTGTATGTTTTTGAAAAAATCAGAGGCAAACAAACTTATCAGTTCAATCAGGGGCTTTAAAAGAAACGATAAAGAAGGCCAGCCGAGAAGAAATCTATTCACTTACTACAGTGTCAGAGAGGGATATAAGGTTGTGCCTGATATGCAGAAACATTTTCTTTTATTCACACTGGTTAACCATTTGTCAAATCAGGATGAATACATCAGTAATCTTAGGCCGAATCAAGAAATCGGCCAAGGGGGATTTTTCCATAGAATAGCATCAAAATTTTTGAGCGATAGCGGGATTTTACATAGTATGAAATTCTACACCTACCGGAGTAAAAGACTAACAGAACAACGGGGGGAGCTTAAGCCGAAAAAAGATCATTTTACATGGATAGAGCCTTTTCAGGGAAACAGTTATTTTTCAGTGCAGGGCCAAAAAGGAGTAATTGGTGAAGAGCAATTAAAGGAGCTTTGTTATGTATTGCTGGTTGCCAGAGAAGATTTTAGGGCCGTTGAGGGCAAAGTTACACAATTTCTGAAAAAGTTTCAGAATGCTAATAACGTACAGCAAGTTGAAAAAGATGAAGTGCTGGAAAAAGAATATTTTCCTGCAAATTATTTTGAAAATCGAGACGTAGGCAGAGTAAAGGATAAGATACTTAATCGTTTGAAAAAAATCACTGAAAGCTATAAAGCTAAAGGGAGGGAGGTTAAAGCCTATGACAAGATGAAAGAGGTAATGGAGTTTATAAATAATTGCCTGCCAACAGATGAAAATTTGAAACTCAAAGATTACAGAAGATATCTGAAAATGGTTCGTTTCTGGGGCAGGGAAAAGGAAAATATAAAGCGGGAATTTGACAGTAAAAAATGGGAGAGGTTTTTGCCAAGAGAACTCTGGCAGAAAAGAAACCTCGAAGATGCGTATCAACTGGCAAAAGAGAAAAACACCGAGTTATTCAATAAATTGAAAACAACTGTTGAGAGAATGAACGAACTGGAATTCGAAAAGTATCAGCAGATAAACGACGCAAAAGATTTGGCAAATTTAAGGCAACTGGCGCGGGACTTCGGCGTGAAGTGGGAAGAAAAGGACTGGCAAGAGTATTCGGGGCAGATAAAAAAACAAATTACAGACAGGCAAAAACTTACAATAATGAAACAAAGGATTACTGCTGCATTGAAGAAAAAGCAAGGCATAGAAAATCTTAATCTTAGGATAACAACCGACACCAATAAAAGCAGAAAGGTGGTATTGAACAGAATAGCGCTACCTAAAGGTTTTGTAAGGAAGCATATCTTAAAAACAGATATAAAGATATCAAAGCAAATAAGGCAATCACAATGTCCTATTATACTGTCAAACAATTATATGAAGCTGGCAAAGGAATTCTTTGAGGAGAGAAATTTTGATAAGATGACGCAGATAAACGGGCTATTTGAGAAAAATGTACTTATAGCGTTTATGATAGTTTATCTGATGGAACAACTGAATCTTCGACTTGGTAAGAATACGGAACTTAGCAATCTTAAAAAAACGGAGGTTAATTTTACGATAACCGACAAGGT
AACGGAAAAAGTCCAGATTTCGCAGTATCCATCGCTTGTTTTCGCCATAAACAGAGAATATGTTGATGGAATCAGCGGTTATAAGTTACCGCCCAAAAAACCGAAAGAGCCTCCGTATACTTTCTTCGAGAAAATAGACGCAATAGAAAAAGAACGAATGGAATTCATAAAACAGGTCCTCGGTTTCGAAGAACATCTTTTTGAGAAGAATGTAATAGACAAAACTCGCTTTACTGATACTGCGACTCATATAAGTTTTAATGAAATATGTGATGAGCTTATAAAAAAAGGATGGGACGAAAACAAAATAATAAAACTTAAAGATGCGAGGAATGCAGCATTGCATGGTAAGATACCGGAGGATACGTCTTTTGATGAAGCGAAAGTACTGATAAATGAATTAAAAAAATGA(SEQ ID NO:21)

We codon-optimized the coding sequences of the seven Cas13e and Cas13f proteins (Cas13e.1, Cas13e.2, Cas13f.1, Cas13f.2, Cas13f.3, Cas13f.4 and Cas13f.5) for expression in human cells, for use in subsequent functional experiments. The codon-optimized coding sequences are SEQ ID NOs: 22-28.
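Codon optimization changes the nucleotide sequence but should leave the encoded protein unchanged. A quick way to check this is to translate matching in-frame windows of a native sequence and its codon-optimized counterpart and compare the peptides. The sketch below uses only the standard genetic code, and its helper names are hypothetical. The two 66-nt windows are copied from the starts of SEQ ID NO:18 and SEQ ID NO:25. Pairing these two entries is an inference from the matching peptide; the text does not state it.

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order (first base varies slowest).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string; '*' marks stop codons."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def changed_codons(a: str, b: str) -> int:
    """Count codon positions at which two equal-length, in-frame sequences differ."""
    return sum(a[i:i + 3] != b[i:i + 3] for i in range(0, len(a), 3))


# First 66 nt of SEQ ID NO:18 (native) and SEQ ID NO:25 (codon-optimized)
NATIVE = "ATGTCCCCAGATTTCATCAAATTAGAAAAACAGGAAGCAGCTTTTTACTTTAATCAGACAGAGCTT"
OPTIMIZED = "ATGAGCCCTGATTTCATCAAGCTGGAGAAGCAGGAAGCAGCCTTCTACTTTAACCAGACCGAGCTG"

print(translate(NATIVE))                          # MSPDFIKLEKQEAAFYFNQTEL
print(translate(OPTIMIZED) == translate(NATIVE))  # True
print(changed_codons(NATIVE, OPTIMIZED))          # 11 of 22 codons rewritten
```

The same peptide is produced even though half of the codons were changed, which is the expected result of synonymous codon optimization.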

ATGGCCCAGGTGAGCAAGCAGACCTCCAAGAAGAGGGAGCTGAGCATCGACGAGTACCAGGGCGCCCGGAAGTGGTGCTTCACCATTGCCTTCAACAAGGCCCTGGTGAACCGGGACAAGAACGACGGCCTGTTCGTGGAAAGCCTGCTGAGACACGAGAAGTACAGCAAGCACGACTGGTACGACGAAGATACCCGGGCCCTGATCAAGTGCAGCACCCAGGCCGCCAACGCCAAGGCTGAAGCCCTGCGGAACTACTTCAGTCACTACCGGCATAGCCCTGGCTGCCTGACCTTCACCGCCGAGGACGAACTGCGGACCATCATGGAGAGAGCCTATGAGCGGGCCATCTTCGAGTGCAGAAGAAGAGAGACAGAGGTGATCATCGAGTTTCCCAGCCTGTTCGAGGGCGACCGGATCACCACCGCCGGCGTGGTGTTTTTCGTGAGCTTTTTCGTGGAAAGAAGAGTGCTGGATCGGCTGTATGGAGCCGTGTCCGGCCTGAAGAAGAATGAGGGACAGTACAAGCTGACCCGGAAGGCCCTGAGCATGTACTGCCTGAAGGACAGCAGATTCACCAAGGCCTGGGATAAGCGGGTGCTGCTGTTCAGAGACATCCTGGCCCAGCTGGGAAGAATCCCCGCCGAGGCCTACGAGTACTACCACGGCGAGCAGGGTGATAAGAAGAGAGCTAACGACAATGAGGGCACAAATCCCAAGCGGCACAAGGACAAGTTCATCGAATTTGCACTGCACTACCTGGAAGCCCAGCACAGCGAGATCTGCTTCGGCAGACGCCACATCGTGCGGGAAGAGGCCGGCGCCGGCGATGAGCACAAGAAGCACCGGACCAAGGGAAAGGTGGTGGTGGACTTCAGCAAGAAGGACGAGGACCAGAGCTACTATATCTCCAAGAACAACGTGATCGTGCGGATCGACAAGAACGCCGGCCCTAGAAGCTACCGGATGGGCCTGAACGAGCTGAAGTACCTCGTGCTGCTGAGCCTGCAGGGGAAGGGCGACGATGCCATCGCCAAGCTGTACAGATACAGACAGCACGTGGAGAACATCCTGGATGTGGTGAAGGTGACCGATAAGGATAACCACGTGTTCCTGCCCCGCTTCGTGCTGGAGCAGCACGGCATCGGCAGAAAGGCCTTCAAGCAGCGGATCGATGGACGGGTGAAGCACGTGCGGGGCGTGTGGGAGAAGAAGAAGGCCGCCACCAATGAAATGACCCTGCACGAGAAGGCCAGAGACATCCTGCAGTACGTGAACGAAAACTGCACCCGGTCCTTCAACCCTGGCGAATACAACAGACTGCTGGTGTGCCTGGTGGGCAAGGACGTGGAGAACTTTCAGGCCGGCCTGAAGCGGCTGCAGCTGGCCGAAAGGATCGATGGCCGGGTGTACTCCATCTTCGCCCAGACCAGCACCATCAATGAGATGCACCAGGTGGTGTGCGACCAGATCCTGAACCGGCTGTGCAGAATCGGCGACCAGAAGCTGTACGATTACGTGGGACTGGGCAAGAAGGACGAAATCGACTACAAGCAGAAGGTGGCCTGGTTCAAGGAGCACATCAGCATCCGGAGAGGATTCCTGAGAAAGAAGTTCTGGTACGATAGCAAGAAGGGATTCGCAAAGCTGGTGGAGGAACACCTGGAGTCCGGCGGCGGCCAGCGCGACGTGGGCCTGGACAAGAAGTACTACCACATCGACGCCATCGGCAGATTCGAGGGCGCCAACCCCGCCCTGTACGAGACCCTGGCCAGAGATCGGCTGTGCCTCATGATGGCCCAGTACTTCCTGGGCAGCGTGAGAAAGGAACTGGGCAACAAGATTGTGTGGAGCAACGACAGCATCGAACTGCCTGTGGAAGGCTCTGTGGGAAATGAGAAGAGCATCGTGTTCTCCGTGTCTGACTACGGCAAGCTGTACGTGCTGGACGATGCCGAATTCCTGGGCCGGATCTGCGAATACTTCATGCCCCACGAAAA
GGGCAAGATCCGGTACCACACAGTGTACGAAAAGGGCTTTAGAGCATACAACGACCTGCAGAAGAAGTGCGTGGAGGCCGTGCTGGCTTTCGAAGAGAAGGTGGTGAAGGCCAAGAAGATGAGCGAGAAGGAAGGCGCCCACTACATCGACTTCCGGGAGATCCTGGCCCAGACCATGTGCAAGGAGGCCGAGAAGACCGCAGTGAACAAGGTGAGACGCGCCTTCTTCCACCACCACCTGAAGTTCGTGATTGACGAGTTCGGCCTGTTCAGCGACGTGATGAAGAAGTACGGCATCGAGAAGGAATGGAAGTTCCCTGTCAAGTAA(SEQ ID NO:22)

ATGAAGGTGGAGAACATCAAGGAAAAGTCCAAGAAGGCTATGTATCTGATCAACCACTATGAAGGCCCTAAGAAGTGGTGCTTCGCCATCGTGCTGAATAGGGCCTGCGACAACTATGAGGATAACCCCCACCTGTTCAGCAAGAGCCTGCTGGAATTTGAAAAGACCAGCAGAAAGGACTGGTTCGACGAGGAGACCAGGGAACTGGTGGAGCAGGCCGACACCGAGATCCAGCCCAACCCCAACCTGAAGCCTAACACCACCGCCAACAGAAAGCTGAAGGACATCCGGAACTACTTCAGCCACCACTACCACAAGAATGAGTGCCTGTACTTCAAGAACGACGACCCTATCCGGTGCATCATGGAGGCAGCCTACGAGAAGTCCAAGATCTACATCAAGGGCAAGCAGATTGAGCAGTCCGACATCCCCCTCCCTGAGCTGTTTGAGTCTAGCGGCTGGATCACCCCAGCCGGCATCCTGCTGCTGGCCAGCTTCTTTGTGGAGAGAGGCATTCTGCACAGACTGATGGGCAACATCGGCGGCTTCAAGGACAACCGGGGCGAATACGGACTGACCCACGATATCTTCACCACCTACTGCCTGAAGGGCAGCTACTCCATCAGAGCCCAGGACCACGACGCCGTGATGTTCAGAGACATCCTGGGCTACCTGAGCAGAGTGCCGACCGAGAGCTTTCAGCGCATCAAGCAGCCACAGATCAGAAAGGAGGGGCAGCTGAGCGAGCGGAAGACAGACAAGTTTATCACCTTCGCCCTGAACTACCTGGAAGATTATGGACTGAAGGATCTGGAAGGCTGCAAGGCCTGCTTCGCCCGGAGCAAGATCGTGAGAGAGCAGGAGAACGTGGAAAGCATCAATGACAAGGAGTACAAGCCTCACGAAAACAAGAAGAAGGTGGAAATCCACTTCGATCAGTCTAAGGAAGACCGGTTCTACATCAACCGGAACAACGTGATCCTGAAGATCCAGAAGAAGGACGGCCACAGCAACATCGTGAGAATGGGCGTGTACGAGCTGAAGTATCTGGTGCTGATGTCCCTGGTGGGCAAGGCCAAGGAAGCCGTGGAGAAGATCGACAACTACATCCAGGATCTGAGAGACCAGCTGCCCTACATCGAGGGCAAGAACAAGGAAGAAATCAAGGAGTACGTGAGATTCTTCCCCAGATTCATCAGATCCCACCTGGGCCTGCTGCAGATTAACGATGAGGAGAAGATCAAGGCCCGGCTGGACTATGTGAAGACAAAGTGGCTGGACAAGAAGGAGAAGTCCAAGGAGCTGGAGCTGCACAAGAAGGGCCGGGATATCCTGCGGTACATCAACGAGCGGTGCGACCGGGAGCTGAACCGGAACGTGTACAACCGGATCCTGGAGCTGCTGGTGAGCAAGGACCTGACCGGCTTCTACCGGGAGCTGGAGGAGCTGAAGCGGACCAGACGGATCGATAAGAACATTGTGCAGAACCTGTCCGGCCAGAAGACCATCAACGCCCTGCACGAAAAGGTGTGCGATCTCGTGCTGAAGGAGATCGAGAGCCTGGACACCGAGAACCTGCGGAAGTACCTGGGCCTGATCCCCAAGGAGGAGAAGGAAGTGACCTTTAAGGAGAAGGTGGACAGGATCCTGAAGCAGCCGGTGATCTACAAGGGCTTCCTGCGGTACCAGTTCTTCAAGGACGACAAGAAGAGCTTCGTGCTGCTGGTGGAAGACGCCCTGAAGGAGAAGGGAGGCGGCTGCGACGTGCCCCTGGGCAAGGAGTACTACAAGATCGTGTCCCTGGACAAGTATGACAAGGAAAATAAGACCCTGTGCGAGACCCTGGCAATGGATAGACTGTGCCTGATGATGGCCCGGCAGTATTACCTGAGCCTGAACGCCAAGCTGGCCCAGGAGGCCCAGCAGATCGAATGGAAGAAGGAGGATAGCATTGAGCTGATCATCTTCACACTGAAGAATCCTGACCAGTCCAAGCAGAG
CTTCTCCATCCGGTTCAGCGTGCGGGACTTCACCAAGCTGTACGTGACCGACGACCCCGAATTCCTGGCCCGGCTGTGCAGCTACTTCTTCCCCGTGGAGAAGGAGATCGAATACCACAAGCTGTACTCTGAAGGCATTAACAAGTACACCAACCTGCAGAAGGAGGGGATCGAAGCCATCCTGGAGCTGGAGAAGAAGCTGATCGAAAGAAACCGGATCCAGTCCGCCAAGAACTACCTGAGCTTTAACGAAATCATGAACAAGAGCGGCTACAACAAGGATGAGCAGGATGACCTGAAGAAGGTGAGGAACTCCCTGCTGCACTACAAGCTGATCTTCGAAAAGGAGCACCTGAAGAAGTTCTATGAAGTGATGCGGGGCGAGGGAATCGAGAAGAAGTGGTCCCTGATCGTGTAA(SEQ ID NO:23)

ATGAATGGCATCGAGCTGAAGAAGGAAGAAGCCGCCTTCTACTTCAATCAGGCCGAGCTGAACCTGAAGGCCATTGAGGACAACATCTTCGACAAGGAGAGACGGAAGACACTGCTGAACAACCCCCAGATCCTGGCCAAGATGGAGAACTTTATCTTCAATTTCCGGGACGTGACCAAGAACGCCAAGGGCGAAATCGACTGCCTGCTGCTGAAGCTGAGAGAGCTGCGGAACTTTTACAGCCACTACGTGCACAAGCGGGACGTCAGAGAACTGAGCAAGGGCGAGAAGCCGATCCTGGAGAAGTACTACCAGTTCGCCATCGAATCCACCGGCTCTGAGAACGTGAAGCTCGAAATCATCGAAAACGACGCCTGGCTGGCCGACGCCGGCGTGCTGTTCTTCCTGTGCATCTTCCTGAAGAAGAGCCAGGCAAACAAGCTGATCAGCGGCATCAGCGGCTTCAAGAGAAACGACGACACCGGCCAGCCTCGGAGAAACCTGTTCACCTACTTCTCCATCCGGGAGGGCTACAAGGTGGTGCCCGAAATGCAGAAGCACTTCCTGCTGTTCTCCCTGGTGAACCACCTGAGCAACCAGGACGATTATATCGAAAAGGCCCACCAGCCCTACGACATCGGCGAGGGCCTCTTCTTCCACCGGATTGCCAGCACCTTCCTGAACATCTCCGGAATCCTGAGAAACATGAAGTTCTACACCTATCAGAGCAAGAGACTGGTGGAGCAGAGAGGCGAGCTGAAGCGGGAAAAGGACATCTTCGCCTGGGAAGAACCGTTTCAGGGCAATTCCTACTTTGAGATCAACGGCCACAAGGGCGTGATTGGCGAAGACGAGCTGAAGGAGCTGTGCTACGCCTTCCTGATCGGCAACCAGGACGCCAACAAGGTGGAGGGCCGGATCACCCAGTTCCTGGAGAAGTTCAGAAACGCCAACAGCGTGCAGCAGGTGAAGGACGACGAGATGCTGAAGCCTGAATATTTCCCCGCCAACTACTTTGCCGAGAGCGGCGTGGGCCGGATCAAGGACCGGGTGCTGAACAGACTGAACAAGGCCATCAAGAGCAACAAGGCCAAGAAGGGCGAGATCATCGCCTATGACAAGATGAGAGAAGTGATGGCTTTCATCAATAACTCTCTGCCCGTGGACGAGAAGCTGAAGCCCAAGGATTACAAGAGATACCTGGGCATGGTGAGATTCTGGGATAGAGAAAAGGACAATATCAAGCGCGAGTTCGAAACGAAGGAGTGGAGCAAGTATCTGCCCTCCAACTTCTGGACCGCCAAGAACCTGGAGAGAGTGTACGGACTGGCCCGGGAAAAGAACGCAGAGCTGTTTAACAAGCTGAAGGCCGACGTGGAGAAGATGGACGAAAGAGAGCTGGAAAAGTATCAGAAGATCAACGACGCCAAGGATCTGGCCAACCTGCGGCGGCTGGCCAGCGACTTCGGAGTGAAGTGGGAGGAGAAGGATTGGGACGAGTACTCCGGCCAGATCAAGAAGCAGATCACAGATTCCCAGAAGCTGACCATCATGAAGCAGAGAATCACAGCCGGCCTGAAGAAGAAGCACGGCATCGAAAACCTGAACCTGAGGATCACCATCGACATCAACAAGTCCAGAAAGGCCGTGCTGAATCGGATCGCCATCCCCAGAGGATTTGTGAAGCGGCACATCCTGGGCTGGCAGGAATCCGAGAAGGTGAGCAAGAAGATCAGAGAAGCCGAATGCGAGATTCTGCTGAGCAAGGAGTACGAGGAGCTGAGCAAGCAGTTCTTTCAGAGCAAGGACTACGACAAGATGACCCGCATCAACGGCCTGTACGAGAAGAATAAGCTGATCGCCCTGATGGCCGTGTATCTGATGGGGCAGCTGAGAATCCTGTTCAAGGAGCACACCAAGCTGGACGACATCACCAAGACCACCGTGGATTTCAAGATCAGCGACAAGGTGACCGTGAAGATCCCCTTCTCCAACTATCC
CTCCCTGGTGTACACCATGAGCAGCAAGTACGTGGACAATATCGGCAACTACGGCTTCAGCAACAAGGACAAGGATAAGCCCATTCTGGGCAAGATCGACGTGATCGAGAAGCAGCGGATGGAGTTTATCAAGGAGGTGCTGGGATTCGAGAAGTACCTGTTTGACGATAAGATCATCGACAAGAGCAAGTTCGCCGACACCGCCACCCACATCAGCTTTGCCGAAATCGTGGAAGAACTGGTGGAGAAGGGCTGGGACAAGGACCGGCTGACGAAGCTGAAGGATGCCCGGAACAAGGCCCTGCACGGCGAGATCCTGACCGGCACCAGCTTCGACGAGACAAAGTCCCTGATCAACGAGCTGAAGAAGTAA(SEQ ID NO:24)

ATGAGCCCTGATTTCATCAAGCTGGAGAAGCAGGAAGCAGCCTTCTACTTTAACCAGACCGAGCTGAACCTGAAGGCCATCGAATCCAATATCCTGGATAAGCAGCAGAGAATGATCCTGCTGAACAACCCCAGAATCCTGGCCAAGGTGGGCAACTTCATCTTCAATTTCCGGGACGTGACCAAGAACGCAAAGGGCGAAATCGACTGCCTGCTGTTCAAGCTGGAGGAACTGCGGAACTTCTACAGCCACTACGTGCACACCGATAACGTGAAGGAACTGTCCAACGGAGAGAAGCCTCTGCTGGAGCGGTACTACCAGATCGCCATCCAGGCCACAAGAAGCGAGGACGTGAAGTTCGAGCTGTTCGAGACCAGGAACGAGAACAAGATCACCGACGCAGGCGTGCTGTTCTTCCTGTGCATGTTCCTGAAGAAGAGCCAGGCTAATAAGCTGATTTCCGGCATCAGCGGCTTCAAGCGGAACGACCCCACCGGCCAGCCCAGACGGAACCTCTTTACCTACTTCTCTGCCCGGGAGGGCTACAAGGCCCTGCCTGACATGCAGAAGCACTTCCTGCTGTTCACCCTGGTGAACTACCTGAGCAACCAGGACGAGTACATCTCCGAGCTGAAGCAGTACGGAGAGATCGGACAGGGAGCCTTCTTCAACAGAATCGCCAGCACCTTCCTGAACATCAGCGGCATCAGCGGCAACACCAAGTTCTACAGCTACCAGAGCAAGAGAATCAAGGAGCAGCGGGGCGAACTGAACAGCGAAAAGGACAGCTTCGAGTGGATCGAGCCCTTTCAGGGCAACTCTTATTTTGAGATCAACGGCCACAAGGGCGTGATCGGCGAAGACGAGCTGAAGGAGCTGTGCTACGCCCTGCTGGTGGCCAAGCAGGACATCAATGCCGTGGAGGGAAAGATCATGCAGTTCCTGAAGAAGTTCAGGAACACCGGCAACCTGCAGCAGGTGAAGGACGACGAGATGCTGGAAATCGAGTACTTTCCCGCCAGCTACTTCAACGAGAGCAAGAAGGAGGACATCAAGAAGGAGATCCTGGGCAGACTGGACAAGAAGATCCGGTCCTGCAGCGCCAAGGCCGAGAAGGCCTACGACAAGATGAAGGAGGTGATGGAGTTTATCAATAACAGCCTGCCCGCCGAGGAGAAGCTGAAGAGGAAGGACTACCGCAGATACCTGAAGATGGTGAGATTCTGGTCCAGAGAAAAGGGCAACATCGAGAGAGAGTTCAGAACCAAGGAGTGGTCCAAGTACTTCAGCAGCGACTTCTGGAGAAAGAACAATCTGGAGGATGTGTACAAGCTGGCCACCCAGAAGAACGCCGAGCTGTTCAAGAATCTGAAGGCCGCCGCCGAGAAGATGGGCGAAACAGAATTCGAAAAGTACCAGCAGATCAACGATGTGAAGGACCTGGCCAGCCTGAGACGGCTGACCCAGGATTTCGGCCTGAAGTGGGAGGAGAAGGATTGGGAGGAGTACAGCGAACAGATCAAGAAGCAGATCACCGACCGGCAGAAGCTGACAATCATGAAGCAGCGGGTGACCGCCGAGCTGAAGAAGAAGCACGGCATCGAGAATCTGAACCTCAGAATTACCATCGATTCCAACAAGAGCAGAAAGGCCGTGCTGAACAGAATCGCCATTCCCCGGGGCTTCGTGAAGAAGCACATTCTGGGCTGGCAGGGCAGCGAAAAGATCAGCAAGAATATCCGGGAGGCCGAGTGCAAGATCCTGCTGTCCAAGAAGTATGAGGAGCTGTCTCGGCAGTTCTTTGAGGCTGGCAACTTCGACAAGCTGACCCAGATCAACGGCCTGTACGAAAAGAATAAGCTGACCGCCTTCATGTCCGTCTACCTGATGGGCAGACTGAACATCCAGCTGAACAAGCACACGGAGCTGGGAAATCTGAAGAAGACCGAGGTGGACTTCAAGATTTCCGACAAGGTGACAGAAAAGATCCCCTTCTCCCA
GTACCCTAGCCTGGTGTACGCTATGAGCCGGAAGTACGTGGACAACGTGGACAAGTACAAGTTCAGCCACCAGGACAAGAAGAAGCCCTTCCTGGGCAAGATCGACAGCATCGAAAAGGAGAGAATCGAATTCATCAAGGAGGTGCTGGACTTCGAAGAGTACCTGTTTAAGAACAAGGTGATCGACAAGAGCAAGTTCAGCGATACCGCCACCCATATCTCTTTCAAGGAAATCTGCGACGAGATGGGCAAGAAGGGCTGCAACCGCAACAAGCTGACCGAGCTGAATAACGCTAGAAACGCCGCACTGCACGGAGAAATCCCCAGCGAGACCAGCTTCCGGGAGGCCAAGCCCCTGATCAACGAACTGAAGAAGTAA(SEQ ID NO:25)

ATGAGCCCTGACTTCATCAAGCTGGAAAAGCAGGAAGCCGCCTTCTACTTTAATCAGACCGAGCTGAACCTGAAGGCCATCGAGAGCAACATCTTCGACAAGCAGCAGCGGGTGATCCTGCTGAATAACCCCCAGATCCTGGCCAAGGTGGGCGACTTCATCTTCAACTTCCGGGACGTGACCAAGAACGCCAAGGGAGAAATCGACTGCCTGCTGCTGAAGCTGCGGGAGCTGAGAAACTTCTACAGCCACTATGTGTACACCGACGACGTGAAGATCCTGAGCAACGGCGAGAGGCCCCTGCTGGAGAAGTACTACCAGTTTGCCATCGAGGCCACCGGATCTGAGAATGTGAAGCTGGAGATCATCGAGAGCAACAACCGGCTGACCGAAGCGGGCGTGCTGTTCTTCCTGTGCATGTTCCTGAAGAAGAGCCAGGCCAACAAGCTGATTTCCGGCATCTCCGGATTCAAGCGCAACGACCCTACCGGACAGCCTCGGCGGAACCTGTTCACCTACTTTAGCGTGCGGGAGGGCTACAAGGTGGTGCCCGACATGCAGAAGCACTTCCTGCTGTTCGTGCTGGTGAACCACCTGTCCGGCCAGGATGACTATATTGAGAAGGCCCAGAAGCCCTACGACATCGGCGAAGGCCTGTTCTTCCACAGAATCGCCAGCACCTTTCTCAACATCAGCGGCATCCTGAGAAACATGGAATTCTACATCTACCAGAGCAAGCGGCTGAAGGAGCAGCAGGGAGAGCTGAAGAGAGAGAAGGACATCTTCCCTTGGATCGAGCCTTTCCAGGGCAACAGCTACTTTGAGATCAACGGAAACAAGGGCATCATCGGCGAGGACGAACTGAAGGAACTGTGCTACGCCCTGCTGGTGGCCGGCAAGGACGTGAGAGCCGTGGAAGGAAAGATCACCCAGTTCCTGGAGAAGTTCAAGAACGCCGATAACGCCCAGCAGGTGGAGAAGGATGAAATGCTGGACCGGAACAACTTCCCTGCCAATTACTTTGCCGAAAGCAACATCGGCAGCATCAAGGAAAAGATCCTGAATAGACTGGGCAAGACCGACGACTCCTACAACAAGACCGGCACCAAGATCAAGCCCTACGACATGATGAAGGAGGTGATGGAGTTCATCAATAATTCTCTGCCCGCCGATGAGAAGCTGAAGCGGAAGGACTACCGGAGATACCTGAAGATGGTCCGGATCTGGGACAGCGAAAAGGACAATATCAAGCGGGAGTTTGAGAGCAAGGAATGGAGCAAGTATTTCAGCAGCGACTTCTGGATGGCCAAGAACCTGGAAAGAGTGTACGGCCTGGCCAGGGAAAAGAACGCCGAGCTGTTTAACAAGCTGAAGGCCGTGGTGGAGAAGATGGACGAGCGGGAGTTCGAAAAGTACCGGCTGATCAACAGCGCCGAAGACCTGGCCAGCCTGCGGAGACTGGCCAAGGACTTCGGCCTGAAGTGGGAGGAGAAGGACTGGCAGGAGTATTCTGGCCAGATCAAGAAGCAGATCTCCGACAGACAGAAGCTGACAATTATGAAGCAGCGGATCACAGCCGAACTGAAGAAGAAGCACGGAATCGAGAACCTGAATCTGCGGATCACCATCGACAGCAACAAGTCCAGAAAGGCCGTGCTGAACCGGATCGCCGTGCCCCGGGGCTTCGTGAAGGAACACATCCTGGGCTGGCAAGGCTCTGAAAAGGTGAGCAAGAAGACCAGAGAAGCCAAGTGCAAGATCCTGCTGAGCAAGGAGTACGAGGAACTGAGCAAGCAGTTCTTTCAGACACGGAATTACGACAAGATGACCCAGGTGAACGGCCTGTACGAGAAGAACAAGCTGCTGGCCTTCATGGTGGTGTACCTGATGGAGAGACTGAACATCCTGCTGAACAAGCCCACAGAGCTGAACGAACTGGAAAAGGCCGAAGTGGACTTCAAGATCTCCGACAAGGTGATGGCCAAGATCCCTTTCTCTCA
GTACCCCAGCCTGGTGTATGCAATGAGCTCCAAGTACGCCGACAGCGTGGGCTCTTACAAGTTCGAAAACGACGAGAAGAACAAGCCCTTTCTGGGCAAGATCGACACAATCGAGAAGCAGAGAATGGAGTTCATCAAGGAGGTGCTGGGCTTCGAGGAATACCTGTTCGAGAAGAAGATCATCGATAAGAGCGAATTCGCCGACACCGCCACCCACATCAGCTTCGACGAGATCTGCAACGAGCTGATCAAGAAGGGCTGGGACAAGGACAAGCTGACCAAGCTGAAGGACGCCCGGAACGCCGCCCTGCACGGCGAGATCCCCGCCGAGACCAGCTTCCGGGAGGCCAAGCCCCTGATTAACGGCCTGAAGAAGTAA(SEQ ID NO:26)

ATGAACATCATCAAGCTGAAGAAGGAGGAAGCCGCCTTTTACTTTAACCAGACAATCCTGAATCTGAGCGGCCTGGACGAGATCATCGAGAAGCAGATCCCCCACATCATCTCCAATAAGGAAAACGCCAAGAAGGTGATTGATAAGATCTTCAATAACAGACTGCTGCTGAAGAGCGTGGAAAACTATATCTACAACTTCAAGGACGTGGCCAAGAACGCCCGGACCGAAATCGAAGCCATCCTGCTGAAGCTGGTGGAGCTGAGAAACTTCTACTCCCACTACGTGCACAACGACACCGTGAAGATCCTGTCCAATGGCGAGAAGCCCATCCTGGAAAAGTACTACCAGATCGCCATCGAAGCCACCGGCTCTAAGAACGTGAAGCTGGTCATTATCGAAAACAACAACTGCCTGACCGACTCCGGCGTGCTGTTCCTGCTGTGCATGTTCCTGAAGAAGAGCCAGGCCAACAAGCTGATTAGCAGCGTGAGCGGCTTTAAGCGGAACGACAAGGAAGGCCAGCCCAGAAGGAACCTCTTTACTTACTATAGCGTGAGGGAAGGCTACAAGGTGGTGCCAGACATGCAGAAGCACTTCCTGCTGTTCGCCCTGGTCAACCACCTGTCCGAGCAGGACGACCACATCGAGAAGCAGCAGCAGAGCGACGAGCTGGGCAAGGGCCTGTTCTTCCACAGAATCGCCAGCACATTCCTGAATGAAAGCGGCATCTTCAACAAGATGCAGTTTTACACCTACCAGAGCAATCGGCTGAAGGAGAAGCGGGGCGAGCTGAAGCACGAGAAGGACACCTTCACCTGGATCGAGCCTTTCCAGGGAAACAGCTACTTCACCCTGAACGGGCACAAGGGCGTGATCAGCGAGGATCAGCTGAAGGAACTGTGCTACACAATCCTGATCGAGAAGCAGAACGTGGACAGCCTGGAGGGCAAGATCATTCAGTTCCTGAAGAAGTTTCAGAACGTGTCTAGCAAGCAGCAGGTGGATGAGGACGAGCTGCTGAAGCGGGAATACTTCCCCGCCAACTACTTCGGCCGGGCCGGCACCGGCACCCTGAAGGAGAAGATCCTGAACCGGCTGGACAAGCGGATGGACCCCACCAGCAAGGTGACCGACAAGGCCTATGACAAGATGATCGAGGTGATGGAGTTCATCAACATGTGCCTGCCCAGCGACGAGAAGCTGCGGCAGAAGGATTACCGGAGATATCTGAAGATGGTCAGATTCTGGAACAAGGAGAAGCACAACATCAAGAGAGAATTCGACAGCAAGAAGTGGACCAGATTCCTGCCCACCGAGCTGTGGAATAAGCGGAACCTGGAGGAAGCCTACCAGCTGGCCCGGAAGGAGAACAAGAAGAAGCTGGAGGACATGAGGAATCAGGTGAGGAGCCTGAAGGAGAACGACCTGGAGAAGTACCAGCAGATCAACTATGTGAACGACCTGGAAAACCTGCGGCTGCTGTCCCAAGAGCTGGGCGTGAAGTGGCAGGAGAAGGACTGGGTGGAATACAGCGGCCAGATCAAGAAGCAGATCAGCGATAACCAGAAGCTGACAATCATGAAGCAGAGAATCACCGCCGAGCTGAAGAAGATGCACGGCATCGAGAACCTGAACCTGAGAATCAGCATCGACACCAACAAGTCCCGGCAGACTGTGATGAACAGAATTGCCCTGCCCAAGGGCTTCGTGAAGAACCACATTCAGCAGAACAGCAGCGAGAAGATCAGCAAGAGAATCAGAGAGGACTACTGCAAGATCGAGCTGTCCGGCAAGTACGAAGAGCTGAGCAGACAGTTTTTCGACAAGAAGAACTTTGACAAGATGACCCTGATCAACGGACTGTGCGAGAAGAATAAGCTCATCGCCTTCATGGTGATTTACCTGCTGGAGCGGCTGGGCTTCGAGCTGAAGGAGAAGACCAAGCTGGGCGAGCTGAAGCAGACCCGGATGACATATAAGATCAGCGACAAGGTGAA
GGAGGACATCCCCCTCTCCTACTACCCCAAGCTGGTGTACGCCATGAATCGGAAGTATGTGGACAACATCGATAGCTACGCCTTCGCCGCCTACGAGTCTAAGAAGGCCATCCTGGACAAGGTGGACATCATTGAGAAGCAGAGAATGGAATTCATCAAGCAGGTGCTGTGCTTCGAGGAATACATCTTCGAGAACAGAATCATCGAGAAGAGCAAGTTCAACGATGAGGAGACCCACATCAGCTTCACCCAGATCCACGACGAACTGATCAAGAAGGGCAGAGATACCGAAAAGCTGAGCAAGCTGAAGCACGCCAGAAACAAGGCCCTGCACGGCGAGATCCCCGACGGGACCAGCTTTGAGAAGGCCAAGCTGCTGATCAACGAAATCAAGAAGTAA(SEQ ID NO:27)

ATGAACGCCATCGAGCTGAAGAAGGAAGAGGCCGCCTTCTACTTCAACCAGGCCAGACTGAACATCTCTGGCCTGGACGAAATCATCGAGAAGCAACTGCCACACATCGGCTCTAACAGAGAGAACGCCAAGAAGACTGTGGACATGATCCTGGATAACCCCGAGGTGCTGAAGAAGATGGAAAACTACGTGTTCAACTCCCGCGATATTGCCAAGAATGCCCGGGGCGAGCTGGAGGCCCTGCTGCTGAAGCTGGTCGAGCTGAGAAACTTCTATAGCCACTACGTGCACAAGGACGACGTCAAGACACTGAGCTACGGTGAGAAGCCTCTGCTGGATAAGTACTACGAGATCGCCATCGAAGCCACCGGATCCAAGGACGTGCGGCTGGAGATCATTGACGACAAGAATAAGCTGACCGACGCCGGAGTGCTGTTCCTGCTGTGCATGTTCCTGAAGAAGAGCGAGGCTAACAAGCTGATTTCCAGCATCCGGGGCTTCAAGAGGAACGACAAGGAGGGCCAGCCTAGAAGAAACCTGTTCACCTACTACAGCGTGAGAGAGGGCTATAAGGTGGTGCCCGACATGCAGAAGCACTTTCTGCTGTTCACCCTGGTGAACCACCTGTCCAATCAGGACGAGTACATCTCCAACCTGCGCCCAAACCAGGAAATCGGCCAGGGCGGATTTTTCCACCGGATCGCCAGCAAGTTCCTGAGCGACAGCGGAATCCTGCACAGCATGAAGTTCTACACATACAGATCCAAGCGGCTGACCGAGCAGCGGGGAGAGCTGAAGCCCAAGAAGGACCACTTTACATGGATCGAGCCTTTCCAGGGCAATTCCTACTTCAGCGTGCAGGGCCAGAAGGGCGTGATCGGAGAGGAGCAGCTCAAGGAGCTGTGCTACGTGCTGCTGGTGGCCCGGGAGGACTTCAGAGCCGTGGAGGGCAAGGTGACCCAGTTCCTGAAGAAGTTCCAGAATGCCAATAACGTGCAGCAGGTGGAGAAGGACGAGGTGCTGGAAAAGGAGTACTTCCCCGCCAACTACTTTGAGAACCGGGACGTGGGAAGAGTCAAGGACAAGATCCTGAACAGACTGAAGAAGATCACCGAGAGTTATAAGGCCAAGGGTAGAGAGGTGAAGGCCTACGACAAGATGAAGGAAGTGATGGAGTTCATCAACAACTGCCTGCCCACCGATGAAAACCTGAAGCTGAAGGACTACCGGCGGTACCTGAAGATGGTGAGATTCTGGGGCAGAGAGAAGGAAAACATCAAGCGGGAGTTCGACTCCAAGAAGTGGGAGCGCTTTCTCCCCCGGGAGCTGTGGCAGAAGAGAAACCTGGAGGACGCCTACCAGCTCGCCAAGGAGAAGAACACAGAGCTGTTCAACAAGCTGAAGACCACCGTGGAGAGAATGAACGAACTGGAGTTCGAGAAGTACCAGCAGATCAATGACGCCAAGGACCTGGCCAACCTGAGACAGCTGGCCAGAGACTTTGGAGTGAAGTGGGAGGAAAAGGACTGGCAGGAATACTCTGGACAGATCAAGAAGCAGATCACCGACCGGCAGAAGCTGACCATCATGAAGCAGCGGATCACCGCCGCCCTGAAGAAGAAGCAGGGAATCGAAAACCTGAACCTGAGAATCACAACAGATACGAATAAGAGCAGGAAGGTGGTGCTGAACCGGATCGCACTGCCCAAGGGATTCGTCAGAAAGCACATCCTGAAGACCGACATCAAGATCAGCAAGCAGATCCGGCAGAGCCAGTGCCCTATCATCCTGTCTAACAACTACATGAAGCTGGCCAAGGAGTTCTTTGAAGAGCGGAACTTCGATAAGATGACCCAGATCAATGGCCTGTTCGAGAAGAACGTGCTGATCGCCTTCATGATCGTGTACCTGATGGAGCAGCTGAACCTGAGACTGGGCAAGAACACCGAGCTGTCCAACCTGAAGAAGACCGAGGTGAACTTTACCATCACCGACAAGGT
GACCGAGAAGGTGCAAATCTCCCAGTACCCCAGCCTGGTGTTCGCCATTAACCGGGAGTACGTGGACGGCATCAGCGGCTACAAGCTGCCCCCCAAGAAGCCCAAGGAACCTCCCTACACCTTCTTCGAAAAGATCGACGCCATCGAAAAGGAGCGGATGGAATTCATCAAGCAGGTGCTGGGCTTCGAGGAGCACCTCTTCGAAAAGAACGTGATCGACAAGACCCGGTTTACCGACACCGCCACCCACATCAGCTTCAATGAGATCTGCGATGAGCTGATCAAGAAGGGCTGGGACGAAAACAAGATCATCAAGCTGAAGGATGCACGGAACGCTGCCCTGCACGGCAAGATCCCTGAAGATACCTCCTTTGACGAAGCCAAGGTGCTGATCAACGAACTGAAGAAGTAA(SEQ ID NO:28)

The locus structures of the seven CRISPR-Cas13e and CRISPR-Cas13f systems are shown in Fig. 1.

We further analyzed the RNA secondary structures of the seven DR sequences in the pre-crRNAs using RNAfold (results in Fig. 2). All of the DR sequences appear to share a highly conserved secondary structure.

For example, in the Cas13e family, each DR sequence forms a secondary structure consisting of a 4-bp stem (5'-GCUG-3'), followed by a symmetric 5+5-nt bulge (excluding the 4 stem nucleotides), then a 5-bp stem (5'-GCC(C/U)C-3'), and ending in an 8-nt loop (5'-CGAUUUGU-3', excluding the 2 stem nucleotides).
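The stem-bulge-stem-loop layout described above can be checked against a real sequence. The sketch below makes two assumptions. First, it takes the 36-nt repeat flanking the spacer in SEQ ID NO:29 (Example 2) as the Cas13e.1 DR. Second, it treats the 5+5 bulge as five unpaired nucleotides on each strand. The code maps the described segment lengths onto base pairs, confirms that every pair is Watson-Crick or G-U wobble, and builds the matching dot-bracket string. This only tests consistency with the description. It is not a substitute for a thermodynamic prediction such as RNAfold.

```python
# Cas13e.1 direct repeat: the 36-nt repeat flanking the spacer in SEQ ID NO:29
DR = "GCTGGAGCAGCCCCCGATTTGTGGGGTGATTACAGC".replace("T", "U")

# Watson-Crick pairs plus G-U wobble
ALLOWED_PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def hairpin_pairs(n, stem1, bulge5, stem2, loop, bulge3):
    """Return 0-based (i, j) pairs for stem1 / bulge / stem2 / loop / stem2 / bulge / stem1."""
    if 2 * stem1 + bulge5 + 2 * stem2 + loop + bulge3 != n:
        raise ValueError("segment lengths do not add up to the sequence length")
    pairs = [(i, n - 1 - i) for i in range(stem1)]           # outer stem
    start2, end2 = stem1 + bulge5, n - 1 - stem1 - bulge3
    pairs += [(start2 + k, end2 - k) for k in range(stem2)]  # inner stem
    return pairs


def dot_bracket(n, pairs):
    """Render base pairs as a dot-bracket string."""
    chars = ["."] * n
    for i, j in pairs:
        chars[i], chars[j] = "(", ")"
    return "".join(chars)


# Layout from the text: 4-bp stem, 5+5 bulge, 5-bp stem, 8-nt loop
pairs = hairpin_pairs(len(DR), stem1=4, bulge5=5, stem2=5, loop=8, bulge3=5)
print(dot_bracket(len(DR), pairs))
print(all((DR[i], DR[j]) in ALLOWED_PAIRS for i, j in pairs))  # True
print(DR[14:22])                                               # CGAUUUGU (the loop)
```

With this DR, the only non-Watson-Crick pair is a single G-U wobble at the base of the inner stem. This is consistent with the "GCC(C/U)C" stem motif given above.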

Similarly, in the Cas13f family, every DR sequence except that of Cas13f.4 forms a secondary structure consisting of a 5-bp stem (5'-GCUGU-3'), followed by a roughly symmetric 5+4-nt bulge (excluding the 4 stem nucleotides mentioned above), then a 6-bp stem (5'-(A/G)CCUCG-3'), and ending in a 5-nt loop (5'-AUUUG-3', excluding the 2 stem nucleotides mentioned above). The DR of Cas13f.4 is the only exception. Its second stem has one fewer base pair and its first bulge has two additional bases, forming a roughly symmetric 6+5 bulge.
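The Cas13f.4 exception can be made concrete with dot-bracket layouts built only from the segment lengths given above. This sketch assumes two things not stated in the text: the 5' strand of each bulge carries the larger count (5 of 5+4, 6 of 6+5), and the loop length is unchanged in Cas13f.4. Under those assumptions, losing one base pair from the second stem releases one nucleotide onto each side of the bulge. That turns 5+4 into 6+5 and leaves the total repeat length unchanged.

```python
def layout(stem1, bulge5, stem2, loop, bulge3):
    """Dot-bracket string for stem1 / bulge / stem2 / hairpin loop / stem2 / bulge / stem1."""
    return ("(" * stem1 + "." * bulge5 + "(" * stem2 + "." * loop
            + ")" * stem2 + "." * bulge3 + ")" * stem1)


# Typical Cas13f DR: 5-bp stem, 5+4 bulge, 6-bp stem, 5-nt loop
cas13f = layout(stem1=5, bulge5=5, stem2=6, loop=5, bulge3=4)
# Cas13f.4 (assumed): one base pair fewer in the second stem, bulge becomes 6+5
cas13f_4 = layout(stem1=5, bulge5=6, stem2=5, loop=5, bulge3=5)

for name, structure in [("Cas13f", cas13f), ("Cas13f.4", cas13f_4)]:
    print(name, structure, len(structure), "nt,", structure.count("("), "bp")
```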

Multiple sequence alignment of the Cas13e and Cas13f proteins with previously identified proteins of the Cas13a, Cas13b, Cas13c and Cas13d families showed that, on the resulting phylogenetic tree, Cas13e and Cas13f are most closely related to Cas13b (Fig. 3).

Furthermore, with respect to the positions of the RXXXXH motifs relative to the N- and C-termini of the Cas protein, the RXXXXH motifs of Cas13e and Cas13f lie closer to their N- and C-termini than those of Cas13a, Cas13c and Cas13d, whereas the RXXXXH motifs of Cas13b are farther from its N- and C-termini (Fig. 4).
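The motif positions compared in Fig. 4 can be located with a regular expression. The protein sequences (SEQ ID NOs: 1-7) are not reproduced in this section, so the sketch below runs on a short hypothetical toy sequence. The function names are likewise illustrative. A zero-width lookahead is used so that overlapping matches are not skipped.

```python
import re

# R, any four residues, H; the lookahead makes finditer report overlapping matches
RXXXXH = re.compile(r"(?=(R.{4}H))")


def find_rxxxxh(protein: str):
    """Return 1-based (R position, H position) for every RXXXXH match."""
    return [(m.start() + 1, m.start() + 6) for m in RXXXXH.finditer(protein)]


def terminal_distances(protein: str):
    """For each motif: residues N-terminal to the R and residues C-terminal to the H."""
    n = len(protein)
    return [(r - 1, n - h) for r, h in find_rxxxxh(protein)]


toy = "MRQLAKH" + "G" * 20 + "RIEVAHKE"  # hypothetical 35-residue toy sequence
print(find_rxxxxh(toy))         # [(2, 7), (28, 33)]
print(terminal_distances(toy))  # [(1, 28), (27, 2)]
```

In the toy sequence, the first motif sits 1 residue from the N-terminus and the second 2 residues from the C-terminus. That is the kind of terminal proximity described for Cas13e and Cas13f.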

We predicted the 3D structure of the Cas13e protein with I-TASSER and visualized the predicted structure with PyMOL. Although the two RXXXXH motifs lie near the N- and C-termini of Cas13e.1, respectively, and are therefore far apart in the primary sequence, they are very close to each other in the predicted 3D structure (Fig. 5).
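The contrast between distance in sequence and proximity in space can be quantified from a model file. This sketch assumes the predicted model is available as a PDB-format text file. It reads C-alpha coordinates from the fixed ATOM-record columns of the PDB format and computes the distance between two residues. The residue numbers and coordinates in the demo are hypothetical and are not taken from the Cas13e.1 model.

```python
import math
from typing import Dict, Tuple

Coord = Tuple[float, float, float]


def ca_coordinates(pdb_text: str) -> Dict[int, Coord]:
    """Map residue number -> C-alpha (x, y, z), read from fixed PDB ATOM columns."""
    coords: Dict[int, Coord] = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            res_seq = int(line[22:26])
            coords[res_seq] = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    return coords


def ca_distance(pdb_text: str, res_a: int, res_b: int) -> float:
    """Euclidean C-alpha distance (angstroms) between two residues."""
    coords = ca_coordinates(pdb_text)
    return math.dist(coords[res_a], coords[res_b])


def atom_line(serial, res_name, res_seq, x, y, z):
    """Build a minimal C-alpha ATOM record on chain A (toy data only)."""
    return f"ATOM  {serial:5d}  CA  {res_name:3s} A{res_seq:4d}    {x:8.3f}{y:8.3f}{z:8.3f}"


# Hypothetical toy model: two residues 780 positions apart in sequence
toy_pdb = "\n".join([
    atom_line(1, "ARG", 10, 0.0, 0.0, 0.0),
    atom_line(2, "HIS", 790, 3.0, 4.0, 0.0),
])
print(ca_distance(toy_pdb, 10, 790))  # 5.0
```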

Example 2 Cas13e is an effector RNase

To determine whether the newly discovered Cas13e protein is a CRISPR-Cas effector with RNase activity, the bacterial DNA sequence encoding Cas13e.1 was first codon-optimized for overexpression in human cells (SEQ ID NO: 22). The codon-optimized Cas13e.1 was then cloned into a plasmid carrying the green fluorescent protein (GFP) gene. In parallel, the coding sequence of a guide RNA (gRNA) targeting the mRNA of the mCherry reporter was cloned into a separate plasmid that also carries GFP. The gRNA contains a spacer sequence targeting mCherry, flanked by direct repeats (SEQ ID NO: 29). The mCherry and GFP coding sequences are SEQ ID NOs: 30 and 31, respectively.

GCTGGAGCAGCCCCCGATTTGTGGGGTGATTACAGCGGTCTTCGATATTCAAGCGTCGGAAGACCTGCTGGAGCAGCCCCCGATTTGTGGGGTGATTACAGC(SEQ ID NO:29)

ATGGTGAGCAAGGGCGAGGAGGATAACATGGCCATCATCAAGGAGTTCATGCGCTTCAAGGTGCACATGGAGGGCTCCGTGAACGGCCACGAGTTCGAGATCGAGGGCGAGGGCGAGGGCCGCCCCTACGAGGGCACCCAGACCGCCAAGCTGAAGGTGACCAAGGGTGGCCCCCTGCCCTTCGCCTGGGACATCCTGTCCCCTCAGTTCATGTACGGCTCCAAGGCCTACGTGAAGCACCCCGCCGACATCCCCGACTACTTGAAGCTGTCCTTCCCCGAGGGCTTCAAGTGGGAGCGCGTGATGAACTTCGAGGACGGCGGCGTGGTGACCGTGACCCAGGACTCCTCCCTGCAGGACGGCGAGTTCATCTACAAGGTGAAGCTGCGCGGCACCAACTTCCCCTCCGACGGCCCCGTAATGCAGAAGAAGACCATGGGCTGGGAGGCCTCCTCCGAGCGGATGTACCCCGAGGACGGCGCCCTGAAGGGCGAGATCAAGCAGAGGCTGAAGCTGAAGGACGGCGGCCACTACGACGCTGAGGTCAAGACCACCTACAAGGCCAAGAAGCCCGTGCAGCTGCCCGGCGCCTACAACGTCAACATCAAGTTGGACATCACCTCCCACAACGAGGACTACACCATCGTGGAACAGTACGAACGCGCCGAGGGCCGCCACTCCACCGGCGGCATGGACGAGCTGTACAAGTAA(SEQ ID NO:30)

ATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCGCCCTGAGCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAGTGA(SEQ ID NO:31)

Human HEK293T cells were seeded in 24-well tissue culture plates using standard mammalian cell culture methods. The cells were co-transfected, using Lipofectamine™ 3000 and P3000™ reagents, with three plasmids: one encoding the Cas13e.1 protein, one encoding the mCherry-targeting gRNA, and the mCherry reporter plasmid. The negative control used a control plasmid encoding a non-targeting gRNA (NT) that does not target mCherry. Because the Cas13e.1 and gRNA plasmids both carry a GFP coding sequence, GFP expression serves as a reference for transfection success and efficiency (see the schematic in Fig. 6). The transfected HEK293T cells were then incubated at 37 °C with 5% CO2 for about 24 hours and examined under a fluorescence microscope.

As shown in Fig. 7, cells transfected with the mCherry-targeting gRNA had the same growth and morphology under bright-field microscopy as control cells transfected with the non-targeting (NT) gRNA, and GFP expression was essentially the same in both. However, flow cytometry revealed a significant decrease in mCherry fluorescence intensity, of up to 75% (Fig. 8). This indicates that Cas13e, guided by mCherry-targeting gRNAs, can efficiently reduce mCherry mRNA levels and thereby reduce mCherry protein expression.
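A knockdown percentage such as the 75% above is a relative reduction against the NT-gRNA control. The text does not state how the flow-cytometry signal was normalized. This sketch therefore shows the plain ratio, plus an optional division by the GFP transfection reference as an assumption. The function name and all input values are hypothetical and are not data from this document.

```python
def knockdown_percent(mcherry_targeting, mcherry_nt, gfp_targeting=None, gfp_nt=None):
    """Percent loss of mCherry signal in targeting-gRNA cells relative to NT-gRNA cells.

    If GFP values are supplied, each mCherry value is first divided by the GFP value
    of the same sample (a transfection normalization assumed for this sketch).
    """
    if (gfp_targeting is None) != (gfp_nt is None):
        raise ValueError("supply both GFP values or neither")
    if gfp_targeting is not None:
        mcherry_targeting /= gfp_targeting
        mcherry_nt /= gfp_nt
    return (1.0 - mcherry_targeting / mcherry_nt) * 100.0


# Hypothetical mean fluorescence intensities
print(knockdown_percent(25.0, 100.0))                # 75.0
print(knockdown_percent(50.0, 200.0, 100.0, 100.0))  # 75.0
```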

Example 3 Efficient guidance of Cas13e by sgRNA

In principle, the crRNA precursor of the CRISPR/Cas13e system could give rise to two mature crRNA architectures during maturation: direct repeat (DR) + spacer (5' DR) or spacer + direct repeat (3' DR). This experiment was designed to determine which architecture Cas13e binds in order to function.

Using a three-plasmid co-transfection assay similar to that of Example 2, we found that only the 3' DR orientation (spacer + direct repeat) significantly knocked down mCherry levels (Fig. 9). This suggests that Cas13e functions by binding a mature crRNA with the spacer + direct repeat architecture.

The sgRNA sequences of the direct repeat (DR) + spacer (5' DR) and spacer + direct repeat (3' DR) constructs are SEQ ID NO:32 and SEQ ID NO:33, respectively.

GCTGGAGCAGCCCCCGATTTGTGGGGTGATTACAGCGGTCTTCGATATTCAAGCGTCGGAAGACCT(SEQ ID NO:32)

GGTCTTCGATATTCAAGCGTCGGAAGACCTGCTGGAGCAGCCCCCGATTTGTGGGGTGATTACAGC(SEQ ID NO:33)
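The two sgRNA architectures can be derived directly from the DR-spacer-DR array of SEQ ID NO:29. The 36-nt 5' and 3' flanks of that sequence are identical, so in this sketch they are taken as the DR, and the 30 nt between them as the spacer. Joining the pieces in either order reproduces SEQ ID NO:32 (5' DR) and SEQ ID NO:33 (3' DR). The helper names are hypothetical.

```python
# SEQ ID NO:29 (DNA alphabet): DR + mCherry-targeting spacer + DR
SEQ_ID_29 = "GCTGGAGCAGCCCCCGATTTGTGGGGTGATTACAGCGGTCTTCGATATTCAAGCGTCGGAAGACCTGCTGGAGCAGCCCCCGATTTGTGGGGTGATTACAGC"
DR_LENGTH = 36  # the 36-nt 5' and 3' flanks of SEQ ID NO:29 are identical


def split_array(array: str, dr_length: int):
    """Split a DR-spacer-DR array into (direct repeat, spacer)."""
    dr5, spacer, dr3 = array[:dr_length], array[dr_length:-dr_length], array[-dr_length:]
    if dr5 != dr3:
        raise ValueError("the two flanking repeats differ")
    return dr5, spacer


def crrna(dr: str, spacer: str, orientation: str) -> str:
    """'5DR' gives DR + spacer; '3DR' gives spacer + DR (the active form in Example 3)."""
    if orientation == "5DR":
        return dr + spacer
    if orientation == "3DR":
        return spacer + dr
    raise ValueError(f"unknown orientation: {orientation}")


dr, spacer = split_array(SEQ_ID_29, DR_LENGTH)
print(len(dr), len(spacer))      # 36 30
print(crrna(dr, spacer, "3DR"))  # matches SEQ ID NO:33
```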

Example 4 Effect of spacer length on the specific and non-specific activity of Cas13e.1

To investigate the effect of spacer length on the specific and non-specific nuclease activity of Cas13e.1, we designed a set of sgRNAs targeting the mCherry reporter gene, with spacer lengths of 20, 25, 30, 35, 40, 45 and 50 nt (SEQ ID NOs: 34-40, respectively).

TTGGTGCCGCGCAGCTTCAC(SEQ ID NO:34)

TTGGTGCCGCGCAGCTTCACCTTGT(SEQ ID NO:35)

TTGGTGCCGCGCAGCTTCACCTTGTAGATG(SEQ ID NO:36)

TTGGTGCCGCGCAGCTTCACCTTGTAGATGAACTC(SEQ ID NO:37)

TTGGTGCCGCGCAGCTTCACCTTGTAGATGAACTCGCCGT(SEQ ID NO:38)

TTGGTGCCGCGCAGCTTCACCTTGTAGATGAACTCGCCGTCCTGC(SEQ ID NO:39)

TTGGTGCCGCGCAGCTTCACCTTGTAGATGAACTCGCCGTCCTGCAGGGA(SEQ ID NO:40)
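Each shorter spacer in SEQ ID NOs: 34-39 is a 5' prefix of the 50-nt spacer (SEQ ID NO:40). The site a spacer base-pairs with on the mCherry mRNA is its reverse complement. Because all spacers share the same 5' end, all target sites share the same 3' end on the reporter. The sketch below checks this against a short excerpt copied from the mCherry coding sequence (SEQ ID NO:30). Names are hypothetical, and positions refer to the excerpt, not to the full gene.

```python
# SEQ ID NO:40 (50-nt spacer); SEQ ID NOs: 34-39 are its 5' prefixes
SPACER_50 = "TTGGTGCCGCGCAGCTTCACCTTGTAGATGAACTCGCCGTCCTGCAGGGA"
# Excerpt of the mCherry coding sequence (SEQ ID NO:30) around the target site
MCHERRY_EXCERPT = "CAGGACTCCTCCCTGCAGGACGGCGAGTTCATCTACAAGGTGAAGCTGCGCGGCACCAACTTCCCC"

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


# 20-, 25-, ..., 50-nt spacers sharing the same 5' end
spacers = {n: SPACER_50[:n] for n in range(20, 51, 5)}

for n, spacer in spacers.items():
    target = reverse_complement(spacer)   # sense-strand site the spacer pairs with
    start = MCHERRY_EXCERPT.find(target)  # -1 would mean no perfect match
    print(f"{n} nt: target at excerpt positions {start}-{start + n - 1}")
```

Every target ends at the same excerpt position. This is why lengthening the spacer in this series extends the target site only at its 5' end.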

Using a three-plasmid co-transfection assay analogous to that of Example 2, knockdown of the mCherry and GFP reporters was analyzed by flow cytometry 48 hours after transfection; mCherry and GFP knockdown reflect the specific and non-specific nuclease activity of Cas13e, respectively.

The mCherry and GFP knockdown results show that Cas13e.1 has high specific activity at spacer lengths between about 30 nt and about 50 nt (Fig. 10). In addition, the non-specific activity of Cas13e.1 is highest at a spacer length of about 30 nt (Fig. 11).
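The text does not spell out how knockdown is computed from the flow data. A common convention, assumed here, is to normalize a reporter's mean fluorescence intensity (MFI) in cells receiving the targeting gRNA to its MFI in NT-gRNA control cells; applied to mCherry this gives the specific readout, and applied to GFP the collateral readout. The function name and all numbers below are hypothetical, not data from the experiments.

```python
def percent_knockdown(mfi_targeting: float, mfi_nontargeting: float) -> float:
    """Knockdown (%) of a reporter relative to the non-targeting (NT) gRNA control."""
    if mfi_nontargeting <= 0:
        raise ValueError("NT control signal must be positive")
    return 100.0 * (1.0 - mfi_targeting / mfi_nontargeting)


# Hypothetical MFI values (arbitrary units) chosen only to illustrate the calculation.
readouts = {
    "mCherry (on-target, specific activity)": (2500.0, 10000.0),
    "GFP (bystander, non-specific activity)": (8000.0, 10000.0),
}
for reporter, (targeting, nt) in readouts.items():
    print(f"{reporter}: {percent_knockdown(targeting, nt):.1f}% knockdown")
```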

Example 5 Single-base RNA editing using a dCas13e.1-ADAR2dd fusion

To test whether Cas13e can be used for single-base RNA editing, we prepared nuclease-inactivated Cas13e.1 (dCas13e.1) by mutating its two RXXXXH motifs. The high-fidelity ADAR2dd mutant carrying the double mutation E488Q/T375G was then fused to the C-terminus of dCas13e.1, creating an RNA single-base editor for site-directed A-to-G (A-to-I) editing, named dCas13e.1-ADAR2dd (eABE). See SEQ ID NO: 41.

ATGCCCAAGAAGAAGCGGAAGGTGGCCCAGGTGAGCAAGCAGACCTCCAAGAAGAGGGAGCTGAGCATCGACGAGTACCAGGGCGCCCGGAAGTGGTGCTTCACCATTGCCTTCAACAAGGCCCTGGTGAACCGGGACAAGAACGACGGCCTGTTCGTGGAAAGCCTGCTGAGACACGAGAAGTACAGCAAGCACGACTGGTACGACGAAGATACCCGGGCCCTGATCAAGTGCAGCACCCAGGCCGCCAACGCCAAGGCTGAAGCCCTGGCGAACTACTTCAGTGCTTACCGGCATAGCCCTGGCTGCCTGACCTTCACCGCCGAGGACGAACTGCGGACCATCATGGAGAGAGCCTATGAGCGGGCCATCTTCGAGTGCAGAAGAAGAGAGACAGAGGTGATCATCGAGTTTCCCAGCCTGTTCGAGGGCGACCGGATCACCACCGCCGGCGTGGTGTTTTTCGTGAGCTTTTTCGTGGAAAGAAGAGTGCTGGATCGGCTGTATGGAGCCGTGTCCGGCCTGAAGAAGAATGAGGGACAGTACAAGCTGACCCGGAAGGCCCTGAGCATGTACTGCCTGAAGGACAGCAGATTCACCAAGGCCTGGGATAAGCGGGTGCTGCTGTTCAGAGACATCCTGGCCCAGCTGGGAAGAATCCCCGCCGAGGCCTACGAGTACTACCACGGCGAGCAGGGTGATAAGAAGAGAGCTAACGACAATGAGGGCACAAATCCCAAGCGGCACAAGGACAAGTTCATCGAATTTGCACTGCACTACCTGGAAGCCCAGCACAGCGAGATCTGCTTCGGCAGACGCCACATCGTGCGGGAAGAGGCCGGCGCCGGCGATGAGCACAAGAAGCACCGGACCAAGGGAAAGGTGGTGGTGGACTTCAGCAAGAAGGACGAGGACCAGAGCTACTATATCTCCAAGAACAACGTGATCGTGCGGATCGACAAGAACGCCGGCCCTAGAAGCTACCGGATGGGCCTGAACGAGCTGAAGTACCTCGTGCTGCTGAGCCTGCAGGGGAAGGGCGACGATGCCATCGCCAAGCTGTACAGATACAGACAGCACGTGGAGAACATCCTGGATGTGGTGAAGGTGACCGATAAGGATAACCACGTGTTCCTGCCCCGCTTCGTGCTGGAGCAGCACGGCATCGGCAGAAAGGCCTTCAAGCAGCGGATCGATGGACGGGTGAAGCACGTGCGGGGCGTGTGGGAGAAGAAGAAGGCCGCCACCAATGAAATGACCCTGCACGAGAAGGCCAGAGACATCCTGCAGTACGTGAACGAAAACTGCACCCGGTCCTTCAACCCTGGCGAATACAACAGACTGCTGGTGTGCCTGGTGGGCAAGGACGTGGAGAACTTTCAGGCCGGCCTGAAGCGGCTGCAGCTGGCCGAAAGGATCGATGGCCGGGTGTACTCCATCTTCGCCCAGACCAGCACCATCAATGAGATGCACCAGGTGGTGTGCGACCAGATCCTGAACCGGCTGTGCAGAATCGGCGACCAGAAGCTGTACGATTACGTGGGACTGGGCAAGAAGGACGAAATCGACTACAAGCAGAAGGTGGCCTGGTTCAAGGAGCACATCAGCATCCGGAGAGGATTCCTGAGAAAGAAGTTCTGGTACGATAGCAAGAAGGGATTCGCAAAGCTGGTGGAGGAACACCTGGAGTCCGGCGGCGGCCAGCGCGACGTGGGCCTGGACAAGAAGTACTACCACATCGACGCCATCGGCAGATTCGAGGGCGCCAACCCCGCCCTGTACGAGACCCTGGCCAGAGATCGGCTGTGCCTCATGATGGCCCAGTACTTCCTGGGCAGCGTGAGAAAGGAACTGGGCAACAAGATTGTGTGGAGCAACGACAGCATCGAACTGCCTGTGGAAGGCTCTGTGGGAAATGAGAAGAGCATCGTGTTCTCCGTGTCTGACTACGGCAAGCTGTACGTGCTGGACGATGCCGAATTCCTGGGCCGGATCTGCGA
ATACTTCATGCCCCACGAAAAGGGCAAGATCCGGTACCACACAGTGTACGAAAAGGGCTTTAGAGCATACAACGACCTGCAGAAGAAGTGCGTGGAGGCCGTGCTGGCTTTCGAAGAGAAGGTGGTGAAGGCCAAGAAGATGAGCGAGAAGGAAGGCGCCCACTACATCGACTTCCGGGAGATCCTGGCCCAGACCATGTGCAAGGAGGCCGAGAAGACCGCAGTGAACAAGGTGGCGGCTGCCTTCTTCGCTGCGCACCTGAAGTTCGTGATTGACGAGTTCGGCCTGTTCAGCGACGTGATGAAGAAGTACGGCATCGAGAAGGAATGGAAGTTCCCTGTCAAGCCCAAGAAGAAGCGGAAGGTGGGTGGAGGCGGAGGTTCTGGGGGAGGAGGTAGTGGCGGTGGTGGTTCAGGAGGCGGCGGAAGCCAGCTGCATTTACCGCAGGTTTTAGCTGACGCTGTCTCACGCCTGGTCCTGGGTAAGTTTGGTGACCTGACCGACAACTTCTCCTCCCCTCACGCTCGCAGAAAAGTGCTGGCTGGAGTCGTCATGACAACAGGCACAGATGTTAAAGATGCCAAGGTGATAAGTGTTTCTACAGGAGGCAAATGTATTAATGGTGAATACATGAGTGATCGTGGCCTTGCATTAAATGACTGCCATGCAGAAATAATATCTCGGAGATCCTTGCTCAGATTTCTTTATACACAACTTGAGCTTTACTTAAATAACAAAGATGATCAAAAAAGATCCATCTTTCAGAAATCAGAGCGAGGGGGGTTTAGGCTGAAGGAGAATGTCCAGTTTCATCTGTACATCAGCACCTCTCCCTGTGGAGATGCCAGAATCTTCTCACCACATGAGCCAATCCTGGAAGAACCAGCAGATAGACACCCAAATCGTAAAGCAAGAGGACAGCTACGGACCAAAATAGAGTCTGGTCAGGGGACGATTCCAGTGCGCTCCAATGCGAGCATCCAAACGTGGGACGGGGTGCTGCAAGGGGAGCGGCTGCTCACCATGTCCTGCAGTGACAAGATTGCACGCTGGAACGTGGTGGGCATCCAGGGATCACTGCTCAGCATTTTCGTGGAGCCCATTTACTTCTCGAGCATCATCCTGGGCAGCCTTTACCACGGGGACCACCTTTCCAGGGCCATGTACCAGCGGATCTCCAACATAGAGGACCTGCCACCTCTCTACACCCTCAACAAGCCTTTGCTCAGTGGCATCAGCAATGCAGAAGCACGGCAGCCAGGGAAGGCCCCCAACTTCAGTGTCAACTGGACGGTAGGCGACTCCGCTATTGAGGTCATCAACGCCACGACTGGGAAGGATGAGCTGGGCCGCGCGTCCCGCCTGTGTAAGCACGCGTTGTACTGTCGCTGGATGCGTGTGCACGGCAAGGTTCCCTCCCACTTACTACGCTCCAAGATTACCAAGCCCAACGTGTACCATGAGTCCAAGCTGGCGGCAAAGGAGTACCAGGCCGCCAAGGCGCGTCTGTTCACAGCCTTCATCAAGGCGGGGCTGGGGGCCTGGGTGGAGAAGCCCACCGAGCAGGACCAGTTCTCACTCACGTACCCATACGACGTACCAGATTACGCTTAA(SEQ ID NO:41)
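The RXXXXH motifs mentioned above can be located with a simple pattern scan. In this sketch the one-letter fragments were transcribed by hand for illustration from SEQ ID NO:1 (residues 81-96 and 737-752) and from the corresponding codons of SEQ ID NO:41; verify them against the full sequences before relying on them. A regex lookahead is used so that overlapping motifs are all reported; `find_motifs` is a hypothetical helper.

```python
import re

# Hand-transcribed fragments, keyed by the 1-based number of their first residue.
WILD_TYPE = {81: "EALRNYFSHYRHSPGC", 737: "KVRRAFFHHHLKFVID"}  # from SEQ ID NO:1
DEAD = {81: "EALANYFSAYRHSPGC", 737: "KVAAAFFAAHLKFVID"}       # as encoded in SEQ ID NO:41

# R followed by any four residues and then H; the lookahead allows overlapping hits.
MOTIF = re.compile(r"(?=(R[A-Z]{4}H))")


def find_motifs(fragments):
    """Return (residue number of the R, motif text) for every R-X4-H hit."""
    hits = []
    for start, seq in fragments.items():
        for m in MOTIF.finditer(seq):
            hits.append((start + m.start(), m.group(1)))
    return hits


print("wild type:", find_motifs(WILD_TYPE))
print("dCas13e.1:", find_motifs(DEAD))
```

In these fragments the wild-type protein yields one hit near the N-terminus and two overlapping hits near the C-terminus, while the dead variant yields none.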

To visualize the editing activity of eABE, we introduced a stop codon (TAG) into the coding sequence of wild-type mCherry to block translation of full-length mCherry (see the bold, double-underlined sequence in SEQ ID NO: 42). As long as the TAG mutation is not repaired, the mutant mCherry gene cannot produce fluorescent protein; only when eABE edits the A to G is normal mCherry translation restored, yielding fluorescent protein. See Figs. 12 and 14 for schematic illustrations. After crRNAs were designed against the TAG mutation site, plasmid pCX530 encoding eABE, plasmid pCX537 (gRNA-1) or Cx538 (gRNA-2) encoding the sgRNA (crRNA), and the mCherry reporter plasmid pCX337 were co-transfected into HEK293T cells. After 24 hours of culture at 37 ℃ in 5% CO2, cells with restored mCherry fluorescence were isolated by flow cytometry, and their RNA was extracted for reverse transcription, PCR amplification and sequencing. See the description of Fig. 12. The flow cytometry results are shown in Fig. 13.

Flow cytometry and sequencing results show that both gRNA-1 (SEQ ID NO: 43) and gRNA-2 (SEQ ID NO: 44) successfully corrected the introduced TAG stop codon and restored normal translation of mCherry.

ATGGTGAGCAAGGGCGAGGAGGATAACATGGCCATCATCAAGGAGTTCATGCGCTTCAAGGTGCACATGGAGGGCTCCGTGAACGGCCACGAGTTCGAGATCGAGGGCGAGGGCGAGGGCCGCCCCTACGAGGGCACCCAGACCGCCAAGCTGAAGGTGACCAAGGGTGGCCCCCTGCCCTTCGCCTGGGACATCCTGTCCCCTCAGTTCATGTACGGCTCCAAGGCCTACGTGAAGCACCCCGCCGACATCCCCGACTACTTGAAGCTGTCCTTCCCCGAGGGCTTCAAGTAGGAGCGCGTGATGAACTTCGAGGACGGCGGCGTGGTGACCGTGACCCAGGACTCCTCCCTGCAGGACGGCGAGTTCATCTACAAGGTGAAGCTGCGCGGCACCAACTTCCCCTCCGACGGCCCCGTAATGCAGAAGAAGACCATGGGCTGGGAGGCCTCCTCCGAGCGGATGTACCCCGAGGACGGCGCCCTGAAGGGCGAGATCAAGCAGAGGCTGAAGCTGAAGGACGGCGGCCACTACGACGCTGAGGTCAAGACCACCTACAAGGCCAAGAAGCCCGTGCAGCTGCCCGGCGCCTACAACGTCAACATCAAGTTGGACATCACCTCCCACAACGAGGACTACACCATCGTGGAACAGTACGAACGCGCCGAGGGCCGCCACTCCACCGGCGGCATGGACGAGCTGTACAAGTAA(SEQ ID NO:42)
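The reporter logic can be sketched in a few lines: read SEQ ID NO:42 codon by codon from the ATG, stop at the first in-frame stop codon, then model the eABE edit as an A-to-G change of the central A of TAG, which turns it into TGG (tryptophan). Only the first 300 nt of SEQ ID NO:42 are included to keep the sketch short; the codon table is the standard genetic code, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code: codons enumerated in TCAG order, first base varying slowest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# First 300 nt (100 codons) of SEQ ID NO:42, in frame from the ATG.
REPORTER_5P = (
    "ATGGTGAGCAAGGGCGAGGAGGATAACATGGCCATCATCAAGGAGTTCATGCGCTTCAAG"
    "GTGCACATGGAGGGCTCCGTGAACGGCCACGAGTTCGAGATCGAGGGCGAGGGCGAGGGC"
    "CGCCCCTACGAGGGCACCCAGACCGCCAAGCTGAAGGTGACCAAGGGTGGCCCCCTGCCC"
    "TTCGCCTGGGACATCCTGTCCCCTCAGTTCATGTACGGCTCCAAGGCCTACGTGAAGCAC"
    "CCCGCCGACATCCCCGACTACTTGAAGCTGTCCTTCCCCGAGGGCTTCAAGTAGGAGCGC"
)


def translate(cds: str) -> str:
    """Translate complete codons; stop codons appear as '*'."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def first_stop(cds: str) -> int:
    """0-based nucleotide index of the first in-frame stop codon, or -1 if none."""
    for i in range(0, len(cds) - 2, 3):
        if CODON_TABLE[cds[i:i + 3]] == "*":
            return i
    return -1


def a_to_g(seq: str, pos: int) -> str:
    """Model an A-to-I edit at pos; inosine is read as G by the ribosome."""
    if seq[pos] != "A":
        raise ValueError(f"position {pos} is {seq[pos]!r}, not A")
    return seq[:pos] + "G" + seq[pos + 1:]


stop = first_stop(REPORTER_5P)
edited = a_to_g(REPORTER_5P, stop + 1)  # the A in the middle of TAG
print("stop at codon", stop // 3 + 1, REPORTER_5P[stop:stop + 3], "->", edited[stop:stop + 3])
print(translate(REPORTER_5P)[-6:], "->", translate(edited)[-6:])
```

The edit removes the only in-frame stop in this stretch, which is why A-to-G editing at this single position is enough to restore full-length translation.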

caagtagtcggggatgtcggcggggtgcttcacCtaggccttggagccgtGCTGGAGCAGCCCCCGATTTGTGGGGTGATTACAGC(SEQ ID NO:43)

cggggatgtcggcggggtgcttcacCtaggccttggagccgtacatgaacGCTGGAGCAGCCCCCGATTTGTGGGGTGATTACAGC(SEQ ID NO:44)

Example 6 Single-base RNA editing using shortened dCas13e.1-ADAR2dd fusions

To construct a mini version of the eABE single-base editor, we generated a series of dCas13e.1 truncation mutants. First, dCas13e.1 was truncated from the C-terminus in 30-amino-acid steps, producing five mutants lacking 30, 60, 90, 120 and 150 C-terminal residues. Each was fused to high-fidelity ADAR2dd to form a truncated eABE, and the corresponding plasmids were designated Vysz-19 ("V19") through Vysz15 ("V15") (see Fig. 15). In these plasmids the fusion gene is transcribed from the CMV enhancer (eCMV) and CMV promoter (pCMV) followed by an intron; each truncated eABE is flanked at both ends by nuclear localization sequences (NLS), ADAR2dd is fused to the C-terminus of the truncated dCas13e.1, and transcription is terminated by a polyA signal. The plasmids also carry a separate GFP expression cassette driven by the EFS promoter to mark transfected cells.

The results show that the editing activity of the truncated eABEs increased as the C-terminal deletion of dCas13e.1 lengthened, with editor V19, lacking 150 C-terminal amino acids, showing the highest base-editing activity (see Fig. 16). However, when the C-terminal truncation reached 180 amino acids, base-editing activity was almost completely lost, indicating that the longest C-terminal deletion dCas13e.1 can tolerate lies between 150 and 180 residues.

Using the mutant lacking 150 C-terminal amino acids as the backbone, we next constructed a series of N-terminal deletion mutants: seven in total, lacking 30, 60, 90, 120, 150, 180 and 210 N-terminal residues, respectively (see Fig. 17). The results in Fig. 18 show that the mutant lacking both the 180 N-terminal residues and the 150 C-terminal residues had the best base-editing activity: removing these 330 residues shortens the 775-aa Cas13e.1 protein to a 445-aa mini dCas13e.1, and the eABE formed by fusing this mini dCas13e.1 to ADAR2dd showed the best editing activity.
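The residue bookkeeping behind the truncation series can be checked with a short sketch. It assumes deletions are counted from residue 1 and residue 775 of Cas13e.1 (SEQ ID NO:1) and ignores the NLS, linker and ADAR2dd sequences added in the constructs; `kept_range` is a hypothetical helper.

```python
FULL_LENGTH = 775  # Cas13e.1, SEQ ID NO:1


def kept_range(n_del: int, c_del: int, length: int = FULL_LENGTH):
    """1-based inclusive residue range left after N- and C-terminal deletions."""
    first, last = n_del + 1, length - c_del
    if first > last:
        raise ValueError("deletions remove the whole protein")
    return first, last


# C-terminal series (N-terminus intact), then N-terminal series on the C-delta-150 backbone.
c_series = [kept_range(0, c) for c in range(30, 151, 30)]
n_series = [kept_range(n, 150) for n in range(30, 211, 30)]
best = kept_range(180, 150)
print("C-terminal series:", c_series)
print("N-terminal series:", n_series)
print("best mini dCas13e.1: residues %d-%d, %d aa" % (best[0], best[1], best[1] - best[0] + 1))
```

Under these assumptions the best-performing mini dCas13e.1 spans residues 181 to 625 of the original protein, consistent with the 445-aa length stated above.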

Example 7 Comparison of knockdown efficiency of endogenous mammalian mRNA by different Cas13 proteins

This experiment shows that Cas13e and Cas13f proteins (especially Cas13f.1) are highly effective at knocking down endogenous target mRNA in mammalian cells, outperforming previously identified Cas13 proteins.

Specifically, we constructed five plasmids, each expressing one Cas13 protein: Cas13e.1 (SEQ ID NO: 22), Cas13f.1 (SEQ ID NO: 23), LwaCas13a (SEQ ID NO: 45), PspCas13b (SEQ ID NO: 46) or RxCas13d (SEQ ID NO: 47). Each plasmid also encodes an mCherry reporter, as well as the coding sequence of the sgRNA/crRNA corresponding to its Cas13 protein, flanked by that protein's native DR sequences. These sgRNAs were designed with spacer sequences targeting ANXA4 mRNA (see SEQ ID NOS: 48-50). As negative controls, we also constructed five further plasmids, each encoding a non-targeting sgRNA/crRNA instead of an ANXA4-targeting one (i.e., "control NT constructs"). See Fig. 19.

ATGCCCAAGAAGAAGCGGAAGGTGGGATCCATGAAAGTGACCAAGGTCGATGGCATCAGCCACAAGAAGTACATCGAAGAGGGCAAGCTCGTGAAGTCCACCAGCGAGGAAAACCGGACCAGCGAGAGACTGAGCGAGCTGCTGAGCATCCGGCTGGACATCTACATCAAGAACCCCGACAACGCCTCCGAGGAAGAGAACCGGATCAGAAGAGAGAACCTGAAGAAGTTCTTTAGCAACAAGGTGCTGCACCTGAAGGACAGCGTGCTGTATCTGAAGAACCGGAAAGAAAAGAACGCCGTGCAGGACAAGAACTATAGCGAAGAGGACATCAGCGAGTACGACCTGAAAAACAAGAACAGCTTCTCCGTGCTGAAGAAGATCCTGCTGAACGAGGACGTGAACTCTGAGGAACTGGAAATCTTTCGGAAGGACGTGGAAGCCAAGCTGAACAAGATCAACAGCCTGAAGTACAGCTTCGAAGAGAACAAGGCCAACTACCAGAAGATCAACGAGAACAACGTGGAAAAAGTGGGCGGCAAGAGCAAGCGGAACATCATCTACGACTACTACAGAGAGAGCGCCAAGCGCAACGACTACATCAACAACGTGCAGGAAGCCTTCGACAAGCTGTATAAGAAAGAGGATATCGAGAAACTGTTTTTCCTGATCGAGAACAGCAAGAAGCACGAGAAGTACAAGATCCGCGAGTACTATCACAAGATCATCGGCCGGAAGAACGACAAAGAGAACTTCGCCAAGATTATCTACGAAGAGATCCAGAACGTGAACAACATCAAAGAGCTGATTGAGAAGATCCCCGACATGTCTGAGCTGAAGAAAAGCCAGGTGTTCTACAAGTACTACCTGGACAAAGAGGAACTGAACGACAAGAATATTAAGTACGCCTTCTGCCACTTCGTGGAAATCGAGATGTCCCAGCTGCTGAAAAACTACGTGTACAAGCGGCTGAGCAACATCAGCAACGATAAGATCAAGCGGATCTTCGAGTACCAGAATCTGAAAAAGCTGATCGAAAACAAACTGCTGAACAAGCTGGACACCTACGTGCGGAACTGCGGCAAGTACAACTACTATCTGCAAGTGGGCGAGATCGCCACCTCCGACTTTATCGCCCGGAACCGGCAGAACGAGGCCTTCCTGAGAAACATCATCGGCGTGTCCAGCGTGGCCTACTTCAGCCTGAGGAACATCCTGGAAACCGAGAACGAGAACGATATCACCGGCCGGATGCGGGGCAAGACCGTGAAGAACAACAAGGGCGAAGAGAAATACGTGTCCGGCGAGGTGGACAAGATCTACAATGAGAACAAGCAGAACGAAGTGAAAGAAAATCTGAAGATGTTCTACAGCTACGACTTCAACATGGACAACAAGAACGAGATCGAGGACTTCTTCGCCAACATCGACGAGGCCATCAGCAGCATCAGACACGGCATCGTGCACTTCAACCTGGAACTGGAAGGCAAGGACATCTTCGCCTTCAAGAATATCGCCCCCAGCGAGATCTCCAAGAAGATGTTTCAGAACGAAATCAACGAAAAGAAGCTGAAGCTGAAAATCTTCAAGCAGCTGAACAGCGCCAACGTGTTCAACTACTACGAGAAGGATGTGATCATCAAGTACCTGAAGAATACCAAGTTCAACTTCGTGAACAAAAACATCCCCTTCGTGCCCAGCTTCACCAAGCTGTACAACAAGATTGAGGACCTGCGGAATACCCTGAAGTTTTTTTGGAGCGTGCCCAAGGACAAAGAAGAGAAGGACGCCCAGATCTACCTGCTGAAGAATATCTACTACGGCGAGTTCCTGAACAAGTTCGTGAAAAACTCCAAGGTGTTCTTTAAGATCACCAATGAAGTGATCAAGATTAACAAGCAGCGGAACCAGAAAACCGGCCACTACAAGTATCAGAAGTTCGAGAACATCGAGAAAACCGTGCCCGTGGAATACCTGGCCATCATCCAGAGCAGAGAGAT
GATCAACAACCAGGACAAAGAGGAAAAGAATACCTACATCGACTTTATTCAGCAGATTTTCCTGAAGGGCTTCATCGACTACCTGAACAAGAACAATCTGAAGTATATCGAGAGCAACAACAACAATGACAACAACGACATCTTCTCCAAGATCAAGATCAAAAAGGATAACAAAGAGAAGTACGACAAGATCCTGAAGAACTATGAGAAGCACAATCGGAACAAAGAAATCCCTCACGAGATCAATGAGTTCGTGCGCGAGATCAAGCTGGGGAAGATTCTGAAGTACACCGAGAATCTGAACATGTTTTACCTGATCCTGAAGCTGCTGAACCACAAAGAGCTGACCAACCTGAAGGGCAGCCTGGAAAAGTACCAGTCCGCCAACAAAGAAGAAACCTTCAGCGACGAGCTGGAACTGATCAACCTGCTGAACCTGGACAACAACAGAGTGACCGAGGACTTCGAGCTGGAAGCCAACGAGATCGGCAAGTTCCTGGACTTCAACGAAAACAAAATCAAGGACCGGAAAGAGCTGAAAAAGTTCGACACCAACAAGATCTATTTCGACGGCGAGAACATCATCAAGCACCGGGCCTTCTACAATATCAAGAAATACGGCATGCTGAATCTGCTGGAAAAGATCGCCGATAAGGCCAAGTATAAGATCAGCCTGAAAGAACTGAAAGAGTACAGCAACAAGAAGAATGAGATTGAAAAGAACTACACCATGCAGCAGAACCTGCACCGGAAGTACGCCAGACCCAAGAAGGACGAAAAGTTCAACGACGAGGACTACAAAGAGTATGAGAAGGCCATCGGCAACATCCAGAAGTACACCCACCTGAAGAACAAGGTGGAATTCAATGAGCTGAACCTGCTGCAGGGCCTGCTGCTGAAGATCCTGCACCGGCTCGTGGGCTACACCAGCATCTGGGAGCGGGACCTGAGATTCCGGCTGAAGGGCGAGTTTCCCGAGAACCACTACATCGAGGAAATTTTCAATTTCGACAACTCCAAGAATGTGAAGTACAAAAGCGGCCAGATCGTGGAAAAGTATATCAACTTCTACAAAGAACTGTACAAGGACAATGTGGAAAAGCGGAGCATCTACTCCGACAAGAAAGTGAAGAAACTGAAGCAGGAAAAAAAGGACCTGTACATCCGGAACTACATTGCCCACTTCAACTACATCCCCCACGCCGAGATTAGCCTGCTGGAAGTGCTGGAAAACCTGCGGAAGCTGCTGTCCTACGACCGGAAGCTGAAGAACGCCATCATGAAGTCCATCGTGGACATTCTGAAAGAATACGGCTTCGTGGCCACCTTCAAGATCGGCGCTGACAAGAAGATCGAAATCCAGACCCTGGAATCAGAGAAGATCGTGCACCTGAAGAATCTGAAGAAAAAGAAACTGATGACCGACCGGAACAGCGAGGAACTGTGCGAACTCGTGAAAGTCATGTTCGAGTACAAGGCCCTGGAATGA(SEQ ID NO:45)

ATGCCCAAGAAGAAGCGGAAGGTGGTCGACAACATCCCCGCTCTGGTGGAAAACCAGAAGAAGTACTTTGGCACCTACAGCGTGATGGCCATGCTGAACGCTCAGACCGTGCTGGACCACATCCAGAAGGTGGCCGATATTGAGGGCGAGCAGAACGAGAACAACGAGAATCTGTGGTTTCACCCCGTGATGAGCCACCTGTACAACGCCAAGAACGGCTACGACAAGCAGCCCGAGAAAACCATGTTCATCATCGAGCGGCTGCAGAGCTACTTCCCATTCCTGAAGATCATGGCCGAGAACCAGAGAGAGTACAGCAACGGCAAGTACAAGCAGAACCGCGTGGAAGTGAACAGCAACGACATCTTCGAGGTGCTGAAGCGCGCCTTCGGCGTGCTGAAGATGTACAGGGACCTGACCAACCACTACAAGACCTACGAGGAAAAGCTGAACGACGGCTGCGAGTTCCTGACCAGCACAGAGCAACCTCTGAGCGGCATGATCAACAACTACTACACAGTGGCCCTGCGGAACATGAACGAGAGATACGGCTACAAGACAGAGGACCTGGCCTTCATCCAGGACAAGCGGTTCAAGTTCGTGAAGGACGCCTACGGCAAGAAAAAGTCCCAAGTGAATACCGGATTCTTCCTGAGCCTGCAGGACTACAACGGCGACACACAGAAGAAGCTGCACCTGAGCGGAGTGGGAATCGCCCTGCTGATCTGCCTGTTCCTGGACAAGCAGTACATCAACATCTTTCTGAGCAGGCTGCCCATCTTCTCCAGCTACAATGCCCAGAGCGAGGAACGGCGGATCATCATCAGATCCTTCGGCATCAACAGCATCAAGCTGCCCAAGGACCGGATCCACAGCGAGAAGTCCAACAAGAGCGTGGCCATGGATATGCTCAACGAAGTGAAGCGGTGCCCCGACGAGCTGTTCACAACACTGTCTGCCGAGAAGCAGTCCCGGTTCAGAATCATCAGCGACGACCACAATGAAGTGCTGATGAAGCGGAGCAGCGACAGATTCGTGCCTCTGCTGCTGCAGTATATCGATTACGGCAAGCTGTTCGACCACATCAGGTTCCACGTGAACATGGGCAAGCTGAGATACCTGCTGAAGGCCGACAAGACCTGCATCGACGGCCAGACCAGAGTCAGAGTGATCGAGCAGCCCCTGAACGGCTTCGGCAGACTGGAAGAGGCCGAGACAATGCGGAAGCAAGAGAACGGCACCTTCGGCAACAGCGGCATCCGGATCAGAGACTTCGAGAACATGAAGCGGGACGACGCCAATCCTGCCAACTATCCCTACATCGTGGACACCTACACACACTACATCCTGGAAAACAACAAGGTCGAGATGTTTATCAACGACAAAGAGGACAGCGCCCCACTGCTGCCCGTGATCGAGGATGATAGATACGTGGTCAAGACAATCCCCAGCTGCCGGATGAGCACCCTGGAAATTCCAGCCATGGCCTTCCACATGTTTCTGTTCGGCAGCAAGAAAACCGAGAAGCTGATCGTGGACGTGCACAACCGGTACAAGAGACTGTTCCAGGCCATGCAGAAAGAAGAAGTGACCGCCGAGAATATCGCCAGCTTCGGAATCGCCGAGAGCGACCTGCCTCAGAAGATCCTGGATCTGATCAGCGGCAATGCCCACGGCAAGGATGTGGACGCCTTCATCAGACTGACCGTGGACGACATGCTGACCGACACCGAGCGGAGAATCAAGAGATTCAAGGACGACCGGAAGTCCATTCGGAGCGCCGACAACAAGATGGGAAAGAGAGGCTTCAAGCAGATCTCCACAGGCAAGCTGGCCGACTTCCTGGCCAAGGACATCGTGCTGTTTCAGCCCAGCGTGAACGATGGCGAGAACAAGATCACCGGCCTGAACTACCGGATCATGCAGAGCGCCATTGCCGTGTACGATAGCGGCGACGATTACGAGGCCAAGCAGCAGTTCAAGCTGATGTTCGAGAAGGC
CCGGCTGATCGGCAAGGGCACAACAGAGCCTCATCCATTTCTGTACAAGGTGTTCGCCCGCAGCATCCCCGCCAATGCCGTCGAGTTCTACGAGCGCTACCTGATCGAGCGGAAGTTCTACCTGACCGGCCTGTCCAACGAGATCAAGAAAGGCAACAGAGTGGATGTGCCCTTCATCCGGCGGGACCAGAACAAGTGGAAAACACCCGCCATGAAAACCCTGGGCAGAATCTACAGCGAGGATCTGCCCGTGGAACTGCCCAGACAGATGTTCGACAATGAGATCAAGTCCCACCTGAAGTCCCTGCCACAGATGGAAGGCATCGACTTCAACAATGCCAACGTGACCTATCTGATCGCCGAGTACATGAAGAGAGTGCTGGACGACGACTTCCAGACCTTCTACCAGTGGAACCGCAACTACCGGTACATGGACATGCTTAAGGGCGAGTACGACAGAAAGGGCTCCCTGCAGCACTGCTTCACCAGCGTGGAAGAGAGAGAAGGCCTCTGGAAAGAGCGGGCCTCCAGAACAGAGCGGTACAGAAAGCAGGCCAGCAACAAGATCCGCAGCAACCGGCAGATGAGAAACGCCAGCAGCGAAGAGATCGAGACAATCCTGGATAAGCGGCTGAGCAACAGCCGGAACGAGTACCAGAAAAGCGAGAAAGTGATCCGGCGCTACAGAGTGCAGGATGCCCTGCTGTTTCTGCTGGCCAAAAAGACCCTGACCGAACTGGCCGATTTCGACGGCGAGAGGTTCAAACTGAAAGAAATCATGCCCGACGCCGAGAAGGGAATCCTGAGCGAGATCATGCCCATGAGCTTCACCTTCGAGAAAGGCGGCAAGAAGTACACCATCACCAGCGAGGGCATGAAGCTGAAGAACTACGGCGACTTCTTTGTGCTGGCTAGCGACAAGAGGATCGGCAACCTGCTGGAACTCGTGGGCAGCGACATCGTGTCCAAAGAGGATATCATGGAAGAGTTCAACAAATACGACCAGTGCAGGCCCGAGATCAGCTCCATCGTGTTCAACCTGGAAAAGTGGGCCTTCGACACATACCCCGAGCTGTCTGCCAGAGTGGACCGGGAAGAGAAGGTGGACTTCAAGAGCATCCTGAAAATCCTGCTGAACAACAAGAACATCAACAAAGAGCAGAGCGACATCCTGCGGAAGATCCGGAACGCCTTCGATCACAACAATTACCCCGACAAAGGCGTGGTGGAAATCAAGGCCCTGCCTGAGATCGCCATGAGCATCAAGAAGGCCTTTGGGGAGTACGCCATCATGAAGGGATCCCTTCAATGA(SEQ ID NO:46)

ATGCCTAAAAAGAAAAGAAAGGTGGGTTCTGGTATCGAGAAGAAGAAGAGCTTCGCCAAGGGCATGGGAGTGAAGAGCACCCTGGTGTCCGGCTCTAAGGTGTACATGACCACATTTGCTGAGGGAAGCGACGCCAGGCTGGAGAAGATCGTGGAGGGCGATAGCATCAGATCCGTGAACGAGGGAGAGGCTTTCAGCGCCGAGATGGCTGACAAGAACGCTGGCTACAAGATCGGAAACGCCAAGTTTTCCCACCCAAAGGGCTACGCCGTGGTGGCTAACAACCCACTGTACACCGGACCAGTGCAGCAGGACATGCTGGGACTGAAGGAGACACTGGAGAAGAGGTACTTCGGCGAGTCCGCCGACGGAAACGATAACATCTGCATCCAGGTCATCCACAACATCCTGGATATCGAGAAGATCCTGGCTGAGTACATCACAAACGCCGCTTACGCCGTGAACAACATCTCCGGCCTGGACAAGGATATCATCGGCTTCGGAAAGTTTTCTACCGTGTACACATACGACGAGTTCAAGGATCCAGAGCACCACCGGGCCGCTTTTAACAACAACGACAAGCTGATCAACGCCATCAAGGCTCAGTACGACGAGTTCGATAACTTTCTGGATAACCCCAGGCTGGGCTACTTCGGACAGGCTTTCTTTTCTAAGGAGGGCAGAAACTACATCATCAACTACGGAAACGAGTGTTACGACATCCTGGCCCTGCTGAGCGGACTGAGGCACTGGGTGGTGCACAACAACGAGGAGGAGTCTCGGATCAGCCGCACCTGGCTGTACAACCTGGACAAGAACCTGGATAACGAGTACATCTCCACACTGAACTACCTGTACGACAGGATCACCAACGAGCTGACAAACAGCTTCTCCAAGAACTCTGCCGCTAACGTGAACTACATCGCTGAGACCCTGGGCATCAACCCAGCTGAGTTCGCTGAGCAGTACTTCAGATTTTCCATCATGAAGGAGCAGAAGAACCTGGGCTTCAACATCACAAAGCTGAGAGAAGTGATGCTGGACAGAAAGGATATGTCCGAGATCAGGAAGAACCACAAGGTGTTCGATTCTATCAGAACCAAGGTGTACACAATGATGGACTTTGTGATCTACAGGTACTACATCGAGGAGGATGCCAAGGTGGCCGCTGCCAACAAGAGCCTGCCCGACAACGAGAAGTCTCTGAGCGAGAAGGATATCTTCGTGATCAACCTGAGAGGCTCCTTTAACGACGATCAGAAGGACGCTCTGTACTACGATGAGGCCAACAGGATCTGGAGAAAGCTGGAGAACATCATGCACAACATCAAGGAGTTCCGGGGAAACAAGACCCGCGAGTACAAGAAGAAGGACGCTCCAAGGCTGCCTAGGATCCTGCCTGCTGGAAGGGACGTGAGCGCCTTCAGCAAGCTGATGTACGCCCTGACAATGTTTCTGGACGGAAAGGAGATCAACGATCTGCTGACCACACTGATCAACAAGTTCGACAACATCCAGTCTTTTCTGAAAGTGATGCCTCTGATCGGCGTGAACGCTAAGTTCGTGGAGGAGTACGCCTTCTTTAAGGACAGCGCCAAGATCGCTGATGAGCTGCGGCTGATCAAGTCCTTTGCCAGGATGGGAGAGCCAATCGCTGACGCTAGGAGAGCTATGTACATCGATGCCATCCGGATCCTGGGAACCAACCTGTCTTACGACGAGCTGAAGGCTCTGGCCGACACCTTCAGCCTGGATGAGAACGGCAACAAGCTGAAGAAGGGCAAGCACGGAATGCGCAACTTCATCATCAACAACGTGATCAGCAACAAGCGGTTTCACTACCTGATCAGATACGGCGACCCAGCTCACCTGCACGAGATCGCTAAGAACGAGGCCGTGGTGAAGTTCGTGCTGGGACGGATCGCCGATATCCAGAAGAAGCAGGGCCAGAACGGAAAGAACCAGATCGACCGCTACTACGAGACCTGCATCGGCAAGGA
TAAGGGAAAGTCCGTGTCTGAGAAGGTGGACGCTCTGACCAAGATCATCACAGGCATGAACTACGACCAGTTCGATAAGAAGAGATCTGTGATCGAGGACACCGGAAGGGAGAACGCCGAGAGAGAGAAGTTTAAGAAGATCATCAGCCTGTACCTGACAGTGATCTACCACATCCTGAAGAACATCGTGAACATCAACGCTAGATACGTGATCGGCTTCCACTGCGTGGAGCGCGATGCCCAGCTGTACAAGGAGAAGGGATACGACATCAACCTGAAGAAGCTGGAGGAGAAGGGCTTTAGCTCCGTGACCAAGCTGTGCGCTGGAATCGACGAGACAGCCCCCGACAAGAGGAAGGATGTGGAGAAGGAGATGGCCGAGAGAGCTAAGGAGAGCATCGACTCCCTGGAGTCTGCTAACCCTAAGCTGTACGCCAACTACATCAAGTACTCCGATGAGAAGAAGGCCGAGGAGTTCACCAGGCAGATCAACAGAGAGAAGGCCAAGACCGCTCTGAACGCCTACCTGAGGAACACAAAGTGGAACGTGATCATCCGGGAGGACCTGCTGCGCATCGATAACAAGACCTGTACACTGTTCCGGAACAAGGCTGTGCACCTGGAGGTGGCTCGCTACGTGCACGCCTACATCAACGACATCGCCGAGGTGAACTCCTACTTTCAGCTGTACCACTACATCATGCAGAGGATCATCATGAACGAGAGATACGAGAAGTCTAGCGGCAAGGTGTCTGAGTACTTCGACGCCGTGAACGATGAGAAGAAGTACAACGATAGACTGCTGAAGCTGCTGTGCGTGCCTTTCGGATACTGTATCCCACGGTTTAAGAACCTGAGCATCGAGGCCCTGTTCGACCGCAACGAGGCTGCCAAGTTTGATAAGGAGAAGAAGAAGGTGAGCGGCAACTCCTGA(SEQ ID NO:47)

ATGGCCCTTCGCAGCTCTTGCACGTCATAC(SEQ ID NO:48)

TTAGGCAGCCCTCATCAGTGCCGGCTCCCT(SEQ ID NO:49)

GGCCAGGATCTCAATTAGGCAGCCCTCATC(SEQ ID NO:50)

As described in Example 4, the five Cas13/sgRNA-encoding plasmids were transfected into HEK293 cells. After 24 hours of culture, mCherry-expressing cells were isolated by flow cytometry, and knockdown efficiency was assessed by measuring ANXA4 mRNA expression by RT-PCR, relative to control cells transfected with the corresponding Cas13/NT-encoding plasmid.

Fig. 20 shows that Cas13b produced only a small knockdown of ANXA4 mRNA, whereas Cas13e.1, Cas13f.1 and Cas13d each knocked down more than 80% of the target ANXA4 mRNA. Of these, Cas13e.1 showed the strongest knockdown efficiency.
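The text says knockdown was measured by RT-PCR relative to NT controls but does not state the quantification model. Assuming quantitative RT-PCR analyzed with the widely used 2^-ΔΔCt (Livak) method against a housekeeping reference gene, neither of which is specified in the source, the calculation looks like this. The Ct values are hypothetical and the function name is illustrative.

```python
def relative_expression(ct_target: float, ct_ref: float,
                        ct_target_ctrl: float, ct_ref_ctrl: float) -> float:
    """2^-ddCt fold change of the target mRNA versus the NT control sample."""
    delta_ct_sample = ct_target - ct_ref            # ANXA4 normalized to reference, targeting gRNA
    delta_ct_control = ct_target_ctrl - ct_ref_ctrl  # same normalization, NT gRNA
    return 2.0 ** -(delta_ct_sample - delta_ct_control)


# Hypothetical Ct values for ANXA4 and a reference gene; not data from the experiment.
fold = relative_expression(ct_target=25.0, ct_ref=18.0, ct_target_ctrl=22.0, ct_ref_ctrl=18.0)
print(f"relative ANXA4 level {fold:.3f}, knockdown {100 * (1 - fold):.1f}%")
```

Each extra cycle of delay relative to the control halves the estimated transcript level, so a ΔΔCt of 3 in this toy example corresponds to roughly 87.5% knockdown.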

Sequence listing

<110> Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences

<120> VI-E type and VI-F type CRISPR-Cas system and application

<130> 202656

<150> PCT/CN2020/077211

<151> 2020-02-28

<160> 50

<170> SIPOSequenceListing 1.0

<210> 1

<211> 775

<212> PRT

<213> metagenome (metagenomic)

<400> 1

Met Ala Gln Val Ser Lys Gln Thr Ser Lys Lys Arg Glu Leu Ser Ile

1 5 10 15

Asp Glu Tyr Gln Gly Ala Arg Lys Trp Cys Phe Thr Ile Ala Phe Asn

20 25 30

Lys Ala Leu Val Asn Arg Asp Lys Asn Asp Gly Leu Phe Val Glu Ser

35 40 45

Leu Leu Arg His Glu Lys Tyr Ser Lys His Asp Trp Tyr Asp Glu Asp

50 55 60

Thr Arg Ala Leu Ile Lys Cys Ser Thr Gln Ala Ala Asn Ala Lys Ala

65 70 75 80

Glu Ala Leu Arg Asn Tyr Phe Ser His Tyr Arg His Ser Pro Gly Cys

85 90 95

Leu Thr Phe Thr Ala Glu Asp Glu Leu Arg Thr Ile Met Glu Arg Ala

100 105 110

Tyr Glu Arg Ala Ile Phe Glu Cys Arg Arg Arg Glu Thr Glu Val Ile

115 120 125

Ile Glu Phe Pro Ser Leu Phe Glu Gly Asp Arg Ile Thr Thr Ala Gly

130 135 140

Val Val Phe Phe Val Ser Phe Phe Val Glu Arg Arg Val Leu Asp Arg

145 150 155 160

Leu Tyr Gly Ala Val Ser Gly Leu Lys Lys Asn Glu Gly Gln Tyr Lys

165 170 175

Leu Thr Arg Lys Ala Leu Ser Met Tyr Cys Leu Lys Asp Ser Arg Phe

180 185 190

Thr Lys Ala Trp Asp Lys Arg Val Leu Leu Phe Arg Asp Ile Leu Ala

195 200 205

Gln Leu Gly Arg Ile Pro Ala Glu Ala Tyr Glu Tyr Tyr His Gly Glu

210 215 220

Gln Gly Asp Lys Lys Arg Ala Asn Asp Asn Glu Gly Thr Asn Pro Lys

225 230 235 240

Arg His Lys Asp Lys Phe Ile Glu Phe Ala Leu His Tyr Leu Glu Ala

245 250 255

Gln His Ser Glu Ile Cys Phe Gly Arg Arg His Ile Val Arg Glu Glu

260 265 270

Ala Gly Ala Gly Asp Glu His Lys Lys His Arg Thr Lys Gly Lys Val

275 280 285

Val Val Asp Phe Ser Lys Lys Asp Glu Asp Gln Ser Tyr Tyr Ile Ser

290 295 300

Lys Asn Asn Val Ile Val Arg Ile Asp Lys Asn Ala Gly Pro Arg Ser

305 310 315 320

Tyr Arg Met Gly Leu Asn Glu Leu Lys Tyr Leu Val Leu Leu Ser Leu

325 330 335

Gln Gly Lys Gly Asp Asp Ala Ile Ala Lys Leu Tyr Arg Tyr Arg Gln

340 345 350

His Val Glu Asn Ile Leu Asp Val Val Lys Val Thr Asp Lys Asp Asn

355 360 365

His Val Phe Leu Pro Arg Phe Val Leu Glu Gln His Gly Ile Gly Arg

370 375 380

Lys Ala Phe Lys Gln Arg Ile Asp Gly Arg Val Lys His Val Arg Gly

385 390 395 400

Val Trp Glu Lys Lys Lys Ala Ala Thr Asn Glu Met Thr Leu His Glu

405 410 415

Lys Ala Arg Asp Ile Leu Gln Tyr Val Asn Glu Asn Cys Thr Arg Ser

420 425 430

Phe Asn Pro Gly Glu Tyr Asn Arg Leu Leu Val Cys Leu Val Gly Lys

435 440 445

Asp Val Glu Asn Phe Gln Ala Gly Leu Lys Arg Leu Gln Leu Ala Glu

450 455 460

Arg Ile Asp Gly Arg Val Tyr Ser Ile Phe Ala Gln Thr Ser Thr Ile

465 470 475 480

Asn Glu Met His Gln Val Val Cys Asp Gln Ile Leu Asn Arg Leu Cys

485 490 495

Arg Ile Gly Asp Gln Lys Leu Tyr Asp Tyr Val Gly Leu Gly Lys Lys

500 505 510

Asp Glu Ile Asp Tyr Lys Gln Lys Val Ala Trp Phe Lys Glu His Ile

515 520 525

Ser Ile Arg Arg Gly Phe Leu Arg Lys Lys Phe Trp Tyr Asp Ser Lys

530 535 540

Lys Gly Phe Ala Lys Leu Val Glu Glu His Leu Glu Ser Gly Gly Gly

545 550 555 560

Gln Arg Asp Val Gly Leu Asp Lys Lys Tyr Tyr His Ile Asp Ala Ile

565 570 575

Gly Arg Phe Glu Gly Ala Asn Pro Ala Leu Tyr Glu Thr Leu Ala Arg

580 585 590

Asp Arg Leu Cys Leu Met Met Ala Gln Tyr Phe Leu Gly Ser Val Arg

595 600 605

Lys Glu Leu Gly Asn Lys Ile Val Trp Ser Asn Asp Ser Ile Glu Leu

610 615 620

Pro Val Glu Gly Ser Val Gly Asn Glu Lys Ser Ile Val Phe Ser Val

625 630 635 640

Ser Asp Tyr Gly Lys Leu Tyr Val Leu Asp Asp Ala Glu Phe Leu Gly

645 650 655

Arg Ile Cys Glu Tyr Phe Met Pro His Glu Lys Gly Lys Ile Arg Tyr

660 665 670

His Thr Val Tyr Glu Lys Gly Phe Arg Ala Tyr Asn Asp Leu Gln Lys

675 680 685

Lys Cys Val Glu Ala Val Leu Ala Phe Glu Glu Lys Val Val Lys Ala

690 695 700

Lys Lys Met Ser Glu Lys Glu Gly Ala His Tyr Ile Asp Phe Arg Glu

705 710 715 720

Ile Leu Ala Gln Thr Met Cys Lys Glu Ala Glu Lys Thr Ala Val Asn

725 730 735

Lys Val Arg Arg Ala Phe Phe His His His Leu Lys Phe Val Ile Asp

740 745 750

Glu Phe Gly Leu Phe Ser Asp Val Met Lys Lys Tyr Gly Ile Glu Lys

755 760 765

Glu Trp Lys Phe Pro Val Lys

770 775

<210> 2

<211> 805

<212> PRT

<213> metagenome (metagenomic)

<400> 2

Met Lys Val Glu Asn Ile Lys Glu Lys Ser Lys Lys Ala Met Tyr Leu

1 5 10 15

Ile Asn His Tyr Glu Gly Pro Lys Lys Trp Cys Phe Ala Ile Val Leu

20 25 30

Asn Arg Ala Cys Asp Asn Tyr Glu Asp Asn Pro His Leu Phe Ser Lys

35 40 45

Ser Leu Leu Glu Phe Glu Lys Thr Ser Arg Lys Asp Trp Phe Asp Glu

50 55 60

Glu Thr Arg Glu Leu Val Glu Gln Ala Asp Thr Glu Ile Gln Pro Asn

65 70 75 80

Pro Asn Leu Lys Pro Asn Thr Thr Ala Asn Arg Lys Leu Lys Asp Ile

85 90 95

Arg Asn Tyr Phe Ser His His Tyr His Lys Asn Glu Cys Leu Tyr Phe

100 105 110

Lys Asn Asp Asp Pro Ile Arg Cys Ile Met Glu Ala Ala Tyr Glu Lys

115 120 125

Ser Lys Ile Tyr Ile Lys Gly Lys Gln Ile Glu Gln Ser Asp Ile Pro

130 135 140

Leu Pro Glu Leu Phe Glu Ser Ser Gly Trp Ile Thr Pro Ala Gly Ile

145 150 155 160

Leu Leu Leu Ala Ser Phe Phe Val Glu Arg Gly Ile Leu His Arg Leu

165 170 175

Met Gly Asn Ile Gly Gly Phe Lys Asp Asn Arg Gly Glu Tyr Gly Leu

180 185 190

Thr His Asp Ile Phe Thr Thr Tyr Cys Leu Lys Gly Ser Tyr Ser Ile

195 200 205

Arg Ala Gln Asp His Asp Ala Val Met Phe Arg Asp Ile Leu Gly Tyr

210 215 220

Leu Ser Arg Val Pro Thr Glu Ser Phe Gln Arg Ile Lys Gln Pro Gln

225 230 235 240

Ile Arg Lys Glu Gly Gln Leu Ser Glu Arg Lys Thr Asp Lys Phe Ile

245 250 255

Thr Phe Ala Leu Asn Tyr Leu Glu Asp Tyr Gly Leu Lys Asp Leu Glu

260 265 270

Gly Cys Lys Ala Cys Phe Ala Arg Ser Lys Ile Val Arg Glu Gln Glu

275 280 285

Asn Val Glu Ser Ile Asn Asp Lys Glu Tyr Lys Pro His Glu Asn Lys

290 295 300

Lys Lys Val Glu Ile His Phe Asp Gln Ser Lys Glu Asp Arg Phe Tyr

305 310 315 320

Ile Asn Arg Asn Asn Val Ile Leu Lys Ile Gln Lys Lys Asp Gly His

325 330 335

Ser Asn Ile Val Arg Met Gly Val Tyr Glu Leu Lys Tyr Leu Val Leu

340 345 350

Met Ser Leu Val Gly Lys Ala Lys Glu Ala Val Glu Lys Ile Asp Asn

355 360 365

Tyr Ile Gln Asp Leu Arg Asp Gln Leu Pro Tyr Ile Glu Gly Lys Asn

370 375 380

Lys Glu Glu Ile Lys Glu Tyr Val Arg Phe Phe Pro Arg Phe Ile Arg

385 390 395 400

Ser His Leu Gly Leu Leu Gln Ile Asn Asp Glu Glu Lys Ile Lys Ala

405 410 415

Arg Leu Asp Tyr Val Lys Thr Lys Trp Leu Asp Lys Lys Glu Lys Ser

420 425 430

Lys Glu Leu Glu Leu His Lys Lys Gly Arg Asp Ile Leu Arg Tyr Ile

435 440 445

Asn Glu Arg Cys Asp Arg Glu Leu Asn Arg Asn Val Tyr Asn Arg Ile

450 455 460

Leu Glu Leu Leu Val Ser Lys Asp Leu Thr Gly Phe Tyr Arg Glu Leu

465 470 475 480

Glu Glu Leu Lys Arg Thr Arg Arg Ile Asp Lys Asn Ile Val Gln Asn

485 490 495

Leu Ser Gly Gln Lys Thr Ile Asn Ala Leu His Glu Lys Val Cys Asp

500 505 510

Leu Val Leu Lys Glu Ile Glu Ser Leu Asp Thr Glu Asn Leu Arg Lys

515 520 525

Tyr Leu Gly Leu Ile Pro Lys Glu Glu Lys Glu Val Thr Phe Lys Glu

530 535 540

Lys Val Asp Arg Ile Leu Lys Gln Pro Val Ile Tyr Lys Gly Phe Leu

545 550 555 560

Arg Tyr Gln Phe Phe Lys Asp Asp Lys Lys Ser Phe Val Leu Leu Val

565 570 575

Glu Asp Ala Leu Lys Glu Lys Gly Gly Gly Cys Asp Val Pro Leu Gly

580 585 590

Lys Glu Tyr Tyr Lys Ile Val Ser Leu Asp Lys Tyr Asp Lys Glu Asn

595 600 605

Lys Thr Leu Cys Glu Thr Leu Ala Met Asp Arg Leu Cys Leu Met Met

610 615 620

Ala Arg Gln Tyr Tyr Leu Ser Leu Asn Ala Lys Leu Ala Gln Glu Ala

625 630 635 640

Gln Gln Ile Glu Trp Lys Lys Glu Asp Ser Ile Glu Leu Ile Ile Phe

645 650 655

Thr Leu Lys Asn Pro Asp Gln Ser Lys Gln Ser Phe Ser Ile Arg Phe

660 665 670

Ser Val Arg Asp Phe Thr Lys Leu Tyr Val Thr Asp Asp Pro Glu Phe

675 680 685

Leu Ala Arg Leu Cys Ser Tyr Phe Phe Pro Val Glu Lys Glu Ile Glu

690 695 700

Tyr His Lys Leu Tyr Ser Glu Gly Ile Asn Lys Tyr Thr Asn Leu Gln

705 710 715 720

Lys Glu Gly Ile Glu Ala Ile Leu Glu Leu Glu Lys Lys Leu Ile Glu

725 730 735

Arg Asn Arg Ile Gln Ser Ala Lys Asn Tyr Leu Ser Phe Asn Glu Ile

740 745 750

Met Asn Lys Ser Gly Tyr Asn Lys Asp Glu Gln Asp Asp Leu Lys Lys

755 760 765

Val Arg Asn Ser Leu Leu His Tyr Lys Leu Ile Phe Glu Lys Glu His

770 775 780

Leu Lys Lys Phe Tyr Glu Val Met Arg Gly Glu Gly Ile Glu Lys Lys

785 790 795 800

Trp Ser Leu Ile Val

805

<210> 3

<211> 790

<212> PRT

<213> metagenome (metagenomic)

<400> 3

Met Asn Gly Ile Glu Leu Lys Lys Glu Glu Ala Ala Phe Tyr Phe Asn

1 5 10 15

Gln Ala Glu Leu Asn Leu Lys Ala Ile Glu Asp Asn Ile Phe Asp Lys

20 25 30

Glu Arg Arg Lys Thr Leu Leu Asn Asn Pro Gln Ile Leu Ala Lys Met

35 40 45

Glu Asn Phe Ile Phe Asn Phe Arg Asp Val Thr Lys Asn Ala Lys Gly

50 55 60

Glu Ile Asp Cys Leu Leu Leu Lys Leu Arg Glu Leu Arg Asn Phe Tyr

65 70 75 80

Ser His Tyr Val His Lys Arg Asp Val Arg Glu Leu Ser Lys Gly Glu

85 90 95

Lys Pro Ile Leu Glu Lys Tyr Tyr Gln Phe Ala Ile Glu Ser Thr Gly

100 105 110

Ser Glu Asn Val Lys Leu Glu Ile Ile Glu Asn Asp Ala Trp Leu Ala

115 120 125

Asp Ala Gly Val Leu Phe Phe Leu Cys Ile Phe Leu Lys Lys Ser Gln

130 135 140

Ala Asn Lys Leu Ile Ser Gly Ile Ser Gly Phe Lys Arg Asn Asp Asp

145 150 155 160

Thr Gly Gln Pro Arg Arg Asn Leu Phe Thr Tyr Phe Ser Ile Arg Glu

165 170 175

Gly Tyr Lys Val Val Pro Glu Met Gln Lys His Phe Leu Leu Phe Ser

180 185 190

Leu Val Asn His Leu Ser Asn Gln Asp Asp Tyr Ile Glu Lys Ala His

195 200 205

Gln Pro Tyr Asp Ile Gly Glu Gly Leu Phe Phe His Arg Ile Ala Ser

210 215 220

Thr Phe Leu Asn Ile Ser Gly Ile Leu Arg Asn Met Lys Phe Tyr Thr

225 230 235 240

Tyr Gln Ser Lys Arg Leu Val Glu Gln Arg Gly Glu Leu Lys Arg Glu

245 250 255

Lys Asp Ile Phe Ala Trp Glu Glu Pro Phe Gln Gly Asn Ser Tyr Phe

260 265 270

Glu Ile Asn Gly His Lys Gly Val Ile Gly Glu Asp Glu Leu Lys Glu

275 280 285

Leu Cys Tyr Ala Phe Leu Ile Gly Asn Gln Asp Ala Asn Lys Val Glu

290 295 300

Gly Arg Ile Thr Gln Phe Leu Glu Lys Phe Arg Asn Ala Asn Ser Val

305 310 315 320

Gln Gln Val Lys Asp Asp Glu Met Leu Lys Pro Glu Tyr Phe Pro Ala

325 330 335

Asn Tyr Phe Ala Glu Ser Gly Val Gly Arg Ile Lys Asp Arg Val Leu

340 345 350

Asn Arg Leu Asn Lys Ala Ile Lys Ser Asn Lys Ala Lys Lys Gly Glu

355 360 365

Ile Ile Ala Tyr Asp Lys Met Arg Glu Val Met Ala Phe Ile Asn Asn

370 375 380

Ser Leu Pro Val Asp Glu Lys Leu Lys Pro Lys Asp Tyr Lys Arg Tyr

385 390 395 400

Leu Gly Met Val Arg Phe Trp Asp Arg Glu Lys Asp Asn Ile Lys Arg

405 410 415

Glu Phe Glu Thr Lys Glu Trp Ser Lys Tyr Leu Pro Ser Asn Phe Trp

420 425 430

Thr Ala Lys Asn Leu Glu Arg Val Tyr Gly Leu Ala Arg Glu Lys Asn

435 440 445

Ala Glu Leu Phe Asn Lys Leu Lys Ala Asp Val Glu Lys Met Asp Glu

450 455 460

Arg Glu Leu Glu Lys Tyr Gln Lys Ile Asn Asp Ala Lys Asp Leu Ala

465 470 475 480

Asn Leu Arg Arg Leu Ala Ser Asp Phe Gly Val Lys Trp Glu Glu Lys

485 490 495

Asp Trp Asp Glu Tyr Ser Gly Gln Ile Lys Lys Gln Ile Thr Asp Ser

500 505 510

Gln Lys Leu Thr Ile Met Lys Gln Arg Ile Thr Ala Gly Leu Lys Lys

515 520 525

Lys His Gly Ile Glu Asn Leu Asn Leu Arg Ile Thr Ile Asp Ile Asn

530 535 540

Lys Ser Arg Lys Ala Val Leu Asn Arg Ile Ala Ile Pro Arg Gly Phe

545 550 555 560

Val Lys Arg His Ile Leu Gly Trp Gln Glu Ser Glu Lys Val Ser Lys

565 570 575

Lys Ile Arg Glu Ala Glu Cys Glu Ile Leu Leu Ser Lys Glu Tyr Glu

580 585 590

Glu Leu Ser Lys Gln Phe Phe Gln Ser Lys Asp Tyr Asp Lys Met Thr

595 600 605

Arg Ile Asn Gly Leu Tyr Glu Lys Asn Lys Leu Ile Ala Leu Met Ala

610 615 620

Val Tyr Leu Met Gly Gln Leu Arg Ile Leu Phe Lys Glu His Thr Lys

625 630 635 640

Leu Asp Asp Ile Thr Lys Thr Thr Val Asp Phe Lys Ile Ser Asp Lys

645 650 655

Val Thr Val Lys Ile Pro Phe Ser Asn Tyr Pro Ser Leu Val Tyr Thr

660 665 670

Met Ser Ser Lys Tyr Val Asp Asn Ile Gly Asn Tyr Gly Phe Ser Asn

675 680 685

Lys Asp Lys Asp Lys Pro Ile Leu Gly Lys Ile Asp Val Ile Glu Lys

690 695 700

Gln Arg Met Glu Phe Ile Lys Glu Val Leu Gly Phe Glu Lys Tyr Leu

705 710 715 720

Phe Asp Asp Lys Ile Ile Asp Lys Ser Lys Phe Ala Asp Thr Ala Thr

725 730 735

His Ile Ser Phe Ala Glu Ile Val Glu Glu Leu Val Glu Lys Gly Trp

740 745 750

Asp Lys Asp Arg Leu Thr Lys Leu Lys Asp Ala Arg Asn Lys Ala Leu

755 760 765

His Gly Glu Ile Leu Thr Gly Thr Ser Phe Asp Glu Thr Lys Ser Leu

770 775 780

Ile Asn Glu Leu Lys Lys

785 790

<210> 4

<211> 792

<212> PRT

<213> metagenome (metagenomic)

<400> 4

Met Ser Pro Asp Phe Ile Lys Leu Glu Lys Gln Glu Ala Ala Phe Tyr

1 5 10 15

Phe Asn Gln Thr Glu Leu Asn Leu Lys Ala Ile Glu Ser Asn Ile Leu

20 25 30

Asp Lys Gln Gln Arg Met Ile Leu Leu Asn Asn Pro Arg Ile Leu Ala

35 40 45

Lys Val Gly Asn Phe Ile Phe Asn Phe Arg Asp Val Thr Lys Asn Ala

50 55 60

Lys Gly Glu Ile Asp Cys Leu Leu Phe Lys Leu Glu Glu Leu Arg Asn

65 70 75 80

Phe Tyr Ser His Tyr Val His Thr Asp Asn Val Lys Glu Leu Ser Asn

85 90 95

Gly Glu Lys Pro Leu Leu Glu Arg Tyr Tyr Gln Ile Ala Ile Gln Ala

100 105 110

Thr Arg Ser Glu Asp Val Lys Phe Glu Leu Phe Glu Thr Arg Asn Glu

115 120 125

Asn Lys Ile Thr Asp Ala Gly Val Leu Phe Phe Leu Cys Met Phe Leu

130 135 140

Lys Lys Ser Gln Ala Asn Lys Leu Ile Ser Gly Ile Ser Gly Phe Lys

145 150 155 160

Arg Asn Asp Pro Thr Gly Gln Pro Arg Arg Asn Leu Phe Thr Tyr Phe

165 170 175

Ser Ala Arg Glu Gly Tyr Lys Ala Leu Pro Asp Met Gln Lys His Phe

180 185 190

Leu Leu Phe Thr Leu Val Asn Tyr Leu Ser Asn Gln Asp Glu Tyr Ile

195 200 205

Ser Glu Leu Lys Gln Tyr Gly Glu Ile Gly Gln Gly Ala Phe Phe Asn

210 215 220

Arg Ile Ala Ser Thr Phe Leu Asn Ile Ser Gly Ile Ser Gly Asn Thr

225 230 235 240

Lys Phe Tyr Ser Tyr Gln Ser Lys Arg Ile Lys Glu Gln Arg Gly Glu

245 250 255

Leu Asn Ser Glu Lys Asp Ser Phe Glu Trp Ile Glu Pro Phe Gln Gly

260 265 270

Asn Ser Tyr Phe Glu Ile Asn Gly His Lys Gly Val Ile Gly Glu Asp

275 280 285

Glu Leu Lys Glu Leu Cys Tyr Ala Leu Leu Val Ala Lys Gln Asp Ile

290 295 300

Asn Ala Val Glu Gly Lys Ile Met Gln Phe Leu Lys Lys Phe Arg Asn

305 310 315 320

Thr Gly Asn Leu Gln Gln Val Lys Asp Asp Glu Met Leu Glu Ile Glu

325 330 335

Tyr Phe Pro Ala Ser Tyr Phe Asn Glu Ser Lys Lys Glu Asp Ile Lys

340 345 350

Lys Glu Ile Leu Gly Arg Leu Asp Lys Lys Ile Arg Ser Cys Ser Ala

355 360 365

Lys Ala Glu Lys Ala Tyr Asp Lys Met Lys Glu Val Met Glu Phe Ile

370 375 380

Asn Asn Ser Leu Pro Ala Glu Glu Lys Leu Lys Arg Lys Asp Tyr Arg

385 390 395 400

Arg Tyr Leu Lys Met Val Arg Phe Trp Ser Arg Glu Lys Gly Asn Ile

405 410 415

Glu Arg Glu Phe Arg Thr Lys Glu Trp Ser Lys Tyr Phe Ser Ser Asp

420 425 430

Phe Trp Arg Lys Asn Asn Leu Glu Asp Val Tyr Lys Leu Ala Thr Gln

435 440 445

Lys Asn Ala Glu Leu Phe Lys Asn Leu Lys Ala Ala Ala Glu Lys Met

450 455 460

Gly Glu Thr Glu Phe Glu Lys Tyr Gln Gln Ile Asn Asp Val Lys Asp

465 470 475 480

Leu Ala Ser Leu Arg Arg Leu Thr Gln Asp Phe Gly Leu Lys Trp Glu

485 490 495

Glu Lys Asp Trp Glu Glu Tyr Ser Glu Gln Ile Lys Lys Gln Ile Thr

500 505 510

Asp Arg Gln Lys Leu Thr Ile Met Lys Gln Arg Val Thr Ala Glu Leu

515 520 525

Lys Lys Lys His Gly Ile Glu Asn Leu Asn Leu Arg Ile Thr Ile Asp

530 535 540

Ser Asn Lys Ser Arg Lys Ala Val Leu Asn Arg Ile Ala Ile Pro Arg

545 550 555 560

Gly Phe Val Lys Lys His Ile Leu Gly Trp Gln Gly Ser Glu Lys Ile

565 570 575

Ser Lys Asn Ile Arg Glu Ala Glu Cys Lys Ile Leu Leu Ser Lys Lys

580 585 590

Tyr Glu Glu Leu Ser Arg Gln Phe Phe Glu Ala Gly Asn Phe Asp Lys

595 600 605

Leu Thr Gln Ile Asn Gly Leu Tyr Glu Lys Asn Lys Leu Thr Ala Phe

610 615 620

Met Ser Val Tyr Leu Met Gly Arg Leu Asn Ile Gln Leu Asn Lys His

625 630 635 640

Thr Glu Leu Gly Asn Leu Lys Lys Thr Glu Val Asp Phe Lys Ile Ser

645 650 655

Asp Lys Val Thr Glu Lys Ile Pro Phe Ser Gln Tyr Pro Ser Leu Val

660 665 670

Tyr Ala Met Ser Arg Lys Tyr Val Asp Asn Val Asp Lys Tyr Lys Phe

675 680 685

Ser His Gln Asp Lys Lys Lys Pro Phe Leu Gly Lys Ile Asp Ser Ile

690 695 700

Glu Lys Glu Arg Ile Glu Phe Ile Lys Glu Val Leu Asp Phe Glu Glu

705 710 715 720

Tyr Leu Phe Lys Asn Lys Val Ile Asp Lys Ser Lys Phe Ser Asp Thr

725 730 735

Ala Thr His Ile Ser Phe Lys Glu Ile Cys Asp Glu Met Gly Lys Lys

740 745 750

Gly Cys Asn Arg Asn Lys Leu Thr Glu Leu Asn Asn Ala Arg Asn Ala

755 760 765

Ala Leu His Gly Glu Ile Pro Ser Glu Thr Ser Phe Arg Glu Ala Lys

770 775 780

Pro Leu Ile Asn Glu Leu Lys Lys

785 790

<210> 5

<211> 792

<212> PRT

<213> metagenome (metagenomic)

<400> 5

Met Ser Pro Asp Phe Ile Lys Leu Glu Lys Gln Glu Ala Ala Phe Tyr

1 5 10 15

Phe Asn Gln Thr Glu Leu Asn Leu Lys Ala Ile Glu Ser Asn Ile Phe

20 25 30

Asp Lys Gln Gln Arg Val Ile Leu Leu Asn Asn Pro Gln Ile Leu Ala

35 40 45

Lys Val Gly Asp Phe Ile Phe Asn Phe Arg Asp Val Thr Lys Asn Ala

50 55 60

Lys Gly Glu Ile Asp Cys Leu Leu Leu Lys Leu Arg Glu Leu Arg Asn

65 70 75 80

Phe Tyr Ser His Tyr Val Tyr Thr Asp Asp Val Lys Ile Leu Ser Asn

85 90 95

Gly Glu Arg Pro Leu Leu Glu Lys Tyr Tyr Gln Phe Ala Ile Glu Ala

100 105 110

Thr Gly Ser Glu Asn Val Lys Leu Glu Ile Ile Glu Ser Asn Asn Arg

115 120 125

Leu Thr Glu Ala Gly Val Leu Phe Phe Leu Cys Met Phe Leu Lys Lys

130 135 140

Ser Gln Ala Asn Lys Leu Ile Ser Gly Ile Ser Gly Phe Lys Arg Asn

145 150 155 160

Asp Pro Thr Gly Gln Pro Arg Arg Asn Leu Phe Thr Tyr Phe Ser Val

165 170 175

Arg Glu Gly Tyr Lys Val Val Pro Asp Met Gln Lys His Phe Leu Leu

180 185 190

Phe Val Leu Val Asn His Leu Ser Gly Gln Asp Asp Tyr Ile Glu Lys

195 200 205

Ala Gln Lys Pro Tyr Asp Ile Gly Glu Gly Leu Phe Phe His Arg Ile

210 215 220

Ala Ser Thr Phe Leu Asn Ile Ser Gly Ile Leu Arg Asn Met Glu Phe

225 230 235 240

Tyr Ile Tyr Gln Ser Lys Arg Leu Lys Glu Gln Gln Gly Glu Leu Lys

245 250 255

Arg Glu Lys Asp Ile Phe Pro Trp Ile Glu Pro Phe Gln Gly Asn Ser

260 265 270

Tyr Phe Glu Ile Asn Gly Asn Lys Gly Ile Ile Gly Glu Asp Glu Leu

275 280 285

Lys Glu Leu Cys Tyr Ala Leu Leu Val Ala Gly Lys Asp Val Arg Ala

290 295 300

Val Glu Gly Lys Ile Thr Gln Phe Leu Glu Lys Phe Lys Asn Ala Asp

305 310 315 320

Asn Ala Gln Gln Val Glu Lys Asp Glu Met Leu Asp Arg Asn Asn Phe

325 330 335

Pro Ala Asn Tyr Phe Ala Glu Ser Asn Ile Gly Ser Ile Lys Glu Lys

340 345 350

Ile Leu Asn Arg Leu Gly Lys Thr Asp Asp Ser Tyr Asn Lys Thr Gly

355 360 365

Thr Lys Ile Lys Pro Tyr Asp Met Met Lys Glu Val Met Glu Phe Ile

370 375 380

Asn Asn Ser Leu Pro Ala Asp Glu Lys Leu Lys Arg Lys Asp Tyr Arg

385 390 395 400

Arg Tyr Leu Lys Met Val Arg Ile Trp Asp Ser Glu Lys Asp Asn Ile

405 410 415

Lys Arg Glu Phe Glu Ser Lys Glu Trp Ser Lys Tyr Phe Ser Ser Asp

420 425 430

Phe Trp Met Ala Lys Asn Leu Glu Arg Val Tyr Gly Leu Ala Arg Glu

435 440 445

Lys Asn Ala Glu Leu Phe Asn Lys Leu Lys Ala Val Val Glu Lys Met

450 455 460

Asp Glu Arg Glu Phe Glu Lys Tyr Arg Leu Ile Asn Ser Ala Glu Asp

465 470 475 480

Leu Ala Ser Leu Arg Arg Leu Ala Lys Asp Phe Gly Leu Lys Trp Glu

485 490 495

Glu Lys Asp Trp Gln Glu Tyr Ser Gly Gln Ile Lys Lys Gln Ile Ser

500 505 510

Asp Arg Gln Lys Leu Thr Ile Met Lys Gln Arg Ile Thr Ala Glu Leu

515 520 525

Lys Lys Lys His Gly Ile Glu Asn Leu Asn Leu Arg Ile Thr Ile Asp

530 535 540

Ser Asn Lys Ser Arg Lys Ala Val Leu Asn Arg Ile Ala Val Pro Arg

545 550 555 560

Gly Phe Val Lys Glu His Ile Leu Gly Trp Gln Gly Ser Glu Lys Val

565 570 575

Ser Lys Lys Thr Arg Glu Ala Lys Cys Lys Ile Leu Leu Ser Lys Glu

580 585 590

Tyr Glu Glu Leu Ser Lys Gln Phe Phe Gln Thr Arg Asn Tyr Asp Lys

595 600 605

Met Thr Gln Val Asn Gly Leu Tyr Glu Lys Asn Lys Leu Leu Ala Phe

610 615 620

Met Val Val Tyr Leu Met Glu Arg Leu Asn Ile Leu Leu Asn Lys Pro

625 630 635 640

Thr Glu Leu Asn Glu Leu Glu Lys Ala Glu Val Asp Phe Lys Ile Ser

645 650 655

Asp Lys Val Met Ala Lys Ile Pro Phe Ser Gln Tyr Pro Ser Leu Val

660 665 670

Tyr Ala Met Ser Ser Lys Tyr Ala Asp Ser Val Gly Ser Tyr Lys Phe

675 680 685

Glu Asn Asp Glu Lys Asn Lys Pro Phe Leu Gly Lys Ile Asp Thr Ile

690 695 700

Glu Lys Gln Arg Met Glu Phe Ile Lys Glu Val Leu Gly Phe Glu Glu

705 710 715 720

Tyr Leu Phe Glu Lys Lys Ile Ile Asp Lys Ser Glu Phe Ala Asp Thr

725 730 735

Ala Thr His Ile Ser Phe Asp Glu Ile Cys Asn Glu Leu Ile Lys Lys

740 745 750

Gly Trp Asp Lys Asp Lys Leu Thr Lys Leu Lys Asp Ala Arg Asn Ala

755 760 765

Ala Leu His Gly Glu Ile Pro Ala Glu Thr Ser Phe Arg Glu Ala Lys

770 775 780

Pro Leu Ile Asn Gly Leu Lys Lys

785 790

<210> 6

<211> 799

<212> PRT

<213> metagenome (metagenomic)

<400> 6

Met Asn Ile Ile Lys Leu Lys Lys Glu Glu Ala Ala Phe Tyr Phe Asn

1 5 10 15

Gln Thr Ile Leu Asn Leu Ser Gly Leu Asp Glu Ile Ile Glu Lys Gln

20 25 30

Ile Pro His Ile Ile Ser Asn Lys Glu Asn Ala Lys Lys Val Ile Asp

35 40 45

Lys Ile Phe Asn Asn Arg Leu Leu Leu Lys Ser Val Glu Asn Tyr Ile

50 55 60

Tyr Asn Phe Lys Asp Val Ala Lys Asn Ala Arg Thr Glu Ile Glu Ala

65 70 75 80

Ile Leu Leu Lys Leu Val Glu Leu Arg Asn Phe Tyr Ser His Tyr Val

85 90 95

His Asn Asp Thr Val Lys Ile Leu Ser Asn Gly Glu Lys Pro Ile Leu

100 105 110

Glu Lys Tyr Tyr Gln Ile Ala Ile Glu Ala Thr Gly Ser Lys Asn Val

115 120 125

Lys Leu Val Ile Ile Glu Asn Asn Asn Cys Leu Thr Asp Ser Gly Val

130 135 140

Leu Phe Leu Leu Cys Met Phe Leu Lys Lys Ser Gln Ala Asn Lys Leu

145 150 155 160

Ile Ser Ser Val Ser Gly Phe Lys Arg Asn Asp Lys Glu Gly Gln Pro

165 170 175

Arg Arg Asn Leu Phe Thr Tyr Tyr Ser Val Arg Glu Gly Tyr Lys Val

180 185 190

Val Pro Asp Met Gln Lys His Phe Leu Leu Phe Ala Leu Val Asn His

195 200 205

Leu Ser Glu Gln Asp Asp His Ile Glu Lys Gln Gln Gln Ser Asp Glu

210 215 220

Leu Gly Lys Gly Leu Phe Phe His Arg Ile Ala Ser Thr Phe Leu Asn

225 230 235 240

Glu Ser Gly Ile Phe Asn Lys Met Gln Phe Tyr Thr Tyr Gln Ser Asn

245 250 255

Arg Leu Lys Glu Lys Arg Gly Glu Leu Lys His Glu Lys Asp Thr Phe

260 265 270

Thr Trp Ile Glu Pro Phe Gln Gly Asn Ser Tyr Phe Thr Leu Asn Gly

275 280 285

His Lys Gly Val Ile Ser Glu Asp Gln Leu Lys Glu Leu Cys Tyr Thr

290 295 300

Ile Leu Ile Glu Lys Gln Asn Val Asp Ser Leu Glu Gly Lys Ile Ile

305 310 315 320

Gln Phe Leu Lys Lys Phe Gln Asn Val Ser Ser Lys Gln Gln Val Asp

325 330 335

Glu Asp Glu Leu Leu Lys Arg Glu Tyr Phe Pro Ala Asn Tyr Phe Gly

340 345 350

Arg Ala Gly Thr Gly Thr Leu Lys Glu Lys Ile Leu Asn Arg Leu Asp

355 360 365

Lys Arg Met Asp Pro Thr Ser Lys Val Thr Asp Lys Ala Tyr Asp Lys

370 375 380

Met Ile Glu Val Met Glu Phe Ile Asn Met Cys Leu Pro Ser Asp Glu

385 390 395 400

Lys Leu Arg Gln Lys Asp Tyr Arg Arg Tyr Leu Lys Met Val Arg Phe

405 410 415

Trp Asn Lys Glu Lys His Asn Ile Lys Arg Glu Phe Asp Ser Lys Lys

420 425 430

Trp Thr Arg Phe Leu Pro Thr Glu Leu Trp Asn Lys Arg Asn Leu Glu

435 440 445

Glu Ala Tyr Gln Leu Ala Arg Lys Glu Asn Lys Lys Lys Leu Glu Asp

450 455 460

Met Arg Asn Gln Val Arg Ser Leu Lys Glu Asn Asp Leu Glu Lys Tyr

465 470 475 480

Gln Gln Ile Asn Tyr Val Asn Asp Leu Glu Asn Leu Arg Leu Leu Ser

485 490 495

Gln Glu Leu Gly Val Lys Trp Gln Glu Lys Asp Trp Val Glu Tyr Ser

500 505 510

Gly Gln Ile Lys Lys Gln Ile Ser Asp Asn Gln Lys Leu Thr Ile Met

515 520 525

Lys Gln Arg Ile Thr Ala Glu Leu Lys Lys Met His Gly Ile Glu Asn

530 535 540

Leu Asn Leu Arg Ile Ser Ile Asp Thr Asn Lys Ser Arg Gln Thr Val

545 550 555 560

Met Asn Arg Ile Ala Leu Pro Lys Gly Phe Val Lys Asn His Ile Gln

565 570 575

Gln Asn Ser Ser Glu Lys Ile Ser Lys Arg Ile Arg Glu Asp Tyr Cys

580 585 590

Lys Ile Glu Leu Ser Gly Lys Tyr Glu Glu Leu Ser Arg Gln Phe Phe

595 600 605

Asp Lys Lys Asn Phe Asp Lys Met Thr Leu Ile Asn Gly Leu Cys Glu

610 615 620

Lys Asn Lys Leu Ile Ala Phe Met Val Ile Tyr Leu Leu Glu Arg Leu

625 630 635 640

Gly Phe Glu Leu Lys Glu Lys Thr Lys Leu Gly Glu Leu Lys Gln Thr

645 650 655

Arg Met Thr Tyr Lys Ile Ser Asp Lys Val Lys Glu Asp Ile Pro Leu

660 665 670

Ser Tyr Tyr Pro Lys Leu Val Tyr Ala Met Asn Arg Lys Tyr Val Asp

675 680 685

Asn Ile Asp Ser Tyr Ala Phe Ala Ala Tyr Glu Ser Lys Lys Ala Ile

690 695 700

Leu Asp Lys Val Asp Ile Ile Glu Lys Gln Arg Met Glu Phe Ile Lys

705 710 715 720

Gln Val Leu Cys Phe Glu Glu Tyr Ile Phe Glu Asn Arg Ile Ile Glu

725 730 735

Lys Ser Lys Phe Asn Asp Glu Glu Thr His Ile Ser Phe Thr Gln Ile

740 745 750

His Asp Glu Leu Ile Lys Lys Gly Arg Asp Thr Glu Lys Leu Ser Lys

755 760 765

Leu Lys His Ala Arg Asn Lys Ala Leu His Gly Glu Ile Pro Asp Gly

770 775 780

Thr Ser Phe Glu Lys Ala Lys Leu Leu Ile Asn Glu Ile Lys Lys

785 790 795

<210> 7

<211> 803

<212> PRT

<213> metagenome (metagenomic)

<400> 7

Met Asn Ala Ile Glu Leu Lys Lys Glu Glu Ala Ala Phe Tyr Phe Asn

1 5 10 15

Gln Ala Arg Leu Asn Ile Ser Gly Leu Asp Glu Ile Ile Glu Lys Gln

20 25 30

Leu Pro His Ile Gly Ser Asn Arg Glu Asn Ala Lys Lys Thr Val Asp

35 40 45

Met Ile Leu Asp Asn Pro Glu Val Leu Lys Lys Met Glu Asn Tyr Val

50 55 60

Phe Asn Ser Arg Asp Ile Ala Lys Asn Ala Arg Gly Glu Leu Glu Ala

65 70 75 80

Leu Leu Leu Lys Leu Val Glu Leu Arg Asn Phe Tyr Ser His Tyr Val

85 90 95

His Lys Asp Asp Val Lys Thr Leu Ser Tyr Gly Glu Lys Pro Leu Leu

100 105 110

Asp Lys Tyr Tyr Glu Ile Ala Ile Glu Ala Thr Gly Ser Lys Asp Val

115 120 125

Arg Leu Glu Ile Ile Asp Asp Lys Asn Lys Leu Thr Asp Ala Gly Val

130 135 140

Leu Phe Leu Leu Cys Met Phe Leu Lys Lys Ser Glu Ala Asn Lys Leu

145 150 155 160

Ile Ser Ser Ile Arg Gly Phe Lys Arg Asn Asp Lys Glu Gly Gln Pro

165 170 175

Arg Arg Asn Leu Phe Thr Tyr Tyr Ser Val Arg Glu Gly Tyr Lys Val

180 185 190

Val Pro Asp Met Gln Lys His Phe Leu Leu Phe Thr Leu Val Asn His

195 200 205

Leu Ser Asn Gln Asp Glu Tyr Ile Ser Asn Leu Arg Pro Asn Gln Glu

210 215 220

Ile Gly Gln Gly Gly Phe Phe His Arg Ile Ala Ser Lys Phe Leu Ser

225 230 235 240

Asp Ser Gly Ile Leu His Ser Met Lys Phe Tyr Thr Tyr Arg Ser Lys

245 250 255

Arg Leu Thr Glu Gln Arg Gly Glu Leu Lys Pro Lys Lys Asp His Phe

260 265 270

Thr Trp Ile Glu Pro Phe Gln Gly Asn Ser Tyr Phe Ser Val Gln Gly

275 280 285

Gln Lys Gly Val Ile Gly Glu Glu Gln Leu Lys Glu Leu Cys Tyr Val

290 295 300

Leu Leu Val Ala Arg Glu Asp Phe Arg Ala Val Glu Gly Lys Val Thr

305 310 315 320

Gln Phe Leu Lys Lys Phe Gln Asn Ala Asn Asn Val Gln Gln Val Glu

325 330 335

Lys Asp Glu Val Leu Glu Lys Glu Tyr Phe Pro Ala Asn Tyr Phe Glu

340 345 350

Asn Arg Asp Val Gly Arg Val Lys Asp Lys Ile Leu Asn Arg Leu Lys

355 360 365

Lys Ile Thr Glu Ser Tyr Lys Ala Lys Gly Arg Glu Val Lys Ala Tyr

370 375 380

Asp Lys Met Lys Glu Val Met Glu Phe Ile Asn Asn Cys Leu Pro Thr

385 390 395 400

Asp Glu Asn Leu Lys Leu Lys Asp Tyr Arg Arg Tyr Leu Lys Met Val

405 410 415

Arg Phe Trp Gly Arg Glu Lys Glu Asn Ile Lys Arg Glu Phe Asp Ser

420 425 430

Lys Lys Trp Glu Arg Phe Leu Pro Arg Glu Leu Trp Gln Lys Arg Asn

435 440 445

Leu Glu Asp Ala Tyr Gln Leu Ala Lys Glu Lys Asn Thr Glu Leu Phe

450 455 460

Asn Lys Leu Lys Thr Thr Val Glu Arg Met Asn Glu Leu Glu Phe Glu

465 470 475 480

Lys Tyr Gln Gln Ile Asn Asp Ala Lys Asp Leu Ala Asn Leu Arg Gln

485 490 495

Leu Ala Arg Asp Phe Gly Val Lys Trp Glu Glu Lys Asp Trp Gln Glu

500 505 510

Tyr Ser Gly Gln Ile Lys Lys Gln Ile Thr Asp Arg Gln Lys Leu Thr

515 520 525

Ile Met Lys Gln Arg Ile Thr Ala Ala Leu Lys Lys Lys Gln Gly Ile

530 535 540

Glu Asn Leu Asn Leu Arg Ile Thr Thr Asp Thr Asn Lys Ser Arg Lys

545 550 555 560

Val Val Leu Asn Arg Ile Ala Leu Pro Lys Gly Phe Val Arg Lys His

565 570 575

Ile Leu Lys Thr Asp Ile Lys Ile Ser Lys Gln Ile Arg Gln Ser Gln

580 585 590

Cys Pro Ile Ile Leu Ser Asn Asn Tyr Met Lys Leu Ala Lys Glu Phe

595 600 605

Phe Glu Glu Arg Asn Phe Asp Lys Met Thr Gln Ile Asn Gly Leu Phe

610 615 620

Glu Lys Asn Val Leu Ile Ala Phe Met Ile Val Tyr Leu Met Glu Gln

625 630 635 640

Leu Asn Leu Arg Leu Gly Lys Asn Thr Glu Leu Ser Asn Leu Lys Lys

645 650 655

Thr Glu Val Asn Phe Thr Ile Thr Asp Lys Val Thr Glu Lys Val Gln

660 665 670

Ile Ser Gln Tyr Pro Ser Leu Val Phe Ala Ile Asn Arg Glu Tyr Val

675 680 685

Asp Gly Ile Ser Gly Tyr Lys Leu Pro Pro Lys Lys Pro Lys Glu Pro

690 695 700

Pro Tyr Thr Phe Phe Glu Lys Ile Asp Ala Ile Glu Lys Glu Arg Met

705 710 715 720

Glu Phe Ile Lys Gln Val Leu Gly Phe Glu Glu His Leu Phe Glu Lys

725 730 735

Asn Val Ile Asp Lys Thr Arg Phe Thr Asp Thr Ala Thr His Ile Ser

740 745 750

Phe Asn Glu Ile Cys Asp Glu Leu Ile Lys Lys Gly Trp Asp Glu Asn

755 760 765

Lys Ile Ile Lys Leu Lys Asp Ala Arg Asn Ala Ala Leu His Gly Lys

770 775 780

Ile Pro Glu Asp Thr Ser Phe Asp Glu Ala Lys Val Leu Ile Asn Glu

785 790 795 800

Leu Lys Lys

<210> 8

<211> 36

<212> DNA

<213> metagenome (metagenomic)

<400> 8

gctggagcag cccccgattt gtggggtgat tacagc 36

<210> 9

<211> 36

<212> DNA

<213> metagenome (metagenomic)

<400> 9

gctgaagaag cctccgattt gagaggtgat tacagc 36

<210> 10

<211> 36

<212> DNA

<213> metagenome (metagenomic)

<400> 10

gctgtgatag acctcgattt gtggggtagt aacagc 36

<210> 11

<211> 36

<212> DNA

<213> metagenome (metagenomic)

<400> 11

gctgtgatag acctcgattt gtggggtagt aacagc 36

<210> 12

<211> 36

<212> DNA

<213> metagenome (metagenomic)

<400> 12

gctgtgatag acctcgattt gtggggtagt aacagc 36

<210> 13

<211> 36

<212> DNA

<213> metagenome (metagenomic)

<400> 13

gctgtgatgg gcctcaattt gtggggaagt aacagc 36

<210> 14

<211> 36

<212> DNA

<213> metagenome (metagenomic)

<400> 14

gctgtgatag gcctcgattt gtggggtagt aacagc 36

<210> 15

<211> 2328

<212> DNA

<213> metagenome (metagenomic)

<400> 15

atggcgcaag tgtcaaagca gacttcgaaa aagagagagt tgtctatcga tgaatatcaa 60

ggtgctcgga aatggtgttt tacgattgcc ttcaacaagg ctcttgtgaa tcgagataag 120

aacgacgggc tttttgtcga gtcgctgtta cgccatgaaa agtattcaaa gcacgactgg 180

tacgatgagg atacacgcgc tttgatcaag tgtagcacac aagcggccaa tgcgaaggcc 240

gaggcgttaa gaaactattt ctcccactat cgacattcgc ccgggtgtct gacatttaca 300

gcagaagatg agttgcggac aatcatggaa agggcgtatg agcgggcgat ctttgaatgc 360

aggagacgcg aaactgaagt gatcatcgag tttcccagcc tgttcgaagg cgaccggatc 420

actacggcgg gggttgtgtt tttcgtttcg ttctttgttg aacggcgggt gctggatcgt 480

ttgtacggtg cggtaagtgg gcttaagaaa aacgaaggac agtacaagct gactcggaag 540

gcgctttcga tgtattgcct gaaagacagt cgtttcacga aggcgtggga caaacgcgtg 600

ctgcttttca gggatatact cgcgcagctt ggacgcatcc ctgcggaggc gtatgaatac 660

taccacggag agcagggcga caagaaaaga gcaaacgaca atgaggggac gaatccgaaa 720

cgccataaag acaagttcat cgagtttgca ctgcattatc tggaggcgca acacagtgag 780

atatgcttcg ggcggcgaca cattgtcagg gaggaggccg gggcaggcga cgaacacaaa 840

aagcacagga ccaaaggcaa ggtagttgtc gacttttcaa aaaaagacga agatcagtca 900

tactatatca gtaagaacaa tgttatcgtc aggattgata agaatgccgg gcctcggagt 960

tatcgcatgg ggcttaacga attgaaatac cttgtattgc ttagccttca gggaaagggc 1020

gacgatgcga ttgcaaaact gtacaggtat cggcagcatg tggagaacat tctggatgta 1080

gtgaaggtca cagataagga taatcacgtc ttcctgccgc gatttgtgct ggagcaacat 1140

gggattggca ggaaagcttt taagcaaaga atagacggca gagtaaagca tgttcgaggg 1200

gtgtgggaaa agaagaaggc ggcgaccaac gagatgacac ttcacgagaa ggcgcgggac 1260

attcttcaat acgtaaatga aaattgcacg aggtctttca atcccggcga gtacaaccgg 1320

ctgctggtgt gtctggttgg caaggatgtt gagaattttc aggcgggact gaaacgcctg 1380

caactggccg agcgaatcga cgggcgggta tattcaattt ttgcgcagac ctccacaata 1440

aacgagatgc atcaggtggt gtgtgatcag attctcaaca gactttgccg aatcggcgat 1500

cagaagctct acgattatgt ggggcttggg aagaaggatg aaatagatta caagcagaag 1560

gttgcatggt tcaaggagca tatttctatc cgcaggggtt tcttgcgcaa gaagttctgg 1620

tatgacagca agaagggatt cgcgaagctt gtggaagagc atttggaaag cggcggcgga 1680

cagagggacg ttgggctgga taaaaagtat tatcatattg atgcgattgg gcgattcgag 1740

ggtgctaatc cagccttgta tgaaacgctg gcgcgagacc gtttgtgtct gatgatggcg 1800

caatacttcc tggggagtgt acgcaaggaa ttgggtaata aaattgtgtg gtcgaatgat 1860

agcatcgagt tgcccgtgga gggctcagtg ggtaacgaaa aaagcatcgt cttctcagtg 1920

agtgattacg gcaagttata tgtgttggat gacgctgagt ttcttgggcg gatatgtgag 1980

tactttatgc cgcacgaaaa agggaagata cggtatcata cagtttacga aaaagggttt 2040

agggcatata atgatctgca gaagaaatgt gtcgaggcgg tgctggcgtt tgaagagaag 2100

gttgtcaaag ccaaaaagat gagcgagaag gaaggggcgc attatattga ttttcgtgag 2160

atactggcac aaacaatgtg taaagaggcg gagaagaccg ccgtgaataa ggtgcgtaga 2220

gcgtttttcc atcatcattt aaagtttgtg atagatgaat ttgggttgtt tagtgatgtt 2280

atgaagaaat atggaattga aaaggagtgg aagtttcctg ttaaatga 2328

<210> 16

<211> 2418

<212> DNA

<213> metagenome (metagenomic)

<400> 16

atgaaggttg aaaatattaa agaaaaaagc aaaaaagcaa tgtatttaat caaccattat 60

gagggaccca aaaaatggtg ttttgcaata gttctgaata gggcatgtga taattacgag 120

gacaatccac acttgttttc caaatcactt ttggaatttg aaaaaacaag tcgaaaagat 180

tggtttgacg aagaaacacg agagcttgtt gagcaagcag atacagaaat acagccaaat 240

cctaacctga aacctaatac aacagctaac cgaaaactca aagatataag aaactatttt 300

tcgcatcatt atcacaagaa cgaatgcctg tattttaaga acgatgatcc catacgctgc 360

attatggaag cggcgtatga aaaatctaaa atttatatca aaggaaagca gattgagcaa 420

agcgatatac cattgcccga attgtttgaa agcagcggtt ggattacacc ggcggggatt 480

ttgttactgg catccttttt tgttgaacga gggattctac atcgcttgat gggaaatatc 540

ggaggattta aagataatcg aggcgaatac ggtcttacac acgatatttt taccacctat 600

tgtcttaagg gtagttattc aattcgggcg caggatcatg atgcggtaat gttcagagat 660

attctcggct atctgtcacg agttcccact gagtcatttc agcgtatcaa gcaacctcaa 720

atacgaaaag aaggccaatt aagtgaaaga aagacggaca aatttataac atttgcacta 780

aattatcttg aggattatgg gctgaaagat ttggaaggct gcaaagcctg ttttgccaga 840

agtaaaattg taagggaaca agaaaatgtt gaaagcataa atgataagga atacaaacct 900

cacgagaaca aaaagaaagt tgaaattcac ttcgatcaga gcaaagaaga ccgattttat 960

attaatcgca ataacgttat tttgaagatt cagaagaaag atggacattc caacatagtt 1020

aggatgggag tatatgaact taaatatctc gttcttatga gtttagtggg aaaagcaaaa 1080

gaagcagttg aaaaaattga caactatatc caggatttgc gagaccagtt gccttacata 1140

gaggggaaaa ataaggaaga gattaaagaa tacgtcaggt tctttccacg atttatacgt 1200

tctcacctcg gtttactaca gattaacgat gaagaaaaga taaaagctcg attagattat 1260

gttaagacca agtggttaga taaaaaggaa aaatcgaaag agcttgaact tcataaaaaa 1320

ggacgggaca tcctcaggta tatcaacgag cgatgtgata gagagcttaa caggaatgta 1380

tataaccgta ttttagagct cctggtcagc aaagacctca ctggttttta tcgtgagctt 1440

gaagaactaa aaagaacaag gcggatagat aaaaatattg tccagaatct ttctgggcaa 1500

aaaaccatta atgcactgca tgaaaaggtc tgtgatctgg tgctgaagga aatcgaaagt 1560

ctcgatacag aaaatctcag gaaatatctt ggattgatac ccaaagaaga aaaagaggtc 1620

actttcaaag aaaaggtcga taggattttg aaacagccag ttatttacaa agggtttctg 1680

agataccaat tcttcaaaga tgacaaaaag agttttgtct tacttgttga agacgcattg 1740

aaggaaaaag gaggaggttg tgatgttcct cttgggaaag agtattataa aatcgtgtca 1800

cttgataagt atgataaaga aaataaaacc ctgtgtgaaa ctctggcgat ggataggctt 1860

tgccttatga tggcaagaca atattatctc agtctgaatg caaaacttgc acaggaagct 1920

cagcaaatcg aatggaagaa agaagatagt atagaattga ttattttcac cttaaaaaat 1980

cccgatcaat caaagcagag tttttctata cggttttcgg tcagagattt tacgaagttg 2040

tatgtaacgg atgatcctga atttctggcc cggctttgtt cctacttttt cccagttgaa 2100

aaagagattg aatatcacaa gctctattca gaagggataa ataaatacac aaacctgcaa 2160

aaagagggaa tcgaagcaat actcgagctt gaaaaaaagc ttattgaacg aaatcggatt 2220

caatctgcaa aaaattatct ctcatttaat gagataatga ataaaagcgg ttataataaa 2280

gatgagcagg atgatctaaa gaaggtgcga aattctcttt tgcattataa gcttatcttt 2340

gagaaagaac atctcaagaa gttctatgag gttatgagag gagaagggat agagaaaaag 2400

tggtctttaa tagtatga 2418

<210> 17

<211> 2373

<212> DNA

<213> metagenome (metagenomic)

<400> 17

atgaatggca ttgaattaaa aaaagaagaa gcagcatttt attttaatca ggcagagctt 60

aatttaaaag ccatagaaga caatattttt gataaagaaa gacgaaagac tctgcttaat 120

aatccacaga tacttgccaa aatggaaaat ttcattttca atttcagaga tgtaacaaaa 180

aatgcaaaag gggaaattga ctgcttgctg ttgaaactaa gagagctgag aaacttttac 240

tcgcattatg tccacaaacg agatgtaaga gaattaagca agggcgagaa acctatactt 300

gaaaagtatt accaatttgc gattgaatca accggaagtg aaaatgttaa acttgagata 360

atagaaaacg acgcgtggct tgcagatgcc ggtgtgttgt ttttcttatg tatttttttg 420

aagaaatctc aggcaaataa gcttataagc ggtatcagcg gttttaaaag aaacgatgat 480

accggtcagc cgagaaggaa tttatttacc tatttcagta taagggaggg atacaaggtt 540

gttccggaaa tgcagaaaca tttccttttg ttttctcttg ttaatcatct ctctaatcaa 600

gatgattata ttgaaaaagc gcatcagcca tacgatatag gcgagggttt attttttcat 660

cgaatagctt ctacatttct taatataagt gggattttaa gaaatatgaa attctatacc 720

tatcagagta aaaggttagt agagcagcgg ggagaactca aacgagaaaa ggatattttt 780

gcgtgggaag aaccgtttca aggaaatagt tattttgaaa taaatggtca taaaggagta 840

atcggtgaag atgaattgaa ggaactatgt tatgcatttc tgattggcaa tcaagatgct 900

aataaagtgg aaggcaggat tacacaattt ctagaaaagt ttagaaatgc gaacagtgtg 960

caacaagtta aagatgatga aatgctaaaa ccagagtatt ttcctgcaaa ttattttgct 1020

gaatcaggcg tcggaagaat aaaggataga gtgcttaatc gtttgaataa agcgattaaa 1080

agcaataagg ccaagaaagg agagattata gcatacgata agatgagaga ggttatggcg 1140

ttcataaata attctctgcc ggtagatgaa aaattgaaac caaaagatta caaacgatat 1200

ctgggaatgg ttcgtttctg ggacagggaa aaagataaca taaagcggga gttcgagaca 1260

aaagaatggt ctaaatatct tccatctaat ttctggacgg caaaaaacct tgaaagggtc 1320

tatggtctgg caagagagaa aaacgcagaa ttattcaata aactaaaagc ggatgtagaa 1380

aaaatggacg aacgggaact tgagaagtat cagaagataa atgatgcaaa ggatttggca 1440

aatttacgcc ggcttgcaag cgactttggt gtgaagtggg aagaaaaaga ctgggatgag 1500

tattcaggac agataaaaaa acaaattaca gacagccaga aactaacaat aatgaagcag 1560

cggataaccg caggactaaa gaaaaagcac ggcatagaaa atcttaacct gagaataact 1620

atcgacatca ataaaagcag aaaggcagtt ttgaacagaa ttgcgattcc gaggggtttt 1680

gtaaaaaggc atattttagg atggcaagag tctgagaagg tatcgaaaaa gataagagag 1740

gcagaatgcg aaattctgct gtcgaaagaa tacgaagaac tatcgaaaca atttttccaa 1800

agcaaagatt atgacaaaat gacacggata aatggccttt atgaaaaaaa caaacttata 1860

gccctgatgg cagtttatct aatggggcaa ttgagaatcc tgtttaaaga acacacaaaa 1920

cttgacgata ttacgaaaac aactgtggat ttcaaaatat ctgataaggt gacggtaaaa 1980

atcccctttt caaattatcc ttcgctcgtt tatacaatgt ccagtaagta tgttgataat 2040

atagggaatt atggattttc caacaaagat aaagacaagc cgattttagg taagattgat 2100

gtaatagaaa aacagcgaat ggaatttata aaagaggttc ttggttttga aaaatatctt 2160

tttgatgata aaataataga taaaagcaaa tttgctgata cagcgactca tataagtttt 2220

gcagaaatag ttgaggagct tgttgaaaaa ggatgggaca aagacagact gacaaaactt 2280

aaagatgcaa gaaataaagc cctgcatggt gaaatactga cgggaaccag ctttgatgaa 2340

acaaaatcat tgataaacga attaaaaaaa tga 2373

<210> 18

<211> 2379

<212> DNA

<213> metagenome (metagenomic)

<400> 18

atgtccccag atttcatcaa attagaaaaa caggaagcag ctttttactt taatcagaca 60

gagcttaatt taaaagccat agaaagcaat attttagaca aacaacagcg aatgattctg 120

cttaataatc cacggatact tgccaaagta ggaaatttca ttttcaattt cagagatgta 180

acaaaaaatg caaaaggaga aatagactgt ctgctattta aactggaaga gctaagaaac 240

ttttactcgc attatgttca taccgacaat gtaaaggaat tgagtaacgg agaaaaaccc 300

ctactggaaa gatattatca aatcgctatt caggcaacca ggagtgagga tgttaagttc 360

gaattgtttg aaacaagaaa cgagaataag attacggatg ccggtgtatt gtttttctta 420

tgtatgtttt taaaaaaatc acaggcaaac aagcttataa gcggtatcag cggcttcaaa 480

agaaatgatc caacaggcca gccgagaaga aacttattta cctatttcag tgcaagagaa 540

ggatataagg ctttgcctga tatgcagaaa cattttcttc tttttactct ggttaattat 600

ttgtcgaatc aggatgagta tatcagcgag cttaaacaat atggagagat tggtcaagga 660

gcctttttta atcgaatagc ttcaacattt ttgaatatca gcgggatttc aggaaatacg 720

aaattctatt cgtatcaaag taaaaggata aaagagcagc gaggcgaact caatagcgaa 780

aaggacagct ttgaatggat agagcctttc caaggaaaca gctattttga aataaatggg 840

cataaaggag taatcggcga agacgaatta aaagaacttt gttatgcatt gttggttgcc 900

aagcaagata ttaatgccgt tgaaggcaaa attatgcaat tcctgaaaaa gtttagaaat 960

actggcaatt tgcagcaagt taaagatgat gaaatgctgg aaatagaata ttttcccgca 1020

agttatttta atgaatcaaa aaaagaggac ataaagaaag agattcttgg ccggctggat 1080

aaaaagattc gctcctgctc tgcaaaggca gaaaaagcct atgataagat gaaagaggtg 1140

atggagttta taaataattc tctgccggca gaggaaaaat tgaaacgcaa agattataga 1200

agatatctaa agatggttcg tttctggagc agagaaaaag gcaatataga gcgggaattt 1260

agaacaaagg aatggtcaaa atatttttca tctgattttt ggcggaagaa caatcttgaa 1320

gatgtgtaca aactggcaac acaaaaaaac gctgaactgt tcaaaaatct aaaagcggca 1380

gcagagaaaa tgggtgaaac ggaatttgaa aagtatcagc agataaacga tgtaaaggat 1440

ttggcaagtt taaggcggct tacgcaagat tttggtttga agtgggaaga aaaggactgg 1500

gaggagtatt ccgagcagat aaaaaaacaa attacggaca ggcagaaact gacaataatg 1560

aaacaaaggg ttacggctga actaaagaaa aagcacggca tagaaaatct taatctgaga 1620

ataaccatcg acagcaataa aagcagaaag gcggttttga acagaatagc aattccaaga 1680

ggatttgtaa aaaaacatat tttaggctgg cagggatctg agaagatatc gaaaaatata 1740

agggaagcag aatgcaaaat tctgctatcg aaaaaatatg aagagttatc aaggcagttt 1800

tttgaagccg gtaatttcga taagctgacg cagataaatg gtctttatga aaagaataaa 1860

cttacagctt ttatgtcagt atatttgatg ggtcggttga atattcagct taataagcac 1920

acagaacttg gaaatcttaa aaaaacagag gtggatttta agatatctga taaggtgact 1980

gaaaaaatac cgttttctca gtatccttcg cttgtctatg cgatgtctcg caaatatgtt 2040

gacaatgtgg ataaatataa attttctcat caagataaaa agaagccatt tttaggtaaa 2100

attgattcaa ttgaaaaaga acgtattgaa ttcataaaag aggttctcga ttttgaagag 2160

tatcttttta aaaataaggt aatagataaa agcaaatttt ccgatacagc gactcatatt 2220

agctttaagg aaatatgtga tgaaatgggt aaaaaaggat gtaaccgaaa caaactaacc 2280

gaacttaaca acgcaaggaa cgcagccctg catggtgaaa taccgtcgga gacctctttt 2340

cgtgaagcaa aaccgttgat aaatgaattg aaaaaatga 2379

<210> 19

<211> 2379

<212> DNA

<213> metagenome (metagenomic)

<400> 19

atgtccccag atttcatcaa attagaaaaa caagaagcag ctttttactt taatcagaca 60

gagcttaatt taaaagccat agaaagcaat attttcgaca aacaacagcg agtgattctg 120

cttaataatc cacagatact tgccaaagta ggagatttta ttttcaattt cagagatgta 180

acaaaaaacg caaaaggaga aatagactgt ttgctattga aactaagaga gctgagaaac 240

ttttactcac actatgtcta taccgatgac gtgaagatat tgagtaacgg cgaaagacct 300

ctgctggaaa aatattatca atttgcgatt gaagcaaccg gaagtgaaaa tgttaaactt 360

gaaataatag aaagcaacaa ccgacttacg gaagcgggcg tgctgttttt cttgtgtatg 420

tttttgaaaa agtctcaggc aaataagctt ataagcggta tcagcggttt taaaagaaat 480

gacccgacag gtcagccgag aaggaattta tttacctact tcagtgtaag ggagggatac 540

aaggttgtgc cggatatgca gaaacatttt cttttgtttg ttcttgtcaa tcatctctct 600

ggtcaggatg attatattga aaaggcgcaa aagccatacg atataggcga gggtttattt 660

tttcatcgaa tagcttctac atttcttaat atcagtggga ttttaagaaa tatggaattc 720

tatatttacc agagcaaaag actaaaggag cagcaaggag agctcaaacg tgaaaaggat 780

atttttccat ggatagagcc tttccaggga aatagttatt ttgaaataaa tggtaataaa 840

ggaataatcg gcgaagatga attgaaagag ctttgttatg cgttgctggt tgcaggaaaa 900

gatgtcagag ccgtcgaagg taaaataaca caatttttgg aaaagtttaa aaatgcggac 960

aatgctcagc aagttgaaaa agatgaaatg ctggacagaa acaattttcc cgccaattat 1020

ttcgccgaat cgaacatcgg cagcataaag gaaaaaatac ttaatcgttt gggaaaaact 1080

gatgatagtt ataataagac ggggacaaag attaaaccat acgacatgat gaaagaggta 1140

atggagttta taaataattc tcttccggca gatgaaaaat tgaaacgcaa agattacaga 1200

agatatctaa agatggttcg tatctgggac agtgagaaag ataatataaa gcgggagttt 1260

gaaagcaaag aatggtcaaa atatttttca tctgatttct ggatggcaaa aaatcttgaa 1320

agggtctatg ggttggcaag agagaaaaac gccgaattat tcaataagct aaaagcggtt 1380

gtggagaaaa tggacgagcg ggaatttgag aagtatcggc tgataaatag cgcagaggat 1440

ttggcaagtt taagacggct tgcgaaagat tttggcctga agtgggaaga aaaggactgg 1500

caagagtatt ctgggcagat aaaaaaacaa atttctgaca ggcagaaact gacaataatg 1560

aaacaaagga ttacggctga actaaagaaa aagcacggca tagaaaatct caatcttaga 1620

ataaccatcg acagcaataa aagcagaaag gcagttttga acagaatcgc agttccaaga 1680

ggttttgtga aagagcatat tttaggatgg caggggtctg agaaggtatc gaaaaagaca 1740

agagaagcaa agtgcaaaat tctgctctcg aaagaatatg aagaattatc aaagcaattt 1800

ttccaaacca gaaattacga caagatgacg caggtaaacg gtctttacga aaagaataaa 1860

ctcttagcat ttatggtcgt ttatcttatg gagcggttga atatcctgct taataagccc 1920

acagaactta atgaacttga aaaagcagag gtggatttca agatatctga taaggtgatg 1980

gccaaaatcc cgttttcaca gtatccttcg cttgtgtacg cgatgtccag caaatatgct 2040

gatagtgtag gcagttataa atttgagaat gatgaaaaaa acaagccgtt tttaggcaag 2100

atcgatacaa tagaaaaaca acgaatggag tttataaaag aagtccttgg ttttgaagag 2160

tatctttttg aaaagaagat aatagataaa agcgaatttg ccgacacagc gactcatata 2220

agttttgatg aaatatgtaa tgagcttatt aaaaaaggat gggataaaga caaactaacc 2280

aaacttaaag atgccaggaa cgcggccctg catggcgaaa taccggcgga gacctctttt 2340

cgtgaagcaa aaccgttgat aaatggattg aaaaaatga 2379

<210> 20

<211> 2400

<212> DNA

<213> metagenome (metagenomic)

<400> 20

atgaacatca ttaaattaaa aaaagaagaa gctgcgtttt attttaatca gacgatcctc 60

aatctttcag ggcttgatga aattattgaa aaacaaattc cgcacataat cagcaacaag 120

gaaaatgcaa agaaagtgat tgataagatt ttcaataacc gcttattatt aaaaagtgtg 180

gagaattata tctacaactt taaagatgtg gctaaaaacg caagaactga aattgaggct 240

atattgttga aattagtaga gctacgtaat ttttactcac attacgttca taatgatacc 300

gtcaagatac taagtaacgg tgaaaaacct atactggaaa aatattatca aattgctata 360

gaagcaaccg gaagtaaaaa tgttaaactt gtaatcatag aaaacaacaa ctgtctcacg 420

gattctggcg tgctgttttt gctgtgtatg ttcttaaaaa aatcacaggc aaacaagctt 480

ataagttccg ttagtggttt taaaaggaat gataaagaag gacaaccgag aagaaatcta 540

ttcacttatt atagtgtgag ggagggatat aaggttgtgc ctgatatgca gaagcatttc 600

cttctattcg ctctggtcaa tcatctatct gagcaggatg atcatattga gaagcagcag 660

cagtcagacg agctcggtaa gggtttgttt ttccatcgta tagcttcgac ttttttaaac 720

gagagcggca tcttcaataa aatgcaattt tatacatatc agagcaacag gctaaaagag 780

aaaagaggag aactcaaaca cgaaaaggat acctttacat ggatagagcc ttttcaaggc 840

aatagttatt ttacgttaaa tggacataag ggagtgatta gtgaagatca attgaaggag 900

ctttgttaca caattttaat tgagaagcaa aacgttgatt ccttggaagg taaaattata 960

caatttctca aaaaatttca gaatgtcagc agcaagcagc aagttgacga agatgaattg 1020

cttaaaagag aatatttccc tgcaaattac tttggccggg caggaacagg gaccctaaaa 1080

gaaaagattc taaaccggct tgataagagg atggatccta catctaaagt gacggataaa 1140

gcttatgaca aaatgattga agtgatggaa tttatcaata tgtgccttcc gtctgatgag 1200

aagttgaggc aaaaggatta tagacgatac ttaaagatgg ttcgtttctg gaataaggaa 1260

aagcataaca ttaagcgcga gtttgacagt aaaaaatgga cgaggttttt gccgacggaa 1320

ttgtggaata aaagaaatct agaagaagcc tatcaattag cacggaaaga gaacaaaaag 1380

aaacttgaag atatgagaaa tcaagtacga agccttaaag aaaatgacct tgaaaaatat 1440

cagcagatta attacgttaa tgacctggag aatttaaggc ttctgtcaca ggagttaggt 1500

gtgaaatggc aggaaaagga ctgggttgaa tattccgggc agataaagaa gcagatatca 1560

gacaatcaga aacttacaat catgaaacaa aggattaccg ctgaactaaa gaaaatgcac 1620

ggcatcgaga atcttaatct tagaataagc attgacacga ataaaagcag gcagacggtt 1680

atgaacagga tagctttgcc caaaggtttt gtgaagaatc atatccagca aaattcgtct 1740

gagaaaatat cgaaaagaat aagagaggat tattgtaaaa ttgagctatc gggaaaatat 1800

gaagaacttt caaggcaatt ttttgataaa aagaatttcg ataagatgac actgataaac 1860

ggcctttgtg aaaagaacaa acttatcgca tttatggtta tctatctttt ggagcggctt 1920

ggatttgaat taaaggagaa aacaaaatta ggcgagctta aacaaacaag gatgacatat 1980

aaaatatccg ataaggtaaa agaagatatc ccgctttcct attaccccaa gcttgtgtat 2040

gcaatgaacc gaaaatatgt tgacaatatc gatagttatg catttgcggc ttacgaatcc 2100

aaaaaagcta ttttggataa agtggatatc atagaaaagc aacgtatgga atttatcaaa 2160

caagttctct gttttgagga atatattttc gaaaatagga ttatcgaaaa aagcaaattt 2220

aatgacgagg agactcatat aagttttaca caaatacatg atgagcttat taaaaaagga 2280

cgggacacag aaaaactctc taaactcaaa catgcaagga ataaagcctt gcacggcgag 2340

attcctgatg ggacttcttt tgaaaaagca aagctattga taaatgaaat caaaaaatga 2400

<210> 21

<211> 2412

<212> DNA

<213> metagenome

<400> 21

atgaatgcta tcgaactaaa aaaagaggaa gcagcatttt attttaatca ggcaagactc 60

aacatttcag gacttgatga aattattgaa aagcagttac cacatatagg tagtaacagg 120

gagaatgcga aaaaaactgt tgatatgatt ttggataatc ccgaagtctt gaagaagatg 180

gaaaattatg tctttaactc acgagatata gcaaagaacg caagaggtga acttgaagca 240

ttgttgttga aattagtaga actgcgtaat ttttattcac attatgttca taaagatgat 300

gttaagacat tgagttacgg agaaaaacct ttactggata aatattatga aattgcgatt 360

gaagcgaccg gaagtaaaga tgtcagactt gagataatag atgataaaaa taagcttaca 420

gatgccggtg tgcttttttt attgtgtatg tttttgaaaa aatcagaggc aaacaaactt 480

atcagttcaa tcaggggctt taaaagaaac gataaagaag gccagccgag aagaaatcta 540

ttcacttact acagtgtcag agagggatat aaggttgtgc ctgatatgca gaaacatttt 600

cttttattca cactggttaa ccatttgtca aatcaggatg aatacatcag taatcttagg 660

ccgaatcaag aaatcggcca agggggattt ttccatagaa tagcatcaaa atttttgagc 720

gatagcggga ttttacatag tatgaaattc tacacctacc ggagtaaaag actaacagaa 780

caacgggggg agcttaagcc gaaaaaagat cattttacat ggatagagcc ttttcaggga 840

aacagttatt tttcagtgca gggccaaaaa ggagtaattg gtgaagagca attaaaggag 900

ctttgttatg tattgctggt tgccagagaa gattttaggg ccgttgaggg caaagttaca 960

caatttctga aaaagtttca gaatgctaat aacgtacagc aagttgaaaa agatgaagtg 1020

ctggaaaaag aatattttcc tgcaaattat tttgaaaatc gagacgtagg cagagtaaag 1080

gataagatac ttaatcgttt gaaaaaaatc actgaaagct ataaagctaa agggagggag 1140

gttaaagcct atgacaagat gaaagaggta atggagttta taaataattg cctgccaaca 1200

gatgaaaatt tgaaactcaa agattacaga agatatctga aaatggttcg tttctggggc 1260

agggaaaagg aaaatataaa gcgggaattt gacagtaaaa aatgggagag gtttttgcca 1320

agagaactct ggcagaaaag aaacctcgaa gatgcgtatc aactggcaaa agagaaaaac 1380

accgagttat tcaataaatt gaaaacaact gttgagagaa tgaacgaact ggaattcgaa 1440

aagtatcagc agataaacga cgcaaaagat ttggcaaatt taaggcaact ggcgcgggac 1500

ttcggcgtga agtgggaaga aaaggactgg caagagtatt cggggcagat aaaaaaacaa 1560

attacagaca ggcaaaaact tacaataatg aaacaaagga ttactgctgc attgaagaaa 1620

aagcaaggca tagaaaatct taatcttagg ataacaaccg acaccaataa aagcagaaag 1680

gtggtattga acagaatagc gctacctaaa ggttttgtaa ggaagcatat cttaaaaaca 1740

gatataaaga tatcaaagca aataaggcaa tcacaatgtc ctattatact gtcaaacaat 1800

tatatgaagc tggcaaagga attctttgag gagagaaatt ttgataagat gacgcagata 1860

aacgggctat ttgagaaaaa tgtacttata gcgtttatga tagtttatct gatggaacaa 1920

ctgaatcttc gacttggtaa gaatacggaa cttagcaatc ttaaaaaaac ggaggttaat 1980

tttacgataa ccgacaaggt aacggaaaaa gtccagattt cgcagtatcc atcgcttgtt 2040

ttcgccataa acagagaata tgttgatgga atcagcggtt ataagttacc gcccaaaaaa 2100

ccgaaagagc ctccgtatac tttcttcgag aaaatagacg caatagaaaa agaacgaatg 2160

gaattcataa aacaggtcct cggtttcgaa gaacatcttt ttgagaagaa tgtaatagac 2220

aaaactcgct ttactgatac tgcgactcat ataagtttta atgaaatatg tgatgagctt 2280

ataaaaaaag gatgggacga aaacaaaata ataaaactta aagatgcgag gaatgcagca 2340

ttgcatggta agataccgga ggatacgtct tttgatgaag cgaaagtact gataaatgaa 2400

ttaaaaaaat ga 2412

<210> 22

<211> 2328

<212> DNA

<213> Artificial Sequence

<220>

<221> misc_feature

<222> (1)..(2328)

<223> Human codon-optimized coding sequence

<400> 22

atggcccagg tgagcaagca gacctccaag aagagggagc tgagcatcga cgagtaccag 60

ggcgcccgga agtggtgctt caccattgcc ttcaacaagg ccctggtgaa ccgggacaag 120

aacgacggcc tgttcgtgga aagcctgctg agacacgaga agtacagcaa gcacgactgg 180

tacgacgaag atacccgggc cctgatcaag tgcagcaccc aggccgccaa cgccaaggct 240

gaagccctgc ggaactactt cagtcactac cggcatagcc ctggctgcct gaccttcacc 300

gccgaggacg aactgcggac catcatggag agagcctatg agcgggccat cttcgagtgc 360

agaagaagag agacagaggt gatcatcgag tttcccagcc tgttcgaggg cgaccggatc 420

accaccgccg gcgtggtgtt tttcgtgagc tttttcgtgg aaagaagagt gctggatcgg 480

ctgtatggag ccgtgtccgg cctgaagaag aatgagggac agtacaagct gacccggaag 540

gccctgagca tgtactgcct gaaggacagc agattcacca aggcctggga taagcgggtg 600

ctgctgttca gagacatcct ggcccagctg ggaagaatcc ccgccgaggc ctacgagtac 660

taccacggcg agcagggtga taagaagaga gctaacgaca atgagggcac aaatcccaag 720

cggcacaagg acaagttcat cgaatttgca ctgcactacc tggaagccca gcacagcgag 780

atctgcttcg gcagacgcca catcgtgcgg gaagaggccg gcgccggcga tgagcacaag 840

aagcaccgga ccaagggaaa ggtggtggtg gacttcagca agaaggacga ggaccagagc 900

tactatatct ccaagaacaa cgtgatcgtg cggatcgaca agaacgccgg ccctagaagc 960

taccggatgg gcctgaacga gctgaagtac ctcgtgctgc tgagcctgca ggggaagggc 1020

gacgatgcca tcgccaagct gtacagatac agacagcacg tggagaacat cctggatgtg 1080

gtgaaggtga ccgataagga taaccacgtg ttcctgcccc gcttcgtgct ggagcagcac 1140

ggcatcggca gaaaggcctt caagcagcgg atcgatggac gggtgaagca cgtgcggggc 1200

gtgtgggaga agaagaaggc cgccaccaat gaaatgaccc tgcacgagaa ggccagagac 1260

atcctgcagt acgtgaacga aaactgcacc cggtccttca accctggcga atacaacaga 1320

ctgctggtgt gcctggtggg caaggacgtg gagaactttc aggccggcct gaagcggctg 1380

cagctggccg aaaggatcga tggccgggtg tactccatct tcgcccagac cagcaccatc 1440

aatgagatgc accaggtggt gtgcgaccag atcctgaacc ggctgtgcag aatcggcgac 1500

cagaagctgt acgattacgt gggactgggc aagaaggacg aaatcgacta caagcagaag 1560

gtggcctggt tcaaggagca catcagcatc cggagaggat tcctgagaaa gaagttctgg 1620

tacgatagca agaagggatt cgcaaagctg gtggaggaac acctggagtc cggcggcggc 1680

cagcgcgacg tgggcctgga caagaagtac taccacatcg acgccatcgg cagattcgag 1740

ggcgccaacc ccgccctgta cgagaccctg gccagagatc ggctgtgcct catgatggcc 1800

cagtacttcc tgggcagcgt gagaaaggaa ctgggcaaca agattgtgtg gagcaacgac 1860

agcatcgaac tgcctgtgga aggctctgtg ggaaatgaga agagcatcgt gttctccgtg 1920

tctgactacg gcaagctgta cgtgctggac gatgccgaat tcctgggccg gatctgcgaa 1980

tacttcatgc cccacgaaaa gggcaagatc cggtaccaca cagtgtacga aaagggcttt 2040

agagcataca acgacctgca gaagaagtgc gtggaggccg tgctggcttt cgaagagaag 2100

gtggtgaagg ccaagaagat gagcgagaag gaaggcgccc actacatcga cttccgggag 2160

atcctggccc agaccatgtg caaggaggcc gagaagaccg cagtgaacaa ggtgagacgc 2220

gccttcttcc accaccacct gaagttcgtg attgacgagt tcggcctgtt cagcgacgtg 2280

atgaagaagt acggcatcga gaaggaatgg aagttccctg tcaagtaa 2328

<210> 23

<211> 2418

<212> DNA

<213> Artificial Sequence

<220>

<221> misc_feature

<222> (1)..(2418)

<223> Human codon-optimized coding sequence

<400> 23

atgaaggtgg agaacatcaa ggaaaagtcc aagaaggcta tgtatctgat caaccactat 60

gaaggcccta agaagtggtg cttcgccatc gtgctgaata gggcctgcga caactatgag 120

gataaccccc acctgttcag caagagcctg ctggaatttg aaaagaccag cagaaaggac 180

tggttcgacg aggagaccag ggaactggtg gagcaggccg acaccgagat ccagcccaac 240

cccaacctga agcctaacac caccgccaac agaaagctga aggacatccg gaactacttc 300

agccaccact accacaagaa tgagtgcctg tacttcaaga acgacgaccc tatccggtgc 360

atcatggagg cagcctacga gaagtccaag atctacatca agggcaagca gattgagcag 420

tccgacatcc ccctccctga gctgtttgag tctagcggct ggatcacccc agccggcatc 480

ctgctgctgg ccagcttctt tgtggagaga ggcattctgc acagactgat gggcaacatc 540

ggcggcttca aggacaaccg gggcgaatac ggactgaccc acgatatctt caccacctac 600

tgcctgaagg gcagctactc catcagagcc caggaccacg acgccgtgat gttcagagac 660

atcctgggct acctgagcag agtgccgacc gagagctttc agcgcatcaa gcagccacag 720

atcagaaagg aggggcagct gagcgagcgg aagacagaca agtttatcac cttcgccctg 780

aactacctgg aagattatgg actgaaggat ctggaaggct gcaaggcctg cttcgcccgg 840

agcaagatcg tgagagagca ggagaacgtg gaaagcatca atgacaagga gtacaagcct 900

cacgaaaaca agaagaaggt ggaaatccac ttcgatcagt ctaaggaaga ccggttctac 960

atcaaccgga acaacgtgat cctgaagatc cagaagaagg acggccacag caacatcgtg 1020

agaatgggcg tgtacgagct gaagtatctg gtgctgatgt ccctggtggg caaggccaag 1080

gaagccgtgg agaagatcga caactacatc caggatctga gagaccagct gccctacatc 1140

gagggcaaga acaaggaaga aatcaaggag tacgtgagat tcttccccag attcatcaga 1200

tcccacctgg gcctgctgca gattaacgat gaggagaaga tcaaggcccg gctggactat 1260

gtgaagacaa agtggctgga caagaaggag aagtccaagg agctggagct gcacaagaag 1320

ggccgggata tcctgcggta catcaacgag cggtgcgacc gggagctgaa ccggaacgtg 1380

tacaaccgga tcctggagct gctggtgagc aaggacctga ccggcttcta ccgggagctg 1440

gaggagctga agcggaccag acggatcgat aagaacattg tgcagaacct gtccggccag 1500

aagaccatca acgccctgca cgaaaaggtg tgcgatctcg tgctgaagga gatcgagagc 1560

ctggacaccg agaacctgcg gaagtacctg ggcctgatcc ccaaggagga gaaggaagtg 1620

acctttaagg agaaggtgga caggatcctg aagcagccgg tgatctacaa gggcttcctg 1680

cggtaccagt tcttcaagga cgacaagaag agcttcgtgc tgctggtgga agacgccctg 1740

aaggagaagg gaggcggctg cgacgtgccc ctgggcaagg agtactacaa gatcgtgtcc 1800

ctggacaagt atgacaagga aaataagacc ctgtgcgaga ccctggcaat ggatagactg 1860

tgcctgatga tggcccggca gtattacctg agcctgaacg ccaagctggc ccaggaggcc 1920

cagcagatcg aatggaagaa ggaggatagc attgagctga tcatcttcac actgaagaat 1980

cctgaccagt ccaagcagag cttctccatc cggttcagcg tgcgggactt caccaagctg 2040

tacgtgaccg acgaccccga attcctggcc cggctgtgca gctacttctt ccccgtggag 2100

aaggagatcg aataccacaa gctgtactct gaaggcatta acaagtacac caacctgcag 2160

aaggagggga tcgaagccat cctggagctg gagaagaagc tgatcgaaag aaaccggatc 2220

cagtccgcca agaactacct gagctttaac gaaatcatga acaagagcgg ctacaacaag 2280

gatgagcagg atgacctgaa gaaggtgagg aactccctgc tgcactacaa gctgatcttc 2340

gaaaaggagc acctgaagaa gttctatgaa gtgatgcggg gcgagggaat cgagaagaag 2400

tggtccctga tcgtgtaa 2418

<210> 24

<211> 2373

<212> DNA

<213> Artificial Sequence

<220>

<221> misc_feature

<222> (1)..(2373)

<400> 24

atgaatggca tcgagctgaa gaaggaagaa gccgccttct acttcaatca ggccgagctg 60

aacctgaagg ccattgagga caacatcttc gacaaggaga gacggaagac actgctgaac 120

aacccccaga tcctggccaa gatggagaac tttatcttca atttccggga cgtgaccaag 180

aacgccaagg gcgaaatcga ctgcctgctg ctgaagctga gagagctgcg gaacttttac 240

agccactacg tgcacaagcg ggacgtcaga gaactgagca agggcgagaa gccgatcctg 300

gagaagtact accagttcgc catcgaatcc accggctctg agaacgtgaa gctcgaaatc 360

atcgaaaacg acgcctggct ggccgacgcc ggcgtgctgt tcttcctgtg catcttcctg 420

aagaagagcc aggcaaacaa gctgatcagc ggcatcagcg gcttcaagag aaacgacgac 480

accggccagc ctcggagaaa cctgttcacc tacttctcca tccgggaggg ctacaaggtg 540

gtgcccgaaa tgcagaagca cttcctgctg ttctccctgg tgaaccacct gagcaaccag 600

gacgattata tcgaaaaggc ccaccagccc tacgacatcg gcgagggcct cttcttccac 660

cggattgcca gcaccttcct gaacatctcc ggaatcctga gaaacatgaa gttctacacc 720

tatcagagca agagactggt ggagcagaga ggcgagctga agcgggaaaa ggacatcttc 780

gcctgggaag aaccgtttca gggcaattcc tactttgaga tcaacggcca caagggcgtg 840

attggcgaag acgagctgaa ggagctgtgc tacgccttcc tgatcggcaa ccaggacgcc 900

aacaaggtgg agggccggat cacccagttc ctggagaagt tcagaaacgc caacagcgtg 960

cagcaggtga aggacgacga gatgctgaag cctgaatatt tccccgccaa ctactttgcc 1020

gagagcggcg tgggccggat caaggaccgg gtgctgaaca gactgaacaa ggccatcaag 1080

agcaacaagg ccaagaaggg cgagatcatc gcctatgaca agatgagaga agtgatggct 1140

ttcatcaata actctctgcc cgtggacgag aagctgaagc ccaaggatta caagagatac 1200

ctgggcatgg tgagattctg ggatagagaa aaggacaata tcaagcgcga gttcgaaacg 1260

aaggagtgga gcaagtatct gccctccaac ttctggaccg ccaagaacct ggagagagtg 1320

tacggactgg cccgggaaaa gaacgcagag ctgtttaaca agctgaaggc cgacgtggag 1380

aagatggacg aaagagagct ggaaaagtat cagaagatca acgacgccaa ggatctggcc 1440

aacctgcggc ggctggccag cgacttcgga gtgaagtggg aggagaagga ttgggacgag 1500

tactccggcc agatcaagaa gcagatcaca gattcccaga agctgaccat catgaagcag 1560

agaatcacag ccggcctgaa gaagaagcac ggcatcgaaa acctgaacct gaggatcacc 1620

atcgacatca acaagtccag aaaggccgtg ctgaatcgga tcgccatccc cagaggattt 1680

gtgaagcggc acatcctggg ctggcaggaa tccgagaagg tgagcaagaa gatcagagaa 1740

gccgaatgcg agattctgct gagcaaggag tacgaggagc tgagcaagca gttctttcag 1800

agcaaggact acgacaagat gacccgcatc aacggcctgt acgagaagaa taagctgatc 1860

gccctgatgg ccgtgtatct gatggggcag ctgagaatcc tgttcaagga gcacaccaag 1920

ctggacgaca tcaccaagac caccgtggat ttcaagatca gcgacaaggt gaccgtgaag 1980

atccccttct ccaactatcc ctccctggtg tacaccatga gcagcaagta cgtggacaat 2040

atcggcaact acggcttcag caacaaggac aaggataagc ccattctggg caagatcgac 2100

gtgatcgaga agcagcggat ggagtttatc aaggaggtgc tgggattcga gaagtacctg 2160

tttgacgata agatcatcga caagagcaag ttcgccgaca ccgccaccca catcagcttt 2220

gccgaaatcg tggaagaact ggtggagaag ggctgggaca aggaccggct gacgaagctg 2280

aaggatgccc ggaacaaggc cctgcacggc gagatcctga ccggcaccag cttcgacgag 2340

acaaagtccc tgatcaacga gctgaagaag taa 2373

<210> 25

<211> 2379

<212> DNA

<213> Artificial Sequence

<220>

<221> misc_feature

<222> (1)..(2379)

<223> Human codon-optimized coding sequence

<400> 25

atgagccctg atttcatcaa gctggagaag caggaagcag ccttctactt taaccagacc 60

gagctgaacc tgaaggccat cgaatccaat atcctggata agcagcagag aatgatcctg 120

ctgaacaacc ccagaatcct ggccaaggtg ggcaacttca tcttcaattt ccgggacgtg 180

accaagaacg caaagggcga aatcgactgc ctgctgttca agctggagga actgcggaac 240

ttctacagcc actacgtgca caccgataac gtgaaggaac tgtccaacgg agagaagcct 300

ctgctggagc ggtactacca gatcgccatc caggccacaa gaagcgagga cgtgaagttc 360

gagctgttcg agaccaggaa cgagaacaag atcaccgacg caggcgtgct gttcttcctg 420

tgcatgttcc tgaagaagag ccaggctaat aagctgattt ccggcatcag cggcttcaag 480

cggaacgacc ccaccggcca gcccagacgg aacctcttta cctacttctc tgcccgggag 540

ggctacaagg ccctgcctga catgcagaag cacttcctgc tgttcaccct ggtgaactac 600

ctgagcaacc aggacgagta catctccgag ctgaagcagt acggagagat cggacaggga 660

gccttcttca acagaatcgc cagcaccttc ctgaacatca gcggcatcag cggcaacacc 720

aagttctaca gctaccagag caagagaatc aaggagcagc ggggcgaact gaacagcgaa 780

aaggacagct tcgagtggat cgagcccttt cagggcaact cttattttga gatcaacggc 840

cacaagggcg tgatcggcga agacgagctg aaggagctgt gctacgccct gctggtggcc 900

aagcaggaca tcaatgccgt ggagggaaag atcatgcagt tcctgaagaa gttcaggaac 960

accggcaacc tgcagcaggt gaaggacgac gagatgctgg aaatcgagta ctttcccgcc 1020

agctacttca acgagagcaa gaaggaggac atcaagaagg agatcctggg cagactggac 1080

aagaagatcc ggtcctgcag cgccaaggcc gagaaggcct acgacaagat gaaggaggtg 1140

atggagttta tcaataacag cctgcccgcc gaggagaagc tgaagaggaa ggactaccgc 1200

agatacctga agatggtgag attctggtcc agagaaaagg gcaacatcga gagagagttc 1260

agaaccaagg agtggtccaa gtacttcagc agcgacttct ggagaaagaa caatctggag 1320

gatgtgtaca agctggccac ccagaagaac gccgagctgt tcaagaatct gaaggccgcc 1380

gccgagaaga tgggcgaaac agaattcgaa aagtaccagc agatcaacga tgtgaaggac 1440

ctggccagcc tgagacggct gacccaggat ttcggcctga agtgggagga gaaggattgg 1500

gaggagtaca gcgaacagat caagaagcag atcaccgacc ggcagaagct gacaatcatg 1560

aagcagcggg tgaccgccga gctgaagaag aagcacggca tcgagaatct gaacctcaga 1620

attaccatcg attccaacaa gagcagaaag gccgtgctga acagaatcgc cattccccgg 1680

ggcttcgtga agaagcacat tctgggctgg cagggcagcg aaaagatcag caagaatatc 1740

cgggaggccg agtgcaagat cctgctgtcc aagaagtatg aggagctgtc tcggcagttc 1800

tttgaggctg gcaacttcga caagctgacc cagatcaacg gcctgtacga aaagaataag 1860

ctgaccgcct tcatgtccgt ctacctgatg ggcagactga acatccagct gaacaagcac 1920

acggagctgg gaaatctgaa gaagaccgag gtggacttca agatttccga caaggtgaca 1980

gaaaagatcc ccttctccca gtaccctagc ctggtgtacg ctatgagccg gaagtacgtg 2040

gacaacgtgg acaagtacaa gttcagccac caggacaaga agaagccctt cctgggcaag 2100

atcgacagca tcgaaaagga gagaatcgaa ttcatcaagg aggtgctgga cttcgaagag 2160

tacctgttta agaacaaggt gatcgacaag agcaagttca gcgataccgc cacccatatc 2220

tctttcaagg aaatctgcga cgagatgggc aagaagggct gcaaccgcaa caagctgacc 2280

gagctgaata acgctagaaa cgccgcactg cacggagaaa tccccagcga gaccagcttc 2340

cgggaggcca agcccctgat caacgaactg aagaagtaa 2379

<210> 26

<211> 2379

<212> DNA

<213> Artificial Sequence

<220>

<221> misc_feature

<222> (1)..(2379)

<223> Human codon-optimized coding sequence

<400> 26

atgagccctg acttcatcaa gctggaaaag caggaagccg ccttctactt taatcagacc 60

gagctgaacc tgaaggccat cgagagcaac atcttcgaca agcagcagcg ggtgatcctg 120

ctgaataacc cccagatcct ggccaaggtg ggcgacttca tcttcaactt ccgggacgtg 180

accaagaacg ccaagggaga aatcgactgc ctgctgctga agctgcggga gctgagaaac 240

ttctacagcc actatgtgta caccgacgac gtgaagatcc tgagcaacgg cgagaggccc 300

ctgctggaga agtactacca gtttgccatc gaggccaccg gatctgagaa tgtgaagctg 360

gagatcatcg agagcaacaa ccggctgacc gaagcgggcg tgctgttctt cctgtgcatg 420

ttcctgaaga agagccaggc caacaagctg atttccggca tctccggatt caagcgcaac 480

gaccctaccg gacagcctcg gcggaacctg ttcacctact ttagcgtgcg ggagggctac 540

aaggtggtgc ccgacatgca gaagcacttc ctgctgttcg tgctggtgaa ccacctgtcc 600

ggccaggatg actatattga gaaggcccag aagccctacg acatcggcga aggcctgttc 660

ttccacagaa tcgccagcac ctttctcaac atcagcggca tcctgagaaa catggaattc 720

tacatctacc agagcaagcg gctgaaggag cagcagggag agctgaagag agagaaggac 780

atcttccctt ggatcgagcc tttccagggc aacagctact ttgagatcaa cggaaacaag 840

ggcatcatcg gcgaggacga actgaaggaa ctgtgctacg ccctgctggt ggccggcaag 900

gacgtgagag ccgtggaagg aaagatcacc cagttcctgg agaagttcaa gaacgccgat 960

aacgcccagc aggtggagaa ggatgaaatg ctggaccgga acaacttccc tgccaattac 1020

tttgccgaaa gcaacatcgg cagcatcaag gaaaagatcc tgaatagact gggcaagacc 1080

gacgactcct acaacaagac cggcaccaag atcaagccct acgacatgat gaaggaggtg 1140

atggagttca tcaataattc tctgcccgcc gatgagaagc tgaagcggaa ggactaccgg 1200

agatacctga agatggtccg gatctgggac agcgaaaagg acaatatcaa gcgggagttt 1260

gagagcaagg aatggagcaa gtatttcagc agcgacttct ggatggccaa gaacctggaa 1320

agagtgtacg gcctggccag ggaaaagaac gccgagctgt ttaacaagct gaaggccgtg 1380

gtggagaaga tggacgagcg ggagttcgaa aagtaccggc tgatcaacag cgccgaagac 1440

ctggccagcc tgcggagact ggccaaggac ttcggcctga agtgggagga gaaggactgg 1500

caggagtatt ctggccagat caagaagcag atctccgaca gacagaagct gacaattatg 1560

aagcagcgga tcacagccga actgaagaag aagcacggaa tcgagaacct gaatctgcgg 1620

atcaccatcg acagcaacaa gtccagaaag gccgtgctga accggatcgc cgtgccccgg 1680

ggcttcgtga aggaacacat cctgggctgg caaggctctg aaaaggtgag caagaagacc 1740

agagaagcca agtgcaagat cctgctgagc aaggagtacg aggaactgag caagcagttc 1800

tttcagacac ggaattacga caagatgacc caggtgaacg gcctgtacga gaagaacaag 1860

ctgctggcct tcatggtggt gtacctgatg gagagactga acatcctgct gaacaagccc 1920

acagagctga acgaactgga aaaggccgaa gtggacttca agatctccga caaggtgatg 1980

gccaagatcc ctttctctca gtaccccagc ctggtgtatg caatgagctc caagtacgcc 2040

gacagcgtgg gctcttacaa gttcgaaaac gacgagaaga acaagccctt tctgggcaag 2100

atcgacacaa tcgagaagca gagaatggag ttcatcaagg aggtgctggg cttcgaggaa 2160

tacctgttcg agaagaagat catcgataag agcgaattcg ccgacaccgc cacccacatc 2220

agcttcgacg agatctgcaa cgagctgatc aagaagggct gggacaagga caagctgacc 2280

aagctgaagg acgcccggaa cgccgccctg cacggcgaga tccccgccga gaccagcttc 2340

cgggaggcca agcccctgat taacggcctg aagaagtaa 2379

<210> 27

<211> 2400

<212> DNA

<213> Artificial Sequence

<220>

<221> misc_feature

<222> (1)..(2400)

<223> Human codon-optimized coding sequence

<400> 27

atgaacatca tcaagctgaa gaaggaggaa gccgcctttt actttaacca gacaatcctg 60

aatctgagcg gcctggacga gatcatcgag aagcagatcc cccacatcat ctccaataag 120

gaaaacgcca agaaggtgat tgataagatc ttcaataaca gactgctgct gaagagcgtg 180

gaaaactata tctacaactt caaggacgtg gccaagaacg cccggaccga aatcgaagcc 240

atcctgctga agctggtgga gctgagaaac ttctactccc actacgtgca caacgacacc 300

gtgaagatcc tgtccaatgg cgagaagccc atcctggaaa agtactacca gatcgccatc 360

gaagccaccg gctctaagaa cgtgaagctg gtcattatcg aaaacaacaa ctgcctgacc 420

gactccggcg tgctgttcct gctgtgcatg ttcctgaaga agagccaggc caacaagctg 480

attagcagcg tgagcggctt taagcggaac gacaaggaag gccagcccag aaggaacctc 540

tttacttact atagcgtgag ggaaggctac aaggtggtgc cagacatgca gaagcacttc 600

ctgctgttcg ccctggtcaa ccacctgtcc gagcaggacg accacatcga gaagcagcag 660

cagagcgacg agctgggcaa gggcctgttc ttccacagaa tcgccagcac attcctgaat 720

gaaagcggca tcttcaacaa gatgcagttt tacacctacc agagcaatcg gctgaaggag 780

aagcggggcg agctgaagca cgagaaggac accttcacct ggatcgagcc tttccaggga 840

aacagctact tcaccctgaa cgggcacaag ggcgtgatca gcgaggatca gctgaaggaa 900

ctgtgctaca caatcctgat cgagaagcag aacgtggaca gcctggaggg caagatcatt 960

cagttcctga agaagtttca gaacgtgtct agcaagcagc aggtggatga ggacgagctg 1020

ctgaagcggg aatacttccc cgccaactac ttcggccggg ccggcaccgg caccctgaag 1080

gagaagatcc tgaaccggct ggacaagcgg atggacccca ccagcaaggt gaccgacaag 1140

gcctatgaca agatgatcga ggtgatggag ttcatcaaca tgtgcctgcc cagcgacgag 1200

aagctgcggc agaaggatta ccggagatat ctgaagatgg tcagattctg gaacaaggag 1260

aagcacaaca tcaagagaga attcgacagc aagaagtgga ccagattcct gcccaccgag 1320

ctgtggaata agcggaacct ggaggaagcc taccagctgg cccggaagga gaacaagaag 1380

aagctggagg acatgaggaa tcaggtgagg agcctgaagg agaacgacct ggagaagtac 1440

cagcagatca actatgtgaa cgacctggaa aacctgcggc tgctgtccca agagctgggc 1500

gtgaagtggc aggagaagga ctgggtggaa tacagcggcc agatcaagaa gcagatcagc 1560

gataaccaga agctgacaat catgaagcag agaatcaccg ccgagctgaa gaagatgcac 1620

ggcatcgaga acctgaacct gagaatcagc atcgacacca acaagtcccg gcagactgtg 1680

atgaacagaa ttgccctgcc caagggcttc gtgaagaacc acattcagca gaacagcagc 1740

gagaagatca gcaagagaat cagagaggac tactgcaaga tcgagctgtc cggcaagtac 1800

gaagagctga gcagacagtt tttcgacaag aagaactttg acaagatgac cctgatcaac 1860

ggactgtgcg agaagaataa gctcatcgcc ttcatggtga tttacctgct ggagcggctg 1920

ggcttcgagc tgaaggagaa gaccaagctg ggcgagctga agcagacccg gatgacatat 1980

aagatcagcg acaaggtgaa ggaggacatc cccctctcct actaccccaa gctggtgtac 2040

gccatgaatc ggaagtatgt ggacaacatc gatagctacg ccttcgccgc ctacgagtct 2100

aagaaggcca tcctggacaa ggtggacatc attgagaagc agagaatgga attcatcaag 2160

caggtgctgt gcttcgagga atacatcttc gagaacagaa tcatcgagaa gagcaagttc 2220

aacgatgagg agacccacat cagcttcacc cagatccacg acgaactgat caagaagggc 2280

agagataccg aaaagctgag caagctgaag cacgccagaa acaaggccct gcacggcgag 2340

atccccgacg ggaccagctt tgagaaggcc aagctgctga tcaacgaaat caagaagtaa 2400

<210> 28

<211> 2412

<212> DNA

<213> Artificial Sequence

<220>

<221> misc_feature

<222> (1)..(2412)

<223> Human codon-optimized coding sequence

<400> 28

atgaacgcca tcgagctgaa gaaggaagag gccgccttct acttcaacca ggccagactg 60

aacatctctg gcctggacga aatcatcgag aagcaactgc cacacatcgg ctctaacaga 120

gagaacgcca agaagactgt ggacatgatc ctggataacc ccgaggtgct gaagaagatg 180

gaaaactacg tgttcaactc ccgcgatatt gccaagaatg cccggggcga gctggaggcc 240

ctgctgctga agctggtcga gctgagaaac ttctatagcc actacgtgca caaggacgac 300

gtcaagacac tgagctacgg tgagaagcct ctgctggata agtactacga gatcgccatc 360

gaagccaccg gatccaagga cgtgcggctg gagatcattg acgacaagaa taagctgacc 420

gacgccggag tgctgttcct gctgtgcatg ttcctgaaga agagcgaggc taacaagctg 480

atttccagca tccggggctt caagaggaac gacaaggagg gccagcctag aagaaacctg 540

ttcacctact acagcgtgag agagggctat aaggtggtgc ccgacatgca gaagcacttt 600

ctgctgttca ccctggtgaa ccacctgtcc aatcaggacg agtacatctc caacctgcgc 660

ccaaaccagg aaatcggcca gggcggattt ttccaccgga tcgccagcaa gttcctgagc 720

gacagcggaa tcctgcacag catgaagttc tacacataca gatccaagcg gctgaccgag 780

cagcggggag agctgaagcc caagaaggac cactttacat ggatcgagcc tttccagggc 840

aattcctact tcagcgtgca gggccagaag ggcgtgatcg gagaggagca gctcaaggag 900

ctgtgctacg tgctgctggt ggcccgggag gacttcagag ccgtggaggg caaggtgacc 960

cagttcctga agaagttcca gaatgccaat aacgtgcagc aggtggagaa ggacgaggtg 1020

ctggaaaagg agtacttccc cgccaactac tttgagaacc gggacgtggg aagagtcaag 1080

gacaagatcc tgaacagact gaagaagatc accgagagtt ataaggccaa gggtagagag 1140

gtgaaggcct acgacaagat gaaggaagtg atggagttca tcaacaactg cctgcccacc 1200

gatgaaaacc tgaagctgaa ggactaccgg cggtacctga agatggtgag attctggggc 1260

agagagaagg aaaacatcaa gcgggagttc gactccaaga agtgggagcg ctttctcccc 1320

cgggagctgt ggcagaagag aaacctggag gacgcctacc agctcgccaa ggagaagaac 1380

acagagctgt tcaacaagct gaagaccacc gtggagagaa tgaacgaact ggagttcgag 1440

aagtaccagc agatcaatga cgccaaggac ctggccaacc tgagacagct ggccagagac 1500

tttggagtga agtgggagga aaaggactgg caggaatact ctggacagat caagaagcag 1560

atcaccgacc ggcagaagct gaccatcatg aagcagcgga tcaccgccgc cctgaagaag 1620

aagcagggaa tcgaaaacct gaacctgaga atcacaacag atacgaataa gagcaggaag 1680

gtggtgctga accggatcgc actgcccaag ggattcgtca gaaagcacat cctgaagacc 1740

gacatcaaga tcagcaagca gatccggcag agccagtgcc ctatcatcct gtctaacaac 1800

tacatgaagc tggccaagga gttctttgaa gagcggaact tcgataagat gacccagatc 1860

aatggcctgt tcgagaagaa cgtgctgatc gccttcatga tcgtgtacct gatggagcag 1920

ctgaacctga gactgggcaa gaacaccgag ctgtccaacc tgaagaagac cgaggtgaac 1980

tttaccatca ccgacaaggt gaccgagaag gtgcaaatct cccagtaccc cagcctggtg 2040

ttcgccatta accgggagta cgtggacggc atcagcggct acaagctgcc ccccaagaag 2100

cccaaggaac ctccctacac cttcttcgaa aagatcgacg ccatcgaaaa ggagcggatg 2160

gaattcatca agcaggtgct gggcttcgag gagcacctct tcgaaaagaa cgtgatcgac 2220

aagacccggt ttaccgacac cgccacccac atcagcttca atgagatctg cgatgagctg 2280

atcaagaagg gctgggacga aaacaagatc atcaagctga aggatgcacg gaacgctgcc 2340

ctgcacggca agatccctga agatacctcc tttgacgaag ccaaggtgct gatcaacgaa 2400

ctgaagaagt aa 2412

<210> 29

<211> 102

<212> DNA

<213> Artificial Sequence

<220>

<221> misc_feature

<222> (1)..(102)

<223> gRNA

<400> 29

gctggagcag cccccgattt gtggggtgat tacagcggtc ttcgatattc aagcgtcgga 60

agacctgctg gagcagcccc cgatttgtgg ggtgattaca gc 102

<210> 30

<211> 711

<212> DNA

<213> Artificial Sequence

<220>

<221> misc_feature

<222> (1)..(711)

<223> GFP reporter gene

<400> 30

atggtgagca agggcgagga ggataacatg gccatcatca aggagttcat gcgcttcaag 60

gtgcacatgg agggctccgt gaacggccac gagttcgaga tcgagggcga gggcgagggc 120

cgcccctacg agggcaccca gaccgccaag ctgaaggtga ccaagggtgg ccccctgccc 180

ttcgcctggg acatcctgtc ccctcagttc atgtacggct ccaaggccta cgtgaagcac 240

cccgccgaca tccccgacta cttgaagctg tccttccccg agggcttcaa gtgggagcgc 300

gtgatgaact tcgaggacgg cggcgtggtg accgtgaccc aggactcctc cctgcaggac 360

ggcgagttca tctacaaggt gaagctgcgc ggcaccaact tcccctccga cggccccgta 420

atgcagaaga agaccatggg ctgggaggcc tcctccgagc ggatgtaccc cgaggacggc 480

gccctgaagg gcgagatcaa gcagaggctg aagctgaagg acggcggcca ctacgacgct 540

gaggtcaaga ccacctacaa ggccaagaag cccgtgcagc tgcccggcgc ctacaacgtc 600

aacatcaagt tggacatcac ctcccacaac gaggactaca ccatcgtgga acagtacgaa 660

cgcgccgagg gccgccactc caccggcggc atggacgagc tgtacaagta a 711
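Each `<400>` entry in this listing uses the usual sequence-listing layout: nucleotides in groups of ten, with the running position of the last nucleotide printed at the end of each line. The sketch below is illustrative only (the helper names `parse_st25` and `check_orf` are hypothetical, not part of the listing): it strips that layout from SEQ ID NO: 30, compares the result with the declared `<211>` length, and checks that the entry reads as a single ATG-to-stop open reading frame with no in-frame stop codon.

```python
from typing import List

# SEQ ID NO: 30 copied verbatim from the <400> block above, including the
# 10-nt grouping and the running position numbers at the end of each line.
SEQ30_LINES = """
atggtgagca agggcgagga ggataacatg gccatcatca aggagttcat gcgcttcaag 60
gtgcacatgg agggctccgt gaacggccac gagttcgaga tcgagggcga gggcgagggc 120
cgcccctacg agggcaccca gaccgccaag ctgaaggtga ccaagggtgg ccccctgccc 180
ttcgcctggg acatcctgtc ccctcagttc atgtacggct ccaaggccta cgtgaagcac 240
cccgccgaca tccccgacta cttgaagctg tccttccccg agggcttcaa gtgggagcgc 300
gtgatgaact tcgaggacgg cggcgtggtg accgtgaccc aggactcctc cctgcaggac 360
ggcgagttca tctacaaggt gaagctgcgc ggcaccaact tcccctccga cggccccgta 420
atgcagaaga agaccatggg ctgggaggcc tcctccgagc ggatgtaccc cgaggacggc 480
gccctgaagg gcgagatcaa gcagaggctg aagctgaagg acggcggcca ctacgacgct 540
gaggtcaaga ccacctacaa ggccaagaag cccgtgcagc tgcccggcgc ctacaacgtc 600
aacatcaagt tggacatcac ctcccacaac gaggactaca ccatcgtgga acagtacgaa 660
cgcgccgagg gccgccactc caccggcggc atggacgagc tgtacaagta a 711
"""

STOP_CODONS = {"taa", "tag", "tga"}


def parse_st25(block: str) -> str:
    """Join <400> sequence lines, dropping spaces and the trailing position numbers."""
    parts: List[str] = []
    for line in block.strip().splitlines():
        tokens = line.split()
        # The last token on each line is a running nucleotide count, not sequence.
        if tokens and tokens[-1].isdigit():
            tokens = tokens[:-1]
        parts.extend(tokens)
    return "".join(parts).lower()


def check_orf(seq: str) -> List[str]:
    """Return a list of problems; an empty list means one clean ATG...stop ORF."""
    problems: List[str] = []
    if len(seq) % 3 != 0:
        problems.append(f"length {len(seq)} is not a multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    if not codons or codons[0] != "atg":
        problems.append("does not start with ATG")
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("does not end with a stop codon")
    internal = [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]
    if internal:
        problems.append(f"in-frame stop codon(s) at codon index {internal}")
    return problems


seq30 = parse_st25(SEQ30_LINES)
print(len(seq30))        # 711, matching <211>
print(check_orf(seq30))  # []
```

For SEQ ID NO: 30 the parsed length matches `<211>` and no ORF problems are reported. The same two helpers could be applied to the other coding entries in the listing, such as SEQ ID NOs: 22 to 28.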

<210> 31

<211> 720

<212> DNA

<213> Artificial Sequence

<220>

<221> misc_feature

<222> (1)..(720)

<223> mCherry reporter gene

<400> 31

atggtgagca agggcgagga gctgttcacc ggggtggtgc ccatcctggt cgagctggac 60

ggcgacgtaa acggccacaa gttcagcgtg tccggcgagg gcgagggcga tgccacctac 120

ggcaagctga ccctgaagtt catctgcacc accggcaagc tgcccgtgcc ctggcccacc 180

ctcgtgacca ccctgaccta cggcgtgcag tgcttcagcc gctaccccga ccacatgaag 240

cagcacgact tcttcaagtc cgccatgccc gaaggctacg tccaggagcg caccatcttc 300

ttcaaggacg acggcaacta caagacccgc gccgaggtga agttcgaggg cgacaccctg 360

gtgaaccgca tcgagctgaa gggcatcgac ttcaaggagg acggcaacat cctggggcac 420

aagctggagt acaactacaa cagccacaac gtctatatca tggccgacaa gcagaagaac 480

ggcatcaagg tgaacttcaa gatccgccac aacatcgagg acggcagcgt gcagctcgcc 540

gaccactacc agcagaacac ccccatcggc gacggccccg tgctgctgcc cgacaaccac 600

tacctgagca cccagtccgc cctgagcaaa gaccccaacg agaagcgcga tcacatggtc 660

ctgctggagt tcgtgaccgc cgccgggatc actctcggca tggacgagct gtacaagtga 720

<210> 32

<211> 66

<212> DNA

<213> Artificial Sequence

<220>

<221> misc_feature

<222> (1)..(66)

<223> sgRNA

<400> 32

gctggagcag cccccgattt gtggggtgat tacagcggtc ttcgatattc aagcgtcgga 60

agacct 66

<210> 33

<211> 66

<212> DNA

<213> Artificial Sequence

<220>

<221> misc_feature

<222> (1)..(66)

<223> sgRNA

<400> 33

ggtcttcgat attcaagcgt cggaagacct gctggagcag cccccgattt gtggggtgat 60

tacagc 66
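The `<223>` fields for SEQ ID NOs: 29, 32 and 33 label them only as guide RNAs, but the three entries share one structure: SEQ ID NO: 29 begins and ends with the same 36-nt sequence, which flanks a 30-nt segment. A minimal sketch that makes this explicit, assuming (the listing does not state it) that the flanking 36-nt sequence is the direct repeat and the internal 30-nt segment is the spacer; `REPEAT` and `spacers` are hypothetical names.

```python
from typing import List

# SEQ ID NOs: 29, 32 and 33 copied from the listing above (spaces removed).
SEQ29 = ("gctggagcagcccccgatttgtggggtgattacagcggtcttcgatattcaagcgtcgga"
         "agacctgctggagcagcccccgatttgtggggtgattacagc")
SEQ32 = ("gctggagcagcccccgatttgtggggtgattacagcggtcttcgatattcaagcgtcgga"
         "agacct")
SEQ33 = ("ggtcttcgatattcaagcgtcggaagacctgctggagcagcccccgatttgtggggtgat"
         "tacagc")

# Assumption: the 36 nt at the 5' end of SEQ ID NO: 29, which recur at its
# 3' end, are treated as the direct repeat. The listing does not label them.
REPEAT = SEQ29[:36]


def spacers(array: str, repeat: str) -> List[str]:
    """Split a repeat/spacer array on the repeat and keep the non-empty pieces."""
    return [piece for piece in array.split(repeat) if piece]


spacer = spacers(SEQ29, REPEAT)[0]
print(len(SEQ29), len(REPEAT), len(spacer))  # 102 36 30
print(SEQ29 == REPEAT + spacer + REPEAT)     # True
print(SEQ32 == REPEAT + spacer)              # True: repeat 5' of spacer
print(SEQ33 == spacer + REPEAT)              # True: repeat 3' of spacer
```

Read this way, SEQ ID NO: 32 places the repeat 5' of the spacer, SEQ ID NO: 33 places it 3' of the spacer, and SEQ ID NO: 29 carries a repeat on both sides.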

<210> 34

<211> 20

<212> DNA

<213> Artificial Sequence

<220>

<221> misc_feature

<222> (1)..(20)

<223> sgRNA

<400> 34

ttggtgccgc gcagcttcac 20

<210> 35

<211> 25

<212> DNA

<213> Artificial Sequence

<220>

<221> misc_

<222> (1)..(25)

<223> sgRNA

<400> 35

ttggtgccgc gcagcttcac cttgt 25

<210> 36

<211> 30

<212> DNA

<213> Artificial Sequence

<220>

<221> misc_feature

<222> (1)..(30)

<223> sgRNA

<400> 36

ttggtgccgc gcagcttcac cttgtagatg 30

<210> 37

<211> 35

<212> DNA

<213> Artificial Sequence

<220>

<221> misc_feature

<222> (1)..(35)

<223> sgRNA

<400> 37

ttggtgccgc gcagcttcac cttgtagatg aactc 35

<210> 38

<211> 40

<212> DNA

<213> Artificial Sequence

<220>

<221> misc_feature

<222> (1)..(40)

<400> 38

ttggtgccgc gcagcttcac cttgtagatg aactcgccgt 40

<210> 39

<211> 45

<212> DNA

<213> Artificial Sequence (Artificial Sequence)

<220>

<221> misc_feature

<222> (1)..(45)

<223> sgRNA

<400> 39

ttggtgccgc gcagcttcac cttgtagatg aactcgccgt cctgc 45

<210> 40

<211> 50

<212> DNA

<213> Artificial Sequence (Artificial Sequence)

<220>

<221> misc_feature

<222> (1)..(50)

<223> sgRNA

<400> 40

ttggtgccgc gcagcttcac cttgtagatg aactcgccgt cctgcaggga 50
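SEQ ID NOs: 34 to 40 form a length series: each is a 5' prefix of the 50-nt SEQ ID NO: 40, growing in 5-nt steps from 20 nt. The sketch below is an illustrative check rather than anything stated in the listing: it confirms the prefix relationship and searches for the reverse complement of each spacer in a 120-nt window (nucleotides 301 to 420) copied from SEQ ID NO: 30. The names `SPACERS`, `target_site` and `reverse_complement` are hypothetical.

```python
from typing import Dict, Optional, Tuple

# SEQ ID NOs: 34-40 copied from the listing above.
SPACERS: Dict[int, str] = {
    34: "ttggtgccgcgcagcttcac",
    35: "ttggtgccgcgcagcttcaccttgt",
    36: "ttggtgccgcgcagcttcaccttgtagatg",
    37: "ttggtgccgcgcagcttcaccttgtagatgaactc",
    38: "ttggtgccgcgcagcttcaccttgtagatgaactcgccgt",
    39: "ttggtgccgcgcagcttcaccttgtagatgaactcgccgtcctgc",
    40: "ttggtgccgcgcagcttcaccttgtagatgaactcgccgtcctgcaggga",
}

# Nucleotides 301-420 of SEQ ID NO: 30, copied from its <400> block.
WINDOW_START = 301
WINDOW = ("gtgatgaacttcgaggacggcggcgtggtgaccgtgacccaggactcctccctgcaggac"
          "ggcgagttcatctacaaggtgaagctgcgcggcaccaacttcccctccgacggccccgta")

COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a lowercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def target_site(spacer: str) -> Optional[Tuple[int, int]]:
    """1-based (start, end) in SEQ ID NO: 30 of the site complementary to spacer."""
    site = reverse_complement(spacer)
    idx = WINDOW.find(site)  # -1 when the site is not in the window
    if idx < 0:
        return None
    return WINDOW_START + idx, WINDOW_START + idx + len(site) - 1


for seq_id, spacer in sorted(SPACERS.items()):
    # Each shorter spacer should be a 5' prefix of SEQ ID NO: 40.
    is_prefix = SPACERS[40].startswith(spacer)
    print(seq_id, len(spacer), is_prefix, target_site(spacer))
# first line printed: 34 20 True (379, 398)
```

Run as written, every spacer finds a complementary site in this window and all seven sites end at nucleotide 398 of SEQ ID NO: 30, so the longer spacers extend the base-paired region toward the 5' end of that sequence.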

<210> 41

<211> 3615

<212> DNA

<213> Artificial Sequence

<220>

<221> misc_feature

<222> (1)..(3615)

<223> dCas13e.1-ADAR2DD

<400> 41

atgcccaaga agaagcggaa ggtggcccag gtgagcaagc agacctccaa gaagagggag 60

ctgagcatcg acgagtacca gggcgcccgg aagtggtgct tcaccattgc cttcaacaag 120

gccctggtga accgggacaa gaacgacggc ctgttcgtgg aaagcctgct gagacacgag 180

aagtacagca agcacgactg gtacgacgaa gatacccggg ccctgatcaa gtgcagcacc 240

caggccgcca acgccaaggc tgaagccctg gcgaactact tcagtgctta ccggcatagc 300

cctggctgcc tgaccttcac cgccgaggac gaactgcgga ccatcatgga gagagcctat 360

gagcgggcca tcttcgagtg cagaagaaga gagacagagg tgatcatcga gtttcccagc 420

ctgttcgagg gcgaccggat caccaccgcc ggcgtggtgt ttttcgtgag ctttttcgtg 480

gaaagaagag tgctggatcg gctgtatgga gccgtgtccg gcctgaagaa gaatgaggga 540

cagtacaagc tgacccggaa ggccctgagc atgtactgcc tgaaggacag cagattcacc 600

aaggcctggg ataagcgggt gctgctgttc agagacatcc tggcccagct gggaagaatc 660

cccgccgagg cctacgagta ctaccacggc gagcagggtg ataagaagag agctaacgac 720

aatgagggca caaatcccaa gcggcacaag gacaagttca tcgaatttgc actgcactac 780

ctggaagccc agcacagcga gatctgcttc ggcagacgcc acatcgtgcg ggaagaggcc 840

ggcgccggcg atgagcacaa gaagcaccgg accaagggaa aggtggtggt ggacttcagc 900

aagaaggacg aggaccagag ctactatatc tccaagaaca acgtgatcgt gcggatcgac 960

aagaacgccg gccctagaag ctaccggatg ggcctgaacg agctgaagta cctcgtgctg 1020

ctgagcctgc aggggaaggg cgacgatgcc atcgccaagc tgtacagata cagacagcac 1080

gtggagaaca tcctggatgt ggtgaaggtg accgataagg ataaccacgt gttcctgccc 1140

cgcttcgtgc tggagcagca cggcatcggc agaaaggcct tcaagcagcg gatcgatgga 1200

cgggtgaagc acgtgcgggg cgtgtgggag aagaagaagg ccgccaccaa tgaaatgacc 1260

ctgcacgaga aggccagaga catcctgcag tacgtgaacg aaaactgcac ccggtccttc 1320

aaccctggcg aatacaacag actgctggtg tgcctggtgg gcaaggacgt ggagaacttt 1380

caggccggcc tgaagcggct gcagctggcc gaaaggatcg atggccgggt gtactccatc 1440

ttcgcccaga ccagcaccat caatgagatg caccaggtgg tgtgcgacca gatcctgaac 1500

cggctgtgca gaatcggcga ccagaagctg tacgattacg tgggactggg caagaaggac 1560

gaaatcgact acaagcagaa ggtggcctgg ttcaaggagc acatcagcat ccggagagga 1620

ttcctgagaa agaagttctg gtacgatagc aagaagggat tcgcaaagct ggtggaggaa 1680

cacctggagt ccggcggcgg ccagcgcgac gtgggcctgg acaagaagta ctaccacatc 1740

gacgccatcg gcagattcga gggcgccaac cccgccctgt acgagaccct ggccagagat 1800

cggctgtgcc tcatgatggc ccagtacttc ctgggcagcg tgagaaagga actgggcaac 1860

aagattgtgt ggagcaacga cagcatcgaa ctgcctgtgg aaggctctgt gggaaatgag 1920

aagagcatcg tgttctccgt gtctgactac ggcaagctgt acgtgctgga cgatgccgaa 1980

ttcctgggcc ggatctgcga atacttcatg ccccacgaaa agggcaagat ccggtaccac 2040

acagtgtacg aaaagggctt tagagcatac aacgacctgc agaagaagtg cgtggaggcc 2100

gtgctggctt tcgaagagaa ggtggtgaag gccaagaaga tgagcgagaa ggaaggcgcc 2160

cactacatcg acttccggga gatcctggcc cagaccatgt gcaaggaggc cgagaagacc 2220

gcagtgaaca aggtggcggc tgccttcttc gctgcgcacc tgaagttcgt gattgacgag 2280

ttcggcctgt tcagcgacgt gatgaagaag tacggcatcg agaaggaatg gaagttccct 2340

gtcaagccca agaagaagcg gaaggtgggt ggaggcggag gttctggggg aggaggtagt 2400

ggcggtggtg gttcaggagg cggcggaagc cagctgcatt taccgcaggt tttagctgac 2460

gctgtctcac gcctggtcct gggtaagttt ggtgacctga ccgacaactt ctcctcccct 2520

cacgctcgca gaaaagtgct ggctggagtc gtcatgacaa caggcacaga tgttaaagat 2580

gccaaggtga taagtgtttc tacaggaggc aaatgtatta atggtgaata catgagtgat 2640

cgtggccttg cattaaatga ctgccatgca gaaataatat ctcggagatc cttgctcaga 2700

tttctttata cacaacttga gctttactta aataacaaag atgatcaaaa aagatccatc 2760

tttcagaaat cagagcgagg ggggtttagg ctgaaggaga atgtccagtt tcatctgtac 2820

atcagcacct ctccctgtgg agatgccaga atcttctcac cacatgagcc aatcctggaa 2880

gaaccagcag atagacaccc aaatcgtaaa gcaagaggac agctacggac caaaatagag 2940

tctggtcagg ggacgattcc agtgcgctcc aatgcgagca tccaaacgtg ggacggggtg 3000

ctgcaagggg agcggctgct caccatgtcc tgcagtgaca agattgcacg ctggaacgtg 3060

gtgggcatcc agggatcact gctcagcatt ttcgtggagc ccatttactt ctcgagcatc 3120

atcctgggca gcctttacca cggggaccac ctttccaggg ccatgtacca gcggatctcc 3180

aacatagagg acctgccacc tctctacacc ctcaacaagc ctttgctcag tggcatcagc 3240

aatgcagaag cacggcagcc agggaaggcc cccaacttca gtgtcaactg gacggtaggc 3300

gactccgcta ttgaggtcat caacgccacg actgggaagg atgagctggg ccgcgcgtcc 3360

cgcctgtgta agcacgcgtt gtactgtcgc tggatgcgtg tgcacggcaa ggttccctcc 3420

cacttactac gctccaagat taccaagccc aacgtgtacc atgagtccaa gctggcggca 3480

aaggagtacc aggccgccaa ggcgcgtctg ttcacagcct tcatcaaggc ggggctgggg 3540

gcctgggtgg agaagcccac cgagcaggac cagttctcac tcacgtaccc atacgacgta 3600

ccagattacg cttaa 3615

<210> 42

<211> 711

<212> DNA

<213> Artificial Sequence

<220>

<221> misc_feature

<222> (1)..(711)

<223> mutated mCherry

<400> 42

atggtgagca agggcgagga ggataacatg gccatcatca aggagttcat gcgcttcaag 60

gtgcacatgg agggctccgt gaacggccac gagttcgaga tcgagggcga gggcgagggc 120

cgcccctacg agggcaccca gaccgccaag ctgaaggtga ccaagggtgg ccccctgccc 180

ttcgcctggg acatcctgtc ccctcagttc atgtacggct ccaaggccta cgtgaagcac 240

cccgccgaca tccccgacta cttgaagctg tccttccccg agggcttcaa gtaggagcgc 300

gtgatgaact tcgaggacgg cggcgtggtg accgtgaccc aggactcctc cctgcaggac 360

ggcgagttca tctacaaggt gaagctgcgc ggcaccaact tcccctccga cggccccgta 420

atgcagaaga agaccatggg ctgggaggcc tcctccgagc ggatgtaccc cgaggacggc 480

gccctgaagg gcgagatcaa gcagaggctg aagctgaagg acggcggcca ctacgacgct 540

gaggtcaaga ccacctacaa ggccaagaag cccgtgcagc tgcccggcgc ctacaacgtc 600

aacatcaagt tggacatcac ctcccacaac gaggactaca ccatcgtgga acagtacgaa 660

cgcgccgagg gccgccactc caccggcggc atggacgagc tgtacaagta a 711
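The listing labels SEQ ID NO: 42 only as "mutated mCherry". A codon-by-codon scan from the initial ATG shows that it carries an in-frame TAG at nucleotides 292 to 294 (codon 98), well upstream of the terminal TAA at 709 to 711. The sketch below repeats that scan on the first 300 nt, copied from the listing. It only locates stop codons and makes no claim about why the mutation was introduced. The function name is hypothetical.

```python
# Nucleotides 1-300 of SEQ ID NO: 42 ("mutated mCherry"), copied from the listing.
MCHERRY_1_300 = (
    "atggtgagcaagggcgaggaggataacatggccatcatcaaggagttcatgcgcttcaag"
    "gtgcacatggagggctccgtgaacggccacgagttcgagatcgagggcgagggcgagggc"
    "cgcccctacgagggcacccagaccgccaagctgaaggtgaccaagggtggccccctgccc"
    "ttcgcctgggacatcctgtcccctcagttcatgtacggctccaaggcctacgtgaagcac"
    "cccgccgacatccccgactacttgaagctgtccttccccgagggcttcaagtaggagcgc"
)

STOP_CODONS = {"taa", "tag", "tga"}


def first_in_frame_stop(cds: str):
    """Return (codon_number, nt_start, nt_end, codon) for the first stop codon
    in reading frame 1, using 1-based coordinates, or None if there is none."""
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if codon in STOP_CODONS:
            return i // 3 + 1, i + 1, i + 3, codon
    return None


print(first_in_frame_stop(MCHERRY_1_300))  # (98, 292, 294, 'tag')
```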

<210> 43

<211> 86

<212> DNA

<213> Artificial Sequence

<220>

<221> misc_feature

<222> (1)..(86)

<223> gRNA

<400> 43

caagtagtcg gggatgtcgg cggggtgctt cacctaggcc ttggagccgt gctggagcag 60

cccccgattt gtggggtgat tacagc 86

<210> 44

<211> 86

<212> DNA

<213> Artificial Sequence

<220>

<221> misc_feature

<222> (1)..(86)

<223> gRNA

<400> 44

cggggatgtc ggcggggtgc ttcacctagg ccttggagcc gtacatgaac gctggagcag 60

cccccgattt gtggggtgat tacagc 86

<210> 45

<211> 3489

<212> DNA

<213> Artificial Sequence

<220>

<221> misc_feature

<222> (1)..(3489)

<223> LwaCas13a

<400> 45

atgcccaaga agaagcggaa ggtgggatcc atgaaagtga ccaaggtcga tggcatcagc 60

cacaagaagt acatcgaaga gggcaagctc gtgaagtcca ccagcgagga aaaccggacc 120

agcgagagac tgagcgagct gctgagcatc cggctggaca tctacatcaa gaaccccgac 180

aacgcctccg aggaagagaa ccggatcaga agagagaacc tgaagaagtt ctttagcaac 240

aaggtgctgc acctgaagga cagcgtgctg tatctgaaga accggaaaga aaagaacgcc 300

gtgcaggaca agaactatag cgaagaggac atcagcgagt acgacctgaa aaacaagaac 360

agcttctccg tgctgaagaa gatcctgctg aacgaggacg tgaactctga ggaactggaa 420

atctttcgga aggacgtgga agccaagctg aacaagatca acagcctgaa gtacagcttc 480

gaagagaaca aggccaacta ccagaagatc aacgagaaca acgtggaaaa agtgggcggc 540

aagagcaagc ggaacatcat ctacgactac tacagagaga gcgccaagcg caacgactac 600

atcaacaacg tgcaggaagc cttcgacaag ctgtataaga aagaggatat cgagaaactg 660

tttttcctga tcgagaacag caagaagcac gagaagtaca agatccgcga gtactatcac 720

aagatcatcg gccggaagaa cgacaaagag aacttcgcca agattatcta cgaagagatc 780

cagaacgtga acaacatcaa agagctgatt gagaagatcc ccgacatgtc tgagctgaag 840

aaaagccagg tgttctacaa gtactacctg gacaaagagg aactgaacga caagaatatt 900

aagtacgcct tctgccactt cgtggaaatc gagatgtccc agctgctgaa aaactacgtg 960

tacaagcggc tgagcaacat cagcaacgat aagatcaagc ggatcttcga gtaccagaat 1020

ctgaaaaagc tgatcgaaaa caaactgctg aacaagctgg acacctacgt gcggaactgc 1080

ggcaagtaca actactatct gcaagtgggc gagatcgcca cctccgactt tatcgcccgg 1140

aaccggcaga acgaggcctt cctgagaaac atcatcggcg tgtccagcgt ggcctacttc 1200

agcctgagga acatcctgga aaccgagaac gagaacgata tcaccggccg gatgcggggc 1260

aagaccgtga agaacaacaa gggcgaagag aaatacgtgt ccggcgaggt ggacaagatc 1320

tacaatgaga acaagcagaa cgaagtgaaa gaaaatctga agatgttcta cagctacgac 1380

ttcaacatgg acaacaagaa cgagatcgag gacttcttcg ccaacatcga cgaggccatc 1440

agcagcatca gacacggcat cgtgcacttc aacctggaac tggaaggcaa ggacatcttc 1500

gccttcaaga atatcgcccc cagcgagatc tccaagaaga tgtttcagaa cgaaatcaac 1560

gaaaagaagc tgaagctgaa aatcttcaag cagctgaaca gcgccaacgt gttcaactac 1620

tacgagaagg atgtgatcat caagtacctg aagaatacca agttcaactt cgtgaacaaa 1680

aacatcccct tcgtgcccag cttcaccaag ctgtacaaca agattgagga cctgcggaat 1740

accctgaagt ttttttggag cgtgcccaag gacaaagaag agaaggacgc ccagatctac 1800

ctgctgaaga atatctacta cggcgagttc ctgaacaagt tcgtgaaaaa ctccaaggtg 1860

ttctttaaga tcaccaatga agtgatcaag attaacaagc agcggaacca gaaaaccggc 1920

cactacaagt atcagaagtt cgagaacatc gagaaaaccg tgcccgtgga atacctggcc 1980

atcatccaga gcagagagat gatcaacaac caggacaaag aggaaaagaa tacctacatc 2040

gactttattc agcagatttt cctgaagggc ttcatcgact acctgaacaa gaacaatctg 2100

aagtatatcg agagcaacaa caacaatgac aacaacgaca tcttctccaa gatcaagatc 2160

aaaaaggata acaaagagaa gtacgacaag atcctgaaga actatgagaa gcacaatcgg 2220

aacaaagaaa tccctcacga gatcaatgag ttcgtgcgcg agatcaagct ggggaagatt 2280

ctgaagtaca ccgagaatct gaacatgttt tacctgatcc tgaagctgct gaaccacaaa 2340

gagctgacca acctgaaggg cagcctggaa aagtaccagt ccgccaacaa agaagaaacc 2400

ttcagcgacg agctggaact gatcaacctg ctgaacctgg acaacaacag agtgaccgag 2460

gacttcgagc tggaagccaa cgagatcggc aagttcctgg acttcaacga aaacaaaatc 2520

aaggaccgga aagagctgaa aaagttcgac accaacaaga tctatttcga cggcgagaac 2580

atcatcaagc accgggcctt ctacaatatc aagaaatacg gcatgctgaa tctgctggaa 2640

aagatcgccg ataaggccaa gtataagatc agcctgaaag aactgaaaga gtacagcaac 2700

aagaagaatg agattgaaaa gaactacacc atgcagcaga acctgcaccg gaagtacgcc 2760

agacccaaga aggacgaaaa gttcaacgac gaggactaca aagagtatga gaaggccatc 2820

ggcaacatcc agaagtacac ccacctgaag aacaaggtgg aattcaatga gctgaacctg 2880

ctgcagggcc tgctgctgaa gatcctgcac cggctcgtgg gctacaccag catctgggag 2940

cgggacctga gattccggct gaagggcgag tttcccgaga accactacat cgaggaaatt 3000

ttcaatttcg acaactccaa gaatgtgaag tacaaaagcg gccagatcgt ggaaaagtat 3060

atcaacttct acaaagaact gtacaaggac aatgtggaaa agcggagcat ctactccgac 3120

aagaaagtga agaaactgaa gcaggaaaaa aaggacctgt acatccggaa ctacattgcc 3180

cacttcaact acatccccca cgccgagatt agcctgctgg aagtgctgga aaacctgcgg 3240

aagctgctgt cctacgaccg gaagctgaag aacgccatca tgaagtccat cgtggacatt 3300

ctgaaagaat acggcttcgt ggccaccttc aagatcggcg ctgacaagaa gatcgaaatc 3360

cagaccctgg aatcagagaa gatcgtgcac ctgaagaatc tgaagaaaaa gaaactgatg 3420

accgaccgga acagcgagga actgtgcgaa ctcgtgaaag tcatgttcga gtacaaggcc 3480

ctggaatga 3489

<210> 46

<211> 3312

<212> DNA

<213> Artificial Sequence

<220>

<221> misc_feature

<222> (1)..(3312)

<223> PspCas13b

<400> 46

atgcccaaga agaagcggaa ggtggtcgac aacatccccg ctctggtgga aaaccagaag 60

aagtactttg gcacctacag cgtgatggcc atgctgaacg ctcagaccgt gctggaccac 120

atccagaagg tggccgatat tgagggcgag cagaacgaga acaacgagaa tctgtggttt 180

caccccgtga tgagccacct gtacaacgcc aagaacggct acgacaagca gcccgagaaa 240

accatgttca tcatcgagcg gctgcagagc tacttcccat tcctgaagat catggccgag 300

aaccagagag agtacagcaa cggcaagtac aagcagaacc gcgtggaagt gaacagcaac 360

gacatcttcg aggtgctgaa gcgcgccttc ggcgtgctga agatgtacag ggacctgacc 420

aaccactaca agacctacga ggaaaagctg aacgacggct gcgagttcct gaccagcaca 480

gagcaacctc tgagcggcat gatcaacaac tactacacag tggccctgcg gaacatgaac 540

gagagatacg gctacaagac agaggacctg gccttcatcc aggacaagcg gttcaagttc 600

gtgaaggacg cctacggcaa gaaaaagtcc caagtgaata ccggattctt cctgagcctg 660

caggactaca acggcgacac acagaagaag ctgcacctga gcggagtggg aatcgccctg 720

ctgatctgcc tgttcctgga caagcagtac atcaacatct ttctgagcag gctgcccatc 780

ttctccagct acaatgccca gagcgaggaa cggcggatca tcatcagatc cttcggcatc 840

aacagcatca agctgcccaa ggaccggatc cacagcgaga agtccaacaa gagcgtggcc 900

atggatatgc tcaacgaagt gaagcggtgc cccgacgagc tgttcacaac actgtctgcc 960

gagaagcagt cccggttcag aatcatcagc gacgaccaca atgaagtgct gatgaagcgg 1020

agcagcgaca gattcgtgcc tctgctgctg cagtatatcg attacggcaa gctgttcgac 1080

cacatcaggt tccacgtgaa catgggcaag ctgagatacc tgctgaaggc cgacaagacc 1140

tgcatcgacg gccagaccag agtcagagtg atcgagcagc ccctgaacgg cttcggcaga 1200

ctggaagagg ccgagacaat gcggaagcaa gagaacggca ccttcggcaa cagcggcatc 1260

cggatcagag acttcgagaa catgaagcgg gacgacgcca atcctgccaa ctatccctac 1320

atcgtggaca cctacacaca ctacatcctg gaaaacaaca aggtcgagat gtttatcaac 1380

gacaaagagg acagcgcccc actgctgccc gtgatcgagg atgatagata cgtggtcaag 1440

acaatcccca gctgccggat gagcaccctg gaaattccag ccatggcctt ccacatgttt 1500

ctgttcggca gcaagaaaac cgagaagctg atcgtggacg tgcacaaccg gtacaagaga 1560

ctgttccagg ccatgcagaa agaagaagtg accgccgaga atatcgccag cttcggaatc 1620

gccgagagcg acctgcctca gaagatcctg gatctgatca gcggcaatgc ccacggcaag 1680

gatgtggacg ccttcatcag actgaccgtg gacgacatgc tgaccgacac cgagcggaga 1740

atcaagagat tcaaggacga ccggaagtcc attcggagcg ccgacaacaa gatgggaaag 1800

agaggcttca agcagatctc cacaggcaag ctggccgact tcctggccaa ggacatcgtg 1860

ctgtttcagc ccagcgtgaa cgatggcgag aacaagatca ccggcctgaa ctaccggatc 1920

atgcagagcg ccattgccgt gtacgatagc ggcgacgatt acgaggccaa gcagcagttc 1980

aagctgatgt tcgagaaggc ccggctgatc ggcaagggca caacagagcc tcatccattt 2040

ctgtacaagg tgttcgcccg cagcatcccc gccaatgccg tcgagttcta cgagcgctac 2100

ctgatcgagc ggaagttcta cctgaccggc ctgtccaacg agatcaagaa aggcaacaga 2160

gtggatgtgc ccttcatccg gcgggaccag aacaagtgga aaacacccgc catgaaaacc 2220

ctgggcagaa tctacagcga ggatctgccc gtggaactgc ccagacagat gttcgacaat 2280

gagatcaagt cccacctgaa gtccctgcca cagatggaag gcatcgactt caacaatgcc 2340

aacgtgacct atctgatcgc cgagtacatg aagagagtgc tggacgacga cttccagacc 2400

ttctaccagt ggaaccgcaa ctaccggtac atggacatgc ttaagggcga gtacgacaga 2460

aagggctccc tgcagcactg cttcaccagc gtggaagaga gagaaggcct ctggaaagag 2520

cgggcctcca gaacagagcg gtacagaaag caggccagca acaagatccg cagcaaccgg 2580

cagatgagaa acgccagcag cgaagagatc gagacaatcc tggataagcg gctgagcaac 2640

agccggaacg agtaccagaa aagcgagaaa gtgatccggc gctacagagt gcaggatgcc 2700

ctgctgtttc tgctggccaa aaagaccctg accgaactgg ccgatttcga cggcgagagg 2760

ttcaaactga aagaaatcat gcccgacgcc gagaagggaa tcctgagcga gatcatgccc 2820

atgagcttca ccttcgagaa aggcggcaag aagtacacca tcaccagcga gggcatgaag 2880

ctgaagaact acggcgactt ctttgtgctg gctagcgaca agaggatcgg caacctgctg 2940

gaactcgtgg gcagcgacat cgtgtccaaa gaggatatca tggaagagtt caacaaatac 3000

gaccagtgca ggcccgagat cagctccatc gtgttcaacc tggaaaagtg ggccttcgac 3060

acataccccg agctgtctgc cagagtggac cgggaagaga aggtggactt caagagcatc 3120

ctgaaaatcc tgctgaacaa caagaacatc aacaaagagc agagcgacat cctgcggaag 3180

atccggaacg ccttcgatca caacaattac cccgacaaag gcgtggtgga aatcaaggcc 3240

ctgcctgaga tcgccatgag catcaagaag gcctttgggg agtacgccat catgaaggga 3300

tcccttcaat ga 3312

<210> 47

<211> 2934

<212> DNA

<213> Artificial Sequence

<220>

<221> misc_feature

<222> (1)..(2934)

<223> RxCas13d

<400> 47

atgcctaaaa agaaaagaaa ggtgggttct ggtatcgaga agaagaagag cttcgccaag 60

ggcatgggag tgaagagcac cctggtgtcc ggctctaagg tgtacatgac cacatttgct 120

gagggaagcg acgccaggct ggagaagatc gtggagggcg atagcatcag atccgtgaac 180

gagggagagg ctttcagcgc cgagatggct gacaagaacg ctggctacaa gatcggaaac 240

gccaagtttt cccacccaaa gggctacgcc gtggtggcta acaacccact gtacaccgga 300

ccagtgcagc aggacatgct gggactgaag gagacactgg agaagaggta cttcggcgag 360

tccgccgacg gaaacgataa catctgcatc caggtcatcc acaacatcct ggatatcgag 420

aagatcctgg ctgagtacat cacaaacgcc gcttacgccg tgaacaacat ctccggcctg 480

gacaaggata tcatcggctt cggaaagttt tctaccgtgt acacatacga cgagttcaag 540

gatccagagc accaccgggc cgcttttaac aacaacgaca agctgatcaa cgccatcaag 600

gctcagtacg acgagttcga taactttctg gataacccca ggctgggcta cttcggacag 660

gctttctttt ctaaggaggg cagaaactac atcatcaact acggaaacga gtgttacgac 720

atcctggccc tgctgagcgg actgaggcac tgggtggtgc acaacaacga ggaggagtct 780

cggatcagcc gcacctggct gtacaacctg gacaagaacc tggataacga gtacatctcc 840

acactgaact acctgtacga caggatcacc aacgagctga caaacagctt ctccaagaac 900

tctgccgcta acgtgaacta catcgctgag accctgggca tcaacccagc tgagttcgct 960

gagcagtact tcagattttc catcatgaag gagcagaaga acctgggctt caacatcaca 1020

aagctgagag aagtgatgct ggacagaaag gatatgtccg agatcaggaa gaaccacaag 1080

gtgttcgatt ctatcagaac caaggtgtac acaatgatgg actttgtgat ctacaggtac 1140

tacatcgagg aggatgccaa ggtggccgct gccaacaaga gcctgcccga caacgagaag 1200

tctctgagcg agaaggatat cttcgtgatc aacctgagag gctcctttaa cgacgatcag 1260

aaggacgctc tgtactacga tgaggccaac aggatctgga gaaagctgga gaacatcatg 1320

cacaacatca aggagttccg gggaaacaag acccgcgagt acaagaagaa ggacgctcca 1380

aggctgccta ggatcctgcc tgctggaagg gacgtgagcg ccttcagcaa gctgatgtac 1440

gccctgacaa tgtttctgga cggaaaggag atcaacgatc tgctgaccac actgatcaac 1500

aagttcgaca acatccagtc ttttctgaaa gtgatgcctc tgatcggcgt gaacgctaag 1560

ttcgtggagg agtacgcctt ctttaaggac agcgccaaga tcgctgatga gctgcggctg 1620

atcaagtcct ttgccaggat gggagagcca atcgctgacg ctaggagagc tatgtacatc 1680

gatgccatcc ggatcctggg aaccaacctg tcttacgacg agctgaaggc tctggccgac 1740

accttcagcc tggatgagaa cggcaacaag ctgaagaagg gcaagcacgg aatgcgcaac 1800

ttcatcatca acaacgtgat cagcaacaag cggtttcact acctgatcag atacggcgac 1860

ccagctcacc tgcacgagat cgctaagaac gaggccgtgg tgaagttcgt gctgggacgg 1920

atcgccgata tccagaagaa gcagggccag aacggaaaga accagatcga ccgctactac 1980

gagacctgca tcggcaagga taagggaaag tccgtgtctg agaaggtgga cgctctgacc 2040

aagatcatca caggcatgaa ctacgaccag ttcgataaga agagatctgt gatcgaggac 2100

accggaaggg agaacgccga gagagagaag tttaagaaga tcatcagcct gtacctgaca 2160

gtgatctacc acatcctgaa gaacatcgtg aacatcaacg ctagatacgt gatcggcttc 2220

cactgcgtgg agcgcgatgc ccagctgtac aaggagaagg gatacgacat caacctgaag 2280

aagctggagg agaagggctt tagctccgtg accaagctgt gcgctggaat cgacgagaca 2340

gcccccgaca agaggaagga tgtggagaag gagatggccg agagagctaa ggagagcatc 2400

gactccctgg agtctgctaa ccctaagctg tacgccaact acatcaagta ctccgatgag 2460

aagaaggccg aggagttcac caggcagatc aacagagaga aggccaagac cgctctgaac 2520

gcctacctga ggaacacaaa gtggaacgtg atcatccggg aggacctgct gcgcatcgat 2580

aacaagacct gtacactgtt ccggaacaag gctgtgcacc tggaggtggc tcgctacgtg 2640

cacgcctaca tcaacgacat cgccgaggtg aactcctact ttcagctgta ccactacatc 2700

atgcagagga tcatcatgaa cgagagatac gagaagtcta gcggcaaggt gtctgagtac 2760

ttcgacgccg tgaacgatga gaagaagtac aacgatagac tgctgaagct gctgtgcgtg 2820

cctttcggat actgtatccc acggtttaag aacctgagca tcgaggccct gttcgaccgc 2880

aacgaggctg ccaagtttga taaggagaag aagaaggtga gcggcaactc ctga 2934
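The four Cas-encoding entries (SEQ ID NO: 41, 45, 46 and 47) begin with 24 nt that translate to MPKKKRKV. The nucleotides differ in SEQ ID NO: 47, but the peptide is the same, and it contains the SV40 large T antigen nuclear localization signal PKKKRKV. SEQ ID NO: 41 also ends with codons for YPYDVPDYA, the influenza hemagglutinin (HA) epitope, followed by TAA. The listing does not annotate these features. The sketch below only checks the translations using the standard genetic code, and its helper names are illustrative.

```python
from itertools import product

# Standard genetic code: codons enumerated with bases in TCAG order,
# first position varying slowest.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate an in-frame lowercase DNA string; '*' marks a stop codon."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


# First 24 nt of each Cas-encoding entry, copied from the listing.
N_TERMINI = {
    41: "atgcccaagaagaagcggaaggtg",  # dCas13e.1-ADAR2DD
    45: "atgcccaagaagaagcggaaggtg",  # LwaCas13a
    46: "atgcccaagaagaagcggaaggtg",  # PspCas13b
    47: "atgcctaaaaagaaaagaaaggtg",  # RxCas13d
}
# Last 30 nt of SEQ ID NO: 41 (nucleotides 3586-3615).
SEQ41_C_TERMINUS = "tacccatacgacgtaccagattacgcttaa"

for seq_id, nt in N_TERMINI.items():
    print(seq_id, translate(nt))        # each prints MPKKKRKV
print(41, translate(SEQ41_C_TERMINUS))  # YPYDVPDYA*
```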

<210> 48

<211> 30

<212> DNA

<213> Homo sapiens

<400> 48

atggcccttc gcagctcttg cacgtcatac 30

<210> 49

<211> 30

<212> DNA

<213> Homo sapiens

<400> 49

ttaggcagcc ctcatcagtg ccggctccct 30

<210> 50

<211> 30

<212> DNA

<213> Homo sapiens

<400> 50

ggccaggatc tcaattaggc agccctcatc 30
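SEQ ID NO: 48 to 50 are 30-nt Homo sapiens sequences whose role is not annotated in this part of the listing. SEQ ID NO: 49 and 50 overlap: the last 16 nt of SEQ ID NO: 50 match the first 16 nt of SEQ ID NO: 49, so together the two entries cover a contiguous 44-nt stretch. The suffix-prefix overlap check below is an illustrative helper and is not part of the source.

```python
# SEQ ID NO: 49 and 50 (Homo sapiens), copied from the listing.
SEQ49 = "ttaggcagccctcatcagtgccggctccct"
SEQ50 = "ggccaggatctcaattaggcagccctcatc"


def suffix_prefix_overlap(left: str, right: str) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


k = suffix_prefix_overlap(SEQ50, SEQ49)
merged = SEQ50 + SEQ49[k:]  # join the two entries without repeating the overlap
print(k, len(merged))  # 16 44
print(merged)
```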
