Modified CPF1 guide RNA

Document No.: 1060771. Publication date: 2020-10-13.

Abstract: The present technology, Modified Cpf1 guide RNA, was created by K·李 on 2018-10-02. The invention provides nucleic acids comprising a Cpf1 crRNA, a processing sequence 5' to the Cpf1 crRNA, and an extension sequence 5' to the processing sequence. The invention also provides compositions comprising the nucleic acid, a vector, and optionally Cpf1. In addition, the invention provides methods of genetically modifying a eukaryotic target cell, comprising contacting the eukaryotic target cell with the nucleic acid or composition to genetically modify a target nucleic acid in the cell.

1. A nucleic acid comprising a Cpf1 crRNA, an extension sequence 5' of the crRNA, and optionally a processing sequence between the crRNA and the extension sequence, wherein the processing sequence is a sequence self-cleaved by Cpf1.

2. The nucleic acid of claim 1, wherein said nucleic acid comprises a processing sequence and said processing sequence comprises a fragment of a direct repeat of a Cpf1 array, wherein said direct repeat comprises a crRNA sequence portion and a processing portion located 5' to said crRNA sequence portion and said fragment comprises at least 5 contiguous nucleotides of a processing portion of said direct repeat.

3. The nucleic acid of claim 2, wherein the processing sequence comprises a fragment of at least 10 nucleotides of the processing portion of the direct repeat.

4. The nucleic acid of claim 2, wherein the processing sequence comprises the entire processing portion of the direct repeat.

5. The nucleic acid of any one of claims 1-4, wherein the extension sequence does not comprise the sequence of the processing sequence or the crRNA.

6. The nucleic acid of any one of claims 1-5, wherein the extension sequence comprises at least 2 nucleotides.

7. The nucleic acid of any one of claims 1-6, wherein the extension sequence comprises 10 to 100 nucleotides.

8. The nucleic acid of any one of claims 1 to 7, wherein the nucleic acid comprises only a single Cpf1 crRNA sequence.

9. The nucleic acid of any one of claims 1 to 8, further comprising a second processing sequence 5' to the extension sequence and a second extension sequence 5' to the second processing sequence.

10. The nucleic acid of any one of claims 1-9, further comprising a donor nucleic acid hybridized or covalently linked thereto.

11. The nucleic acid of claim 10, wherein the donor nucleic acid is covalently linked 5' of the processing sequence or 5' of the extension sequence.

12. The nucleic acid of claim 11, wherein the donor nucleic acid is linked to the processing sequence or extension sequence by a linker group.

13. The nucleic acid of claim 10, wherein the donor nucleic acid hybridizes to the extension sequence and/or processing sequence.

14. The nucleic acid of any one of claims 1 to 13, further comprising a targeting nucleic acid 3' of the Cpf1 crRNA.

15. The nucleic acid of any one of claims 1-14, wherein the extension sequence comprises less than about 60 nucleotides.

16. The nucleic acid of claim 15, wherein the extension sequence comprises less than about 20 nucleotides.

17. The nucleic acid of claim 15, wherein the extension sequence comprises about 2-20 nucleotides.

18. The nucleic acid of any one of claims 1-17, wherein the nucleic acid does not comprise a processing sequence.

19. The nucleic acid of claim 18, wherein the nucleic acid further comprises a donor nucleic acid covalently linked 5' of the extension sequence.

20. The nucleic acid of claim 19, wherein the donor nucleic acid is linked by a linker group.

21. The nucleic acid of claim 18, wherein the nucleic acid further comprises a donor nucleic acid that hybridizes to the extension sequence.

22. The nucleic acid of any one of claims 18 to 21, further comprising a targeting nucleic acid 3' of the Cpf1 crRNA.

23. The nucleic acid of any one of claims 1-22, wherein the extension sequence comprises a self-hybridizing sequence.

24. The nucleic acid of any one of claims 1-23, wherein the extension sequence comprises a semi-stable hairpin structure, a pseudoknot structure, a G-quadruplex structure, a bulge loop structure, an inner loop structure, a branched loop structure, or a combination thereof.

25. The nucleic acid of claim 24, wherein the extension sequence comprises a repetitive trinucleotide motif.

26. The nucleic acid of claim 25, wherein the repetitive trinucleotide motif is CAA, UUG, AAG, CUU, CCU, CCA, UAA, or a combination thereof.

27. The nucleic acid of claim 25, wherein the repetitive trinucleotide motif is CAU, CUA, UUA, AUG, UAG, or a combination thereof.

28. The nucleic acid of claim 25, wherein the repetitive trinucleotide motif is CGA, CGU, CGG, CAG, CUG, CCG, or a combination thereof.

29. The nucleic acid of claim 25, wherein the repetitive trinucleotide motif is a CNG motif or a combination of CNG motifs, optionally with CGA or CGU.

30. The nucleic acid of claim 25, wherein the repetitive trinucleotide motif is AGG, UGG, or a combination thereof.

31. The nucleic acid of any one of claims 1-25, wherein the extension sequence comprises a combination of repetitive trinucleotide motifs of any one of claims 26 to 30.

32. The nucleic acid of any one of claims 1-31, wherein the extension sequence or a portion thereof is resistant to nuclease degradation.

33. The nucleic acid of any one of claims 1-32, wherein the extension sequence comprises one or more modified internucleotide linkages.

34. The nucleic acid of claim 33, wherein a region of 4 or more contiguous nucleotides of the extension sequence, or the entire extension sequence, has modified internucleotide linkages.

35. The nucleic acid of claim 33 or 34, wherein the modified internucleotide linkages comprise phosphorothioate, phosphorodithioate, methylphosphonate, phosphoramidate, 2'-O-methyl, 2'-O-methoxyethyl, 2'-fluoro, bridged nucleic acid (BNA), or phosphotriester modified linkages, or combinations thereof.

36. The nucleic acid of any one of claims 1-35, wherein the extension sequence comprises one or more xeno nucleic acids (XNA).

37. The nucleic acid of any one of claims 1-36, wherein the nucleic acid further comprises a biotin and/or avidin or streptavidin molecule attached to the 5' end of the extension sequence.

38. The nucleic acid of claim 37, wherein the nucleic acid further comprises a biotin molecule attached to the 5' end of the extension sequence, an avidin or streptavidin molecule conjugated to the biotin molecule, and optionally, a targeting construct comprising a biotin molecule conjugated to avidin or streptavidin.

39. The nucleic acid of claim 38, wherein the targeting construct is a peptide comprising a biotin molecule attached thereto.

40. A composition comprising the nucleic acid of any one of claims 1-39 and a vector, and optionally further comprising a Cpf1 protein.

41. The composition of claim 40, wherein the composition is substantially free of divalent metal ions that promote Cpf1 cleavage.

42. The composition of claim 40 or 41, wherein the composition comprises a cationic lipid.

43. The composition of any one of claims 40-42, wherein the nucleic acid is in a liposome.

44. The composition of any one of claims 40-42, wherein the nucleic acid is partially or fully encapsulated by, or attached to, a metal or polymeric nanoparticle.

45. A method of genetically modifying a eukaryotic target cell, comprising contacting the eukaryotic target cell with the nucleic acid of any one of claims 1-39 or the composition of any one of claims 40-44 to genetically modify a target nucleic acid.

46. The method of claim 45, wherein the Cpf1 crRNA comprises a targeting sequence that hybridizes to a target sequence in the target cell.

47. The method of claim 45 or 46, wherein the target cell is a mammalian cell, optionally a human cell.

48. The method of any one of claims 45-47, wherein the nucleic acid construct is cleaved upon entry into the cell, thereby releasing the Cpf1 crRNA from the remainder of the nucleic acid construct.

Background

RNA-guided endonucleases have proven to be effective tools for genome engineering in multiple cell types and microorganisms. RNA-guided endonucleases generate site-specific double-stranded DNA breaks or single-stranded DNA breaks within a target nucleic acid. When cleavage of a target nucleic acid occurs within a cell, breaks in the nucleic acid can be repaired by non-homologous end joining (NHEJ) or Homology Directed Repair (HDR).

Direct delivery of RNA-guided endonucleases and their gene editing components (e.g., guide RNA) into cells, both in vitro and in vivo, has great potential as a therapeutic strategy for the treatment of genetic diseases. At present, however, it is challenging to deliver these components directly into cells with reasonable efficiency.

Therefore, there is a need to identify new compositions and related methods to improve cellular delivery and other properties of RNA-guided endonucleases that will enhance genome engineering. The present invention provides such compositions and related methods.

Disclosure of Invention

The invention provides a nucleic acid comprising a Cpf1 crRNA having an extension sequence. In one aspect, the nucleic acid comprises a Cpf1 crRNA and an extension sequence at the 5' end of the Cpf1 crRNA, wherein the extension sequence comprises less than about 60 nucleotides. In another aspect, the nucleic acid comprises a Cpf1 crRNA, a processing sequence 5' to the Cpf1 crRNA, and an extension sequence 5' to the processing sequence. Also provided are compositions comprising the nucleic acid, a vector, and optionally a Cpf1 protein or a vector encoding the protein.

The invention also provides methods for genetically modifying eukaryotic target cells. The method involves contacting a eukaryotic target cell with a nucleic acid comprising a Cpf1 crRNA as described herein.

These and other aspects of the invention are described in more detail in the following sections.

Drawings

FIG. 1A is a graph comparing the delivery of unmodified crRNA complexed to Cpf1 using cationic liposomes (Lipofectamine) or electroporation (nucleofection).

FIGS. 1B and 1C are graphs comparing cationic liposome-mediated delivery and NHEJ production of Cpf1-crRNA complexes comprising unmodified crRNA (41 nucleotides in length) and extended crRNA.

Fig. 1D is a schematic diagram of the structures of unmodified crRNA and extended crRNA of 41 nucleotides (nt). Arrows represent Cpf1 cleavage sites.

Fig. 1E is a graph showing NHEJ efficiency for the crRNA construct shown in fig. 1D.

Fig. 1F is a graph depicting cellular delivery of Cpf1RNP using cationic liposomes and crRNA of different lengths labeled with fluorescent dyes.

Fig. 2 provides a graph showing gene editing efficiency, measured as GFP knockdown, of 5'-extended crRNA delivered to GFP-HEK cells via electroporation (right panel), and a schematic of the crRNA (left panel).

Fig. 3A is a schematic of an in vivo study in Ai9 mice.

Figure 3B is a schematic of the gastrocnemius injection site and imaging segment.

Fig. 4A is a graph of HDR frequency for crRNA with different extensions delivered with donor DNA using electroporation.

Fig. 4B is a graph of the percentage of GFP- cells obtained with different extended crRNAs delivered with donor DNA by electroporation, showing NHEJ efficiency.

Fig. 4C is a graph of the percentage of BFP- cells obtained with different extended crRNAs delivered with donor DNA by electroporation, showing NHEJ efficiency.

Fig. 4D is a graph of the percentage of GFP- cells obtained with extended crRNA delivered by electroporation with and without single-stranded DNA (ssDNA) that has no homology to the target sequence.

Fig. 4E is a graph showing gene editing, as a percentage of GFP- cells, using electroporated crRNA extended with 100 nt RNA or 9 nt RNA that has no homology to the target sequence.

Fig. 4F is a graph of GFP-cell percentages using electroporated crRNA with and without 4nt extension and further modified with chemical moieties.

Fig. 5A and 5B provide schematic diagrams of conjugation of crRNA and donor DNA.

Fig. 5C is an image of a gel electrophoretic separation showing the release of donor DNA and crRNA from the conjugated crRNA/DNA molecule after reduction with thiol.

FIG. 6 is a graph demonstrating that Cpf1 conjugated to HD-RNA induces NHEJ in GFP-HEK cells following transfection with PAsp(DET) (i.e., a cationic polymer).

Fig. 7 is a graph demonstrating that Cpf1 conjugated to HD-RNA induces HDR in GFP-HEK cells following transfection with PAsp(DET) (i.e., a cationic polymer).

FIG. 8 provides the sequence of the Cpf1 protein.

Fig. 9 provides an example of a Cpf1 processing sequence.

Fig. 10A shows a schematic of crRNA conjugated to donor DNA.

Fig. 10B shows the sequence used in crRNA conjugated to donor DNA.

Fig. 10C is a graph of the percentage of RFP+ cells after treatment of primary Ai9 myoblasts with various crRNAs and Cpf1 using electroporation.

FIG. 10D is a graph of the percentage of RFP+ cells among primary Ai9 myoblasts transfected with 100 nt DNA or RNA.

Fig. 10E is a graph of NHEJ efficiency in HepG2 cells transfected by electroporation with Cpf1 RNP comprising crRNA, with or without a 9 nt extension, targeting the Serpina1 gene.

Fig. 11A shows RNA structures that can be used in crRNA extension.

Figure 11B shows trinucleotide repeats that can be used to provide various RNA structures.

Fig. 11C shows the interaction of hybridizing extension sequences of crRNAs in a kissing loop, which can be used to form crRNA multimers.

FIG. 11D shows the interaction of hybridizing extension sequences of crRNAs to form trimers (panel (i)) or octamers (panel (ii)).

FIG. 12 shows the editing efficiency (% BFP-) of various Cpf1 crRNAs in BFP-expressing HEK293T cells. MS is a 2'-OMe 3'-phosphorothioate modification on the first three nucleotides from the 5' terminus, +9du is a 2'-deoxy modification on the 9th nucleotide from the 5' terminus, and +9S is a phosphorothioate modification on the first 9 nucleotides from the 5' terminus. BFP knockout efficiency was measured by flow cytometry 7 days after electroporation. Mean ± s.e., n = 3. All extended crRNAs showed statistically significant differences from unmodified crRNA by Student's t-test, with p-values less than 0.05.

Fig. 13A is a diagram of unmodified Cas9 sgRNA and Cpf1 crRNA.

Fig. 13B and 13C are graphs showing the relative activity of Cas9 sgRNA and Cpf1 crRNA, measured as GFP knockdown.

Fig. 14 is a schematic of an extended crRNA modified with biotin and avidin and linked to a targeting molecule comprising biotin.

Fig. 15A is a schematic of chemical modifications performed on crRNA with extensions.

FIG. 15B is a graph quantifying the amount of crRNA remaining after incubation in serum.

Fig. 15C is a graph of the percentage of GFP-cells after delivery of crRNA with 9nt extension and chemical modification using cationic liposomes.

Fig. 15D is a graph comparing the percentage of GFP-cells after delivery of extended crRNA with chemical modification and unmodified extended crRNA along with Cpf1 using electroporation.

FIG. 16 is a graph comparing cationic polymer-mediated delivery and NHEJ production of Cpf1-crRNA complexes comprising unmodified crRNA (41nt), 9 base pair extended crRNA (50 nt total), or 59 base pair extended crRNA (100nt total).

Detailed Description

The present invention provides modified guide nucleic acids for Cpf1, referred to as "crRNAs", that have enhanced properties compared to conventional crRNA molecules. As used herein, crRNA refers to a nucleic acid sequence (e.g., RNA) that binds to the RNA-guided endonuclease Cpf1 and targets the endonuclease to a specific location within the target nucleic acid to be cleaved by Cpf1. Cpf1 is an RNA-guided endonuclease of a class II CRISPR/Cas system that is involved in type V adaptive immunity. Cpf1 does not require a tracrRNA molecule, as do other CRISPR enzymes; only a single crRNA molecule is required for function. Cpf1 prefers a "TTN" PAM motif, which is located 5' upstream of its target. In addition, Cpf1 cleavage is staggered by about 3-5 bases, which produces "sticky ends" (Kim et al., 2016. "Genome-wide analysis reveals specificities of Cpf1 endonucleases in human cells", published online June 6, 2016). These sticky ends with 3-5 bp overhangs are thought to facilitate NHEJ-mediated ligation and to improve gene editing of DNA fragments with matching ends.
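The PAM rule stated above can be illustrated with a minimal sketch. It scans one DNA strand for a 5'-TTN-3' motif and reports the sequence immediately 3' of it as a candidate target. The function name, the default target length of 20 nt, and the single-strand scan are assumptions made for illustration; they are not taken from the disclosure.

```python
def find_cpf1_sites(dna: str, target_len: int = 20) -> list[tuple[int, str, str]]:
    """Return (pam_start, pam, target) for each 5'-TTN-3' PAM on the given strand.

    The target is the stretch of target_len bases immediately 3' of the PAM.
    Only the strand passed in is scanned (illustrative simplification).
    """
    dna = dna.upper()
    pam_len = 3
    sites = []
    for i in range(len(dna) - pam_len - target_len + 1):
        pam = dna[i:i + pam_len]
        if pam.startswith("TT"):  # "TTN": two T's followed by any base
            target = dna[i + pam_len:i + pam_len + target_len]
            sites.append((i, pam, target))
    return sites


if __name__ == "__main__":
    # Arbitrary toy sequence, not a real genomic locus.
    for start, pam, target in find_cpf1_sites("CCTTGACGTACGAA", target_len=5):
        print(start, pam, target)
```

A complete search would also scan the reverse-complement strand and apply ortholog-specific PAM preferences, which this sketch omits.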

One skilled in the art will appreciate that the Cpf1 crRNA may be from any species or any synthetic or naturally occurring variant or ortholog derived or isolated from any source. That is, the Cpf1 crRNA may have the required elements (e.g., sequence or structure) of a crRNA that is recognized (bound) by any Cpf1 polypeptide or ortholog from any bacterial species, or synthetic variants thereof. Examples of Cpf1 crRNA sequences are provided in Fig. 9; thus, for example, examples of Cpf1 crRNAs include a crRNA comprising any of SEQ ID NOs: 21-39. An example of a Cpf1 crRNA sequence for a synthetic variant Cpf1 is a crRNA corresponding to the MAD7 Cpf1 ortholog by Inscripta, Inc. (CO, USA). Another example of a Cpf1 variant is Cpf1 modified to reduce or eliminate RNase activity, for example by introducing a modification at H800A, K809A, K860A, F864A, and R790A of Acidaminococcus Cpf1 (AsCpf1), or at the corresponding positions of a different Cpf1 ortholog (e.g., an H800A mutation or an H→A mutation at a corresponding position).

Typically, the crRNA comprises a targeting domain and a stem-loop domain located 5' to the targeting domain. The overall length of the crRNA is not particularly limited as long as it can direct Cpf1 to a specific location within the target nucleic acid. The stem-loop domain is generally about 19-22 nucleotides (nt) in length, while the targeting (guide) domain is anywhere from about 14-25 nt (e.g., at least about 14 nt, 15 nt, 16 nt, 17 nt, or 18 nt). In some embodiments, the Cpf1 crRNA may have an overall length of 20 to 100 nt, 20 nt to 90 nt, 20 nt to 80 nt, 20 nt to 70 nt, 20 nt to 60 nt, 20 nt to 55 nt, 20 nt to 50 nt, 20 nt to 45 nt, 20 nt to 40 nt, 20 nt to 35 nt, 20 nt to 30 nt, or 20 nt to 25 nt.
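As an illustration of the domain lengths given above, the following sketch models a crRNA as a 5' stem-loop domain followed by a targeting domain and checks each against the approximate ranges in the text. The class and function names are hypothetical, the sequences are "N" placeholders rather than real crRNA sequences, and the 20 nt + 21 nt split of a 41 nt crRNA is an assumption chosen only to match the 41 nt length mentioned for the unmodified crRNA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrRNA:
    stem_loop: str   # 5' stem-loop domain
    targeting: str   # targeting domain, located 3' of the stem-loop

    @property
    def sequence(self) -> str:
        # Written 5' -> 3': stem-loop first, then targeting domain.
        return self.stem_loop + self.targeting


def length_problems(crrna: CrRNA,
                    stem_range: tuple[int, int] = (19, 22),
                    targeting_range: tuple[int, int] = (14, 25)) -> list[str]:
    """List the domains whose length falls outside the approximate ranges."""
    problems = []
    if not stem_range[0] <= len(crrna.stem_loop) <= stem_range[1]:
        problems.append(f"stem-loop is {len(crrna.stem_loop)} nt")
    if not targeting_range[0] <= len(crrna.targeting) <= targeting_range[1]:
        problems.append(f"targeting domain is {len(crrna.targeting)} nt")
    return problems


# Placeholder sequences only ("N" = any nucleotide).
example = CrRNA(stem_loop="N" * 20, targeting="N" * 21)

if __name__ == "__main__":
    print(len(example.sequence), length_problems(example))
```

Because the text describes these ranges as typical ("about"), a failed check here signals an unusual design rather than an invalid one.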

One aspect of the present disclosure provides a nucleic acid comprising a Cpf1 crRNA, an extension sequence 5' of the crRNA, and optionally a processing sequence, which may be located between the crRNA and the extension sequence, within the extension sequence, or 5' to the extension sequence.

Extension sequences

The nucleic acids of the invention comprise an extension sequence located 5' to the crRNA. The extension sequence may comprise any combination of nucleic acids (i.e., any sequence). In one embodiment, the extension sequence increases the overall negative charge density of the nucleic acid molecule and improves delivery of nucleic acids, including crRNA.

In some embodiments, the extension sequence may be cleaved once in the cell. Without wishing to be bound by any particular theory or mechanism of action, it is believed that Cpf1 may cleave the extension sequence. However, in certain applications, it is desirable to use constructs in which the extension sequence is not cleaved from the Cpf1 crRNA. Thus, in some embodiments, the extension sequence is not cleavable by Cpf1. For example, the extension sequence, or some portion or region thereof, may comprise one or more modified internucleotide linkages (modified "backbones") that are resistant to cleavage by Cpf1 (e.g., nuclease resistant). Examples of modified internucleotide linkages include, but are not limited to: phosphorothioate, phosphorodithioate, methylphosphonate, phosphoramidate, 2'-O-methyl, 2'-O-methoxyethyl, 2'-fluoro, bridged nucleic acid (BNA), or phosphotriester modified linkages, and combinations thereof. The extension sequence, or portions thereof, may also comprise synthetic nucleotides, such as xeno nucleic acids (XNA) that are nuclease resistant. XNA is a nucleic acid in which the ribofuranosyl ring of DNA or RNA is replaced by a five- or six-membered modified ribose molecule, such as 1,5-anhydrohexitol nucleic acid (HNA), cyclohexenyl nucleic acid (CeNA), 2',4'-C-(N-methylaminomethylene) bridged nucleic acid (BNA), 2'-O,4'-C-methylene-β-D-ribonucleic acid or locked nucleic acid (LNA), arabinonucleic acid (ANA), 2'-fluoro-arabinonucleic acid (FANA), and α-L-threofuranosyl nucleic acid (TNA). In addition, any combination thereof may also be used.

The length of the extension sequence is not particularly limited as long as the extension sequence increases the overall negative charge density. For example, the extension sequence can have a length of at least about 2 nucleotides (nt) up to about 1000 nt (e.g., at least about 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 200, 300, 400, 500, 600, 700, 800, or 900, and up to about 1000 nt). In one aspect, the extension sequence is no more than about 100 nucleotides in length, such as no more than about 80 nucleotides in length, no more than about 60 nucleotides in length, or no more than about 40 nucleotides in length (e.g., no more than about 30 nucleotides in length or no more than about 20 nucleotides in length). Any of the foregoing lower and upper limits on length may be expressed as a range. Shorter sequences (e.g., no more than about 15 nucleotides, or no more than about 10 nucleotides) may also be used. In some embodiments, the extension sequence comprises at least about 2 nucleotides, such as at least about 4 nucleotides, at least about 6 nucleotides, or even at least about 9 nucleotides. Any of the foregoing may be expressed as a range. Thus, for example, an extension sequence can be about 2-60 nucleotides (e.g., about 2-40 nucleotides, about 2-30 nucleotides, about 2-20 nucleotides, about 2-15 nucleotides, or about 2-10 nucleotides); about 4-60 nucleotides (e.g., about 4-40 nucleotides, about 4-30 nucleotides, about 4-20 nucleotides, about 4-15 nucleotides, or about 4-10 nucleotides); about 6-60 nucleotides (e.g., about 6-40 nucleotides, about 6-30 nucleotides, about 6-20 nucleotides, about 6-15 nucleotides, or about 6-10 nucleotides); or about 9-60 nucleotides (e.g., about 9-40 nucleotides, about 9-30 nucleotides, about 9-20 nucleotides, about 9-15 nucleotides, or about 9-10 nucleotides).

In some embodiments, the extension sequence does not have a function other than conferring greater overall negative charge density to the nucleic acid construct. In this embodiment, for example, the extension sequence is a random or non-coding sequence. In some cases, such as when a processing sequence is used, the sequence may be degraded upon cleavage of the processing sequence and released from the nucleic acid construct.

In other embodiments, the extension sequence has a function separate and apart from conferring a greater overall negative charge density to the nucleic acid construct. The extension sequence may have any additional function. For example, the extension sequence may provide a hybridization site for another nucleic acid, e.g., a donor nucleic acid. In addition, in some embodiments, the extension sequence may be an aptamer and/or promote cell binding. However, it is sometimes undesirable to recruit proteins other than RNA-guided endonucleases to bind to the guide RNA. In addition, aptamer sequences typically have complex folding patterns that can be bulky and not compact. Thus, in other embodiments, the extension sequence is not an aptamer sequence.

In some embodiments, the extension sequence may comprise a sequence encoding a protein whose expression is desired in the target cell to be edited. For example, the extension sequence may comprise a sequence encoding an RNA-guided endonuclease, such as an RNA-guided endonuclease that pairs with (i.e., recognizes and is guided by) the crRNA used in the nucleic acid construct. The extension sequence may comprise, for example, the sequence of the mRNA of an RNA-guided endonuclease.

In some embodiments, the extension is self-folding (self-hybridizing) to provide a structured extension. There is no limitation as to the type of structure provided. The extension may have a random coil structure; however, in some embodiments, the extension has a structure that is more compact than a random coil structure of the same number of nucleotides, which provides a greater negative charge density. Increasing the overall length of the extension increases the negative charge of the molecule. The overall negative charge density of the molecule is further increased when a more compact structure is used. The compactness or charge density can be determined by the mobility in gel electrophoresis. More specifically, if two nucleic acids having the same number of nucleotides are run together on the same gel, the nucleic acid having the higher mobility (moving furthest in the gel) is considered to have the more compact structure.
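The relationship between extension length and total negative charge can be sketched with simple arithmetic. The sketch assumes one negative charge per internucleotide phosphodiester linkage at neutral pH and no terminal phosphates; this is a simplifying assumption for illustration and not a measurement from the disclosure. The lengths used (41 nt, 50 nt, and 100 nt) are those of the unmodified crRNA and of the crRNAs carrying 9 nt and 59 nt extensions described for FIG. 16.

```python
def backbone_charge(length_nt: int) -> int:
    """Approximate net backbone charge of a linear RNA of length_nt nucleotides.

    Assumes one negative charge per internucleotide linkage (length_nt - 1
    linkages) and ignores any 5' or 3' terminal phosphates.
    """
    if length_nt < 1:
        raise ValueError("length_nt must be at least 1")
    return -(length_nt - 1)


if __name__ == "__main__":
    for label, length in [("unmodified crRNA", 41),
                          ("crRNA + 9 nt extension", 50),
                          ("crRNA + 59 nt extension", 100)]:
        print(f"{label}: {length} nt, approx. charge {backbone_charge(length)}")
```

This captures only total charge. Charge density also depends on how compactly the extension folds, which a length-based count cannot show.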

In another embodiment, the extension sequence comprises at least one semi-stable hairpin structure, pseudoknot structure, G-quadruplex structure, bulge loop structure, inner loop structure, branched loop structure, or a combination thereof. These types of nucleotide structures are known in the art and are schematically illustrated in Fig. 11A. It is to be understood that the illustrations are for the purpose of illustrating the general structure only and are not intended to be a detailed illustration of the actual molecular structure. One skilled in the art recognizes that hairpin structures, for example, may have interspersed non-complementary regions that produce "bulges" or other variations in the structure, and that the other depicted structures may include similar variations. The structure of a given nucleotide sequence can be determined using available algorithms (e.g., "The mfold Web Server" by Rensselaer Polytechnic Institute and The RNA Institute, College of Arts and Sciences, State University of New York at Albany; see also M. Zuker, D. H. Mathews & D. H. Turner. Algorithms and Thermodynamics for RNA Secondary Structure Prediction: A Practical Guide. In RNA Biochemistry and Biotechnology, 11-43, J. Barciszewski and B. F. C. Clark, eds., NATO ASI Series, Kluwer Academic Publishers, Dordrecht, NL, (1999)).

The type of structure provided can be controlled using repetitive trinucleotide motifs (e.g., FIG. 11B). A repetitive trinucleotide motif is a motif of three nucleotides that repeats at least two times (e.g., repeats two or more times, three or more times, four or more times, five or more times, six or more times, seven or more times, eight or more times, or ten or more times) in sequence. Thus, the extended sequence may comprise a repetitive trinucleotide motif. In one embodiment, the extended sequence comprises a repetitive trinucleotide motif of CAA, UUG, AAG, CUU, CCU, CCA, UAA, or a combination thereof, which provides a random coil sequence. In another embodiment, the extension sequence comprises a repetitive trinucleotide motif of CAU, CUA, UUA, AUG, UAG, or a combination thereof, which provides a semi-stable hairpin structure. In another embodiment, the extension sequence comprises a repetitive CNG trinucleotide motif (e.g., CGG, CAG, CUG, CCG), CGA or CGU, or a combination thereof, which provides a stable hairpin structure. In another embodiment, the extension sequence comprises a repetitive trinucleotide motif of AGG, UGG, or a combination thereof, which provides a quadruplex (or G-quadruplex) structure. In yet another embodiment, the extended sequence comprises a combination of the aforementioned trinucleotide motifs and the resulting combination of different structures. For example, the extended sequence can have a region comprising a random coil structure, a region comprising a semi-stable hairpin, a region comprising a stable hairpin, and/or a region comprising a quadruplex. Thus, each region may comprise a repetitive trinucleotide motif associated with the indicated structure. Non-limiting examples of structures are presented in the following table:

TABLE 1 representative RNA extension sequences (35 nucleotides) and their corresponding structures.

Figure GDA0002657894320000091

Extension sequences can also be used to generate crRNA multimers; thus, in another embodiment, a crRNA multimer is provided comprising two or more crRNA molecules (e.g., 3 or more, 4 or more, 5 or more, 6 or more, 7 or more, or even 8 or more crRNA molecules), wherein each crRNA comprises an extension sequence as described herein, and the crRNA molecules of the multimer are linked by their extension sequences, e.g., via base pairing or hybridization. Thus, in one embodiment, each crRNA of the multimer comprises an extension sequence comprising a region sufficiently complementary to an extension region of another crRNA of the multimer to facilitate hybridization. The complementary regions can have any suitable length that facilitates interaction (e.g., 4 nt or more, 6 nt or more, 8 nt or more, 10 nt or more, 15 nt or more, etc.). The crRNA multimers can be used, for example, to deliver multiple crRNAs simultaneously, e.g., when multiple crRNAs are required for a particular therapeutic strategy. An example of such a use is exon skipping, where a DNA fragment is excised using two crRNAs to restore a functional reading frame (e.g., Ousterout DG et al. (2015), Multiplex CRISPR/Cas9-based genome editing for correction of dystrophin mutations that cause Duchenne muscular dystrophy. Nat Commun. 6:6244). Exon skipping requires two crRNAs, each targeting a different site in the nucleus (one at the 5' site and the other at the 3' site). Ideally, the ratio of the two crRNAs should be 1:1; however, it is difficult to maintain this ratio. By pairing crRNAs (e.g., each comprising a different targeting sequence) in a multimer via appropriate extension structures, delivery at a desired ratio can be facilitated.
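Whether two extension sequences contain a region long enough to support the hybridization described above can be checked with a simple search. The sketch below finds the longest stretch in one extension whose reverse complement occurs in the other, considering Watson-Crick pairs only. The function names and the two example extensions are hypothetical and chosen purely for illustration.

```python
# Watson-Crick complement table for RNA (G-U wobble pairs are not considered).
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence written 5' -> 3'."""
    return rna.upper().translate(_COMPLEMENT)[::-1]


def longest_complementary_region(ext_a: str, ext_b: str) -> str:
    """Longest substring of ext_a whose reverse complement occurs in ext_b.

    Returns an empty string if the two sequences share no complementary base.
    """
    ext_a, ext_b = ext_a.upper(), ext_b.upper()
    for size in range(len(ext_a), 0, -1):          # try longest regions first
        for start in range(len(ext_a) - size + 1):
            region = ext_a[start:start + size]
            if reverse_complement(region) in ext_b:
                return region
    return ""


if __name__ == "__main__":
    # Hypothetical extension sequences, not taken from the disclosure.
    ext_a = "AAGGCUCAGUAA"
    ext_b = "CCACUGAGCCCC"
    region = longest_complementary_region(ext_a, ext_b)
    print(region, len(region))
```

The length of the returned region can be compared with the thresholds mentioned above (e.g., 4 nt or more, 8 nt or more). The brute-force search is adequate for extensions of tens of nucleotides, but it does not account for secondary structure that could hide the complementary region.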

In one embodiment, two or more crRNAs with a structured extension participate in an RNA "kissing" interaction (also known as a loop-loop interaction), which occurs when unpaired nucleotides in one structured extension sequence (e.g., a hairpin loop) base pair with unpaired nucleotides in another structure (e.g., another hairpin loop) on a second crRNA. An example of this type of interaction is shown in Fig. 11C. Formation of a kissing loop or other structure multimerizes two or more crRNA molecules. This strategy can be used to link several crRNA molecules.

Hybridization of complementary extension sequences on each crRNA can also be used to promote multimerization. For example, supramolecular crRNA structures can be constructed via extension regions with self-assembly capability. For example, a trimer can be formed from three RNA molecules with appropriately placed hybridization regions (e.g., FIG. 11D, panel (i); Shu D, Shu Y, Haque F, Abdelmawla S, & Guo P (2011) Thermodynamically stable RNA three-way junction for constructing multifunctional nanoparticles for delivery of therapeutics. Nat Nanotechnol. 6(10):658-667). Similarly, RNA octamers can be generated by assembling sixteen RNA molecules (FIG. 11D, panel (ii); Yu J, Liu Z, Jiang W, Wang G, & Mao C (2014) De novo design of an RNA tile that self-assembles into a homo-octameric nanoprism. Nat Commun. 6:5724).

Any of the foregoing types of extensions may be used with or without a processing sequence. In some embodiments, the nucleic acid may comprise multiple processing sequences and extension sequences. For example, the nucleic acid can further comprise a second processing sequence 5' to the first extension sequence and a second extension sequence 5' to the second processing sequence. The second processing and extension sequences may be the same as the first processing and extension sequences (e.g., a repeat), or either or both of the second processing sequence and the second extension sequence may be different from the first processing sequence and/or the first extension sequence. The nucleic acid is not particularly limited to any number of processing and extension sequences, and may have 2, 3, 4, 5, etc. processing and/or extension sequences.
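The 5'-to-3' ordering of such multi-unit constructs can be made concrete with a small sketch. Each unit is an (extension, processing) pair added 5' of whatever has already been assembled, so that a second unit yields extension 2, processing 2, extension 1, processing 1, crRNA. The function name and the short labels standing in for sequences are hypothetical; an empty processing string models a unit used without a processing sequence.

```python
def assemble_construct(crrna: str, units: list[tuple[str, str]]) -> str:
    """Assemble a construct written 5' -> 3'.

    units lists (extension, processing) pairs starting with the pair nearest
    the crRNA and moving outward toward the 5' end. Within each pair the
    processing sequence sits 3' of its extension sequence; pass "" for the
    processing sequence to omit it.
    """
    construct = crrna
    for extension, processing in units:
        construct = extension + processing + construct
    return construct


if __name__ == "__main__":
    # Labels stand in for real sequences so the resulting order is easy to read.
    print(assemble_construct("[crRNA]", [("[ext1]", "[proc1]"),
                                         ("[ext2]", "[proc2]")]))
```

The sketch covers only the arrangement in which each processing sequence lies between its extension and the downstream element. The other placements described herein (a processing sequence within or 5' to the extension) would need a different unit layout.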

The 5 'end of the nucleic acid construct (i.e., the processing sequence or extension sequence at the 5' end, where applicable) may be further modified as desired. For example, the 5' end can be modified with a functional group, such as a functional group that participates in bio-orthogonal or "click" chemistry. For example, the 5' end of a nucleic acid can be chemically modified with an azide, tetrazine, alkyne, strained alkene, or strained alkyne. Such modifications may use appropriately paired functional groups to facilitate attachment of the desired chemical moiety or molecule to the construct.

The 5' end of the nucleic acid may optionally be modified to contain a biofunctional molecule via bio-orthogonal or "click" chemistry as described above. A biofunctional molecule may be any molecule that enhances the delivery or activity of an RNA-guided endonuclease, or that provides some other desired function, such as targeting the nucleic acid to a particular destination (e.g., to a particular protein, cell receptor, tissue, etc.), or facilitating tracking of the construct (e.g., a detectable label, such as a fluorescent label, a radioactive label, etc.). Examples of biofunctional molecules include endosomolytic polymers, donor DNA molecules, amino sugars (e.g., N-acetylgalactosamine (GalNAc) or tri-GalNAc), guide and/or tracr RNAs (e.g., a single guide RNA), and other peptides, nucleic acids, and targeting ligands (e.g., antibodies, ligands, cell receptors, aptamers, galactose, sugars, small molecules). In one embodiment, the crRNA comprises a biotin or avidin (or streptavidin) molecule conjugated to the crRNA extension, allowing the modified crRNA to bind another molecule (e.g., a targeting molecule or peptide) conjugated to avidin/streptavidin or biotin, as appropriate (see, e.g., FIG. 14). In another embodiment, the crRNA extension may be covalently linked to an amino sugar in any suitable manner, e.g., via a linker. As used herein, an "amino sugar" is a sugar molecule in which a hydroxyl group has been replaced with an amine group (e.g., galactosamine) and/or in which the nitrogen is part of a more complex functional group (e.g., N-acetylgalactosamine (GalNAc); triantennary N-acetylgalactosamine (tri-GalNAc)). The amino sugar may be modified to contain an optional spacer. Examples of the amino sugar include N-acetylgalactosamine (GalNAc), trivalent GalNAc, and triantennary N-acetylgalactosamine. One example of an amino sugar group includes the following:

Figure GDA0002657894320000121

wherein the linkers may be any commonly known in the art, and each linker may be the same or different from each other. Typically, the linker is a saturated or unsaturated aliphatic or heteroaliphatic chain. The aliphatic or heteroaliphatic chain typically comprises 1-30 members (e.g., 1-30 carbon, nitrogen, and/or oxygen atoms) and may be substituted with one or more functional groups (e.g., one or more ketone, ether, ester, amide, alcohol, amine, urea, thiourea, sulfoxide, sulfone, sulfonamide, and/or disulfide groups). In some cases, shorter aliphatic or heteroaliphatic chains are used (e.g., about 1-15 members, about 1-10 members, about 1-5 members, about 3-15 members, about 3-10 members, about 5-15 members, or about 5-10 members in the chain). In other instances, longer aliphatic or heteroaliphatic chains are used (e.g., about 5-30 members, about 5-25 members, about 5-20 members, about 10-30 members, about 10-25 members, about 10-20 members, about 15-30 members, about 15-25 members, or about 15-20 members in the chain). Examples of spacers include substituted and unsubstituted alkyl, alkenyl, and polyethylene glycol (e.g., PEG 1-10 or PEG 1-5), or combinations thereof. More specific examples provided for illustration are as follows:

Figure GDA0002657894320000122

Prior to conjugation to the linker, the amino sugar may comprise a functional group (e.g., azide, tetrazine, alkyne, strained alkene, or strained alkyne) that allows conjugation to an appropriately paired functional group attached to the crRNA extension (e.g., at the 5' terminus). Thus, for example, prior to conjugation to the extended crRNA, the amino sugar may comprise:

Figure GDA0002657894320000131

wherein A2 comprises an azide, tetrazine, alkyne, strained alkene, or strained alkyne, as described herein. More specific examples are as follows:

wherein A2 comprises an azide, tetrazine, alkyne, strained alkene, or strained alkyne, as described herein, for example:

Figure GDA0002657894320000141

Processing sequence

In some embodiments, the crRNA comprises a processing sequence. The processing sequence is a nucleic acid sequence that is self-cleaved by Cpf1, in vitro or in vivo, without the need for a targeting sequence. Without wishing to be bound by any particular theory or mechanism of action, it is believed that, when present, the processing sequence is cleaved upon entry into the cell, releasing the crRNA from any extension sequence. The processing sequence may be located between the crRNA and the extension sequence. In this configuration, the crRNA is released from the extension sequence of the nucleic acid construct provided herein after cleavage of the processing sequence.

The processing sequence may also be located within the extension sequence, 5' of the extension sequence, or may itself serve as the extension sequence. In addition, multiple processing sequences may be used. For example, a second processing sequence may serve as an extension sequence, alone or together with an additional nucleotide sequence. However, the extension sequence is generally different from the processing sequence (if present). Furthermore, in one embodiment, the extension sequence does not comprise the processing sequence and/or any other complete crRNA sequence.

In some embodiments, the processing sequence is located immediately 5' of the crRNA (i.e., directly attached to the crRNA sequence). In other embodiments, a spacer sequence may be present between the crRNA and the processing sequence. The spacer sequence may be of any length (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 nt) provided that it does not prevent Cpf1 cleavage of the processing sequence or the function of the crRNA released after cleavage.
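The 5'-to-3' layout described above (extension sequence, processing sequence, optional spacer, crRNA) can be illustrated with a minimal sketch. The class and function names are hypothetical, the sequences are arbitrary placeholder strings rather than real Cpf1 sequences, and cleavage is simply modelled at the 3' boundary of the processing sequence, which is an assumption made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Construct:
    """Hypothetical 5'-to-3' layout: extension, processing, spacer, crRNA."""
    extension: str
    processing: str
    crrna: str
    spacer: str = ""

    def sequence(self) -> str:
        # Concatenate the parts in 5'-to-3' order.
        return self.extension + self.processing + self.spacer + self.crrna


def release_crrna(construct: Construct) -> str:
    """Toy model of processing.

    Assumption: the cut is placed at the 3' boundary of the processing
    sequence, so the extension and processing sequence are removed together.
    Without a processing sequence, nothing is removed.
    """
    full = construct.sequence()
    if not construct.processing:
        return full
    cut = len(construct.extension) + len(construct.processing)
    return full[cut:]


# Placeholder strings only; these are not real Cpf1 sequences.
demo = Construct(extension="GGGAAACCC", processing="AUAUAU",
                 crrna="CUCUGAGA", spacer="A")
released = release_crrna(demo)
unprocessed = release_crrna(Construct(extension="GGGAAACCC", processing="",
                                      crrna="CUCUGAGA"))
```

In this toy model any spacer remains attached to the released crRNA, which is why the text above requires that the spacer not interfere with crRNA function.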

In one embodiment, the processing sequence comprises a fragment of a direct repeat of a Cpf1 array. Cpf1 arrays (sometimes also referred to as pre-crRNA) are naturally occurring arrays that contain direct repeats and a spacer sequence between each direct repeat. Each direct repeat of the array includes two portions: a crRNA sequence portion and a processing portion. Within a given direct repeat, the processing portion is located 5' of the crRNA sequence portion, often immediately 5' of the crRNA sequence portion. According to this embodiment, the processing sequence of the nucleic acids provided herein comprises at least a fragment of the processing portion of a direct repeat that is sufficient to effect Cpf1 cleavage. For example, the processing sequence may comprise a fragment of at least 5 contiguous nucleotides of the processing portion of the direct repeat, e.g., at least 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, or 17 nt of the processing portion of the direct repeat (or the entire processing portion of the direct repeat), the length of which depends on the species from which the direct repeat is derived. In some embodiments, the processing sequence comprises the entire processing portion of the direct repeat. The direct repeat may be from a Cpf1 array of any microorganism. Examples of direct repeat sequences, and of the processing portions of those direct repeats, are provided in FIG. 9. The processing sequence of the nucleic acid of the invention may comprise a fragment or the entire sequence of any of the processing sequences in FIG. 9 (e.g., SEQ ID NOS: 2-20).
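The "at least 5 contiguous nucleotides" criterion above can be checked computationally. The following is a minimal sketch only: the function names are hypothetical, the sequences are arbitrary placeholders (not the sequences of FIG. 9), and a shared contiguous run says nothing about whether a fragment is actually sufficient for Cpf1 cleavage.

```python
def longest_shared_run(candidate: str, processing_portion: str) -> int:
    """Length of the longest contiguous stretch present in both sequences."""
    best = 0
    # prev[j] holds the length of the shared run ending at the previous
    # candidate position and at processing_portion[j - 1].
    prev = [0] * (len(processing_portion) + 1)
    for c in candidate:
        curr = [0] * (len(processing_portion) + 1)
        for j, p in enumerate(processing_portion, start=1):
            if c == p:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def has_processing_fragment(candidate: str, processing_portion: str,
                            min_len: int = 5) -> bool:
    """True if candidate contains >= min_len contiguous nt of the portion."""
    return longest_shared_run(candidate, processing_portion) >= min_len


# Placeholder sequences for illustration only.
portion = "AAACCCGGGUUU"
candidate = "UUCCCGGGAA"
run_length = longest_shared_run(candidate, portion)
```

The default of 5 mirrors the lower bound stated in the text; passing `min_len=10` would correspond to the longer fragments also contemplated.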

Donor nucleic acid

The nucleic acid constructs provided herein can further comprise a donor nucleic acid (also referred to as a donor polynucleotide). A donor polynucleotide is a nucleic acid that is inserted at a cleavage site induced by an RNA-guided endonuclease (e.g., Cpf1). The donor polynucleotide can be any type of nucleic acid known in the art. For example, the nucleic acid may be DNA, RNA, a DNA/RNA hybrid, an artificial nucleic acid, or any combination thereof. In one embodiment, the nucleic acid of the donor polynucleotide is DNA, also referred to herein as "donor DNA".

The donor polynucleotide is typically single-stranded and serves as a template for the generation of double-stranded DNA containing the desired sequence. The donor polynucleotide has sufficient sequence identity (e.g., 85%, 90%, 95%, or 100% sequence identity) to the genomic sequence flanking the cleavage site, i.e., to a region of the genomic sequence proximal to the cleavage site (e.g., within about 50 bases or less, within about 30 bases or less, within about 15 bases or less, within about 10 bases or less, within about 5 bases or less, or immediately adjacent to the cleavage site), to support homology-directed repair (HDR) between the donor sequence and the genomic sequence to which it has that identity. The donor polynucleotide sequence can be of any length, but must have a sufficient number of nucleotides with sequence identity on both sides of the cleavage site to facilitate HDR. These regions of the donor polynucleotide are called homology arms. The homology arms can have the same number of bases or different numbers of bases, and each is typically at least 5 nucleotides in length (e.g., 10 nucleotides or more, 15 nucleotides or more, 20 nucleotides or more, 50 nucleotides or more, 100 nucleotides or more, 150 nucleotides or more, or even 200 nucleotides or more). The donor polynucleotide also contains a central region, flanked by the homology arms, that contains the mutation or other DNA sequence of interest. Thus, the overall length of the donor polynucleotide is typically greater than the total length of the two homology arms (e.g., about 15 nucleotides or more, about 20 nucleotides or more, 50 nucleotides or more, 100 nucleotides or more, 150 nucleotides or more, 200 nucleotides or more, 250 nucleotides or more, 500 nucleotides or more, 1000 nucleotides or more, or even 5000 nucleotides or more).
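As an illustration of the homology-arm requirement, the sketch below compares each arm of a donor with the genomic sequence immediately flanking a cut position and reports percent identity. It is a simplified, ungapped comparison; the function names, the 85% default threshold (taken from the lowest example value above), and the toy sequences are assumptions made for illustration.

```python
def percent_identity(a: str, b: str) -> float:
    """Ungapped, position-by-position identity of two equal-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def check_homology_arms(genome: str, cut: int, left_arm: str, right_arm: str,
                        min_identity: float = 85.0):
    """Compare each arm with the genomic flank directly adjacent to the cut.

    `cut` is the index of the first base 3' of the cleavage site.
    Returns (passes, left_identity, right_identity).
    """
    left_flank = genome[max(0, cut - len(left_arm)):cut]
    right_flank = genome[cut:cut + len(right_arm)]
    left = percent_identity(left_arm, left_flank)
    right = percent_identity(right_arm, right_flank)
    return (left >= min_identity and right >= min_identity), left, right


# Toy sequences for illustration only.
genome = "AAAAACCCCC" + "GGGGGTTTTT"
left_arm, insert, right_arm = "AAAAACCCCC", "TAG", "GGGGGTTTTA"
donor = left_arm + insert + right_arm  # arms flank the central region
result = check_homology_arms(genome, 10, left_arm, right_arm)
strict = check_homology_arms(genome, 10, left_arm, right_arm, min_identity=95.0)
```

A real design would normally use a proper alignment rather than a position-by-position comparison; the sketch only makes the arm/central-region layout concrete.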

The donor polynucleotide sequence is typically not identical to the target genomic sequence. Rather, the donor polynucleotide sequence may contain one or more single base alterations, insertions, deletions, inversions, or rearrangements with respect to the genomic sequence, as long as the homology arms have sufficient sequence identity to support HDR. The donor polynucleotide sequence may further comprise a sequence that facilitates detection of successful insertion of the donor polynucleotide.

The ends of the donor polynucleotide can be protected (e.g., from exonucleolytic degradation) by methods known to those skilled in the art. For example, one or more dideoxynucleotide residues can be added to the 3' end of a linear molecule, and/or self-complementary oligonucleotides can be ligated to one or both ends. Additional methods for protecting exogenous polynucleotides from degradation include, but are not limited to, the addition of terminal amino groups and the use of modified internucleotide linkages, such as phosphorothioate, phosphoramidate, and O-methyl ribose or deoxyribose residues.

In some embodiments, the donor polynucleotide (e.g., donor DNA) is covalently linked to the 5' end of the Cpf1 crRNA, the 5' end of the processing sequence, or the 5' end of the extension sequence. In a preferred embodiment, the donor polynucleotide is linked to the 5' end of the extension sequence. In some embodiments, the linkage between the donor DNA and the nucleic acid is reversible (e.g., a disulfide bond).

In some embodiments, the donor DNA is covalently linked to the nucleic acid construct. For example, the donor polynucleotide can be linked to a processing sequence and serve as an extension sequence located 5' to the processing sequence. In another embodiment, the donor polynucleotide may be linked 5' to the extension sequence.

The nucleic acid and donor DNA may be linked or conjugated by any method known in the art. In some embodiments, the 3 'end of the donor DNA and the 5' end of the nucleic acid are modified to facilitate bonding. For example, the 5' end of the nucleic acid may be activated by thiopyridine, while the donor DNA may be thiol-terminated, allowing disulfide bond formation between the two molecules. In some embodiments, bridge DNA complementary to both the nucleic acid and the donor DNA allows the two molecules to hybridize and come into proximity to facilitate the reaction. FIGS. 5A-5C provide non-limiting examples of synthesis of donor DNA conjugated to nucleic acids.

In other embodiments, the nucleic acid and donor DNA may be conjugated via functional groups, such as functional groups that participate in bio-orthogonal or "click" chemical reactions. For example, the 5' terminus of the nucleic acid can be chemically modified with a functional group such as an azide, tetrazine, alkyne, strained alkene, or strained alkyne, and the 3' terminus of the donor DNA can be chemically modified with an appropriately paired functional group. For example, if the nucleic acid contains an azide, the azide will react with an alkyne group of the donor DNA via a copper-catalyzed azide-alkyne cycloaddition reaction, or with a strained alkyne group of the donor DNA via an azide-strained alkyne cycloaddition reaction (without the need for a catalyst). Likewise, if the nucleic acid comprises a tetrazine, it will react with a strained alkene via a tetrazine/alkene cycloaddition reaction. The opposite configuration can also be used: for example, if the nucleic acid comprises an alkyne, a strained alkyne, or a strained alkene, it will react with the azide or tetrazine group of the donor DNA by the same cycloaddition reactions.
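The functional-group pairings named above can be summarized as a small lookup table. This is a bookkeeping sketch only: the table and function names are hypothetical, and the table lists just the three pairings stated in the text, without implying that no other pairing exists.

```python
from typing import Optional

# Pairings stated in the text; a frozenset key makes each pairing symmetric,
# so either group may be on the nucleic acid or on the donor DNA.
CLICK_PAIRS = {
    frozenset({"azide", "alkyne"}):
        "azide-alkyne cycloaddition (copper catalyzed)",
    frozenset({"azide", "strained alkyne"}):
        "azide-strained alkyne cycloaddition (no catalyst needed)",
    frozenset({"tetrazine", "strained alkene"}):
        "tetrazine/alkene cycloaddition",
}


def click_reaction(nucleic_acid_group: str, donor_group: str) -> Optional[str]:
    """Return the reaction for a pair of groups, or None if not listed."""
    return CLICK_PAIRS.get(frozenset({nucleic_acid_group, donor_group}))


forward = click_reaction("azide", "strained alkyne")
reverse = click_reaction("strained alkyne", "azide")
unlisted = click_reaction("tetrazine", "alkyne")
```

Because the keys are unordered, the "opposite configuration" mentioned in the text resolves to the same reaction as the forward configuration.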

In some embodiments, the nucleic acid and donor DNA are conjugated via a linker. For example, the nucleic acid and donor DNA may be conjugated by a self-immolative linker. As used herein, a "self-immolative linker" is a linker that is hydrolyzed under certain conditions (e.g., at a certain pH), allowing the donor DNA to be released from the nucleic acid.

The linker of the nucleic acid-donor DNA conjugate encompasses any linker known in the art that is capable of covalently linking the donor DNA to the nucleic acid. The linker can be attached to the donor DNA and the nucleic acid at either terminus. However, in some embodiments, the linker is attached to the 5' end of the nucleic acid (e.g., the 5' end of the crRNA, processing sequence, or extension sequence) and the 3' end of the donor DNA. The linker may be attached to the nucleic acid and the donor DNA by any method known in the art, such as those described herein with respect to conjugation of the donor DNA to the nucleic acid.

In another embodiment, the donor polynucleotide may hybridize to the extension sequence and/or the processing sequence. Thus, for example, the extension sequence may comprise a sequence sufficiently complementary to the donor polynucleotide to facilitate hybridization.

When the donor nucleic acid is covalently or non-covalently linked to the extension sequence, it is sometimes desirable for the donor nucleic acid to be linked to an extension sequence, or a portion thereof, that is not cleaved from the Cpf1 crRNA, such that the donor nucleic acid remains attached to the crRNA when the target gene is edited by Cpf1. It is believed that in some cases improved gene editing may be achieved with such constructs. As described above, an extension sequence that is not cleaved by Cpf1 includes, for example, an extension sequence comprising one or more modified internucleotide linkages or synthetic nucleotides.

Composition and carrier

The invention also includes compositions comprising any of the nucleic acid molecules described herein and a carrier. Any suitable carrier for nucleic acid delivery may be used. In some embodiments, the carrier may comprise a molecule capable of interacting with any of the nucleic acids described herein and facilitating entry of the nucleic acid into the cell.

In some embodiments, the carrier comprises a cationic lipid. Cationic lipids are amphiphilic molecules with a positively charged polar head group linked via an anchor to a non-polar hydrophobic domain typically comprising two alkyl chains. In some embodiments, the cationic lipid forms a liposome (e.g., a lipid vesicle) surrounding the nucleic acid construct and optionally the Cpf1 protein. Thus, in a related aspect, there is provided a liposome comprising a nucleic acid construct and optionally a Cpf1 protein.

In yet another embodiment, the carrier comprises a cationic polymer. Examples of cationic polymers of the compositions of the invention include Polyethyleneimine (PEI), poly (arginine), poly (lysine), poly (histidine), poly- [2- { (2-aminoethyl) amino } -ethyl-asparagine ] (pAsp (DET)), block copolymers of polyethylene glycol (PEG) and polyarginine, block copolymers of PEG and polylysine, block copolymers of PEG and poly { N- [ N- (2-aminoethyl) -2-aminoethyl ] asparagine } (PEG-pAsp [ DET ], ({2, 2-bis [ (9Z,12Z) -octadeca-9, 12-dien-1-yl ] -1, 3-dioxan-5-yl } methyl) dimethylamine, (3aR,5s,6aS) -N, N-dimethyl-2, 2-bis ((9Z,12Z) -octadec-9, 12-dien-1-yl) tetrahydro-3 aH-cyclopenta [ d ] [1,3] dioxol-5-amine, (3aR,5R,6aS) -N, N-dimethyl-2, 2-bis ((9Z,12Z) -octadec-9, 12-dien-1-yl) tetrahydro-3 aH-cyclopenta [ d ] [1,3] dioxol-5-amine, (3aR,5R,7aS) -N, N-dimethyl-2, 2-bis ((9Z,12Z) -octadec-9, 12-dien-1-yl) hexahydrobenzo [ d ] [1 ], 3] dioxol-5-amine, (3aS,5R,7aR) -N, N-dimethyl-2, 2-bis ((9Z,12Z) -octadeca-9, 12-dien-1-yl) hexahydrobenzo [ d ] [1,3] dioxol-5-amine, (2- {2, 2-bis [ (9Z,12Z) -octadeca-9, 12-dien-1-yl ] -1, 3-dioxan-4-yl } ethyl) dimethylamine, (3aR,6aS) -5-methyl-2- ((6Z,9Z) -octadeca-6, 9-dien-1-yl) -2- ((9Z,12Z) -octadeca-9, 12-dien-1-yl) tetrahydro-3 aH- [1,3] dioxolo [4,5-c ] pyrrole, (3aS,7aR) -5-methyl-2, 2-bis ((9Z,12Z) -octadeca-9, 12-dien-1-yl) hexahydro- [1,3] dioxolo [4,5-c ] pyridine, (3aR,8aS) -6-methyl-2, 2-bis ((9Z,12Z) -octadeca-9, 12-dien-1-yl) hexahydro-3 aH- [1,3] dioxolo [4,5-d ] azepine, (6Z,9Z,28Z,31Z) -thirty-seven-carbon-6, 9,28, 31-tetraen-19-yl 2- (dimethylamino) acetate, (6Z,9Z,28Z,31Z) -thirty-seven-carbon-6, 9,28, 31-tetraen-19-yl 3- (dimethylamino) propionate, [6Z,9Z,28Z,31Z) -thirty-seven-carbon-6, 9,28, 31-tetraen-19-yl-4- (dimethylamino) butyrate ], (6Z,9Z,28Z,31Z) -thirty-seven-carbon-6, 9,28, 31-tetraen-19-yl 5- (dimethylamino) valerate, (6Z,9Z,28Z,31Z) -thirty-seven-carbon-6, 9,28, 31-tetraen-19-yl 6- 
(dimethylamino) hexanoate, and mixtures thereof, (3- {2, 2-bis [ (9Z,12Z) -octadeca-9, 12-dien-1-yl ] -1, 3-dioxan-4-yl } propyl) dimethylamine, 1- ((3aR,5r,6aS) -2, 2-bis ((9Z,12Z) -octadeca-9, 12-dien-1-yl) tetrahydro-3 aH cyclopenta [ d ] [1,3] dioxol-5-yl) -N, N-dimethylmethylamine, 1- ((3aR,5s,6aS) -2, 2-bis ((9Z,12Z) -octadeca-9, 12-dien-1-yl) tetrahydro-3 aH cyclopenta [ d ] [1,3] dioxol-5-yl) -N, N-dimethylmethylamine, 8-methyl-2, 2-bis ((9Z,12Z) -octadeca-9, 12-dien-1-yl) -1, 3-dioxa-8-azaspiro [4.5] decane, 2- (2, 2-bis ((9Z,12Z) -octadeca-9, 12-dien-1-yl) -1, 3-dioxolan-4-yl) -N-methyl-N- (pyridin-3-ylmethyl) ethylamine, 1, 3-bis (9Z,12Z) -octadeca-9, 12-dien-1-yl 2- [2- (dimethylamino) ethyl ] malonate, N, N-dimethyl-1- ((3aR,5R,7aS) -2- ((8Z,11Z) -octadeca-8, 11-dien-1-yl) -2- ((9Z,12Z) -octadeca-9, 12-dien-1-yl) hexahydrobenzo [ d ] [1,3] dioxol-5-yl) methylamine, N-dimethyl-1- ((3aR,5S,7aS) -2- ((8Z,11Z) -octadeca-8, 11-dien-1-yl) -2- ((9Z,12Z) -octadeca-9, 12-dien-1-yl) hexahydrobenzo [ d ] [1,3] dioxol-5-yl-methylamine, (1S,3R,4S) -N, N-dimethyl-3, 4-bis ((9Z,12Z) -octadeca-9, 12-dien-1-yloxy) cyclopentylamine, 2- (4, 5-bis ((8Z,11Z) -heptadeca-8, 11-dien-1-yl) -2-methyl-1, 3-dioxolan-2-yl) -N, N-dimethylethylamine, 2, 3-bis ((8Z,11Z) -heptadeca-8, 11-dien-1-yl) -N, N-dimethyl-1, 4-dioxaspiro [4.5] decan-8-amine, (6Z,9Z,28Z,31Z) -trioctadeca-6, 9,28, 31-tetraen-19-yl 4- (diethylamino) butyrate, (6Z,9Z,28Z,31Z) -trioctadeca-6, 9,28, 31-tetraen-19-yl 4- [ bis (prop-2-yl) amino ] butyrate, N- (4-N, N-dimethylamino) butyryl- (6Z,9Z,28Z,31Z) -trioctadeca-6, 9,28, 31-tetraen-19-amine, (2- {2, 2-bis [ (9Z,12Z) -octadec-9, 12-dien-1-yl ] -1, 3-dioxan-5-yl } ethyl) dimethylamine, (4- {2, 2-bis [ (9Z,12Z) -octadec-9, 12-dien-1-yl ] -1, 3-dioxan-5-yl } butyl) dimethylamine, (6Z,9Z,28Z,31Z) -thirty-seven-carbon-6, 9,28, 31-tetraen-19-yl (2- (dimethylamino) ethyl) carbamate, 2- (dimethylamino) ethyl (6Z,9Z,28Z,31Z) -thirty-seven-carbon-6, 9,28, 31-tetraen-19-yl 
carbamate, (6Z,9Z,28Z,31Z) -thirty-seven-carbon-6, 9,28, 31-tetraen-19-yl 3- (ethylamino) propionate, (6Z,9Z,28Z,31Z) -thirty-seven-carbon-6, 9,28, 31-tetraen-19-yl 4- (prop-2-ylamino) butyrate, N1, N1, N2-trimethyl-N2- ((11Z,14Z) -2- ((9Z,12Z) -octadeca-9, 12-dien-1-yl) eicosa-11, 14-dien-1-yl) ethane-1, 2-diamine, 3- (dimethylamino) -N- ((11Z,14Z) -2- ((9Z,12Z) -octadeca-9, 12-dien-1-yl) eicosa-11, 14-dien-1-yl) propionamide, (6Z,9Z,28Z,31Z) -thirty-seven-carbon-6, 9,28, 31-tetraen-19-yl 4- (methylamino) butyrate), dimethyl ({4- [ (9Z,12Z) -octadeca-9, 12-dien-1-yloxy ] -3- { [ (9Z,12Z) -octadeca-9, 12-dien-1-yloxy ] methyl } butyl } amine, 2, 3-bis ((8Z,11Z) -heptadec-8, 11-dien-1-yl) -8-methyl-1, 4-dioxa-8-azaspiro [4.5] decane, 3- (dimethylamino) propyl (6Z,9Z,28Z,31Z) -thirty-seven-carbon-6, 9,28, 31-tetraen-19-yl carbamate, 2- (dimethylamino) ethyl ((11Z,14Z) -2- ((9Z,12Z) -octadeca-9, 12-dien-1-yl) eicosa-11, 14-dien-1-yl) carbamate, 1- ((3aR,4R,6aR) -6-methoxy-2, 2-bis ((9Z,12Z) -octadeca-9, 12-dien-1-yl) tetrahydrofurane [3,4-d ] [1,3] dioxol-4-yl) -N, N-dimethylmethylamine, (6Z,9Z,28Z,31Z) -triheptadec-6, 9,28, 31-tetraen-19-yl 4- [ ethyl (methyl) amino ] butyrate, (6Z,9Z,28Z,31Z) -triheptadec-6, 9,28, 31-tetraen-19-yl 4-aminobutyrate, 3- (dimethylamino) propyl ((11Z,14Z) -2- ((9Z,12Z) -octadeca-9, 12-dien-1-yl) eicosa-11, 14-dien-1-yl) carbamate, 1- ((3aR,4R,6aS) -2, 2-bis ((9Z,12Z) -octadec-9, 12-dien-1-yl) tetrahydrofuro [3,4-d ] [1,3] dioxol-4-yl) -N, N-dimethylmethylamine, (3aR,5R,7aR) -N, N-dimethyl-2, 2-bis ((9Z,12Z) -octadec-9, 12-dien-1-yl) hexahydrobenzo [ d ] [1,3] dioxol-5-amine, (11Z,14Z) -N, N-dimethyl-2- ((9Z,12Z) -octadec-9, 12-dien-1-yl) eicosa-11, 14-dien-1-amine, (3aS,4S,5R,7R,7aR) -N, N-dimethyl-2- ((7Z,10Z) -octadeca-7, 10-dien-1-yl) -2- ((9Z,12Z) -octadeca-9, 12-dien-1-yl) hexahydro-4, 7-methylenebenzo [ d ] [1,3] dioxol-5-amine, N-dimethyl-3, 4-bis ((9Z,12Z) -octadeca-9, 12-dien-1-yloxy) butan-1-amine, and 3- (4, 5-bis ((8Z,11Z) 
-heptadeca-8, 11-dien-1-yl) -1, 3-dioxolan-2-yl) -N, N-dimethylpropan-1-amine. Any combination of the foregoing polymers may also be used.

In other embodiments, the carrier comprises polymeric nanoparticles. For example, the compositions of the present invention may be applied as nanoparticles as described in International patent application No. PCT/US2016/052690, the entire disclosure of which is expressly incorporated by reference.

Cpf1 polypeptides or nucleic acids encoding same

In some embodiments, including the liposomal embodiments described above, the composition further comprises a Cpf1 polypeptide or a nucleic acid encoding the same. Any Cpf1 polypeptide may be used in the compositions of the invention, although the Cpf1 should be selected to act in combination with the crRNA of the nucleic acid construct in the composition to cleave the target nucleic acid and/or cleave the processing sequence of the nucleic acid construct, as applicable. The Cpf1 of the composition may be a naturally occurring Cpf1 or a variant or mutant Cpf1 polypeptide. In some embodiments, the Cpf1 polypeptide is enzymatically active, e.g., when bound to a guide RNA, the Cpf1 polypeptide cleaves the target nucleic acid. In some embodiments, the Cpf1 polypeptide exhibits reduced enzymatic activity relative to a wild-type Cpf1 polypeptide (e.g., relative to a Cpf1 polypeptide comprising the amino acid sequence set forth in FIG. 8 (SEQ ID NO: 1)), and retains DNA binding activity. Mutations that alter the enzymatic activity of Cpf1 are known in the art.

For example, Cpf1 may be from a bacterium of the genus Acidaminococcus or from a Lachnospiraceae bacterium, or from any of the genera or species identified in FIG. 9. An example of a Cpf1 protein sequence is provided in FIG. 8. In some embodiments, a Cpf1 polypeptide comprises an amino acid sequence that has at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% amino acid sequence identity to the amino acid sequence set forth in FIG. 8. In some embodiments, a Cpf1 polypeptide comprises an amino acid sequence that has at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% amino acid sequence identity to a contiguous segment of 100 amino acids (aa) to 200 aa, 200 aa to 400 aa, 400 aa to 600 aa, 600 aa to 800 aa, 800 aa to 1000 aa, 1000 aa to 1100 aa, 1100 aa to 1200 aa, or 1200 aa to 1300 aa of the amino acid sequence set forth in FIG. 8.

In some embodiments, a Cpf1 polypeptide comprises an amino acid sequence that has at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% amino acid sequence identity to the RuvCI domain of a Cpf1 polypeptide of the amino acid sequence set forth in FIG. 8. In some embodiments, a Cpf1 polypeptide comprises an amino acid sequence that has at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% amino acid sequence identity to the RuvCII domain of a Cpf1 polypeptide of the amino acid sequence set forth in FIG. 8. In some embodiments, a Cpf1 polypeptide comprises an amino acid sequence that has at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% amino acid sequence identity to the RuvCIII domain of a Cpf1 polypeptide of the amino acid sequence set forth in FIG. 8.

In some embodiments, the Cpf1 polypeptide is FnCpf1, Lb3Cpf1, BpCpf1, PeCpf1, SsCpf1, AsCpf1, Lb2Cpf1, CMtCpf1, EeCpf1, MbCpf1, LiCpf1, LbCpf1, PcCpf1, PdCpf1, or PmCpf1; or a Cpf1 polypeptide comprising an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% amino acid sequence identity thereto.

In some embodiments, the Cpf1 polypeptide comprises an amino acid substitution at the amino acid residue corresponding to position 917 of the amino acid sequence set forth in FIG. 8 (e.g., a D → A substitution); and/or an amino acid substitution at the amino acid residue corresponding to position 1006 of the amino acid sequence set forth in FIG. 8 (e.g., an E → A substitution); and/or an amino acid substitution at the amino acid residue corresponding to position 1255 of the amino acid sequence set forth in FIG. 8 (e.g., a D → A substitution).
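Substitutions of this kind are conventionally written as wild-type residue, 1-based position, new residue (e.g., D917A). The sketch below applies such a substitution to a protein sequence after verifying the wild-type residue. The function name is hypothetical, and because the FIG. 8 sequence is not reproduced here, the demonstration uses a five-residue toy sequence with toy positions rather than positions 917, 1006, or 1255.

```python
import re


def apply_substitution(sequence: str, mutation: str) -> str:
    """Apply a substitution such as 'D3A' (wild type, 1-based position, new)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation format: {mutation!r}")
    wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[position - 1] != wild_type:
        # Guards against numbering a mutation on the wrong ortholog.
        raise ValueError(
            f"expected {wild_type} at position {position}, "
            f"found {sequence[position - 1]}"
        )
    return sequence[:position - 1] + new + sequence[position:]


# Toy sequence and toy positions, for illustration only.
toy = "MKDLE"
single = apply_substitution(toy, "D3A")
double = apply_substitution(single, "E5A")
```

The wild-type check matters in practice because "corresponding" positions differ between Cpf1 orthologs, so a position number is only meaningful relative to a stated reference sequence.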

The Cpf1 polypeptide may also be an RNase-inactivated Cpf1, such as a Cpf1 comprising a modification at H800A, K809A, K860A, F864A, or R790A of Acidaminococcus Cpf1 (AsCpf1), or at a corresponding position of a different Cpf1 ortholog. Examples of mutated Cpf1 proteins include those disclosed in Zetsche et al., "Multiplex Gene Editing by CRISPR-Cpf1 Through Autonomous Processing of a Single crRNA Array," Nat. Biotechnol. 2017, 35(1), 31-34. The Cpf1 polypeptide may also be a dCpf1 base editor (e.g., a Cpf1-cytosine deaminase fusion protein). Examples include proteins such as those disclosed in Li et al., Nature Biotechnology, 36, 324-327 (2018), and Mahfuz et al., Biochem J., 475(11), 1955-1964 (2018). An example of a synthetic variant Cpf1 is the MAD7 Cpf1 ortholog from Inscripta, Inc. (CO, USA). Additional examples of Cpf1 proteins include any of the Cpf1 proteins disclosed in International Patent Application No. PCT/US2016/052690, including chimeric or mutant proteins, the entire disclosure of which is expressly incorporated herein by reference.

Other nucleic acids

The composition may further comprise other nucleic acids in addition to the crRNA. For example, the composition can comprise a donor polynucleotide, as described herein. Alternatively or additionally, the composition may comprise one or more additional nucleic acids that are not donor polynucleotides (e.g., nucleic acids that do not have significant sequence identity to the target sequence to be edited or to any endogenous nucleic acid sequence of the cell being edited, such as a level of sequence identity that is insufficient to allow homologous recombination). These additional nucleic acids may be RNA or DNA, such as single-stranded RNA or DNA molecules (or hybrid molecules comprising both RNA and DNA, optionally with synthetic nucleic acid residues). The additional nucleic acid may be of any suitable length (e.g., 1000 nucleotides or more, or even 5000 nucleotides or more). However, in most cases, the nucleic acid comprises about 5000 nucleotides or less, such as about 1000 nucleotides or less, or even 500 nucleotides or less (e.g., 200 nucleotides or less).

The composition may further comprise a nucleic acid encoding a particular protein of interest, e.g., an RNA-guided endonuclease (e.g., a Cpf1 polypeptide). The RNA-guided endonuclease may be any as described herein with respect to other aspects of the invention.

Divalent metal ion

In some embodiments, the composition is substantially or completely free of divalent metal ions (e.g., magnesium) that activate the particular Cpf1 protein used, in order to reduce or prevent premature cleavage of the processing sequence prior to delivery. The composition is considered substantially free of magnesium when any magnesium present is at a concentration that does not allow Cpf1 self-processing enzymatic activity. In some embodiments, the composition comprises about 20 mM or less NaCl, and is substantially or completely free of magnesium or other divalent ions that activate the Cpf1 protein.

Method for genetically modifying eukaryotic cells

The present invention also provides a method of genetically modifying a eukaryotic target cell, comprising contacting the eukaryotic target cell with any of the nucleic acids or compositions described herein (e.g., a nucleic acid comprising a Cpf1 crRNA, an extension sequence 5' of the crRNA, and optionally a processing sequence between the crRNA and the extension sequence) to genetically modify a target nucleic acid. In some embodiments, the Cpf1 crRNA of the nucleic acid comprises a targeting sequence (e.g., 3' of the stem-loop domain) that hybridizes to a target sequence in the target cell. In some embodiments, the Cpf1 crRNA comprises a processing sequence that is cleaved upon entry into the cell, thereby releasing the Cpf1 crRNA from the processing sequence and the extension sequence. In other embodiments, the Cpf1 crRNA does not comprise a processing sequence.

The target nucleic acid is a polynucleotide (e.g., RNA, DNA) to which the targeting sequence of the crRNA binds, inducing cleavage by Cpf1. The target nucleic acid comprises a "target site" or "target sequence", which is the sequence present in the target nucleic acid to which the crRNA hybridizes, which in turn directs the endonuclease to the target nucleic acid.

The "eukaryotic target cell" can be any eukaryotic cell known in the art, and includes both in vivo and in vitro cells. In one embodiment, the target cell is a mammalian cell.

Any route of administration can be used to deliver the composition to the mammal. Indeed, while more than one route may be used to administer the composition, a particular route may provide a more immediate and more effective response than another route. When administered to a cell in vitro or ex vivo, the nucleic acid or composition may be contacted with the cell by any suitable method. For example, the nucleic acid can be introduced via liposomes, by encapsulation in cationic polymers, and/or by electroporation. When administered to a subject, such as a mammal or human, the composition can be administered by any of a variety of routes. For example, a dose of the composition can be applied to or instilled into a body cavity, absorbed through the skin (e.g., via a transdermal patch), inhaled, ingested, topically applied to a tissue, or parenterally administered via, for example, intravenous, intraperitoneal, intraoral, intradermal, subcutaneous, or intraarterial administration.

The composition may be administered in or on a device that allows for controlled or sustained release, such as a sponge, biocompatible mesh, mechanical reservoir, or mechanical implant. Implants (see, e.g., U.S. Patent No. 5,443,505) and devices (see, e.g., U.S. Patent No. 4,863,457), such as implantable mechanical reservoirs or implants or devices composed of polymeric compositions, are particularly useful for administration of the compositions. The compositions may also be administered in the form of sustained release formulations (see, e.g., U.S. Patent No. 5,378,475) comprising, for example, gel foams, hyaluronic acid, gelatin, chondroitin sulfate, polyphosphates such as bis-2-hydroxyethyl-terephthalate (BHET), and/or polyglycolic acid.
