Fusion proteins for base editing

Document No.: 1078009 Publication date: 2020-10-16

Abstract: This technology, "Fusion proteins for base editing", was created by 陈佳, 杨力, 黄行许, 杨贝, 王潇 and 李佳楠 on 2019-02-22. The present invention provides a fusion protein comprising an apolipoprotein B mRNA editing enzyme catalytic subunit 3A (APOBEC3A) and a Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-associated (Cas) protein, optionally further fused to a Uracil Glycosylase Inhibitor (UGI). Such fusion proteins are able to perform base editing in DNA by deaminating cytosine to uracil, even when the cytosine is in a GpC context or is methylated.

1. A fusion protein comprising a first fragment comprising an apolipoprotein B mRNA editing enzyme catalytic subunit 3A (APOBEC3A) and a second fragment comprising a Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-associated (Cas) protein.

2. The fusion protein of claim 1, further comprising a Uracil Glycosylase Inhibitor (UGI).

3. The fusion protein of claim 1, comprising a total of less than 2500 amino acid residues.

4. The fusion protein of claim 1, wherein the APOBEC3A is wild-type human APOBEC3A or a mutant of human APOBEC3A having a mutation selected from the group consisting of Y130F, D131Y, D131E, Y132D, W104A, W98Y, P134Y, and combinations thereof, according to the residue numbering of SEQ ID NO: 1, wherein the mutant retains cytidine deaminase activity.

5. The fusion protein of claim 4, wherein the APOBEC3A is a mutant of human APOBEC3A having a mutation selected from the group consisting of Y130F + D131E + Y132D, Y130F + D131Y + Y132D, W98Y + W104A, W98Y + P134Y, W104A + P134Y, W104A + Y130F, W104A + Y132D, W98Y + W104A + Y130F, W98Y + W104A + Y132D, W104A + Y130F + P134Y and W104A + Y132D + P134Y according to the residue numbering in SEQ ID NO 1.

6. The fusion protein of claim 4, wherein the human APOBEC3A is human APOBEC3A isoform a or isoform b.

7. The fusion protein of claim 4, wherein the APOBEC3A comprises the amino acid sequence shown in SEQ ID NO. 1 or 6, or has at least 90% sequence identity to amino acid residues 29-199 of SEQ ID NO. 1 and retains cytidine deaminase activity.

8. The fusion protein of claim 7, wherein the APOBEC3A comprises an amino acid sequence selected from the group consisting of SEQ ID NOs 1-10 and 22-36.

9. The fusion protein of claim 1, wherein the Cas protein is selected from the group consisting of SpCas9, FnCas9, St1Cas9, St3Cas9, NmCas9, SaCas9, AsCpf1, LbCpf1, FnCpf1, VQR SpCas9, EQR SpCas9, VRER SpCas9, RHA FnCas9, KKH SaCas9, NmeCas9, StCas9, CjCas9, AsCpf1, FnCpf1, SsCpf1, PcCpf1, BpCpf1, CmtCpf1, LiCpf1, PmCpf1, Pb3310Cpf1, Pb4417Cpf1, BsCpf1, EeCpf1, BhCas12b, AkCas12b, EbCas12b, LsCas12b, RfCas13d, LwaCas13a, PspCas13b, PguCas13b, RanCas13b, CasX, and CasY.

10. The fusion protein of claim 1, wherein the Cas protein is a mutant of a protein selected from the group consisting of SpCas9, FnCas9, St1Cas9, St3Cas9, NmCas9, SaCas9, AsCpf1, LbCpf1, FnCpf1, VQR SpCas9, EQR SpCas9, VRER SpCas9, RHA FnCas9, KKH SaCas9, NmeCas9, StCas9, CjCas9, AsCpf1, FnCpf1, SsCpf1, PcCpf1, BpCpf1, CmtCpf1, LiCpf1, PmCpf1, Pb3310Cpf1, Pb4417Cpf1, BsCpf1, EeCpf1, BhCas12b, AkCas12b, EbCas12b, LsCas12b, RfCas13d, LwaCas13a, PspCas13b, PguCas13b, RanCas13b, CasX, and CasY, wherein the mutant retains DNA binding ability but does not introduce a double-stranded DNA break.

11. The fusion protein of claim 10, wherein the mutant is capable of introducing a nick into one strand of double-stranded DNA bound by the mutant.

12. The fusion protein of claim 10, wherein the Cas protein comprises an amino acid sequence selected from the group consisting of the amino acid sequences set forth as SEQ ID NOs: 11 and 37-39.

13. The fusion protein of claim 1, wherein the first fragment is N-terminal to the second fragment.

14. The fusion protein of claim 2, wherein the UGI comprises the amino acid sequence set forth in SEQ ID No. 12 or has at least 90% sequence identity to SEQ ID No. 12 and retains uracil glycosylase inhibiting activity.

15. The fusion protein of claim 14, wherein the first fragment is N-terminal to the second fragment, and the second fragment is N-terminal to the UGI.

16. The fusion protein of claim 1, further comprising a peptide linker between the first fragment and the second fragment.

17. The fusion protein of claim 16, wherein the peptide linker has 1 to 100 amino acid residues.

18. The fusion protein of claim 17, wherein at least 40% of the amino acid residues of the peptide linker are selected from the group consisting of alanine, glycine, cysteine, and serine.

19. The fusion protein of claim 17, wherein the peptide linker has an amino acid sequence as set forth in SEQ ID No. 13 or 14.

20. The fusion protein of claim 1, wherein the fusion protein further comprises a nuclear localization sequence.

21. The fusion protein of claim 1, wherein the fusion protein comprises an amino acid sequence selected from the group consisting of SEQ ID NOs 16-20 and 40-50.

22. A fusion protein comprising a first fragment comprising an apolipoprotein B mRNA editing enzyme catalytic subunit 3A (APOBEC3A) and a second fragment comprising a CRISPR-associated endonuclease from Prevotella and Francisella 1 (Cpf1).

23. The fusion protein of claim 22, wherein the Cpf1 is catalytically inactive.

24. The fusion protein of claim 22 or 23, wherein the Cpf1 is selected from the group consisting of AsCpf1, LbCpf1, and FnCpf1.

25. The fusion protein of claim 22, wherein the Cpf1 is a catalytically inactive Lachnospiraceae bacterium Cpf1 (dLbCpf1).

26. The fusion protein of claim 22, wherein the APOBEC3A is a wild-type human APOBEC3A or a mutant of human APOBEC3A having a mutation selected from the group consisting of Y130F, D131Y, D131E, Y132D, W98Y, W104A, P134Y, and combinations thereof, according to the numbering of residues in SEQ ID No. 1, wherein the mutant retains cytidine deaminase activity.

27. A polynucleotide encoding the fusion protein of any one of claims 1-26.

28. A composition comprising the fusion protein of any one of claims 1-26 and a pharmaceutically acceptable carrier.

29. The composition of claim 28, wherein said composition further comprises a guide RNA.

30. A method of editing a target polynucleotide, the method comprising contacting the target polynucleotide with the fusion protein of any one of claims 1-26 and a guide RNA having at least partial sequence complementarity to the target polynucleotide, wherein the editing comprises deaminating a cytosine (C) in the target polynucleotide.

31. The method of claim 30, wherein the C is in GpC.

32. The method of claim 30, wherein C is methylated.

33. The method of any one of claims 30-32, wherein the contacting is performed in vivo.

34. The method of any one of claims 30-32, further comprising contacting the target polynucleotide with Uracil Glycosylase Inhibitor (UGI) that is not fused to the Cas protein.

Technical Field

The invention belongs to the field of gene editing, and particularly relates to a fusion protein for base editing.

Background

Genome editing is a type of genetic engineering that uses engineered nucleases (molecular scissors) to insert, delete or replace DNA in the genome of an organism. Genetic manipulation of the genomes of cells and organisms with genome editing tools has a wide range of applications in life science research, biotechnology, agricultural technology development and, most importantly, pharmaceutical and clinical innovation. For example, genome editing can be used to correct the driver mutations behind genetic diseases, thereby completely curing these diseases in the organism; genome editing can also be used to modify the genome of a crop, thereby increasing crop yield and conferring resistance to environmental contamination or pathogen infection; likewise, engineering microbial genomes by precise genome editing is of great significance for the development of renewable biological energy sources.

The CRISPR/Cas (clustered regularly interspaced short palindromic repeats/CRISPR-associated proteins) system is the most powerful genome editing tool owing to its unrivaled editing efficiency, convenience and potential for application in living organisms. Under the guidance of a guide RNA (gRNA), Cas nucleases can cause DNA double-strand breaks (DSBs) at target genomic sites in various cells (both cell lines and cells in organisms). These DSBs are then repaired by endogenous DNA repair systems, which can be exploited to achieve the desired genome editing.

Generally, DSBs can activate two major DNA repair pathways, non-homologous end joining (NHEJ) and homology-directed repair (HDR). NHEJ can induce random insertions or deletions in the genomic DNA region surrounding the DSB, resulting in a shift of the open reading frame (ORF) and ultimately in gene inactivation. In contrast, when HDR is triggered, the sequence of an exogenous donor DNA template replaces the genomic DNA sequence at the target site by a homologous recombination mechanism, resulting in correction of the gene mutation.

However, the practical efficiency of HDR-mediated gene correction is low (typically < 5%), because homologous recombination is both cell-type-specific and cell-cycle-dependent, and NHEJ is triggered more frequently than HDR. Thus, the relatively low efficiency of HDR limits the translation of CRISPR/Cas genome editing tools into precise gene therapy (correction of disease-driving genes).

The recently developed base editor (BE), which combines CRISPR/Cas with a cytosine deaminase of the APOBEC (apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like) family, has greatly improved the efficiency of CRISPR/Cas9-mediated gene correction. Through fusion to Cas9 nickase (nCas9), the cytosine (C) deamination activity of rat APOBEC1 (rA1) can be directed to target bases in the genome, where it catalyzes the conversion of C to T.

However, current rA1-based BEs do not efficiently edit Cs that immediately follow a G (i.e., Cs in a GpC context), which limits the range of genomic sites that can be targeted. It is therefore highly desirable to create a new BE that can efficiently edit C in GpC. Such new BEs would enable efficient base editing across a broader portion of the genomes of various organisms. Importantly, BEs that act efficiently on C in GpC would facilitate clinical translation, especially in gene therapies that repair disease-related GpT-to-GpC mutations.

Brief description of the invention

The present disclosure shows that when apolipoprotein B mRNA editing enzyme catalytic subunit 3A (APOBEC3A or A3A) is fused to a Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-associated (Cas) protein, optionally further fused to a Uracil Glycosylase Inhibitor (UGI), the resulting fusion protein is capable of efficiently deaminating cytosine to uracil, resulting in C being substituted by T. Surprisingly and unexpectedly, this base editing is effective even when the C follows a G (i.e., in a GpC dinucleotide context) or when the C is methylated. The editing efficiency can be further improved when A3A carries certain of the tested mutations. Since cytosine methylation is common in living cells, this editing has important clinical implications.

In conventional base editors, Cas9 is the commonly used DNA endonuclease. When Cas12a (Cpf1) is used with APOBEC1 in a base editor, it offers the advantage of recognizing A/T-rich sequences. Another surprising finding was that editing efficiency was greatly improved when A3A was substituted for APOBEC1. Moreover, the editing efficiency of Cas12a-A3A can be further improved when A3A carries certain of the tested mutations.

Thus, in one embodiment, the present disclosure provides a fusion protein comprising a first fragment comprising the apolipoprotein B mRNA editing enzyme catalytic subunit 3A (APOBEC3A) and a second fragment comprising a Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-associated (Cas) protein. In some embodiments, the fusion protein further comprises a Uracil Glycosylase Inhibitor (UGI).

Preferably, the fusion protein comprises a total of less than 3000, 2500, 2200, 2100, 2000, 1900, 1800, 1700, 1600 or 1500 amino acid residues.

In some embodiments, APOBEC3A is wild-type human APOBEC3A or a mutant of human APOBEC3A having a mutation selected from the group consisting of Y130F, D131Y, D131E, Y132D, W104A, W98Y, P134Y, and combinations thereof, according to the residue numbering of SEQ ID NO: 1, wherein the mutant retains cytidine deaminase activity.

In some embodiments, the APOBEC3A is a human APOBEC3A mutant comprising a mutation selected from the group consisting of Y130F + D131E + Y132D, Y130F + D131Y + Y132D, W98Y + W104A, W98Y + P134Y, W104A + P134Y, W104A + Y130F, W104A + Y132D, W98Y + W104A + Y130F, W98Y + W104A + Y132D, W104A + Y130F + P134Y and W104A + Y132D + P134Y, according to the numbering of residues in SEQ ID No. 1.
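The mutation notation used above (for example, W98Y + W104A, meaning tryptophan 98 replaced by tyrosine and tryptophan 104 replaced by alanine, with 1-based residue numbering) can be illustrated with a minimal Python sketch. The function names and the 10-residue toy sequence are hypothetical and chosen only for illustration; the toy sequence is not SEQ ID NO: 1, which is not reproduced in this section.

```python
import re

# One substitution: wild-type letter, 1-based position, mutant letter (e.g. "Y130F").
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutations(spec):
    """Parse a combination such as 'W98Y + W104A' into (wild_type, position, mutant) tuples."""
    mutations = []
    for token in spec.split("+"):
        match = MUTATION_RE.match(token.strip())
        if match is None:
            raise ValueError(f"unrecognized mutation: {token!r}")
        wild_type, position, mutant = match.groups()
        mutations.append((wild_type, int(position), mutant))
    return mutations


def apply_mutations(sequence, spec):
    """Return the sequence with each substitution applied, checking the wild-type residue first."""
    residues = list(sequence)
    for wild_type, position, mutant in parse_mutations(spec):
        index = position - 1  # residue numbering is 1-based
        if index >= len(residues) or residues[index] != wild_type:
            raise ValueError(f"expected {wild_type} at position {position}")
        residues[index] = mutant
    return "".join(residues)


# Hypothetical toy sequence (NOT SEQ ID NO: 1): W is residue 6, Y is residue 7.
toy_sequence = "MEASPWYDGK"
print(apply_mutations(toy_sequence, "W6A + Y7F"))  # MEASPAFDGK
```

Checking the wild-type residue before substituting guards against applying a mutation list to a sequence with a different numbering, which matters here because the mutations are defined relative to one specific reference sequence.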

In some embodiments, APOBEC3A comprises the amino acid sequence shown as SEQ ID No. 1 or has at least 90% sequence identity to amino acid residues 29-199 of SEQ ID No. 1 and retains cytidine deaminase activity. In some embodiments, APOBEC3A contains an amino acid sequence selected from the group consisting of SEQ ID NOs 1-10 and 22-36.

In some embodiments, the Cas protein is selected from the group consisting of SpCas9, FnCas9, St1Cas9, St3Cas9, NmCas9, SaCas9, AsCpf1, LbCpf1, FnCpf1, VQR SpCas9, EQR SpCas9, VRER SpCas9, RHA FnCas9, KKH SaCas9, NmeCas9, StCas9, CjCas9, AsCpf1, FnCpf1, SsCpf1, PcCpf1, BpCpf1, CmtCpf1, LiCpf1, PmCpf1, Pb3310Cpf1, Pb4417Cpf1, BsCpf1, EeCpf1, BhCas12b, AkCas12b, EbCas12b, LsCas12b, RfCas13d, LwaCas13a, PspCas13b, PguCas13b, RanCas13b, CasX, and CasY. In some embodiments, the Cas protein is a mutant of a protein selected from the group consisting of SpCas9, FnCas9, St1Cas9, St3Cas9, NmCas9, SaCas9, AsCpf1, LbCpf1, FnCpf1, VQR SpCas9, EQR SpCas9, VRER SpCas9, RHA FnCas9, KKH SaCas9, NmeCas9, StCas9, CjCas9, AsCpf1, FnCpf1, SsCpf1, PcCpf1, BpCpf1, CmtCpf1, LiCpf1, PmCpf1, Pb3310Cpf1, Pb4417Cpf1, BsCpf1, EeCpf1, BhCas12b, AkCas12b, EbCas12b, LsCas12b, RfCas13d, LwaCas13a, PspCas13b, PguCas13b, RanCas13b, CasX, and CasY, wherein the mutant retains DNA binding ability but does not introduce a double-stranded DNA break. In some embodiments, the mutant is capable of introducing a nick into one strand of double-stranded DNA bound by the mutant. In some embodiments, the Cas protein comprises any one of the amino acid sequences shown as SEQ ID NOs: 11 and 37-39.

In some embodiments, the UGI contains the amino acid sequence shown as SEQ ID No. 12 or has at least 90% sequence identity to SEQ ID No. 12 and retains uracil glycosylase inhibiting activity.

In some embodiments, the first fragment is N-terminal to the second fragment. In some embodiments, the first fragment is N-terminal to the second fragment, and the second fragment is N-terminal to the UGI.

In some embodiments, the fusion protein further comprises a peptide linker between the first fragment and the second fragment. In some embodiments, the peptide linker has 1 to 100 amino acid residues. In some embodiments, at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, or 90% of the amino acid residues of the peptide linker are selected from the group consisting of: alanine, glycine, cysteine and serine. In some embodiments, the peptide linker has an amino acid sequence as set forth in SEQ ID NO 13 or 14. In some embodiments, the fusion protein further comprises a nuclear localization sequence.
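The linker criteria above (a length of 1 to 100 residues, and a minimum fraction of residues drawn from alanine, glycine, cysteine and serine) can be checked with a short Python sketch. The function names, the default 40% threshold (one of the listed percentages) and the example linkers are assumptions made for illustration; the example linkers are not SEQ ID NO: 13 or 14.

```python
# One-letter codes for alanine, glycine, cysteine and serine.
SMALL_RESIDUES = frozenset("AGCS")


def small_residue_fraction(linker):
    """Fraction of linker residues that are A, G, C or S."""
    if not linker:
        raise ValueError("linker must not be empty")
    return sum(res in SMALL_RESIDUES for res in linker.upper()) / len(linker)


def linker_meets_criteria(linker, min_fraction=0.40, max_length=100):
    """True if the linker has 1..max_length residues and enough A/G/C/S content."""
    return 1 <= len(linker) <= max_length and small_residue_fraction(linker) >= min_fraction


# Hypothetical example linkers, not taken from the sequence listing.
print(linker_meets_criteria("SGGSGGSGGS"))  # True: every residue is S or G
print(linker_meets_criteria("PEPTKDE"))     # False: no A, G, C or S residues
```

Passing a different `min_fraction` covers the other thresholds listed in the paragraph (10% through 90%).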

Non-limiting examples of fusion proteins include those having an amino acid sequence selected from the group consisting of SEQ ID NOs 16-20 and 40-50.

In another embodiment, a fusion protein is provided comprising a first fragment comprising the apolipoprotein B mRNA editing enzyme catalytic subunit 3A (APOBEC3A) and a second fragment comprising a CRISPR-associated endonuclease from Prevotella and Francisella 1 (Cpf1). In some embodiments, Cpf1 is catalytically inactive.

In some embodiments, Cpf1 (Cas12a) may be selected from AsCpf1, LbCpf1, and FnCpf1. In a particular embodiment, Cpf1 is a catalytically inactive Lachnospiraceae bacterium Cpf1 (dLbCpf1).

In some embodiments, APOBEC3A is wild-type human APOBEC3A or a mutant human APOBEC3A having a mutation selected from the group consisting of Y130F, D131Y, D131E, Y132D, W104A, W98Y, P134Y, and combinations thereof, according to the residue numbering of SEQ ID No. 1, wherein the mutant retains cytidine deaminase activity.

Polynucleotides encoding the fusion proteins of the disclosure are also provided. Moreover, in another embodiment, a composition comprising a fusion protein of the present disclosure and a pharmaceutically acceptable carrier is provided. In some embodiments, the composition further comprises a guide RNA.

Some embodiments also provide methods of using the fusion proteins and compositions. In one embodiment, a method for editing a target polynucleotide is provided, comprising contacting the target polynucleotide with a fusion protein of the present disclosure and a guide RNA having at least partial sequence complementarity to the target polynucleotide, wherein the editing comprises deamination of a cytosine (C) in the target polynucleotide. In some embodiments, C is in GpC. In some embodiments, C is methylated. In some embodiments, the contacting occurs in vitro, ex vivo, or in vivo. In some embodiments, the method further comprises contacting the target polynucleotide with a Uracil Glycosylase Inhibitor (UGI) that is not fused to the Cas protein.

Drawings

FIGS. 1A-B: Structure and performance of hA3A-BE. (A) Schematic illustrating co-expression of BE3/sgRNA or hA3A-BE/sgRNA. (B) Compared with co-expression of BE3/sgRNA, co-expression of hA3A-BE/sgRNA achieved more efficient base editing at Cs in GpC within the sgRNA-targeted genomic regions (sgFANCF-M-L6 and sgSITE4). Dashed boxes indicate the positions of cytosines in GpC.

FIGS. 2A-B: Structure and performance of hA3A-BE-Y130F and hA3A-BE-Y132D. (A) Schematic illustrating co-expression of hA3A-BE/sgRNA, hA3A-BE-Y130F/sgRNA, or hA3A-BE-Y132D/sgRNA. (B) Compared with co-expression of hA3A-BE/sgRNA, co-expression of hA3A-BE-Y130F/sgRNA or hA3A-BE-Y132D/sgRNA induced base editing in a narrower window of the sgRNA-targeted genomic regions (sgSITE3 and sgEMX1). Dashed boxes indicate the base editing windows.

FIGS. 3A-B: Structure and performance of hA3A-BE-W104A and hA3A-BE-D131Y. (A) Schematic illustrating co-expression of hA3A-BE/sgRNA, hA3A-BE-W104A/sgRNA, or hA3A-BE-D131Y/sgRNA. (B) Compared with co-expression of hA3A-BE/sgRNA, co-expression of hA3A-BE-W104A/sgRNA or hA3A-BE-D131Y/sgRNA induced more efficient base editing in the sgRNA-targeted genomic regions (sgFANCF and sgSITE2). Dashed boxes indicate edited cytosines.

FIGS. 4A-B: Structure and performance of hA3A-BE-Y130F-D131E-Y132D and hA3A-BE-Y130F-D131Y-Y132D. (A) Schematic illustrating co-expression of hA3A-BE/sgRNA, hA3A-BE-Y130F-D131E-Y132D/sgRNA, or hA3A-BE-Y130F-D131Y-Y132D/sgRNA. (B) Compared with co-expression of hA3A-BE/sgRNA, co-expression of hA3A-BE-Y130F-D131E-Y132D/sgRNA or hA3A-BE-Y130F-D131Y-Y132D/sgRNA induced base editing in a narrower window of the sgRNA-targeted genomic regions (sgFANCF and sgSITE3). Dashed boxes indicate edited cytosines.

FIGS. 5a-h: hA3A-BE3 induces efficient base editing in methylated regions and in GpC contexts. (a) Summary of T-to-C (or A-to-G) mutations addressable by BEs. Potentially editable cytosines (underlined) are classified in two dimensions according to their 3' neighbouring bases. (b) Screening for BEs that achieve efficient base editing in a highly methylated context. A series of new BEs were constructed by fusing different APOBEC/AID deaminases with Cas9 nickase (nCas9) and uracil DNA glycosylase inhibitor (UGI). (c) Cumulative base editing frequencies induced by different BEs in unmethylated and methylated vectors. The commonly used rA1-based BE3 was selected for comparison. Mean ± s.d. values were obtained from three independent experiments (six for hA3A-BE3). (d) Immunoblots of BE3 and hA3A-BE3 co-transfected with unmethylated or methylated vectors. Tubulin was used as a loading control, and the immunoblot images are representative of three independent experiments. (e) Comparison of the base editing efficiencies induced by BE3 and hA3A-BE3 in genomic regions with naturally high levels of DNA methylation. The C-to-T editing frequency of each indicated cytosine was determined individually. The target site sequence is shown with the BE3 editing window (positions 4-8, with the PAM-distal base set as position 1) in pink, the PAM in cyan, and CpG sites in uppercase. Gray shading: guanines located 5' of editable cytosines. NT: untreated (native) HEK293T cells. (f) Statistical analysis of the normalized C-to-T editing frequencies in the naturally highly methylated regions shown in (e), with the frequency induced by BE3 set to 100%. n = 48 samples from three independent experiments. (g) Comparison of the base editing efficiencies induced by BE3 and hA3A-BE3 at Cs in GpC in genomic regions with naturally low levels of DNA methylation. (h) Statistical analysis of the normalized C-to-T editing frequencies at the GpC sites in the regions with low levels of DNA methylation shown in (g), with the frequency induced by BE3 set to 100%. n = 24 samples from three independent experiments. (e, g) Mean ± s.d. from three independent experiments. (f, h) P values: one-tailed Student's t-test. The median and interquartile range (IQR) are indicated.

FIGS. 6a-i: Modification of hA3A-BE3. (a) Comparison of the base editing efficiencies induced by BE3, hA3A-BE3, hA3A-BE3-Y130F and hA3A-BE3-Y132D in genomic regions with naturally high levels of DNA methylation. The target site sequence is shown with the overlapping editing window (positions 4-7) in pink, the PAM in cyan, and CpG sites in uppercase. NT: untreated (native) HEK293T cells. (b) Statistical analysis of the normalized C-to-T editing frequencies in the overlapping editing window shown in (a), with the frequency induced by BE3 set to 100%. n = 12 samples from three independent experiments. (c) Comparison of the base editing efficiencies induced by BE3, hA3A-BE3, hA3A-BE3-Y130F and hA3A-BE3-Y132D at Cs in GpC within the overlapping editing windows of genomic regions with naturally low levels of DNA methylation. (d) Statistical analysis of the normalized C-to-T editing frequencies shown in (c), with the frequency induced by BE3 set to 100%. n = 9 samples from three independent experiments. (e) Immunoblots of BEs transfected into HEK293T cells. Tubulin was used as a loading control, and the immunoblot images are representative of three independent experiments. (f) Comparison of the base editing efficiencies induced by hA3A-BE3-Y130F, hA3A-eBE-Y130F, hA3A-BE3-Y132D and hA3A-eBE-Y132D at Cs in GpC within the overlapping editing windows of genomic regions with naturally low levels of DNA methylation. (g) Statistical analysis of the normalized C-to-T editing frequencies shown in (f), with the frequency induced by hA3A-BE3-Y130F (left) or hA3A-BE3-Y132D (right) set to 100%. n = 9 samples from three independent experiments. (h, i) Comparison of the product purity (h) and indel frequency (i) produced by hA3A-BE3-Y130F, hA3A-eBE-Y130F, hA3A-BE3-Y132D and hA3A-eBE-Y132D in genomic DNA regions with naturally low levels of DNA methylation. Asterisks indicate an abnormally high indel frequency at the VEGFA-M-c site observed in NT (possibly arising from amplification, sequencing or alignment). (a, c, f, i) Mean ± s.d. from three independent experiments. (b, d, g) P values: one-tailed Student's t-test. The median and IQR are indicated.

FIGS. 7A-B and 8A-B show the vector structure of each base editor tested and graphically illustrate its editing efficiency for the target DYRK1A gene.

FIGS. 9A-B and 10A-B show the vector structure of each base editor tested and graphically show its editing efficiency for the target SITE6 gene.

FIGS. 11A-B and 12A-B show the vector structure of each base editor tested and graphically illustrate its editing efficiency for the target RUNX1 gene.

FIGS. 13-18 show the sequencing results of examples 3-5.

Detailed Description

Definitions

It should be noted that the term "a" or "an" entity refers to one or more of that entity; for example, "an antibody" is to be construed as representing one or more antibodies. Thus, the terms "a" (or "an"), "one or more" and "at least one" may be used interchangeably herein.

As used herein, the term "polypeptide" is intended to include a single "polypeptide" as well as a plurality of "polypeptides" and refers to a molecule consisting of monomers (amino acids) linearly linked by amide bonds (also referred to as peptide bonds). The term "polypeptide" refers to any chain or chains of two or more amino acids, and not to a particular length of the product. Thus, included in the definition of "polypeptide" are peptides, dipeptides, tripeptides, oligopeptides, "proteins," "amino acid chains," or any term used to refer to a chain or chains of two or more amino acids, and the term "polypeptide" may be used instead of, or interchangeably with, any of these terms. The term "polypeptide" also means the product of modification after expression of the polypeptide, including but not limited to glycosylation, acetylation, phosphorylation, amidation, derivatization of known protecting/blocking groups, proteolytic cleavage, or unnatural amino acid modification. The polypeptides may be derived from natural biological sources or produced by recombinant techniques, but are not necessarily translated from a specified nucleic acid sequence. It may be produced in any manner, including chemical synthesis.

The term "isolated" as used herein with respect to cells, nucleic acids (e.g., DNA or RNA) refers to molecules that are separated from other DNA or RNA, respectively, that are present in the natural source of the macromolecule. The term "isolated" as used herein also refers to nucleic acids or peptides produced by recombinant DNA techniques that are substantially free of cellular material, viral material, or culture medium, or produced by chemical synthesis that are free of chemical precursors or other chemicals. Furthermore, "isolated nucleic acid" is meant to include non-naturally occurring fragments and nucleic acid fragments that are not found in nature. The term "isolated" is also used herein to refer to cells or polypeptides that are isolated from other cellular proteins or tissues. Isolated polypeptides are intended to encompass both purified polypeptides and recombinant polypeptides.

As used herein, the term "recombinant" in relation to a polypeptide or polynucleotide refers to a form of the polypeptide or polynucleotide that does not occur in nature, a non-limiting example of which can be created by combining polynucleotides or polypeptides that would not normally occur together.

"Homology" or "identity" or "similarity" refers to sequence similarity between two peptides or between two nucleic acid molecules. Homology can be determined by comparing a position in each sequence, and the sequences may be aligned for purposes of comparison. When a position in the compared sequences is occupied by the same base or amino acid, the molecules are homologous at that position. The degree of homology between sequences is a function of the number of matching or homologous positions shared by the sequences. An "unrelated" or "non-homologous" sequence shares less than 40% identity, preferably less than 25% identity, with one of the sequences of the present disclosure.

A polynucleotide or polynucleotide region (or a polypeptide or polypeptide region) having a certain percentage (e.g., 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 98%, or 99%) of "sequence identity" to another sequence means that, when the sequences are aligned, that percentage of bases (or amino acids) are identical in the sequences being compared. Such alignments and the percent homology or sequence identity can be determined using software programs known in the art, such as those mentioned in Ausubel et al. Preferably, default parameters are used for alignment. One alignment program is BLAST, using default parameters. In particular, the BLASTN and BLASTP programs use the following default parameters: Genetic code = standard; filter = none; strand = both; cutoff = 60; expect = 10; Matrix = BLOSUM62; Descriptions = 50 sequences; sort by = HIGH SCORE. Biologically equivalent polynucleotides are those having the above specified percentages of homology and encoding polypeptides having the same or similar biological activity.

The term "equivalent nucleic acid or polynucleotide" refers to a nucleic acid having a nucleotide sequence with a degree of homology or sequence identity to the nucleotide sequence of another nucleic acid or its complement. Homologues of double-stranded nucleic acids are intended to include nucleic acids having a nucleotide sequence with a degree of homology to its complementary sequence. In one aspect, a homologue of a nucleic acid is capable of hybridizing to a nucleic acid or its complement. Similarly, an "equivalent polypeptide" refers to a polypeptide that has a degree of homology or sequence identity to the amino acid sequence of a control polypeptide. In some aspects, the sequence identity is at least about 70%, 75%, 80%, 85%, 90%, 95%, 98%, or 99%. In some aspects, an equivalent polypeptide or polynucleotide contains one, two, three, four, or five additions, deletions, substitutions, and combinations thereof, as compared to a control polypeptide or polynucleotide. In some aspects, the equivalent sequence retains the activity (e.g., epitope binding) or structure (e.g., salt bridge) of the control sequence.
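As a minimal illustration of how a percent sequence identity such as the thresholds above might be computed, the following Python sketch counts identical positions in two sequences that have already been aligned. It is a simplification under stated assumptions: the function name is hypothetical, gaps are written as "-", identity is taken as identical positions divided by alignment length, and no alignment is performed. Programs such as BLAST perform the alignment themselves and may report identity over a different denominator.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns in which both sequences carry the same residue.

    Both inputs must already be aligned (equal length, gaps written as '-').
    A column in which either sequence has a gap never counts as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(a == b and a != "-" for a, b in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)


# Toy sequences chosen for illustration only.
print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))  # 90.0 (one mismatch in ten columns)
print(percent_identity("ACG-T", "ACGAT"))            # 80.0 (the gap column is not a match)
```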

Hybridization reactions can be performed under different "stringency" conditions. Typically, a low-stringency hybridization reaction is performed at about 40 °C in about 10× SSC or a solution of equivalent ionic strength/temperature; a medium-stringency hybridization is typically performed at about 50 °C in about 6× SSC; and a high-stringency hybridization reaction is typically performed at about 60 °C in about 1× SSC.

A polynucleotide consists of a specific sequence of four nucleotide bases: adenine (A); cytosine (C); guanine (G); thymine (T); when the polynucleotide is RNA, uracil (U) is used in place of thymine. Thus, the term "polynucleotide sequence" is the alphabetical representation of a polynucleotide molecule. This alphabetical representation can be entered into a database in a computer having a central processing unit and used for bioinformatics applications such as functional genomics and homology searches. The term "polymorphism" refers to the coexistence of more than one form of a gene or a portion thereof. A portion of a gene that has at least two different forms, i.e., two different nucleotide sequences, is referred to as a "polymorphic region of a gene". A polymorphic region may be a single nucleotide, the identity of which differs in different alleles.

The terms "polynucleotide" and "oligonucleotide" are used interchangeably to refer to a polymeric form of nucleotides of any length, i.e., deoxyribonucleotides or ribonucleotides or analogs thereof. The polynucleotide may have any three-dimensional structure and may perform any function, known or unknown. Non-limiting examples of polynucleotides are as follows: a gene or gene fragment (e.g., a probe, primer, EST, or SAGE tag), an exon, an intron, messenger RNA (mRNA), transfer RNA, ribosomal RNA, ribozyme, cDNA, dsRNA, siRNA, miRNA, recombinant polynucleotide, branched polynucleotide, plasmid, vector, isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probe, and primer. A polynucleotide may contain modified nucleotides, such as methylated nucleotides and nucleotide analogs. Modifications to the nucleotide structure, if present, may be made before or after assembly of the polynucleotide. The nucleotide sequence may be interrupted by non-nucleotide components. The polynucleotide may be further modified after polymerization, for example by conjugation with a labeling component. The term also refers to both double-stranded and single-stranded molecules. Unless otherwise specified or required, any polynucleotide embodiment of the disclosure includes both the double-stranded form and each of the two complementary single-stranded forms known or predicted to make up the double-stranded form.

When the term "encoding" is used in reference to a polynucleotide, it refers to a polynucleotide that is said to "encode" a polypeptide if, in its native state or when manipulated by methods well known to those skilled in the art, it can be transcribed and/or translated to produce the mRNA for the polypeptide and/or a fragment thereof. The antisense strand is the complementary strand of such a nucleic acid, and the coding sequence can be deduced from it.
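The relationship between the antisense strand and the coding sequence can be illustrated with a minimal Python sketch. The fragment below is a hypothetical six-base example, not a sequence from this disclosure, and the function names are illustrative only.

```python
# Watson-Crick pairing for the four DNA bases named in the text.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(dna: str) -> str:
    """Return the complementary strand, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(dna.upper()))


def transcribe(coding_strand: str) -> str:
    """The mRNA has the coding-strand sequence with U in place of T."""
    return coding_strand.upper().replace("T", "U")


antisense = "CATTGC"  # hypothetical antisense fragment, 5'->3'
coding = reverse_complement(antisense)  # coding sequence deduced from it
print(coding)
print(transcribe(coding))
```

The coding strand is recovered by taking the reverse complement of the antisense strand, and the corresponding RNA letters follow by replacing T with U.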

Fusion proteins

Current base editors (BEs) based on rat APOBEC1 (rA1) do not efficiently edit C in methylated regions or in a GpC context, which limits the use of base editing. The fusion molecules provided by the present disclosure combine apolipoprotein B mRNA editing enzyme catalytic subunit 3A (APOBEC3A or A3A) with a Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-associated (Cas) protein, optionally together with a Uracil Glycosylase Inhibitor (UGI).

The resulting fusion protein efficiently deaminates cytosine to uracil, resulting in the substitution of C by T. Surprisingly and unexpectedly, such base editing is effective even when the C follows a G (i.e., in a GpC dinucleotide context) and/or when it lies in a methylated region. Since cytosine methylation is common in living cells, this editing capability has important clinical implications.

According to one embodiment of the present disclosure, there is provided a fusion protein comprising a first fragment comprising an apolipoprotein B mRNA editing enzyme catalytic subunit 3A (APOBEC3A) and a second fragment comprising a Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) -associated (Cas) protein.

APOBEC3A, also known as apolipoprotein B mRNA editing enzyme catalytic subunit 3A or A3A, is a protein of the APOBEC3 family found in humans, non-human primates and some other mammals. The APOBEC3A protein lacks the zinc binding activity of other family members. In humans, both subtype a (NP_663745.1; SEQ ID NO: 1) and subtype b (NP_001257335.1; SEQ ID NO: 6) are active; subtype a contains a few additional residues near the N-terminus. The term "APOBEC3A" also includes variants and mutants that have a certain sequence identity (e.g., 70%, 75%, 80%, 85%, 90%, 95%, 98%, 99%) to wild-type mammalian APOBEC3A and retain cytidine deaminase activity.

As the examples show, certain mutants (e.g., Y130F (SEQ ID NO: 2), Y132D (SEQ ID NO: 3), W104A (SEQ ID NO: 4), D131Y (SEQ ID NO: 5), D131E (SEQ ID NO: 22), W98Y (SEQ ID NO: 24), W104A (SEQ ID NO: 25), and P134Y (SEQ ID NO: 26)) even outperform wild-type human APOBEC3A. In addition, many of the tested combinations of these mutations also performed well. Moreover, although not specifically tested, the same mutations are expected to function in subtype b of A3A as well. Examples of such variants and mutants are provided in Table 1 below.

Table 1: examples of APOBEC3A sequences

Figure BDA0002646791410000101

Figure BDA0002646791410000111

Figure BDA0002646791410000131

In some embodiments, APOBEC3A in a fusion protein of the disclosure is human subtype a or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 98% or 99% sequence identity to subtype a. In some embodiments, APOBEC3A in a fusion protein of the disclosure is human subtype b or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 98% or 99% sequence identity to subtype b. In some embodiments, APOBEC3A in a fusion protein of the disclosure is rat APOBEC3 or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 98% or 99% sequence identity to rat APOBEC3. In some embodiments, APOBEC3A in a fusion protein of the disclosure is mouse APOBEC3 or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 98% or 99% sequence identity to mouse APOBEC3. In some embodiments, the sequence retains cytidine deaminase activity.
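As an illustration of how such identity thresholds might be checked, the following Python sketch computes percent identity for two sequences that have already been aligned (gaps written as "-"). The disclosure does not specify an alignment method or how gap columns are counted; counting every non-empty aligned column in the denominator is an assumption made here, and the short peptides are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pre-computed pairwise alignment.

    Assumption: a column with a gap in one sequence counts as a mismatch;
    columns that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned columns to compare")
    return 100.0 * matches / compared


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True if the two aligned sequences share at least `threshold` percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical 10-residue peptides differing at one position.
print(percent_identity("MEASPASGPR", "MEASPAAGPR"))
print(meets_threshold("MEASPASGPR", "MEASPAAGPR", 95.0))
```

In practice the alignment itself would come from a dedicated alignment tool; this sketch covers only the counting step.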

In some embodiments, APOBEC3A includes the Y130F mutation according to the residue numbering of SEQ ID NO: 1 (the residue numbering of the human subtype b, rat and mouse sequences differs but can be readily converted). In some embodiments, APOBEC3A includes the Y132D mutation according to the residue numbering of SEQ ID NO: 1. In some embodiments, APOBEC3A includes the W104A mutation according to the residue numbering of SEQ ID NO: 1. In some embodiments, APOBEC3A includes the D131Y mutation according to the residue numbering of SEQ ID NO: 1. In some embodiments, APOBEC3A includes the D131E mutation according to the residue numbering of SEQ ID NO: 1. In some embodiments, APOBEC3A includes the W98Y mutation according to the residue numbering of SEQ ID NO: 1. In some embodiments, APOBEC3A includes the P134Y mutation according to the residue numbering of SEQ ID NO: 1.

In some embodiments, APOBEC3A includes mutations Y130F, D131E and Y132D according to the residue numbering of SEQ ID NO: 1 (the residue numbering of the human subtype b, rat and mouse sequences differs but can be readily converted). In some embodiments, APOBEC3A includes mutations Y130F, D131Y and Y132D according to the residue numbering of SEQ ID NO: 1. In some embodiments, APOBEC3A includes mutations W98Y and W104A according to the residue numbering of SEQ ID NO: 1. In some embodiments, APOBEC3A includes mutations W98Y and P134Y according to the residue numbering of SEQ ID NO: 1. In some embodiments, APOBEC3A includes mutations W104A and P134Y according to the residue numbering of SEQ ID NO: 1. In some embodiments, APOBEC3A includes mutations W98Y, W104A and Y130F according to the residue numbering of SEQ ID NO: 1. In some embodiments, APOBEC3A includes mutations W98Y, W104A and Y132D according to the residue numbering of SEQ ID NO: 1. In some embodiments, APOBEC3A includes mutations W104A, Y130F and P134Y according to the residue numbering of SEQ ID NO: 1. In some embodiments, APOBEC3A includes mutations W104A, Y132D and P134Y according to the residue numbering of SEQ ID NO: 1. In some embodiments, APOBEC3A includes mutations W104A and Y130F according to the residue numbering of SEQ ID NO: 1. In some embodiments, APOBEC3A includes mutations W104A and Y132D according to the residue numbering of SEQ ID NO: 1.
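The mutation notation used above (wild-type residue, 1-based position, replacement residue, with combinations joined by "+") can be applied to a sequence as in the following Python sketch. Because SEQ ID NO: 1 is not reproduced in this text, the example uses a hypothetical seven-residue toy sequence with positions 3 to 5 standing in for positions 130 to 132; the function name is illustrative.

```python
import re

# One substitution: wild-type residue, 1-based position, replacement residue.
MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_mutations(sequence: str, spec: str) -> str:
    """Apply substitutions such as 'Y130F + D131E + Y132D' to a protein sequence.

    Each wild-type residue is checked against the sequence so that a
    numbering mismatch (e.g. a different subtype or species) is reported
    instead of silently producing the wrong mutant.
    """
    residues = list(sequence.upper())
    for token in spec.split("+"):
        match = MUTATION.match(token.strip())
        if match is None:
            raise ValueError(f"cannot parse mutation {token!r}")
        wild_type, mutant = match.group(1), match.group(3)
        index = int(match.group(2)) - 1  # convert to 0-based index
        if not 0 <= index < len(residues):
            raise ValueError(f"position out of range in {token.strip()!r}")
        if residues[index] != wild_type:
            raise ValueError(
                f"{token.strip()!r}: expected {wild_type}, found {residues[index]}"
            )
        residues[index] = mutant
    return "".join(residues)


toy = "MKYDYPW"  # hypothetical sequence, NOT SEQ ID NO: 1
print(apply_mutations(toy, "Y3F + D4E + Y5D"))
```

Checking the wild-type residue before substituting is one way to catch the numbering differences between subtype a, subtype b and the rodent sequences mentioned above.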

Sequences of exemplary APOBEC3A proteins are shown in SEQ ID NOs: 1-10 and 22-36.

The APOBEC3A protein may be further modified, for example by additions, deletions and/or substitutions at other amino acid positions. Such modifications may be substitutions at one, two, three or more positions. In one embodiment, the modification is a substitution at a single position. In some embodiments, such substitutions are conservative substitutions. In some embodiments, the modified APOBEC3A protein still retains cytidine deaminase activity. In some embodiments, the modified APOBEC3A protein retains the mutations tested in the experimental examples.

In various embodiments, APOBEC3A may be replaced with another deaminase such as A3B (APOBEC3B), A3C (APOBEC3C), A3D (APOBEC3D), A3F (APOBEC3F), A3G (APOBEC3G), A3H (APOBEC3H), A3 (APOBEC3), or AID (AICDA).

In some embodiments, fusion proteins are provided comprising a first fragment comprising an apolipoprotein B mRNA editing enzyme catalytic subunit 3B (APOBEC3B) and a second fragment comprising a Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-associated (Cas) protein. In some embodiments, fusion proteins are provided comprising a first fragment comprising an apolipoprotein B mRNA editing enzyme catalytic subunit 3C (APOBEC3C) and a second fragment comprising a Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-associated (Cas) protein. In some embodiments, fusion proteins are provided comprising a first fragment comprising an apolipoprotein B mRNA editing enzyme catalytic subunit 3D (APOBEC3D) and a second fragment comprising a Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-associated (Cas) protein. In some embodiments, fusion proteins are provided comprising a first fragment comprising an apolipoprotein B mRNA editing enzyme catalytic subunit 3F (APOBEC3F) and a second fragment comprising a Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-associated (Cas) protein. In some embodiments, fusion proteins are provided comprising a first fragment comprising an apolipoprotein B mRNA editing enzyme catalytic subunit 3G (APOBEC3G) and a second fragment comprising a Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-associated (Cas) protein. In some embodiments, fusion proteins are provided comprising a first fragment comprising an apolipoprotein B mRNA editing enzyme catalytic subunit 3H (APOBEC3H) and a second fragment comprising a Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-associated (Cas) protein. In some embodiments, fusion proteins are provided comprising a first fragment comprising an apolipoprotein B mRNA editing enzyme catalytic subunit 3 (APOBEC3) and a second fragment comprising a Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-associated (Cas) protein. In some embodiments, fusion proteins are provided comprising a first fragment comprising activation-induced cytidine deaminase (AID; AICDA) and a second fragment comprising a Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-associated (Cas) protein.

In some embodiments, the APOBEC protein is a human protein. In some embodiments, the APOBEC protein is a mouse or rat protein. The following table lists some examples of APOBEC proteins.

A "conservative amino acid substitution" is one in which the amino acid residue is replaced with an amino acid residue having a similar side chain. Families of amino acid residues having similar side chains have been defined in the art, including basic side chains (e.g., lysine, arginine, histidine), acidic side chains (e.g., aspartic acid, glutamic acid), uncharged polar side chains (e.g., glycine, asparagine, glutamine, serine, threonine, tyrosine, cysteine), nonpolar side chains (e.g., alanine, valine, leucine, isoleucine, proline, phenylalanine, methionine, tryptophan), beta-branched side chains (e.g., threonine, valine, isoleucine) and aromatic side chains (e.g., tyrosine, phenylalanine, tryptophan, histidine). Thus, a non-essential amino acid residue in an immunoglobulin polypeptide is preferably replaced with another amino acid residue from the same side chain family. In another embodiment, a string of amino acids can be replaced with a structurally similar string that differs in the order and/or composition of side chain family members.

The following tables provide non-limiting examples of conservative amino acid substitutions. In Table A, a similarity score of 0 or higher indicates a conservative substitution between the two amino acids.

Table a: amino acid similarity matrix

     C   G   P   S   A   T   D   E   N   Q   H   K   R   V   M   I   L   F   Y   W
W  -8  -7  -6  -2  -6  -5  -7  -7  -4  -5  -3  -3   2  -6  -4  -5  -2   0   0  17
Y   0  -5  -5  -3  -3  -3  -4  -4  -2  -4   0  -4  -5  -2  -2  -1  -1   7  10
F  -4  -5  -5  -3  -4  -3  -6  -5  -4  -5  -2  -5  -4  -1   0   1   2   9
L  -6  -4  -3  -3  -2  -2  -4  -3  -3  -2  -2  -3  -3   2   4   2   6
I  -2  -3  -2  -1  -1   0  -2  -2  -2  -2  -2  -2  -2   4   2   5
M  -5  -3  -2  -2  -1  -1  -3  -2   0  -1  -2   0   0   2   6
V  -2  -1  -1  -1   0   0  -2  -2  -2  -2  -2  -2  -2   4
R  -4  -3   0   0  -2  -1  -1  -1   0   1   2   3   6
K  -5  -2  -1   0  -1   0   0   0   1   1   0   5
H  -3  -2   0  -1  -1  -1   1   1   2   3   6
Q  -5  -1   0  -1   0  -1   2   2   1   4
N  -4   0  -1   1   0   0   2   1   2
E  -5   0  -1   0   0   0   3   4
D  -5   1  -1   0   0   0   4
T  -2   0   0   1   1   3
A  -2   1   1   1   2
S   0   1   1   1
P  -3  -1   6
G  -3   5
C  12
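The use of Table A can be sketched in Python as follows. The rows are copied from the matrix above; the parsing helper and function names are illustrative, and the threshold of 0 is the one stated in the text.

```python
# Column order of Table A; each row lists scores up to its own diagonal entry.
COLUMNS = "C G P S A T D E N Q H K R V M I L F Y W".split()

TABLE_A = """
W -8 -7 -6 -2 -6 -5 -7 -7 -4 -5 -3 -3 2 -6 -4 -5 -2 0 0 17
Y 0 -5 -5 -3 -3 -3 -4 -4 -2 -4 0 -4 -5 -2 -2 -1 -1 7 10
F -4 -5 -5 -3 -4 -3 -6 -5 -4 -5 -2 -5 -4 -1 0 1 2 9
L -6 -4 -3 -3 -2 -2 -4 -3 -3 -2 -2 -3 -3 2 4 2 6
I -2 -3 -2 -1 -1 0 -2 -2 -2 -2 -2 -2 -2 4 2 5
M -5 -3 -2 -2 -1 -1 -3 -2 0 -1 -2 0 0 2 6
V -2 -1 -1 -1 0 0 -2 -2 -2 -2 -2 -2 -2 4
R -4 -3 0 0 -2 -1 -1 -1 0 1 2 3 6
K -5 -2 -1 0 -1 0 0 0 1 1 0 5
H -3 -2 0 -1 -1 -1 1 1 2 3 6
Q -5 -1 0 -1 0 -1 2 2 1 4
N -4 0 -1 1 0 0 2 1 2
E -5 0 -1 0 0 0 3 4
D -5 1 -1 0 0 0 4
T -2 0 0 1 1 3
A -2 1 1 1 2
S 0 1 1 1
P -3 -1 6
G -3 5
C 12
"""


def load_matrix(text: str) -> dict:
    """Parse the triangular table into a lookup keyed by unordered residue pair."""
    scores = {}
    for line in text.strip().splitlines():
        label, *values = line.split()
        for column, value in zip(COLUMNS, values):
            scores[frozenset((label, column))] = int(value)
    return scores


SCORES = load_matrix(TABLE_A)


def similarity(a: str, b: str) -> int:
    """Similarity score of two one-letter amino acid codes (symmetric)."""
    return SCORES[frozenset((a.upper(), b.upper()))]


def is_conservative(a: str, b: str) -> bool:
    """A score of 0 or higher counts as a conservative substitution."""
    return similarity(a, b) >= 0


for wild_type, mutant in [("Y", "F"), ("D", "E"), ("W", "Y"), ("W", "A")]:
    print(wild_type, mutant, similarity(wild_type, mutant), is_conservative(wild_type, mutant))
```

Under this criterion, Y to F (score 7), D to E (score 3) and W to Y (score 0) count as conservative, whereas W to A (score -6) does not.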

Table B: conservative amino acid substitutions

Figure BDA0002646791410000161

The term "Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-associated (Cas) protein", or simply "Cas protein", refers to an RNA-guided DNA endonuclease associated with the CRISPR adaptive immune system of Streptococcus pyogenes and other bacteria. Non-limiting examples of Cas proteins include Streptococcus pyogenes Cas9 (SpCas9), Staphylococcus aureus Cas9 (SaCas9), Acidaminococcus sp. Cas12a (Cpf1), Lachnospiraceae bacterium Cas12a (Cpf1), and Francisella novicida Cas12a (Cpf1). Komor et al. provide further examples in "CRISPR-Based Technologies for the Manipulation of Eukaryotic Genomes" (Cell. 2017 Jan 12; 168(1-2): 20-36).

Cas proteins in embodiments include SpCas9, FnCas9, St1Cas9, St3Cas9, NmCas9, SaCas9, AsCpf1, LbCpf1, FnCpf1, VQR SpCas9, EQR SpCas9, VRER SpCas9, RHA FnCas9, KKH SaCas9, NmCas9, StCas9, CjCas9, CasCcaas 9, CasC9, CcCas9, AsCpf 9, FnCpf 9, FnCPCCasf 9, FnCCpf 9, BpCpf 9, Cmtf 9, LiCpf 9, Pcppf 9, Pcpcpcf 9, Pcppf 363310 Cpf 9, Pb4417 f 9, BstcPcf 44f 9, BstCpf 9, eCspcPcncf 9, sCas 3613, eCspcPcnfCas 3613, sCas 3613, sPcpf 9, sPcpf 3613, sCas 9, and sP.

Table C: examples of Cas proteins

In some embodiments, the Cas protein is a mutant selected from the above proteins that retains DNA binding ability but does not introduce double-stranded DNA breaks.

For example, it is well known that in SpCas9, residues Asp10 and His840 are the most important for the catalytic (nuclease) activity of Cas9. When both residues are mutated to Ala, the mutant loses nuclease activity. In another embodiment, only the Asp10Ala mutation is made; the resulting mutant protein does not generate a double-strand break but instead introduces a nick in one of the strands. This mutant is also known as a Cas9 nickase. A non-limiting example of a Cas9 nickase is shown in SEQ ID NO: 11. Non-limiting examples of Cas12a nickases are shown in SEQ ID NOs: 37-39. Cas proteins also include mutants of known Cas proteins having a certain sequence identity (e.g., 70%, 75%, 80%, 85%, 90%, 95%, 98%, 99% or more). In some embodiments, the Cas protein retains catalytic (nuclease) activity.

In some embodiments, the Cas protein in the fusion proteins of the present disclosure is a Cas12a (Cpf1, CRISPR-associated endonuclease in Prevotella and Francisella 1) protein. In traditional base editors, Cas9 is the commonly used DNA endonuclease. When used with APOBEC1, Cas12a (Cpf1) has the advantage of recognizing A/T-rich sequences. Another surprising discovery of the present disclosure is that when A3A is substituted for APOBEC1, the editing efficiency is greatly improved (see, e.g., Examples 3-5 and FIGS. 7B, 9B and 11B). Moreover, when A3A includes some of the tested mutations (Examples 3-5 and FIGS. 7B, 9B and 11B), the editing efficiency of this Cas12a-A3A can be further improved, and when A3A contains more of the tested mutations, more precise editing can be achieved by narrowing the editing window of this Cas12a-A3A (Examples 3-5 and FIGS. 8B, 10B and 12B).

Thus, in some embodiments, a fusion protein is provided comprising a first fragment comprising the apolipoprotein B mRNA editing enzyme catalytic subunit 3A (APOBEC3A) and a second fragment comprising a CRISPR-associated endonuclease in Prevotella and Francisella 1 (Cpf1). Examples of APOBEC3A and its alternatives (e.g., A3B (APOBEC3B), A3C (APOBEC3C), A3D (APOBEC3D), A3F (APOBEC3F), A3G (APOBEC3G), A3H (APOBEC3H), A3 (APOBEC3) or AID (AICDA)) and their bioequivalents (homologues) are described above. SEQ ID NOs: 40-50 provide non-limiting examples of fusion sequences.

In some embodiments, the fusion protein further comprises a Uracil Glycosylase Inhibitor (UGI). A non-limiting example of a UGI is found in Bacillus phage AR9 (YP_009283008.1). In some embodiments, the UGI comprises the amino acid sequence set forth in SEQ ID NO: 12, or has at least 90% sequence identity to the sequence set forth in SEQ ID NO: 12 and retains uracil glycosylase inhibiting activity.

In some embodiments, the UGI is not part of the fusion protein but is supplied separately (free UGI, not fused to a Cas protein or cytosine deaminase) when the fusion protein is used for genome editing. In some embodiments, free UGI is used together with a fusion protein that itself contains a UGI.

Preferably, a peptide linker is provided between adjacent fragments of the fusion protein. In some embodiments, the peptide linker has 1 to 100 amino acid residues (e.g., 3 to 20 or 4 to 15, without limitation). In some embodiments, at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, or 90% of the amino acid residues of the peptide linker are selected from the group consisting of alanine, glycine, cysteine and serine. In some embodiments, the peptide linker comprises an amino acid sequence as set forth in SEQ ID NO: 13 or 14.
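The linker criteria above (length range and fraction of alanine, glycine, cysteine and serine residues) can be checked with a short Python sketch. The linker strings below are hypothetical and are not SEQ ID NO: 13 or 14; the function names and default thresholds are chosen only for illustration.

```python
SMALL_RESIDUES = set("AGCS")  # alanine, glycine, cysteine, serine


def linker_stats(linker: str) -> tuple:
    """Return (length, percent of residues that are A, G, C or S)."""
    linker = linker.upper()
    if not linker:
        raise ValueError("linker must contain at least one residue")
    count = sum(1 for residue in linker if residue in SMALL_RESIDUES)
    return len(linker), 100.0 * count / len(linker)


def linker_ok(linker: str, min_len: int = 1, max_len: int = 100,
              min_percent: float = 50.0) -> bool:
    """Check a linker against a length range and a minimum A/G/C/S content."""
    length, percent = linker_stats(linker)
    return min_len <= length <= max_len and percent >= min_percent


# Hypothetical linkers for illustration only.
print(linker_stats("GGGGSGGGGS"))
print(linker_ok("GSEK", min_percent=60.0))
```

The defaults correspond to the broadest length range in the text (1 to 100 residues) and to one of the listed composition levels (50%); either can be tightened, for example to a length of 4 to 15.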

The APOBEC3A, Cas protein, and UGI can be arranged in any manner. However, in preferred embodiments, APOBEC3A is located at the N-terminus of the Cas protein. In one embodiment, the Cas protein is N-terminal to the UGI.

In some embodiments, the fusion protein further comprises a nuclear localization sequence, e.g., as shown in SEQ ID NO: 15.

Non-limiting examples of fusion proteins include those having an amino acid sequence selected from the group consisting of SEQ ID NOs 16-20.

Table 2: other sequences

Figure BDA0002646791410000221

Figure BDA0002646791410000231

Figure BDA0002646791410000251

Figure BDA0002646791410000261

Figure BDA0002646791410000271

Figure BDA0002646791410000301

Figure BDA0002646791410000311

Figure BDA0002646791410000321

Figure BDA0002646791410000351

Figure BDA0002646791410000361

The present disclosure also provides isolated polynucleotide or nucleic acid molecule sequences (as shown in SEQ ID NO:21) encoding the fusion proteins, variants or derivatives thereof of the present disclosure. Methods of making fusion proteins are well known in the art and are described herein.

Compositions and methods

The disclosure also provides compositions and methods. Such compositions comprise an effective amount of the fusion protein and an acceptable carrier. In some embodiments, the composition further comprises a guide RNA complementary to the target DNA. Such compositions are useful for base editing in a sample.

The fusion proteins and compositions are useful for base editing. In one embodiment, a method for editing a target polynucleotide is provided, comprising contacting the target polynucleotide with a fusion protein of the present disclosure and a guide RNA having at least partial sequence complementarity to the target polynucleotide, wherein the editing comprises deamination of a cytosine (C) in the target polynucleotide.

It has been shown that the presently disclosed fusion proteins can edit cytosine at any position and in any sequence context, e.g., in CpC, ApC, GpC, TpC, CpA, CpG, CpC and CpT. In particular, it is surprising and unexpected that these fusion proteins can edit C in the GpC dinucleotide context, even if the C is methylated.
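The dinucleotide contexts named above, and the net C-to-T outcome of deamination, can be illustrated with the following Python sketch. The target sequence and the window positions are hypothetical; the actual editing window depends on the particular fusion protein and is not modeled here.

```python
def cytosine_contexts(dna: str) -> list:
    """List each C with its 1-based position and its 5' and 3' dinucleotide contexts.

    A context is None when the C sits at the end of the sequence.
    """
    dna = dna.upper()
    sites = []
    for i, base in enumerate(dna):
        if base != "C":
            continue
        five_prime = f"{dna[i - 1]}pC" if i > 0 else None
        three_prime = f"Cp{dna[i + 1]}" if i + 1 < len(dna) else None
        sites.append((i + 1, five_prime, three_prime))
    return sites


def edit_window(dna: str, start: int, end: int) -> str:
    """Model the net outcome of C deamination (C to U, read as T) for
    1-based positions start..end inclusive; other positions are unchanged."""
    dna = dna.upper()
    return "".join(
        "T" if base == "C" and start <= position <= end else base
        for position, base in enumerate(dna, start=1)
    )


target = "AGCTCGGA"  # hypothetical target strand, 5'->3'
print(cytosine_contexts(target))
print(edit_window(target, 3, 4))  # hypothetical window covering positions 3-4
```

In this toy target the C at position 3 sits in a GpC context, the case the text singles out, and only that C falls inside the assumed window.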

The contact between the fusion protein (and guide RNA) and the target polynucleotide can be in vitro, particularly in cell culture. When the contact occurs ex vivo or in vivo, the fusion protein can have clinical or therapeutic use.
