Novel single base editing technology and application thereof

Document No.: 1068010 Publication date: 2020-10-16

Reading note: This technology, "Novel single base editing technology and application thereof", was designed by Yang Hui (杨辉) and Zhou Changyang (周昌阳) on 2019-04-04. Main content: The invention provides a novel single base editing technology and application thereof. Specifically, the invention provides a gene editing enzyme whose structure is shown in formula I: Z1-L1-Z2-L2-Z3-L3-Z4 (I), wherein Z1 is the amino acid sequence of adenine deaminase TadA; Z2 is the amino acid sequence of the TadA* enzyme; Z1 and/or Z2 has a mutation of the F residue at position 147 and/or 148 corresponding to the sequence shown in SEQ ID NO: 1; Z3 is the coding sequence of a Cas9 nuclease; L1, L2 and L3 are each independently an optional linker peptide sequence; Z4 is absent or a nuclear localization signal element (NLS); and each "-" is independently a peptide bond. The invention also provides a method for single-base site-directed editing of a gene. The method of the invention edits DNA with high precision and can significantly reduce RNA off-target effects.

1. A mutein of adenine deaminase TadA, wherein the mutein is a non-natural protein and wherein the mutein is mutated at one or more amino acids of adenine deaminase TadA selected from the group consisting of:

phenylalanine (F) at position 147 and phenylalanine (F) at position 148;

wherein positions 147 and 148 are the positions corresponding to positions 147 and 148 of the sequence shown in SEQ ID NO. 1.

2. The mutein according to claim 1, characterized in that the mutein has the activity of catalyzing the hydrolytic deamination of adenine to form hypoxanthine.

3. The mutein according to claim 1, wherein the adenine deaminase TadA comprises the TadA* enzyme and the wild-type TadA enzyme.

4. A gene editing enzyme, wherein the structure of the gene editing enzyme is represented by formula I:

Z1-L1-Z2-L2-Z3-L3-Z4 (I)

wherein,

Z1 is the amino acid sequence of adenine deaminase TadA;

Z2 is the amino acid sequence of the TadA* enzyme;

and Z1 and/or Z2 is the amino acid sequence of the mutein of claim 1;

Z3 is the coding sequence of a Cas9 nuclease;

L1, L2 and L3 are each independently an optional linker peptide sequence;

Z4 is absent or a nuclear localization signal element (NLS);

and each "-" is independently a peptide bond.

5. The gene-editing enzyme of claim 4, wherein the amino acid sequence of the gene-editing enzyme is represented by SEQ ID NO. 10.

6. A polynucleotide encoding the gene-editing enzyme of claim 4.

7. A vector comprising the polynucleotide of claim 6.

8. A host cell comprising the vector of claim 7, or having the polynucleotide of claim 6 integrated into its genome.

9. A method for single-base site-specific editing of a gene, comprising the steps of:

(i) providing a cell and a first vector and a second vector, wherein the first vector comprises an expression cassette for the gene-editing enzyme of claim 4, and the second vector comprises an expression cassette for expressing a sgRNA;

(ii) infecting said cell with said first vector and said second vector, thereby performing single base site directed editing within said cell.

10. A kit, comprising:

(a1) a first container, and a first vector located in the first container, the first vector comprising an expression cassette for the gene-editing enzyme of claim 4.

Technical Field

The invention relates to the field of biotechnology, in particular to a novel single-base editing technology and application thereof.

Background

Since 2013, a new generation of gene editing technologies, represented by CRISPR/Cas9, has entered laboratories across the field of biology and is changing traditional means of gene manipulation.

DNA base editing methods developed in recent years can directly generate precise point mutations in genomic DNA without generating double-strand breaks (DSBs). Two types of base editors have been reported: cytosine base editors (CBE; C to T and G to A) and adenine base editors (ABE; A to G and T to C). However, their use faces a key problem, namely off-target effects.

Previous studies have focused primarily on assessing off-target mutations in genomic DNA. Recent results indicate that CBEs, but not ABEs, induce a large number of off-target single-nucleotide mutations during gene editing, underscoring the need for higher-fidelity single-base editing tools. In addition to acting on DNA, commonly used single-base editing systems may also mutate RNA. For example, the CBE-associated cytosine deaminase APOBEC1 was found to target both DNA and RNA, and the ABE-associated adenine deaminase TadA was found to induce site-specific inosine formation on RNA. However, the RNA-targeting activity of DNA base editors had not previously been studied. Studies have shown that both the cytosine base editor BE3 and the adenine base editor ABE7.10 produce tens of thousands of off-target RNA single-nucleotide variants (SNVs), whereas cells without base editing exhibit only a few hundred SNVs.

At present, the precision of DNA editing in existing DNA base editing methods is not high; that is, the gene editing window is too large. ABE7.10, developed by David Liu's laboratory at Harvard University, can edit the third to eighth bases of the sgRNA target sequence, so other bases adjacent to the intended target base may be edited non-specifically.

Therefore, there is an urgent need in the art to develop a single base editing technique that has high accuracy, significantly reduces RNA off-target effects, and maintains effective DNA targeting activity.

Disclosure of Invention

The invention aims to provide a single base editing technology that has high precision, significantly reduces RNA off-target effects, and maintains effective DNA targeting activity.

In a first aspect of the invention there is provided a mutein of adenine deaminase TadA, said mutein being a non-natural protein and said mutein having a mutation at one or more amino acids of adenine deaminase TadA selected from the group consisting of:

phenylalanine (F) at position 147 and phenylalanine (F) at position 148;

wherein positions 147 and 148 are the positions corresponding to positions 147 and 148 of the sequence shown in SEQ ID NO. 1.

In another preferred embodiment, said adenine deaminase TadA is derived from a species selected from the group consisting of: Escherichia coli, the hyperthermophile A. aeolicus, Bacillus subtilis, and yeast CDD1.

In another preferred embodiment, the mutein has the activity of catalyzing the hydrolytic deamination of adenine to form hypoxanthine.

In another preferred embodiment, said adenine deaminase TadA comprises the TadA* enzyme and the wild-type TadA enzyme.

In another preferred embodiment, the adenine deaminase TadA is the TadA* enzyme.

In another preferred embodiment, the amino acid sequence of the wild-type TadA enzyme is shown in SEQ ID NO 1.

In another preferred embodiment, the amino acid sequence of said TadA* enzyme is as shown in SEQ ID No. 2.

In another preferred embodiment, the phenylalanine (F) at position 147 is mutated to an amino acid residue other than phenylalanine.

In another preferred embodiment, the phenylalanine at position 147 is mutated to: alanine (A), glycine (G), arginine (R), aspartic acid (D), cysteine (C), glutamine (Q), glutamic acid (E), histidine (H), isoleucine (I), leucine (L), lysine (K), methionine (M), serine (S), proline (P), threonine (T), tryptophan (W), tyrosine (Y), or valine (V).

In another preferred embodiment, the phenylalanine at position 147 is mutated to: leucine (L), valine (V), isoleucine (I), alanine (A), or tyrosine (Y).

In another preferred embodiment, the phenylalanine (F) at position 148 is mutated to an amino acid residue other than phenylalanine.

In another preferred embodiment, the phenylalanine at position 148 is mutated to: alanine (A), glycine (G), arginine (R), aspartic acid (D), cysteine (C), glutamine (Q), glutamic acid (E), histidine (H), isoleucine (I), leucine (L), lysine (K), methionine (M), serine (S), proline (P), threonine (T), tryptophan (W), tyrosine (Y), or valine (V).

In another preferred embodiment, the phenylalanine at position 148 is mutated to: leucine (L), valine (V), isoleucine (I), alanine (A), or tyrosine (Y).

In another preferred embodiment, apart from the mutations (e.g., at amino acids 147 and 148), the remaining amino acid sequence of the mutein is identical or substantially identical to the sequence shown in SEQ ID No. 1.

In another preferred embodiment, substantial identity means a difference of at most 50 (preferably 1-20, more preferably 1-10, still more preferably 1-5) amino acids, wherein the difference comprises substitution, deletion or addition of amino acids, and the mutein still has the activity of catalyzing the hydrolytic deamination of adenine to form hypoxanthine.

In another preferred embodiment, when the adenine deaminase TadA is a wild-type TadA enzyme, the amino acid sequence of the mutein is as shown in SEQ ID No. 3.

In another preferred embodiment, when the adenine deaminase TadA is the TadA* enzyme, the amino acid sequence of the mutein is shown in SEQ ID NO. 4.

In another preferred embodiment, the homology of the amino acid sequence of the mutein to the amino acid sequence shown in SEQ ID NO:3 or SEQ ID NO. 4 is preferably at least 85% or 90%, more preferably at least 95%, and most preferably at least 98%, and is at most 166/167 (99.4%).
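The upper bound of 166/167 (about 99.4%) corresponds to a 167-residue sequence that differs from the reference at exactly one position. The following is a minimal sketch of that arithmetic, assuming two sequences that are already aligned and of equal length. The function name and the toy sequences are hypothetical and are not SEQ ID NO: 3 or SEQ ID NO: 4.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of matching positions between two pre-aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Hypothetical 167-residue toy sequence with F at position 148 (1-based).
reference = "G" * 147 + "F" + "G" * 19
# Single substitution F148A: index 147 in 0-based coordinates.
mutant = reference[:147] + "A" + reference[148:]

identity = percent_identity(reference, mutant)
print(f"{identity:.1f}%")
```

A real comparison would first require a sequence alignment step, which this sketch deliberately leaves out.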

In a second aspect of the present invention, there is provided a gene-editing enzyme, the structure of which is shown in formula I:

Z1-L1-Z2-L2-Z3-L3-Z4 (I)

wherein,

Z1 is the amino acid sequence of adenine deaminase TadA;

Z2 is the amino acid sequence of the TadA* enzyme;

and Z1 and/or Z2 is the amino acid sequence of a mutein according to the first aspect of the invention;

Z3 is the coding sequence of a Cas9 nuclease;

L1, L2 and L3 are each independently an optional linker peptide sequence;

Z4 is absent or a nuclear localization signal element (NLS);

and each "-" is independently a peptide bond.
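The element order of formula I can be illustrated with a short sketch that joins the parts in the order Z1-L1-Z2-L2-Z3-L3-Z4 and skips optional elements that are not supplied. The function name and the placeholder strings are hypothetical; they stand in for real sequences and are not taken from the sequence listing. Treating Z4 as optional follows the wording of the abstract.

```python
from typing import Optional


def assemble_fusion(z1: str, z2: str, z3: str,
                    l1: Optional[str] = None,
                    l2: Optional[str] = None,
                    l3: Optional[str] = None,
                    z4: Optional[str] = None) -> str:
    """Join elements in formula I order: Z1-L1-Z2-L2-Z3-L3-Z4."""
    for name, seq in (("Z1", z1), ("Z2", z2), ("Z3", z3)):
        if not seq:
            raise ValueError(f"{name} is required")
    ordered = [z1, l1, z2, l2, z3, l3, z4]
    # Optional linkers and Z4 are skipped when absent.
    return "".join(part for part in ordered if part)


# Placeholder labels, not real amino acid sequences.
fusion = assemble_fusion("[TadA]", "[TadA*]", "[Cas9]",
                         l1="[L1]", l2="[L2]", z4="[NLS]")
print(fusion)
```

Here L3 is omitted, so Z3 is joined directly to Z4.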

In another preferred embodiment, said Z1 has the amino acid sequence of a wild-type TadA enzyme.

In another preferred example, the Z1 has the amino acid sequence of a wild-type TadA enzyme with F147A and/or F148A mutations.

In another preferred example, the Z1 is a wild-type TadA enzyme having F147A and/or F148A mutations.

In another preferred embodiment, the amino acid sequence of Z1 is shown as SEQ ID NO. 3.

In another preferred embodiment, Z2 has the amino acid sequence of the TadA* enzyme.

In another preferred example, said Z2 has the amino acid sequence of a TadA* enzyme having an F147A and/or F148A mutation.

In another preferred example, the Z2 is a TadA* enzyme having F147A and/or F148A mutations.

In another preferred embodiment, the amino acid sequence of Z2 is shown in SEQ ID NO. 4.

In another preferred embodiment, the amino acid sequence of L1 is shown as SEQ ID NO. 5.

In another preferred embodiment, the amino acid sequence of L1 is identical or substantially identical to the amino acid sequence set forth as SEQ ID NO. 5.

In another preferred embodiment, the amino acid sequence of L2 is shown as SEQ ID NO 6.

In another preferred embodiment, the amino acid sequence of L2 is identical or substantially identical to the amino acid sequence shown in SEQ ID NO. 6.

In another preferred embodiment, the amino acid sequence of L3 is shown in SEQ ID NO 7.

In another preferred embodiment, the amino acid sequence of L3 is identical or substantially identical to the amino acid sequence set forth as SEQ ID NO. 7.

In another preferred example, in Z3, the Cas9 nuclease is derived from a source selected from the group consisting of: Streptococcus pyogenes, Staphylococcus aureus, mutants of Streptococcus pyogenes, or mutants of Staphylococcus aureus.

In another preferred example, in Z3, the Cas9 nuclease may be replaced with a Cpf1 nuclease, and the Cpf1 nuclease is derived from a source selected from the group consisting of: Acidaminococcus, Lachnospiraceae, mutants of Acidaminococcus, or mutants of Lachnospiraceae.

In another preferred embodiment, the amino acid sequence of Z3 is shown as SEQ ID NO. 8.

In another preferred embodiment, the amino acid sequence of Z3 is identical or substantially identical to the amino acid sequence shown in SEQ ID NO. 8.

In another preferred embodiment, the amino acid sequence of Z4 is shown in SEQ ID NO 9.

In another preferred embodiment, the amino acid sequence of Z4 is identical or substantially identical to the amino acid sequence shown in SEQ ID NO. 9.

In another preferred embodiment, the substantial identity is a difference of up to 50 (preferably 1 to 20, more preferably 1 to 10, still more preferably 1 to 5, most preferably 1 to 3) amino acids, wherein the difference comprises a substitution, deletion or addition of an amino acid.

In another preferred embodiment, the substantial identity is at least 70%, at least 75%, at least 80%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity of the amino acid sequence to the corresponding amino acid sequence.

In another preferred embodiment, the amino acid sequence of the gene-editing enzyme is shown in SEQ ID NO 10.

In a third aspect of the invention, there is provided a polynucleotide encoding a gene-editing enzyme according to the second aspect of the invention.

In another preferred embodiment, the polynucleotide is selected from the group consisting of:

(a) a polynucleotide encoding the amino acid sequence shown as SEQ ID NO. 10;

(b) a polynucleotide having a nucleotide sequence with a sequence identity of 95% or more (preferably 98% or more) to the polynucleotide sequence of (a);

(c) a polynucleotide complementary to a polynucleotide of any one of (a) and (b).

In another preferred embodiment, the ORF of the gene-editing enzyme according to the second aspect of the invention is additionally flanked by auxiliary elements selected from the group consisting of: a signal peptide, a secretory peptide, a tag sequence (e.g., 6His), or a combination thereof.

In another preferred embodiment, the signal peptide is a nuclear localization sequence.

In another preferred embodiment, the polynucleotide is selected from the group consisting of: a DNA sequence, an RNA sequence, or a combination thereof.

In a fourth aspect of the invention, there is provided a vector comprising a polynucleotide according to the third aspect of the invention.

In another preferred embodiment, the vector includes an expression vector, a shuttle vector, or an integration vector.

In a fifth aspect of the invention, there is provided a host cell comprising a vector according to the fourth aspect of the invention, or having integrated into its genome a polynucleotide according to the third aspect of the invention.

In another preferred embodiment, the host is a prokaryotic cell or a eukaryotic cell.

In another preferred embodiment, the prokaryotic cell comprises: Escherichia coli.

In another preferred embodiment, the eukaryotic cell is selected from the group consisting of: yeast cells, plant cells, mammalian cells, human cells (e.g., HEK293T cells), or a combination thereof.

In a sixth aspect of the present invention, there is provided a method for single base site-specific editing of a gene, comprising the steps of:

(i) providing a cell and a first vector and a second vector, wherein the first vector comprises an expression cassette for a gene editing enzyme according to the second aspect of the invention and the second vector comprises an expression cassette for expression of a sgRNA;

(ii) infecting said cell with said first vector and said second vector, thereby performing single base site directed editing within said cell.

In another preferred embodiment, the first vector comprises a first nucleic acid construct having the structure of formula II, from 5' to 3':

P1-X1-L4-X2 (II)

wherein P1 is a first promoter sequence;

X1 is a nucleotide sequence encoding a gene-editing enzyme according to the second aspect of the invention;

L4 is absent or a linker sequence;

X2 is a polyA sequence;

and each "-" is independently a bond or a nucleotide linker sequence.

In another preferred embodiment, the first promoter is selected from the group consisting of: a CMV promoter, a CAG promoter, a PGK promoter, an EF1α promoter, an EFS promoter, or a combination thereof.

In another preferred embodiment, the first promoter sequence is a CMV promoter.

In another preferred embodiment, the length of the linker sequence is 30-120nt, preferably 48-96nt, and preferably a multiple of 3.
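The length preferences for the linker sequence can be checked mechanically. The sketch below is only an illustration of the stated ranges (30-120 nt, preferably 48-96 nt, preferably a multiple of 3). The function name and its parameters are hypothetical, and the multiple-of-3 condition is treated here as a hard requirement purely for the sake of the example.

```python
def linker_length_ok(linker: str, min_nt: int = 30, max_nt: int = 120) -> bool:
    """True if the linker length lies in [min_nt, max_nt] nt and is a multiple of 3."""
    n = len(linker)
    return min_nt <= n <= max_nt and n % 3 == 0


# Hypothetical 48-nt linker made of a repeated triplet.
candidate = "GGA" * 16
print(linker_length_ok(candidate))                        # general range, 30-120 nt
print(linker_length_ok(candidate, min_nt=48, max_nt=96))  # preferred range, 48-96 nt
```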

In another preferred embodiment, the first vector and the second vector may be the same or different.

In another preferred embodiment, the first vector and the second vector may be the same vector.

In another preferred embodiment, the first vector and/or the second vector further comprises an expression cassette for expressing a selection marker.

In another preferred embodiment, the selection marker is selected from the group consisting of: green fluorescent protein, yellow fluorescent protein, red fluorescent protein, blue fluorescent protein, or a combination thereof.

In another preferred embodiment, the method is non-diagnostic and non-therapeutic.

In another preferred embodiment, the cells are from the following species: a human, a non-human mammal, poultry, a plant, or a microorganism.

In another preferred embodiment, the non-human mammal includes a rodent (e.g., mouse, rat, rabbit), cow, pig, sheep, horse, dog, cat, non-human primate (e.g., monkey).

In another preferred embodiment, the cell is selected from the group consisting of: somatic cells, stem cells, germ cells, non-dividing cells, or combinations thereof.

In another preferred embodiment, the cell is selected from the group consisting of: kidney cells, epithelial cells, endothelial cells, neural cells, or a combination thereof.

In another preferred example, when a gene is edited by the method, the editing window spans the 4th to 7th bases of the 20-base sequence targeted by the sgRNA, with the highest editing efficiency at the 5th base and a sharp decrease on either side. By comparison, the editing window of the unmutated ABE7.10 editing system is wider, spanning the 3rd to 9th bases, with the highest editing efficiency at the 5th base and a gradual decrease on either side.
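The practical effect of the narrower window can be illustrated by listing which adenines of a target sequence fall inside each window. This is a minimal sketch: the function name and the 20-base target are hypothetical, and position 1 is assumed to be the first base of the 20-base sequence as written. It only reports positions inside a window and does not model editing efficiency.

```python
from typing import List, Tuple

MUTANT_WINDOW: Tuple[int, int] = (4, 7)   # window stated for the method of the invention
ABE710_WINDOW: Tuple[int, int] = (3, 9)   # window stated for unmutated ABE7.10


def adenines_in_window(target: str, window: Tuple[int, int]) -> List[int]:
    """Return 1-based positions of A that lie inside the window of a 20-base target."""
    if len(target) != 20:
        raise ValueError("expected a 20-base target sequence")
    start, end = window
    return [pos for pos, base in enumerate(target.upper(), start=1)
            if base == "A" and start <= pos <= end]


# Hypothetical target with A at positions 3, 5, 8, 10, 17 and 20.
target = "GCATACGAGATCCGTTAGCA"
print(adenines_in_window(target, MUTANT_WINDOW))
print(adenines_in_window(target, ABE710_WINDOW))
```

With this toy target, only the adenine at position 5 lies in the 4-7 window, while the 3-9 window also covers the adenines at positions 3 and 8, which would be candidates for unintended bystander editing.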

In a seventh aspect of the present invention, there is provided a kit comprising:

(a1) a first container, and a first vector comprising an expression cassette for a gene-editing enzyme according to the second aspect of the invention located in the first container.

In another preferred embodiment, the kit further comprises:

(a2) a second container, and a second vector comprising an expression cassette for expressing the sgRNA in the second container.

In another preferred embodiment, the first vector and/or the second vector further comprises an expression cassette for expressing a selection marker.

In another preferred embodiment, the first container and the second container may be the same container or different containers.

In another preferred embodiment, the kit further comprises instructions describing the following: a method of infecting a cell with the first vector and the second vector to effect single-base site-directed editing of a gene in the cell.

It is to be understood that, within the scope of the present invention, the above-described technical features of the present invention and those specifically described below (e.g., in the examples) may be combined with each other to form new or preferred embodiments. For reasons of space, they are not described one by one here.

Drawings

FIG. 1 shows the SNV results of off-target RNA of each single-base editing system.

A: experimental design protocol.

B: DNA targeting efficiency of WT (n = 3 replicates), GFP (n = 3 replicates), APOBEC1 (n = 3 replicates), BE3 (n = 3 replicates) and BE3-site 3 (n = 2 replicates). Note that APOBEC1 is the cytosine deaminase of BE3.

C: DNA targeting efficiency of WT, GFP, APOBEC1, BE3 and BE3-RNF2. n = 3 replicates per group.

D: DNA targeting efficiency of WT, GFP, TadA-TadA*, ABE7.10 and ABE7.10-site 1. n = 3 replicates per group. Note that TadA-TadA* (a heterodimer of the wild-type TadA enzyme and the evolved TadA enzyme) is the adenine deaminase of ABE7.10; the evolved TadA is denoted TadA*.

E: DNA targeting efficiency of WT, GFP, TadA-TadA*, ABE7.10 and ABE7.10-site 2. n = 3 replicates per group.

F, G: Comparison of off-target RNA SNVs in the BE3 and ABE7.10 groups.

H: Representative distribution of off-target RNA SNVs on human chromosomes for GFP, BE3 and ABE7.10. Chromosomes are indicated in different colors. The GFP group served as the control for all comparisons. All values are expressed as mean ± SEM. *p < 0.05, **p < 0.01, ***p < 0.001, unpaired t-test.

FIG. 2 shows the characterization of the off-target RNA SNV.

A: Ratios of G>A and C>U mutations for GFP (n = 6 replicates), APOBEC1 (n = 3 replicates), BE3 (n = 3 replicates), BE3-site 3 (n = 2 replicates) and BE3-RNF2 (n = 3 replicates).

B: Ratios of A>G and U>C mutations for GFP (n = 6 replicates), TadA-TadA* (n = 3 replicates), ABE7.10-site 1 (n = 3 replicates) and ABE7.10-site 2 (n = 3 replicates).

C: distribution of mutation types for each group. The numbers indicate the percentage of a certain mutation among all mutations.

D: ratio of shared RNA SNV between any two samples in the BE3 and ABE7.10 groups. The proportion in each cell was calculated by dividing the number of overlapping RNA SNVs between the two samples by the number of RNA SNVs in the row.

E: The ABE7.10-induced non-synonymous mutations with the highest editing rates located on oncogenes and tumor suppressor genes. Gene names are indicated in blue, amino acid mutations in red and single nucleotide transitions in green. The GFP group served as the control for all comparisons. All values are expressed as mean ± SEM. *p < 0.05, **p < 0.01, ***p < 0.001, unpaired t-test.

FIG. 3 shows the results of SNV analysis of single cell RNA from cells transfected with the base editor.

A: SNV profile analyzed by single cell RNA sequencing method.

B: expression pattern of ABE, BE3 or GFP in single cells from single cell RNA-seq data.

C: Number of off-target RNA SNVs detected in individual cells treated with GFP (n = 15 cells), BE3-site 3 (n = 4 cells) and ABE7.10-site 1 (n = 9 cells).

D: Ratio of G>A and C>U mutations.

E: Ratios of A>G and U>C mutations for GFP (n = 15 cells), BE3-site 3 (n = 4 cells) and ABE7.10-site 1 (n = 9 cells).

F: Distribution of mutation types in each cell. The numbers indicate the percentage of a certain mutation among all mutations.

G, H: Ratio of shared SNVs between any two samples in the same group. The proportion in each cell is calculated by dividing the number of overlapping SNVs between the two samples by the number of SNVs of the sample in the row.

I: Editing rates of SNVs located on cancer-associated genes that occurred in at least 3 ABE7.10-edited single cells. The GFP group served as the control for all comparisons. All values are expressed as mean ± SEM. *p < 0.05, **p < 0.01, ***p < 0.001, unpaired t-test.

FIG. 4 shows the result of eliminating off-target RNA SNV by rational design of deaminase.

A: schematic representation of BE3 and ABE7.10 variants. All deaminase mutations were performed in the context of BE3/ABE 7.10. Point mutations are indicated by red lines.

B: Targeting efficiencies for GFP (n = 3 replicates), BE3-site 3 (n = 2 replicates), BE3(hA3A)-site 3 (n = 3 replicates) and BE3(W90A)-site 3 (n = 3 replicates).

C: Comparison of off-target RNA SNVs in the BE3-site 3 treated groups.

D: Targeting efficiency for the GFP, ABE7.10-site 1, ABE7.10(D53G)-site 1 and ABE7.10(F148A)-site 1 groups. n = 3 replicates per group.

E: Comparison of off-target RNA SNVs in the ABE7.10 treated groups.

F: Comparison of the editing efficiencies of ABE7.10 and ABE7.10(F148A) at four different sites. n = 3 replicates per group.

G: Representative editing sites showing that ABE7.10(F148A) narrows the width of the editing window. All values are expressed as mean ± SEM. *p < 0.05, **p < 0.01, ***p < 0.001, unpaired t-test.

FIG. 5 shows a schematic representation of a plasmid.

FIG. 6 shows a representative distribution of off-target RNA SNV on chromosomes.

A: APOBEC1, BE3-site 3 and BE3-RNF2; B: TadA-TadA*, ABE7.10-site 1 and ABE7.10-site 2.

Figure 7 shows the distribution of mutation types for each repeat of all groups. The numbers indicate the percentage of a certain type of mutation among all mutations.

A: distribution of mutation types for each replicate of the GFP group.

B: Distribution of mutation types for each replicate of the APOBEC1 and BE3 groups, with or without sgRNAs.

C: Distribution of mutation types for each replicate of the TadA-TadA* and ABE7.10 groups, with or without sgRNAs.

FIG. 8 shows that in all BE3 and ABE7.10 transfected groups, the number of genes containing overlapping off-target RNA SNVs was significantly higher than that obtained for randomly simulated genes. P values were calculated by a two-sided Student's t-test.

FIG. 9 shows the similarity between the sequences adjacent to off-target RNA SNVs and the target sequences.

FIG. 10 shows the editing rates of BE3-induced non-synonymous mutations located on oncogenes and tumor suppressor genes. Single nucleotide transitions are indicated in green, amino acid mutations in red, and gene names in blue.

FIG. 11 shows the editing rates of ABE7.10-induced non-synonymous mutations located on oncogenes and tumor suppressor genes. Single nucleotide transitions are indicated in green, amino acid mutations in red, and gene names in blue.

FIG. 12 shows that off-target RNA SNVs were detected only in RNA, not in DNA. Sanger sequencing chromatograms showed that U-to-C mutations were observed only in the RNA of the two highest-ranked oncogenes, TOPRS and CSDE1.

FIG. 13 shows the expression levels of the transfected vectors in single cells. The expression levels of GFP, APOBEC1 and TadA-TadA* were quantified in all sequenced single cells. Thresholds are indicated by blue dashed lines. The thresholds on log2(FPKM + 1) for GFP, BE3 and ABE7.10 were 0.3, 1 and 0.3, respectively. Cells with expression levels above the threshold were included in further analysis.
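The filtering rule in this caption can be sketched in a few lines: a cell is retained when log2(FPKM + 1) of the transfected construct exceeds the threshold for its group. The thresholds are the ones stated above; the function name, the cell labels and the FPKM values are hypothetical illustrations, not data from the study.

```python
import math

# Thresholds on log2(FPKM + 1) as stated in the caption.
THRESHOLDS = {"GFP": 0.3, "BE3": 1.0, "ABE7.10": 0.3}


def passes_threshold(fpkm: float, group: str) -> bool:
    """True if log2(FPKM + 1) is strictly above the threshold for the given group."""
    return math.log2(fpkm + 1.0) > THRESHOLDS[group]


# Hypothetical FPKM values of the transfected construct in three single cells.
cells = {"SC1": 0.1, "SC2": 1.5, "SC3": 0.9}
kept = [name for name, fpkm in cells.items() if passes_threshold(fpkm, "BE3")]
print(kept)
```

The same toy values give a different selection under the lower GFP or ABE7.10 threshold, which is why the threshold is looked up per group.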

FIG. 14 shows the mutation type distribution of all single cells.

A: Distribution of mutation types in GFP-transfected single cells (n = 16 cells).

B: Distribution of mutation types in BE3-site 3-transfected single cells (n = 31 cells). Cells with APOBEC1 expression levels above the threshold are boxed in red.

C: Distribution of mutation types in ABE7.10-site 1-transfected single cells (n = 28 cells). Cells with TadA-TadA* expression levels above the threshold are boxed in red. The numbers indicate the percentage of a certain mutation among all mutations. SC denotes a single cell.

FIG. 15 shows the distribution of off-target RNA SNVs on human chromosomes for all single cells with expression levels above the threshold.

A: Distribution of off-target RNA SNVs on human chromosomes in GFP-transfected single cells (n = 15).

B: Distribution of off-target RNA SNVs on human chromosomes in BE3-site 3-transfected single cells (n = 4).

C: Distribution of off-target RNA SNVs on human chromosomes in ABE7.10-site 1-transfected single cells (n = 9).

FIG. 16 shows the editing rate of BE 3-induced nonsynonymous mutations located on oncogenes and tumor suppressor genes in single cells. Single nucleotide transitions are indicated in green, amino acid mutations in red, and gene names in blue.

Figure 17 shows the editing rates of ABE7.10-induced nonsynonymous mutations located on oncogenes and tumor suppressor genes in single cells. Single nucleotide transitions are indicated in green, amino acid mutations in red, and gene names in blue.

FIG. 18 shows the representative distribution of off-target RNA SNV on human chromosomes of engineered BE3 and ABE7.10 variants.

Figure 19 shows the average distribution of mutation types for the engineered variants of BE3 and ABE7.10, with n = 3 for each group.

Figure 20 shows the distribution of mutation types for each sample of BE3 and the engineered variants of ABE 7.10.

Figure 21 shows the ratio of shared RNA SNV between any two samples in the engineered variants of BE3 and ABE 7.10. The proportion in each cell was calculated by dividing the number of overlapping RNA SNVs between the two samples by the number of RNA SNVs in the row.
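The row-normalized overlap described in this caption can be written out explicitly: for a row sample r and a column sample c, the value is the number of SNVs shared by r and c divided by the number of SNVs in r, so the resulting matrix is generally not symmetric. The sketch below assumes each SNV is identified by a (chromosome, position, reference base, alternate base) tuple; that representation, the function name and the toy data are all hypothetical.

```python
from typing import Dict, Set, Tuple

SNV = Tuple[str, int, str, str]  # (chromosome, position, reference base, alternate base)


def shared_ratio_matrix(samples: Dict[str, Set[SNV]]) -> Dict[str, Dict[str, float]]:
    """ratio[row][col] = |SNVs(row) & SNVs(col)| / |SNVs(row)|."""
    matrix: Dict[str, Dict[str, float]] = {}
    for row, row_snvs in samples.items():
        matrix[row] = {}
        for col, col_snvs in samples.items():
            if not row_snvs:
                matrix[row][col] = 0.0  # avoid division by zero for empty samples
            else:
                matrix[row][col] = len(row_snvs & col_snvs) / len(row_snvs)
    return matrix


# Toy data: rep1 has 2 SNVs, rep2 has 4, and they share exactly one.
samples = {
    "rep1": {("chr1", 100, "A", "G"), ("chr2", 200, "A", "G")},
    "rep2": {("chr1", 100, "A", "G"), ("chr3", 300, "T", "C"),
             ("chr4", 50, "A", "G"), ("chr5", 7, "A", "G")},
}
ratios = shared_ratio_matrix(samples)
print(ratios["rep1"]["rep2"], ratios["rep2"]["rep1"])
```

In the toy data the shared SNV accounts for half of rep1 but only a quarter of rep2, which illustrates why the denominator is tied to the row.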

FIG. 22 shows the editing window widths of ABE7.10 (n = 3) and ABE7.10(F148A) (n = 3).

Figure 23 shows the homology of TadA enzymes in various species.

Detailed Description

Through extensive and intensive research and large-scale screening, the present inventors have surprisingly found for the first time that when the phenylalanine (F) residues at position 148 of the TadA fragment and of the TadA* fragment of the adenine deaminase (TadA-TadA*) of the adenine base editor ABE are each mutated to alanine (A) (i.e., TadA(F148A)-TadA*(F148A)), the gene editing window can be significantly narrowed while effective DNA targeting activity is maintained, and the accuracy of gene editing can be significantly improved. Furthermore, experiments have shown that the RNA off-target effect of a gene editing system carrying this mutation (i.e., TadA(F148A)-TadA*(F148A)) is greatly reduced. The present invention has been completed on the basis of this finding.

Terms

As used herein, the term "base mutation" refers to a substitution, insertion, and/or deletion of a base at a position in a nucleotide sequence.

As used herein, the term "base substitution" refers to the mutation of a base at a position in a nucleotide sequence to another, different base, such as the mutation of A to G.

As used herein, "selectable marker gene" refers to a gene used in a transgenic process to select transgenic cells or transgenic animals. The selectable marker genes that can be used in the present application are not particularly limited and include the various selectable marker genes commonly used in the transgenic field; representative examples include (but are not limited to): fluorescent proteins or luciferases (e.g., firefly luciferase, Renilla luciferase), green fluorescent protein, yellow fluorescent protein, red fluorescent protein, or combinations thereof.

As used herein, the term "Cas protein" refers to a nuclease. One preferred Cas protein is the Cas9 protein. Typical Cas9 proteins include (but are not limited to): Cas9 derived from Staphylococcus aureus. In the present invention, the Cas9 protein may also be replaced by a Cpf1 nuclease, and the Cpf1 nuclease is derived from a source selected from the group consisting of: Acidaminococcus, Lachnospiraceae, mutants of Acidaminococcus, or mutants of Lachnospiraceae.

Adenine deaminase TadA

TadA is a prokaryotic RNA editing enzyme.

The TadA enzyme has adenine deaminase activity and is capable of deaminating adenine (A) to hypoxanthine (inosine, I). The recombinant TadA protein forms a homodimer and produces inosine by deaminating the adenosine residue at the wobble position of tRNA Arg-2.

As shown in FIG. 23, TadA is highly homologous across species. For example, E. coli TadA shows sequence similarity to the yeast tRNA deaminase subunit Tad2p.

In many species, the amino acid residue is highly conserved, particularly at the position corresponding to position 148 of the sequence shown in SEQ ID NO 1 of the present invention.

As used herein, the terms "TadA*7.10" and "TadA*" are used interchangeably and refer to a mutant based on the amino acid sequence of the wild-type TadA enzyme of the invention, in which the mutated amino acid residues include W23R, H36L, P48A, R51L, L84F, A106V, D108N, H123Y, S146C, D147Y, R152P, E155V, I156F and K157N.

Accordingly, the terms "ABE 7.10", "TadA-TadA" are used interchangeably and refer to proteins comprising in their amino acid sequence the amino acid sequences of the wild-type TadA enzyme and of the TadA enzyme not mutated as described in the present invention.

In one embodiment of the invention, the wild-type TadA enzyme has the amino acid sequence shown in SEQ ID No. 1 and the TadA enzyme has the amino acid sequence shown in SEQ ID No. 2.

The gene-editing enzyme of the present invention and nucleic acid encoding the same

As used herein, the terms "gene-editing enzyme", "gene-editing enzyme of the invention", "TadA(F148A)-TadA*(F148A) of the invention" and "ABE7.10(F148A)" are used interchangeably and refer to a gene-editing enzyme having the structure of formula I as described in the second aspect of the invention:

Z1-L1-Z2-L2-Z3-L3-Z4 (I)

wherein,

Z1 is the amino acid sequence of adenine deaminase TadA;

Z2 is the amino acid sequence of the TadA* enzyme;

and Z1 and/or Z2 is an amino acid sequence of a mutein according to the first aspect of the invention;

Z3 is a coding sequence of Cas9 nuclease;

L1, L2 and L3 are each independently an optional linker peptide sequence;

Z4 is absent or a nuclear localization signal element (NLS);

and each "-" is independently a peptide bond.

In a preferred embodiment, the amino acid sequence of Z1 is the amino acid sequence shown in SEQ ID NO: 1 carrying the F148A mutation at position 148.

In a preferred embodiment, the amino acid sequence of Z2 is the amino acid sequence shown in SEQ ID NO: 2 carrying the F148A mutation at position 148.

In a preferred embodiment, the amino acid sequence of Z3 is shown in SEQ ID NO: 8.

In one embodiment of the present invention, each of L1, L2 and L3 independently has an amino acid sequence selected from the group consisting of: GGS, (GGS)2, (GGS)3, (GGS)4, (GGS)5, (GGS)6, (GGS)7, or a combination thereof, where the number after the parentheses is the repeat count.

In a preferred embodiment, the amino acid sequence of L1 is SGGSSGGSSGSETPGTSESATPESSGGSSGGS (SEQ ID NO: 5); the amino acid sequence of L2 is SGGSSGGSSGSETPGTSESATPESSGGSSGGSGS (SEQ ID NO: 6); the amino acid sequence of L3 was SGGS (SEQ ID NO: 7).

In a preferred embodiment, Z4 is a nuclear localization signal element (NLS) and the amino acid sequence is PKKKRKV (SEQ ID NO: 9).
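The arrangement of formula I and the F148A substitution can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the function names are hypothetical, and the Z1, Z2 and Z3 strings are toy placeholders rather than the sequences of SEQ ID NO: 1, 2 and 8, which are not reproduced here. Only the (GGS)n linker pattern, the SGGS linker and the PKKKRKV NLS are taken from the text.

```python
def apply_substitution(seq, position, expected, replacement):
    """Replace the residue at a 1-based position after checking the original residue."""
    index = position - 1
    if not 0 <= index < len(seq):
        raise ValueError(f"position {position} is outside the sequence")
    if seq[index] != expected:
        raise ValueError(f"expected {expected} at position {position}, found {seq[index]}")
    return seq[:index] + replacement + seq[index + 1:]


def ggs_linker(repeats):
    """Build a (GGS)n linker peptide."""
    return "GGS" * repeats


def assemble_formula_i(z1, z2, z3, l1="", l2="", l3="", z4=""):
    """Concatenate the elements in the order Z1-L1-Z2-L2-Z3-L3-Z4."""
    return z1 + l1 + z2 + l2 + z3 + l3 + z4


# Taken from the text: linker L3 (SEQ ID NO: 7) and the NLS (SEQ ID NO: 9).
L3 = "SGGS"
NLS = "PKKKRKV"

# Toy placeholders (NOT the real sequences); each has F at position 148.
toy_z1 = "M" + "G" * 146 + "F" + "G" * 10
toy_z2 = "M" + "S" * 146 + "F" + "S" * 10
toy_z3 = "MDKK"

z1 = apply_substitution(toy_z1, 148, "F", "A")
z2 = apply_substitution(toy_z2, 148, "F", "A")
fusion = assemble_formula_i(z1, z2, toy_z3,
                            l1=ggs_linker(3), l2=ggs_linker(3), l3=L3, z4=NLS)
print(len(fusion), fusion[-15:])
```

Checking the expected residue before substituting guards against numbering shifts between a given sequence and SEQ ID NO: 1.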

In a preferred embodiment of the present invention, a typical amino acid sequence of the gene-editing enzyme of the present invention is shown in SEQ ID NO. 10.

The invention also includes polypeptides or proteins that are homologous to the sequence shown in SEQ ID NO: 10 (preferably 60% or more, 70% or more, 80% or more, more preferably 90% or more, more preferably 95% or more, most preferably 98% or more, e.g., 99% homology) and have the same or similar functions.

The "same or similar functions" mainly refer to: "Activity of catalyzing the hydrolytic deamination of adenine to form hypoxanthine".

It should be understood that amino acid numbering in the gene-editing enzymes of the invention is based on SEQ ID NO: 10. When a particular gene-editing enzyme has 80% or more homology to the sequence shown in SEQ ID NO: 10, its amino acid numbering may be shifted relative to that of SEQ ID NO: 10, for example by 1-5 positions toward the N- or C-terminus. Using sequence alignment techniques conventional in the art, those skilled in the art will generally recognize that such a shift is within a reasonable range, and mutants having 80% (e.g., 90%, 95%, 98%) homology and the same or similar catalytic activity should not be excluded from the scope of the gene-editing enzymes of the invention merely because of a shift in amino acid numbering.

The gene-editing enzymes of the invention are synthetic or recombinant proteins, i.e., may be the products of chemical synthesis or produced using recombinant techniques from prokaryotic or eukaryotic hosts (e.g., bacteria, yeast, plants). Depending on the host used in the recombinant production protocol, the gene-editing enzymes of the invention may be glycosylated or may be non-glycosylated. The gene-editing enzymes of the invention may or may not also include an initial methionine residue.

The invention also includes fragments, derivatives and analogues of the gene-editing enzymes. As used herein, the terms "fragment," "derivative," and "analog" refer to a protein that retains substantially the same biological function or activity as the gene-editing enzyme.

The gene-editing enzyme fragment, derivative or analogue of the present invention may be (i) a gene-editing enzyme in which one or more conserved or non-conserved amino acid residues (preferably conserved amino acid residues) are substituted, and such substituted amino acid residues may or may not be encoded by the genetic code, or (ii) a gene-editing enzyme having a substituent group in one or more amino acid residues, or (iii) a mature gene editing enzyme fused to another compound, such as a compound that extends the half-life of the gene editing enzyme, e.g., polyethylene glycol, or (iv) a gene-editing enzyme formed by fusing an additional amino acid sequence to the gene-editing enzyme sequence (e.g., a leader sequence or a secretory sequence or a sequence used to purify the gene-editing enzyme or a proprotein sequence, or a fusion protein formed with an antigen IgG fragment). Such fragments, derivatives and analogs are within the purview of those skilled in the art in view of the teachings herein. In the present invention, conservatively substituted amino acids are preferably generated by amino acid substitutions according to Table I.

TABLE I

In addition, the gene-editing enzyme of the present invention may be modified. Modified forms (generally without alteration of the primary structure) include: forms of the gene-editing enzyme chemically derivatized in vivo or in vitro, for example by acetylation or carboxylation. Modifications also include glycosylation, such as glycosylation introduced during synthesis and processing of the gene-editing enzyme or during further processing steps. Such modification may be accomplished by exposing the gene-editing enzyme to an enzyme that performs glycosylation, such as a mammalian glycosylase or deglycosylase. Modified forms also include sequences having phosphorylated amino acid residues (e.g., phosphotyrosine, phosphoserine, phosphothreonine). Also included are gene-editing enzymes modified to improve their resistance to proteolysis or to optimize their solubility.

The term "polynucleotide encoding a gene-editing enzyme" may include a polynucleotide encoding a gene-editing enzyme of the present invention, and may also include polynucleotides that additionally include coding and/or non-coding sequences.

The invention also relates to variants of the above polynucleotides which encode fragments, analogs and derivatives of the polypeptides or gene-editing enzymes having the same amino acid sequence as the present invention. These nucleotide variants include substitution variants, deletion variants and insertion variants. As is known in the art, an allelic variant is an alternative form of a polynucleotide, which may involve a substitution, deletion, or insertion of one or more nucleotides without substantially altering the function of the gene-editing enzyme it encodes.

The present invention also relates to polynucleotides which hybridize to the sequences described above and which have at least 50%, preferably at least 70%, and more preferably at least 80% identity with them. The present invention particularly relates to polynucleotides which hybridize under stringent conditions to the polynucleotides of the present invention. In the present invention, "stringent conditions" mean: (1) hybridization and washing at low ionic strength and high temperature, such as 0.2× SSC, 0.1% SDS, 60 °C; or (2) hybridization in the presence of a denaturing agent, such as 50% (v/v) formamide, 0.1% calf serum/0.1% Ficoll, 42 °C; or (3) hybridization that occurs only when the identity between the two sequences is at least 90%, preferably at least 95%.

The gene-editing enzymes and polynucleotides of the invention are preferably provided in isolated form, and more preferably, are purified to homogeneity.

The full-length sequence of the polynucleotide of the present invention can be obtained by PCR amplification, recombination, or artificial synthesis. For PCR amplification, primers can be designed based on the nucleotide sequences disclosed herein, particularly open reading frame sequences, and the sequences can be amplified using commercially available cDNA libraries or cDNA libraries prepared by conventional methods known to those skilled in the art as templates. When the sequence is long, it is often necessary to perform two or more PCR amplifications, and then splice the amplified fragments together in the correct order.

Once the sequence of interest has been obtained, it can be produced in large quantities by recombinant methods. This is usually done by cloning it into a vector, transferring the vector into cells, and then isolating the relevant sequence from the propagated host cells by conventional methods.

In addition, the relevant sequence can be produced by artificial synthesis, especially when the fragment is short. Generally, long fragments are obtained by first synthesizing several small fragments and then ligating them.

At present, DNA sequences encoding the proteins of the present invention (or fragments or derivatives thereof) can be obtained entirely by chemical synthesis. The DNA sequence may then be introduced into various existing DNA molecules (e.g., vectors) and cells known in the art. Furthermore, mutations can also be introduced into the protein sequences of the invention by chemical synthesis.

PCR-based methods for amplifying DNA/RNA are preferably used to obtain the polynucleotides of the invention. In particular, when it is difficult to obtain a full-length cDNA from a library, the RACE method (rapid amplification of cDNA ends) is preferably used. Primers for PCR can be appropriately selected based on the sequence information of the present invention disclosed herein and synthesized by conventional methods. The amplified DNA/RNA fragments can be isolated and purified by conventional methods, such as gel electrophoresis.

The method of the invention

The invention also provides a method for site-directed single-base editing of a gene, which comprises the following steps:

(i) providing a cell and a first vector and a second vector, wherein the first vector comprises an expression cassette for a gene editing enzyme according to the second aspect of the invention and the second vector comprises an expression cassette for expression of a sgRNA;

(ii) infecting said cell with said first vector and said second vector, thereby performing site-directed single-base editing within said cell.

In another preferred embodiment, said first vector comprises a first nucleic acid construct, and said first nucleic acid construct has the structure of formula II in the 5'-to-3' direction:

P1-X1-L4-X2 (II)

wherein,

P1 is a first promoter sequence;

X1 is a nucleotide sequence encoding a gene-editing enzyme according to the second aspect of the invention;

L4 is absent or a linker sequence;

X2 is a polyA sequence;

and each "-" is independently a bond or a nucleotide linker sequence.

The first promoter is selected from the following group: a CMV promoter, a CAG promoter, a PGK promoter, an EF1α promoter, an EFS promoter, or a combination thereof. In a preferred embodiment, the first promoter sequence is a CMV promoter.

In one embodiment of the invention, the length of the linker sequence is 30-120nt, preferably 48-96nt, and preferably a multiple of 3.
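The length constraints on the linker sequence can be checked mechanically. The following is a minimal sketch in Python; the function name and the example sequence are hypothetical, and only the numeric ranges (30-120 nt, preferably 48-96 nt, preferably a multiple of 3) come from the text.

```python
def check_linker_length(linker_nt):
    """Report which of the stated length criteria a nucleotide linker satisfies."""
    n = len(linker_nt)
    return {
        "within_range": 30 <= n <= 120,           # 30-120 nt
        "within_preferred_range": 48 <= n <= 96,  # preferably 48-96 nt
        "multiple_of_three": n % 3 == 0,          # preferably a multiple of 3
    }


# Hypothetical 54-nt linker made of six 9-nt repeats.
example_linker = "GGAGGCTCT" * 6
print(check_linker_length(example_linker))
```

A length that is a multiple of 3 corresponds to a whole number of codons.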

In the method, the first vector and the second vector may be the same or different. In a preferred embodiment, the first vector and the second vector are the same vector.

Preferably, the first vector and/or the second vector further comprises an expression cassette for expressing a selectable marker. The selectable marker is selected from the following group: green fluorescent protein, yellow fluorescent protein, red fluorescent protein, blue fluorescent protein, or a combination thereof.

In one embodiment of the invention, the method is non-diagnostic and non-therapeutic.

In the method of the invention, the cells are from the following species: a human, a non-human mammal, poultry, a plant, or a microorganism. The non-human mammal includes a rodent (e.g., mouse, rat), rabbit, cow, pig, sheep, horse, dog, cat, or non-human primate (e.g., monkey).

In one embodiment of the invention, the cell is selected from the group consisting of: somatic cells, stem cells, germ cells, non-dividing cells, or combinations thereof. Preferably, the cell is selected from the group consisting of: kidney cells, epithelial cells, endothelial cells, nerve cells, or a combination thereof.

In the invention, when a gene is edited by the method, the editing window spans bases 4 to 7 of the 20-base sequence targeted by the sgRNA; editing efficiency is highest at base 5 and falls off markedly on either side. By contrast, the unmutated ABE7.10 editing system has a wider editing window, spanning bases 3 to 9, with editing efficiency highest at base 5 and decreasing gradually toward either side.
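The difference between the two editing windows can be made concrete with a minimal Python sketch that lists the adenines of a 20-base target falling inside each window. The function name and the target sequence are hypothetical; the window boundaries (bases 4-7 and bases 3-9) are the ones stated above. The sketch only reports positions and makes no claim about editing efficiency at each position.

```python
from typing import List, Tuple

# Editing windows from the text, as 1-based inclusive positions within
# the 20-base sequence targeted by the sgRNA.
WINDOW_F148A = (4, 7)
WINDOW_ABE710 = (3, 9)


def adenines_in_window(target: str, window: Tuple[int, int]) -> List[int]:
    """Return the 1-based positions of A bases that lie inside the window."""
    seq = target.upper()
    if len(seq) != 20:
        raise ValueError("expected a 20-base target sequence")
    start, end = window
    return [pos for pos in range(start, end + 1) if seq[pos - 1] == "A"]


# Hypothetical 20-base target, for illustration only.
target = "GCAAGATCACGTTAGCCTGA"
print(adenines_in_window(target, WINDOW_F148A))   # [4, 6]
print(adenines_in_window(target, WINDOW_ABE710))  # [3, 4, 6, 9]
```

For this toy target, the narrower window leaves two candidate adenines instead of four.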

The main advantages of the invention include:

1) The editing window of the single-base editing system ABE is narrowed, which greatly improves the accuracy of single-base editing. When the method is used for gene editing, the editing window spans bases 4 to 7 of the 20-base sequence targeted by the sgRNA; editing efficiency is highest at base 5 and falls off markedly on either side. By contrast, the unmutated ABE7.10 editing system has a wider editing window, spanning bases 3 to 9, with editing efficiency highest at base 5 and decreasing gradually toward either side.

2) Point mutations generated at the RNA level by the single-base editing system ABE are almost eliminated, which greatly improves the specificity of the system.

3) ABE7.10(F148A) retains almost all of the editing activity of ABE7.10, with consistent activity at the targeted editing site.

The invention will be further illustrated with reference to the following specific examples. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Experimental procedures for which no specific conditions are noted in the following examples generally follow conventional conditions, such as those described in Sambrook et al., Molecular Cloning: A Laboratory Manual (New York: Cold Spring Harbor Laboratory Press, 1989), or the conditions recommended by the manufacturer. Unless otherwise indicated, percentages and parts are percentages and parts by weight.

The materials and reagents used in the examples were all commercially available products unless otherwise specified.

Methods and materials

Transient transfection and sequencing

Plasmids were constructed according to standard protocols using NEBuilder HiFi DNA Assembly Master Mix (New England Biolabs). 293T cells were seeded in 10 cm dishes and cultured at 37 °C with 5% CO2 in Dulbecco's modified Eagle's medium (DMEM, Thermo Fisher Scientific) supplemented with 10% FBS (Thermo Fisher Scientific) and penicillin/streptomycin. Cells were transfected with 30 μg of plasmid using Lipofectamine 3000 (Thermo Fisher Scientific). Three days after transfection, cells were digested with 0.05% trypsin (Thermo Fisher Scientific) and prepared for FACS. GFP-positive cells were sorted and stored in DMEM or TRIzol (Ambion) for determination of DNA base editing or for RNA-seq. To determine the efficiency of DNA base editing, cells were lysed using a one-step mouse genotyping kit (Vazyme), followed by deep sequencing using Hi-TOM or by Sanger sequencing quantified with EditR 1.0.8. For RNA-seq, ~500,000 cells were collected, and RNA was extracted according to standard protocols and converted to cDNA, which was used for high-throughput RNA-seq.

RNA editing analysis by RNA sequencing

High-throughput mRNA sequencing (RNA-seq) was performed on an Illumina HiSeq at an average coverage of 125×. FastQC (v0.11.3) and Trimmomatic (v0.36) were used for quality control. Qualified reads were mapped to the reference genome (Ensembl GRCh38) in 2-pass mode using STAR (v2.5.2b), with the parameters implemented by the ENCODE project. Picard tools (v2.3.0) were then applied to sort the mapped BAM files and mark duplicates. The refined BAM files were then subjected to splitting of reads spanning splice junctions, local realignment, base recalibration, and variant calling using the SplitNCigarReads, IndelRealigner, BaseRecalibrator, and HaplotypeCaller tools from GATK (v3.5), respectively. To identify variants with high confidence, clusters of at least 5 SNVs within a window of 35 bases were filtered out, and variants with base quality scores >25, mapping quality scores >20, and sequencing depths >20 were retained, applying the Fisher Strand (FS >30.0) and Qual By Depth (QD <2.0) filters.
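The high-confidence filtering step can be sketched in Python as follows. This is not the pipeline actually used (which relied on GATK); it is a minimal illustration in which the `Snv` record, its field names and the function names are hypothetical. Two interpretations are assumed: a cluster is any run of at least 5 SNVs on one chromosome whose positions span fewer than 35 bases, and FS > 30.0 and QD < 2.0 are treated as exclusion criteria, which is consistent with the single-cell criteria (FS ≤ 30.0, QD ≥ 2.0) given later in the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class Snv:
    chrom: str
    pos: int
    base_quality: float
    mapping_quality: float
    fs: float   # Fisher Strand value
    qd: float   # Qual By Depth value
    depth: int


def drop_clusters(snvs: List[Snv], window: int = 35, min_cluster: int = 5) -> List[Snv]:
    """Remove SNVs belonging to a cluster of >= min_cluster SNVs within `window` bases."""
    by_chrom: Dict[str, List[Snv]] = {}
    for snv in snvs:
        by_chrom.setdefault(snv.chrom, []).append(snv)
    clustered: Set[Tuple[str, int]] = set()
    for group in by_chrom.values():
        group.sort(key=lambda s: s.pos)
        right = 0
        for left in range(len(group)):
            # Extend the window to the last SNV fewer than `window` bases away.
            while right + 1 < len(group) and group[right + 1].pos - group[left].pos < window:
                right += 1
            if right - left + 1 >= min_cluster:
                clustered.update((s.chrom, s.pos) for s in group[left:right + 1])
    return [s for s in snvs if (s.chrom, s.pos) not in clustered]


def passes_quality(snv: Snv) -> bool:
    """Apply the thresholds from the text; FS/QD are assumed to be exclusion filters."""
    return (snv.base_quality > 25 and snv.mapping_quality > 20 and snv.depth > 20
            and not snv.fs > 30.0 and not snv.qd < 2.0)


# Hypothetical records: five clustered SNVs, one clean SNV, one failing QD.
snvs = [Snv("chr1", p, 30, 60, 1.0, 10.0, 50) for p in (100, 105, 110, 120, 130)]
snvs.append(Snv("chr1", 1000, 30, 60, 1.0, 10.0, 50))
snvs.append(Snv("chr1", 2000, 30, 60, 1.0, 1.5, 50))
kept = [s for s in drop_clusters(snvs) if passes_quality(s)]
print([s.pos for s in kept])
```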

Any reliable variants found in wild-type 293T cells were considered SNPs and were filtered from the GFP and base editor transfected groups for off-target analysis. The edit rate was calculated as the number of mutant reads divided by the sequencing depth of each site. To analyze the predicted variant effects for each off-target, variant annotation was performed using the variant effect predictor (VEP, v94) and GRCh38 database.
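The background subtraction and the edit-rate calculation described above can be sketched in Python. The function names, the tuple layout used to identify a variant site, and the read counts are hypothetical and serve only as an illustration of the arithmetic.

```python
def subtract_background(sample_sites, wild_type_sites):
    """Drop variants also found in wild-type cells, which are treated as SNPs."""
    return set(sample_sites) - set(wild_type_sites)


def edit_rate(mutant_reads, depth):
    """Edit rate = number of mutant reads / sequencing depth at the site."""
    if depth <= 0:
        raise ValueError("sequencing depth must be positive")
    return mutant_reads / depth


# Hypothetical variant sites as (chromosome, position, reference, alternate).
wild_type = {("chr1", 1000, "A", "G")}
treated = {("chr1", 1000, "A", "G"), ("chr2", 500, "A", "G")}
off_targets = subtract_background(treated, wild_type)
print(off_targets)        # {('chr2', 500, 'A', 'G')}
print(edit_rate(12, 48))  # 0.25
```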

Single cell full-length RNA-seq library construction

After FACS, single human 293T cells were picked manually, lysed and cDNA synthesized using the Smart-seq2 protocol. Single cell cDNA was then amplified and fragmented as previously described (2, 3). Sequencing libraries (New England Biolabs) were constructed, quality checked and sequenced on the Illumina HiSeq X-Ten platform (Novogene) with paired-end 150-bp reads.

Processing Single cell RNA-seq data

The raw reads of single cell RNA-seq data were first trimmed and aligned with GRCh38 human transcriptome (STAR v2.5.2b). After deduplication, RNA SNVs from individual cells were identified using GATK software (v 3.5). Those SNVs detected in single cells with DP ≧ 20.0, FS ≦ 30.0, and QD ≧ 2.0 were retained for downstream analysis.

Statistical analysis

All values are shown as mean ± SEM. The unpaired Student's t-test (two-tailed) was used for comparisons, and p < 0.05 was considered statistically significant.
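As an illustration of these summary statistics, the following minimal Python sketch computes the mean, the SEM, and the unpaired Student's t statistic with pooled variance using only the standard library. The function names and the data are hypothetical. The sketch stops at the t statistic and degrees of freedom; a two-tailed p-value additionally requires the t distribution, which a statistics package such as SciPy (`scipy.stats.ttest_ind`) provides.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, standard error of the mean) using the sample standard deviation."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))


def student_t(a, b):
    """Unpaired Student's t statistic with pooled variance; returns (t, degrees of freedom)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / df
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, df


# Hypothetical measurements for two groups.
group_a = [1.0, 2.0, 3.0]
group_b = [4.0, 5.0, 6.0]
print(mean_sem(group_a))
print(student_t(group_a, group_b))
```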
