Construction of chimeric SaCas9 based on evolutionary information for enhanced and extended PAM site recognition

Document No.: 1624375. Publication date: 2020-01-14.

Reading note: This technology, "Construction of chimeric SaCas9 based on evolutionary information for enhanced and extended PAM site recognition", was designed by 谢震, 马大程, 张昭煜 and 许志锰 on 2018-07-05. Its main content is as follows. The invention provides a Cas9 protein mutant. The Cas9 protein mutant has: a framework region; and a PAM recognition region that recognizes at least one of the following nucleic acid sequences: 5'-NNNRRT-3'; 5'-NNNRRN-3'; 5'-NNNRCN-3'; 5'-NNNRTN-3'; 5'-NNNCAA-3'; 5'-NNNCAT-3'; 5'-NNNCGT-3'; 5'-NNNCGC-3'; 5'-NNNGTN-3'; 5'-NNNTCN-3'; 5'-NNNTTC-3'; 5'-NNNTTG-3'; 5'-NNNTTT-3'; where N = A, T, G or C, and R = A or G.

1. A mutant Cas9 protein, characterized in that it has:

a framework region; and

a PAM recognition region that recognizes at least one of the following nucleic acid sequences:

5'-NNNRRT-3', N = A, T, G or C, R = A or G;

5'-NNNRRN-3', N = A, T, G or C, R = A or G;

5'-NNNRCN-3', N = A, T, G or C, R = A or G;

5'-NNNRTN-3', N = A, T, G or C, R = A or G;

5'-NNNCAA-3', N = A, T, G or C;

5'-NNNCAT-3', N = A, T, G or C;

5'-NNNCGT-3', N = A, T, G or C;

5'-NNNCGC-3', N = A, T, G or C;

5'-NNNGTN-3', N = A, T, G or C;

5'-NNNTCN-3', N = A, T, G or C;

5'-NNNTTC-3', N = A, T, G or C;

5'-NNNTTG-3', N = A, T, G or C;

5'-NNNTTT-3', N = A, T, G or C.

2. A Cas9 protein mutant according to claim 1, wherein the framework regions have at least 70% homology with the framework regions of the following wild-type proteins; preferably, at least 80% homology; more preferably, at least 90% homology; more preferably, at least 95% homology; more preferably, at least 99% homology;

O13, O40, O23, O39, O26, O18, O38, O12, O36, O27, O10, O33, O34, O14, O44, O15, O28, O42, O20, O37, O24, O43, O30, O31, O32, O29, O16, O19, O25, O21, O17, O35, O22, SaCas9, SaCas9-KKH.

3. A Cas9 protein mutant according to claim 1, wherein the framework region has at least 90% homology with the framework region of SaCas9; more preferably, at least 95% homology; more preferably, at least 99% homology;

preferably, the framework region has at least one of the following mutations relative to the framework region of SaCas9:

position 499 is mutated to A,

position 500 is mutated to K,

position 654 is mutated to A,

position 655 is mutated to R,

position 782 is mutated to K,

position 968 is mutated to K,

position 1015 is mutated to H.

4. A Cas9 protein mutant according to claim 1, wherein the framework region has the amino acid sequences shown in SEQ ID NOs: 1-2 and 130.

5. A Cas9 protein mutant according to claim 1, characterized in that, relative to SaCas9, the PAM recognition region has at least one mutation compared with 982IGVNNDLLNRIEV994.

6. A Cas9 protein mutant according to claim 1, characterized in that, relative to SaCas9, the PAM recognition region has at most 13 mutations compared with 982IGVNNDLLNRIEV994;

preferably, there are at most 8 mutations.

7. A Cas9 protein mutant according to claim 1, characterized in that, relative to SaCas9, the PAM recognition region has at least one of the following mutations compared with 982IGVNNDLLNRIEV994:

position 982 is mutated to T, K, R or L,

position 983 is mutated to A, C or S,

position 984 is mutated to T, D,

position 985 is mutated to F, S, A, N,

position 986 is mutated to E, D, H, A, M,

position 987 is mutated to S, G, N, S, D, E, P,

position 988 is mutated to D, K, T, S, T, D, K, R, E, A,

position 989 is mutated to R, A, N, Q, G, E, T, K, S, G, H, V,

position 990 is mutated to S,

position 991 is mutated to I, V, L, K, T, M,

position 992 is mutated to V, L,

position 993 is mutated to Q,

position 994 is mutated to L, M, C, I, A.

8. The Cas9 protein mutant according to claim 1, wherein, relative to SaCas9 and compared with 982IGVNNDLLNRIEV994, the PAM recognition region has, with position 985 mutated to S, position 986 mutated to S and position 991 mutated to R;

preferably, with position 985 mutated to A, position 986 is mutated to M and position 991 is mutated to K;

preferably, with position 985 mutated to N, position 986 is mutated to N and position 991 is mutated to K;

preferably, with position 985 mutated to A, position 986 is mutated to N and position 991 is mutated to I;

preferably, with position 985 mutated to N, position 986 is mutated to H and position 991 is mutated to R;

preferably, with position 985 mutated to N, position 986 is mutated to S and position 991 is mutated to L;

preferably, with position 985 mutated to N, position 986 is mutated to N and position 991 is mutated to I;

preferably, with position 985 mutated to N, position 986 is mutated to N and position 991 is mutated to K;

preferably, with position 985 mutated to N, position 986 is mutated to N and position 991 is mutated to V;

preferably, with position 985 mutated to N, position 986 is mutated to D and position 991 is mutated to K;

preferably, with position 985 mutated to N, position 986 is mutated to S and position 991 is mutated to I;

preferably, with position 985 mutated to N, position 986 is mutated to D and position 991 is mutated to K;

preferably, with position 985 mutated to N, position 986 is mutated to N and position 991 is mutated to R;

preferably, with position 985 mutated to N, position 986 is mutated to H and position 991 is mutated to R;

preferably, with position 985 mutated to N, position 986 is mutated to E and position 991 is mutated to I;

preferably, with position 985 mutated to S, position 986 is mutated to S and position 991 is mutated to R;

preferably, with position 985 mutated to N, position 986 is mutated to D and position 991 is mutated to K;

preferably, with position 985 mutated to N, position 986 is mutated to A and position 991 is mutated to T;

preferably, with position 985 mutated to N, position 986 is mutated to D and position 991 is mutated to T;

preferably, with position 985 mutated to N, position 986 is mutated to N and position 991 is mutated to R;

preferably, with position 985 mutated to N, position 986 is mutated to N and position 991 is mutated to V;

preferably, with position 985 mutated to S, position 986 is mutated to M and position 991 is mutated to K;

preferably, with position 985 mutated to A, position 986 is mutated to M and position 991 is mutated to K;

preferably, with position 985 mutated to N, position 986 is mutated to N and position 991 is mutated to L;

preferably, with position 985 mutated to N, position 986 is mutated to N and position 991 is mutated to M;

preferably, with position 985 mutated to N, position 986 is mutated to N and position 991 is mutated to R;

preferably, with position 985 mutated to N, position 986 is mutated to D and position 991 is mutated to T;

preferably, with position 985 mutated to N, position 986 is mutated to N and position 991 is mutated to K;

preferably, with position 985 mutated to N, position 986 is mutated to N and position 991 is mutated to M;

preferably, with position 985 mutated to N, position 986 is mutated to N and position 991 is mutated to K;

preferably, with position 985 mutated to F, position 986 is mutated to S and position 991 is mutated to L;

preferably, with position 985 mutated to N, position 986 is mutated to N and position 991 is mutated to V;

preferably, with position 985 mutated to N, position 986 is mutated to S and position 991 is mutated to L;

preferably, with position 985 mutated to N, position 986 is mutated to S and position 991 is mutated to K;

preferably, with position 985 mutated to N, position 986 is mutated to S and position 991 is mutated to R;

preferably, with position 985 mutated to N, position 986 is mutated to N and position 991 is mutated to K;

preferably, with position 985 mutated to N, position 986 is mutated to N and position 991 is mutated to I.

9. A Cas9 protein mutant according to claim 1, characterized in that the PAM recognition region has an amino acid sequence shown in SEQ ID NO. 3-SEQ ID NO. 43;

optionally, the Cas9 protein mutant has an amino acid sequence shown as SEQ ID NO: 44-84 and 131.

10. A nucleic acid encoding a Cas9 protein mutant according to any one of claims 1 to 9.

11. The nucleic acid of claim 10, wherein the nucleic acid has the nucleotide sequence of any one of SEQ ID NOS 85-125 and 132.

12. A kit, comprising:

a first nucleic acid molecule encoding a mutant Cas9 protein of any one of claims 1-9; and

a second nucleic acid molecule encoding a gRNA.

13. The kit of claim 12, wherein the first nucleic acid molecule has the nucleotide sequence of any one of SEQ ID NOs 85-125 and 132.

14. The kit of claim 12, wherein the nucleotide sequence of the second nucleic acid molecule encoding the gRNA framework sequence has at least one of the following mutations compared to the nucleotide sequence of a wild-type gRNA framework sequence: U3C, U4A, U4C, U5C, A6G, A32G, A31T, A31G, A30G, T29C.

15. The kit of claim 14, wherein the second nucleic acid molecule has the nucleotide sequence of any one of SEQ ID NOS 127-129.

16. The kit of claim 12, wherein the first nucleic acid molecule and the second nucleic acid molecule are carried on the same expression vector.

17. The kit of claim 16, wherein said same vector is an adenoviral vector.

18. A method for genetically modifying a cell, characterized in that a first nucleic acid molecule and a second nucleic acid molecule are introduced into the cell to be modified, the first nucleic acid molecule and the second nucleic acid molecule being as defined in any one of claims 12 to 17.

19. The method of claim 18, wherein the PAM recognition region sequence and the gRNA sequence of the Cas9 protein mutant are determined based on a gene sequence to be engineered.

20. The method of claim 18, wherein the PAM recognition region sequence and the gRNA sequence are determined based on the following relationship:

21. The method of claim 18, wherein the genetic modification comprises knockout or regulation of expression of a predetermined site.

22. A cell obtained by the method of any one of claims 18 to 21.

23. The cell of claim 22, wherein the cell is an animal cell, a plant cell, or a microbial cell.

Technical Field

The invention relates to the technical field of biology, in particular to a Cas9 protein mutant, nucleic acid, a kit, a method for genetically modifying cells and the cells.

Background

The CRISPR/Cas9 nuclease enables efficient gene editing in a variety of species and cell types. Cas9 is directed to different target sites by a guide RNA formed by artificially fusing crRNA and tracrRNA. However, Cas9 can only bind at sites adjacent to a specific PAM sequence.

The widely used SpCas9 recognizes the PAM sequence NGG, while SaCas9 recognizes the PAM sequence NNGRRT, which limits the range of sites that SaCas9 can recognize. To extend the recognition range of SpCas9, variants recognizing multiple different PAMs have been obtained by directed evolution with screening in bacteria. Likewise, the PAM of KKH-SaCas9 was expanded to NNNRRT by introducing three point mutations. Although the PAM recognition range of KKH-SaCas9 is broader than that of SaCas9, in theory KKH-SaCas9 can still bind only 1/16 of sequences.
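The fractions quoted above follow from counting how many bases each degenerate position allows. The sketch below is illustrative only: the helper name `pam_fraction` is hypothetical, and it assumes that each position is independent and that the four bases are equally likely.

```python
from fractions import Fraction

# IUPAC codes that appear in the text: the four bases plus R (A or G) and N (any base).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "N": "ACGT"}


def pam_fraction(motif: str) -> Fraction:
    """Fraction of random sequences of len(motif) that match a degenerate PAM motif.

    Assumes independent positions and uniform base composition (a simplification).
    """
    frac = Fraction(1)
    for code in motif:
        frac *= Fraction(len(IUPAC[code]), 4)
    return frac


for name, motif in [("SpCas9", "NGG"), ("SaCas9", "NNGRRT"), ("KKH-SaCas9", "NNNRRT")]:
    print(name, motif, pam_fraction(motif))
```

Under these assumptions NNNRRT gives 1/16, matching the figure in the text, while the wild-type SaCas9 motif NNGRRT gives 1/64.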

Cas9 homologous proteins are widely distributed in bacteria, and many different Cas9 homologs have been identified. However, few of them have been shown to perform efficient gene editing in mammalian cells.

Therefore, Cas9 proteins with broad PAM recognition capability still need to be developed and improved, so as to make the gene editing capability of the CRISPR/Cas9 system more powerful.

Disclosure of Invention

The present application is based on the discovery and recognition by the inventors of the following facts and problems:

The inventors of the present application identified a series of different SaCas9 homologous proteins using evolutionary information and gene mining, and then designed a series of different Cas9 chimeras (cCas9) by using KKH-SaCas9 as the framework and replacing the 13-amino-acid peptide segment of the PAM interaction region with the corresponding sequences of other homologous proteins. These different cCas9 proteins have different PAM specificities: in addition to NNNRRT, different mutants can also recognize PAMs including NNNRRN, NNNRCN, NNNRTN, NNNCAA, NNNCAT, NNNCGT, NNNCGC, NNNGTN, NNNTCN, NNNTTC, NNNTTG and NNNTTT (N = A, T, G or C; R = A or G). The inventors thereby succeeded in extending the PAM recognition range of Cas9 to more than 1/2 (the PAMs listed above cover 49 of the 64 possible PAM species, that is, a recognition range of 49/64). This work not only broadens PAM preference but also yields a number of new chimeras.
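The combined coverage of several degenerate PAMs can be checked by expanding each motif and taking the union. The following is a minimal sketch, not the inventors' procedure: it considers only the last three PAM positions (the leading NNN is unconstrained), and the names `MOTIF_SUFFIXES` and `expand` are hypothetical. Under a literal IUPAC expansion of the thirteen motifs as listed, it counts 43 distinct suffixes out of 64, which differs from the figure of 49 quoted in the text; the difference may reflect PAMs that are not reproduced in this listing.

```python
from itertools import product

# IUPAC codes that appear in the text: the four bases plus R (A or G) and N (any base).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "N": "ACGT"}

# Positions 4-6 of the thirteen PAM motifs listed in the text.
MOTIF_SUFFIXES = [
    "RRT", "RRN", "RCN", "RTN", "CAA", "CAT", "CGT",
    "CGC", "GTN", "TCN", "TTC", "TTG", "TTT",
]


def expand(motif: str) -> set:
    """Return every concrete sequence matched by a degenerate motif."""
    return {"".join(p) for p in product(*(IUPAC[code] for code in motif))}


# Union over all motifs, so overlapping motifs (e.g. RRT within RRN) are counted once.
covered = set().union(*(expand(m) for m in MOTIF_SUFFIXES))
print(len(covered), "of", 4 ** 3)  # prints "43 of 64"
```

Taking the union matters here because several motifs overlap: NNNRRT is contained in NNNRRN, and NNNGTN is contained in NNNRTN.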

To this end, in a first aspect of the invention, the invention proposes a mutant Cas9 protein. According to an embodiment of the invention, it has: a framework region; and a PAM recognition region that recognizes at least one of the following nucleic acid sequences:

5'-NNNRRT-3', N = A, T, G or C, R = A or G;

5'-NNNRRN-3', N = A, T, G or C, R = A or G;

5'-NNNRCN-3', N = A, T, G or C, R = A or G;

5'-NNNRTN-3', N = A, T, G or C, R = A or G;

5'-NNNCAA-3', N = A, T, G or C;

5'-NNNCAT-3', N = A, T, G or C;

5'-NNNCGT-3', N = A, T, G or C;

5'-NNNCGC-3', N = A, T, G or C;

5'-NNNGTN-3', N = A, T, G or C;

5'-NNNTCN-3', N = A, T, G or C;

5'-NNNTTC-3', N = A, T, G or C;

5'-NNNTTG-3', N = A, T, G or C;

5'-NNNTTT-3', N = A, T, G or C.

Compared with Cas9, the Cas9 protein mutant of the embodiments of the invention has a recognition range expanded to close to 1/2, and its PAM preference is greatly broadened. Under the guidance of a guide RNA, the dsDNA regions that the Cas9 protein mutant can bind are greatly expanded, making the gene editing capability of the CRISPR/Cas9 system more powerful.

According to an embodiment of the present invention, the Cas9 protein mutant may further include at least one of the following additional technical features:

according to embodiments of the invention, the framework regions of the Cas9 protein mutant are at least 70% homologous to the framework regions of the following wild-type proteins; preferably, at least 80% homology; more preferably, at least 90% homology; more preferably, at least 95% homology; more preferably, at least 99% homology;

Compared with Cas9, the above Cas9 protein mutant of the embodiments of the invention has a broad PAM recognition range. Under the guidance of a guide RNA, the dsDNA regions that it can bind are greatly expanded, and the gene editing capability of the CRISPR/Cas9 system becomes stronger.

According to an embodiment of the invention, the framework region has at least 90% homology with SaCas9; more preferably, at least 95% homology; more preferably, at least 99% homology. Compared with SaCas9, the PAM recognition range of the Cas9 protein mutant is broader and can be expanded to close to 1/2.

According to a specific embodiment of the invention, the framework region has the amino acid sequences shown in SEQ ID NOs: 1-2 and 130.

MKRNYILGLDIGITSVGYGIIDYETRDVIDAGVRLFKEANVENNEGRRSKRGARRLKRRRRHRIQRVKKLLFDYNLLTDHSELSGINPYEARVKGLSQKLSEEEFSAALLHLAKRRGVHNVNEVEEDTGNELSTKEQISRNSKALEEKYVAELQLERLKKDGEVRGSINRFKTSDYVKEAKQLLKVQKAYHQLDQSFIDTYIDLLETRRTYYEGPGEGSPFGWKDIKEWYEMLMGHCTYFPEELRSVKYAYNADLYNALNDLNNLVITRDENEKLEYYEKFQIIENVFKQKKKPTLKQIAKEILVNEEDIKGYRVTSTGKPEFTNLKVYHDIKDITARKEIIENAELLDQIAKILTIYQSSEDIQEELTNLNSELTQEEIEQISNLKGYTGTHNLSLKAINLILDELWHTNDNQIAIFNRLKLVPKKVDLSQQKEIPTTLVDDFILSPVVKRSFIQSIKVINAIIKKYGLPNDIIIELAREKNSKDAQKMINEMQKRNRQTNERIEEIIRTTGKENAKYLIEKIKLHDMQEGKCLYSLEAIPLEDLLNNPFNYEVDHIIPRSVSFDNSFNNKVLVKQEENSKKGNRTPFQYLSSSDSKISYETFKKHILNLAKGKGRISKTKKEYLLEERDINRFSVQKDFINRNLVDTRYATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIANADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQEYKEIFITPHQIKHIKDFKDYKYSHRVDKKPNRKLINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMYHHDPQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNAHLDITDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEEAKKLKKISNQAEFIASFYKNDLIKINGELYRV(SEQ ID NO:1)。

NMIDITYREYLENMNDKRPPHIIKTIASKTQSIKKYSTDILGNLYEVKSKKHPQIIKKG (SEQ ID NO: 2).

MKRNYILGLDIGITSVGYGIIDYETRDVIDAGVRLFKEANVENNEGRRSKRGARRLKRRRRHRIQRVKKLLFDYNLLTDHSELSGINPYEARVKGLSQKLSEEEFSAALLHLAKRRGVHNVNEVEEDTGNELSTKEQISRNSKALEEKYVAELQLERLKKDGEVRGSINRFKTSDYVKEAKQLLKVQKAYHQLDQSFIDTYIDLLETRRTYYEGPGEGSPFGWKDIKEWYEMLMGHCTYFPEELRSVKYAYNADLYNALNDLNNLVITRDENEKLEYYEKFQIIENVFKQKKKPTLKQIAKEILVNEEDIKGYRVTSTGKPEFTNLKVYHDIKDITARKEIIENAELLDQIAKILTIYQSSEDIQEELTNLNSELTQEEIEQISNLKGYTGTHNLSLKAINLILDELWHTNDNQIAIFNRLKLVPKKVDLSQQKEIPTTLVDDFILSPVVKRSFIQSIKVINAIIKKYGLPNDIIIELAREKNSKDAQKMINEMQKRNAKTNERIEEIIRTTGKENAKYLIEKIKLHDMQEGKCLYSLEAIPLEDLLNNPFNYEVDHIIPRSVSFDNSFNNKVLVKQEENSKKGNRTPFQYLSSSDSKISYETFKKHILNLAKGKGRISKTKKEYLLEERDINRFSVQKDFINRNLVDTRYATARLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIANADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQEYKEIFITPHQIKHIKDFKDYKYSHRVDKKPNRKLINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMYHHDPQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNAHLDITDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEEAKKLKKISNQAEFIASFYKNDLIKINGELYRV(SEQ ID NO:130)。

The amino acid sequence shown in SEQ ID NO: 1 or SEQ ID NO: 130 is the framework region sequence on the N-terminal side of the PAM recognition region, and the amino acid sequence shown in SEQ ID NO: 2 is the framework region sequence on the C-terminal side of the PAM recognition region. That is, the PAM recognition region lies between the two framework region sequences: the C-terminus of the sequence shown in SEQ ID NO: 1 or SEQ ID NO: 130 is joined to the N-terminus of the PAM recognition region, and the N-terminus of the sequence shown in SEQ ID NO: 2 is joined to the C-terminus of the PAM recognition region.
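The layout described above (framework, then the 13-residue PAM recognition region, then framework) amounts to a simple concatenation. The sketch below only illustrates that arrangement: `assemble_chimera` is a hypothetical helper, and `N_FRAME_PLACEHOLDER` is a stand-in of 981 residues rather than the actual sequence of SEQ ID NO: 1 or SEQ ID NO: 130.

```python
WT_PAM_REGION = "IGVNNDLLNRIEV"  # residues 982-994 of SaCas9, as given in the text
PAM_START, PAM_END = 982, 994

# Placeholder for the upstream framework (positions 1-981); NOT the real SEQ ID NO: 1.
N_FRAME_PLACEHOLDER = "X" * (PAM_START - 1)
# Downstream framework, copied from SEQ ID NO: 2 in the text.
C_FRAME = "NMIDITYREYLENMNDKRPPHIIKTIASKTQSIKKYSTDILGNLYEVKSKKHPQIIKKG"


def assemble_chimera(n_frame: str, pam_region: str, c_frame: str) -> str:
    """Join upstream framework, PAM recognition region and downstream framework."""
    expected = PAM_END - PAM_START + 1  # 13 residues
    if len(pam_region) != expected:
        raise ValueError(f"PAM recognition region must be {expected} residues long")
    return n_frame + pam_region + c_frame


chimera = assemble_chimera(N_FRAME_PLACEHOLDER, WT_PAM_REGION, C_FRAME)
# With a 981-residue upstream framework, the region occupies positions 982-994 (1-based).
print(chimera[PAM_START - 1:PAM_END])
```

Swapping in a different 13-residue segment for `WT_PAM_REGION` gives the corresponding chimera while leaving both framework parts unchanged.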

According to embodiments of the invention, relative to SaCas9, the PAM recognition region has at least one mutation compared with 982IGVNNDLLNRIEV994.

According to embodiments of the invention, relative to SaCas9, the PAM recognition region has at most 13 mutations compared with 982IGVNNDLLNRIEV994, preferably at most 8 mutations, or at most 7 mutations, or at most 6 mutations, or at most 5 mutations, or at most 4 mutations, or at most 3 mutations.
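Counting mutations relative to 982IGVNNDLLNRIEV994 is a position-by-position comparison. The following sketch shows one way to list them in the usual "wild-type residue, position, new residue" notation; the function name `list_mutations` and the candidate segment used in the example are hypothetical and are not taken from the sequence listing.

```python
WT_PAM_REGION = "IGVNNDLLNRIEV"  # residues 982-994 of SaCas9, as given in the text
PAM_START = 982


def list_mutations(candidate: str, wild_type: str = WT_PAM_REGION, start: int = PAM_START) -> list:
    """List substitutions of a candidate segment relative to the wild-type segment."""
    if len(candidate) != len(wild_type):
        raise ValueError("candidate and wild-type segments must have the same length")
    return [
        f"{wt}{start + i}{new}"
        for i, (wt, new) in enumerate(zip(wild_type, candidate))
        if wt != new
    ]


# Hypothetical candidate with two substitutions, for illustration only.
mutations = list_mutations("IGVSSDLLNRIEV")
print(mutations)              # ['N985S', 'N986S']
print(len(mutations) <= 8)    # within the preferred limit of at most 8 mutations
```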

According to embodiments of the present invention, the Cas9 protein mutant has a mutation at any one or more of positions 982 to 994 compared with the Cas9 protein. The inventors found that SaCas9 homologs are poorly conserved at the amino acid residues that interact directly with the PAM, suggesting that these different Cas9 homologous proteins may recognize different PAM sequences. In contrast, the sequences flanking the PAM-interacting residues of SaCas9 are more conserved, and the three PAM-interacting residues at positions 985, 986 and 991 lie close together in the protein sequence. The inventors therefore transplanted the short peptides of the PAM interaction region from Cas9 homologous proteins of different origins directly onto SaCas9, thereby developing a series of chimeric proteins with different recognition specificities. The peptide segment at positions 982-994 of SaCas9 (the PAM recognition region) was selected for replacement, and SaCas9 chimeras with a higher success rate of PAM interaction and higher activity were obtained by screening.

According to embodiments of the invention, relative to SaCas9, the PAM recognition region has at least one of the following mutations compared with 982IGVNNDLLNRIEV994: position 982 is mutated to T, K, R or L; position 983 is mutated to A, C or S; position 984 is mutated to T, D; position 985 is mutated to F, S, A, N; position 986 is mutated to E, D, H, A, M; position 987 is mutated to S, G, N, S, D, E, P; position 988 is mutated to D, K, T, S, T, D, K, R, E, A; position 989 is mutated to R, A, N, Q, G, E, T, K, S, G, H, V; position 990 is mutated to S; position 991 is mutated to I, V, L, K, T, M; position 992 is mutated to V, L; position 993 is mutated to Q; and position 994 is mutated to L, M, C, I, A. The inventors found that Cas9 protein mutants of the embodiments of the present application having at least one of the above mutations have a broad PAM recognition range, extending to close to 1/2.
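The per-position substitutions listed above can be held in a lookup table and used to screen a candidate 13-residue segment. This is only an illustrative sketch: the table `ALLOWED` transcribes the list above with repeated letters collapsed, the helper `positions_outside_list` is hypothetical, and the candidate segments are made up for the example.

```python
WT_PAM_REGION = "IGVNNDLLNRIEV"  # residues 982-994 of SaCas9, as given in the text
PAM_START = 982

# Substitutions listed in the text for each position (duplicates collapsed).
ALLOWED = {
    982: "TKRL", 983: "ACS", 984: "TD", 985: "FSAN", 986: "EDHAM",
    987: "SGNDEP", 988: "DKTSREA", 989: "RANQGETKSHV", 990: "S",
    991: "IVLKTM", 992: "VL", 993: "Q", 994: "LMCIA",
}


def positions_outside_list(candidate: str) -> list:
    """Return mutated positions whose new residue is not in the listed substitutions."""
    if len(candidate) != len(WT_PAM_REGION):
        raise ValueError("candidate must be 13 residues long")
    outside = []
    for i, (wt, new) in enumerate(zip(WT_PAM_REGION, candidate)):
        pos = PAM_START + i
        if new != wt and new not in ALLOWED[pos]:
            outside.append(pos)
    return outside


# Hypothetical candidates, for illustration only.
print(positions_outside_list("TGVNEDLLNKIEV"))  # [] : I982T, N986E and R991K are all listed
print(positions_outside_list("IGVNNDLLNWIEV"))  # [991] : W is not listed for position 991
```

Positions left at the wild-type residue are skipped, so the check only reports substitutions that fall outside the list.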

According to embodiments of the invention, relative to SaCas9 and compared with 982IGVNNDLLNRIEV994, the PAM recognition region has, with position 985 mutated to S, position 986 mutated to S and position 991 mutated to R;