Insecticidal protein discovery platform and insecticidal proteins discovered therefrom

Document No.: 1060772  Publication date: 2020-10-13

Reading note: The technology "Insecticidal protein discovery platform and insecticidal proteins discovered therefrom" was created by J·金, O·刘, J·肖克 and W·W·黄 on 2019-03-01. Abstract: The present invention provides a platform for discovering novel insecticidal proteins from highly heterogeneous environmental sources. The method uses metagenome enrichment procedures and unique gene amplification techniques to access a wide variety of unknown microbial diversity and the resulting proteomes. The disclosed insecticidal protein discovery platform (IPDP) can be computer-driven and can integrate molecular biology, automation, and advanced machine learning protocols. The platform will enable researchers to rapidly and accurately access a large repertoire of untapped insecticidal proteins produced by uncharacterized and complex environmental microbial samples. Also presented herein is a set of newly discovered pore-forming toxins (PFTs) from a rare class of insecticidal proteins, which were discovered using the insecticidal protein discovery platform.

1. An isolated nucleic acid molecule encoding: (i) a polypeptide having at least about 80% sequence identity to a protein having an amino acid sequence selected from the group consisting of SEQ ID NOs: 2, 4, 12, 14, 16, 18, 20 and 22; or (ii) an insecticidal protein having an amino acid sequence with at least about 91% sequence identity to a protein having the amino acid sequence of SEQ ID NO: 8.

2. The isolated nucleic acid molecule of claim 1, wherein the nucleic acid molecule encodes an insecticidal protein having an amino acid sequence with at least about 90% sequence identity to a protein having an amino acid sequence selected from the group consisting of SEQ ID NOs: 2, 4, 12, 14, 16, 18, 20 and 22.

3. The isolated nucleic acid molecule of claim 1, wherein the nucleic acid molecule encodes an insecticidal protein having an amino acid sequence with at least about 95% sequence identity to a protein having an amino acid sequence selected from the group consisting of SEQ ID NOs: 2, 4, 8, 12, 14, 16, 18, 20 and 22.

4. The isolated nucleic acid molecule of claim 1, wherein the nucleic acid molecule encodes an insecticidal protein having an amino acid sequence with at least about 99% sequence identity to a protein having an amino acid sequence selected from the group consisting of SEQ ID NOs: 2, 4, 8, 12, 14, 16, 18, 20 and 22.

5. The isolated nucleic acid molecule of claim 1, wherein the nucleic acid molecule encodes an insecticidal protein having an amino acid sequence selected from the group consisting of SEQ ID NOs: 2, 4, 8, 12, 14, 16, 18, 20 and 22.

6. The isolated nucleic acid molecule of any one of claims 1-5, wherein the nucleic acid molecule is codon optimized for expression in a host cell of interest.

7. The isolated nucleic acid molecule of claim 6, wherein the host cell of interest is a plant cell.

8. The isolated nucleic acid molecule of claim 1, wherein the nucleic acid molecule comprises a nucleotide sequence selected from the group consisting of SEQ ID NOs: 1, 3, 7, 11, 13, 15, 17, 19 and 21.

9. A nucleotide construct, comprising: a nucleic acid molecule encoding (i) a polypeptide having at least about 80% sequence identity to a protein having an amino acid sequence selected from the group consisting of SEQ ID NOs: 2, 4, 12, 14, 16, 18, 20 and 22; or (ii) an insecticidal protein having an amino acid sequence with at least about 91% sequence identity to a protein having the amino acid sequence of SEQ ID NO: 8, said nucleic acid molecule being operably linked to a heterologous regulatory element.

10. The nucleotide construct of claim 9, wherein said nucleic acid molecule encodes an insecticidal protein having an amino acid sequence with at least about 90% sequence identity to a protein having an amino acid sequence selected from the group consisting of SEQ ID NOs: 2, 4, 12, 14, 16, 18, 20 and 22.

11. The nucleotide construct of claim 9, wherein said nucleic acid molecule encodes an insecticidal protein having an amino acid sequence with at least about 95% sequence identity to a protein having an amino acid sequence selected from the group consisting of SEQ ID NOs: 2, 4, 8, 12, 14, 16, 18, 20 and 22.

12. The nucleotide construct of claim 9, wherein said nucleic acid molecule encodes an insecticidal protein having an amino acid sequence with at least about 99% sequence identity to a protein having an amino acid sequence selected from the group consisting of SEQ ID NOs: 2, 4, 8, 12, 14, 16, 18, 20 and 22.

13. The nucleotide construct of claim 9, wherein said nucleic acid molecule encodes an insecticidal protein having an amino acid sequence selected from the group consisting of SEQ ID NOs: 2, 4, 8, 12, 14, 16, 18, 20 and 22.

14. The nucleotide construct of any one of claims 9-13, wherein the heterologous regulatory element is a promoter.

15. The nucleotide construct according to any one of claims 9 to 14, wherein the nucleotide construct is comprised in an expression cassette.

16. The nucleotide construct of any one of claims 9-15, wherein the heterologous regulatory element is capable of expressing an encoded protein in a plant.

17. The nucleotide construct of any one of claims 9-16, wherein the nucleic acid molecule is codon optimized for expression in a host cell of interest.

18. The nucleotide construct of claim 17, wherein the host cell of interest is a plant cell.

19. The nucleotide construct of any one of claims 9-18, wherein the nucleic acid molecule is selected from the group consisting of SEQ ID NOs: 1, 3, 7, 11, 13, 15, 17, 19 and 21.

20. An expression vector comprising the nucleotide construct of any one of claims 9-19.

21. A plasmid comprising the nucleotide construct of any one of claims 9-19.

22. A host cell comprising the nucleotide construct of any one of claims 9-19.

23. A method of killing an insect comprising contacting the insect with the host cell of claim 22.

24. A prokaryotic host cell comprising the nucleotide construct of any one of claims 9-19.

25. A eukaryotic host cell comprising the nucleotide construct of any one of claims 9-19.

26. A plant cell comprising the nucleotide construct of any one of claims 9-19.

27. A monocotyledonous plant cell comprising the nucleotide construct of any one of claims 9 to 19.

28. A dicot plant cell comprising the nucleotide construct of any one of claims 9-19.

29. A plant stably transformed with the nucleotide construct of any one of claims 9-19.

30. A seed produced by a plant that has been stably transformed with the nucleotide construct of any one of claims 9-19.

31. An isolated insecticidal protein comprising: (i) an amino acid sequence having at least about 80% sequence identity to any one of SEQ ID NOs: 2, 4, 12, 14, 16, 18, 20 or 22; or (ii) an amino acid sequence having at least about 91% sequence identity to SEQ ID NO: 8.

32. The isolated insecticidal protein of claim 31, comprising an amino acid sequence having at least about 90% sequence identity to any one of SEQ ID NOs: 2, 4, 12, 14, 16, 18, 20 or 22.

33. The isolated insecticidal protein of claim 31, comprising an amino acid sequence having at least about 95% sequence identity to any one of SEQ ID NOs: 2, 4, 8, 12, 14, 16, 18, 20 or 22.

34. The isolated insecticidal protein of claim 31, comprising an amino acid sequence having at least about 99% sequence identity to any one of SEQ ID NOs: 2, 4, 8, 12, 14, 16, 18, 20 or 22.

35. The isolated insecticidal protein of claim 31, comprising an amino acid sequence selected from the group consisting of SEQ ID NOs: 2, 4, 8, 12, 14, 16, 18, 20 and 22.

36. A recombinant insecticidal protein comprising: (i) an amino acid sequence having at least about 80% sequence identity to any one of SEQ ID NOs: 2, 4, 12, 14, 16, 18, 20 or 22; or (ii) an amino acid sequence having at least about 91% sequence identity to SEQ ID NO: 8.

37. The recombinant insecticidal protein of claim 36, comprising an amino acid sequence having at least about 90% sequence identity to any one of SEQ ID NOs: 2, 4, 12, 14, 16, 18, 20 or 22.

38. The recombinant insecticidal protein of claim 36, comprising an amino acid sequence having at least about 95% sequence identity to any one of SEQ ID NOs: 2, 4, 8, 12, 14, 16, 18, 20 or 22.

39. The recombinant insecticidal protein of claim 36, comprising an amino acid sequence having at least about 99% sequence identity to any one of SEQ ID NOs: 2, 4, 8, 12, 14, 16, 18, 20 or 22.

40. The recombinant insecticidal protein of claim 36, comprising an amino acid sequence selected from the group consisting of SEQ ID NOs: 2, 4, 8, 12, 14, 16, 18, 20 and 22.

41. A transgenic plant cell comprising:

a DNA construct comprising: a polynucleotide encoding (i) a polypeptide having at least about 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to an amino acid sequence selected from the group consisting of SEQ ID NOs: 2, 4, 12, 14, 16, 18, 20 and 22; or (ii) a polypeptide having an amino acid sequence with at least about 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or greater sequence identity to the amino acid sequence of SEQ ID NO: 8; and a heterologous regulatory sequence operably linked to the polynucleotide.

42. The transgenic plant cell of claim 41, wherein the polynucleotide encodes a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NOs: 2, 4, 8, 12, 14, 16, 18, 20 and 22.

43. The transgenic plant cell of claim 41, wherein the heterologous regulatory sequence is a promoter.

44. The transgenic plant cell of any one of claims 41-43, wherein the cell is from a monocot species.

45. The transgenic plant cell of any one of claims 41-43, wherein the cell is from maize, wheat, oat, or rice.

46. The transgenic plant cell of any one of claims 41-43, wherein the cell is from a dicot species.

47. The transgenic plant cell of any one of claims 41-43, wherein the cell is from cotton, potato, or soybean.

48. The transgenic plant cell of any one of claims 41-43, wherein the cell is from an agricultural row crop species.

49. A transgenic plant stably transformed with a DNA construct comprising:

a. a polynucleotide encoding (i) a polypeptide having at least about 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to an amino acid sequence selected from the group consisting of SEQ ID NOs: 2, 4, 12, 14, 16, 18, 20 and 22; or (ii) a polypeptide having an amino acid sequence with at least about 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or greater sequence identity to the amino acid sequence of SEQ ID NO: 8; and

b. a heterologous regulatory sequence operably linked to the polynucleotide.

50. The transgenic plant of claim 49, wherein the polynucleotide encodes a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NOs: 2, 4, 8, 12, 14, 16, 18, 20 and 22.

51. The transgenic plant of claim 49, wherein the heterologous regulatory sequence is a promoter.

52. The transgenic plant of any one of claims 49-51, wherein the plant is a monocot species.

53. The transgenic plant of any one of claims 49-51, wherein the plant is maize, wheat, oat, or rice.

54. The transgenic plant of any one of claims 49-51, wherein the plant is a dicotyledonous plant species.

55. The transgenic plant of any one of claims 49-51, wherein the plant is cotton, potato, or soybean.

56. The transgenic plant of any one of claims 49-51, wherein the plant is from an agricultural row crop species.

57. A seed produced by the plant of any one of claims 49-56.

58. A progeny plant produced from the plant of any one of claims 49-56.

59. The transgenic plant of claim 49, further comprising a DNA construct comprising a polynucleotide encoding a protein selected from the group consisting of: a monatin protein, a Pseudomonas insecticidal protein, a Cry protein, a Cyt protein, a vegetative insecticidal protein, a toxin complex protein, and any combination thereof.

60. A method of killing a target pest, comprising: providing a transgenic plant according to any one of claims 49-56 and 59 to an area, wherein the target pest is exposed to the transgenic plant.

61. The method of claim 60, wherein said target pest feeds on said transgenic plant.

62. A method of killing a target pest resistant to a pesticidal protein, comprising: providing a transgenic plant according to any one of claims 49-56 and 59 to an area, wherein the target pest is exposed to the transgenic plant, and wherein the target pest is resistant to a protein selected from the group consisting of: a monatin protein, a Pseudomonas insecticidal protein, a Cry protein, a Cyt protein, a vegetative insecticidal protein, a toxin complex protein, and any combination thereof.

63. A method of killing a target pest resistant to a pesticidal protein, comprising: providing a transgenic plant of any one of claims 49-56 and 59 to an area, wherein the target pest feeds on the transgenic plant, and wherein the target pest is resistant to a protein selected from the group consisting of: a monatin protein, a Pseudomonas insecticidal protein, a Cry protein, a Cyt protein, a vegetative insecticidal protein, a toxin complex protein, and any combination thereof.

64. A method of killing a target pest, comprising: providing a transgenic plant according to any one of claims 49-56 and 59 to a region, wherein the target pest is exposed to the transgenic plant and the target pest is a member of the order Coleoptera, Diptera, Hymenoptera, Lepidoptera, Hemiptera, Orthoptera, Thysanoptera or Dermaptera.

65. An insecticidal composition comprising:

a. an isolated insecticidal protein having (i) at least about 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or greater sequence identity to an amino acid sequence selected from the group consisting of SEQ ID NOs: 2, 4, 12, 14, 16, 18, 20 and 22; or (ii) an amino acid sequence having at least about 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to the amino acid sequence of SEQ ID NO: 8; and

b. an agriculturally acceptable carrier.

66. The insecticidal composition of claim 65, wherein the isolated insecticidal protein has an amino acid sequence selected from the group consisting of SEQ ID NOs: 2, 4, 8, 12, 14, 16, 18, 20 and 22.

67. The insecticidal composition of claim 65, wherein the isolated insecticidal protein is present in an insecticidally effective amount.

68. The insecticidal composition of any one of claims 65-67, wherein the agriculturally acceptable carrier is selected from the group consisting of: adjuvants, inert components, dispersants, surfactants, stickers, tackifiers, binders, natural or regenerated minerals, solvents, wetting agents, fertilizers, and combinations thereof.

69. The insecticidal composition of any one of claims 65-68, formulated as a dry solid.

70. The insecticidal composition of any one of claims 65-68, formulated as a liquid.

71. The insecticidal composition of any one of claims 65-68, formulated for foliar application.

72. The insecticidal composition of any one of claims 65-68, formulated for in-furrow application.

73. The insecticidal composition of any one of claims 65-68 formulated as a seed coating or seed treatment.

74. The insecticidal composition of any one of claims 65-73, further comprising at least one additional pesticidal compound.

75. The insecticidal composition according to claim 74, wherein the at least one additional pesticidal compound is selected from the group consisting of: a monatin protein, a Pseudomonas insecticidal protein, a Cry protein, a Cyt protein, a vegetative insecticidal protein, a toxin complex protein, and any combination thereof.

76. The insecticidal composition of any one of claims 65-75, further comprising: a herbicidal compound.

77. A method of killing a target pest comprising: applying to the target pest the insecticidal composition of any one of claims 65-76.

78. A method of killing a target pest comprising: applying to a locus an insecticidal composition according to any one of claims 65 to 76 wherein the target pest is in contact with the locus.

79. A method of killing a target pest comprising: applying to a crop an insecticidal composition according to any one of claims 65-76, wherein the target pest is in contact with the crop.

80. A method of killing a target pest, comprising: applying to a crop plant the insecticidal composition of any one of claims 65-76, wherein the target pest is in contact with the crop plant and the target pest is a member of the order Coleoptera, Diptera, Hymenoptera, Lepidoptera, Hemiptera, Orthoptera, Thysanoptera, or Dermaptera.

81. A cell lysate comprising an insecticidal protein comprising (i) an amino acid sequence having at least about 80% sequence identity to any one of SEQ ID NOs: 2, 4, 12, 14, 16, 18, 20 or 22; or (ii) an amino acid sequence having at least about 91% sequence identity to SEQ ID NO: 8.

82. The cell lysate according to claim 81, comprising an insecticidal protein comprising an amino acid sequence having at least about 90% sequence identity to any one of SEQ ID NOs: 2, 4, 12, 14, 16, 18, 20 or 22.

83. The cell lysate according to claim 81, comprising an insecticidal protein comprising an amino acid sequence having at least about 95% sequence identity to any one of SEQ ID NOs: 2, 4, 8, 12, 14, 16, 18, 20 or 22.

84. The cell lysate according to claim 81, comprising an insecticidal protein comprising an amino acid sequence having at least about 99% sequence identity to any one of SEQ ID NOs: 2, 4, 8, 12, 14, 16, 18, 20 or 22.

85. The cell lysate according to claim 81, comprising an insecticidal protein comprising an amino acid sequence selected from the group consisting of SEQ ID NOs: 2, 4, 8, 12, 14, 16, 18, 20 and 22.

86. A method of killing a target pest comprising: applying the cell lysate of any one of claims 81-85 to the target pest.

87. A method of killing a target pest, comprising: administering to the target pest a host cell that expresses (i) a polypeptide having at least about 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or greater sequence identity to an amino acid sequence selected from the group consisting of SEQ ID NOs: 2, 4, 12, 14, 16, 18, 20 and 22; or (ii) a polypeptide having an amino acid sequence with at least about 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to the amino acid sequence of SEQ ID NO: 8.

88. The method of claim 87, wherein the host cell expresses a polynucleotide encoding a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NOs: 2, 4, 8, 12, 14, 16, 18, 20 and 22.

89. The method of claim 87, wherein the host cell is a prokaryotic host cell.

90. The method of claim 87, wherein the host cell naturally expresses the polynucleotide.

91. The method of claim 87, wherein the host cell is from the genus Pseudomonas.

92. The method of any one of claims 87-91, wherein the target pest is a member of the order Coleoptera, Diptera, Hymenoptera, Lepidoptera, Hemiptera, Orthoptera, Thysanoptera, or Dermaptera.

93. A method for constructing a genomic library enriched in Pseudomonas DNA encoding insecticidal proteins, the method comprising:

a. providing a starting sample comprising one or more microorganisms;

b. exposing the starting sample to a solid nutrient-limiting medium to allow enriched growth of a species of the genus Pseudomonas, thereby producing a subsequent sample enriched in the Pseudomonas species;

c. isolating DNA from the enriched subsequent sample;

d. extracting DNA from the isolated DNA and performing degenerate PCR with selected primers to amplify a target insecticidal protein gene;

e. cloning the PCR-amplified DNA into a plasmid; and

f. sequencing the cloned DNA from the plasmid.

94. The method of claim 93, further comprising: assembling the sequenced DNA into a genomic library.

95. The method of claim 93, further comprising: identifying an insecticidal protein gene within the sequenced DNA.

96. The method of claim 93, further comprising: identifying an insecticidal protein gene within the sequenced DNA, wherein the identified insecticidal protein gene is unknown.

97. The method of any one of claims 93, 95, and 96, further comprising: identifying insecticidal protein genes within the sequenced DNA using hidden Markov models.

98. The method of claim 93, further comprising: identifying an insecticidal protein gene within the sequenced DNA, wherein the identified insecticidal protein gene comprises a nucleotide sequence selected from the group consisting of SEQ ID NOs: 1, 3, 7, 11, 13, 15, 17, 19 and 21.

99. The method of claim 93, further comprising: identifying an insecticidal protein gene within the sequenced DNA, wherein the identified insecticidal protein gene encodes a protein having an amino acid sequence selected from the group consisting of SEQ ID NOs: 2, 4, 8, 12, 14, 16, 18, 20 and 22.

100. The method of claim 93, wherein the primers are selected to amplify a target insecticidal protein gene encoding a protein having an amino acid sequence with at least 50% sequence identity to SEQ ID NO: 87.

101. The method of claim 93, wherein the starting sample is from soil.

102. An insecticidal genomic library enriched in Pseudomonas DNA encoding an insecticidal protein, constructed by the method of claim 93.

103. An insecticidal protein comprising an amino acid sequence that, when scored using the HMM scores or matches in Table 6, a) has a bit score at or above 521.5; and/or b) matches with an E-value less than or equal to 7.9E-161.
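The two thresholds in claim 103 can be illustrated with a short filter. This is a minimal sketch, assuming the candidate proteins were scored with HMMER's `hmmsearch --tblout` against a profile HMM like those in Table 6 (the profile itself is not reproduced here); it reads the full-sequence E-value and bit score from columns 5 and 6 of each table row. The profile name `PFT_model` and the candidate names in the sample table are hypothetical. Because the claim says "and/or", a sequence passes if it meets either cutoff, unless `require_both=True` is set.

```python
from dataclasses import dataclass

# Cutoffs as stated in claim 103.
BIT_SCORE_CUTOFF = 521.5
EVALUE_CUTOFF = 7.9e-161


@dataclass
class Hit:
    target: str
    evalue: float      # full-sequence E-value
    bit_score: float   # full-sequence bit score


def parse_tblout(text: str) -> list[Hit]:
    """Parse rows of an `hmmsearch --tblout` table, skipping comment lines."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        # 0-based columns 0, 4 and 5: target name, full-sequence E-value,
        # full-sequence bit score.
        hits.append(Hit(fields[0], float(fields[4]), float(fields[5])))
    return hits


def passes_thresholds(hit: Hit, require_both: bool = False) -> bool:
    """Apply the bit-score and E-value cutoffs ("and/or" by default)."""
    by_score = hit.bit_score >= BIT_SCORE_CUTOFF
    by_evalue = hit.evalue <= EVALUE_CUTOFF
    return (by_score and by_evalue) if require_both else (by_score or by_evalue)


# Hypothetical two-row table; profile and target names are invented.
SAMPLE = """\
# target name  accession  query name  accession  E-value  score  bias ...
candidate_A  -  PFT_model  -  1.2e-170  560.3  2.1  1.5e-170  559.9  2.1  1.0  1  0  0  1  1  1  1  -
candidate_B  -  PFT_model  -  3.0e-40   140.2  0.5  4.0e-40   139.8  0.5  1.0  1  0  0  1  1  1  1  -
"""

if __name__ == "__main__":
    for hit in parse_tblout(SAMPLE):
        print(hit.target, passes_thresholds(hit))
```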

Technical Field

The present invention is directed to a method for discovering novel insecticidal proteins from highly heterogeneous environmental sources. The method uses a metagenome enrichment procedure and a unique gene amplification technology to access a wide variety of unknown microbial diversity and the resulting proteomes.

The disclosed Insecticidal Protein Discovery Platform (IPDP) is computationally driven and can integrate molecular biology, automation, and advanced machine learning protocols. The platform will enable researchers to rapidly and accurately access a large repertoire of untapped insecticidal proteins produced by uncharacterized and complex environmental microbial samples.

Also presented herein is a set of newly discovered pore-forming toxins (PFTs) from a rare class of insecticidal proteins, which were discovered using the insecticidal protein discovery platform.

Background

It is estimated that by 2050 the world population will exceed 9 billion people. Agricultural experts working on a United Nations project estimate that, to feed such a large global population, overall food production must increase by 70% to meet future demand. This challenge is exacerbated by a number of factors, including reduced freshwater resources, a limited supply of arable land, rising energy prices, increased input costs, and environmental concerns over modern row crop farming.

The age-old problem of pest pressure and the resulting yield losses will continue to be one of the most pressing issues facing the global agricultural industry. Traditional synthetic chemicals have helped farmers combat problematic insects, but these chemicals are under increasing scrutiny because of concerns about their impact on human health and their potentially harmful environmental effects. Therefore, to meet the demands of the growing global population, more biotechnological solutions will be needed to combat agricultural pests.

One of the major biotechnological pest-control solutions is derived from Bacillus thuringiensis (Bt), a Gram-positive, spore-forming bacterium. Bt bacteria are recognized as entomopathogens, and their insecticidal activity is attributed to parasporal crystal proteins encoded by the Cry genes, of which more than 100 isoforms are known. This observation led to the development of Bt-based biopesticides for controlling certain insect species. Plants have since been genetically engineered to express Bt insecticidal proteins, which removes the need to apply these proteins externally. However, just as continuous use of chemical insecticides gives rise to insect resistance, continuous expression of insecticidal Bt proteins in plants strongly selects for resistance in target pest populations. As a result, the industry has found that a surprising proportion of insect populations are resistant to Bt crops. In addition, Bt proteins have a limited range of activity and are ineffective against some of the insect species that are currently problematic.

Thus, in view of the growing global population, environmental issues associated with traditional chemical pesticides, and the increasing resistance of insects to Bt traits, there is a great need in the art to identify novel insecticidal proteins that can be incorporated into biotechnological products suitable for use in modern agriculture.

Disclosure of Invention

The present invention provides novel insecticidal proteins that can be used in modern row crop agriculture. These insecticidal proteins can be developed as stand-alone products for direct application to plant species, or can be incorporated into the genome of a host plant for expression.

Unlike traditional synthetic chemical insecticides, the insecticidal proteins taught herein do not pose these environmental concerns. Furthermore, the insecticidal proteins belong to a newly discovered class that has several advantages over current industry-standard Cry protein products derived from Bacillus thuringiensis (Bt) coding sequences.

In addition to the novel insecticidal proteins themselves, the present invention provides a platform for discovering additional insecticidal proteins by accessing a large repertoire of untapped insecticidal proteins produced by uncharacterized and complex environmental microbial samples.

The Insecticidal Protein Discovery Platform (IPDP) uses metagenome enrichment procedures and unique gene amplification technology to access a wide variety of unknown microbial diversity and the resulting proteomes. Because the platform can be computer-driven and can integrate molecular biology, automation, and advanced machine learning approaches, researchers will be able to rapidly and systematically develop models and search queries to identify additional novel insecticidal proteins.

In certain embodiments, the present invention provides a method for constructing a genomic library enriched in Pseudomonas DNA encoding insecticidal proteins, the method comprising: a) providing a starting sample comprising one or more microorganisms; b) exposing the starting sample to a solid nutrient-limiting medium to allow enriched growth of a species of Pseudomonas, thereby producing a subsequent sample enriched in Pseudomonas species; c) isolating DNA from the enriched subsequent sample; d) extracting DNA from the isolated DNA and performing degenerate PCR with selected primers to amplify a target insecticidal protein gene; e) cloning the PCR-amplified DNA into a plasmid; and f) sequencing the cloned DNA from the plasmid. In certain embodiments, the method comprises assembling the sequenced DNA into a genomic library. In certain embodiments, the method comprises identifying an insecticidal protein gene within the sequenced DNA. In some embodiments, the identified insecticidal protein gene was previously unknown. In some embodiments, hidden Markov models are used to identify insecticidal protein genes. In some embodiments, the method identifies any gene (i.e., nucleotide sequence) listed in Table 3 (e.g., SEQ ID NOs: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69 and 71). In some embodiments, the method identifies any gene encoding a protein listed in Table 3 (e.g., SEQ ID NOs: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70 and 72). In some embodiments, the primers are selected to amplify a target insecticidal protein gene encoding a protein having at least 50% sequence identity to SEQ ID NO: 87.
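Degenerate PCR primers are written with IUPAC nucleotide ambiguity codes, so a single primer stands for a pool of concrete oligonucleotides. The sketch below expands such a primer and reports its degeneracy (the number of distinct sequences in the pool), a quantity usually kept low when primers are designed from conserved motifs. The primers actually used in step d) are not disclosed in this section; `GGNTAYGA` in the usage example is purely hypothetical.

```python
from itertools import product
from math import prod

# IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides encoded by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer.upper())


def expand(primer: str) -> list[str]:
    """List every concrete sequence in the degenerate primer pool."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    primer = "GGNTAYGA"  # hypothetical primer, not taken from the source
    print(degeneracy(primer))  # 8 (N contributes 4, Y contributes 2)
    print(expand(primer)[:3])
```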

In some embodiments, the present invention provides an isolated nucleic acid molecule encoding an insecticidal protein having at least about 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or greater sequence identity to a protein selected from the group consisting of SEQ ID NOs: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70 and 72. In certain embodiments, the isolated nucleic acid molecule is codon optimized for expression in a host cell of interest. In certain embodiments, the isolated nucleic acid molecule is codon optimized for expression in a plant cell. In certain embodiments, the isolated nucleic acid molecule has at least about 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or greater sequence identity to a nucleic acid sequence selected from the group consisting of SEQ ID NOs: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69 and 71.
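Codon optimization for a host such as a plant cell is done in practice with dedicated tools that balance codon usage, GC content, and unwanted sequence motifs. The sketch below shows only the simplest form of the idea: back-translating a protein by always choosing one preferred codon per amino acid. The `PREFERRED_CODONS` table is a hypothetical placeholder; each codon is a valid standard-genetic-code codon for its residue, but the choices are not real maize, soybean, or other host codon-usage data.

```python
# Hypothetical "one preferred codon per amino acid" table. Every codon is a
# valid standard-genetic-code codon for its residue, but the choices are NOT
# real host codon-usage data.
PREFERRED_CODONS = {
    "A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
    "*": "TGA",  # stop codon
}


def back_translate(protein: str, table: dict[str, str] = PREFERRED_CODONS) -> str:
    """Back-translate a one-letter protein sequence using one codon per residue."""
    try:
        return "".join(table[aa] for aa in protein.upper())
    except KeyError as err:
        raise ValueError(f"no codon for residue {err.args[0]!r}") from None


def translate(dna: str, table: dict[str, str] = PREFERRED_CODONS) -> str:
    """Round-trip check only: decodes codons that appear in `table`."""
    inverse = {codon: aa for aa, codon in table.items()}
    return "".join(inverse[dna[i:i + 3]] for i in range(0, len(dna), 3))


if __name__ == "__main__":
    cds = back_translate("MKW*")
    print(cds)  # ATGAAGTGGTGA
    assert translate(cds) == "MKW*"
```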

In some embodiments, the invention provides a nucleotide construct comprising a nucleic acid molecule encoding an insecticidal protein having at least about 80% sequence identity to a protein having an amino acid sequence selected from the group consisting of SEQ ID NOs: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, and 72, operably linked to a heterologous regulatory element. In various aspects, the heterologous regulatory element is a promoter. In various aspects, the heterologous regulatory element is a plant promoter. In some embodiments, the invention provides a transgenic plant cell comprising the nucleotide construct. In some embodiments, the invention provides stably transformed plants expressing the protein from the nucleotide construct. In some embodiments, an insect that feeds on the transgenic plant is killed.

In some embodiments, the present invention provides an isolated insecticidal protein comprising an amino acid sequence having at least about 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or greater sequence identity to an amino acid sequence selected from the group consisting of SEQ ID NOs: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, and 72. In some embodiments, the isolated insecticidal protein is recombinant. In some embodiments, the invention provides transgenic plant cells expressing the protein. In some embodiments, an insect that feeds on the transgenic plant is killed. In some embodiments, the insecticidal proteins described above are contained in an agricultural composition. In some embodiments, the agricultural compositions are sprayed on plants and/or insects to provide effective insect control. In some embodiments, the insecticidal proteins are present in cell lysates, and the cell lysates can be used to control insect pest populations. In some embodiments, a native Pseudomonas host organism can be formulated into a composition and used to combat insect pests.

In certain embodiments, the present invention provides novel insecticidal proteins, wherein said proteins have an amino acid sequence that, when scored against the HMM in Table 6, has a bit score of 521.5 or greater and/or an E value less than or equal to 7.9E-161. These proteins may be provided in any of the forms disclosed herein (e.g., as isolated or recombinant proteins) or as part of any of the compositions disclosed herein (e.g., plant or agricultural compositions).
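The HMM criterion can be applied as a simple filter over per-sequence search results, such as the bit scores and E values reported by a profile-HMM search tool. The sketch below assumes the hits have already been parsed into name/score/E-value records; the `HmmHit` record type and the candidate names are hypothetical, and the `require_both` switch is one way to reflect the "and/or" wording of the criterion.

```python
from typing import Iterable, List, NamedTuple

# Cutoffs stated in the text for the Table 6 HMM.
BIT_SCORE_CUTOFF = 521.5
E_VALUE_CUTOFF = 7.9e-161


class HmmHit(NamedTuple):
    """Hypothetical record for one per-sequence HMM search result."""
    name: str
    bit_score: float
    e_value: float


def passes(hit: HmmHit, require_both: bool = False) -> bool:
    """Apply the bit-score and E-value criteria ("and/or")."""
    score_ok = hit.bit_score >= BIT_SCORE_CUTOFF
    evalue_ok = hit.e_value <= E_VALUE_CUTOFF
    return (score_ok and evalue_ok) if require_both else (score_ok or evalue_ok)


def select_candidates(hits: Iterable[HmmHit], require_both: bool = False) -> List[str]:
    """Names of hits that satisfy the criteria, in input order."""
    return [hit.name for hit in hits if passes(hit, require_both)]


if __name__ == "__main__":
    hits = [
        HmmHit("candidate_A", 600.0, 1e-180),
        HmmHit("candidate_B", 400.0, 1e-100),
        HmmHit("candidate_C", 521.5, 1e-150),
    ]
    print(select_candidates(hits))                     # ['candidate_A', 'candidate_C']
    print(select_candidates(hits, require_both=True))  # ['candidate_A']
```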

Drawings

FIG. 1 outlines the workflow of the taught Insecticidal Protein Discovery Platform (IPDP).

FIG. 2 outlines the workflow of the taught Insecticidal Protein Discovery Platform (IPDP) and illustrates two steps used by prior-art methods that the current IPDP does not require.

FIG. 3 illustrates a multiple sequence alignment of eight novel insecticidal proteins from Table 3 (ZIP1, ZIP2, ZIP6, ZIP8, ZIP9, ZIP10, ZIP11, ZIP12) discovered using IPDP.

FIG. 4 illustrates a multiple sequence alignment of the eight novel insecticidal proteins from Table 3 (ZIP1, ZIP2, ZIP6, ZIP8, ZIP9, ZIP10, ZIP11, ZIP12) discovered using IPDP, compared with monalysin.

FIG. 5 illustrates a phylogenetic tree of the eight novel insecticidal proteins from Table 3 and FIG. 3 discovered using IPDP.

FIG. 6 illustrates a phylogenetic tree of the eight novel insecticidal proteins from Table 3 and FIG. 4 discovered using IPDP, compared with monalysin.

FIG. 7 illustrates the results of insect bioassay experiments using ten purified insecticidal proteins from Table 3. Brown marmorated stink bugs (Halyomorpha halys) that ingested water containing purified insecticidal proteins discovered via IPDP (ZIP1, ZIP2, ZIP4, ZIP6, ZIP8, ZIP9, ZIP10, ZIP11, ZIP12, and ZIP16) exhibited varying degrees of mortality. The concentration of purified insecticidal protein used in this experiment is also presented in FIG. 7.

FIG. 8 illustrates the results of insect bioassay experiments testing three purified insecticidal proteins from Table 3 (ZIP1, ZIP2, and ZIP4) against the brown marmorated stink bug. Insects ingested different concentrations of purified protein (N = number of insects analyzed), and the mortality data were subjected to Probit analysis to estimate the lethal concentration required to kill 50% of the population (LC50), together with its upper and lower 95% confidence limits.
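Probit analysis relates the probit (inverse standard normal value) of the observed mortality proportion to log10 concentration and solves for the concentration at which predicted mortality is 50%. The sketch below is a simplified, unweighted least-squares version for illustration only; standard probit analysis fits by iteratively weighted maximum likelihood and also yields the 95% confidence limits, which this sketch does not compute. The function name and the example counts are hypothetical, not data from FIG. 8.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple


def probit_lc50(concentrations: Sequence[float],
                dead: Sequence[int],
                totals: Sequence[int]) -> Tuple[float, float]:
    """Estimate LC50 by unweighted least squares of probit(mortality) on log10(dose).

    Returns (lc50, slope). Doses with 0% or 100% mortality are skipped because
    their probits are infinite.
    """
    std_normal = NormalDist()
    xs, ys = [], []
    for conc, d, n in zip(concentrations, dead, totals):
        p = d / n
        if conc > 0 and 0 < p < 1:
            xs.append(math.log10(conc))
            ys.append(std_normal.inv_cdf(p))  # probit without the historical +5 offset
    if len(xs) < 2:
        raise ValueError("need at least two doses with partial mortality")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct concentrations")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    if slope <= 0:
        raise ValueError("mortality does not increase with concentration")
    intercept = mean_y - slope * mean_x
    # Predicted probit is 0 (50% mortality) where intercept + slope * log10(c) = 0.
    return 10 ** (-intercept / slope), slope


if __name__ == "__main__":
    # Hypothetical bioassay: concentrations (arbitrary units), deaths, insects tested.
    lc50, slope = probit_lc50([5, 10, 20, 40], [2, 5, 11, 17], [20, 20, 20, 20])
    print(f"LC50 ~ {lc50:.1f}, slope ~ {slope:.2f}")
```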

FIGS. 9A-B illustrate the results of insect bioassay experiments testing three purified insecticidal proteins from Table 3 (ZIP1, ZIP2, and ZIP4) against representatives of two major insect orders: fall armyworm and southern corn rootworm. The average percent weight reduction of insects that ingested the listed concentrations of purified protein mixed into a solid diet, compared with the buffer-only control, is reported. FIG. 9A presents experiments performed on fall armyworm (Spodoptera frugiperda), and FIG. 9B presents experiments performed on southern corn rootworm (Diabrotica undecimpunctata).
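The reported average percent weight reduction can be computed as the relative drop in mean insect weight compared with the buffer-only control. The exact formula used for FIGS. 9A-B is not stated here, so the function below uses an assumed, conventional definition; the weights in the example are hypothetical.

```python
from statistics import mean
from typing import Sequence


def percent_weight_reduction(treated: Sequence[float], control: Sequence[float]) -> float:
    """Average percent weight reduction of treated insects relative to control.

    Assumed definition: 100 * (1 - mean(treated) / mean(control)).
    """
    control_mean = mean(control)
    if control_mean <= 0:
        raise ValueError("control mean weight must be positive")
    return 100.0 * (1.0 - mean(treated) / control_mean)


if __name__ == "__main__":
    # Hypothetical larval weights (mg): protein-fed vs buffer-only control.
    print(percent_weight_reduction([2.0, 3.0], [5.0, 5.0]))  # 50.0
```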

FIG. 10 illustrates results from insect lysate experiments. Brown marmorated stink bugs that ingested bacterial lysates containing insecticidal proteins discovered via IPDP exhibited 100% mortality.

FIG. 11 illustrates Western blot results from Example 6 showing expression of ZIP proteins in soybean and corn leaves. Lanes 1 and 11: negative control (untransformed soybean leaves); lanes 2 and 3: ZIP1-transformed soybean leaves; lanes 4-10: ZIP2-transformed soybean leaves; lanes 12 and 13: ZIP4-transformed soybean leaves; lane 14: negative control (untransformed maize leaves); lanes 15 and 16: ZIP2-transformed maize leaves.

Detailed Description

Definitions

While the following terms are believed to be well understood by those of ordinary skill in the art, the following definitions are set forth to facilitate explanation of the inventive subject matter.

The terms "a" and "an" refer to one or more of the stated entities; that is, they may refer to a plurality of the stated entities. Thus, the terms "a" (or "an"), "one or more", and "at least one" are used interchangeably herein. In addition, reference to "an element" by the indefinite article "a" or "an" does not exclude the possibility that more than one of the elements is present, unless the context clearly requires that there be one and only one of the elements.

As used herein, the terms "cellular organism", "microbial organism", and "microorganism" should be understood broadly. These terms are used interchangeably and include, but are not limited to, the two prokaryotic domains, Bacteria and Archaea, as well as certain eukaryotic fungi and protists. In some embodiments, the disclosure refers to the "microorganisms", "cellular organisms", or "microbes" of the lists/tables and figures presented herein. This characterization can refer not only to the identified genera of the tables and figures, but also to the identified species, as well as to any novel and newly identified or designed strains of the organisms in the tables or figures. The same applies to uses of these terms elsewhere in this specification, such as in the Examples.

The term "prokaryote" is art-recognized and refers to a cell that contains no nucleus or other membrane-bound organelles. Prokaryotes are generally classified in one of two domains, Bacteria and Archaea. The definitive difference between organisms of the Archaea and Bacteria domains is based on fundamental differences in the nucleotide base sequence of the 16S ribosomal RNA.

The term "archaebacteria" refers to a categorization of organisms of the division Mendosicutes, typically found in unusual environments and distinguished from the rest of the prokaryotes by several criteria, including the number of ribosomal proteins and the lack of muramic acid in cell walls. On the basis of ssrRNA analysis, the Archaea consist of two phylogenetically distinct groups: Crenarchaeota and Euryarchaeota. On the basis of their physiology, the Archaea can be organized into three types: methanogens (prokaryotes that produce methane); extreme halophiles (prokaryotes that live at very high concentrations of salt (NaCl)); and extreme (hyper)thermophiles (prokaryotes that live at very high temperatures). Besides the unifying archaeal features that distinguish them from bacteria (i.e., no murein in the cell wall, ether-linked membrane lipids, etc.), these prokaryotes exhibit unique structural or biochemical attributes that adapt them to their particular habitats. The Crenarchaeota consist mainly of hyperthermophilic sulfur-dependent prokaryotes, and the Euryarchaeota contain the methanogens and extreme halophiles.

"Bacteria" or "eubacteria" refers to a domain of prokaryotic organisms. Bacteria include at least 11 distinct groups as follows: (1) Gram-positive (Gram+) bacteria, of which there are two major subdivisions: (a) the high G+C group (Actinomycetes, Mycobacteria, Micrococcus, and others) and (b) the low G+C group (Bacillus, Clostridia, Lactobacillus, Staphylococci, Streptococci, Mycoplasmas); (2) Proteobacteria, e.g., purple photosynthetic and non-photosynthetic Gram-negative bacteria (including most "common" Gram-negative bacteria); (3) Cyanobacteria, e.g., oxygenic phototrophs; (4) Spirochetes and related species; (5) Planctomyces; (6) Bacteroides and Flavobacteria; (7) Chlamydia; (8) green sulfur bacteria; (9) green non-sulfur bacteria (also anaerobic phototrophs); (10) radioresistant micrococci and relatives; and (11) Thermotoga and Thermosipho thermophiles.

A "eukaryote" is any organism whose cells contain a nucleus and other organelles enclosed within membranes. Eukaryotes belong to the taxon Eukarya or Eukaryota. The defining feature that distinguishes eukaryotic cells from prokaryotic cells (the aforementioned Bacteria and Archaea) is that they have membrane-bound organelles, especially the nucleus, which contains the genetic material and is enclosed by the nuclear envelope.

The terms "genetically modified host cell", "recombinant host cell" and "recombinant strain" are used interchangeably herein and refer to a host cell that has been genetically modified using the cloning and transformation methods of the present invention. Thus, the term includes a host cell (e.g., a bacterium, yeast cell, fungal cell, CHO, human cell, etc.) that has been genetically altered, modified, or engineered so that it exhibits an altered, modified, or different genotype and/or phenotype (e.g., when the genetic modification affects the coding nucleic acid sequence of the microorganism) as compared to the naturally occurring organism from which it is derived. It is understood that in some embodiments, the term refers not only to the particular recombinant host cell in question, but also to progeny or potential progeny of such a host cell.

The term "wild-type microorganism" or "wild-type host cell" describes a cell as it exists in nature, i.e., a cell that has not been genetically modified.

The term "genetic engineering" may refer to any manipulation of the genome of a host cell (e.g., insertion, deletion, mutation, or substitution of nucleic acids).

The term "control" or "control host cell" refers to an appropriate comparison host cell for determining the effect of a genetic modification or experimental treatment. In some embodiments, the control host cell is a wild-type cell. In other embodiments, the control host cell is genetically identical to the genetically modified host cell, except for the genetic modification or treatment that distinguishes the treated host cell.

As used herein, the term "allele" means any of one or more alternative forms of a gene, all of which relate to at least one trait or characteristic. In a diploid cell, the two alleles of a given gene occupy corresponding loci on a pair of homologous chromosomes.

As used herein, the term "locus" (plural "loci") means a specific place or places, or a site, on a chromosome where, for example, a gene or genetic marker is found.

As used herein, the term "genetically linked" means that two or more traits are co-inherited at a high rate during breeding, such that they are difficult to separate through crossing.

As used herein, "recombination" or "recombination event" refers to chromosomal crossing over or independent assortment.

As used herein, the term "phenotype" refers to an observable characteristic of an individual cell, cell culture, organism, or group of organisms that results from the interplay between the genetic makeup (i.e., genotype) of that individual and the environment.

As used herein, the term "chimeric" or "recombinant", when describing a nucleic acid sequence or a protein sequence, refers to a nucleic acid or protein sequence that links at least two heterologous polynucleotides or two heterologous polypeptides into a single macromolecule, or that rearranges one or more elements of at least one native nucleic acid or protein sequence. For example, the term "recombinant" can refer to an artificial combination of two otherwise separated segments of sequence, e.g., by chemical synthesis or by the manipulation of isolated segments of nucleic acids by genetic engineering techniques.

As used herein, a "synthetic nucleotide sequence" or "synthetic polynucleotide sequence" is a nucleotide sequence that is not known to occur in nature or that is not naturally occurring. Generally, such a synthetic nucleotide sequence will comprise at least one nucleotide difference when compared to any other naturally occurring nucleotide sequence.

As used herein, a "synthetic amino acid sequence", "synthetic peptide", or "synthetic protein" is an amino acid sequence that is not known to occur in nature or that is not naturally occurring. Generally, such a synthetic protein sequence will comprise at least one amino acid difference when compared to any other naturally occurring protein sequence.

As used herein, the term "nucleic acid" refers to a polymeric form of nucleotides of any length, either ribonucleotides or deoxyribonucleotides, or analogs thereof. This term refers to the primary structure of the molecule and thus includes double- and single-stranded DNA, as well as double- and single-stranded RNA. It also includes modified nucleic acids, such as methylated and/or capped nucleic acids, nucleic acids containing modified bases, backbone modifications, and the like. The terms "nucleic acid" and "nucleotide sequence" are used interchangeably.

As used herein, the term "gene" refers to any segment of DNA associated with a biological function. Thus, genes include, but are not limited to, coding sequences and/or the regulatory sequences required for their expression. Genes can also include non-expressed DNA segments that, for example, form recognition sequences for other proteins. Genes can be obtained from a variety of sources, including cloning from a source of interest or synthesizing from known or predicted sequence information, and may include sequences designed to have desired parameters.

As used herein, the terms "homology", "homolog", and "ortholog" are known in the art and refer to related sequences that share a common ancestor or family member and that are determined based on the degree of sequence identity. The terms "homology", "homologous", "substantially similar", and "substantially corresponding" are used interchangeably herein. They refer to nucleic acid fragments wherein changes in one or more nucleotide bases do not affect the ability of the nucleic acid fragment to mediate gene expression or produce a certain phenotype. These terms also refer to modifications of the nucleic acid fragments of the invention, such as deletion or insertion of one or more nucleotides, that do not substantially alter the functional properties of the resulting nucleic acid fragment relative to the initial, unmodified fragment. It is therefore understood, as those of ordinary skill in the art will appreciate, that the invention encompasses more than the specific exemplary sequences. These terms describe the relationship between a gene found in one species, subspecies, variety, cultivar, or line and the corresponding or equivalent gene in another species, subspecies, variety, cultivar, or line. For purposes of this invention, homologous sequences are compared. "Homologous sequences", "homologs", or "orthologs" are thought, believed, or known to be functionally related. A functional relationship may be indicated in any one of a number of ways, including, but not limited to: (a) degree of sequence identity and/or (b) the same or similar biological function. Preferably, both (a) and (b) are indicated. Homology can be determined using software programs readily available in the art, such as those discussed in Current Protocols in Molecular Biology (F. M. Ausubel et al., eds., 1987), Supplement 30, section 7.718, Table 7.71.
Some alignment programs are MacVector (Oxford Molecular Ltd, Oxford, U.K.), ALIGN Plus (Scientific and Educational Software, Pennsylvania), and AlignX (Vector NTI, Invitrogen, Carlsbad, CA). Another alignment program is Sequencher (Gene Codes, Ann Arbor, Michigan), used with default parameters.
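The commercial programs listed above use more elaborate scoring (substitution matrices, affine gap penalties). Purely to illustrate how a global pairwise alignment underlying an identity or homology comparison can be obtained, the following is a minimal Needleman-Wunsch sketch with assumed unit match/mismatch scores and a linear gap penalty. These parameters are illustrative and are not the default settings of any program named above.

```python
from typing import List, Tuple


def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -2) -> Tuple[str, str, int]:
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (aligned_a, aligned_b, score). Scoring parameters are illustrative.
    """
    n, m = len(a), len(b)

    def sub(i: int, j: int) -> int:
        # Substitution score for a[i-1] versus b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


if __name__ == "__main__":
    aligned_a, aligned_b, best = global_align("GATTACA", "GATACA")  # hypothetical inputs
    print(aligned_a)
    print(aligned_b)
    print(best)
```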

As used herein, the term "endogenous" or "endogenous gene" refers to a naturally occurring gene at the location in which it is naturally found within the host cell genome. In the context of the present invention, operably linking a heterologous promoter to an endogenous gene means genetically inserting a heterologous promoter sequence in front of an existing gene, at the location where that gene is naturally present. An endogenous gene as described herein can include alleles of naturally occurring genes that have been mutated according to any of the methods of the present invention.

As used herein, the term "exogenous" is used interchangeably with the term "heterologous" and refers to material from some source other than its native source. For example, the term "exogenous protein" or "exogenous gene" refers to a protein or gene that is derived from a non-native source or location and that has been provided into a biological system by artificial means.

As used herein, the term "nucleotide change" refers to, for example, a nucleotide substitution, deletion, and/or insertion, as is well understood in the art. For example, mutations may contain alterations that produce silent substitutions, additions, or deletions but do not alter the properties or activities of the encoded protein or how the protein is made.

As used herein, the term "protein modification" refers to, for example, amino acid substitutions, amino acid modifications, deletions, and/or insertions, as are well understood in the art.

As used herein, the term "at least a portion" or "fragment" of a nucleic acid or polypeptide means a portion having the minimal size characteristic of such sequences, or any larger fragment of the full-length molecule, up to and including the full-length molecule. A polynucleotide fragment of the invention may encode a biologically active portion of a gene regulatory element. A biologically active portion of a gene regulatory element can be prepared by isolating a portion of one of the polynucleotides of the invention that comprises the gene regulatory element and assessing its activity as described herein. Similarly, a portion of a polypeptide may be 4 amino acids, 5 amino acids, 6 amino acids, 7 amino acids, and so on, up to the full-length polypeptide. The length of the portion to be used will depend on the particular application. A portion of a nucleic acid useful as a hybridization probe may be as short as 12 nucleotides; in some embodiments, it is 20 nucleotides. A portion of a polypeptide useful as an epitope may be as short as 4 amino acids. A portion of a polypeptide that performs the function of the full-length polypeptide will generally be longer than 4 amino acids.
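Enumerating every fragment of a given minimal length (for example, 12-nucleotide probe candidates or 4-residue epitope windows) is a simple sliding-window operation: a sequence of length L yields L - k + 1 contiguous fragments of length k. A minimal sketch, using a hypothetical function name and a hypothetical peptide:

```python
from typing import List


def fragments(seq: str, length: int) -> List[str]:
    """All contiguous fragments of the given length (sliding window, step 1)."""
    if length < 1:
        raise ValueError("fragment length must be at least 1")
    return [seq[start:start + length] for start in range(len(seq) - length + 1)]


if __name__ == "__main__":
    peptide = "MKTAYIAK"  # hypothetical peptide
    print(fragments(peptide, 4))  # ['MKTA', 'KTAY', 'TAYI', 'AYIA', 'YIAK']
```

A window longer than the sequence simply yields no fragments.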

Variant polynucleotides also encompass sequences derived from mutagenic and recombinogenic procedures such as DNA shuffling. Strategies for such DNA shuffling are known in the art. See, for example, Stemmer (1994) PNAS 91:10747-10751; Stemmer (1994) Nature 370:389-391; Crameri et al. (1997) Nature Biotech. 15:436-; Moore et al. (1997) J. Mol. Biol. 272:336-347; Zhang et al. (1997) PNAS 94:4504-4509; Crameri et al. (1998) Nature 391:288-291; and U.S. Pat. Nos. 5,605,793 and 5,837,458.

For PCR amplification of the polynucleotides disclosed herein, oligonucleotide primers can be designed for use in PCR reactions to amplify corresponding DNA sequences from cDNA or genomic DNA extracted from any organism of interest. Methods for designing PCR primers and for PCR cloning are generally known in the art and are disclosed in Sambrook et al. (2001) Molecular Cloning: A Laboratory Manual (3rd ed., Cold Spring Harbor Laboratory Press, Plainview, New York). See also Innis et al. (1990) PCR Protocols: A Guide to Methods and Applications (Academic Press, New York); Innis and Gelfand (1995) PCR Strategies (Academic Press, New York); and Innis and Gelfand (1999) PCR Methods Manual (Academic Press, New York). Known methods of PCR include, but are not limited to, methods using paired primers, nested primers, single specific primers, degenerate primers, gene-specific primers, vector-specific primers, partially mismatched primers, and the like.

As used herein, the term "primer" refers to an oligonucleotide that is capable of annealing to an amplification target, allowing a DNA polymerase to attach and thereby serving as a point of initiation of DNA synthesis, when placed under conditions in which synthesis of a primer extension product is induced (i.e., in the presence of nucleotides and an agent for polymerization such as a DNA polymerase, and at a suitable temperature and pH). The (amplification) primer is preferably single-stranded for maximum efficiency in amplification. Preferably, the primer is an oligodeoxyribonucleotide. The primer must be sufficiently long to prime the synthesis of extension products in the presence of the agent for polymerization. The exact length of the primer will depend on many factors, including temperature and primer composition (A/T versus G/C content). A pair of bidirectional primers consists of one forward and one reverse primer, as commonly used in the art of DNA amplification, such as PCR amplification.
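Because primer behavior depends on length and A/T versus G/C composition, a first-pass check often reports GC content and a rough melting temperature. The sketch below uses the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C, a crude approximation intended only for short oligonucleotides; nearest-neighbor thermodynamic models are more accurate. It also forms the reverse complement, since a reverse primer corresponds to the reverse complement of the top-strand sequence at the far end of the target. The function names and example sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def gc_content(primer: str) -> float:
    """Percent G + C in a primer."""
    p = primer.upper()
    return 100.0 * (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature in deg C by the Wallace rule: 2(A+T) + 4(G+C)."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (unambiguous bases only)."""
    return seq.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    primer = "AAAAGGGG"  # hypothetical primer
    print(gc_content(primer), wallace_tm(primer))  # 50.0 24
    print(reverse_complement("ATGC"))              # GCAT
```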

As used herein, "promoter" refers to a DNA sequence capable of controlling the expression of a coding sequence or functional RNA. In some embodiments, the promoter sequence consists of proximal and more distal upstream elements, the latter elements often being referred to as enhancers. Accordingly, an "enhancer" is a DNA sequence that can stimulate promoter activity, and may be an innate element of the promoter or a heterologous element inserted to enhance the level or tissue specificity of a promoter. Promoters may be derived in their entirety from a native gene, be composed of different elements derived from different promoters found in nature, or even comprise synthetic DNA segments. It is understood by those skilled in the art that different promoters may direct the expression of a gene in different tissues or cell types, at different stages of development, or in response to different environmental conditions. It is further recognized that, because in most cases the exact boundaries of regulatory sequences have not been completely defined, DNA fragments of some variation may have identical promoter activity.

As used herein, the phrases "recombinant construct", "expression construct", "chimeric construct", "construct", and "recombinant DNA construct" are used interchangeably. A recombinant construct comprises an artificial combination of nucleic acid fragments, e.g., regulatory and coding sequences that are not found together in nature. For example, a chimeric construct may comprise regulatory sequences and coding sequences that are derived from different sources, or regulatory sequences and coding sequences derived from the same source but arranged in a manner different from that found in nature. Such a construct may be used by itself or in conjunction with a vector. If a vector is used, the choice of vector depends on the method that will be used to transform host cells, as is well known to those skilled in the art. For example, a plasmid vector can be used. The skilled artisan is well aware of the genetic elements that must be present on the vector in order to successfully transform, select, and propagate host cells comprising any of the isolated nucleic acid fragments of the invention. The skilled artisan will also recognize that different independent transformation events will result in different levels and patterns of expression (Jones et al. (1985) EMBO J. 4:2411-2418; De Almeida et al. (1989) Mol. Gen. Genetics 218:78-86), and thus that multiple events must be screened in order to obtain lines displaying the desired expression level and pattern. Such screening may be accomplished by Southern analysis of DNA, Northern analysis of mRNA expression, immunoblotting analysis of protein expression, or phenotypic analysis, among others. The vector can be a plasmid, virus, bacteriophage, provirus, phagemid, transposon, artificial chromosome, or the like, that replicates autonomously or can integrate into a chromosome of a host cell.
The vector may also be a naked RNA polynucleotide that replicates non-autonomously, a naked DNA polynucleotide, a polynucleotide composed of DNA and RNA within the same strand, polylysine-bound DNA or RNA, peptide-bound DNA or RNA, liposome-bound DNA, or the like. As used herein, the term "expression" refers to the production of a functional end product, such as mRNA or protein (precursor or mature).

Herein, "operably linked" refers to the sequential arrangement of a promoter polynucleotide according to the invention with other oligonucleotides or polynucleotides, such that the promoter drives transcription of those other polynucleotides.

As used herein, the term "product of interest" or "biomolecule" refers to any product produced by microbes from a feedstock. In some cases, the product of interest may be a small molecule, enzyme, peptide, amino acid, organic acid, synthetic compound, fuel, alcohol, and the like. For example, the product of interest or biomolecule may be any primary or secondary extracellular metabolite. The primary metabolite may be, inter alia, ethanol, citric acid, lactic acid, glutamic acid, glutamate, lysine, threonine, tryptophan and other amino acids, vitamins, polysaccharides, etc. The secondary metabolite may be, inter alia, an antibiotic compound such as penicillin; an immunosuppressant such as cyclosporin A; a plant hormone such as gibberellin; a statin drug such as lovastatin; a fungicide such as griseofulvin; and the like. The product of interest or biomolecule may also be any intracellular component produced by a microbe, such as a microbial enzyme, including catalase, amylase, protease, pectinase, glucose isomerase, cellulase, hemicellulase, lipase, lactase, streptokinase, and many others. Intracellular components may also include recombinant proteins, such as insulin, hepatitis B vaccine, interferon, granulocyte colony-stimulating factor, streptokinase, and others.

The term "carbon source" generally refers to a substance suitable for use as a source of carbon for cell growth. Carbon sources include, but are not limited to, biomass hydrolysates, starch, sucrose, cellulose, hemicellulose, xylose, and lignin, as well as monomeric components of these substrates. Carbon sources can comprise various organic compounds in various forms, including, but not limited to, polymers, carbohydrates, acids, alcohols, aldehydes, ketones, amino acids, peptides, and the like. These include, for example, various monosaccharides such as glucose, dextrose (D-glucose), maltose, oligosaccharides, polysaccharides, saturated or unsaturated fatty acids, succinate, lactate, acetate, ethanol, and the like, or mixtures thereof. Photosynthetic organisms can additionally produce a carbon source as a product of photosynthesis. In some embodiments, the carbon source may be selected from biomass hydrolysate and glucose.

The term "feedstock" is defined as a raw material or mixture of raw materials supplied to a microorganism or fermentation process from which other products can be made. For example, a carbon source, such as biomass or the carbon compounds derived from biomass, is a feedstock for a microorganism that produces a product of interest (e.g., small molecule, peptide, synthetic compound, fuel, ethanol, etc.) in a fermentation process. However, a feedstock may also contain nutrients other than a carbon source.

The term "volumetric productivity" or "production rate" is defined as the amount of product formed per volume of medium per unit time. Volumetric productivity may be reported in grams per liter per hour (g/L/h).

The term "specific productivity" is defined as the rate of formation of the product. Specific productivity is herein further defined as grams of product per gram of dry cell weight (CDW) per hour (g/g CDW/h). Using the relation between CDW and OD600 for a given microorganism, specific productivity can also be expressed as grams of product per liter of culture medium per unit of culture optical density at 600 nm (OD) per hour (g/L/h/OD).
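The productivity definitions reduce to simple ratios. The following sketch assumes that product mass, working volume, biomass (CDW), and OD600 are single representative values for the production period; real processes often use time-averaged biomass instead. The function names and all numbers are hypothetical.

```python
def volumetric_productivity(product_g: float, volume_l: float, hours: float) -> float:
    """Grams of product per liter of medium per hour (g/L/h)."""
    return product_g / (volume_l * hours)


def specific_productivity_cdw(product_g: float, cdw_g: float, hours: float) -> float:
    """Grams of product per gram dry cell weight per hour (g/g CDW/h)."""
    return product_g / (cdw_g * hours)


def specific_productivity_od(product_g: float, volume_l: float, od600: float,
                             hours: float) -> float:
    """Grams of product per liter per OD600 unit per hour (g/L/h/OD)."""
    return product_g / (volume_l * od600 * hours)


if __name__ == "__main__":
    # Hypothetical run: 12 g product, 2 L working volume, 4 g CDW, OD600 of 5, 24 h.
    print(volumetric_productivity(12, 2, 24))    # 0.25
    print(specific_productivity_cdw(12, 4, 24))  # 0.125
    print(specific_productivity_od(12, 2, 5, 24))
```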

The term "yield" is defined as the amount of product obtained per unit weight of starting material and can be expressed in grams product per gram substrate (g/g). Yield can be expressed as a percentage of theoretical yield. "theoretical yield" is defined as the maximum amount of product that can be produced per specified amount of substrate, as specified by the stoichiometry of the metabolic pathway used to make the product.
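As a worked illustration of yield and percent of theoretical yield, consider ethanol fermentation from glucose (C6H12O6 → 2 C2H5OH + 2 CO2), whose stoichiometric maximum is about 0.511 g of ethanol per g of glucose. The molar masses are standard values; the 40 g / 100 g fermentation result and the function names are hypothetical.

```python
# Standard molar masses (g/mol).
GLUCOSE_G_PER_MOL = 180.156
ETHANOL_G_PER_MOL = 46.069


def yield_g_per_g(product_g: float, substrate_g: float) -> float:
    """Observed yield in grams of product per gram of substrate."""
    return product_g / substrate_g


def theoretical_ethanol_yield() -> float:
    """Stoichiometric maximum for C6H12O6 -> 2 C2H5OH + 2 CO2 (about 0.511 g/g)."""
    return 2 * ETHANOL_G_PER_MOL / GLUCOSE_G_PER_MOL


def percent_of_theoretical(observed: float, theoretical: float) -> float:
    """Observed yield as a percentage of the theoretical yield."""
    return 100.0 * observed / theoretical


if __name__ == "__main__":
    observed = yield_g_per_g(40.0, 100.0)  # hypothetical: 40 g ethanol from 100 g glucose
    print(round(percent_of_theoretical(observed, theoretical_ethanol_yield()), 1))  # 78.2
```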

The term "titer" or "potency" is defined as the strength of a solution or the concentration of a substance in solution. For example, the titer of a product of interest (e.g., small molecule, peptide, synthetic compound, fuel, ethanol, etc.) in a fermentation broth is expressed as grams of the product of interest in solution per liter of fermentation broth (g/L).

The term "total titer" is defined as the sum of all products of interest produced in a process, including but not limited to the product of interest in solution, the product of interest in the gas phase (if applicable), and any product of interest removed from the process and recovered relative to the initial volume in the process or the operating volume in the process.
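Titer and total titer can be computed as follows. The sketch assumes, hypothetically, that all product fractions are measured in grams and that total titer is normalized to a single reference volume (the initial or operating volume, as the definition allows). The function names and numbers are illustrative.

```python
def titer_g_per_l(product_in_solution_g: float, broth_volume_l: float) -> float:
    """Product of interest in solution per liter of fermentation broth (g/L)."""
    return product_in_solution_g / broth_volume_l


def total_titer_g_per_l(in_solution_g: float, in_gas_phase_g: float,
                        removed_and_recovered_g: float,
                        reference_volume_l: float) -> float:
    """Sum of all product fractions relative to a reference process volume (g/L)."""
    return (in_solution_g + in_gas_phase_g + removed_and_recovered_g) / reference_volume_l


if __name__ == "__main__":
    print(titer_g_per_l(30.0, 1.5))                  # 20.0
    print(total_titer_g_per_l(30.0, 2.0, 8.0, 1.0))  # 40.0
```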

The term "insecticidal protein" or "insecticidal toxin" is used to refer to a protein having toxic activity against one or more pests. Examples of pests include insects of various orders, including Lepidoptera, Diptera, Hemiptera, and Coleoptera, to name a few. Pests also include non-insect organisms that are agricultural pests, including, for example, members of the phylum Nematoda.

Insecticidal/pesticidal proteins

The present invention teaches an insecticidal protein discovery platform and insecticidal proteins discovered therefrom. It should be understood, however, that the term "insecticidal" is not limited to insects alone but encompasses the broader group of organisms commonly referred to as "pests". Thus, the phrase "insecticidal protein" may be considered synonymous with "pesticidal protein", and the phrase "insecticidal protein discovery platform" may be considered synonymous with "pesticidal protein discovery platform". Furthermore, in some aspects, the invention provides insecticidal toxins, and platforms for discovering insecticidal toxins, that are not necessarily limited to proteins.

Insecticidal protein-Monalysin

Pseudomonas entomophila is an entomopathogenic bacterium that infects and kills fruit flies (Drosophila). The pathogenicity of P. entomophila is related to its ability to cause irreversible damage to the Drosophila gut, preventing epithelial regeneration and repair. Opota and colleagues reported the identification of a novel pore-forming toxin (PFT), named monalysin, that contributes to the toxicity of P. entomophila against Drosophila. Opota et al., "Monalysin, a Novel β-Pore-Forming Toxin from the Drosophila Pathogen Pseudomonas entomophila, Contributes to Host Intestinal Damage and Lethality," PLoS Pathogens, September 2011, Vol. 7, Issue 9. Opota showed that monalysin requires N-terminal cleavage to become fully active, forms oligomers in vitro, and induces pore formation in artificial lipid membranes. Secondary-structure prediction of the transmembrane domain indicates that monalysin is a β-type PFT. Monalysin expression is regulated by both the GacS/GacA two-component system and the Pvf regulator, two signaling systems that control the pathogenicity of P. entomophila. In addition, AprA, a metalloprotease secreted by P. entomophila, can induce rapid cleavage of pro-monalysin into its active form. Reduced cell death was observed after infection with a mutant deficient in monalysin production, suggesting that monalysin contributes to the ability of P. entomophila to induce intestinal cell damage, consistent with its activity as a PFT. The Opota study, together with the established role of the Bacillus thuringiensis Cry toxins, suggests that PFT production is a common strategy of insect pathogens to disrupt homeostasis in the insect gut. Id.

Opota identified monalysin by characterizing the protein product of the previously uncharacterized gene PSEEN3174. According to Opota, a protein BLAST search showed no homology between the monalysin amino acid sequence and other sequences, except for two uncharacterized orthologs found in Pseudomonas putida strain F1 (Opota, Fig. S1). Neither the P. entomophila nor the P. putida gene products display any obvious protein domains. However, using HHpred software (homology detection and structure prediction by HMM-HMM comparison), Opota revealed an internal region of alternating polar and hydrophobic residues flanked by stretches of serine and threonine residues, a pattern indicative of the transmembrane region of β-barrel pore-forming toxins. Id.
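Opota relied on HHpred, which compares profile hidden Markov models. As a much cruder illustration of the sequence feature itself, the sketch below scans a protein sequence for runs of strictly alternating hydrophobic/polar residues whose flanks on both sides are serine/threonine-rich. This is a toy heuristic, not HHpred and not the method used by Opota; the residue classes, the minimum run length, the flank size, and the Ser/Thr threshold are all assumptions, and the test sequence is synthetic.

```python
# Assumed residue classes for this sketch; other hydrophobicity scales would differ.
HYDROPHOBIC = set("AVILMFWYC")
POLAR = set("STNQDEKRH")


def residue_class(aa: str):
    """Return 'h' (hydrophobic), 'p' (polar), or None (e.g., G, P) for one residue."""
    if aa in HYDROPHOBIC:
        return "h"
    if aa in POLAR:
        return "p"
    return None


def alternating_runs(seq: str, min_len: int = 8):
    """Half-open (start, end) spans whose residues strictly alternate h/p."""
    runs, start = [], 0
    for i in range(1, len(seq) + 1):
        if i < len(seq):
            prev, cur = residue_class(seq[i - 1]), residue_class(seq[i])
            if prev is not None and cur is not None and prev != cur:
                continue  # still alternating: extend the current run
        if i - start >= min_len and residue_class(seq[start]) is not None:
            runs.append((start, i))
        start = i  # the run ended before position i; start a new one there
    return runs


def st_fraction(segment: str) -> float:
    """Fraction of serine/threonine residues in a segment (0.0 if empty)."""
    return sum(aa in "ST" for aa in segment) / len(segment) if segment else 0.0


def candidate_hairpin_regions(seq: str, min_len: int = 8, flank: int = 5,
                              min_st: float = 0.6):
    """Alternating runs flanked on both sides by Ser/Thr-rich segments."""
    seq = seq.upper()
    hits = []
    for s, e in alternating_runs(seq, min_len):
        left, right = seq[max(0, s - flank):s], seq[e:e + flank]
        if st_fraction(left) >= min_st and st_fraction(right) >= min_st:
            hits.append((s, e))
    return hits


# Synthetic example: Ser/Thr flank, Gly spacer, alternating core, Gly spacer, Ser/Thr flank.
print(candidate_hairpin_regions("TSTSTGVKVEVKVEGSTSTS"))
```

Because serine and threonine are themselves polar, a flank residue can merge into an adjacent alternating run; the glycine spacers in the synthetic example avoid that, and a practical scan would need to handle it explicitly.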

Opota performed DNA sequence retrieval and analysis using the Pseudomonas Genome Database (pseudomonas.com, which can be accessed over the World Wide Web using the "www" prefix). The monalysin gene (ORF PSEEN3174) corresponds to accession number YP_608728.1. The putative monalysin orthologs in P. putida, Pput_1063 and Pput_1064, correspond to accession numbers YP_001266408.1 and YP_001266409.1, respectively. ORF PSEEN0535, which is involved in the production of a type VI secretion system, corresponds to accession number YP_606298.1.

Insecticidal protein-Pseudomonas Insecticidal Protein (PIP)

Several families of Pseudomonas insecticidal proteins are known, including PIP-1, PIP-45, PIP-47, PIP-64, PIP-72, PIP-74, PIP-75 and PIP-77. These PIP proteins and their identifying characteristics are provided in Table 1 below. Additional information can be found in: (1) Schellenberger et al., "A selective insecticidal protein from Pseudomonas for controlling corn rootworms," Science, 4 November 2016; 354(6312):634-637 (describing IPD072Aa, an 86-amino-acid protein, GenBank Accession No. KT795291), incorporated herein by reference; and (2) Wei et al., "A selective insecticidal protein from Pseudomonas mosselii for corn rootworm control," Plant Biotechnology Journal, 2018, 16:649-659 (describing PIP-47Aa), incorporated herein by reference.

TABLE 1 Pseudomonas Insecticidal Proteins (PIP) and Monalysin


¹ All application publications cited in Table 1 are incorporated herein by reference.

² The SEQ ID NO shown before each sequence is that of the original source application or publication; the underlined SEQ ID NO shown after each sequence is that of the present application.

Insecticidal proteins-Cry proteins

Bacillus thuringiensis (Bt) is a Gram-positive, spore-forming bacterium with entomopathogenic properties. During sporulation, Bt produces insecticidal proteins as parasporal crystals. These crystals are composed mainly of one or more proteins (Cry and Cyt toxins), also known as δ-endotoxins. Cry proteins are parasporal inclusion (crystal) proteins from B. thuringiensis that exhibit an experimentally verifiable toxic effect on a target organism or have significant sequence similarity to a known Cry protein. Similarly, Cyt proteins are parasporal inclusion proteins from B. thuringiensis that exhibit hemolytic (cytolytic) activity or have significant sequence similarity to a known Cyt protein. These toxins are highly specific to their target insects, are innocuous to humans, vertebrates and plants, and are completely biodegradable. Bravo A, Gill SS, Soberón M, "Mode of action of Bacillus thuringiensis Cry and Cyt toxins and their potential for insect control," Toxicon: Official Journal of the International Society on Toxinology, 2007; 49(4):423-435.

Bt Cry and Cyt toxins belong to a class of bacterial toxins known as pore-forming toxins (PFTs), which are secreted as water-soluble proteins that undergo conformational changes in order to insert into, or translocate across, the cell membranes of their host. There are two main groups of PFTs: (i) α-helical toxins, in which α-helical regions form the transmembrane pore, and (ii) β-barrel toxins, which insert into the membrane by forming a β-barrel composed of β-sheet hairpins contributed by each monomer. See Parker MW, Feil SC, "Pore-forming protein toxins: from structure to function," Progress in Biophysics and Molecular Biology (Prog. Biophys. Mol. Biol.), May 2005; 88(1):91-142. The first group of PFTs includes toxins such as colicins, exotoxin A, diphtheria toxin, and the Cry three-domain toxins. In contrast, aerolysin, α-hemolysin, anthrax protective antigen, cholesterol-dependent toxins such as perfringolysin O, and the Cyt toxins belong to the β-barrel toxins. Id. In general, PFT-producing bacteria secrete their toxins, and these toxins interact with specific receptors located on the surface of host cells. In most cases, PFTs are activated by host proteases after receptor binding, which induces the formation of an insertion-competent oligomeric structure. Finally, in most cases, membrane insertion is triggered by a decrease in pH that induces a molten-globule state of the protein. Id.

The development of transgenic crops that produce Bt Cry proteins has allowed chemical pesticides to be replaced by environmentally friendly alternatives. In transgenic plants, Cry toxins are produced continuously, which protects the toxins from degradation and makes them accessible to chewing and boring insects. Cry protein production in plants has been improved by engineering cry genes with plant-biased codon usage, by removing putative splicing signal sequences, and by deleting the carboxy-terminal region of the protoxin. See Schuler TH, et al., "Insect-resistant transgenic plants," Trends in Biotechnology (Trends Biotechnol.), 1998; 16:168-175. The use of insect-resistant crops has significantly reduced the use of chemical pesticides in areas where these transgenic crops are planted. See Qaim M, Zilberman D, "Yield effects of genetically modified crops in developing countries," Science, 7 February 2003; 299(5608):900-902.
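One of the gene-engineering steps mentioned above, recoding a cry gene toward plant-biased codon usage, amounts to replacing each codon with a synonymous codon while leaving the encoded protein unchanged. The sketch below illustrates only that principle. The `PREFERRED` codon map is hypothetical and is not a real plant codon-usage table, and the short coding sequence is a toy example rather than a cry gene; the standard genetic code is built from the conventional TCAG-ordered table.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order (first base varies slowest); '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# HYPOTHETICAL preferred codons, for illustration only. A real design would derive
# these from a measured codon-usage table for the target plant species.
PREFERRED = {"L": "CTC", "R": "AGG", "S": "AGC", "G": "GGC", "A": "GCC",
             "V": "GTG", "T": "ACC", "P": "CCA", "I": "ATC"}


def codons(cds: str):
    """Split a coding sequence into codons, rejecting incomplete reading frames."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def translate(cds: str) -> str:
    return "".join(CODON_TABLE[c] for c in codons(cds))


def recode(cds: str, preferred=PREFERRED) -> str:
    """Swap each codon for the listed preferred synonymous codon, if any."""
    out = []
    for c in codons(cds):
        aa = CODON_TABLE[c]
        new = preferred.get(aa, c)  # amino acids without an entry keep their codon
        if CODON_TABLE[new] != aa:  # never change the encoded protein
            raise ValueError(f"{new} does not encode {aa}")
        out.append(new)
    return "".join(out)


original = "ATGTTATCTGGTGCTTAA"  # toy sequence encoding M-L-S-G-A-stop
optimized = recode(original)
assert translate(optimized) == translate(original)
print(optimized)
```

The synonymy check inside `recode` is the key design choice: any error in the preferred-codon map raises an exception instead of silently altering the protein.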

Known Cry proteins include δ-endotoxins, including but not limited to: Cry1, Cry2, Cry3, Cry4, Cry5, Cry6, Cry7, Cry8, Cry9, Cry10, Cry11, Cry12, Cry13, Cry14, Cry15, Cry16, Cry17, Cry18, Cry19, Cry20, Cry21, Cry22, Cry23, Cry24, Cry25, Cry26, Cry27, Cry28, Cry29, Cry30, Cry31, Cry32, Cry33, Cry34, Cry35, Cry36, Cry37, Cry38, Cry39, Cry40, Cry41, Cry42 and Cry51.

Members of these classes of Bacillus thuringiensis insecticidal proteins include, but are not limited to: Cry1Aa1 (Accession No. AAA22353); Cry1Aa2 (Accession No. AAA22552); Cry1Aa3 (Accession No. BAA00257); Cry1Aa4 (Accession No. CAA31886); Cry1Aa5 (Accession No. BAA04468); Cry1Aa6 (Accession No. AAA86265); Cry1Aa7 (Accession No. AAD46139); Cry1Aa8 (Accession No. I26149); Cry1Aa9 (Accession No. BAA77213); Cry1Aa10 (Accession No. AAD55382); Cry1Aa11 (Accession No. CAA70856); Cry1Aa12 (Accession No. AAP80146); Cry1Aa13 (Accession No. AAM44305); Cry1Aa14 (Accession No. AAP40639); Cry1Aa15 (Accession No. AAY66993); Cry1Aa16 (Accession No. HQ439776); Cry1Aa17 (Accession No. HQ439788); Cry1Aa18 (Accession No. HQ439790); Cry1Aa19 (Accession No. HQ685121); Cry1Aa20 (Accession No. JF340156); Cry1Aa21 (Accession No. JN651496); Cry1Aa22 (Accession No. KC158223); Cry1Ab1 (Accession No. AAA22330); Cry1Ab2 (Accession No. AAA22613); Cry1Ab3 (Accession No. AAA22561); Cry1Ab4 (Accession No. BAA00071); Cry1Ab5 (Accession No. CAA28405); Cry1Ab6 (Accession No. AAA22420); Cry1Ab7 (Accession No. CAA31620); Cry1Ab8 (Accession No. AAA22551); Cry1Ab9 (Accession No. CAA38701); Cry1Ab10 (Accession No. A29125); Cry1Ab11 (Accession No. I12419); Cry1Ab12 (Accession No. AAC64003); Cry1Ab13 (Accession No. AAN76494); Cry1Ab14 (Accession No. AAG16877); Cry1Ab15 (Accession No. AAO13302); Cry1Ab16 (Accession No. AAK55546); Cry1Ab17 (Accession No. AAT46415); Cry1Ab18 (Accession No. AAQ88259); Cry1Ab19 (Accession No. AAW31761); Cry1Ab20 (Accession No. ABB72460); Cry1Ab21 (Accession No. ABS18384); Cry1Ab22 (Accession No. ABW87320); Cry1Ab23 (Accession No. HQ439777); Cry1Ab24 (Accession No. HQ439778); Cry1Ab25 (Accession No. HQ685122); Cry1Ab26 (Accession No. HQ847729); Cry1Ab27 (Accession No. JN135249); Cry1Ab28 (Accession No. JN135250); Cry1Ab29 (Accession No. JN135251); Cry1Ab30 (Accession No. JN135252); Cry1Ab31 (Accession No. JN135253); Cry1Ab32 (Accession No. JN135254); Cry1Ab33 (Accession No. AAS93798); Cry1Ab34 (Accession No. KC156668); Cry1Ab-like (Accession No. AAK14336); Cry1Ab-like (Accession No. AAK14337); Cry1Ab-like (Accession No. AAK14338); Cry1Ab-like (Accession No. ABG88858); Cry1Ac1 (Accession No. AAA22331); Cry1Ac2 (Accession No. AAA22338); Cry1Ac3 (Accession No. CAA38098); Cry1Ac4 (Accession No. AAA73077); Cry1Ac5 (Accession No. AAA22339); Cry1Ac6 (Accession No. AAA86266); Cry1Ac7 (Accession No. AAB46989); Cry1Ac8 (Accession No. AAC44841); Cry1Ac9 (Accession No. AAB49768); Cry1Ac10 (Accession No. CAA05505); Cry1Ac11 (Accession No. CAA10270); Cry1Ac12 (Accession No. I12418); Cry1Ac13 (Accession No. AAD38701); Cry1Ac14 (Accession No. AAQ06607); Cry1Ac15 (Accession No. AAN07788); Cry1Ac16 (Accession No. AAU87037); Cry1Ac17 (Accession No. AAX18704); Cry1Ac18 (Accession No. AAY88347); Cry1Ac19 (Accession No. ABD37053); Cry1Ac20 (Accession No. ABB89046); Cry1Ac21 (Accession No. AAY66992); Cry1Ac22 (Accession No. ABZ01836); Cry1Ac23 (Accession No. CAQ30431); Cry1Ac24 (Accession No. ABL01535); Cry1Ac25 (Accession No. FJ513324); Cry1Ac26 (Accession No. FJ617446); Cry1Ac27 (Accession No. FJ617447); Cry1Ac28 (Accession No. ACM90319); Cry1Ac29 (Accession No. DQ438941); Cry1Ac30 (Accession No. GQ227507); Cry1Ac31 (Accession No. GU446674); Cry1Ac32 (Accession No. HM061081); Cry1Ac33 (Accession No. GQ866913); Cry1Ac34 (Accession No. HQ230364); Cry1Ac35 (Accession No. JF340157); Cry1Ac36 (Accession No. JN387137); Cry1Ac37 (Accession No. JQ317685); Cry1Ad1 (Accession No. AAA22340); Cry1Ad2 (Accession No. CAA01880); Cry1Ae1 (Accession No. AAA22410); Cry1Af1 (Accession No. AAB82749); Cry1Ag1 (Accession No. AAD46137); Cry1Ah1 (Accession No. AAQ14326); Cry1Ah2 (Accession No. ABB76664); Cry1Ah3 (Accession No. HQ439779); Cry1Ai1 (Accession No. AAO39719); Cry1Ai2 (Accession No. HQ439780); Cry1A-like (Accession No. AAK14339); Cry1Ba1 (Accession No. CAA29898); Cry1Ba2 (Accession No. CAA65003); Cry1Ba3 (Accession No. AAK63251); Cry1Ba4 (Accession No. AAK51084); Cry1Ba5 (Accession No. ABO20894); Cry1Ba6 (Accession No. ABL60921); Cry1Ba7 (Accession No. HQ439781); Cry1Bb1 (Accession No. AAA22344); Cry1Bb2 (Accession No. HQ439782); Cry1Bc1 (Accession No. CAA86568); Cry1Bd1 (Accession No. AAD10292); Cry1Bd2 (Accession No. AAM93496); Cry1Be1 (Accession No. AAC32850); Cry1Be2 (Accession No. AAQ52387); Cry1Be3 (Accession No. ACV96720); Cry1Be4 (Accession No. HM070026); Cry1Bf1 (Accession No. CAC50778); Cry1Bf2 (Accession No. AAQ52380); Cry1Bg1 (Accession No. AAO39720); Cry1Bh1 (Accession No. HQ589331); Cry1Bi1 (Accession No. KC156700); Cry1Ca1 (Accession No. CAA30396); Cry1Ca2 (Accession No. CAA31951); Cry1Ca3 (Accession No. AAA22343); Cry1Ca4 (Accession No. CAA01886); Cry1Ca5 (Accession No. CAA65457); Cry1Ca6 (Accession No. AAF37224); Cry1Ca7 (Accession No. AAG50438); Cry1Ca8 (Accession No. AAM00264); Cry1Ca9 (Accession No. AAL79362); Cry1Ca10 (Accession No. AAN16462); Cry1Ca11 (Accession No. AAX53094); Cry1Ca12 (Accession No. HM070027); Cry1Ca13 (Accession No. HQ412621); Cry1Ca14 (Accession No. JN651493); Cry1Cb1 (Accession No. M97880); Cry1Cb2 (Accession No. AAG35409); Cry1Cb3 (Accession No. ACD50894); Cry1Cb-like (Accession No. AAX63901); Cry1Da1 (Accession No. CAA38099); Cry1Da2 (Accession No. I76415); Cry1Da3 (Accession No. HQ439784); Cry1Db1 (Accession No. CAA80234); Cry1Db2 (Accession No. AAK48937); Cry1Dc1 (Accession No. ABK35074); Cry1Ea1 (Accession No. CAA37933); Cry1Ea2 (Accession No. CAA39609); Cry1Ea3 (Accession No. AAA22345); Cry1Ea4 (Accession No. AAD04732); Cry1Ea5 (Accession No. A15535); Cry1Ea6 (Accession No. AAL50330); Cry1Ea7 (Accession No. AAW72936); Cry1Ea8 (Accession No. ABX11258); Cry1Ea9 (Accession No. HQ439785); Cry1Ea10 (Accession No. ADR00398); Cry1Ea11 (Accession No. JQ652456); Cry1Eb1 (Accession No. AAA22346); Cry1Fa1 (Accession No. AAA22348); Cry1Fa2 (Accession No. AAA22347); Cry1Fa3 (Accession No. HM070028); Cry1Fa4 (Accession No. HM439638); Cry1Fb1 (Accession No. CAA80235); Cry1Fb2 (Accession No. BAA25298); Cry1Fb3 (Accession No. AAF21767); Cry1Fb4 (Accession No. AAC10641); Cry1Fb5 (Accession No. AAO13295); Cry1Fb6 (Accession No. ACD50892); Cry1Fb7 (Accession No. ACD50893); Cry1Ga1 (Accession No. CAA80233); Cry1Ga2 (Accession No. CAA70506); Cry1Gb1 (Accession No. AAD10291); Cry1Gb2 (Accession No. AAO13756); Cry1Gc1 (Accession No. AAQ52381); Cry1Ha1 (Accession No. CAA80236); Cry1Hb1 (Accession No. AAA79694); Cry1Hb2 (Accession No. HQ439786); Cry1H-like (Accession No. AAF01213); Cry1Ia1 (Accession No. CAA44633); Cry1Ia2 (Accession No. AAA22354); Cry1Ia3 (Accession No. AAC36999); Cry1Ia4 (Accession No. AAB00958); Cry1Ia5 (Accession No. CAA70124); Cry1Ia6 (Accession No. AAC26910); Cry1Ia7 (Accession No. AAM73516); Cry1Ia8 (Accession No. AAK66742); Cry1Ia9 (Accession No. AAQ08616); Cry1Ia10 (Accession No. AAP86782); Cry1Ia11 (Accession No. CAC85964); Cry1Ia12 (Accession No. AAV53390); Cry1Ia13 (Accession No. ABF83202); Cry1Ia14 (Accession No. ACG63871); Cry1Ia15 (Accession No. FJ617445); Cry1Ia16 (Accession No. FJ617448); Cry1Ia17 (Accession No. GU989199); Cry1Ia18 (Accession No. ADK23801); Cry1Ia19 (Accession No. HQ439787); Cry1Ia20 (Accession No. JQ228426); Cry1Ia21 (Accession No. JQ228424); Cry1Ia22 (Accession No. JQ228427); Cry1Ia23 (Accession No. JQ228428); Cry1Ia24 (Accession No. JQ228429); Cry1Ia25 (Accession No. JQ228430); Cry1Ia26 (Accession No. JQ228431); Cry1Ia27 (Accession No. JQ228432); Cry1Ia28 (Accession No. JQ228433); Cry1Ia29 (Accession No. JQ228434); Cry1Ia30 (Accession No. JQ317686); Cry1Ia31 (Accession No. JX944038); Cry1Ia32 (Accession No. JX944039); Cry1Ia33 (Accession No. JX944040); Cry1Ib1 (Accession No. AAA82114); Cry1Ib2 (Accession No. ABW88019); Cry1Ib3 (Accession No. ACD75515); Cry1Ib4 (Accession No. HM051227); Cry1Ib5 (Accession No. HM070028); Cry1Ib6 (Accession No. ADK38579); Cry1Ib7 (Accession No. JN571740); Cry1Ib8 (Accession No. JN675714); Cry1Ib9 (Accession No. JN675715); Cry1Ib10 (Accession No. JN675716); Cry1Ib11 (Accession No. JQ228423); Cry1Ic1 (Accession No. AAC62933); Cry1Ic2 (Accession No. AAE71691); Cry1Id1 (Accession No. AAD44366); Cry1Id2 (Accession No. JQ228422); Cry1Ie1 (Accession No. AAG43526); Cry1Ie2 (Accession No. HM439636); Cry1Ie3 (Accession No. KC156647); Cry1Ie4 (Accession No. KC156681); Cry1If1 (Accession No. AAQ52382); Cry1Ig1 (Accession No. KC156701); Cry1I-like (Accession No. AAC31094); Cry1I-like (Accession No. ABG88859); Cry1Ja1 (Accession No. AAA22341); Cry1Ja2 (Accession No. HM070030); Cry1Ja3 (Accession No. JQ228425); Cry1Jb1 (Accession No. AAA98959); Cry1Jc1 (Accession No. AAC31092); Cry1Jc2 (Accession No. AAQ52372); Cry1Jd1 (Accession No. CAC50779); Cry1Ka1 (Accession No. AAB00376); Cry1Ka2 (Accession No. HQ439783); Cry1La1 (Accession No. AAS60191); Cry1La2 (Accession No. HM070031); Cry1Ma1 (Accession No. FJ884067); Cry1Ma2 (Accession No. KC156659); Cry1Na1 (Accession No. KC156648); Cry1Nb1 (Accession No. KC156678); Cry1-like (Accession No. AAC31091); Cry2Aa1 (Accession No. AAA22335); Cry2Aa2 (Accession No. AAA83516); Cry2Aa3 (Accession No. D86064); Cry2Aa4 (Accession No. AAC04867); Cry2Aa5 (Accession No. CAA10671); Cry2Aa6 (Accession No. CAA10672); Cry2Aa7 (Accession No. CAA10670); Cry2Aa8 (Accession No. AAO13734); Cry2Aa9 (Accession No. AAO13750); Cry2Aa10 (Accession No. AAQ04263); Cry2Aa11 (Accession No. AAQ52384); Cry2Aa12 (Accession No. ABI83671); Cry2Aa13 (Accession No. ABL01536); Cry2Aa14 (Accession No. ACF04939); Cry2Aa15 (Accession No. JN426947); Cry2Ab1 (Accession No. AAA22342); Cry2Ab2 (Accession No. CAA39075); Cry2Ab3 (Accession No. AAG36762); Cry2Ab4 (Accession No. AAO13296); Cry2Ab5 (Accession No. AAQ04609); Cry2Ab6 (Accession No. AAP59457); Cry2Ab7 (Accession No. AAZ66347); Cry2Ab8 (Accession No. ABC95996); Cry2Ab9 (Accession No. ABC74968); Cry2Ab10 (Accession No. EF157306); Cry2Ab11 (Accession No. CAM84575); Cry2Ab12 (Accession No. ABM21764); Cry2Ab13 (Accession No. ACG76120); Cry2Ab14 (Accession No. ACG76121); Cry2Ab15 (Accession No. HM037126); Cry2Ab16 (Accession No. GQ866914); Cry2Ab17 (Accession No. HQ439789); Cry2Ab18 (Accession No. JN135255); Cry2Ab19 (Accession No. JN135256); Cry2Ab20 (Accession No. JN135257); Cry2Ab21 (Accession No. JN135258); Cry2Ab22 (Accession No. JN135259); Cry2Ab23 (Accession No. JN135260); Cry2Ab24 (Accession No. JN135261); Cry2Ab25 (Accession No. JN415485); Cry2Ab26 (Accession No. JN426946); Cry2Ab27 (Accession No. JN415764); Cry2Ab28 (Accession No. JN651494); Cry2Ac1 (Accession No. CAA40536); Cry2Ac2 (Accession No. AAG35410); Cry2Ac3 (Accession No. AAQ52385); Cry2Ac4 (Accession No. ABC95997); Cry2Ac5 (Accession No. ABC74969); Cry2Ac6 (Accession No. ABC74793); Cry2Ac7 (Accession No. CAL18690); Cry2Ac8 (Accession No. CAM09325); Cry2Ac9 (Accession No. CAM09326); Cry2Ac10 (Accession No. ABN15104); Cry2Ac11 (Accession No. CAM83895); Cry2Ac12 (Accession No. CAM83896); Cry2Ad1 (Accession No. AAF09583); Cry2Ad2 (Accession No. ABC86927); Cry2Ad3 (Accession No. CAK29504); Cry2Ad4 (Accession No. CAM32331); Cry2Ad5 (Accession No. CAO78739); Cry2Ae1 (Accession No. AAQ52362); Cry2Af1 (Accession No. ABO30519); Cry2Af2 (Accession No. GQ866915); Cry2Ag1 (Accession No. ACH91610); Cry2Ah1 (Accession No. EU939453); Cry2Ah2 (Accession No. ACL80665); Cry2Ah3 (Accession No. GU073380); Cry2Ah4 (Accession No. KC156702); Cry2Ai1 (Accession No. FJ788388); Cry2Aj (Accession No.); Cry2Ak1 (Accession No. KC156660); Cry2Ba1 (Accession No. KC156658); Cry3Aa1 (Accession No. AAA22336); Cry3Aa2 (Accession No. AAA22541); Cry3Aa3 (Accession No. CAA68482); Cry3Aa4 (Accession No. AAA22542); Cry3Aa5 (Accession No. AAA50255); Cry3Aa6 (Accession No. AAC43266); Cry3Aa7 (Accession No. CAB41411); Cry3Aa8 (Accession No. AAS79487); Cry3Aa9 (Accession No. AAW05659); Cry3Aa10 (Accession No. AAU29411); Cry3Aa11 (Accession No. AAW82872); Cry3Aa12 (Accession No. ABY49136); Cry3Ba1 (Accession No. CAA34983); Cry3Ba2 (Accession No. CAA00645); Cry3Ba3 (Accession No. JQ39327); Cry3Bb1 (Accession No. AAA22334); Cry3Bb2 (Accession No. AAA74198); Cry3Bb3 (Accession No. I15475); Cry3Ca1 (Accession No. CAA42469); Cry4Aa1 (Accession No. CAA68485); Cry4Aa2 (Accession No. BAA00179); Cry4Aa3 (Accession No. CAD30148); Cry4Aa4 (Accession No. AFB18317); Cry4A-like (Accession No. AAY96321); Cry4Ba1 (Accession No. CAA30312); Cry4Ba2 (Accession No. CAA30114); Cry4Ba3 (Accession No. AAA22337); Cry4Ba4 (Accession No. BAA00178); Cry4Ba5 (Accession No. CAD30095); Cry4Ba-like (Accession No. ABC47686); Cry4Ca1 (Accession No. EU646202); Cry4Cb1 (Accession No. FJ403208); Cry4Cb2 (Accession No. FJ597622); Cry4Cc1 (Accession No. FJ403207); Cry5Aa1 (Accession No. AAA67694); Cry5Ab1 (Accession No. AAA67693); Cry5Ac1 (Accession No. I34543); Cry5Ad1 (Accession No. ABQ82087); Cry5Ba1 (Accession No. AAA68598); Cry5Ba2 (Accession No. ABW88931); Cry5Ba3 (Accession No. AFJ04417); Cry5Ca1 (Accession No. HM461869); Cry5Ca2 (Accession No. ZP_04123426); Cry5Da1 (Accession No. HM461870); Cry5Da2 (Accession No. ZP_04123980); Cry5Ea1 (Accession No. HM485580); Cry5Ea2 (Accession No. ZP_04124038); Cry6Aa1 (Accession No. AAA22357); Cry6Aa2 (Accession No. AAM46849); Cry6Aa3 (Accession No. ABH03377); Cry6Ba1 (Accession No. AAA22358); Cry7Aa1 (Accession No. AAA22351); Cry7Ab1 (Accession No. AAA21120); Cry7Ab2 (Accession No. AAA21121); Cry7Ab3 (Accession No. ABX24522); Cry7Ab4 (Accession No. EU380678); Cry7Ab5 (Accession No. ABX79555); Cry7Ab6 (Accession No. ACI44005); Cry7Ab7 (Accession No. ADB89216); Cry7Ab8 (Accession No. GU145299); Cry7Ab9 (Accession No. ADD92572); Cry7Ba1 (Accession No. ABB70817); Cry7Bb1 (Accession No. KC156653); Cry7Ca1 (Accession No. ABR67863); Cry7Cb1 (Accession No. KC156698); Cry7Da1 (Accession No. ACQ99547); Cry7Da2 (Accession No. HM572236); Cry7Da3 (Accession No. KC156679); Cry7Ea1 (Accession No. HM035086); Cry7Ea2 (Accession No. HM132124); Cry7Ea3 (Accession No. EEM19403); Cry7Fa1 (Accession No. HM035088); Cry7Fa2 (Accession No. EEM19090); Cry7Fb1 (Accession No. HM572235); Cry7Fb2 (Accession No. KC156682); Cry7Ga1 (Accession No. HM572237); Cry7Ga2 (Accession No. KC156669); Cry7Gb1 (Accession No. KC156650); Cry7Gc1 (Accession No. KC156654); Cry7Gd1 (Accession No. KC156697); Cry7Ha1 (Accession No. KC156651); Cry7Ia1 (Accession No. KC156665); Cry7Ja1 (Accession No. KC156671); Cry7Ka1 (Accession No. KC156680); Cry7Kb1 (Accession No. BAM99306); Cry7La1 (Accession No. BAM99307); Cry8Aa1 (Accession No. AAA21117); Cry8Ab1 (Accession No. EU044830); Cry8Ac1 (Accession No. KC156662); Cry8Ad1 (Accession No. KC156684); Cry8Ba1 (Accession No. AAA21118); Cry8Bb1 (Accession No. CAD57542); Cry8Bc1 (Accession No. CAD57543); Cry8Ca1 (Accession No. AAA21119); Cry8Ca2 (Accession No. AAR98783); Cry8Ca3 (Accession No. EU625349); Cry8Ca4 (Accession No. ADB54826); Cry8Da1 (Accession No. BAC07226); Cry8Da2 (Accession No. BD133574); Cry8Da3 (Accession No. BD133575); Cry8Db1 (Accession No. BAF93483); Cry8Ea1 (Accession No. AAQ73470); Cry8Ea2 (Accession No. EU047597); Cry8Ea3 (Accession No. KC855216); Cry8Fa1 (Accession No. AAT48690); Cry8Fa2 (Accession No. HQ174208); Cry8Fa3 (Accession No. AFH78109); Cry8Ga1 (Accession No. AAT46073); Cry8Ga2 (Accession No. ABC42043); Cry8Ga3 (Accession No. FJ198072); Cry8Ha1 (Accession No. AAW81032); Cry8Ia1 (Accession No. EU381044); Cry8Ia2 (Accession No. GU073381); Cry8Ia3 (Accession No. HM044664); Cry8Ia4 (Accession No. KC156674); Cry8Ib1 (Accession No. GU325772); Cry8Ib2 (Accession No. KC156677); Cry8Ja1 (Accession No. EU625348); Cry8Ka1 (Accession No. FJ422558); Cry8Ka2 (Accession No. ACN87262); Cry8Kb1 (Accession No. HM123758); Cry8Kb2 (Accession No. KC156675); Cry8La1 (Accession No. GU325771); Cry8Ma1 (Accession No. HM044665); Cry8Ma2 (Accession No. EEM86551); Cry8Ma3 (Accession No. HM210574); Cry8Na1 (Accession No. HM640939); Cry8Pa1 (Accession No. HQ388415); Cry8Qa1 (Accession No. HQ441166); Cry8Qa2 (Accession No. KC152468); Cry8Ra1 (Accession No. AFP87548); Cry8Sa1 (Accession No. JQ740599); Cry8Ta1 (Accession No. KC156673); Cry8-like (Accession No. FJ770571); Cry8-like (Accession No. ABS53003); Cry9Aa1 (Accession No. CAA41122); Cry9Aa2 (Accession No. CAA41425); Cry9Aa3 (Accession No. GQ249293); Cry9Aa4 (Accession No. GQ249294); Cry9Aa5 (Accession No. JX174110); Cry9Aa-like (Accession No. AAQ52376); Cry9Ba1 (Accession No. CAA52927); Cry9Ba2 (Accession No. GU299522); Cry9Bb1 (Accession No. AAV28716); Cry9Ca1 (Accession No. CAA85764); Cry9Ca2 (Accession No. AAQ52375); Cry9Da1 (Accession No. BAA19948); Cry9Da2 (Accession No. AAB97923); Cry9Da3 (Accession No. GQ249293); Cry9Da4 (Accession No. GQ249297); Cry9Db1 (Accession No. AAX78439); Cry9Dc1 (Accession No. KC156683); Cry9Ea1 (Accession No. BAA34908); Cry9Ea2 (Accession No. AAO12908); Cry9Ea3 (Accession No. ABM21765); Cry9Ea4 (Accession No. ACE88267); Cry9Ea5 (Accession No. ACF04743); Cry9Ea6 (Accession No. ACG63872); Cry9Ea7 (Accession No. FJ380927); Cry9Ea8 (Accession No. GQ249292); Cry9Ea9 (Accession No. JN651495); Cry9Eb1 (Accession No. CAC50780); Cry9Eb2 (Accession No. GQ249298); Cry9Eb3 (Accession No. KC156646); Cry9Ec1 (Accession No. AAC63366); Cry9Ed1 (Accession No. AAX78440); Cry9Ee1 (Accession No. GQ249296); Cry9Ee2 (Accession No. KC156664); Cry9Fa1 (Accession No. KC156692); Cry9Ga1 (Accession No. KC156699); Cry9-like (Accession No. AAC63366); Cry10Aa1 (Accession No. AAA22614); Cry10Aa2 (Accession No. E00614); Cry10Aa3 (Accession No. CAD30098); Cry10Aa4 (Accession No. AFB18318); Cry10A-like (Accession No. DQ167578); Cry11Aa1 (Accession No. AAA22352); Cry11Aa2 (Accession No. AAA22611); Cry11Aa3 (Accession No. CAD30081); Cry11Aa4 (Accession No. AFB18319); Cry11Aa-like (Accession No. DQ166531); Cry11Ba1 (Accession No. CAA60504); Cry11Bb1 (Accession No. AAC97162); Cry11Bb2 (Accession No. HM068615); Cry12Aa1 (Accession No. AAA22355); Cry13Aa1 (Accession No. AAA22356); Cry14Aa1 (Accession No. AAA21516); Cry14Ab1 (Accession No. KC156652); Cry15Aa1 (Accession No. AAA22333); Cry16Aa1 (Accession No. CAA63860); Cry17Aa1 (Accession No. CAA67841); Cry18Aa1 (Accession No. CAA67506); Cry18Ba1 (Accession No. AAF89667); Cry18Ca1 (Accession No. AAF89668); Cry19Aa1 (Accession No. CAA68875); Cry19Ba1 (Accession No. BAA32397); Cry19Ca1 (Accession No. AFM37572); Cry20Aa1 (Accession No. AAB93476); Cry20Ba1 (Accession No. ACS93601); Cry20Ba2 (Accession No. KC156694); Cry20-like (Accession No. GQ144333); Cry21Aa1 (Accession No. I32932); Cry21Aa2 (Accession No. I66477); Cry21Ba1 (Accession No. BAC06484); Cry21Ca1 (Accession No. JF521577); Cry21Ca2 (Accession No. KC156687); Cry21Da1 (Accession No. JF521578); Cry22Aa1 (Accession No. I34547); Cry22Aa2 (Accession No. CAD43579); Cry22Aa3 (Accession No. ACD93211); Cry22Ab1 (Accession No. AAK50456); Cry22Ab2 (Accession No. CAD43577); Cry22Ba1 (Accession No. CAD43578); Cry22Bb1 (Accession No. KC156672); Cry23Aa1 (Accession No. AAF76375); Cry24Aa1 (Accession No. AAC61891); Cry24Ba1 (Accession No. BAD32657); Cry24Ca1 (Accession No. CAJ43600); Cry25Aa1 (Accession No. AAC61892); Cry26Aa1 (Accession No. AAD25075); Cry27Aa1 (Accession No. BAA82796); Cry28Aa1 (Accession No. AAD24189); Cry28Aa2 (Accession No. AAG00235); Cry29Aa1 (Accession No. CAC80985); Cry30Aa1 (Accession No. CAC80986); Cry30Ba1 (Accession No. BAD00052); Cry30Ca1 (Accession No. BAD67157); Cry30Ca2 (Accession No. ACU24781); Cry30Da1 (Accession No. EF095955); Cry30Db1 (Accession No. BAE80088); Cry30Ea1 (Accession No. ACC95445); Cry30Ea2 (Accession No. FJ499389); Cry30Fa1 (Accession No. ACI22625); Cry30Ga1 (Accession No. ACG60020); Cry30Ga2 (Accession No. HQ638217); Cry31Aa1 (Accession No. BAB11757); Cry31Aa2 (Accession No. AAL87458); Cry31Aa3 (Accession No. BAE79808); Cry31Aa4 (Accession No. BAF32571); Cry31Aa5 (Accession No. BAF32572); Cry31Aa6 (Accession No. BAI44026); Cry31Ab1 (Accession No. BAE79809); Cry31Ab2 (Accession No. BAF32570); Cry31Ac1 (Accession No. BAF34368); Cry31Ac2 (Accession No. AB731600); Cry31Ad1 (Accession No. BAI44022); Cry32Aa1 (Accession No. AAG36711); Cry32Aa2 (Accession No. GU063849); Cry32Ab1 (Accession No. GU063850); Cry32Ba1 (Accession No. BAB78601); Cry32Ca1 (Accession No. BAB78602); Cry32Cb1 (Accession No. KC156708); Cry32Da1 (Accession No. BAB78603); Cry32Ea1 (Accession No. GU324274); Cry32Ea2 (Accession No. KC156686); Cry32Eb1 (Accession No. KC156663); Cry32Fa1 (Accession No. KC156656); Cry32Ga1 (Accession No. KC156657); Cry32Ha1 (Accession No. KC156661); Cry32Hb1 (Accession No. KC156666); Cry32Ia1 (Accession No. KC156667); Cry32Ja1 (Accession No. KC156685); Cry32Ka1 (Accession No. KC156688); Cry32La1 (Accession No. KC156689); Cry32Ma1 (Accession No. KC156690); Cry32Mb1 (Accession No. KC156704); Cry32Na1 (Accession No. KC156691); Cry32Oa1 (Accession No. KC156703); Cry32Pa1 (Accession No. KC156705); Cry32Qa1 (Accession No. KC156706); Cry32Ra1 (Accession No. KC156707); Cry32Sa1 (Accession No. KC156709); Cry32Ta1 (Accession No. KC156710); Cry32Ua1 (Accession No. KC156655); Cry33Aa1 (Accession No. AAL26871); Cry34Aa1 (Accession No. AAG50341); Cry34Aa2 (Accession No. AAK64560); Cry34Aa3 (Accession No. AAT29032); Cry34Aa4 (Accession No. AAT29030); Cry34Ab1 (Accession No. AAG41671); Cry34Ac1 (Accession No. AAG50118); Cry34Ac2 (Accession No. AAK64562); Cry34Ac3 (Accession No. AAT29029); Cry34Ba1 (Accession No. AAK64565); Cry34Ba2 (Accession No. AAT29033); Cry34Ba3 (Accession No. AAT29031); Cry35Aa1 (Accession No. AAG50342); Cry35Aa2 (Accession No. AAK64561); Cry35Aa3 (Accession No. AAT29028); Cry35Aa4 (Accession No. AAT29025); Cry35Ab1 (Accession No. AAG41672); Cry35Ab2 (Accession No. AAK64563); Cry35Ab3 (Accession No. AY536891); Cry35Ac1 (Accession No. AAG50117); Cry35Ba1 (Accession No. AAK64566); Cry35Ba2 (Accession No. AAT29027); Cry35Ba3 (Accession No. AAT29026); Cry36Aa1 (Accession No. AAK64558); Cry37Aa1 (Accession No. AAF76376); Cry38Aa1 (Accession No. AAK64559); Cry39Aa1 (Accession No. BAB72016); Cry40Aa1 (Accession No. BAB72018); Cry40Ba1 (Accession No. BAC77648); Cry40Ca1 (Accession No. EU381045); Cry40Da1 (Accession No. ACF15199); Cry41Aa1 (Accession No. BAD35157); Cry41Ab1 (Accession No. BAD35163); Cry41Ba1 (Accession No. HM461871); Cry41Ba2 (Accession No. ZP_04099652); Cry42Aa1 (Accession No. BAD35166); Cry43Aa1 (Accession No. BAD15301); Cry43Aa2 (Accession No. BAD95474); Cry43Ba1 (Accession No. BAD15303); Cry43Ca1 (Accession No. KC156676); Cry43Cb1 (Accession No. KC156695); Cry43Cc1 (Accession No. KC156696); Cry43-like (Accession No. BAD15305); Cry44Aa (Accession No. BAD08532); Cry45Aa (Accession No. BAD22577); Cry46Aa (Accession No. BAC79010); Cry46Aa2 (Accession No. BAG68906); Cry46Ab (Accession No. BAD35170); Cry47Aa (Accession No. AAY24695); Cry48Aa (Accession No. CAJ18351); Cry48Aa2 (Accession No. CAJ86545); Cry48Aa3 (Accession No. CAJ86546); Cry48Ab (Accession No. CAJ86548); Cry48Ab2 (Accession No. CAJ86549); Cry49Aa (Accession No. CAH56541); Cry49Aa2 (Accession No. CAJ86541); Cry49Aa3 (Accession No. CAJ86543); Cry49Aa4 (Accession No. CAJ86544); Cry49Ab1 (Accession No. CAJ86542); Cry50Aa1 (Accession No. BAE86999); Cry50Ba1 (Accession No. GU446675); Cry50Ba2 (Accession No. GU446676); Cry51Aa1 (Accession No. ABI14444); Cry51Aa2 (Accession No. GU570697); Cry52Aa1 (Accession No. EF613489); Cry52Ba1 (Accession No. FJ361760); Cry53Aa1 (Accession No. EF633476); Cry53Ab1 (Accession No. FJ361759); Cry54Aa1 (Accession No. ACA52194); Cry54Aa2 (Accession No. GQ140349); Cry54Ba1 (Accession No. GU446677); Cry55Aa1 (Accession No. ABW88932); Cry54Ab1 (Accession No. JQ916908); Cry55Aa2 (Accession No. AAE33526); Cry56Aa1 (Accession No. ACU57499); Cry56Aa2 (Accession No. GQ483512); Cry56Aa3 (Accession No. JX025567); Cry57Aa1 (Accession No. ANC87261); Cry58Aa1 (Accession No. ANC87260); Cry59Ba1 (Accession No. JN790647); Cry59Aa1 (Accession No. ACR43758); Cry60Aa1 (Accession No. ACU24782); Cry60Aa2 (Accession No. EAO57254); Cry60Aa3 (Accession No. EEM99278); Cry60Ba1 (Accession No. GU810818); Cry60Ba2 (Accession No. EAO57253); Cry60Ba3 (Accession No. EEM99279); Cry61Aa1 (Accession No. HM035087); Cry61Aa2 (Accession No. HM132125); Cry61Aa3 (Accession No. EEM19308); Cry62Aa1 (Accession No. HM054509); Cry63Aa1 (Accession No. BAI44028); Cry64Aa1 (Accession No. BAJ05397); Cry65Aa1 (Accession No. HM461868); Cry65Aa2 (Accession No. ZP_04123838); Cry66Aa1 (Accession No. HM485581); Cry66Aa2 (Accession No. ZP_04099945); Cry67Aa1 (Accession No. HM485582); Cry67Aa2 (Accession No. ZP_04148882); Cry68Aa1 (Accession No. HQ113114); Cry69Aa1 (Accession No. HQ401006); Cry69Aa2 (Accession No. JQ821388); Cry69Ab1 (Accession No. JN209957); Cry70Aa1 (Accession No. JN646781); Cry70Ba1 (Accession No. ADO51070); Cry70Bb1 (Accession No. EEL67276); Cry71Aa1 (Accession No. JX025568); Cry72Aa1 (Accession No. JX025569); Cyt1Aa (GenBank Accession No. X03182); Cyt1Ab (GenBank Accession No. X98793); Cyt1B (GenBank Accession No. U37196); Cyt2A (GenBank Accession No. Z14147); and Cyt2B (GenBank Accession No. U52043).

Examples of endotoxins also include, but are not limited to: the Cry1A proteins of U.S. Pat. Nos. 5,880,275, 7,858,849, 8,530,411, 8,575,433, and 8,686,233; the DIG-3 or DIG-11 toxins (variants of Cry proteins such as Cry1A and Cry3A with N-terminal deletion of alpha-helix 1 and/or alpha-helix 2) of U.S. Pat. Nos. 8,304,604, 8,304,605, and 8,476,226; Cry1B of U.S. Patent Application Serial No. 10/525,318; Cry1C of U.S. Pat. No. 6,033,874; Cry1F of U.S. Pat. Nos. 5,188,960 and 6,218,188; the Cry1A/F chimeras of U.S. Pat. Nos. 7,070,982, 6,962,705, and 6,713,063; Cry2 proteins, such as the Cry2Ab protein of U.S. Pat. No. 7,064,249; Cry3A proteins, including but not limited to engineered hybrid insecticidal proteins (eHIPs) created by fusing unique combinations of variable regions and conserved blocks of at least two different Cry proteins (U.S. Patent Application Publication No. 2010/0017914); the Cry4 proteins of U.S. Pat. Nos. 7,329,736, 7,449,552, 7,803,943, 7,476,781, 7,105,332, 7,378,499, and 7,462,760; Cry5 proteins; Cry6 proteins; Cry8 proteins; Cry9 proteins, such as members of the Cry9A, Cry9B, Cry9C, Cry9D, Cry9E, and Cry9F families, including but not limited to the Cry9D protein of U.S. Pat. No. 8,802,933 and the Cry9B protein of U.S. Pat. No. 8,802,934; Naimov et al. (2008) Applied and Environmental Microbiology 74:7145-; Cry22 proteins; the Cry34Ab1 proteins of U.S. Pat. Nos. 6,127,180, 6,624,145, and 6,340,593; the CryET33 and CryET34 proteins of U.S. Pat. Nos. 6,248,535, 6,326,351, 6,399,330, 6,949,626, 7,385,107, and 7,504,229; the CryET33 and CryET34 homologs of U.S. Patent Publication Nos. 2006/0191034 and 2012/0278954 and PCT Publication No. WO 2012/139004; the Cry35Ab1 proteins of U.S. Pat. Nos. 6,083,499, 6,548,291, and 6,340,593; Cry46 proteins, Cry51 proteins, and Cry binary toxins; TIC901 or related toxins; TIC807 of U.S. Patent Application Publication No. 2008/0295207; ET29, ET37, TIC809, TIC810, TIC812, TIC127, and TIC128 of PCT/US2006/033867; the TIC853 toxin of U.S. Pat. No. 8,513,494; AXMI-027, AXMI-036, and AXMI-038 of U.S. Pat. No. 8,236,757; AXMI-031, AXMI-039, AXMI-040, and AXMI-049 of U.S. Pat. No. 7,923,602; AXMI-018, AXMI-020, and AXMI-021 of WO 2006/083891; AXMI-010 of WO 2005/038032; AXMI-003 of WO 2005/021585; AXMI-008 of U.S. Patent Application Publication No. 2004/0250311; AXMI-006 of U.S. Patent Application Publication No. 2004/0216186; AXMI-007 of U.S. Patent Application Publication No. 2004/0210965; AXMI-009 of U.S. Patent Application Publication No. 2004/0210964; AXMI-014 of U.S. Patent Application Publication No. 2004/0197917; AXMI-004 of U.S. Patent Application Publication No. 2004/0197916; AXMI-028 and AXMI-029 of WO 2006/119457; AXMI-007, AXMI-008orf2, AXMI-009, AXMI-014, and AXMI-004 of WO 2004/074462; AXMI-150 of U.S. Pat. No. 8,084,416; AXMI-205 of U.S. Patent Application Publication No. 2011/0023184; AXMI-011, AXMI-012, AXMI-013, AXMI-015, AXMI-019, AXMI-044, AXMI-037, AXMI-043, AXMI-033, AXMI-034, AXMI-022, AXMI-023, AXMI-041, AXMI-063, and AXMI-064 of U.S. Patent Application Publication No. 2011/0263488; AXMI-R1 and related proteins of U.S. Patent Application Publication No. 2010/0197592; AXMI221z, AXMI222z, AXMI223z, AXMI224z, and AXMI225z of WO 2011/103248; AXMI218, AXMI219, AXMI220, AXMI226, AXMI227, AXMI228, AXMI229, AXMI230, and AXMI231 of WO 2011/103247 and U.S. Pat. No. 8,759,619; AXMI-115, AXMI-113, AXMI-005, AXMI-163, and AXMI-184 of U.S. Pat. No. 8,334,431; AXMI-001, AXMI-002, AXMI-030, AXMI-035, and AXMI-045 of U.S. Patent Application Publication No. 2010/0298211; AXMI-066 and AXMI-076 of U.S. Patent Application Publication No. 2009/0144852; AXMI128, AXMI130, AXMI131, AXMI133, AXMI140, AXMI141, AXMI142, AXMI143, AXMI144, AXMI146, AXMI148, AXMI149, AXMI152, AXMI153, AXMI154, AXMI155, AXMI156, AXMI157, AXMI158, AXMI162, AXMI165, AXMI166, AXMI167, AXMI168, AXMI169, AXMI170, AXMI171, AXMI172, AXMI173, AXMI174, AXMI175, AXMI176, AXMI177, AXMI178, AXMI179, AXMI180, AXMI181, AXMI182, AXMI185, AXMI186, AXMI187, AXMI188, and AXMI189; AXMI079, AXMI080, AXMI081, AXMI082, AXMI091, AXMI092, AXMI096, AXMI097, AXMI098, AXMI099, AXMI100, AXMI101, AXMI102, AXMI103, AXMI104, AXMI107, AXMI108, AXMI109, AXMI110, AXMI111, AXMI112, AXMI114, AXMI116, AXMI117, AXMI118, AXMI119, AXMI120, AXMI121, AXMI122, AXMI123, AXMI124, AXMI1257, AXMI1268, AXMI127, AXMI129, AXMI164, AXMI151, AXMI183, AXMI132, AXMI138, and AXMI137 of U.S. Patent Application Publication No. 2010/0005543; AXMI270 of U.S. Patent Application Publication No. US 20140223598; AXMI279 of U.S. Patent Application Publication No. US 20140223599; the Cry proteins having modified proteolytic sites, such as Cry1A and Cry3A, of U.S. Pat. No. 8,319,019; and the Cry1Ac, Cry2Aa, and Cry1Ca toxin proteins from Bacillus thuringiensis strain VBTS 2528 of U.S. Patent Application Publication No. 2011/0064710.

Other Cry proteins are well known to those skilled in the art. See N. Crickmore et al., "Revision of the Nomenclature for the Bacillus thuringiensis Pesticidal Crystal Proteins," Microbiology and Molecular Biology Reviews (1998) 62:807-; see also N. Crickmore et al., "Bacillus thuringiensis toxin nomenclature" (2016), which can be accessed on the world wide web using the "www" prefix at btnomenclature.

The use of Cry proteins as transgenic plant traits is well known to those skilled in the art. Cry transgenic plants that have received regulatory approval include, but are not limited to, plants expressing Cry1Ac, Cry1Ac + Cry2Ab, Cry1Ab, Cry1A.105, Cry1F, Cry1Fa2, Cry1F + Cry1Ac, Cry2Ab, Cry3A, mCry3A, Cry3Bb1, Cry34Ab1, Cry35Ab1, Vip3A, Cry9C, and CBI-Bt. See Sanahuja et al., "Bacillus thuringiensis: a century of research, development and commercial applications," Plant Biotech. Journal (April 2011) 9(3):283- ... action, which can be accessed on the world wide web using the "www" prefix. More than one insecticidal protein well known to those skilled in the art may also be expressed in a plant, for example: Vip3Ab & Cry1Fa (US 2012/0317682); Cry1BE & Cry1F (US 2012/0311746); Cry1CA & Cry1AB (US 2012/0311745); Cry1F & CryCa (US 2012/0317681); Cry1Da & Cry1Be (US 2012/0331590); Cry1DA & Cry1Fa (US 2012/0331589); Cry1Ab & Cry1Be (US 2012/0324606); Cry1Fa & Cry2Aa and Cry1I & Cry1E (US 2012/0324605); Cry34Ab/35Ab and Cry6Aa (US 20130167269); Cry34Ab/Cry35Ab & Cry3Aa (US 20130167268); Cry1Ab & Cry1F (US 20140182018); and Cry3A and Cry1Ab or Vip3Aa (US 20130116170). Insecticidal proteins also include insecticidal lipases, including the lipid acyl hydrolases of U.S. Pat. No. 7,491,869, and cholesterol oxidases, e.g., from Streptomyces (Purcell et al. (1993) Biochem. Biophys. Res. Commun. 15:1406-1413).

Insecticidal protein-Vip

Insecticidal proteins also include Vip (vegetative insecticidal protein) toxins.

As described in the art, "entomopathogenic bacteria produce insecticidal proteins that accumulate in inclusion bodies or parasporal crystals (such as the aforementioned Cry and Cyt proteins), as well as insecticidal proteins that are secreted into the culture medium. Among the latter are Vip proteins, which are divided into four families based on their amino acid identity. Vip1 and Vip2 proteins act as binary toxins and are toxic to some members of the orders Coleoptera and Hemiptera. The Vip1 component is thought to bind to receptors in the membranes of the insect midgut, and the Vip2 component enters the cell, where it displays ADP-ribosyltransferase activity against actin, thereby preventing microfilament formation. Vip3 has no sequence similarity to Vip1 or Vip2 and is toxic to a wide variety of members of the order Lepidoptera. Its mode of action has been shown to resemble that of Cry proteins in terms of proteolytic activation, binding to the midgut epithelial cell membrane, and pore formation, although Vip3A proteins do not share binding sites with Cry proteins. The latter property makes them good candidates for combination with Cry proteins in transgenic plants (Bacillus thuringiensis-treated crops [Bt crops]) to prevent or delay insect resistance and to broaden the insecticidal spectrum. There are commercially available cultivars of Bt cotton and Bt maize that express Vip3Aa protein as well as Cry proteins. For the recently reported Vip4 family, no target insects have been found." See Chakroun et al., "Bacterial Vegetative Insecticidal Proteins (Vip) from Entomopathogenic Bacteria," Microbiology and Molecular Biology Reviews (2016) 80(2):329-350 (published March 2, 2016).

VIPs are described in U.S. Pat. Nos. 5,877,012, 6,107,279, 6,137,033, 7,244,820, 7,615,686, and 8,237,020. Other VIP proteins are well known to those skilled in the art (see lifesci.sussex.ac.uk/home/Neil_Crickmore/Bt/vip.html, which can be accessed on the world wide web using the "www" prefix).

Insecticidal protein-Toxin Complex (TC) family proteins

Pesticidal proteins also include Toxin Complex (TC) proteins, which are obtainable from organisms such as Xenorhabdus, Photorhabdus, and Paenibacillus (see U.S. Pat. Nos. 7,491,698 and 8,084,418). Some TC proteins have "stand-alone" insecticidal activity, and others enhance the activity of the stand-alone toxins produced by the same organism. The toxicity of a "stand-alone" TC protein (from Photorhabdus, Xenorhabdus, or Paenibacillus) can be enhanced by one or more TC protein "potentiators" derived from a source organism of a different genus. There are three main types of TC proteins. As referred to herein, Class A proteins ("Protein A") are stand-alone toxins. Class B proteins ("Protein B") and Class C proteins ("Protein C") enhance the toxicity of Class A proteins. Examples of Class A proteins are TcbA, TcdA, XptA1, and XptA2. Examples of Class B proteins are TcaC, TcdB, XptB1Xb, and XptC1Wi. Examples of Class C proteins are TccC, XptC1Xb, and XptB1Wi. Insecticidal proteins also include spider, snake, and scorpion venom proteins. Examples of spider venom peptides include, but are not limited to, lycotoxin-1 peptides and mutants thereof (U.S. Pat. No. 8,334,366).

Insecticidal protein-combinations

In some embodiments, the invention encompasses the use of a combination of one or more insecticidal proteins. For example, Cry proteins are known to have limited utility across the full range of common agricultural pests, because each protein targets only specific receptors found in susceptible insect species. Thus, by expressing Cry proteins together with the novel insecticidal proteins taught herein, plants are expected to gain protection against a wider variety of insects.

Thus, the present invention encompasses engineered plant species that produce the novel insecticidal proteins as taught herein, as well as such plant species that also express one or more other insecticidal proteins, such as monacolin, PIP, Cry, Cyt, VIP, TC, and any combination thereof.

Nucleic acid molecules encoding the discovered insecticidal proteins

One aspect of the invention relates to an isolated or recombinant nucleic acid molecule comprising a nucleic acid sequence encoding an insecticidal polypeptide, protein, or biologically active portion thereof, as well as to nucleic acid molecules sufficient for use as hybridization probes to identify nucleic acid molecules encoding proteins having regions of sequence homology.

As used herein, the term "nucleic acid molecule" refers to DNA molecules (e.g., recombinant DNA, cDNA, genomic DNA, plasmid DNA, mitochondrial DNA) and RNA molecules (e.g., mRNA) and analogs of the DNA or RNA generated using nucleotide analogs. The nucleic acid molecule may be single-stranded or double-stranded.

An "isolated" nucleic acid molecule (or DNA), as used herein, refers to a nucleic acid sequence (or DNA) that is no longer in its natural environment (e.g., in vitro). As used herein, a "recombinant" nucleic acid molecule (or DNA) refers to a nucleic acid sequence (or DNA) in a recombinant bacterial or plant host cell. In some embodiments, an "isolated" or "recombinant" nucleic acid is free of sequences that naturally flank the nucleic acid (i.e., sequences located at the 5' and 3' ends of the nucleic acid) in the genomic DNA of the organism from which the nucleic acid is derived. For the purposes of the present invention, "isolated" or "recombinant", when used in reference to a nucleic acid molecule, does not include isolated chromosomes. For example, in various embodiments, a recombinant nucleic acid molecule encoding an insecticidal protein of the invention can contain less than about 5 kb, 4 kb, 3 kb, 2 kb, 1 kb, 0.5 kb, or 0.1 kb of the nucleic acid sequences that naturally flank the nucleic acid molecule in the genomic DNA of the cell from which the nucleic acid is derived.

In some embodiments, the isolated nucleic acid molecule encoding an insecticidal protein has one or more changes in nucleic acid sequence compared to the native or genomic nucleic acid sequence. In some embodiments, these changes include, but are not limited to: changes in nucleic acid sequence due to the degeneracy of the genetic code; changes in nucleic acid sequence due to amino acid substitutions, insertions, deletions, and/or additions compared to the native or genomic sequence; removal of one or more introns; deletion of one or more upstream or downstream regulatory regions; and deletion of the 5' and/or 3' untranslated regions associated with the genomic nucleic acid sequence. In some embodiments, the nucleic acid molecule encoding the insecticidal protein is a non-genomic sequence.

A variety of polynucleotides encoding the insecticidal proteins of the invention are contemplated. Such polynucleotides are useful for producing insecticidal proteins in host cells when operably linked to suitable promoters, transcription terminators, and/or polyadenylation sequences. Such polynucleotides may also be used as probes for isolating homologous or substantially homologous polynucleotides encoding other insecticidal proteins.

Polynucleotides encoding the insecticidal proteins of the invention can be synthesized de novo from the sequences disclosed herein. The sequence of a polynucleotide gene can be deduced from the disclosed protein sequence by using the genetic code. Computer programs such as "BackTranslate" (GCG™ Package, Accelrys, Inc., San Diego, Calif.) can be used to convert a peptide sequence into a corresponding nucleotide sequence encoding the peptide.
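The back-translation step can be illustrated with a minimal sketch. This is not the GCG BackTranslate program; it simply picks one codon per amino acid from the standard genetic code. The function names and the "first codon in TCAG order" choice are hypothetical illustrations; a practical design would instead supply a host-specific codon preference table through the `codon_choice` argument.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def first_codon_choice():
    """Hypothetical codon choice: the first codon (TCAG order) for each residue."""
    choice = {}
    for codon, aa in CODON_TABLE.items():
        choice.setdefault(aa, codon)
    return choice


def back_translate(protein, codon_choice=None, add_stop=True):
    """Return one DNA sequence that encodes `protein` (one-letter codes)."""
    codon_choice = codon_choice or first_codon_choice()
    dna = "".join(codon_choice[aa] for aa in protein.upper())
    return dna + codon_choice["*"] if add_stop else dna


def translate(dna):
    """Translate the complete codons of `dna` with the standard genetic code."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


dna = back_translate("MKTAYIAK")
print(dna)             # ATGAAAACTGCTTATATTGCTAAATAA
print(translate(dna))  # MKTAYIAK*
```

Translating the result back recovers the input peptide (plus the stop), which is a simple check that any codon table plugged into `codon_choice` is self-consistent.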

Furthermore, the synthetic polynucleotide sequences of the present invention may be designed for expression in plants. U.S. Pat. No. 5,500,365 describes a method of synthesizing plant genes to improve the expression level of the proteins encoded by the synthetic genes. This method involves modifying the structural gene sequence of the exogenous transgene so that it is more efficiently transcribed, processed, translated, and expressed by the plant. Features of genes that are expressed well in plants include the elimination of sequences that may cause undesired intron splicing or polyadenylation in the coding region of the gene transcript, while substantially retaining the amino acid sequence of the toxic portion of the insecticidal protein. A similar method for obtaining enhanced expression of transgenes in monocots is disclosed in U.S. Pat. No. 5,689,052. As used herein, "complementary sequence" refers to a nucleic acid sequence that is sufficiently complementary to a given nucleic acid sequence that it can hybridize to the given sequence to form a stable duplex. "Polynucleotide sequence variant", as used herein, refers to a nucleic acid sequence that differs from a reference sequence only as a result of the degeneracy of the genetic code and therefore encodes the same polypeptide.
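A first step in such a redesign is locating candidate problem sequences in the coding region. The sketch below scans a sequence for short motifs, reporting overlapping hits. The motif list is an assumption chosen for illustration only (AATAAA is the canonical polyadenylation signal; ATTTA is a pentamer associated with mRNA instability); it is not the list of sequences targeted in the cited patents, which should be consulted directly.

```python
import re

# Illustrative motifs only (an assumption, not the list from the cited patents).
MOTIFS = ("AATAAA", "ATTTA")


def find_motifs(seq, motifs=MOTIFS):
    """Return sorted (position, motif) pairs, including overlapping hits."""
    seq = seq.upper()
    hits = []
    for motif in motifs:
        # A zero-width lookahead lets re.finditer report overlapping matches.
        for match in re.finditer(f"(?={motif})", seq):
            hits.append((match.start(), motif))
    return sorted(hits)


print(find_motifs("GCAATAAAGATTTATTTAC"))
# [(2, 'AATAAA'), (9, 'ATTTA'), (13, 'ATTTA')]
```

Each reported hit would then be removed by synonymous codon changes, so that the encoded amino acid sequence is retained.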

In some embodiments, the nucleic acid molecule encoding an insecticidal protein of the invention is a non-genomic nucleic acid sequence. As used herein, a "non-genomic nucleic acid sequence" or "non-genomic nucleic acid molecule" refers to a nucleic acid molecule that has one or more changes in nucleic acid sequence compared to a native or genomic nucleic acid sequence. In some embodiments, these changes include, but are not limited to: changes in nucleic acid sequence due to the degeneracy of the genetic code; codon optimization of the nucleic acid sequence for expression in plants; changes in the nucleic acid sequence that introduce at least one amino acid substitution, insertion, deletion, and/or addition compared to the native or genomic sequence; removal of one or more introns associated with the genomic nucleic acid sequence; insertion of one or more heterologous introns; deletion of one or more upstream or downstream regulatory regions associated with the genomic nucleic acid sequence; insertion of one or more heterologous upstream or downstream regulatory regions; deletion of the 5' and/or 3' untranslated regions associated with the genomic nucleic acid sequence; insertion of heterologous 5' and/or 3' untranslated regions; and modification of the polyadenylation site. In some embodiments, the non-genomic nucleic acid molecule is a cDNA.

In some embodiments, the present invention teaches nucleic acid molecules encoding the insecticidal proteins taught herein, as well as nucleic acid molecules encoding proteins having amino acid substitutions, deletions, insertions, and fragments and combinations thereof, as taught herein.

Also provided are nucleic acid molecules encoding transcription and/or translation products that are subsequently spliced to ultimately produce a functional insecticidal protein. Splicing can be accomplished in vitro or in vivo and can involve cis- or trans-splicing. The substrate for splicing may be a polynucleotide (e.g., an RNA transcript) or a polypeptide. An example of cis-splicing of a polynucleotide is the removal of an intron inserted into the coding sequence and the joining of the two flanking exon regions to generate the insecticidal protein coding sequence. An example of trans-splicing is where a polynucleotide is encrypted by separating the coding sequence into two or more fragments that can be separately transcribed and then spliced to form a full-length pesticidal protein coding sequence. Splice enhancer sequences introduced into the construct can facilitate either cis- or trans-splicing of polypeptides (U.S. Pat. Nos. 6,365,377 and 6,531,316). Thus, in some embodiments, the polynucleotide does not directly encode the full-length insecticidal protein, but rather encodes one or more fragments thereof.

Nucleic acid molecules that are fragments of the foregoing sequences encoding insecticidal proteins are also encompassed by the embodiments. As used herein, a "fragment" refers to a portion of a nucleic acid sequence encoding an insecticidal protein. A fragment of a nucleic acid sequence may encode a biologically active portion of a protein, or it may be a fragment that can be used as a hybridization probe or PCR primer using the methods disclosed herein. Nucleic acid molecules that are fragments of a nucleic acid sequence comprise at least about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300 or more contiguous nucleotides, up to the number of nucleotides present in a full-length nucleic acid sequence encoding an insecticidal protein as taught herein. As used herein, "contiguous nucleotides" refers to nucleotide residues that are immediately adjacent to each other. Fragments of the nucleic acid sequences of the embodiments will encode protein fragments that retain the biological activity of the insecticidal protein. In some embodiments, fragments of a nucleic acid sequence will encode at least about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300 or more contiguous amino acids, up to the total number of amino acids present in the full-length insecticidal proteins taught herein. In some embodiments, the fragment is an N-terminal and/or C-terminal truncation of at least about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 or more amino acids from the N-terminus and/or C-terminus relative to the insecticidal proteins taught herein, produced, e.g., by proteolysis, by insertion of an initiation codon, by deletion of the codons encoding the deleted amino acids with concomitant insertion of a stop codon, or by insertion of a stop codon in the coding sequence.
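The last route above, a C-terminal truncation created by placing a stop codon in the coding sequence, can be sketched as follows. The helper names and the example ORF are hypothetical, and the sketch assumes an in-frame open reading frame whose last codon is a stop codon; it converts an existing sense codon into TAA rather than modelling any particular cloning method.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_orf(cds):
    """Translate codon by codon from the first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def c_terminal_truncation(cds, n_removed, stop="TAA"):
    """Drop `n_removed` C-terminal residues by turning a sense codon into a stop codon.

    Assumes `cds` is an in-frame ORF whose last codon is a stop codon.
    """
    n_sense = len(cds) // 3 - 1  # sense codons, excluding the terminal stop
    if not 0 < n_removed < n_sense:
        raise ValueError("n_removed must leave at least one residue")
    keep = n_sense - n_removed
    return cds[:3 * keep] + stop + cds[3 * keep + 3:]


orf = "ATGAAAACTGCTTATTAA"  # hypothetical ORF encoding MKTAY
print(translate_orf(orf))                            # MKTAY
print(translate_orf(c_terminal_truncation(orf, 2)))  # MKT
```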

In some embodiments, the insecticidal protein is encoded by a nucleic acid sequence that is sufficiently similar to: SEQ ID NO 1, SEQ ID NO 3, SEQ ID NO 5, SEQ ID NO 7, SEQ ID NO 9, SEQ ID NO 11, SEQ ID NO 13, SEQ ID NO 15, SEQ ID NO 17, SEQ ID NO 19, SEQ ID NO 21, SEQ ID NO 23, SEQ ID NO 25, SEQ ID NO 27, SEQ ID NO 29, SEQ ID NO 31, SEQ ID NO 33, SEQ ID NO 35, SEQ ID NO 37, SEQ ID NO 39, SEQ ID NO 41, SEQ ID NO 43, SEQ ID NO 45, SEQ ID NO 47, SEQ ID NO 49, SEQ ID NO 51, SEQ ID NO 53, SEQ ID NO 55, SEQ ID NO 57, SEQ ID NO 59, SEQ ID NO 61, SEQ ID NO 63, SEQ ID NO 65, SEQ ID NO 67, SEQ ID NO 69 or SEQ ID NO 71. As used herein, "sufficiently similar" refers to an amino acid or nucleic acid sequence that has at least about 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence similarity to a reference sequence, using standard parameters as described herein or known to one of skill in the art.

In some embodiments, the insecticidal protein is encoded by a nucleic acid sequence having sufficient sequence identity to: SEQ ID NO 1, SEQ ID NO 3, SEQ ID NO 5, SEQ ID NO 7, SEQ ID NO 9, SEQ ID NO 11, SEQ ID NO 13, SEQ ID NO 15, SEQ ID NO 17, SEQ ID NO 19, SEQ ID NO 21, SEQ ID NO 23, SEQ ID NO 25, SEQ ID NO 27, SEQ ID NO 29, SEQ ID NO 31, SEQ ID NO 33, SEQ ID NO 35, SEQ ID NO 37, SEQ ID NO 39, SEQ ID NO 41, SEQ ID NO 43, SEQ ID NO 45, SEQ ID NO 47, SEQ ID NO 49, SEQ ID NO 51, SEQ ID NO 53, SEQ ID NO 55, SEQ ID NO 57, SEQ ID NO 59, SEQ ID NO 61, SEQ ID NO 63, SEQ ID NO 65, SEQ ID NO 67, SEQ ID NO 69 or SEQ ID NO 71. "Sufficient sequence identity" refers to an amino acid or nucleic acid sequence having at least about 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to a reference sequence, determined with one of the alignment programs described herein or known to those skilled in the art using standard parameters.

Percent identity calculation

One skilled in the art will recognize that the foregoing values may be appropriately adjusted to determine the corresponding homology or identity of the proteins encoded by two nucleic acid sequences, by taking into account codon degeneracy, amino acid similarity, reading frame positioning, and the like. In some embodiments, sequence homology is determined over the full-length sequence of the polynucleotide encoding the protein. In some embodiments, sequence identity is calculated using the ClustalW algorithm in the AlignX module of the Vector NTI Program Suite (Invitrogen Corporation, Carlsbad, Calif.) with all default parameters. In some embodiments, sequence identity is calculated over the entire full-length polypeptide using the ClustalW algorithm in the AlignX module of the Vector NTI Program Suite (Invitrogen Corporation, Carlsbad, Calif.) with all default parameters.

To determine the percent identity of two amino acid sequences or two nucleic acid sequences, the sequences are aligned for optimal comparison purposes. The percent identity between two sequences is a function of the number of identical positions shared by the sequences, i.e., percent identity = (number of identical positions / total number of positions (e.g., overlapping positions)) × 100. In one embodiment, the two sequences are the same length. In another embodiment, the comparison is made across the entire reference sequence. The percent identity between two sequences can be determined using techniques similar to those described below, with or without allowing gaps. In calculating percent identity, typically only exact matches are counted.
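The formula above can be made concrete with a short sketch that takes two already-aligned sequences (gaps written as "-") and counts exact matches. The function name and the `over` switch are illustrative assumptions: "overlap" divides by the columns in which neither sequence has a gap, and "reference" divides by the ungapped length of the reference, corresponding to the two embodiments described above. Real alignment programs may define the denominator differently, so results from this sketch will not necessarily match their reported values.

```python
def percent_identity(aligned_query, aligned_ref, over="overlap", gap="-"):
    """Percent identity of two pre-aligned sequences (gaps written as `gap`).

    over="overlap":   identical / columns where neither sequence has a gap
    over="reference": identical / residues in the reference (ungapped length)
    """
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("aligned sequences must have equal length")
    pairs = list(zip(aligned_query, aligned_ref))
    # Only exact residue matches count; gap-gap columns are never identical.
    identical = sum(1 for q, r in pairs if q == r and q != gap)
    if over == "overlap":
        denominator = sum(1 for q, r in pairs if q != gap and r != gap)
    elif over == "reference":
        denominator = sum(1 for r in aligned_ref if r != gap)
    else:
        raise ValueError("over must be 'overlap' or 'reference'")
    return 100.0 * identical / denominator if denominator else 0.0


print(percent_identity("MKT-AYIAK", "MKTQAYLAK"))                   # 87.5
print(percent_identity("MKT-AYIAK", "MKTQAYLAK", over="reference"))  # about 77.78
```

The same alignment yields 7/8 identical overlapping positions but only 7/9 of the reference, which is why the choice of denominator should be stated alongside any percent identity value.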

Determination of the percent identity between two sequences (nucleic acids or amino acids) can be accomplished using a mathematical algorithm. A non-limiting example of a mathematical algorithm for comparing two sequences is the algorithm of Karlin and Altschul, (1990) Proc. Natl. Acad. Sci. USA 87:2264, as modified in Karlin and Altschul, (1993) Proc. Natl. Acad. Sci. USA 90:5873-5877. Such an algorithm is incorporated into the BLASTN and BLASTX programs of Altschul et al., (1990) J. Mol. Biol. 215:403. BLAST nucleotide searches can be performed with the BLASTN program, score = 100, word length = 12, to obtain nucleic acid sequences homologous to the pesticidal nucleic acid molecules of the embodiments. BLAST protein searches can be performed with the BLASTX program, score = 50, word length = 3, to obtain amino acid sequences homologous to the insecticidal protein molecules of the embodiments. To obtain gapped alignments for comparison purposes, Gapped BLAST (in BLAST 2.0) can be utilized as described in Altschul et al., (1997) Nucleic Acids Res. 25:3389. Alternatively, PSI-BLAST can be used to perform an iterated search that detects distant relationships between molecules. See Altschul et al. (1997), supra. When utilizing the BLAST, Gapped BLAST, and PSI-BLAST programs, the default parameters of the respective programs (e.g., BLASTX and BLASTN) can be used. Alignment may also be performed manually by inspection.
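The "word length" parameter refers to the short words used to seed alignments before they are extended and scored. The sketch below shows only a simplified version of that seeding idea for nucleotides: it indexes every word of the subject sequence and reports exact word matches with the query. It is not the BLAST implementation; there is no extension, scoring, or statistics, and protein BLAST seeds on scored neighborhood words rather than exact matches. The function name is a hypothetical illustration.

```python
def shared_words(query, subject, word_length=12):
    """Return (query_pos, subject_pos) pairs where both sequences share an exact word."""
    # Index every word of length `word_length` in the subject by its start position.
    index = {}
    for j in range(len(subject) - word_length + 1):
        index.setdefault(subject[j:j + word_length], []).append(j)
    # Look up each query word; every match is a candidate seed for extension.
    hits = []
    for i in range(len(query) - word_length + 1):
        for j in index.get(query[i:i + word_length], []):
            hits.append((i, j))
    return hits


# A short word length is used here only so the toy sequences produce hits.
print(shared_words("ACGTACGT", "TTACGTAA", word_length=4))
# [(0, 2), (1, 3), (3, 1), (4, 2)]
```

Longer words (such as the word length of 12 cited above for nucleotide searches) produce fewer, more specific seeds, trading sensitivity for speed.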

Another non-limiting example of a mathematical algorithm for comparing sequences is the ClustalW algorithm (Higgins et al., (1994) Nucleic Acids Res. 22:4673-). ClustalW compares sequences and aligns the entire amino acid or DNA sequence, and thus can provide data on the sequence conservation of the entire amino acid sequence. The ClustalW algorithm is used in several commercially available DNA/amino acid analysis software packages, e.g., the AlignX module of the Vector NTI Program Suite (Invitrogen Corporation, Carlsbad, Calif.). After amino acid sequences have been aligned with ClustalW, the percent amino acid identity can be assessed. A non-limiting example of a software program useful for analyzing ClustalW alignments is GENEDOC™. GENEDOC™ (Karl Nicholas) allows amino acid (or DNA) similarity and identity to be assessed across multiple proteins. Another non-limiting example of a mathematical algorithm for comparing sequences is the algorithm of Myers and Miller, (1988) CABIOS 4:11-17. Such an algorithm is incorporated into the ALIGN program (version 2.0), which is part of the GCG Wisconsin Genetics Software Package, Version 10 (available from Accelrys, Inc., 9685 Scranton Rd., San Diego, Calif.). When comparing amino acid sequences with the ALIGN program, a PAM120 weight residue table, a gap length penalty of 12, and a gap penalty of 4 can be used.

Another non-limiting example of a mathematical algorithm for comparing sequences is the algorithm of Needleman and Wunsch, (1970) J. Mol. Biol. 48(3):443- ..., with the following parameters: % identity and % similarity for nucleic acid sequences using a gap weight of 50 and a length weight of 3; and % identity and % similarity for amino acid sequences using a gap weight of 8, a length weight of 2, and the BLOSUM62 scoring program. Equivalent programs may also be used, i.e., any sequence comparison program that, for any two sequences in question, produces an alignment with the same nucleotide or amino acid residue matches and the same percent sequence identity as the alignment obtained with the parameters above.
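The core of the Needleman-Wunsch method is a dynamic-programming table filled from the top-left corner and traced back from the bottom-right corner to recover one optimal global alignment. The minimal sketch below assumes a simple scoring scheme (match +1, mismatch -1, linear gap penalty -2) chosen for illustration; it does not reproduce the gap weight/length weight (affine) penalties or the BLOSUM62 matrix listed above, so its scores and, in some cases, its alignments will differ from those of the cited programs.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty; returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)

    def pair_score(x, y):
        return match if x == y else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # align pair
                score[i - 1][j] + gap,                                 # gap in b
                score[i][j - 1] + gap,                                 # gap in a
            )

    # Trace back from the bottom-right cell, preferring diagonal moves on ties.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


print(needleman_wunsch("ACGT", "AGT"))  # ('ACGT', 'A-GT', 1)
```

The aligned strings can then be passed to a percent identity calculation; because the alignment covers both sequences end to end, this is a global (not local) comparison.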

Variants of nucleic acid molecules

The present invention provides nucleic acid molecules encoding insecticidal protein variants. "variants" of an encoding nucleic acid sequence may include those sequences that encode the insecticidal proteins disclosed herein but which differ conservatively as a result of the degeneracy of the genetic code, as well as those sequences which are sufficiently identical as discussed above. Naturally occurring allelic variants can be identified using well known molecular biology techniques, such as Polymerase Chain Reaction (PCR) and hybridization techniques as outlined below. Variant nucleic acid sequences also include synthetically derived nucleic acid sequences that have been generated, for example, by using site-directed mutagenesis, but still encode the disclosed insecticidal proteins.

The present invention provides isolated or recombinant polynucleotides encoding any of the insecticidal proteins disclosed herein. One of ordinary skill in the art will readily appreciate that, due to the degeneracy of the genetic code, numerous nucleotide sequences encode the proteins of the present invention. Table A is a codon table listing the synonymous codons for each amino acid. For example, the codons AGA, AGG, CGA, CGC, CGG, and CGU all encode the amino acid arginine. Thus, at every position in a nucleic acid of the invention where an arginine is specified by a codon, the codon can be changed to any of the corresponding codons described above without altering the encoded polypeptide. It is understood that U in an RNA sequence corresponds to T in a DNA sequence.

Table A: Synonymous codon table


One skilled in the art will further appreciate that changes can be introduced by mutation of the nucleic acid sequence, thereby producing changes in the amino acid sequence of the encoded protein, without altering the biological activity of the protein. Thus, variant nucleic acid molecules can be generated by introducing one or more nucleotide substitutions, additions and/or deletions into the corresponding nucleic acid sequences disclosed herein, such that one or more amino acid substitutions, additions or deletions are introduced into the encoded protein. Mutations can be introduced by standard techniques, such as site-directed mutagenesis and PCR-mediated mutagenesis. Such variant nucleic acid sequences are also encompassed by the present invention.

Alternatively, variant nucleic acid sequences can be obtained by randomly introducing mutations (e.g., by saturation mutagenesis) along all or part of the coding sequence, and the resulting mutants can be screened for the ability to confer pesticidal activity to identify mutants that retain activity. Following mutagenesis, the encoded protein can be expressed recombinantly, and the activity of the protein can be determined using standard analytical techniques.
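Saturation of a single codon position can be sketched as follows. This example substitutes a technique the text does not specify: the widely used NNK degenerate codon (N = any base, K = G or T). The helper `nnk_library` is hypothetical, and the standard genetic code is assumed. The sketch enumerates the 32 NNK variants at one position and confirms that they encode all 20 amino acids plus a single stop codon (TAG). In practice, such a library would be built with degenerate primers and the expressed variants would be screened for activity.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; amino acids listed in TCAG codon order ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# NNK: positions 1 and 2 may be any base, position 3 is G or T (32 codons).
NNK_CODONS = ["".join(c) for c in product("ACGT", "ACGT", "GT")]


def translate(dna):
    """Translate an in-frame DNA coding sequence, one codon at a time."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def nnk_library(dna, residue_index):
    """Hypothetical helper: every variant with codon residue_index (0-based) replaced by an NNK codon."""
    start = 3 * residue_index
    return [dna[:start] + codon + dna[start + 3:] for codon in NNK_CODONS]


library = nnk_library("ATGCGTAAA", 1)  # saturate the second codon (Arg)
encoded = {translate(variant)[1] for variant in library}
print(len(library), len(encoded))  # 32 21
print([c for c in NNK_CODONS if CODON_TABLE[c] == "*"])  # ['TAG']
```

NNK is a common compromise because it reduces the number of stop codons from three to one while still covering every amino acid.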

In addition to standard cloning methods such as those described in Ausubel, Berger, and Sambrook, the polynucleotides of the invention and fragments thereof are optionally used as substrates for a variety of recombination and recursive recombination reactions, for example to generate additional pesticidal protein homologs and fragments thereof having desired properties. A variety of such reactions are known. Methods of generating variants of any of the nucleic acids listed herein are also embodiments of the invention. Such methods comprise recursively recombining such a polynucleotide with a second (or further) polynucleotide to form a library of variant polynucleotides. The resulting libraries, cells comprising the libraries, and any recombinant polynucleotide generated by such methods are likewise embodiments of the invention.
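Recursive recombination can be illustrated in silico. The following is a minimal, idealized sketch that assumes two already aligned parent sequences of equal length. The helper names `single_crossovers` and `recursive_recombination` are hypothetical, and the simple crossover model stands in for laboratory procedures such as fragmentation and reassembly PCR. Each round recombines every pair of sequences in the pool, so chimeras carrying more than one crossover appear only from the second round onward.

```python
from itertools import combinations


def single_crossovers(parent_a, parent_b):
    """All chimeras that take a prefix from one parent and the suffix from the other.

    Assumes the parents are aligned and of equal length.
    """
    assert len(parent_a) == len(parent_b)
    library = set()
    for cut in range(1, len(parent_a)):
        library.add(parent_a[:cut] + parent_b[cut:])
        library.add(parent_b[:cut] + parent_a[cut:])
    return library


def recursive_recombination(parents, rounds=2):
    """Recombine every pair in the pool, feeding the products into the next round."""
    pool = set(parents)
    for _ in range(rounds):
        new = set()
        for x, y in combinations(sorted(pool), 2):
            new |= single_crossovers(x, y)
        pool |= new
    return pool


print(sorted(single_crossovers("AAAA", "TTTT")))
# ['AAAT', 'AATT', 'ATTT', 'TAAA', 'TTAA', 'TTTA']
print("ATAT" in recursive_recombination(["AAAA", "TTTT"], rounds=1))  # False
print("ATAT" in recursive_recombination(["AAAA", "TTTT"], rounds=2))  # True
```

The double-crossover chimera ATAT cannot arise from one round of single crossovers between the parents. It emerges in the second round, when the first-round products ATTT and AAAT are recombined.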

A variety of diversity generation protocols, including nucleic acid recursive recombination protocols, are available and well described in the art. These procedures can be used individually and/or in combination to generate one or more variants of a nucleic acid or set of nucleic acids, as well as variants of the encoded proteins. Individually and collectively, they provide robust, widely applicable ways of generating diverse nucleic acids and sets of nucleic acids (including, for example, nucleic acid libraries). Such nucleic acids are useful, for example, for engineering or rapidly evolving nucleic acids, proteins, pathways, cells and/or organisms with new and/or improved characteristics.

Although for clarity, distinctions and classifications are made in the course of the following discussion, it should be understood that the techniques are generally not mutually exclusive. Indeed, the various methods can be used alone or in combination (in parallel or in series) to obtain different sequence variants.

The result of any of the diversity generation procedures described herein may be one or more nucleic acids that can be selected or screened for nucleic acids having or conferring a desired property, or encoding proteins having or conferring a desired property. After diversification by one or more of the methods herein, or by other methods available to those of skill in the art, any nucleic acid produced may be selected for a desired activity or characteristic, such as pesticidal activity. This may include identifying any activity that can be detected, e.g., in an automated or automatable format, by any of the assays in the art (see, e.g., the discussion below of screening for insecticidal activity). A variety of related (or even unrelated) characteristics may be evaluated in series or in parallel at the discretion of the practitioner.

Descriptions of diversity generation procedures for producing modified nucleic acid sequences, such as those encoding proteins having pesticidal activity or fragments thereof, are found in the following publications and the references cited therein: Soong et al. (2000) Nat. Genet. 25(4):436-; Stemmer et al. (1999) Tumor Targeting 4:1-4; Ness et al. (1999) Nat. Biotechnol. 17:893-896; Chang et al. (1999) Nat. Biotechnol. 17:793-797; Minshull and Stemmer (1999) Curr. Opin. Chem. Biol. 3:284-290; Christians et al. (1999) Nat. Biotechnol. 17:259-264; Crameri et al. (1998) Nature 391:288-291; Crameri et al. (1997) Nat. Biotechnol. 15:436-438; Zhang et al. (1997) PNAS USA 94:4504-; Patten et al. (1997) Curr. Opin. Biotechnol. 8:724-733; Crameri et al. (1996) Nat. Med. 2:100-; Crameri et al. (1996) Nat. Biotechnol. 14:315-; Gates et al. (1996) J. Mol. Biol. 255:373-386; Stemmer (1996) "Sexual PCR and Assembly PCR" in The Encyclopedia of Molecular Biology, VCH Publishers, New York, pp. 447-457; Crameri and Stemmer (1995) BioTechniques 18:194-195; Stemmer et al. (1995) Gene 164:49-53; Stemmer (1995) Science 270:1510; Stemmer (1995) Bio/Technology 13:549-553; Stemmer (1994) Nature 370:389-391; and Stemmer (1994) PNAS USA 91:10747-10751.

Methods of mutagenesis for generating diversity include, for example, site-directed mutagenesis (Ling et al. (1997) Anal. Biochem. 254(2):157-178; Dale et al. (1996) Methods Mol. Biol. 57:369-374; Smith (1985) Annu. Rev. Genet. 19:423-462; Botstein and Shortle (1985) 229:1193-1201; Carter (1986) Biochem. J. 237:1-7; and Kunkel (1987) "The efficiency of oligonucleotide directed mutagenesis" in Nucleic Acids & Molecular Biology, Berlin); mutagenesis using uracil-containing templates (Kunkel (1985) PNAS USA 82:488-); oligonucleotide-directed mutagenesis (Zoller and Smith (1983) Methods Enzymol. 100:468-500; Zoller and Smith (1987) Methods Enzymol. 154:329-350; Zoller and Smith (1982) Nucleic Acids Res. 10:6487-6500); phosphorothioate-modified DNA mutagenesis (Taylor et al. (1985) Nucleic Acids Res. 13:8749-8764; Taylor et al. (1985) Nucleic Acids Res. 13:8765-8787; Nakamaye and Eckstein (1986) Nucleic Acids Res. […]-9698; Sayers et al., Nucleic Acids Res. […]-802; and Sayers et al., Nucleic Acids Res. […]-814); and mutagenesis using gapped duplex DNA (Kramer et al. (1984) Nucleic Acids Res. 12:9441-9456; Kramer and Fritz (1987) Methods Enzymol. 154:350-367; Kramer et al. (1988) Nucleic Acids Res. 16:7207; and Fritz et al. (1988) Nucleic Acids Res. 16:6987-6999).

Additional suitable methods include point mismatch repair (Kramer et al. (1984) Cell 38:879-887), mutagenesis using repair-deficient host strains (Carter et al. (1985) Nucleic Acids Res. 13:4431-4443; Carter (1987) Methods Enzymol. 154:382-403), deletion mutagenesis (Eghtedarzadeh and Henikoff (1986) Nucleic Acids Res. 14:5115), restriction-selection and restriction-purification (Wells et al. (1986) […] 415-423), and mutagenesis by total gene synthesis ([…]; […] (1988) Nucleic Acids Res. 14:6361-6372; Wells et al. (1985) Gene 34:315-). Additional details on many of the above methods can be found in Methods in Enzymology, Volume 154, which also describes useful controls for troubleshooting problems with various mutagenesis methods.

Additional details regarding various diversity generation methods can be found in the following U.S. patents, PCT publications and applications, and EPO publications: U.S. Pat. No. 5,723,323, U.S. Pat. No. 5,763,192, U.S. Pat. No. 5,814,476, U.S. Pat. No. 5,817,483, U.S. Pat. No. 5,824,514, U.S. Pat. No. 5,976,862, U.S. Pat. No. 5,605,793, U.S. Pat. No. 5,811,238, U.S. Pat. No. 5,830,721, U.S. Pat. No. 5,834,252, U.S. Pat. No. 5,837,458, WO 1995/22625, WO 1996/33207, WO 1997/20078, WO 1997/35966, WO 1999/41402, WO 1999/41383, WO 1999/41369, WO 1999/41368, EP 752008, EP 0932670, WO 1999/23107, WO 1999/21979, WO 1998/31837, WO 1998/27230, WO 2000/00632, WO 2000/09679, WO 1998/42832, WO 1999/29902, WO 1998/41653, WO 1998/41622, WO 1998/42727, WO 2000/18906, WO 3982, WO 3538, WO 2000/04190, WO 2000/42561, WO 2000/42559, WO 2000/42560 and WO 2001/23401.

Nucleic acid molecule probes for identifying related nucleic acids
