Methods and systems for identifying hybrids for use in plant breeding

Document No.: 1078411. Publication date: 2020-10-16.

Reading note: This technology, Methods and systems for identifying hybrids for use in plant breeding, was created by S·P·K·查瓦利, S·达斯古普塔, M·加达里哈, N·波拉瓦拉普, and 王梓 on 2018-12-07. Abstract: Exemplary methods for identifying hybrids for use in a plant breeding pipeline are disclosed. An exemplary computer-implemented method includes accessing a data structure including data representative of a pool of hybrids, and determining a predicted score for at least a portion of the hybrids included in the pool based on the data included in the data structure. The predicted score indicates a probability of selection of a hybrid based on historical data and/or a probability of success of the hybrid. The method further includes selecting a hybrid population from the pool based on the predicted score; identifying a set of hybrids from the hybrid population based on an expected performance of the set of hybrids and/or one or more factors associated with the hybrids and/or the lines comprising the hybrids; and entering the set of hybrids into a further iteration or a different stage of the breeding pipeline.

1. A method for identifying hybrids for use in a plant breeding pipeline, the method comprising:

accessing a data structure comprising data representative of a pool of hybrids;

determining, by at least one computing device, a predicted score for at least a portion of the hybrids included in the pool of hybrids based on the data included in the data structure, the predicted score indicating a probability of selection of a hybrid based on historical data and/or a probability of success of the hybrid;

selecting, by the at least one computing device, a hybrid population from the pool of hybrids based on the predicted score;

identifying, by the at least one computing device, a set of hybrids from the hybrid population based on an expected performance of the set of hybrids and/or one or more factors associated with the hybrids and/or the lines comprising the hybrids; and

entering the set of hybrids into a further iteration of a stage of the breeding pipeline or into a different stage of the breeding pipeline.

2. The method of claim 1, wherein the historical data comprises historical phenotypic data associated with a plurality of hybrids and/or lines and a historical selection of each hybrid in the plurality of hybrids; and

further comprising generating, by the at least one computing device, a predictive model based on the historical phenotypic data and the historical selections, the plurality of hybrids and/or lines being associated with plant material of a type consistent with a plant type of the pool of hybrids; and

wherein determining the predicted score for the at least a portion of the hybrids included in the pool of hybrids comprises determining the predicted score based on the predictive model.

3. The method of claim 1, wherein the one or more factors comprise one or more of: line distribution of male lines, line distribution of female lines, heterosis diversity of male lines, heterosis diversity of female lines, trait or trait type, market segmentation, risk, product cost, and trait availability/readiness.

4. The method of claim 1, wherein identifying the set of hybrids is based on a maximization of:

subject to:

and maximization of:

subject to:

5. The method of claim 1, wherein the data comprises phenotypic data representative of the pool of hybrids; and

wherein selecting the hybrid population comprises selecting a hybrid for the hybrid population when the predicted score of the hybrid meets one or more thresholds.

6. The method of claim 1, wherein identifying the set of hybrids (x_OPT) is based on the following set identification algorithm:

7. The method of claim 6, wherein the set identification algorithm is constrained by at least one of:

and/or

8. The method of claim 7, wherein the set identification algorithm is constrained by at least one of:

9. The method of claim 8, wherein said identification of said set of hybrids is constrained by the following algorithm:

10. The method of claim 1, wherein entering the set of hybrids into a different stage of the breeding pipeline comprises including a plant product in a growth space of the breeding pipeline after identifying the set of hybrids, the plant product being based on at least one hybrid in the identified set of hybrids.

11. A system for identifying hybrids for use in a plant breeding pipeline, the system comprising:

a data structure comprising phenotypic data associated with a pool of hybrids, each of the hybrids being based on two lines from different heterosis groups; and

a computing device communicatively coupled with the data structure and configured to:

access the phenotypic data associated with the pool of hybrids;

determine a predicted score for each hybrid in the pool of hybrids based on the accessed phenotypic data, the predicted score indicating a probability of selection of the hybrid based on historical data and/or a probability of success of the hybrid;

select a hybrid population from the pool of hybrids based on the predicted score;

identify a set of hybrids from the selected hybrid population based on one or more factors associated with the hybrids; and

enter the set of hybrids into a validation stage for planting and/or testing.

12. The system of claim 11, wherein the computing device is configured to identify the set of hybrids based at least in part on a deviation of the identified set of hybrids from a desired pattern for at least one of: line distribution, heterosis diversity, and market segmentation.

13. The system of claim 12, wherein the computing device is further configured to identify the set of hybrids (x_OPT) based on:

subject to each of the following:

and

14. The system of claim 13, further comprising the breeding pipeline communicatively coupled with the computing device; and

wherein the breeding pipeline comprises a cultivation and testing stage and a validation stage; and

wherein the computing device is configured to receive at least a portion of the phenotypic data from the cultivation and testing stage and to store the at least a portion of the phenotypic data in the data structure; and

wherein, after the set of hybrids is entered into the breeding pipeline, a plant derived from at least one hybrid in the set of hybrids is planted in a growth space of the validation stage of the breeding pipeline.

15. The system of claim 13, wherein the computing device is further configured to identify the pool of hybrids based on user input prior to determining the predicted score for each hybrid in the pool of hybrids.

16. The system of claim 13, further comprising a growth space comprising one or more plants, wherein the one or more plants are derived from the identified set of hybrids.

17. A non-transitory computer-readable storage medium comprising executable instructions for identifying a hybrid for use in a plant breeding pipeline that, when executed by at least one processor, cause the at least one processor to:

access a data structure comprising data representative of a pool of hybrids;

determine a predicted score for at least a portion of the pool of hybrids based on the data included in the data structure, the predicted score indicating a probability of selection of a hybrid based on historical data;

select a hybrid population from the pool of hybrids based on the predicted score;

identify a set of hybrids from the hybrid population based on a probability of success of the set of hybrids and at least one factor associated with the hybrids; and

enter the set of hybrids into a further iteration of the cultivation and testing stage of the breeding pipeline and/or into the validation stage of the breeding pipeline.

18. The non-transitory computer-readable storage medium of claim 17, wherein the at least one factor comprises at least one of: line distribution of male lines, line distribution of female lines, heterosis diversity of male lines, heterosis diversity of female lines, traits or trait patterns, market segmentation, risk, product cost, and trait availability/readiness; and/or

wherein the executable instructions, when executed by the at least one processor, cause the at least one processor to identify the set of hybrids further based on a desired pattern of the at least one factor.

19. The non-transitory computer-readable storage medium of claim 17, wherein the executable instructions, when executed by the at least one processor, cause the at least one processor to:

generate a predictive model based on historical phenotypic data and historical selections of the hybrids and/or lines included in the data structure, the historical phenotypic data being associated with plant material of a type consistent with a plant type of the pool of hybrids; and/or

wherein determining the predicted score comprises determining the predicted score based on the predictive model.

20. The non-transitory computer-readable storage medium of claim 17, wherein the data comprises phenotypic data representative of the pool of hybrids; and/or

wherein selecting the hybrid population comprises selecting a hybrid for the hybrid population when the predicted score of the hybrid meets one or more thresholds.

Technical Field

The present disclosure relates generally to systems and methods for use in plant breeding, and in particular to systems and methods for identifying a set of hybrids from a pool of potential hybrids, and populating a breeding pipeline with the identified set of hybrids.

Background

This section provides background information related to the present disclosure that is not necessarily prior art.

In plant development, plants are modified by selective breeding or genetic manipulation. When a desired improvement is achieved, a commercial product is typically developed by planting the improved plants/seeds and harvesting the resulting seeds over several generations. Throughout development, many decisions are made based on the characteristics and/or traits of the plant being evaluated, and similarly on the characteristics and/or traits of its progeny, which are not guaranteed to inherit or exhibit the desired trait of the parent. Traditionally, as part of selecting a particular plant for further development, the parental genomes are assessed for genetic sequences that, when the parents are crossed, can produce origins having the desired characteristics and/or traits, which can then be selected and/or filtered out by testing the plants. Plant development is known to involve a large number of possible lines and origins from which breeders make final breeding decisions (and/or select commercial products) by conventional techniques.

Drawings

The drawings described herein are for illustrative purposes only of selected embodiments and not all possible implementations, and are not intended to limit the scope of the present disclosure.

FIG. 1 illustrates an exemplary system of the present disclosure that is adapted to identify a set of hybrids from a pool of potential hybrids to enable advancement in one or more breeding pipelines;

FIG. 2 is a block diagram of a computing device that may be used in the exemplary system of FIG. 1;

FIG. 3 is an exemplary method suitable for use with the system of FIG. 1 to identify a set of hybrids from a pool of potential hybrids; and

FIG. 4 includes bipartite graphs representing the identification of a set of hybrids from multiple lines, which forms a pool of hybrids.

Corresponding reference characters indicate corresponding parts throughout the several views of the drawings.

Detailed Description

Exemplary embodiments will now be described more fully with reference to the accompanying drawings. The description and specific examples included herein are intended for purposes of illustration only and are not intended to limit the scope of the present disclosure.

However, given a plurality of lines and a pool of hybrids (from which hybrids are selected), particularly when the pool contains a large number of hybrids (e.g., in a commercial setting), it is difficult to accurately select high-performing hybrids, because the number of different sets of hybrids that could be identified is very large. In one illustrative example, where a human breeder is selecting one hundred hybrids (r) from the crosses of one hundred male lines (m) and one hundred female lines (n), the number of potential sets to be identified, as an indicator of complexity, is about 10²⁰⁰. As this example shows, and as also holds for other practical numbers of lines/hybrids, there is considerable complexity in selecting hybrids, particularly when trait distribution and/or genetic diversity needs to be, or is desired to be, considered.
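The scale quoted in the example above can be checked with a short calculation. The sketch below is illustrative only and rests on two assumptions that are not stated in this text: that every male line can be crossed with every female line (giving m × n potential hybrids), and that the figure of about 10²⁰⁰ corresponds to the simple lower bound (m·n/r)^r on the binomial coefficient. The function names are hypothetical.

```python
import math

def exact_sets(m: int, n: int, r: int) -> int:
    """Number of ways to choose r hybrids from the m*n possible crosses."""
    return math.comb(m * n, r)

def simple_bound(m: int, n: int, r: int) -> int:
    """Lower bound (m*n/r)**r on the binomial coefficient.

    Assumes r divides m*n so that integer arithmetic is exact.
    """
    return (m * n // r) ** r

m = n = r = 100
bound = simple_bound(m, n, r)
print(len(str(bound)) - 1)            # order of magnitude of the bound: 200
print(exact_sets(m, n, r) >= bound)   # True
```

Under these assumptions the exact count is larger still than the bound, which supports the point that exhaustive enumeration of candidate sets is impractical.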

Uniquely, the systems and methods herein allow for the identification of a set of hybrids from a pool of potential hybrids for inclusion in one or more breeding pipelines. Specifically, the selection engine selects a hybrid population from a pool of hybrids based on the predicted score, and then identifies a set of hybrids from the hybrid population based on one or more other factors associated with the hybrids. In particular, the selection engine may employ algorithms that take into account predicted performance, but control the set of hybrids identified by one or more factors and/or constraints (e.g., based on one or more desired traits, line distributions, heterosis diversity, risk, or desired market segments, etc.), for example, as described below. In this way, the complexity associated with the identification of a set of hybrids to be pushed towards commercialization can be mitigated and/or reduced, while maintaining considerable accuracy in selecting and considering possible performance and/or genetic diversity in the set of hybrids.

A hybrid is a cross of two separate plants or inbred lines, which are progeny of some historical origin. As used herein, "lines" refers to one or more parents of a hybrid and is to be construed as singular or plural where applicable. The lines may be divided into genetically distinct groups, also known as heterosis groups. The heterosis groups may be referred to as the "male pool" and the "female pool". When similarity (e.g., based on markers) is used as a measure of distance between inbred lines, a male heterosis group and a female heterosis group are identified as two sets of lines that separate into two distinct groups. These terms are used to distinguish the two heterosis groups from which the two lines are selected for a given cross. The terms "male" and "female" are not intended to convey any information other than that the male and female lines are from different heterosis groups. Phenotypic data, trait distributions, ancestry, genetic sequences, commercial success, and additional information for the lines are generally known and may be stored in memory as described in more detail below.

As used herein, phenotypic data includes, but is not limited to, information about the phenotype of a given line or hybrid or a population of such lines or hybrids. Phenotypic data may include size and/or vigor traits of the line (e.g., plant height, stalk circumference, stalk strength, etc.), yield, maturation time, resistance to biotic stress (e.g., disease or insect resistance), resistance to abiotic stress (e.g., drought or saline-alkali tolerance, etc.), growth climate, or any additional phenotype, and/or combinations thereof. It is to be understood that the systems and methods herein generally relate to and/or rely on phenotypic data associated with one or more lines, hybrids, and the like. That said, it is to be understood that, in one or more exemplary embodiments, genotypic data can be used in combination with the phenotypic data described herein (e.g., to supplement the phenotypic data and/or further inform the models, algorithms, and/or predictions herein), which can then aid in the selection of a hybrid population or set of hybrids consistent with the description herein.

FIG. 1 illustrates an exemplary system 100 for identifying a set of hybrids from a pool of hybrids for advancement, in which one or more aspects of the present disclosure may be implemented. Although in the described embodiment the portions of system 100 are presented in one arrangement, other embodiments may include the same or different portions arranged in other ways, e.g., depending on the particular type of hybrid to be identified.

As shown in FIG. 1, system 100 generally includes a breeding pipeline 102, which is provided to identify a set of hybrids from a pool of hybrids to advance toward commercial product development. The breeding pipeline 102 generally defines a number of stages arranged as a pyramid, whereby it begins with a large number of hybrids (e.g., potential crosses of available lines) and then successively reduces (e.g., narrows) the number of hybrids to the preferred and/or desired hybrids. While breeding pipeline 102 is configured to identify and/or select hybrids as provided herein, breeding pipeline 102 can also be configured to employ one or more other techniques, which can include a variety of methods known in the art, often depending on the particular plant and/or organism for which breeding pipeline 102 is provided.

In certain breeding pipeline embodiments (e.g., large industrial breeding pipelines, etc.), hundreds, thousands, or more lines, hybrids, etc. may be tested, selected, and/or advanced at several sites over several years in multiple stages to yield reduced sets of hybrids, etc., which are then selected for commercial product development. Briefly, breeding pipeline 102 is configured to reduce, through the testing, selection, etc. included therein, a large number of lines and potential hybrids to a relatively small number of hybrids predicted to perform as desired as commercial products.

In this exemplary embodiment, breeding pipeline 102 is described with reference to, and generally with respect to, corn or maize and traits and/or characteristics thereof. However, it should be understood that the methods disclosed herein are not limited to corn and may be used in plant breeding pipelines/programs related to other plants, for example to improve any fruit, vegetable, grass, tree, or ornamental crop, including but not limited to maize (Zea mays), soybean (Glycine max), cotton (Gossypium hirsutum), peanut (Arachis hypogaea), barley (Hordeum vulgare); oats (Avena sativa); orchard grass (Dactylis glomerata); rice (Oryza sativa, including indica and japonica varieties); sorghum (Sorghum bicolor); sugar cane (Saccharum sp.); tall fescue (Festuca arundinacea); turfgrass species (e.g., creeping bentgrass (Agrostis stolonifera), Kentucky bluegrass (Poa pratensis), St. Augustine grass (Stenotaphrum secundatum), etc.); wheat (Triticum aestivum), and alfalfa (Medicago sativa), members of the genus Brassica, including cauliflower, cabbage, cauliflower, canola and rapeseed, carrots, chinese cabbages, cucumbers, beans, eggplant, fennel, beans, cucurbits, leeks, lettuce, melons, okra, onions, peas, peppers, squash, radishes, spinach, squash, sweet corn, tomatoes, watermelons, honeydew, cantaloupe and other melons, bananas, castor beans, coconut, coffee, cucumbers, poplar, southern pine, radiata pine, douglas fir, eucalyptus, apple and other tree species, citrus, grapefruit, lemon, lime and other citrus species, clover, linseed, olive, palm, chili, pepper, and jamaica pepper, sugar beet, sunflower, sweetgum trees, tea, tobacco and other fruits, vegetables, tubers and root crops. These methods may also be used in conjunction with non-crop species, particularly those used as model systems, such as Arabidopsis thaliana.
Furthermore, the systems and methods disclosed herein may be employed outside of plants, for example, in animal breeding programs or other non-plant and/or non-crop breeding programs.

As shown in FIG. 1, breeding pipeline 102 includes a hybrid initiation stage 104 and a cultivation and testing stage 106 (with one or more iterations), which together identify and/or select one or more hybrids to advance to a validation stage 108. In the validation stage 108, the hybrids are introduced into pre-commercial testing or another suitable process (e.g., a characterization and/or commercial development stage, etc.), depending on the particular type of hybrid, with the intent and/or goal of planting and/or commercializing the hybrids. With that said, it should be understood that breeding pipeline 102 can include a variety of conventional processes known to those skilled in the art in the three different stages 104, 106, and 108 shown in FIG. 1.

In the hybrid initiation stage 104, a pool of potential hybrids is provided from one or more sets of lines. Lines may be selected, for example, by a breeder, or otherwise depending on the particular type of plant, etc. Lines (and subsequently their associated origins) may also be selected, for example, based on an origin selection system and/or (at least in part) based on the methods and systems disclosed in U.S. patent application 15/618,023 entitled "Methods for Identifying Crosses for use in Plant Breeding," the entire disclosure of which is incorporated herein by reference. Once the lines, both male and female, are selected, the lines are combined to provide a pool of hybrids. The pool of hybrids then enters the cultivation and testing stage 106, where the hybrids are planted or otherwise introduced into one or more growth spaces, such as a greenhouse, a shade house, a nursery, a breeding plot, a field, etc.

Once the hybrids are grown, each hybrid is tested to derive and/or collect phenotypic data for the hybrids, whereby the phenotypic data is stored in one or more data structures described below. The test may include, for example, any suitable technique for determining phenotypic data. Such techniques may include any number of tests, trials or analyses known to be useful for assessing plant performance, including any phenotype known in the art. In preparation for such testing, samples of embryo and/or endosperm material/tissue can be harvested/removed from progeny in a manner that does not kill or otherwise prevent the seed or plant from surviving the test. For example, seed sections may be employed to obtain tissue samples from progeny for determining desired phenotypic data. Any other method of harvesting a tissue sample may also be used, such as performing the assay directly on seed tissue, without removing the tissue sample. In certain embodiments, the embryo and/or endosperm remains attached to other tissues of the seed. In certain other embodiments, the embryo and/or endosperm is separated from other tissues of the seed (e.g., embryo rescue, embryo excision, etc.). Common examples of phenotypes that may be obtained by such tests include, but are not limited to, size, shape, surface area, volume, mass, and/or amount of chemicals in at least one tissue of the seed (e.g., anthocyanins, proteins, lipids, carbohydrates, etc. in the embryo, endosperm, or other seed tissue). Where a hybrid has been selected or otherwise modified (e.g., grown from seeds, etc.) to produce a particular chemical (e.g., a drug, toxin, fragrance, etc.), the hybrid can be assayed to quantify the desired chemical.

With that said, it should be understood that the cultivation and testing stage 106 of the breeding pipeline 102 in this embodiment is not limited to certain or particular testing techniques, as any technique suitable to facilitate determination of relevant phenotypic data associated with hybrids at any stage of the life cycle may be used. That said, in certain instances, it may be advantageous to use testing techniques that can be performed without germinating the seeds of the hybrid and/or otherwise cultivating a plant sporophyte (e.g., by seed sectioning, etc., as described above). It should also be understood that the cultivation and testing stage 106 of breeding pipeline 102 may include multiple iterations, as indicated by the loop arrow in FIG. 1, in which hybrids are grown and/or tested and then selected, thereby narrowing the pool of hybrids that is passed on to the next iteration or to the validation stage 108. The tests performed within the cultivation and testing stage 106 may be adapted across the multiple iterations to provide tests and/or data appropriate for the hybrids and/or consistent with the techniques described herein.

With continued reference to FIG. 1, the transition of hybrids from one iteration of the cultivation and testing stage 106 to another (when the stage is iterated) and/or to the validation stage 108 is controlled in system 100 by selection engine 110. The selection engine 110 comprises a computing device, which may be a stand-alone computing device or a computing device integrated with one or more other computing devices. Selection engine 110 controls the identification of hybrids that transition from one iteration to another within the cultivation and testing stage 106 (e.g., when multiple iterations are included, etc.), or that transition to the validation stage 108 (as indicated by the dashed lines), and more generally that proceed from one stage to the next.

Selection engine 110 is configured to perform the operations described herein by way of computer-executable instructions and/or one or more algorithms herein (or variants thereof). Further, it should be understood that selection engine 110 can be configured to provide (e.g., generate and cause display at a computing device of a human breeder) and/or be responsive to one or more user interfaces through which a human breeder (broadly, a user) can provide input regarding a hybrid or a desired trait of a hybrid and/or input that can be used by the algorithms herein (e.g., the number of hybrids to be selected, input indicative of market segmentation, input defining desired traits, other input specific to one or more breeding strategies or, more generally, to other aspects of the identification of a set of hybrids, etc.). The user interface may be provided directly at a computing device of the human breeder in which the selection engine 110 is employed (e.g., computing device 200, etc., as described below), or via one or more network-based applications through which a remote user (again, possibly a human breeder) may interact with the selection engine 110 as described herein.

In addition, as shown in FIG. 1, system 100 also includes a hybrid data structure 112 coupled to selection engine 110. In this exemplary embodiment, hybrid data structure 112 includes data relating to lines and hybrids, and the like. The data may include any type of data for lines and hybrids, etc., which may be historical data (e.g., for plants in the cultivation and testing stage over the last year, two years, five years, ten years, fifteen years or more, etc.), and/or data related to the current iteration of the cultivation and testing stage 106, etc. The data can be provided and/or generated within breeding pipeline 102 or from outside breeding pipeline 102.

Table 1 includes data for a series of maize hybrids (H_1,1 to H_m,n), wherein variable values are provided for yield and standability for each line from which the hybrid is derived. It should be understood that other data, particularly phenotypic data, of corn plants and other types of plants may be included, as contemplated herein.

TABLE 1

In addition to the specific phenotypic data for each hybrid, Table 1 of hybrid data structure 112 also includes information regarding hybrid advancement decisions in breeding pipeline 102 or other breeding pipelines in one or more previous breeding cycles, years, and/or seasons. As shown, for example, hybrids H_1,1 and H_m,n were previously advanced ("true"), and hybrid H_1,2 was not previously advanced ("false").
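A record layout of the kind described for Table 1 might be sketched as follows. This is a hypothetical illustration only: the class name, field names, and every numeric value are placeholders invented for the sketch (the table itself is not reproduced in this text), and the phenotypic values are attached to the hybrid rather than to each parent line for brevity. Only the true/false advancement flags for H_1,1, H_1,2, and H_m,n follow the description.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HybridRecord:
    """One row of a Table 1 style data structure (hypothetical layout)."""
    hybrid_id: str        # e.g. "H_1,1"
    yield_value: float    # placeholder units
    standability: float   # placeholder scale
    advanced: bool        # historical advancement decision ("true"/"false")

# Placeholder values; only the advancement flags mirror the description.
records = [
    HybridRecord("H_1,1", 210.0, 0.92, True),
    HybridRecord("H_1,2", 180.0, 0.75, False),
    HybridRecord("H_m,n", 205.0, 0.88, True),
]

def training_rows(recs):
    """Split records into feature rows and 0/1 advancement labels."""
    features = [[r.yield_value, r.standability] for r in recs]
    labels = [int(r.advanced) for r in recs]
    return features, labels

features, labels = training_rows(records)
print(labels)  # [1, 0, 1]
```

Keeping the historical decision alongside the phenotypic values gives labelled examples from which a predictive model can be trained.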

In this exemplary embodiment, the selection engine 110 is configured to generate a predictive model based in whole or in part on historical data included in the hybrid data structure 112, wherein the predictive model provides a probability of advancing a hybrid given its phenotypic data. The selection engine 110 may employ any suitable technique for generating a predictive model (also referred to as a "predictive algorithm"). These techniques may include, but are not limited to, random forests, support vector machines, tree-based algorithms, naive Bayes, linear/logistic regression, deep learning, nearest neighbor methods, Gaussian process regression, and/or various forms of recommender system techniques, methods, and/or algorithms that provide a way to determine the advancement probability for a given data set (e.g., yield, height, standability, etc. of corn).

In particular, for example, the predictive model may be consistent with a random forest, which is a collection of multiple decision tree classifiers. Each of the decision trees is trained on data randomly sampled from a training data set (e.g., the training data set included in Table 1, etc.). Further, a random subset of features (e.g., as indicated by the phenotypic data or the like) may then be selected to generate the individual trees. The final predicted score generated by the random forest is computed by the selection engine 110 as an aggregation of the predictions of the individual trees and indicates whether the hybrid is predicted to be "true" or "false" (i.e., whether to advance it), given the features on which the trees were generated.
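The random forest idea described above (bootstrap sampling of the training rows, a random feature subset per tree, and aggregation of the individual trees) can be illustrated with a deliberately small sketch. It is not the selection engine's implementation: to stay short it uses depth-one trees (stumps) rather than full decision trees, and the function names, training values, and hyperparameters are all assumptions made for the example.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def fit_stump(rows, labels, features):
    """Depth-one tree: best (feature, threshold) split by weighted Gini impurity."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] < t]
            right = [y for row, y in zip(rows, labels) if row[f] >= t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t, sum(left) / len(left), sum(right) / len(right))
    if best is None:  # no valid split: fall back to the base rate
        p = sum(labels) / len(labels)
        return (features[0], float("-inf"), p, p)
    return best[1:]

def fit_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    """Train stumps on bootstrap samples, each with a random feature subset."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]        # bootstrap sample
        feats = rng.sample(range(len(rows[0])), n_features)   # random features
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feats))
    return forest

def predicted_score(forest, row):
    """Aggregate the trees: mean of the per-tree advancement probabilities."""
    votes = [p_right if row[f] >= t else p_left for f, t, p_left, p_right in forest]
    return sum(votes) / len(votes)

# Hypothetical training rows: [yield, standability] with historical decisions.
X = [[210, 0.92], [205, 0.88], [198, 0.90], [180, 0.75], [172, 0.70], [165, 0.80]]
y = [1, 1, 1, 0, 0, 0]
forest = fit_forest(X, y)
high = predicted_score(forest, [215, 0.95])
low = predicted_score(forest, [160, 0.65])
print(high > low)
```

Averaging the per-tree probabilities is one common aggregation choice; majority voting over the trees would be another.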

Again, notwithstanding this particular example, it should be understood that any suitable technique may be employed by the selection engine 110 to generate the predictive model.

Once the model is generated, the selection engine 110 is configured to determine a predicted score for each hybrid in the pool of hybrids (currently in the cultivation and testing stage 106) based on the predictive model. In particular, when hybrids from the pool of hybrids are tested, phenotypic data (e.g., yield, height, standability, oil content, pod count, etc.) is collected and stored in the hybrid data structure 112. To determine the predicted score, the selection engine 110 is configured to access the hybrid data structure 112 and retrieve data related to each hybrid in the pool of hybrids, such as the hybrids designated in Table 2 as F₁+M₁, F₁+M₂, F₁+M_m, F₂+M₁, F₃+M₁, F₄+M₁, up to F_n+M_m. As shown, for each of the hybrids, the phenotypic data from the hybrid data structure 112 is included in Table 2. The selection engine 110 is configured to then determine a predicted score for each hybrid based on the retrieved data and the predictive model.

TABLE 2

Further, the selection engine 110 is configured to select a hybrid population from the pool of hybrids based on the predicted score. In particular, the selection engine 110 may be configured to select hybrids for which the predicted score meets one or more thresholds, or alternatively to rank the hybrids by predicted score and then select a number of hybrids based on rank. In Table 2, for example, the hybrid population selected by the selection engine 110 includes the hybrids designated as "true" and does not include the hybrids designated as "false".
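The two selection modes mentioned above, a score threshold or a rank cutoff, can be sketched in a few lines. The function names, hybrid labels, and scores below are hypothetical values chosen for illustration; they are not taken from Table 2.

```python
def select_by_threshold(scores, threshold):
    """Keep hybrids whose predicted score meets the threshold."""
    return [hybrid for hybrid, score in scores.items() if score >= threshold]

def select_top_k(scores, k):
    """Rank hybrids by predicted score and keep the k highest."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]

# Hypothetical predicted scores keyed by hybrid designation.
scores = {"F1+M1": 0.91, "F1+M2": 0.35, "F2+M1": 0.78, "F3+M1": 0.52}

print(select_by_threshold(scores, 0.5))  # ['F1+M1', 'F2+M1', 'F3+M1']
print(select_top_k(scores, 2))           # ['F1+M1', 'F2+M1']
```

A threshold yields a population whose size depends on the data, whereas a rank cutoff fixes the size in advance; which is preferable depends on the capacity of the next stage.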

The selection engine 110 is further configured to subsequently identify a set of hybrids from the hybrid population to advance to the next iteration of the cultivation and testing stage 106 and/or to the validation stage 108. To this end, the selection engine 110 is configured to employ one or more algorithms, as described herein or otherwise, that take into account the performance of the hybrids (e.g., based on the predicted score, etc.) and one or more other factors related to the hybrids. As described herein, the factors can relate to, for example, line distribution (e.g., male and/or female, etc.), heterosis diversity (e.g., male and/or female, etc.), traits (e.g., disease resistance, etc.), market segmentation, risk, production cost, trait availability/readiness, and the like. When appropriate, the selection engine 110 can be configured to perform further iterations of the cultivation and testing stage 106 and/or of the algorithms herein to identify a set of hybrids having a plurality of desired hybrids therein.
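One simple way to combine predicted performance with a line-distribution factor is sketched below. It is a greedy heuristic offered purely as an illustration: the optimization formulas referred to in the claims are not reproduced in this text, so this is not that algorithm. The function name, the per-line cap, and the scores are assumptions.

```python
from collections import Counter

def identify_set(scores, set_size, max_per_line):
    """Greedy sketch: take hybrids in descending predicted score, skipping any
    hybrid whose female or male line has already been used max_per_line times."""
    use = Counter()
    chosen = []
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    for (female, male), _score in ranked:
        if len(chosen) == set_size:
            break
        if use[female] < max_per_line and use[male] < max_per_line:
            chosen.append((female, male))
            use[female] += 1
            use[male] += 1
    return chosen

# Hypothetical predicted scores keyed by (female line, male line).
scores = {
    ("F1", "M1"): 0.95,
    ("F1", "M2"): 0.90,
    ("F2", "M1"): 0.85,
    ("F2", "M2"): 0.60,
    ("F3", "M3"): 0.55,
}

print(identify_set(scores, set_size=3, max_per_line=1))
# [('F1', 'M1'), ('F2', 'M2'), ('F3', 'M3')]
```

With the cap set to one use per line, the second- and third-ranked hybrids are passed over because they reuse F1 or M1, which shows how a diversity constraint can trade some predicted score for a broader spread of parent lines. A greedy pass does not guarantee the best total score under the constraint; an exact formulation would require an optimization solver.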

Finally, in breeding pipeline 102, selection engine 110 is configured to advance the identified set of hybrids to further iterations of the cultivation and testing phase 106 and/or to the validation phase 108, in which the hybrids are exposed to pre-commercial testing or other suitable processes (e.g., characterization and/or commercial development phases, etc.) with the goal and/or purpose of planting and/or commercializing the hybrids. For example, one or more plant products (e.g., seeds, etc.) derived from the identified set of hybrids (e.g., one or more plant products for each identified hybrid, etc.) can be included in a growth space of breeding pipeline 102 (e.g., in the cultivation and testing phase 106, the validation phase 108, etc.). That is, the identified set of hybrids can then be subjected to one or more additional testing and/or selection methods, trait integration, and potentially one or more population techniques to prepare hybrids or hybrid-based plant material for further testing and/or commercial activities.

FIG. 2 illustrates an exemplary computing device 200 that can be used in system 100, for example, in connection with the various phases of breeding pipeline 102, the selection engine 110, the hybrid data structure 112, and the like. For example, at different portions of breeding pipeline 102, breeders or other users interacting with computing devices consistent with computing device 200 enter data into and/or access data in hybrid data structure 112 to support breeding decisions and/or tests completed/implemented by such breeders or other users. Further, the selection engine 110 of the system 100 includes at least one computing device consistent with computing device 200. The computing device 200 may be configured with executable instructions to implement the various algorithms and other operations described herein with respect to the selection engine 110. It should be understood that system 100 may include a variety of different computing devices, consistent with computing device 200 or different from computing device 200, as described herein.

Exemplary computing device 200 may include, for example, one or more servers, workstations, personal computers, laptop computers, tablets, smartphones, other suitable computing devices, combinations thereof, and the like. Further, computing device 200 may comprise a single computing device, or it may comprise multiple computing devices, located in close proximity or distributed over a geographic area, and coupled to each other via one or more networks. Such networks may include, but are not limited to, the internet, an intranet, a private or public Local Area Network (LAN), a Wide Area Network (WAN), a mobile network, a telecommunications network, a combination thereof, or one or more other suitable networks, and the like. In one example, the hybrid data structure 112 of the system 100 includes at least one server computing device, while the selection engine 110 includes at least one separate computing device coupled to the hybrid data structure 112 directly and/or through one or more LANs or the like.

In this sense, the illustrated computing device 200 includes a processor 202 and a memory 204 coupled to (and in communication with) the processor 202. Processor 202 may include, but is not limited to, one or more processing units (e.g., in a multi-core configuration, etc.) including a Central Processing Unit (CPU), a microcontroller, a Reduced Instruction Set Computer (RISC) processor, an Application Specific Integrated Circuit (ASIC), a Programmable Logic Device (PLD), a gate array, and/or any other circuit or processor that has the functionality described herein. The above list is exemplary only, and is thus not intended to limit in any way the definition and/or meaning of the processor.

As described herein, memory 204 is one or more devices that enable information (e.g., executable instructions and/or other data) to be stored and retrieved. Memory 204 may include one or more computer-readable storage media such as, but not limited to, Dynamic Random Access Memory (DRAM), Static Random Access Memory (SRAM), Read Only Memory (ROM), Erasable Programmable Read Only Memory (EPROM), solid state devices, flash drives, CD-ROMs, thumb drives, tapes, hard disks, and/or any other type of volatile or non-volatile physical or tangible computer-readable medium. Memory 204 may be configured to store, but is not limited to, a hybrid data structure 112, phenotypic data, test data, set selection algorithms, inbreds, various thresholds, predictive models, and/or other types of data (and/or data structures) suitable for use as described herein. In various embodiments, computer-executable instructions may be stored in memory 204 for execution by processor 202 to cause processor 202 to perform one or more of the functions described herein, thus memory 204 is a physical, tangible, and non-transitory computer-readable storage medium. It should be understood that memory 204 may include a variety of different memories, each implemented with one or more of the functions or processes described herein.

In an exemplary embodiment, the computing device 200 also includes a presentation unit 206 coupled to (and in communication with) the processor 202. Presentation unit 206 outputs information to a user of computing device 200 (e.g., a breeder, etc.) by, for example, displaying and/or otherwise presenting the information (e.g., without limitation, a selected hybrid, progeny from the hybrid as a commercial product, and/or any other type of data). It should be further understood that in some embodiments, presentation unit 206 may include a display device such that various interfaces (e.g., applications (web-based or otherwise) and the like) may be displayed at computing device 200, particularly at the display device, to display such information and data and the like. And in some examples, computing device 200 may cause the interface to be displayed at a display device of another computing device, where computing device 200 includes, for example, a server hosting a website having a plurality of web pages, or a server interacting with a web application employed at the other computing device, or the like. The presentation unit 206 may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, an organic LED (OLED) display, an "electronic ink" display, combinations thereof, and the like. In some embodiments, presentation unit 206 includes a plurality of units.

Computing device 200 also includes an input device 208 that receives input from a user. An input device 208 is coupled to (and in communication with) the processor 202 and may include, for example, a keyboard, a pointing device, a mouse, a stylus, a touch-sensitive panel (e.g., a touchpad or touchscreen, etc.), another computing device, and/or an audio input device. Further, in some exemplary embodiments, a touch screen (such as a touch screen included in a tablet or similar device) is used as both the presentation unit 206 and the input device 208. In at least one exemplary embodiment, the presentation unit and the input device are omitted.

In addition, the illustrated computing device 200 includes a network interface 210, the network interface 210 being coupled to (and in communication with) the processor 202 (and in some embodiments also coupled to the memory 204). Network interface 210 may include, but is not limited to, a wired network adapter, a wireless network adapter, a telecommunications adapter, or other device capable of communicating with one or more different networks. In at least one embodiment, the network interface 210 is employed to receive input to the computing device 200. For example, the network interface 210 may be coupled to (and in communication with) a field data collection device in order to collect data for use as described herein. In some example embodiments, the computing device 200 may include a processor 202 and one or more network interfaces incorporated into the processor 202 or with the processor 202.

FIG. 3 illustrates an exemplary method 300 for identifying a set of hybrids to be advanced in a breeding pipeline from a pool of potential hybrids. The exemplary method 300 is described herein in connection with the system 100, and the exemplary method 300 may be implemented in whole or in part in the breeding pipeline 102, selection engine 110, and hybrid data structure 112 of the system 100. Moreover, for purposes of illustration, the example method 300 is also described with reference to the computing device 200 of FIG. 2. However, it should be understood that the method 300 or other methods described herein are not limited to the system 100 or the computing device 200. And conversely, the systems, data structures, and computing devices described herein are not limited to the exemplary method 300.

First, a breeder (or other user) selects a plant type (e.g., corn, etc.) for which to identify a set of hybrids. Based on this selection, a series of lines is identified for the plant type, and the lines are separated into two heterosis pools: male lines and female lines. FIG. 4 shows a bipartite graph 400 that includes a series of lines, each of which is shown as a node and is designated M₁ to M₁₁ or F₁ to F₁₁. It should be understood that the number of lines included in FIG. 4 is for illustrative purposes only, and that a different number of lines will typically be included in one or more embodiments of method 300 (e.g., 100 lines (or more or fewer) per heterosis pool, etc.). As shown in FIG. 4, the lines are separated into a male heterosis pool 402 and a female heterosis pool 404. The male lines are then crossed with the female lines to provide hybrids and, more particularly, a pool of hybrids from which a set of hybrids is to be identified. The pool of hybrids includes, for example, those designated F₁+M₁, F₁+M₂ ... F₂+M₁ ... F_n+M_m, including the hybrids 406 shown in FIG. 4 as line connectors between male and female lines (e.g., hybrid F₃+M₁, etc.).

For example, where 100 male lines (n = 100) and 100 female lines (m = 100) are identified to the selection engine 110 in an arrangement like that shown in FIG. 4, the selection engine 110 may identify a set of 100 hybrids (r = 100), for example, by using the method 300.
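The construction of the pool of hybrids from the two heterosis pools can be illustrated with a minimal sketch. It assumes that every female line is crossed with every male line, which the description does not require (a pool may contain only some of the possible crosses), and the function name `build_hybrid_pool` is hypothetical.

```python
from itertools import product

def build_hybrid_pool(female_lines, male_lines):
    """Return every female x male cross as one edge of the bipartite graph.

    Assumption: the pool is the full cross product of the two heterosis pools.
    """
    return [(f, m) for f, m in product(female_lines, male_lines)]

females = [f"F{i}" for i in range(1, 12)]  # F1..F11, as in FIG. 4
males = [f"M{j}" for j in range(1, 12)]    # M1..M11, as in FIG. 4

pool = build_hybrid_pool(females, males)
labels = [f"{f}+{m}" for f, m in pool]     # e.g. "F3+M1"
```

With 11 lines per pool the full cross yields 121 candidate hybrids; with 100 lines per pool it would yield 10,000, which is why a set of r = 100 hybrids has to be identified algorithmically.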

As shown in FIG. 3, selection engine 110 initially accesses, at 302, phenotypic data for the hybrids within hybrid data structure 112, wherein the phenotypic data generally includes historical data relating to past hybrids, as well as current data relating to the hybrids included in the pool of hybrids, i.e., F₁+M₁, F₁+M₂ ... F₂+M₁ ... F_n+M_m. Historical data may include, but is not limited to, yield data, height data, and standability data (for corn) for each of the lines included in the prior hybrids, as well as historical selections of hybrids, where "true" indicates, for example, that the hybrid was advanced in a prior breeding program, and "false" indicates, for example, that the hybrid was not advanced in a prior breeding program. In this exemplary embodiment, the selection engine 110 generates, at 304, a predictive model based on the historical phenotypic data and the historical selections of past hybrids, wherein the model provides a prediction score (based on the phenotypic data) that indicates a probability of a hybrid being selected. The predictive model may be generated by the selection engine 110 through one or more different supervised, unsupervised, or semi-supervised algorithms/models such as, but not limited to, random forests, support vector machines, logistic regression, tree-based algorithms, naive Bayes, linear/logistic regression, deep learning, nearest neighbor methods, Gaussian process regression, and/or various forms of recommendation system algorithms, and the like.

Once the predictive model is generated, the selection engine 110 generates, at 306, a prediction score for each hybrid in the pool of hybrids (e.g., F₁+M₁ ... F_n+M_m, etc.) based on the phenotypic data of the hybrids (accessed in the hybrid data structure 112 in memory 204) and the predictive model.
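The model generation at 304 and the scoring at 306 can be sketched with logistic regression, one of the model families listed above. This is a minimal illustration and not the model actually used: the trait values, the hybrid names, the learning rate, and the function names `fit_logistic` and `predict_score` are all assumptions made for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(rows[0])
    w = [0.0] * n_features
    b = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for k in range(n_features):
                grad_w[k] += err * x[k]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_score(model, x):
    """Return the modeled probability that a hybrid would be selected."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Hypothetical standardized historical phenotypes: (yield, standability).
history = [(1.2, 0.8), (0.9, 1.1), (0.7, 0.3),
           (-0.8, -0.5), (-1.1, 0.2), (-0.6, -1.0)]
advanced = [1, 1, 1, 0, 0, 0]  # historical "true"/"false" selections

model = fit_logistic(history, advanced)

# Hypothetical current phenotypes for two hybrids in the pool.
current = {"F1+M1": (1.0, 0.5), "F1+M2": (-0.9, -0.4)}
scores = {name: predict_score(model, x) for name, x in current.items()}
```

In this toy data the hybrid with above-average yield and standability receives a score above 0.5 and the other a score below 0.5. Any of the other listed model families could be substituted, provided it returns a probability-like score per hybrid.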

Subsequently, the selection engine 110 selects, at 308, a hybrid population from the pool of hybrids based on the prediction scores generated at 306. The selection by the selection engine 110 can be accomplished in a variety of different ways using the prediction scores. In this exemplary embodiment, for example, the selection engine 110 indexes the hybrids (e.g., in order from highest to lowest, etc.) based on the associated prediction scores, and then selects, based on the index, a top number of hybrids from the ordered pool as the hybrid population (e.g., the top 6,000 hybrids, etc.). In other examples, the selection engine 110 can apply one or more thresholds to the prediction scores to retain hybrids for which the prediction score satisfies the one or more thresholds (e.g., is greater than (or less than) the one or more thresholds, etc.), while not selecting hybrids for which the prediction score fails to satisfy the one or more thresholds. From the pool of hybrids of FIG. 4, and as shown in Table 2 for example, hybrids F₁+M₁, F₂+M₁, F₃+M₁, and F_n+M₁ are selected at 308 into the hybrid population, while hybrids F₁+M₂, F₁+M_m, and F_n+M_m are not selected into the hybrid population.
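The two selection strategies described for step 308 (ranking and keeping a top number of hybrids, or applying a threshold) can be sketched as follows. The scores and the function names are hypothetical and chosen only for illustration.

```python
def select_top_k(scores, k):
    """Rank hybrids from highest to lowest score and keep the first k."""
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    return ranked[:k]

def select_by_threshold(scores, threshold):
    """Keep hybrids whose score meets or exceeds the threshold."""
    return [name for name, s in scores.items() if s >= threshold]

# Hypothetical prediction scores from step 306.
scores = {"F1+M1": 0.91, "F1+M2": 0.22, "F2+M1": 0.78,
          "F3+M1": 0.66, "Fn+Mm": 0.10}

top3 = select_top_k(scores, 3)
kept = select_by_threshold(scores, 0.5)

# "true"/"false" flags of the kind recorded in Table 2.
selected_flags = {name: name in kept for name in scores}
```

Both strategies return the same three hybrids here. In general they differ: ranking fixes the size of the hybrid population, while a threshold fixes the minimum acceptable score.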

Next, in method 300, the selection engine 110 identifies, at 310, a set of hybrids from the hybrid population based on one or more set identification algorithms. Typically, the one or more set identification algorithms are based on a probability of success for each hybrid in the hybrid population, which is the prediction score (e.g., as determined at 306, etc.) and/or is derived from the prediction score. In addition, the selection engine 110 relies on one or more factors to refine and/or alter the set of hybrids that would be identified based on the prediction scores alone. For example, the selection engine 110 may impose a trait limit on the set of hybrids to be identified, or define a desired pattern of line distribution or heterosis diversity. A candidate set of hybrids then defines a deviation or error relative to the trait limit or the desired pattern, and the deviation or error is counted as a penalty or cost against, for example, the probability of success of the hybrids when the set of hybrids is identified. Other factors may include, for example, risk, cost of production (e.g., cost of goods, etc.), disease resistance or other traits (alone or in combination), market segmentation, trait integration, trait availability or readiness, or other factors related to the performance of the hybrid from the perspective of growth, availability, and/or commercial success.

In the exemplary embodiment, selection engine 110 employs a set identification algorithm as a series of algorithms that define a system of equations to be solved. In particular, two quadratic equations are provided, one for the female lines (Equation 1) and one for the male lines (Equation 3). Each equation is solved to provide the line distribution (i.e., as a continuous variable) that the final identification of the set of hybrids is to follow. That is, for the bipartite graph of FIG. 4, the quadratic equations are associated with heterosis pools 402 and 404. The mixed integer program then selects edges of the bipartite graph, subject to the one or more optimizers, so that the selection follows the desired distribution over the nodes. By using mixed integer programming, several population distributions in the set of hybrids identified at 310 are also maintained. The optimizers obtained from the quadratic equations (Equations 1 to 4), one for the female lines and one for the male lines, are used as inputs to the mixed integer program, which then identifies the set of hybrids.

The female quadratic equation (Equation 1) is a maximization problem that is subject to the constraint of Equation 2. The male quadratic equation (Equation 3) is likewise a maximization problem, subject to the constraint of Equation 4.

In the female quadratic equation (and similarly in the male quadratic equation), the two terms of the objective indicate the linear performance and the quadratic diversity of line usage, where the linear term carries the probability of success of a female line (e.g., obtained by averaging the probabilities of the associated hybrids, or by determining and/or retrieving a probability specific to the female line, etc.). The diversity term relies on pairwise homology between lines: a pair of female lines with 100% homology has a value of "1", and a pair with 0% homology has a value of "0". Most lines share some homology and are scored as a decimal between 0 and 1. An exemplary pairwise matrix S_f of lines in the female heterosis pool is provided in Table 3 below.

TABLE 3

In addition, in the male quadratic equation, the two terms of the objective likewise indicate the linear performance and the quadratic diversity of line usage, where the linear term carries the probability of success of a male line (e.g., obtained by averaging the probabilities of the associated hybrids, or by determining and/or retrieving a probability specific to the male line, etc.). Likewise, a pair of male lines with 100% homology has a value of "1", and a pair with 0% homology has a value of "0". Most lines share some homology and are scored as a decimal between 0 and 1. An exemplary pairwise matrix S_m of lines in the male heterosis pool is provided in Table 4 below (and is based on the clustering of the lines, as described below).

TABLE 4

Genetic diversity is included in the set identification algorithm to limit and/or mitigate the risk associated with heavy use, within the identified set of hybrids, of lines that share a similar genetic background. Once these distributions of line usage are identified, the selection engine 110 employs the resulting female and male optimizers to identify the set of hybrids, subject to the constraint that the set follows the desired and/or required line usage with a given and/or desired probability of success (e.g., a relatively high or the highest probability of success).

In conjunction with the above, selection engine 110 employs a mixed integer algorithm (Equation 5) to identify, at 310, the set of hybrids x_OPT from the hybrid population. This exemplary algorithm is combined or integrated with the above quadratic equations (Equations 1-4), and is also referred to herein as a set identification algorithm. Equation 5 is subject to the constraints of Equations 6-11.

In these equations, the selection engine 110 is set, at 310, to identify r hybrids for the set of hybrids, where r may be, for example, 100.

The term p_i indicates the probability of success and is generated for the hybrid by a predictive algorithm. In particular, the term p_i is calculated as a combination of the prediction score (determined at 306) and one or more phenotypic traits. The term p_i then reflects a linear combination of the main traits, with weights defined by mutual information related to the historical data. In this manner, a more discrete way of assessing performance is provided for the hybrid population than for the broader pool of hybrids described above.

In Equations 7 and 8, the patterns to be followed by the set of hybrids are provided by the male and female optimizers of the quadratic equations above (e.g., Equations 1-4, etc.).

In addition, the term M_m indicates an incidence matrix relating the set of hybrids to the set of male lines, in which the presence of a particular male line is a "1" and the absence of a particular male line is a "0". A simplified example matrix is shown in Table 5 below.

TABLE 5

The term M_f indicates an incidence matrix relating the set of hybrids to the set of female lines, in which the presence of a particular female line is a "1" and the absence of a particular female line is a "0". A simplified example matrix is shown in Table 6 below.

TABLE 6
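The role of the matrices M_m and M_f can be illustrated with a small sketch. The orientation used here (one row per line, one column per hybrid) and the helper names are assumptions, since Tables 5 and 6 are not reproduced in this text.

```python
def incidence_matrix(hybrids, lines, parent_index):
    """Build a lines x hybrids 0/1 matrix.

    An entry is 1 when the line is the given parent of the hybrid.
    Assumption: rows index lines and columns index hybrids.
    """
    return [[1 if h[parent_index] == line else 0 for h in hybrids]
            for line in lines]

def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]

# Hypothetical hybrids, each stored as (female line, male line).
hybrids = [("F1", "M1"), ("F1", "M2"), ("F2", "M1"), ("F3", "M1")]

M_f = incidence_matrix(hybrids, ["F1", "F2", "F3"], parent_index=0)
M_m = incidence_matrix(hybrids, ["M1", "M2"], parent_index=1)

x = [1, 0, 1, 1]  # decision vector: hybrids 1, 3 and 4 are in the set
male_usage = mat_vec(M_m, x)    # how often each male line is used
female_usage = mat_vec(M_f, x)  # how often each female line is used
```

Multiplying an incidence matrix by the decision vector counts how often each line is used in a candidate set, which is the quantity compared against the desired line distribution.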

Based on the above, when the set of hybrids (x) deviates from the pattern of male line distribution and female line distribution, the set identification algorithm in Equation 5 imposes a penalty or cost, which may discourage over-representation of some lines in the set of hybrids to be identified.

According to the above, Equations 7 and 8 provide the deviations θ_m(i) and θ_f(i) from the pattern defined by the above quadratic equations, which is the desired pattern. When the deviations are included in Equation 5 (the set identification algorithm), each deviation provides a cost or penalty to the set of hybrids for the deviation from the desired pattern. That is, for both the male line distribution and the female line distribution, a cost is assigned to the deviation from the desired pattern. Although provided in a particular manner in this exemplary embodiment, the line distribution (or possibly even hybrid distribution) of one or both of the male and/or female lines may be provided in other manners in different embodiments (or even omitted as a factor in other embodiments).
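The trade-off between probability of success and deviation from the desired line distribution can be sketched on a toy problem. The objective below is an assumed form and not a reproduction of Equations 5-8: it rewards the summed probabilities p_i of the chosen hybrids and subtracts weighted absolute deviations, standing in for θ_m(i) and θ_f(i), between the realized and the target line-usage fractions. Exhaustive enumeration replaces a mixed integer solver and is feasible only for very small pools. All names and numbers are hypothetical.

```python
from itertools import combinations

def usage_fraction(incidence, chosen, r):
    """Fraction of the r chosen hybrids that use each line."""
    return [sum(row[i] for i in chosen) / r for row in incidence]

def objective(chosen, p, M_m, M_f, target_m, target_f, lam_p, lam_m, lam_f):
    """Assumed objective: success reward minus line-distribution penalties."""
    r = len(chosen)
    success = sum(p[i] for i in chosen)
    theta_m = [abs(u - t)
               for u, t in zip(usage_fraction(M_m, chosen, r), target_m)]
    theta_f = [abs(u - t)
               for u, t in zip(usage_fraction(M_f, chosen, r), target_f)]
    return lam_p * success - lam_m * sum(theta_m) - lam_f * sum(theta_f)

def identify_set(p, M_m, M_f, target_m, target_f, r,
                 lam_p=1.0, lam_m=1.0, lam_f=1.0):
    """Enumerate every size-r subset and return the best 0/1 vector x."""
    best = max(combinations(range(len(p)), r),
               key=lambda c: objective(c, p, M_m, M_f, target_m, target_f,
                                       lam_p, lam_m, lam_f))
    return [1 if i in best else 0 for i in range(len(p))]

# Hybrids, in order: F1+M1, F1+M2, F2+M1, F2+M2.
p = [0.9, 0.8, 0.85, 0.3]
M_m = [[1, 0, 1, 0], [0, 1, 0, 1]]  # rows: M1, M2
M_f = [[1, 1, 0, 0], [0, 0, 1, 1]]  # rows: F1, F2
target = [0.5, 0.5]                 # desired usage of each line

x_unpenalized = identify_set(p, M_m, M_f, target, target, r=2,
                             lam_m=0.0, lam_f=0.0)
x_penalized = identify_set(p, M_m, M_f, target, target, r=2)
```

Without penalties the two highest-scoring hybrids (F1+M1 and F2+M1) are chosen, although both use male line M1. With the penalties active, the set F1+M2 and F2+M1 is preferred because it uses every line equally at a small loss in total probability of success.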

Further, the set identification algorithm (Equation 5) takes into account, through Equations 9 and 10, the heterosis diversity of the male lines and the female lines included in the set of hybrids. As shown in FIG. 4, each of the lines in heterosis pools 402 and 404 is grouped into one of several clusters. In particular, for example, the selection engine 110 or another computing device associated with the method 300 may classify the inbred lines within a heterosis pool using the following distance metric (as represented by Equation 12 and Equation 13).

$$l_{ii} := -\sum_{j,\; j \neq i} l_{ij} \qquad (13)$$

Here, s_ij is the similarity between the i-th and j-th lines, and l_ij is the (i, j)-th entry of the Laplacian matrix L. In this example, the selection engine 110 employs spectral clustering, in which eigen analysis is used to determine/estimate the number of clusters (i.e., three in each of the heterosis pools 402 and 404 in FIG. 4) and K-means is then used to cluster the inbred lines within the heterosis pools. However, it should be understood that various other known clustering techniques may alternatively be used. In this exemplary embodiment, clustering is performed separately on the set of male inbred lines and the set of female inbred lines to identify genetic pools within the lines. The eigen analysis allows the selection engine 110 to estimate the number of clusters in an unsupervised manner.

Then, once the required number of clusters is determined, dimensionality reduction is performed by the selection engine 110 by, for example, projecting the Laplacian matrix L onto the dominant eigenmodes via Equations 14 and 15. In the first equation (Equation 14), L is the Laplacian matrix created from the similarity distances s_ij, and the normalized Laplacian matrix is obtained by normalizing L with a diagonal matrix D. In the second equation (Equation 15), the normalized Laplacian matrix is decomposed using singular value decomposition. The matrix Σ contains the eigenvalues that capture the number of clusters in the spectral clustering. The selection engine 110 then uses the K-means algorithm to cluster lines F₁ to F₁₁ and lines M₁ to M₁₁ (in their respective heterosis pools 402 and 404). Because the K-means algorithm is a heuristic, randomized clustering mechanism, in this example the selection engine 110 can cluster the lines in multiple different realizations of the K-means algorithm and select the realization with the maximum or a relatively high inter-cluster distance, and so on. Also, although spectral clustering is used herein, it should be understood that other clustering algorithms may be employed by the selection engine 110 or another computing device, including, for example, hierarchical clustering, Bayesian clustering, C-means clustering, and the like.
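The construction of the Laplacian matrix can be sketched as follows. Two assumptions are made and should be read as such: the off-diagonal entries are taken as l_ij = -s_ij, which is the usual graph-Laplacian convention and is consistent with Equation 13, and the normalization is taken as the symmetric form D^(-1/2) L D^(-1/2), with D holding the diagonal of L. The eigen analysis and K-means steps are omitted, and the similarity values are hypothetical.

```python
import math

def laplacian(S):
    """Build L from a pairwise similarity matrix S.

    Assumption: off-diagonal entries are l_ij = -s_ij.
    The diagonal follows Equation 13: l_ii = -sum over j != i of l_ij.
    """
    n = len(S)
    L = [[-S[i][j] if i != j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        L[i][i] = -sum(L[i][j] for j in range(n) if j != i)
    return L

def normalized_laplacian(L):
    """Assumed symmetric normalization D^(-1/2) L D^(-1/2).

    Requires every line to have some similarity to another line,
    otherwise a diagonal entry is zero and the division fails.
    """
    n = len(L)
    d = [L[i][i] for i in range(n)]
    return [[L[i][j] / math.sqrt(d[i] * d[j]) for j in range(n)]
            for i in range(n)]

# Hypothetical pairwise similarities among three lines.
S = [[1.0, 0.9, 0.1],
     [0.9, 1.0, 0.2],
     [0.1, 0.2, 1.0]]

L = laplacian(S)
L_norm = normalized_laplacian(L)
```

Each row of L sums to zero by construction, and the normalized matrix has a unit diagonal. The eigenvalues of the normalized matrix would then be inspected to estimate the number of clusters before K-means is run.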

As shown in FIG. 4, each of the lines is included in one of the line clusters and is associated with a distance or similarity to the other lines within the cluster. It will also be appreciated that, in this embodiment, a similarity matrix, or a similarity matrix based on the same labels, is provided to characterize the diversity in the quadratic equations described above. Thus, the same similarity matrix may provide the terms s_ij used in clustering and be used to classify the lines within a heterosis pool.

In addition, a further term indicates an incidence matrix relating the hybrids to the male heterosis clusters, in which the presence of a male line in a cluster is indicated by "1" and the absence of a male line in the cluster is indicated by "0". A simplified exemplary matrix is shown in Table 7 below, where the clusters in FIG. 4 are designated C₁, C₂, and C₃ for the male heterosis pool 402.

TABLE 7

In addition, a corresponding term indicates an incidence matrix relating the hybrids to the female heterosis clusters, in which the presence of a female line in a cluster is indicated by a "1" and the absence of a female line in the cluster is indicated by a "0". A simplified exemplary matrix is shown in Table 8 below, where the clusters in FIG. 4 are designated C₁, C₂, and C₃ for the female heterosis pool 404.

TABLE 8

In addition, referring to Equation 9, a further term indicates the average of the probability scores of the hybrids having male lines from the i-th heterosis pool. This term may be obtained, for example, by multiplying the score vector by the corresponding mapping matrix. Likewise, with reference to Equation 10, the corresponding term indicates the average of the probability scores of the hybrids having female lines from the i-th heterosis pool, and may be obtained, for example, by multiplying the score vector by the corresponding mapping matrix.
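The average probability score per cluster can be sketched as a matrix-vector product. The sketch assumes a 0/1 mapping matrix with one row per cluster and one column per hybrid, and divides each product by the number of hybrids in the cluster to obtain an average. Whether the actual mapping matrix is already normalized is not stated, so the division is an assumption, as are all names and values.

```python
def cluster_mean_scores(cluster_incidence, scores):
    """Average the hybrid scores within each cluster.

    Assumption: cluster_incidence is 0/1, rows are clusters, columns are
    hybrids. An empty cluster yields 0.0.
    """
    means = []
    for row in cluster_incidence:
        count = sum(row)
        total = sum(a * s for a, s in zip(row, scores))
        means.append(total / count if count else 0.0)
    return means

# Hypothetical mapping of four hybrids to male clusters C1, C2, C3.
H_m = [[1, 1, 0, 0],   # C1
       [0, 0, 1, 0],   # C2
       [0, 0, 0, 1]]   # C3
p = [0.9, 0.7, 0.6, 0.2]  # hypothetical probability scores

mu_m = cluster_mean_scores(H_m, p)
```

Here cluster C1 averages the scores of its two hybrids, while C2 and C3 each contain a single hybrid and therefore return its score unchanged.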

According to the above, Equations 9 and 10 provide the deviations γ_m(i) and γ_f(i) from the desired pattern of heterosis diversity for the male and female lines, respectively. When the deviations are included in Equation 5, each deviation provides a cost or penalty to the set of hybrids for the deviation from the desired pattern of heterosis diversity. That is, for both male and female heterosis diversity, a cost is assigned to the deviation from the desired pattern. Although provided in a particular manner in this exemplary embodiment, the heterosis diversity, or more generally the genetic diversity, of one or both of the male and/or female lines (or even possibly of the hybrids) may be provided in other ways in different embodiments (or even omitted as a factor in other embodiments).

Referring now to Equation 11, a further term indicates an incidence matrix relating the hybrids to the trait T_k, and is thus a matrix like the matrices described above, in which the values include, for each hybrid, a 1 or a 0 indicating whether the trait is present in the hybrid. It will be appreciated that values other than 0 or 1 can be provided in the matrix to give a more precise indication of not only the presence of a trait, but also, for certain types of traits, the extent of the trait.

In this way, the term can be used to control the trait combination by market segment. For example, for five market segments MS₁, MS₂, MS₃, MS₄, and MS₅, and for each of the hybrids, the term may be employed, based on the yield, disease susceptibility, etc. of the hybrid, to identify the market segments into which the hybrid can potentially be offered and/or advanced. The matrix in Table 9 below provides a simple exemplary matrix of hybrids against market segments.

TABLE 9

As shown, similar to the matrices above, the matrix of Table 9 includes a "1" to indicate that the hybrid is a potential candidate for a market segment, and a "0" to indicate that the hybrid is not a candidate for the market segment. One hybrid may qualify for multiple market segments. In the above example, M₁+F₁ is indicated for market segments MS₁, MS₄, and MS₅. When the matrix is multiplied by the decision vector x_j in Equation 11, it produces the distribution of the hybrids across the different market segments. The selection engine 110 can then enforce the bounds defined by one or more breeding and/or business strategies according to market segment requirements, namely the upper limit and the lower limit of the combination for the trait T_k. The values of the bounds may be selected by, for example, a human breeder based on one or more business constraints and/or considerations (e.g., desired market segment engagement, desired trait type, etc.) or otherwise. It should be appreciated that in this exemplary embodiment, Equation 11 does not impose any penalty or cost on the suitability of the set of hybrids for the market segments, but rather imposes a strict constraint on the set identification algorithm, which must therefore be satisfied. That is, the set of hybrids identified by Equation 5 must satisfy the upper and lower limits provided in Equation 11.
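The strict constraint of Equation 11 can be sketched as a feasibility check on a candidate decision vector. The matrix, the bounds, and the function names below are hypothetical; the sketch only shows how the product of the hybrid-by-segment matrix with the decision vector is compared against lower and upper limits.

```python
def segment_counts(T, x):
    """Count, per market segment, the selected hybrids that qualify for it."""
    return [sum(t * xi for t, xi in zip(row, x)) for row in T]

def satisfies_bounds(T, x, lower, upper):
    """Return True only if every segment count lies within its bounds."""
    counts = segment_counts(T, x)
    return all(lo <= c <= hi for c, lo, hi in zip(counts, lower, upper))

# Hypothetical 0/1 matrix: rows are market segments, columns are hybrids.
T = [[1, 0, 1, 0],   # MS1
     [0, 1, 1, 0],   # MS2
     [1, 0, 0, 1]]   # MS3
lower = [1, 1, 0]    # hypothetical lower limits per segment
upper = [2, 2, 1]    # hypothetical upper limits per segment

x_ok = [1, 1, 0, 0]   # covers MS1, MS2 and MS3 once each
x_bad = [1, 0, 0, 1]  # leaves MS2 uncovered and over-fills MS3

feasible_ok = satisfies_bounds(T, x_ok, lower, upper)
feasible_bad = satisfies_bounds(T, x_bad, lower, upper)
```

Unlike the line-distribution and heterosis-diversity terms, a violation here does not lower the objective. The candidate set is simply rejected.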

However, it should be understood that in other method embodiments the trait factors (e.g., market segmentation factors, etc.) may be treated differently, such that the trait factors (like the line distribution and/or heterosis diversity factors) impose a cost and/or penalty in Equation 5 (or another suitable algorithm) rather than a strict constraint. It should be further understood that other factors described herein may be provided in the set identification algorithm as strict constraints, as described above with respect to the trait factors (thereby forcing the algorithm to satisfy the constraints).

Further, while the market segmentation is determined and/or considered in a particular manner in the provision in this exemplary embodiment, the market segmentation may be considered and/or provided in other manners in different embodiments (or even omitted as a factor in other embodiments).

Further, as indicated above, Equation 5 includes a plurality of different weighting factors: a weighting factor λ_p associated with the probability of success, a weighting factor associated with the male line distribution, a weighting factor associated with the female line distribution, a weighting factor associated with the heterosis diversity of the male lines, a weighting factor associated with the heterosis diversity of the female lines, and the like. It should be understood that the weights may be selected by a human breeder to set priorities among the different factors to which the weights relate. For example, where line distribution is more important, the weighting factors associated with the male and female line distributions may be increased to raise the cost and/or penalty for deviations from the desired pattern. In addition, the weights, or a portion of the weights, can be selected based on historical data or the like associated with the lines and/or hybrids. Further, a weight can be determined for the trait combination distribution (see Equation 11 above), whereby the weight provides a penalty or cost for the deviation of the trait combination distribution of the identified set of hybrids from the desired distribution, in which case the trait combination distribution is not strictly constrained.

In addition to the specific factors (e.g., performance factors, etc.) described above, risk may also be included as a linear cost in one or more of the quadratic and/or mixed integer problems (or potentially as a strict constraint in some embodiments). For a given set of hybrids, risk can be modeled as the chance of failure of one or more inbred lines or hybrids. In characterizing the risk of a line, selection engine 110 may consider, for example, the standability, disease susceptibility, etc., or other traits and/or performance indicators, of the line. Additionally or alternatively, when characterizing the risk of a hybrid, the selection engine 110 can model the hybrid risk by standability, disease susceptibility, cost of goods, and the like. It should be understood that risk can be modeled as a linear cost with a negative coefficient, such that the set identification (e.g., with the above quadratic equations (e.g., Equations 1-4, etc.) and/or Equation 5 modified to include risk, etc.) then limits and/or constrains the risk associated with the identified set of hybrids (as compared to other potential sets of hybrids).

As indicated above, the specific factors of line distribution, heterosis diversity, and market segmentation are presented for illustrative purposes and are not intended to limit the different permutations of factors that may be included in one or more set identification algorithms. Thus, other set identification algorithms used by the selection engine 110 may employ different permutations of the factors and different weights (or no weights) described herein, where the algorithm may depend on the probability of success of the hybrid, the lines that make up the hybrid, or some other basis for including the hybrid in the set of hybrids to be identified, etc. It should be apparent that other set selection algorithms may be employed in other method embodiments.

However, in the exemplary embodiment, selection engine 110 solves equation 5 in conjunction with the other equations to provide a vector of x_i values, in which a "1" includes a hybrid in the set of hybrids and a "0" excludes the hybrid from the set of hybrids, whereby the set of hybrids is identified at 310. In the above example, selection engine 110 determines x ∈ {0, 1}^N as a vector in which 100 hybrids are associated with a "1" indicating inclusion. In addition, as shown in fig. 3, the selection engine 110 then advances, at 312, the set of hybrids into further iterations of the cultivation and testing phase 106 and/or into the validation phase 108, thereby advancing the set of hybrids toward commercial activities. In connection therewith, one or more hybrids from the set of hybrids can be included in and/or made into seeds and/or other plant products as needed and also included in the growing spaces (e.g., one or more greenhouses, shade houses, nurseries, breeding plots, fields, etc.) of breeding pipeline 102 (e.g., during the cultivation and testing phase 106 and/or the validation phase 108, etc.).
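The 0/1 inclusion vector described above can be illustrated with a small exhaustive search. This is only a sketch under stated assumptions: equation 5 is not reproduced here, so the objective below (summed success probability minus a weighted quadratic penalty for reusing the same female line) is a hypothetical stand-in, and the names `identify_set`, `success_prob`, `female_line`, and `weight` are invented for the example. Exhaustive enumeration is practical only for very small pools; a pool from which 100 hybrids are chosen would call for a mixed integer solver.

```python
from itertools import combinations


def identify_set(success_prob, female_line, k, weight):
    """Return x in {0, 1}^N with exactly k ones, found by brute force.

    Stand-in objective (an assumption, not equation 5):
        sum_i p_i * x_i - weight * sum_over_lines(count_of_line ** 2)
    The squared count penalizes sets that lean on one female line.
    """
    n = len(success_prob)
    best_value, best_subset = float("-inf"), ()
    for subset in combinations(range(n), k):
        performance = sum(success_prob[i] for i in subset)
        counts = {}
        for i in subset:
            counts[female_line[i]] = counts.get(female_line[i], 0) + 1
        penalty = sum(c * c for c in counts.values())
        value = performance - weight * penalty
        if value > best_value:
            best_value, best_subset = value, subset
    x = [0] * n
    for i in best_subset:
        x[i] = 1
    return x


# Hypothetical pool of four hybrids; the first two share a female line.
success_prob = [0.9, 0.8, 0.7, 0.6]
female_line = ["F1", "F1", "F2", "F3"]

x_plain = identify_set(success_prob, female_line, k=2, weight=0.0)
x_diverse = identify_set(success_prob, female_line, k=2, weight=0.2)
print(x_plain)    # [1, 1, 0, 0]
print(x_diverse)  # [1, 0, 1, 0]
```

With the penalty switched on, the second hybrid is replaced by one from a different female line, showing how a weighted factor can shift which entries of the vector are set to "1".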

In addition to the above, data relating to the selection of hybrids by selection engine 110 into a set of hybrids, as well as other data relating to the performance of the set of hybrids, is included in data structure 112 for further and/or subsequent iterations of the methods described herein for identifying hybrids for use in a plant breeding pipeline (e.g., in pipeline 102, etc.).

In view of the foregoing, the systems and methods herein allow for identification of hybrids to be advanced in a breeding pipeline. In particular, as described above, the number of potential hybrids derived from inbred lines is greatly reduced in a commercial breeding pipeline. In this manner, the role of breeder expectation, bias, and/or supposition is reduced in the process, resulting in more efficient capture of commercially viable hybrids from among the many potential hybrids. Through the systems and methods disclosed herein, breeders can substantially improve the associated breeding pipeline by analyzing the large amount of data relating to hybrids in order to identify and potentially select these hybrids for advancement, whereas conventional breeding methods are limited in what can be considered. Further, the systems and methods herein are not limited in any way by geographic or other aspects. For example, if a crop can be grown in a given area, the selection engine 110 herein can be used to identify a set of hybrids for that particular market/environment by weighting data corresponding to certain traits that affect crop performance and/or success in that environment. Such environments may be represented globally or regionally, or may be as fine grained as a particular location in a field (such that the same field is identified as having several different such environments). In this manner, the systems and methods herein may be used for development of products specific to certain markets, geographies, soil types, etc., or for maximizing profit, maximizing customer satisfaction, minimizing production costs, etc.
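The environment-specific weighting mentioned above can be sketched as follows. The trait names, weights, hybrid labels, and the helper functions `environment_score` and `rank_for_environment` are all hypothetical; the example only illustrates how the same trait data, weighted differently per environment, can lead to a different ordering of hybrids.

```python
def environment_score(traits, weights):
    """Weighted sum of trait values; traits without a weight count as zero."""
    return sum(weights.get(name, 0.0) * value for name, value in traits.items())


def rank_for_environment(hybrids, weights):
    """Return hybrid labels ordered from best to worst for one environment."""
    return sorted(
        hybrids,
        key=lambda label: environment_score(hybrids[label], weights),
        reverse=True,
    )


# Hypothetical normalized trait data for two hybrids.
hybrids = {
    "H1": {"yield": 0.9, "drought_tolerance": 0.2, "standability": 0.6},
    "H2": {"yield": 0.7, "drought_tolerance": 0.9, "standability": 0.7},
}

# Hypothetical weights for two target environments.
dry_region = {"yield": 0.3, "drought_tolerance": 0.6, "standability": 0.1}
irrigated_region = {"yield": 0.8, "drought_tolerance": 0.0, "standability": 0.2}

print(rank_for_environment(hybrids, dry_region))        # ['H2', 'H1']
print(rank_for_environment(hybrids, irrigated_region))  # ['H1', 'H2']
```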

Accordingly, it should be understood that in some embodiments, the functions described herein may be embodied in computer-executable instructions stored on a computer-readable medium and executable by one or more processors. The computer-readable medium is a non-transitory computer-readable medium. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Combinations of the above should also be included within the scope of computer-readable media.

It should also be understood that one or more aspects of the present disclosure transform a general-purpose computing device into a special-purpose computing device when configured to perform the functions, methods, and/or processes described herein.

As will be appreciated based on the foregoing specification, the above-described embodiments of the disclosure may be implemented using computer programming or engineering techniques including computer software, firmware, hardware, or any combination or subset thereof, wherein the technical effects may be achieved by performing at least one of the following: (a) accessing a data structure comprising data representative of a pool of hybrids; (b) determining, by at least one computing device, a predicted score for at least a portion of the hybrids included in the pool based on the data included in the data structure, the predicted score indicating a probability of selection of the hybrids based on historical data and/or a probability of success of the hybrids; (c) selecting, by the at least one computing device, a group of hybrids from the pool based on the predicted score; (d) identifying, by the at least one computing device, a set of hybrids from the group of hybrids based on the expected performance of the set of hybrids and/or one or more factors associated with the hybrids and/or the lines comprising the hybrids; and (e) advancing the set of hybrids into further iterations of a stage of the breeding pipeline and/or into a different stage of the breeding pipeline.

Examples and embodiments are provided so that this disclosure will be thorough and will fully convey the scope to those skilled in the art. Numerous specific details are set forth, such as examples of specific components, devices, and methods, to provide a thorough understanding of embodiments of the present disclosure. It will be apparent to those skilled in the art that specific details need not be employed, that example embodiments may be embodied in many different forms, and that example embodiments should not be construed as limiting the scope of the disclosure. In some example embodiments, well-known processes, well-known device structures, and well-known technologies are not described in detail. Additionally, one or more of the exemplary embodiments disclosed herein may provide all or none of the above advantages and improvements and still fall within the scope of the present disclosure.

The specific values disclosed herein are exemplary in nature and do not limit the scope of the disclosure. Particular values and particular ranges of values for a given parameter disclosed herein do not preclude other values and ranges of values that may be useful in one or more examples disclosed herein. Further, it is contemplated that any two particular values for a particular parameter described herein may define endpoints that are applicable to a range of values for a given parameter (i.e., disclosure of a first value and a second value for a given parameter may be interpreted to disclose that any value between the first value and the second value may also be used for the given parameter). For example, if parameter X is illustrated herein as having a value A and is also illustrated as having a value Z, it is contemplated that parameter X may have a range of values from about A to about Z. Similarly, it is contemplated that two or more ranges of values for a disclosed parameter (whether such ranges are nested, overlapping, or distinct) encompass all possible combinations of ranges of values that may be claimed using endpoints of the disclosed ranges. For example, if parameter X is exemplified herein as having a value in the range of 1-10 or 2-9 or 3-8, it is also contemplated that parameter X may have other ranges of values, including 1-9, 1-8, 1-3, 1-2, 2-10, 2-8, 2-3, 3-10, and 3-9.

The terminology used herein is for the purpose of describing particular example embodiments only and is not intended to be limiting. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. The terms "comprises," "comprising," and "having" are inclusive and therefore specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. The method steps, processes, and operations described herein are not to be construed as necessarily requiring their performance in the particular order discussed or illustrated, unless specifically identified as an order of performance. It should also be understood that additional or alternative steps may be employed.

When a feature is referred to as being "on," "engaged to," "connected to," "coupled to," "associated with," "in communication with," or "contained within" another feature, the feature may be directly on, engaged to, connected to, or coupled to the other feature, or associated with, in communication with, or contained within the other feature, or intervening features may be present. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.

Although the terms "first," "second," etc. may be used to describe various features, these features should not be limited by these terms. These terms may be used only to distinguish one feature from another. Terms such as "first," "second," and other numerical terms when used herein do not imply a sequence or order unless clearly indicated by the context. Thus, a first feature discussed herein may be termed a second feature without departing from the teachings of the example embodiments.

The foregoing description of embodiments has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the disclosure. Individual elements or features of a particular embodiment are generally not limited to that particular embodiment, but are interchangeable under appropriate circumstances and can be used in a selected embodiment, even if the embodiment is not specifically shown or described. The same may also be varied in many ways. Such variations are not to be regarded as a departure from the disclosure, and all such modifications are intended to be included within the scope of the disclosure.
