Methods and systems for identifying hybrids for use in plant breeding
Abstract: The present disclosure, Methods and systems for identifying hybrids for use in plant breeding, was created by S·P·K·查瓦利, S·达斯古普塔, M·加达里哈, N·波拉瓦拉普, and 王梓 on 2018-12-07. Exemplary methods for identifying hybrids for use in a plant breeding pipeline are disclosed. An exemplary computer-implemented method includes accessing a data structure including data representing a pool of hybrids, and determining a predicted score for at least a portion of the hybrids included in the pool based on the data included in the data structure. The predicted score indicates a probability of selection of a hybrid based on historical data and/or a probability of success of the hybrid. The method further comprises selecting a hybrid population from the pool based on the predicted score; identifying a set of hybrids from the hybrid population based on expected performance of the set of hybrids and/or one or more factors associated with the hybrids and/or the lines comprising the hybrids; and advancing the set of hybrids into further iterations or different stages of the breeding pipeline.
1. A method for identifying hybrids for use in a plant breeding pipeline, the method comprising:
accessing a data structure comprising data representative of a pool of hybrids;
determining, by at least one computing device, a predicted score for at least a portion of the hybrids included in the pool of hybrids based on the data included in the data structure, the predicted score indicating a probability of selection of the hybrid based on historical data and/or a probability of success of the hybrid;
selecting, by the at least one computing device, a hybrid population from the pool of hybrids based on the predicted score;
identifying, by the at least one computing device, a set of hybrids from the hybrid population based on: (i) the expected performance of the set of hybrids and/or (ii) one or more factors associated with the hybrids and/or the lines comprising the hybrids; and
advancing the set of hybrids into a further iteration of a stage of the breeding pipeline or into a different stage of the breeding pipeline.
2. The method of claim 1, wherein said historical data comprises historical phenotypic data associated with a plurality of hybrids and/or lines and a historical selection of each hybrid in said plurality of hybrids; and
further comprising generating, by the at least one computing device, a predictive model based on the historical phenotypic data and the historical selections, the plurality of hybrids and/or lines being associated with plant material of a type consistent with a plant type of the pool of hybrids; and
wherein determining the predicted score for the at least a portion of the hybrids included in the pool of hybrids comprises determining the predicted score based on the predictive model.
3. The method of claim 1, wherein the one or more factors comprise one or more of: line distribution of male lines, line distribution of female lines, heterosis diversity of male lines, heterosis diversity of female lines, trait or trait type, market segmentation, risk, product cost, and trait availability/readiness.
4. The method of claim 1, wherein identifying the set of hybrids is based on a maximization of:
the terms are constrained by:
and maximization of:
the terms are constrained by:
5. The method of claim 1, wherein said data comprises phenotypic data representative of said pool of hybrids; and
wherein said selecting said hybrid population comprises selecting a hybrid into said hybrid population when the predicted score of the hybrid meets one or more thresholds.
6. The method of claim 1, wherein identifying the set of hybrids ($x_{OPT}$) is based on the following set identification algorithm:
7. The method of claim 6, wherein the set identification algorithm is constrained by at least one of:
and/or
8. The method of claim 7, wherein the set identification algorithm is constrained by at least one of:
9. The method of claim 8, wherein said identification of said set of hybrids is constrained by the following algorithm:
10. The method of claim 1, wherein advancing the set of hybrids into a different stage of the breeding pipeline comprises including a plant product in a growth space of the breeding pipeline after identifying the set of hybrids, the plant product being based on at least one hybrid in the identified set of hybrids.
11. A system for identifying hybrids for use in a plant breeding pipeline, the system comprising:
a data structure comprising phenotypic data associated with a pool of hybrids, each of the hybrids being based on two lines from different heterosis pools; and
a computing device communicatively coupled with the data structure and configured to:
access said phenotypic data associated with said pool of hybrids;
determine a predicted score for each hybrid in the pool of hybrids based on the accessed phenotypic data, the predicted score indicating a probability of selection of the hybrid based on historical data and/or a probability of success of the hybrid;
select a hybrid population from the pool of hybrids based on the predicted score;
identify a set of hybrids from the selected hybrid population based on one or more factors associated with the hybrids; and
advance the set of hybrids to a validation stage for planting and/or testing.
12. The system of claim 11, wherein the computing device is configured to identify the set of hybrids based at least in part on a deviation of the identified set of hybrids from a desired pattern for at least one of: line distribution, heterosis diversity, and market segmentation.
13. The system of claim 12, wherein the computing device is further configured to identify the set of hybrids ($x_{OPT}$) based on:
subject to each of the following:
and
14. The system of claim 13, further comprising the breeding pipeline communicatively coupled with the computing device;
wherein the breeding pipeline comprises a cultivation and testing phase and a validation phase;
wherein the computing device is configured to receive at least a portion of the phenotypic data included in the data structure from the cultivation and testing phase and to store the at least a portion of the phenotypic data in the data structure; and
wherein, after the set of hybrids is advanced in the breeding pipeline, a plant derived from at least one hybrid in the set of hybrids is planted in a growth space of the validation phase of the breeding pipeline.
15. The system of claim 13, wherein the computing device is further configured to identify the pool of hybrids based on user input prior to determining the predicted score for each hybrid in the pool of hybrids.
16. The system of claim 13, further comprising a growth space comprising one or more plants, wherein the one or more plants are derived from the identified set of hybrids.
17. A non-transitory computer-readable storage medium comprising executable instructions for identifying hybrids for use in a plant breeding pipeline that, when executed by at least one processor, cause the at least one processor to:
access a data structure comprising data representative of a pool of hybrids;
determine a predicted score for at least a portion of the pool of hybrids based on the data included in the data structure, the predicted score indicating a probability of selection of the hybrid based on historical data;
select a hybrid population from the pool of hybrids based on the predicted score;
identify a set of hybrids from the hybrid population based on a probability of success of the set of hybrids and at least one factor associated with the hybrids; and
advance the set of hybrids into a further iteration of the cultivation and testing phase of the breeding pipeline and/or into the validation phase of the breeding pipeline.
18. The non-transitory computer-readable storage medium of claim 17, wherein the at least one factor comprises at least one of: line distribution of male lines, line distribution of female lines, heterosis diversity of male lines, heterosis diversity of female lines, traits or trait patterns, market segmentation, risk, product cost, and trait availability/readiness; and/or
wherein the executable instructions, when executed by the at least one processor, cause the at least one processor to identify the set of hybrids further based on a desired pattern of the at least one factor.
19. The non-transitory computer-readable storage medium of claim 17, wherein the executable instructions, when executed by the at least one processor, further cause the at least one processor to:
generate a predictive model based on historical phenotypic data and historical selections of the hybrids and/or lines included in the data structure, the historical phenotypic data being associated with plant material of a type consistent with a plant type of the pool of hybrids; and/or
wherein determining the predicted score comprises determining the predicted score based on the predictive model.
20. The non-transitory computer-readable storage medium of claim 17, wherein the data comprises phenotypic data representative of the pool of hybrids; and/or
wherein said selecting the hybrid population comprises selecting a hybrid into said hybrid population when the predicted score of the hybrid meets one or more thresholds.
Technical Field
The present disclosure relates generally to systems and methods for use in plant breeding, and in particular to systems and methods for identifying a set of hybrids from a pool of potential hybrids, and populating a breeding pipeline with the identified set of hybrids.
Background
This section provides background information related to the present disclosure that is not necessarily prior art.
In plant development, plants are modified by selective breeding or genetic manipulation. When a desired improvement is achieved, a commercial product is often developed by planting plants/seeds having the desired improvement and harvesting the resulting seeds over several generations. Throughout development, many decisions are made based on the characteristics and/or traits of the plants being evaluated, and similarly on those of the progeny, which are not guaranteed to inherit or exhibit the desired traits of the parents. Traditionally, as part of selecting particular plants for further development, parental genomes are assessed for genetic sequences that, when crossed, can produce origins having the desired characteristics and/or traits, which can then be selected and/or filtered by testing the plants. Plant development is known to involve a large number of possible lines and origins, from which breeders make final breeding decisions (and/or select commercial products) by conventional techniques.
Drawings
The drawings described herein are for illustrative purposes only of selected embodiments and not all possible implementations, and are not intended to limit the scope of the present disclosure.
FIG. 1 illustrates an exemplary system of the present disclosure that is adapted to identify a set of hybrids from a pool of potential hybrids to enable advancement in one or more breeding pipelines;
FIG. 2 is a block diagram of a computing device that may be used in the exemplary system of FIG. 1;
FIG. 3 is an exemplary method suitable for use with the system of FIG. 1 to identify a set of hybrids from a pool of potential hybrids; and
FIG. 4 includes bipartite graphs representing the identification of a set of hybrids from multiple lines, which forms a pool of hybrids.
Corresponding reference characters indicate corresponding parts throughout the several views of the drawings.
Detailed Description
Exemplary embodiments will now be described more fully with reference to the accompanying drawings. The description and specific examples included herein are intended for purposes of illustration only and are not intended to limit the scope of the present disclosure.
However, given a plurality of lines and a pool of hybrids (from which hybrids are selected), particularly when the pool contains a large number of hybrids (e.g., in a commercial setting), it is difficult to accurately select high-performing hybrids. For a given pool, there are [expression not reproduced in source] different sets of hybrids to be identified, which can be reduced to about [expression not reproduced in source]. In one illustrative example, where a human breeder is selecting one hundred hybrids (r) provided from one hundred male lines (m) and one hundred female lines (n), the number of potential sets to be identified, as an indicator of complexity, is quantified at about $10^{200}$. From this example, as well as other practical numbers of lines/hybrids, it is apparent that there is considerable complexity in selecting hybrids, particularly when trait distribution and/or genetic diversity needs to be, or is desired to be, considered.
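To make this scale concrete, the count can be checked numerically. The sketch below rests on two assumptions that are not legible in the text above: that a set of r hybrids is drawn without repetition from the m × n possible crosses, so that the exact count is the binomial coefficient C(mn, r), and that the quoted order of magnitude corresponds to the simpler estimate (mn/r)^r.

```python
import math

m, n, r = 100, 100, 100  # male lines, female lines, hybrids to select

# Assumed exact count: choose r hybrids from the m * n possible crosses.
exact_sets = math.comb(m * n, r)

# Assumed rough estimate: (m * n / r) ** r, reported as a power of ten.
estimate_log10 = r * math.log10(m * n / r)

print(f"estimate    ~ 10^{estimate_log10:.0f}")
print(f"exact count ~ 10^{math.log10(exact_sets):.1f}")
```

Under these assumptions the estimate is a lower bound on the exact count, so the real search space is at least as large as the figure quoted in the example.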
Uniquely, the systems and methods herein allow for the identification of a set of hybrids from a pool of potential hybrids for inclusion in one or more breeding pipelines. Specifically, the selection engine selects a hybrid population from a pool of hybrids based on the predicted score, and then identifies a set of hybrids from the hybrid population based on one or more other factors associated with the hybrids. In particular, the selection engine may employ algorithms that take into account predicted performance, but control the set of hybrids identified by one or more factors and/or constraints (e.g., based on one or more desired traits, line distributions, heterosis diversity, risk, or desired market segments, etc.), for example, as described below. In this way, the complexity associated with the identification of a set of hybrids to be pushed towards commercialization can be mitigated and/or reduced, while maintaining considerable accuracy in selecting and considering possible performance and/or genetic diversity in the set of hybrids.
A hybrid is a cross of two separate plants or inbred lines, which are progeny of some historical origin. As used herein, lines refer to one or more parents of a hybrid and are to be construed as singular or plural where applicable. The lines may be divided into genetically distinct groups, also known as heterosis groups. The heterosis groups may be referred to as the "male pool" and the "female pool". When similarity, e.g., based on markers, is used as a measure of distance between inbred lines, a male heterosis group and a female heterosis group are identified as two sets of lines that can be separated into two different groups. Such terms are used to distinguish the two heterosis groups from which the two lines are selected for a given cross. The terms "male" and "female" are not intended to convey any information other than that the male and female lines are from different heterosis groups. Phenotypic data, trait distributions, ancestry, genetic sequences, commercial success, and additional information for the lines are generally known and may be stored in memory as described in more detail below.
As used herein, phenotypic data includes, but is not limited to, information about the phenotype of a given line or hybrid or a population of such lines or hybrids. Phenotypic data may include size and/or vigor traits of the line (e.g., plant height, stalk circumference, stalk strength, etc.), yield, maturation time, resistance to biotic stress (e.g., disease or insect resistance), resistance to abiotic stress (e.g., drought or salt and alkali tolerance, etc.), growth climate, or any additional phenotype, and/or combinations thereof. It is to be understood that the systems and methods herein generally relate to and/or rely on phenotypic data associated with one or more lines, hybrids, and the like. That said, it is to be understood that, in one or more exemplary embodiments, genotype data can be used in conjunction with or in combination with (or otherwise) the phenotypic data described herein (e.g., to supplement the phenotypic data and/or further inform the models, algorithms, and/or predictions herein, etc.), which can then aid in the selection of a hybrid population or set of hybrids consistent with the description herein.
Fig. 1 illustrates an
As shown in fig. 1,
In certain breeding pipeline embodiments (e.g., large industrial breeding pipelines, etc.), hundreds, thousands, or more lines, hybrids, etc. may be tested, selected, and/or promoted at several sites over several years in multiple stages to yield reduced hybrid collections, etc., which are then selected for commercial product development. Briefly,
In this exemplary embodiment,
As shown in fig. 1,
In the hybrid initiation phase 104, a pool of potential hybrids is provided from one or more line sets. Lines may be selected, for example, by a breeder, or otherwise depending on the particular type of plant, etc. Lines (and subsequently their associated origins) may also be selected, for example, based on an origin selection system and/or (at least in part) based on the methods and systems disclosed in U.S. patent application 15/618,023 entitled "methods for Identifying Crosses for use in Plant Breeding," the entire disclosure of which is incorporated herein by reference. Once lines, both male and female lines, are selected, the lines are combined to provide a pool of hybrids. The pool of hybrids then enters a cultivation and
Once the hybrids are grown, each hybrid is tested to derive and/or collect phenotypic data for the hybrids, whereby the phenotypic data is stored in one or more data structures described below. The test may include, for example, any suitable technique for determining phenotypic data. Such techniques may include any number of tests, trials or analyses known to be useful for assessing plant performance, including any phenotype known in the art. In preparation for such testing, samples of embryo and/or endosperm material/tissue can be harvested/removed from progeny in a manner that does not kill or otherwise prevent the seed or plant from surviving the test. For example, seed sections may be employed to obtain tissue samples from progeny for determining desired phenotypic data. Any other method of harvesting a tissue sample may also be used, such as performing the assay directly on seed tissue, without removing the tissue sample. In certain embodiments, the embryo and/or endosperm remains attached to other tissues of the seed. In certain other embodiments, the embryo and/or endosperm is separated from other tissues of the seed (e.g., embryo rescue, embryo excision, etc.). Common examples of phenotypes that may be obtained by such tests include, but are not limited to, size, shape, surface area, volume, mass, and/or amount of chemicals in at least one tissue of the seed (e.g., anthocyanins, proteins, lipids, carbohydrates, etc. in the embryo, endosperm, or other seed tissue). Where a hybrid has been selected or otherwise modified (e.g., grown from seeds, etc.) to produce a particular chemical (e.g., a drug, toxin, fragrance, etc.), the hybrid can be assayed to quantify the desired chemical.
In this sense, it should be understood that the culturing and
With continued reference to FIG. 1, the transition of hybrids from one incubation and
Selection engine 110 is configured to perform the operations described herein by way of computer-executable instructions and/or one or more algorithms herein (or variants thereof). Further, it should be understood that selection engine 110 can be configured to provide (e.g., generate and cause display at a computing device of a human breeder) and/or be responsive to one or more user interfaces through which a human breeder (broadly, a user) can provide input regarding a hybrid or hybrid desired trait and/or that can be used by algorithms herein (e.g., number of hybrids selected, input indicative of market segmentation, input defining desired traits, other input specific to one or more breeding strategies, or more generally, other aspects of identification of a set of hybrids; etc.). The user interface may be provided directly at a computing device of the human breeder in which the selection engine 110 is employed (e.g., computing device 200, etc., as described below), or via one or more network-based applications through which a remote user (again, possibly a human breeder) may be able to interact with the selection engine 110 as described herein.
In addition, as shown in FIG. 1,
Table 1 includes data from a series of maize plant hybrids ($H_{1,1}$ to $H_{m,n}$), wherein variable values are provided for yield and standability for each line from which the hybrid is derived. It should be understood that other data, particularly phenotypic data, of corn plants and other types of plants may be included, as contemplated herein.
TABLE 1
In addition to the specific phenotypic data for each hybrid, table 1 of
In this exemplary embodiment, the selection engine 110 is configured to generate a predictive model based in whole or in part on historical data included in the
In particular, for example, the predictive model may be consistent with a random forest, which is a collection of multiple decision tree classifiers. Each of the decision trees is trained on randomly sampled data from a training data set (e.g., the training data set included in table 1, etc.). Further, a random subset of features (e.g., as indicated by phenotypic data or the like) may then be selected to generate the individual trees. The final prediction score generated by the random forest is computed by the selection engine 110 as an aggregation of the individual trees and is related to the prediction of whether true or false (i.e., whether to advance) with respect to the features on which the tree generation is based.
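The ensemble idea described above can be sketched in a few lines. This is only an illustration, not the selection engine's model: it swaps full decision trees for one-split trees ("stumps") to stay short, and the feature names, values, and advancement labels are hypothetical. Each stump is trained on a bootstrap sample of the rows and a random subset of the features, and the predicted score is the average of the per-tree probabilities of "advance".

```python
import random

def fit_stump(rows, labels, features):
    """Fit a one-split tree: choose the (feature, threshold) with the fewest errors."""
    overall = sum(labels) / len(labels)
    best = None
    for f in features:
        for t in sorted({row[f] for row in rows}):
            low = [y for row, y in zip(rows, labels) if row[f] < t]
            high = [y for row, y in zip(rows, labels) if row[f] >= t]
            # Errors made if each side of the split predicts its majority label.
            errors = sum(min(sum(s), len(s) - sum(s)) for s in (low, high))
            if best is None or errors < best[0]:
                p_low = sum(low) / len(low) if low else overall
                p_high = sum(high) / len(high)
                best = (errors, f, t, p_low, p_high)
    return best[1:]

def fit_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    feature_names = sorted(rows[0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample of rows
        features = rng.sample(feature_names, n_features)   # random feature subset
        forest.append(fit_stump([rows[i] for i in sample],
                                [labels[i] for i in sample], features))
    return forest

def predicted_score(forest, row):
    """Average the per-tree probabilities of 'advance' for one hybrid."""
    votes = [p_high if row[f] >= t else p_low for f, t, p_low, p_high in forest]
    return sum(votes) / len(votes)

# Hypothetical historical data: (phenotypic features, was the hybrid advanced?)
history = [
    ({"yield": 12.1, "standability": 0.92}, True),
    ({"yield": 11.8, "standability": 0.88}, True),
    ({"yield": 12.5, "standability": 0.95}, True),
    ({"yield": 11.9, "standability": 0.90}, True),
    ({"yield": 9.5, "standability": 0.70}, False),
    ({"yield": 10.1, "standability": 0.75}, False),
    ({"yield": 9.8, "standability": 0.65}, False),
    ({"yield": 10.4, "standability": 0.80}, False),
]
rows = [features for features, _ in history]
labels = [advanced for _, advanced in history]

forest = fit_forest(rows, labels)
strong = predicted_score(forest, {"yield": 12.5, "standability": 0.95})
weak = predicted_score(forest, {"yield": 9.5, "standability": 0.65})
print(strong, weak)
```

A production model would more likely use full-depth trees from an established library; the stumps here only keep the bootstrap-and-aggregate structure visible.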
Despite this particular example, it should be understood that any suitable technique may be employed by the selection engine 110 to generate the predictive model.
Once the model is generated, the selection engine 110 is configured to determine a predictive score for each hybrid in the pool of hybrids (in the culture and
TABLE 2
Further, the selection engine 110 is configured to select a hybrid population from the pool of hybrids based on the predicted score. In particular, the selection engine 110 may be configured to select hybrids for which the predicted score meets one or more thresholds, or alternatively sort hybrids based on the predicted score and then select a plurality of hybrids based on the index. In table 2, for example, the hybrid population selected by the selection engine 110 includes hybrids designated as "true" and does not include hybrids designated as "false".
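As a minimal sketch of the threshold variant, the snippet below marks each hybrid "true" or "false" against a single cut-off and keeps the "true" ones. The hybrid labels, scores, and threshold value are hypothetical and are not taken from Table 2.

```python
# Hypothetical predicted scores for a small pool of hybrids.
scores = {"H1,1": 0.91, "H1,2": 0.34, "H2,1": 0.78, "H2,2": 0.49}
THRESHOLD = 0.5  # hypothetical cut-off

# Designate each hybrid True/False, then keep the True ones.
designation = {hybrid: score >= THRESHOLD for hybrid, score in scores.items()}
hybrid_population = [hybrid for hybrid, keep in designation.items() if keep]

print(hybrid_population)  # ['H1,1', 'H2,1']
```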
The selection engine 110 is further configured to subsequently identify a set of hybrid species from the hybrid population to advance to the next iteration of the culture and
Finally, in
FIG. 2 illustrates an exemplary computing device 200, which exemplary computing device 200 can be used in
Exemplary computing device 200 may include, for example, one or more servers, workstations, personal computers, laptop computers, tablets, smartphones, other suitable computing devices, combinations thereof, and the like. Further, computing device 200 may comprise a single computing device, or it may comprise multiple computing devices, located in close proximity or distributed over a geographic area, and coupled to each other via one or more networks. Such networks may include, but are not limited to, the internet, an intranet, a private or public Local Area Network (LAN), a Wide Area Network (WAN), a mobile network, a telecommunications network, a combination thereof, or one or more other suitable networks, and the like. In one example, the
In this sense, the illustrated computing device 200 includes a processor 202 and a memory 204 coupled to (and in communication with) the processor 202. Processor 202 may include, but is not limited to, one or more processing units (e.g., in a multi-core configuration, etc.) including a Central Processing Unit (CPU), a microcontroller, a Reduced Instruction Set Computer (RISC) processor, an Application Specific Integrated Circuit (ASIC), a Programmable Logic Device (PLD), a gate array, and/or any other circuit or processor that has the functionality described herein. The above list is exemplary only, and is thus not intended to limit in any way the definition and/or meaning of the processor.
As described herein, memory 204 is one or more devices that enable information (e.g., executable instructions and/or other data) to be stored and retrieved. Memory 204 may include one or more computer-readable storage media such as, but not limited to, Dynamic Random Access Memory (DRAM), Static Random Access Memory (SRAM), Read Only Memory (ROM), Erasable Programmable Read Only Memory (EPROM), solid state devices, flash drives, CD-ROMs, thumb drives, tapes, hard disks, and/or any other type of volatile or non-volatile physical or tangible computer-readable medium. Memory 204 may be configured to store, but is not limited to, a
In an exemplary embodiment, the computing device 200 also includes a presentation unit 206, the presentation unit 206 being coupled to (and in communication with) the processor 202. Presentation unit 206 outputs information to a user of computing device 200 (e.g., a breeder, etc.) by, for example, displaying and/or otherwise outputting that information (e.g., without limitation, a selected hybrid, progeny from the hybrid as a commercial product, and/or any other type of data). It should be further understood that in some embodiments, presentation unit 206 may include a display device such that various interfaces (e.g., applications (web-based or otherwise) and the like) may be displayed at computing device 200, particularly at the display device, to display such information and data and the like. In some examples, computing device 200 may cause the interface to be displayed at a display device of another computing device, including, for example, a server hosting a website having a plurality of web pages, or a server interacting with a web application employed at another computing device, or the like. The presentation unit 206 may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, an organic LED (OLED) display, an "electronic ink" display, combinations thereof, and the like. In some embodiments, presentation unit 206 includes a plurality of units.
Computing device 200 also includes an input device 208 that receives input from a user. An input device 208 is coupled to (and in communication with) the processor 202 and may include, for example, a keyboard, a pointing device, a mouse, a stylus, a touch-sensitive panel (e.g., a touchpad or touchscreen, etc.), another computing device, and/or an audio input device. Further, in some exemplary embodiments, a touch screen (such as a touch screen included in a tablet or similar device) is used as both the presentation unit 206 and the input device 208. In at least one exemplary embodiment, the presentation unit and the input device are omitted.
In addition, the illustrated computing device 200 includes a network interface 210, the network interface 210 being coupled to (and in communication with) the processor 202 (and in some embodiments also coupled to the memory 204). Network interface 210 may include, but is not limited to, a wired network adapter, a wireless network adapter, a telecommunications adapter, or other device capable of communicating with one or more different networks. In at least one embodiment, the network interface 210 is employed to receive input to the computing device 200. For example, the network interface 210 may be coupled to (and in communication with) a field data collection device in order to collect data for use as described herein. In some example embodiments, the computing device 200 may include a processor 202 and one or more network interfaces incorporated into the processor 202 or with the processor 202.
FIG. 3 illustrates an exemplary method 300 for identifying a set of hybrids to be advanced in a breeding pipeline from a pool of potential hybrids. The exemplary method 300 is described herein in connection with the
First, a breeder (or other user) initially selects a plant type (e.g., corn, etc.) for which to identify a set of hybrids. According to this option, a series of lines are identified for plant type, wherein the lines are segregated into two heterosis pools: male and female lines. FIG. 4 shows a
In one example consistent with FIG. 4, in which 100 male lines (n = 100) and 100 female lines (m = 100) are identified to the selection engine 110, the selection engine 110 may identify a set of 100 hybrids (r), for example, by using the method 300.
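For orientation, the pool implied by this example can be enumerated as the edges of a complete bipartite graph between the two heterosis pools. The line names below are hypothetical placeholders, and the sketch assumes every female line can be crossed with every male line.

```python
from itertools import product

# Hypothetical line names for the two heterosis pools.
female_lines = [f"F{i}" for i in range(1, 101)]
male_lines = [f"M{j}" for j in range(1, 101)]

# Each potential hybrid is one (female, male) edge of the bipartite graph.
pool_of_hybrids = list(product(female_lines, male_lines))

print(len(pool_of_hybrids))  # 10000
```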
As shown in FIG. 3, selection engine 110 then initially accesses phenotypic data for the hybrids within
Once the predictive model is generated, the selection engine 110 generates a predictive score (e.g., F) for each hybrid in the pool of hybrids based on the phenotypic data of the hybrids (accessed in the
Subsequently, the selection engine 110 selects a hybrid population from the pool of hybrids at 308 based on the predicted scores generated at 306. The selection by the selection engine 110 can be accomplished in a variety of different ways using the predicted scores. In this exemplary embodiment, for example, the selection engine 110 indexes the hybrids (e.g., in order from highest to lowest, etc.) based on the associated predicted scores, and then selects (at 308) a hybrid population from the ordered pool of hybrids as the top number of hybrids (e.g., the top 6,000 hybrids, etc.) based on the index. In other examples, the selection engine 110 can apply one or more thresholds to the predicted scores to retain hybrids for which the predicted score meets the one or more thresholds (e.g., is greater than (or less than) the one or more thresholds, etc.), while not selecting hybrids for which the predicted score fails to meet the one or more thresholds. For the pool of hybrids of FIG. 4, for example, as shown in Table 2, hybrids $F_1+M_1$, $F_2+M_1$, $F_3+M_1$, and $F_n+M_1$ are selected at 308 from the pool of hybrids into the hybrid population, while hybrids $F_1+M_2$, $F_1+M_m$, and $F_n+M_m$ are not selected into the hybrid population.
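The index-based variant can be sketched as a sort followed by a cut. The hybrid labels follow the FIG. 4 example, but the scores and the cut of four hybrids are hypothetical values chosen only for illustration.

```python
def select_top(scores, max_hybrids):
    """Rank hybrids from highest to lowest predicted score and keep the top ones."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [hybrid for hybrid, _ in ranked[:max_hybrids]]

# Hypothetical predicted scores for part of the pool.
scores = {
    "F1+M1": 0.93, "F1+M2": 0.42, "F2+M1": 0.88, "F3+M1": 0.81,
    "F1+Mm": 0.37, "Fn+M1": 0.75, "Fn+Mm": 0.20,
}

hybrid_population = select_top(scores, max_hybrids=4)
print(hybrid_population)  # ['F1+M1', 'F2+M1', 'F3+M1', 'Fn+M1']
```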
Next, in method 300, selection engine 110 identifies a set of hybrid species from the hybrid population based on one or more set identification algorithms at 310. Typically, the one or more set identification algorithms are based on a probability of success for a hybrid that is a predicted score for each hybrid in the hybrid population (e.g., as determined at 306, etc.) and/or is derived from the predicted score. In addition, the selection engine 110 also relies on one or more factors to improve and/or alter the set of hybrids that may be identified based solely on the prediction scores. For example, the selection engine 110 may impose a trait limit on a set of hybrids to be identified, or define a desired line distribution or heterosis diversity pattern, the identified set of hybrids define a deviation or error or the like relative to the trait limit or the desired line distribution or heterosis diversity pattern, and then count the deviation or error as a penalty or cost, for example, for the probability of success of the hybrids in the identification of the set of hybrids. Other factors may include, for example, risk, cost of production (e.g., cost of goods, etc.), disease resistance or other traits (alone or in combination), market segmentation, trait integration, trait availability or readiness, or other factors related to the performance of the hybrid from the perspective of growth, availability, and/or commercial success.
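The effect of counting a deviation as a penalty can be seen on a toy case. The sketch below is not the set identification algorithm of method 300: it uses brute-force enumeration over a four-hybrid population, and the success probabilities, desired line usage, and penalty weight are all hypothetical. Each candidate set is scored as the sum of its success probabilities minus a weighted squared deviation from the desired usage of each line.

```python
from collections import Counter
from itertools import combinations

# Hypothetical success probabilities for a (female, male) hybrid population.
success = {
    ("F1", "M1"): 0.90, ("F1", "M2"): 0.85,
    ("F2", "M1"): 0.60, ("F2", "M2"): 0.70,
}
desired_usage = {"F1": 1, "F2": 1, "M1": 1, "M2": 1}  # hypothetical pattern
PENALTY = 0.2   # hypothetical weight on the deviation
SET_SIZE = 2    # number of hybrids to identify

def objective(hybrid_set):
    """Sum of success probabilities minus a penalty for uneven line usage."""
    usage = Counter(line for hybrid in hybrid_set for line in hybrid)
    deviation = sum((usage[line] - want) ** 2 for line, want in desired_usage.items())
    return sum(success[h] for h in hybrid_set) - PENALTY * deviation

candidates = list(combinations(success, SET_SIZE))
by_score_alone = max(candidates, key=lambda s: sum(success[h] for h in s))
best_set = max(candidates, key=objective)

print(by_score_alone)  # (('F1', 'M1'), ('F1', 'M2'))
print(best_set)        # (('F1', 'M1'), ('F2', 'M2'))
```

Score alone picks two hybrids that share female line F1; the penalty shifts the choice to a set that uses every line once. Enumeration is only feasible at this toy size, which is why a solver-based formulation is needed for realistic pools.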
In the exemplary embodiment, the selection engine 110 employs the set identification algorithm as a series of algorithms that define a system of equations to be solved. In particular, two quadratic equations are provided, one for the female lines (equation 1) and one for the male lines (equation 3). Each equation is solved to provide the line distribution (i.e., as a continuous variable) that the final identification of the set of hybrids is to follow. That is, for the bipartite graph of FIG. 4, the quadratic equations are associated with heterosis pools 402 and 404. The mixed integer program then selects edges of the bipartite graph, subject to one or more optimizers that specify the desired distribution to follow for each node type. By using mixed integer programming, several population distributions are also maintained in the set of hybrids identified at 310. The optimizers of the quadratic equations (equations 1-4 below) are used as input to the mixed integer program, which then relies on them to identify the set of hybrids. The female quadratic equation (equation 1) is as follows:

maximize: [Equation 1 is not reproduced in this text]

In this regard, equation 1 is constrained by equation 2:

[Equation 2 is not reproduced in this text]

The male quadratic equation (equation 3) is as follows:

maximize: [Equation 3 is not reproduced in this text]

In this regard, equation 3 is constrained by equation 4:

[Equation 4 is not reproduced in this text]
In the female quadratic equation (and similarly in the male quadratic equation), two terms indicate the linear performance and the quadratic diversity of line usage, respectively. The linear term relies on the probability of success of each female line (e.g., determined by averaging the probabilities of the related hybrids, or by determining and/or retrieving a probability specific to the female line, etc.), and the quadratic term relies on the pairwise similarity of the female lines. The similarity value for a pair of female lines with 100% homology is "1", and the value for a pair with 0% homology is "0". Most lines share some homology and are scored as a decimal between 0 and 1. An exemplary pairwise matrix, or $S_f$, of lines in the female heterosis pool is provided in Table 3 below.

TABLE 3

Likewise, in the male quadratic equation, two terms indicate the linear performance and the quadratic diversity of line usage, based on the probability of success of each male line (e.g., determined by averaging the probabilities of the related hybrids, or by determining and/or retrieving a probability specific to the male line, etc.). A pair of male lines with 100% homology again has a value of "1", and a pair with 0% homology has a value of "0". Most lines share some homology and are scored as a decimal between 0 and 1. An exemplary pairwise matrix, or $S_m$, of lines in the male heterosis pool is provided in Table 4 below (and is based on the clustering of the lines, as described below).

TABLE 4
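To make the interplay of the linear performance term and the quadratic diversity term concrete, the following Python sketch evaluates one plausible form of such an objective. The exact equations are not reproduced in this text, so the form used here (performance minus a weighted similarity term, over a usage distribution that sums to 1), the name `diversity_weight`, and all numeric values are assumptions for illustration only.

```python
from typing import List


def line_usage_objective(p: List[float], S: List[List[float]],
                         d: List[float], diversity_weight: float) -> float:
    """Assumed objective: linear performance p.d minus a weighted quadratic term d^T S d."""
    n = len(d)
    performance = sum(p[i] * d[i] for i in range(n))
    similarity = sum(d[i] * S[i][j] * d[j] for i in range(n) for j in range(n))
    return performance - diversity_weight * similarity


# Hypothetical success probabilities and pairwise similarities for three female lines.
p_f = [0.8, 0.7, 0.6]
S_f = [[1.0, 0.9, 0.1],
       [0.9, 1.0, 0.2],
       [0.1, 0.2, 1.0]]

# Usage concentrated on the best line versus usage spread over two dissimilar lines.
concentrated = line_usage_objective(p_f, S_f, [1.0, 0.0, 0.0], 1.0)
spread = line_usage_objective(p_f, S_f, [0.5, 0.0, 0.5], 1.0)
```

Under these assumed values the spread distribution scores higher than the concentrated one, which is the behavior the diversity term is meant to encourage: heavy use of a single line, or of closely related lines, is discounted.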
Genetic diversity is included in the set identification algorithm to limit and/or mitigate the risks associated with heavy use, within the identified set of hybrids, of lines with similar genetic backgrounds. Once these distributions of line usage are identified, the selection engine 110 uses the optimizers of the female and male quadratic equations to identify the set of hybrids, constraining the set of hybrids to follow the desired and/or required line usage while achieving a given and/or desired probability of success (e.g., a relatively higher or the highest probability of success).

In conjunction with the above, the selection engine 110 employs the following mixed integer algorithm to identify, at 310, the set of hybrids $x_{OPT}$ from the hybrid population. This exemplary algorithm (equation 5), combined or integrated with the above quadratic equations (equations 1-4), is also referred to herein as the set identification algorithm.

[Equation 5 is not reproduced in this text]

In this regard, equation 5 is constrained by equations 6-11:

[Equations 6-11 are not reproduced in this text]

In the above, the selection engine 110 is configured to identify, at 310, r hybrids into the set of hybrids, where r may be, for example, 100 hybrids.
The term $p_i$ indicates the probability of success of hybrid i and is generated by the predictive algorithm. In particular, the term $p_i$ is calculated as a combination of the predicted score (determined at 306) and one or more phenotypic traits. The term $p_i$ thus reflects a linear combination of the main traits, with weights defined by mutual information relative to the historical data. In this manner, a more discriminating way of assessing performance is provided for the hybrid population than for the broader pool of hybrids described above.
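One way to read the weighting described above is sketched below in Python: the mutual information between each (discretized) input and the historical outcome is computed and then used as the weight of that input in a normalized linear combination. The discretization, the normalization, the function names, and all data are assumptions made for illustration; the disclosure does not specify them.

```python
import math
from collections import Counter
from typing import List, Sequence


def mutual_information(xs: Sequence[int], ys: Sequence[int]) -> float:
    """Mutual information (in nats) between two discrete sequences of equal length."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in joint.items())


def success_probability(features: List[float], weights: List[float]) -> float:
    """Weighted linear combination of features, with weights normalised to sum to 1."""
    total = sum(weights)
    return sum(w * f for w, f in zip(weights, features)) / total


# Hypothetical history: 1 = hybrid was advanced, 0 = hybrid was dropped.
history_advanced = [1, 1, 0, 0, 1, 0]
score_bin = [1, 1, 0, 0, 1, 0]   # binned predicted score (perfectly informative here)
trait_bin = [1, 0, 1, 0, 1, 0]   # binned phenotypic trait (weakly informative here)

w_score = mutual_information(score_bin, history_advanced)
w_trait = mutual_information(trait_bin, history_advanced)

# Success probability of one hybrid with predicted score 0.9 and scaled trait value 0.4.
p_i = success_probability([0.9, 0.4], [w_score, w_trait])
```

Because the binned score tracks the historical outcome far more closely than the trait does in this toy data, it receives most of the weight, and the resulting $p_i$ lies close to the predicted score.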
In equations 7 and 8, the patterns to be followed by the set of hybrids are provided by the optimizers of the quadratic equations (e.g., equations 1-4, etc.), as above. In addition, the term $M_m$ indicates an incidence matrix from the hybrids to the set of male lines, wherein the presence of a particular male line in a hybrid is a "1" and the absence of the particular male line is a "0". A simplified example matrix is shown in Table 5 below.

TABLE 5

The term $M_f$ indicates an incidence matrix from the hybrids to the set of female lines, wherein the presence of a particular female line in a hybrid is a "1" and the absence of the particular female line is a "0". A simplified example matrix is shown in Table 6 below.

TABLE 6
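A minimal Python sketch of how such 0/1 matrices might be built and used follows. The orientation (one row per line, one column per hybrid), the helper names, and the example hybrids are assumptions for illustration, since Tables 5 and 6 are not reproduced in this text.

```python
from typing import List, Tuple


def incidence_matrix(hybrids: List[Tuple[str, str]], lines: List[str],
                     parent: int) -> List[List[int]]:
    """Rows are lines, columns are hybrids; an entry is 1 when the line is a parent of the hybrid."""
    return [[1 if h[parent] == line else 0 for h in hybrids] for line in lines]


def line_usage(M: List[List[int]], x: List[int]) -> List[int]:
    """Matrix-vector product M x: how many selected hybrids use each line."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


# Hypothetical hybrids as (female, male) pairs.
hybrids = [("F1", "M1"), ("F2", "M1"), ("F3", "M1"), ("F1", "M2")]
M_f = incidence_matrix(hybrids, ["F1", "F2", "F3"], parent=0)
M_m = incidence_matrix(hybrids, ["M1", "M2"], parent=1)

x = [1, 1, 0, 1]  # decision vector: the first, second and fourth hybrids are in the set
female_usage = line_usage(M_f, x)
male_usage = line_usage(M_m, x)
```

Multiplying the matrix by the decision vector gives the number of selected hybrids that use each line, which is the quantity compared against the desired line distribution.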
Based on the above, when the set of hybrids (x) deviates from the patterns of male line distribution and female line distribution, the set identification algorithm in equation 5 imposes a penalty or cost, which may in turn inhibit over-representation of some lines in the set of hybrids to be identified.
According to the above, equations 7 and 8 provide the deviations $\theta_m(i)$ and $\theta_f(i)$ from the desired pattern defined by the above quadratic equations. When the deviations are included in equation 5 (the set identification algorithm), each deviation imposes a cost or penalty on the set of hybrids for departing from the desired pattern. That is, for both the male line distribution and the female line distribution, a cost is assigned to the deviation from the desired pattern. Although provided in a particular manner in this exemplary embodiment, the line distribution (or possibly even the hybrid distribution) for one or both of the male and female lines may be provided in other manners in different embodiments (or even omitted as a factor in other embodiments).
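The idea of charging a cost for deviating from the desired line distribution can be sketched as follows. Equations 7 and 8 are not reproduced in this text, so the absolute-difference form of the deviation, the use of usage shares, the `weight` parameter, and the numbers are assumptions for illustration. The sketch also assumes that at least one hybrid has been selected.

```python
from typing import List


def pattern_deviation(usage: List[int], desired: List[float]) -> List[float]:
    """Absolute gap between the realised usage share of each line and its desired share."""
    total = sum(usage)  # assumed non-zero: at least one hybrid is selected
    return [abs(u / total - d) for u, d in zip(usage, desired)]


def deviation_cost(usage: List[int], desired: List[float], weight: float) -> float:
    """Weighted sum of per-line deviations, to be subtracted from the objective."""
    return weight * sum(pattern_deviation(usage, desired))


# Hypothetical: three female lines used 2, 1 and 0 times against a desired 40/30/30 split.
theta_f = pattern_deviation([2, 1, 0], [0.4, 0.3, 0.3])
cost = deviation_cost([2, 1, 0], [0.4, 0.3, 0.3], weight=2.0)

# A set that matches the desired split exactly incurs no cost.
no_cost = deviation_cost([4, 3, 3], [0.4, 0.3, 0.3], weight=2.0)
```

A larger weight makes departures from the desired distribution more expensive relative to the probability-of-success term, so the weight expresses how strictly the pattern should be followed.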
Further, the set identification algorithm (equation 5) takes into account the heterosis diversity of each of the male lines and the female lines included in the set of hybrids, by way of a Laplace matrix L built from the pairwise similarities of the lines (equations 12 and 13):

[Equation 12 is not reproduced in this text]

$$l_{ii} := -\sum_{j,\, j \neq i} l_{ij} \qquad (13)$$

Here, $s_{ij}$ is the similarity between the ith and jth lines, and $l_{ij}$ is the (i, j)th entry of the Laplace matrix L. In this example, the selection engine 110 employs spectral clustering followed by eigen analysis to determine/estimate the number of clusters (i.e., three in each of the heterosis pools 402 and 404 of FIG. 4) and then clusters the inbred lines into the heterosis pools using the K-means algorithm. However, it should be understood that various other known clustering techniques may alternatively be used. In this exemplary embodiment, clustering is performed separately on the set of male inbreds and the set of female inbreds to identify genetic pools among the lines. In this example, the selection engine 110 uses eigen analysis to estimate the number of clusters in an unsupervised manner.
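The construction of the Laplace matrix can be sketched in a few lines of Python. The diagonal follows equation 13; the off-diagonal rule $l_{ij} = -s_{ij}$ is the standard graph Laplacian convention and is an assumption here, since equation 12 is not reproduced in this text. The similarity values are hypothetical.

```python
from typing import List


def laplacian(S: List[List[float]]) -> List[List[float]]:
    """Off-diagonal l_ij = -s_ij (assumed); diagonal l_ii = -sum over j != i of l_ij."""
    n = len(S)
    L = [[-S[i][j] if i != j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        L[i][i] = -sum(L[i][j] for j in range(n) if j != i)
    return L


# Hypothetical pairwise similarities between three lines.
S = [[1.0, 0.9, 0.1],
     [0.9, 1.0, 0.2],
     [0.1, 0.2, 1.0]]
L = laplacian(S)
```

With this convention every row of L sums to zero, and the diagonal entry of a line equals its total similarity to all other lines.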
Then, once the number of clusters is determined, the selection engine 110 performs dimensionality reduction by, for example, projecting the Laplace matrix L onto the dominant eigenmodes via the equations provided below (equations 14 and 15). In the first equation (equation 14), L is the Laplace matrix created based on the similarity distances $s_{ij}$, and the result is a normalized Laplace matrix, normalized by a diagonal matrix D.

[Equation 14 is not reproduced in this text]

In the second equation (equation 15), the normalized Laplace matrix is decomposed using singular value decomposition. The matrix $\Sigma$ contains the eigenvalues, which capture the number of clusters from the spectral clustering.

[Equation 15 is not reproduced in this text]

The selection engine 110 then uses the K-means algorithm to cluster lines F1 to F11 and lines M1 to M11 (in their respective heterosis pools 402 and 404). Because the K-means algorithm is a heuristic or stochastic clustering mechanism, in this example the selection engine 110 can cluster the lines over multiple different runs of the K-means algorithm and select the run with the maximum or a relatively high inter-cluster distance, and so on. Also, although spectral clustering is used herein, it should be understood that other clustering algorithms may be employed by the selection engine 110 or another computing device, including, for example, hierarchical clustering, Bayesian clustering, C-means clustering, and the like.
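The repeated-run strategy for K-means can be sketched as follows in plain Python. The sketch clusters points that are assumed to be already projected onto the dominant eigenmodes; the projection itself is omitted. Using the smallest squared distance between centroids as the inter-cluster distance is one possible choice among several, and the function names, run count, seed, and data are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def _dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, rng: random.Random,
           iters: int = 50) -> Tuple[List[int], List[List[float]]]:
    """Plain Lloyd's algorithm from a random initialisation; returns labels and centroids."""
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: _dist2(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def min_centroid_gap(centroids: List[List[float]]) -> float:
    """Smallest squared distance between any two centroids (larger means better separated)."""
    return min(_dist2(a, b) for i, a in enumerate(centroids) for b in centroids[i + 1:])


def best_of_runs(points: List[Point], k: int, runs: int = 30, seed: int = 0) -> List[int]:
    """Repeat K-means and keep the run whose clusters are farthest apart."""
    rng = random.Random(seed)
    results = [kmeans(points, k, rng) for _ in range(runs)]
    labels, _ = max(results, key=lambda res: min_centroid_gap(res[1]))
    return labels


# Hypothetical 2-D embeddings of six lines forming three well-separated groups.
embedded = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0), (10.0, 0.1)]
labels = best_of_runs(embedded, k=3)
```

A single run can stall in a poor local optimum, for example by splitting one group across two centroids. Keeping the run with the largest separation discards such outcomes, because two centroids inside the same group lie very close together.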
As shown in FIG. 4, each of the lines is included in one of the line clusters and is associated with a distance to, or a similarity with, the other lines within the cluster. It will also be appreciated that, in this embodiment, a similarity matrix (or a similarity matrix based on the same labels) is provided to characterize the diversity in the quadratic equations described above. Thus, the same similarity matrix may supply the terms $s_{ij}$ used in the clustering and may be used to classify the lines into heterosis pools.
In addition, a further term indicates an incidence matrix from the hybrids to the male heterosis clusters, where the presence of a male line in the cluster is indicated by a "1" and the absence of a male line in the cluster is indicated by a "0". A simplified exemplary matrix is shown in Table 7 below, where the clusters in FIG. 4 are designated C1, C2 and C3 for the male heterosis pool 402.

TABLE 7

In addition, a corresponding term indicates an incidence matrix from the hybrids to the female heterosis clusters, where the presence of a female line in the cluster is indicated by a "1" and the absence of a female line in the cluster is indicated by a "0". A simplified exemplary matrix is shown in Table 8 below, where the clusters in FIG. 4 are designated C1, C2 and C3 for the female heterosis pool.

TABLE 8
In addition, referring to
According to the above,
Referring now to equation 11, the constraint includes a term that indicates an incidence matrix from the hybrids to trait $T_k$, and is thus a matrix like those described above, wherein the values in the matrix include, for each hybrid, for example, a 1 or 0 indicating whether the trait is present in the hybrid. It will be appreciated that the matrix may include values other than 0 or 1 to indicate not only the presence of a trait but also, for certain types of traits, the extent of the trait.
In this way, the term can be used to control the trait combination according to market segmentation. For example, for five market segments MS1, MS2, MS3, MS4 and MS5, and for each of the hybrids, the term may be used, based on the yield, disease susceptibility, etc. of the hybrid, to identify the market segments into which the hybrid could potentially be offered and/or advanced. Table 9 below provides a simple exemplary matrix of hybrids against market segments.

TABLE 9

As shown, similar to the matrices above, the matrix of Table 9 includes a "1" to indicate that the hybrid is a potential candidate for the market segment and a "0" to indicate that the hybrid is not a candidate for the market segment. One hybrid may qualify for multiple market segments. In the above example, M1+F1 is indicated for market segments MS1, MS4 and MS5. When the matrix is multiplied by the decision vector $x$ in equation 11, it produces the distribution of the hybrids across the different market segments. The selection engine 110 can then enforce and/or interpret the market segment requirements, as defined by one or more breeding and/or business strategies, through the bounds of equation 11, which are the upper limit and the lower limit of the combination for trait $T_k$. The values of the bounds may be selected by, for example, a human breeder based on one or more business constraints and/or considerations (e.g., desired market segment engagement, desired trait type, etc.) or otherwise. It should be appreciated that, in this exemplary embodiment, equation 11 does not impose a penalty or cost on the set identification algorithm for the suitability of the set of hybrids for the market segments, but instead imposes a strict constraint that must be satisfied. That is, the set of hybrids identified by equation 5 must satisfy the upper and lower limits provided in equation 11.

However, it should be understood that in other method embodiments the trait factors (e.g., market segmentation factors, etc.) may be handled differently, such that the trait factors (like line distribution and/or heterosis diversity) impose a cost and/or penalty in equation 5 (or another suitable algorithm) rather than a strict constraint. It should be further understood that other factors described herein may instead be provided in the set identification algorithm as strict constraints, as described above for the trait factors (thereby forcing the algorithm to satisfy the constraints).
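The strict-constraint treatment of market segments can be sketched as a simple feasibility check in Python. The orientation of the matrix (one row per market segment, one column per hybrid), the function names, and the bounds are assumptions for illustration; Table 9 and equation 11 are not reproduced in this text.

```python
from typing import List


def segment_counts(T: List[List[int]], x: List[int]) -> List[int]:
    """For each market segment (row of T), count the selected hybrids that qualify for it."""
    return [sum(t * xi for t, xi in zip(row, x)) for row in T]


def within_bounds(T: List[List[int]], x: List[int],
                  lower: List[int], upper: List[int]) -> bool:
    """Hard constraint: every segment count must lie between its lower and upper bound."""
    counts = segment_counts(T, x)
    return all(lo <= c <= up for c, lo, up in zip(counts, lower, upper))


# Hypothetical candidacy of four hybrids for two market segments.
T = [[1, 0, 1, 0],
     [1, 1, 0, 1]]
lower, upper = [1, 1], [2, 2]

feasible = within_bounds(T, [1, 1, 0, 0], lower, upper)    # segment counts are [1, 2]
infeasible = within_bounds(T, [0, 1, 0, 1], lower, upper)  # segment counts are [0, 2]
```

Unlike a penalty, a check of this kind removes a candidate set from consideration outright, no matter how high its combined probability of success is.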
Further, while market segmentation is determined and/or considered in a particular manner in this exemplary embodiment, it may be considered and/or provided in other manners in different embodiments (or even omitted as a factor in other embodiments).
Further, as indicated above, equation 5 includes a plurality of different weighting factors: one weighting factor, $\lambda_p$, is related to the probability of success, one is associated with the male lines, one is associated with the female lines, one is related to the heterosis diversity of the male lines, and one is related to the heterosis diversity of the female lines, etc. It should be understood that the weights are selected by a human breeder to set priorities among the factors to which the weights relate. For example, where line distribution is more important, the weighting factors associated with the male lines and the female lines may be set to increase the cost and/or penalty for deviations from the desired pattern. In addition, the weights, or a portion of the weights, can be selected based on historical data or the like associated with the lines and/or hybrids. A weight can likewise be defined for the trait combination distribution (see equation 11 above), whereby the weight provides a penalty or cost for the deviation of the trait combination distribution of the identified set of hybrids from the desired distribution, in which case the trait combination distribution is not strictly constrained.

In addition to the specific factors (e.g., performance factors, etc.) described above, risk may also be included as a line cost in one or more of the quadratic and/or mixed integer problems (or potentially as a strict constraint in some embodiments). For a given set of hybrids, risk can be modeled as the chance of failure of one or more inbreds or hybrids. In characterizing the risk of a line, the selection engine 110 may consider, for example, standability, disease susceptibility, etc., or other traits and/or performance indicators of the line. Additionally or alternatively, when characterizing the risk of a hybrid, the selection engine 110 can model the risk by standability, disease susceptibility, and cost of goods, among others. It should be understood that risk can be modeled as a linear cost with a negative coefficient, such that the set identification (e.g., with the above quadratic equations (e.g., equations 1-4, etc.) and/or equation 5 modified to include risk, etc.) then limits and/or constrains the risk associated with the identified set of hybrids (as compared to other potential sets of hybrids).
As indicated above, the specific factors of line distribution, heterosis diversity, and market segmentation are presented for illustrative purposes and are not intended to limit the permutations of factors that may be included in one or more set identification algorithms. Thus, the selection engine 110 may employ other set identification algorithms with different permutations of the factors described herein and different weights (or no weights), where the algorithm may depend on the probability of success of the hybrids, the lines that make up the hybrids, or some other basis for including a hybrid in the set of hybrids to be identified, etc. It should be apparent that other set identification algorithms may be employed in other method embodiments.
In the exemplary embodiment, however, the selection engine 110 solves equation 5 in conjunction with the other equations to provide the vector $x$, in which each entry $x_i$ is a "1" to include the corresponding hybrid in the set of hybrids or a "0" to exclude the hybrid from the set of hybrids, whereby the set of hybrids is identified at 310. In the above example, the selection engine 110 determines $x \in \{0,1\}^N$ as a vector in which 100 hybrids are represented by a "1". In addition, as shown in FIG. 3, the selection engine 110 then advances, at 312, the set of hybrids into further iterations of a stage of the breeding pipeline and/or into a different stage of the breeding pipeline.
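The overall shape of the set identification can be illustrated on a toy problem. The Python sketch below enumerates every 0/1 vector with exactly r ones, discards those that violate the hard bounds, and scores the rest as weighted success minus a weighted deviation from a desired line usage. Exhaustive enumeration stands in for a mixed integer solver and is only workable for tiny examples. The objective form, the parameter names `lam_p` and `lam_d`, the restriction to female line usage, and all data are assumptions for illustration, not the equations of the disclosure.

```python
from itertools import combinations
from typing import List, Optional, Tuple


def identify_set(p: List[float], M: List[List[int]], desired: List[float],
                 T: List[List[int]], lower: List[int], upper: List[int],
                 r: int, lam_p: float = 1.0,
                 lam_d: float = 1.0) -> Tuple[Optional[List[int]], Optional[float]]:
    """Search all r-subsets; maximise weighted success minus a usage-deviation cost."""
    best_value, best_x = None, None
    n = len(p)
    for chosen in combinations(range(n), r):
        x = [1 if i in chosen else 0 for i in range(n)]
        counts = [sum(t * xi for t, xi in zip(row, x)) for row in T]
        if not all(lo <= c <= up for c, lo, up in zip(counts, lower, upper)):
            continue  # hard trait/segment constraint violated
        usage = [sum(m * xi for m, xi in zip(row, x)) / r for row in M]
        penalty = sum(abs(u - d) for u, d in zip(usage, desired))
        value = lam_p * sum(pi * xi for pi, xi in zip(p, x)) - lam_d * penalty
        if best_value is None or value > best_value:
            best_value, best_x = value, x
    return best_x, best_value


# Hypothetical hybrids F1+M1, F1+M2, F2+M1, F2+M2 with success probabilities p.
p = [0.9, 0.85, 0.6, 0.5]
M_f = [[1, 1, 0, 0],   # hybrids that use female line F1
       [0, 0, 1, 1]]   # hybrids that use female line F2
T = [[1, 1, 1, 1]]     # a single segment for which every hybrid qualifies

balanced, balanced_value = identify_set(p, M_f, [0.5, 0.5], T, [0], [2], r=2)
greedy, greedy_value = identify_set(p, M_f, [0.5, 0.5], T, [0], [2], r=2, lam_d=0.0)
```

With the deviation weight switched off, the two highest-scoring hybrids are chosen even though both use F1. With the weight on, the search prefers one hybrid from each female line, trading a little expected success for a line usage that matches the desired pattern.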
In addition to the above, data relating to the selection of hybrids by selection engine 110 into a set of hybrids, as well as other data relating to the performance of the set of hybrids, is included in
In view of the foregoing, the systems and methods herein allow for identification of hybrids to be advanced in a breeding pipeline. In particular, as described above, the number of potential hybrids from inbred lines is greatly reduced in a commercial breeding pipeline. In this manner, the role of breeder expectation, bias, and/or assumption in the process is reduced, resulting in more efficient capture of commercially viable hybrids from among the many potential hybrids. Through the systems and methods disclosed herein, breeders can substantially improve the associated breeding pipeline by analyzing the large amount of data relating to the hybrids in order to identify and potentially select hybrids for advancement, whereas conventional breeding methods are limited in what they can consider. Further, the systems and methods herein are not limited in any way by geographic or other aspects. For example, if a crop can be grown in a given area, the selection engine 110 herein can be used to identify a set of hybrids for that particular market/environment by weighting data corresponding to certain traits that affect crop performance and/or success in that environment. Such environments may be defined globally or regionally, or may be as fine grained as a particular location in a field (such that the same field is identified as having several different environments). In this manner, the systems and methods herein may be used for development of products specific to certain markets, geographies, soil types, etc., or for maximizing profit, maximizing customer satisfaction, minimizing production costs, etc.
Accordingly, it should be understood that in some embodiments, the functions described herein may be described as computer-executable instructions stored on a computer-readable medium and executable by one or more processors. The computer readable medium is a non-transitory computer readable medium. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Combinations of the above should also be included within the scope of computer-readable media.
It should also be understood that one or more aspects of the present disclosure transform a general-purpose computing device into a special-purpose computing device when configured to perform the functions, methods, and/or processes described herein.
As will be appreciated based on the foregoing specification, the above-described embodiments of the disclosure may be implemented using computer programming or engineering techniques including computer software, firmware, hardware, or any combination or subset thereof, wherein the technical effects may be achieved by performing at least one of the following: (a) accessing a data structure comprising data representative of a pool of hybrids; (b) determining, by at least one computing device, a predicted score for at least a portion of the hybrids included in the pool of hybrids based on the data included in the data structure, the predicted score indicating a probability of selection of the hybrid based on historical data and/or a probability of success of the hybrid; (c) selecting, by the at least one computing device, a hybrid population from the pool of hybrids based on the predicted score; (d) identifying, by the at least one computing device, a set of hybrids from the hybrid population based on the expected performance of the set of hybrids and/or one or more factors associated with the hybrids and/or the lines comprising the hybrids; and (e) advancing the set of hybrids into further iterations of a stage of the breeding pipeline and/or into a different stage of the breeding pipeline.
Examples and embodiments are provided so that this disclosure will be thorough and will fully convey the scope to those skilled in the art. Numerous specific details are set forth such as examples of specific components, devices, and methods to provide a thorough understanding of embodiments of the present disclosure. It will be apparent to those skilled in the art that specific details need not be employed, that example embodiments may be embodied in many different forms and that example embodiments should not be construed as limiting the scope of the disclosure. In some example embodiments, well-known processes, well-known device structures, and well-known technologies are not described in detail. Additionally, advantages and improvements that may be realized using one or more of the exemplary embodiments disclosed herein may provide all or none of the above advantages and improvements and still fall within the scope of the present disclosure.
The specific values disclosed herein are exemplary in nature and do not limit the scope of the disclosure. Particular values and particular ranges of values for a given parameter disclosed herein do not preclude other values and ranges of values that may be useful in one or more examples disclosed herein. Further, it is contemplated that any two particular values for a particular parameter described herein may define endpoints that are applicable to a range of values for a given parameter (i.e., disclosure of a first value and a second value for a given parameter may be interpreted to disclose that any value between the first value and the second value may also be used for the given parameter). For example, if parameter X is illustrated herein as having a value A and is also illustrated as having a value Z, it is contemplated that parameter X may have a range of values from about A to about Z. Similarly, it is contemplated that two or more ranges of values for a disclosed parameter (whether such ranges are nested, overlapping, or distinct) encompass all possible combinations of ranges of values that may be claimed using endpoints of the disclosed ranges. For example, if parameter X is exemplified herein as having a value in the range of 1-10 or 2-9 or 3-8, it is also contemplated that parameter X may have other ranges of values, including 1-9, 1-8, 1-3, 1-2, 2-10, 2-8, 2-3, 3-10, and 3-9.
The terminology used herein is for the purpose of describing particular example embodiments only and is not intended to be limiting. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. The terms "comprises," "comprising," and "having" are inclusive and therefore specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. The method steps, processes, and operations described herein are not to be construed as necessarily requiring their performance in the particular order discussed or illustrated, unless specifically identified as an order of performance. It should also be understood that additional or alternative steps may be employed.
When a feature is referred to as being "on," "engaged to," "connected to," "coupled to," "associated with," "in communication with," or "contained within" another element or layer, the feature may be directly on, engaged to, connected to, or coupled to the other feature, or associated with or in communication with or included in the other feature, or intervening features may be present. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
Although the terms "first," "second," etc. may be used to describe various features, these features should not be limited by these terms. These terms may be used only to distinguish one feature from another. Terms such as "first," "second," and other numerical terms when used herein do not imply a sequence or order unless clearly indicated by the context. Thus, a first feature discussed herein may be termed a second feature without departing from the teachings of the example embodiments.
The foregoing description of embodiments has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the disclosure. Individual elements or features of a particular embodiment are generally not limited to that particular embodiment, but are interchangeable under appropriate circumstances and can be used in a selected embodiment, even if the embodiment is not specifically shown or described. The same may also be varied in many ways. Such variations are not to be regarded as a departure from the disclosure, and all such modifications are intended to be included within the scope of the disclosure.