Maize molecular breeding method based on genome-wide association analysis and a multi-environment prediction model

Document No.: 831802 | Publication date: 2021-03-30

Abstract: This technology, "A maize molecular breeding method based on genome-wide association analysis and a multi-environment prediction model", was designed by Ma Juan (马娟) and Cao Yanyong (曹言勇) on 2020-12-17. The invention discloses a maize molecular breeding method based on genome-wide association analysis and a multi-environment prediction model. The method comprises: procedures for handling missing phenotype and genotype data; calculation of the phenotypic variance explained (PVE) by significantly associated SNPs; construction of the kernel function, the environmental main-effect model, and the genotype-by-environment interaction model for multi-environment prediction; determination of the optimal multi-environment joint prediction system and the optimal SNP set; and a screening procedure that selects superior materials by combining their breeding values. The invention establishes a method for identifying and screening superior maize materials from multi-environment data. Compared with conventional breeding, selecting materials with high target-trait values across environments on the basis of genotype enables efficient, directed, and precise breeding.

1. A maize molecular breeding method based on genome-wide association analysis and a multi-environment prediction model, characterized by comprising the following steps:

(1) materials and field trial design

Multiple maize germplasm inbred lines are planted in different environments using a randomized block design; well-pollinated ears are selected in each plot, and the kernel number per row and the ear row number of the ears are measured;

(2) statistical analysis of phenotypic data

The best linear unbiased estimates (BLUE values) of kernel number per row and ear row number across the different environments are calculated with the AOV function of QTL IciMapping v4.0; correlations of kernel number per row and of ear row number among environments are computed with the correlation tool of the Excel Data Analysis add-in;

(3) genotyping and analysis

The inbred lines are genotyped by genotyping-by-sequencing (GBS), a reduced-representation sequencing method; reads are aligned to the reference genome with BWA, and population SNPs are called with SAMTOOLS; SNPs with a missing rate below 0.10, a heterozygosity rate below 0.10, and a minor allele frequency above 0.05 are retained as high-quality SNPs for association analysis;

(4) whole genome association analysis

Genome-wide association analysis is performed on kernel number per row and ear row number in each environment and on the BLUE values to identify SNPs significantly associated with the two traits; the association methods used are CMLM, MLMM and FarmCPU, and all 3 methods use the Q + K model;

(5) selection criteria for multi-environment whole genome selection model and multi-environment joint prediction

Because both the genotype and phenotype data contain missing values, the missing values must be imputed before genomic prediction; each missing genotype is filled by random sampling according to the frequencies of the known genotypes; the imputed SNP genotypes are then coded, with the high-frequency homozygous genotype coded as 2, the low-frequency homozygous genotype coded as 0, and the heterozygous genotype coded as 1; imputation and coding are both implemented in R;

missing phenotype values are filled by multiple imputation using the pmm (predictive mean matching) method of the R package mice, with the number of imputed data sets set to 5 and the number of iterations set to 50;

the basic model fitting main effects across environments, i.e. the MM model, is:

$$y = \mu \mathbf{1} + Z_E \beta_E + Z_u u + \varepsilon$$

where $y = (y_1, \ldots, y_j, \ldots, y_s)'$ is the vector of observations and $y_j$ is the vector of observations of the inbred lines in the $j$-th environment; $Z_E$ is the design matrix of the environmental effects $\beta_E$, which are fixed effects; $Z_u$ is the design matrix of the genetic main effects $u$ across environments, with $u \sim N(0, \sigma_u^2 K)$, where $\sigma_u^2$ is the variance of the main effects across environments and $K = XX'/p$ is the GBLUP kernel; $X$ is the marker matrix and $p$ is the number of markers;

the basic model fitting both main effects across environments and genotype-by-environment interaction effects, i.e. the MDs model, is:

$$y = \mu \mathbf{1} + Z_E \beta_E + Z_u u + ue + \varepsilon$$

where $ue$ is a random effect following $ue \sim N(0, [Z_u K Z_u'] \circ [Z_E Z_E'] \sigma_{ue}^2)$, $\circ$ denotes the Hadamard product, and $\sigma_{ue}^2$ is the variance of the genotype-by-environment interaction; the remaining terms are as in the MM model, and this model also uses the GBLUP kernel;

the MM and MDs models are each applied to different combinations of environments for multi-environment joint prediction, and, according to the correlation coefficients of kernel number per row and ear row number among environments, a high-correlation and a low-correlation environment combination are set for the joint predictions;

(6) selection criteria for SNP Density

The detection power of the three association methods is compared and the method detecting the most significantly associated loci is selected; SNPs are sorted in ascending order of the P value of the SNP-trait association calculated from the BLUE values under that method, and the top 500, 1000, 5000, 10000 and 20000 SNPs, all SNPs, and the significantly associated SNPs passing the Bonferroni correction are each used for multi-environment joint prediction;

all multi-environment joint predictions use 5-fold cross-validation repeated 100 times, and the mean over the 100 repetitions of the correlation coefficient between the estimated breeding values and the observed values is used as the measure of prediction accuracy of the multi-environment models;

(7) screening criteria for good materials

The optimal joint prediction system for each environment combination is determined from the prediction accuracy; under the optimal system, the breeding values of each inbred line in the different environment combinations are extracted from the fitted BGGE model with cbind(fit$yHat, y); the rank of each material's breeding value is computed with the RANK.AVG function in Excel, the mean rank of each material is calculated, and the top materials with the smallest mean ranks are selected as important germplasm.

2. The method according to claim 1, wherein the number of different environments in step (1) is 2-8 environments.

3. The method of claim 1, wherein the randomized block design in step (1) is: 3 replicates; each plot consists of 2 rows, with a row spacing of 60 cm and a plant spacing of 25 cm; and 3 well-pollinated ears are selected from each plot.

4. The method according to claim 1, wherein the significance threshold set in step (4) according to the Bonferroni correction is 1.72E-05, and SNPs significantly associated with ear row number and kernel number per row are detected at this threshold using the FarmCPU, CMLM and MLMM methods.

5. The method according to claim 1, wherein in step (4) the Q matrix of the Q + K model is calculated with Structure v2.3.4: the number of subpopulations is set from 1 to 8, the burn-in length is set to 5000, the number of Monte Carlo (MCMC) replications is set to 50000, 3 runs are made for each number of subpopulations, and the Q values at 2 subpopulations, as determined by ΔK, are used for association analysis; the K matrix is calculated with the Centered_IBS method of TASSEL v5.0, and the significance threshold is set to P = 1/58129 = 1.72E-05; the phenotypic variance explained (PVE) for the CMLM method is given by the software; the PVE for the MLMM and FarmCPU methods is calculated with a linear regression model, $Y = \alpha + \beta X + \varepsilon$, where $Y$ is the observed value, $\alpha$ is the intercept, $\beta$ is the slope, $X$ is the marker code (2, 0, 1), and $\varepsilon$ is the random error; the $R^2 = \sum_i (\hat{y}_i - \bar{y})^2 / \sum_i (y_i - \bar{y})^2$ of the regression model is taken as the PVE for MLMM and FarmCPU, where $\hat{y}_i$ is the fitted value of the $i$-th observation and $\bar{y}$ is the mean of the observations.

Technical Field

The invention belongs to the field of plant molecular breeding, and in particular relates to a maize molecular breeding method based on genome-wide association analysis and a multi-environment prediction model.

Background

Genome-wide association analysis is a mapping method based on linkage disequilibrium and an important approach for dissecting the genetic architecture of quantitative traits at the whole-genome level. It does not require purpose-built mapping populations, offers high resolution and high throughput, and helps identify favorable alleles in existing germplasm resources.

Using genome-wide association analysis, researchers have identified many key loci that control important agronomic traits. How to apply these loci in field breeding, however, remains a pressing open problem. Molecular marker-assisted selection can only select for quantitative traits through co-segregating markers with large effects and is ineffective for minor-effect loci. Yet research has shown that most important agronomic traits are complex quantitative traits, mostly controlled by many minor-effect genes. Genetic improvement of these traits by marker-assisted selection has therefore had little success.

Genomic selection builds a model from the genotype and phenotype data of a training population and uses it to predict phenotypes of, and select within, a breeding population that has only genotype data; it is a highly effective method for improving complex agronomic traits. Common models include least squares, best linear unbiased prediction (BLUP), and Bayesian models. At present, most predictions are made for a single trait in a single environment, ignoring the relationships among environments. In practical breeding, however, multi-year, multi-location trials are usually needed to evaluate a material or variety. A single-environment model ignores the effects of different environments and of genotype-by-environment interaction, and so cannot accurately evaluate the performance of a line or variety. A maize molecular breeding method based on genome-wide association analysis and a multi-environment selection model is therefore provided, so that superior materials can be evaluated and screened rapidly and breeding progress is accelerated.

Disclosure of Invention

The invention aims to provide a maize molecular breeding method based on genome-wide association analysis and a multi-environment prediction model.

The purpose of the invention can be achieved by the following technical solution:

A maize molecular breeding method based on genome-wide association analysis and a multi-environment prediction model comprises the following steps:

(1) materials and field trial design

Multiple maize germplasm inbred lines are planted in different environments using a randomized block design; well-pollinated ears are selected in each plot, and the kernel number per row and the ear row number of the ears are measured;

(2) statistical analysis of phenotypic data

The best linear unbiased estimates (BLUE values) of kernel number per row and ear row number across the different environments are calculated with the AOV function of QTL IciMapping v4.0; the BLUE values and the individual environments are all used for genome-wide association analysis and multi-environment joint prediction. Correlations of kernel number per row and of ear row number among environments are computed with the correlation tool of the Excel Data Analysis add-in;

(3) genotyping and analysis

The inbred lines are genotyped by genotyping-by-sequencing (GBS), a reduced-representation sequencing method, with paired-end 150 bp (PE150) sequencing on an Illumina HiSeq platform; reads are aligned to the reference genome (ftp://ftp.ensemblgenomes.org/pub/plants/release-36/fasta/Zea_mays/dna/Zea_mays.AGPv4.dna.toplevel.fa.gz) with BWA, and population SNPs are called with SAMTOOLS; SNPs with a missing rate below 0.10, a heterozygosity rate below 0.10, and a minor allele frequency (MAF) above 0.05 are retained as high-quality SNPs for association analysis;

(4) whole genome association analysis

Genome-wide association analysis is performed on ear row number and kernel number per row in each environment and on the BLUE values to identify SNPs significantly associated with the two traits; the methods used are CMLM (compressed mixed linear model), MLMM (multi-locus mixed model) and FarmCPU (fixed and random model circulating probability unification), and all 3 methods use the Q (population structure) + K (kinship) model;

as a detailed technical scheme, the Q matrix is calculated with Structure v2.3.4: the number of subpopulations is set from 1 to 8, the burn-in length is set to 5000, the number of Monte Carlo (MCMC) replications is set to 50000, 3 runs are made for each number of subpopulations, and the Q values at 2 subpopulations, as determined by ΔK, are used for association analysis; the K matrix is calculated with the Centered_IBS method of TASSEL v5.0, and the significance threshold is set to P = 1/58129 = 1.72E-05; the phenotypic variance explained (PVE) for the CMLM method is given by the software; the PVE for the MLMM and FarmCPU methods is calculated with a linear regression model, $Y = \alpha + \beta X + \varepsilon$, where $Y$ is the phenotype, $\alpha$ is the intercept, $\beta$ is the slope, $X$ is the marker code (2, 0, 1), and $\varepsilon$ is the random error; the $R^2 = \sum_i (\hat{y}_i - \bar{y})^2 / \sum_i (y_i - \bar{y})^2$ of the regression model is taken as the PVE for MLMM and FarmCPU, where $\hat{y}_i$ is the fitted value of the $i$-th observation and $\bar{y}$ is the mean of the observations;

(5) selection criteria for multi-environment whole genome selection model and multi-environment joint prediction

Because both the genotype and phenotype data contain missing values, the missing values must be imputed before genomic prediction; each missing genotype is filled by random sampling according to the frequencies of the known genotypes; the imputed SNP genotypes are then coded, with the high-frequency homozygous genotype coded as 2, the low-frequency homozygous genotype coded as 0, and the heterozygous genotype coded as 1; imputation and coding are both implemented in R;

missing phenotype values are filled by multiple imputation using the pmm (predictive mean matching) method of the R package mice, with the number of imputed data sets set to 5 and the number of iterations set to 50;

the basic model fitting main effects across environments, i.e. the MM model, is:

$$y = \mu \mathbf{1} + Z_E \beta_E + Z_u u + \varepsilon$$

where $y = (y_1, \ldots, y_j, \ldots, y_s)'$ is the vector of observations and $y_j$ is the vector of observations of the inbred lines in the $j$-th environment. $Z_E$ is the design matrix of the environmental effects $\beta_E$, which are fixed effects. $Z_u$ is the design matrix of the genetic main effects $u$ across environments, with $u \sim N(0, \sigma_u^2 K)$, where $\sigma_u^2$ is the variance of the main effects across environments and $K = XX'/p$ is the GBLUP kernel. $X$ is the marker matrix and $p$ is the number of markers.

The basic model fitting both main effects across environments and genotype-by-environment interaction effects, i.e. the MDs model, is:

$$y = \mu \mathbf{1} + Z_E \beta_E + Z_u u + ue + \varepsilon$$

where $ue$ is a random effect following $ue \sim N(0, [Z_u K Z_u'] \circ [Z_E Z_E'] \sigma_{ue}^2)$, $\circ$ denotes the Hadamard product, and $\sigma_{ue}^2$ is the variance of the genotype-by-environment interaction. The remaining terms are as in the MM model, and this model also uses the GBLUP kernel.

The MM and MDs models are each applied to different combinations of environments for multi-environment joint prediction, and, according to the correlation coefficients of kernel number per row and ear row number among environments, a high-correlation and a low-correlation environment combination are set for the joint predictions;

when multi-environment joint prediction is performed, the number of environment combinations is determined by the number of environments selected in step (1). For example, when the inbred lines are planted in four different environments in step (1) and the BLUE values across the 4 environments are calculated, two-environment, three-environment, four-environment and five-environment joint predictions are performed. A high-correlation and a low-correlation environment combination are set for the two-environment, three-environment and four-environment joint predictions.

(6) Selection criteria for SNP Density

The detection power of the three association methods is compared, and the method detecting the most significantly associated loci is determined. SNPs are sorted in ascending order of the P value of the SNP-trait association calculated from the BLUE values under that method, and the top 500, 1000, 5000, 10000 and 20000 SNPs, all SNPs (58129), and the significantly associated SNPs passing the Bonferroni correction (5 SNPs for kernel number per row and 21 SNPs for ear row number) are each used for multi-environment joint prediction.

All multi-environment joint predictions use 5-fold cross-validation repeated 100 times, and the mean over the 100 repetitions of the correlation coefficient between the estimated breeding values and the observed values is used as the measure of prediction accuracy; the models are fitted with the R package BGGE, with 15000 iterations, a burn-in of 5000, and thin set to 1.

(7) Screening criteria for good materials

The optimal joint prediction system for each environment combination is determined from the prediction accuracy. Under the optimal system, the breeding values of each inbred line from the joint prediction of the different environments are extracted from the fitted BGGE model with cbind(fit$yHat, y). The rank of each material's breeding value is computed with the RANK.AVG function in Excel, the mean rank of each material is calculated, and the top materials with the smallest mean ranks are selected as important germplasm.

Preferably, the number of different environments in the step (1) is 2-8 environments.

Preferably, the randomized block design in step (1) is: 3 replicates; each plot consists of 2 rows, with a row spacing of 60 cm and a plant spacing of 25 cm; and 3 well-pollinated ears are selected from each plot.

Preferably, the significance threshold set in step (4) according to the Bonferroni correction is 1.72E-05, and SNPs significantly associated with ear row number and kernel number per row are detected at this threshold using the FarmCPU, CMLM and MLMM methods.

The invention has the beneficial effects that:

Compared with conventional breeding, the method selects materials with high target-trait values on the basis of genotype, enabling efficient, directed and precise breeding.

Drawings

FIG. 1 shows the prediction accuracy of the MDs multi-environment model for kernel number per row at different marker densities.

FIG. 2 shows the prediction accuracy of the MM multi-environment model for kernel number per row at different marker densities.

FIG. 3 shows the prediction accuracy of the MDs multi-environment model for ear row number at different marker densities.

FIG. 4 shows the prediction accuracy of the MM multi-environment model for ear row number at different marker densities.

FIG. 5 shows the ranks, mean ranks and mean breeding values estimated by the optimal multi-environment prediction systems for kernel number per row.

FIG. 6 shows the ranks, mean ranks and mean breeding values estimated by the optimal multi-environment prediction systems for ear row number.

Detailed Description

The present invention will be described in detail with reference to specific examples. From the following description and these examples, one skilled in the art can ascertain the essential characteristics of this invention, and without departing from the spirit and scope thereof, can make various changes and modifications of the invention to adapt it to various usages and conditions.

Example 1

1 materials and methods

1.1 materials and field test design

The association panel comprised 309 accessions, including elite inbred lines from the Huang-Huai-Hai region, domestic core germplasm, and US GEM lines. In 2017 the panel was planted in Yucheng (Shangqiu), Yuanyang (Xinxiang), and Sanya (Hainan); in 2019 it was planted only in Yuanyang. A randomized block design with 3 replicates was used. Each plot consisted of 2 rows, with a row spacing of 60 cm and a plant spacing of 25 cm. In each plot, 3 well-pollinated ears were selected, and their ear row number and kernel number per row were measured.

1.2 statistical analysis of phenotypic data

The best linear unbiased estimates (BLUE values) of ear row number and kernel number per row across 2017 Yucheng, 2017 Sanya, 2017 Yuanyang, and 2019 Yuanyang were calculated with the AOV function of QTL IciMapping v4.0. The BLUE values and the 4 individual environments were all used for genome-wide association analysis and multi-environment prediction. Correlations of kernel number per row and of ear row number among environments were computed with the correlation tool of the Excel Data Analysis add-in.

1.3 genotype identification and analysis

The 309 inbred lines were genotyped by genotyping-by-sequencing (GBS), a reduced-representation sequencing method, with paired-end 150 bp (PE150) sequencing on an Illumina HiSeq platform. Reads were aligned to the reference genome (ftp://ftp.ensemblgenomes.org/pub/plants/release-36/fasta/Zea_mays/dna/Zea_mays.AGPv4.dna.toplevel.fa.gz) with BWA, and population SNPs were called with SAMTOOLS. Using a missing rate below 0.10, a heterozygosity rate below 0.10, and a minor allele frequency (MAF) above 0.05 as screening criteria, a total of 58129 high-quality SNPs were obtained for association analysis.
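The three screening criteria can be illustrated with a minimal Python sketch. It assumes biallelic SNPs and a hypothetical input format (two-character genotype strings, `None` for a missing call); the function name `snp_passes_filters` is illustrative and is not part of BWA, SAMTOOLS, or any tool named above. Computing heterozygosity and allele frequency over called genotypes only is also an assumption.

```python
def snp_passes_filters(genotypes, max_missing=0.10, max_het=0.10, min_maf=0.05):
    """Return True if one biallelic SNP meets the three screening criteria.

    genotypes: one entry per inbred line, e.g. "AA", "AG", or None when
    the call is missing (hypothetical format chosen for this sketch).
    """
    n = len(genotypes)
    called = [g for g in genotypes if g is not None]
    if not called:
        return False
    missing_rate = (n - len(called)) / n
    # Heterozygosity is computed among called genotypes (an assumption).
    het_rate = sum(1 for g in called if g[0] != g[1]) / len(called)
    # Count alleles among called genotypes to get the minor allele frequency.
    counts = {}
    for g in called:
        for allele in g:
            counts[allele] = counts.get(allele, 0) + 1
    total = sum(counts.values())
    maf = min(counts.values()) / total if len(counts) > 1 else 0.0
    return missing_rate < max_missing and het_rate < max_het and maf > min_maf
```

A marker is kept only when all three conditions hold, so a monomorphic marker (MAF of 0) is always dropped.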

1.4 Whole genome Association analysis

Genome-wide association analysis was performed on ear row number and kernel number per row in the 4 environments (2017 Yucheng, 2017 Yuanyang, 2017 Sanya, and 2019 Yuanyang) and on the BLUE values to identify SNPs significantly associated with the two traits. The methods used were CMLM (compressed mixed linear model), MLMM (multi-locus mixed model) and FarmCPU (fixed and random model circulating probability unification). All 3 methods used the Q (population structure) + K (kinship) model.

The Q matrix was calculated with Structure v2.3.4. The number of subpopulations was set from 1 to 8, the burn-in length to 5000, and the number of Monte Carlo (MCMC) replications to 50000, with 3 runs for each number of subpopulations. Based on ΔK, the Q values at 2 subpopulations were used for association analysis. The K matrix was calculated with the Centered_IBS method of TASSEL v5.0. Because CMLM is a single-locus method, multiple-testing correction is required to determine the significance threshold, whereas FarmCPU and MLMM are multi-locus methods and do not require it. A moderately stringent Bonferroni correction was therefore chosen, with the significance threshold set to P = 1/58129 = 1.72E-05.
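The threshold arithmetic can be checked with a short Python sketch. The helper name `bonferroni_threshold` and the example P values are hypothetical and serve only to illustrate the 1/n rule stated above; they are not results from this study.

```python
def bonferroni_threshold(n_tests, alpha=1.0):
    """Per-test threshold alpha / n_tests; alpha=1.0 gives the 1/n rule used here."""
    return alpha / n_tests

threshold = bonferroni_threshold(58129)
print(f"{threshold:.2E}")  # 1.72E-05

# Hypothetical P values, for illustration only.
example_pvalues = {"snp_1": 3.0e-06, "snp_2": 4.1e-04, "snp_3": 1.5e-05}
significant = sorted(name for name, p in example_pvalues.items() if p < threshold)
```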

The phenotypic variance explained (PVE) for the CMLM method is given by the software. The PVE for the MLMM and FarmCPU methods is calculated with a linear regression model, $Y = \alpha + \beta X + \varepsilon$, where $Y$ is the phenotype, $\alpha$ is the intercept, $\beta$ is the slope, $X$ is the marker code (the high-frequency homozygous genotype is coded as 2, the low-frequency homozygous genotype as 0, and the heterozygous genotype as 1), and $\varepsilon$ is the random error. The $R^2 = \sum_i (\hat{y}_i - \bar{y})^2 / \sum_i (y_i - \bar{y})^2$ of the linear regression model is taken as the PVE for MLMM and FarmCPU, where $\hat{y}_i$ is the fitted value of the $i$-th observation and $\bar{y}$ is the mean of the observations.
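As a rough illustration of the single-marker regression, the following Python sketch fits $Y = \alpha + \beta X + \varepsilon$ by ordinary least squares and returns the regression $R^2$ as the PVE. Taking $R^2$ as the regression sum of squares over the total sum of squares is an assumption based on the symbol definitions in the text, and the function name `pve_single_marker` is hypothetical.

```python
def pve_single_marker(x, y):
    """R^2 of the least-squares regression of phenotype y on marker code x.

    x: marker codes (2, 0, 1) per line; y: phenotype per line.
    Assumes the marker is polymorphic and the phenotype is not constant.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = sxy / sxx                  # slope
    alpha = y_bar - beta * x_bar      # intercept
    y_hat = [alpha + beta * xi for xi in x]
    ss_reg = sum((yh - y_bar) ** 2 for yh in y_hat)
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return ss_reg / ss_tot
```

Multiplying the returned value by 100 gives the PVE as a percentage.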

1.5 selection criteria for Multi-environmental Whole genome selection model and Multi-environmental Joint prediction

Because both the genotype and phenotype data contain missing values, the missing values must be imputed before genomic prediction. Each missing genotype was filled by random sampling according to the frequencies of the known genotypes. The imputed SNP genotypes were then coded, with the high-frequency homozygous genotype coded as 2, the low-frequency homozygous genotype coded as 0, and the heterozygous genotype coded as 1. Imputation and coding were both implemented in R.

Missing phenotype values were filled by multiple imputation using the pmm (predictive mean matching) method of the R package mice, with the number of imputed data sets set to 5 and the number of iterations set to 50.
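The genotype filling and 2/0/1 coding can be sketched as follows. The text states that the original implementation is in R; this Python version is only an illustrative stand-in. It assumes sampling is done per marker, that the high-frequency homozygote is determined after filling, and that genotypes are two-character strings with `None` for missing calls; the function name `impute_and_code` is hypothetical.

```python
import random
from collections import Counter

def impute_and_code(genotypes, rng):
    """Fill missing genotypes of one SNP by frequency-weighted sampling, then code them.

    genotypes: e.g. ["AA", "AG", None, "GG"]; rng: a random.Random instance.
    Returns (filled genotypes, numeric codes).
    """
    known = [g for g in genotypes if g is not None]
    freq = Counter(known)
    classes = list(freq)
    weights = [freq[c] for c in classes]
    # Draw each missing call in proportion to the observed genotype frequencies.
    filled = [g if g is not None else rng.choices(classes, weights=weights)[0]
              for g in genotypes]
    # The more frequent homozygote is coded 2, the other 0, heterozygotes 1.
    homozygotes = Counter(g for g in filled if g[0] == g[1])
    major = homozygotes.most_common(1)[0][0] if homozygotes else None
    codes = [1 if g[0] != g[1] else (2 if g == major else 0) for g in filled]
    return filled, codes
```

Passing a seeded `random.Random` makes the filling reproducible between runs.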

The detection power of the three association methods was compared, and the method detecting the most significantly associated loci was selected. SNPs were sorted in ascending order of the P value of the SNP-trait association calculated from the BLUE values under that method, and the top 500, 1000, 5000, 10000 and 20000 SNPs, all SNPs (58129), and the significantly associated SNPs passing the Bonferroni correction (5 SNPs for kernel number per row and 21 SNPs for ear row number) were each used for two-environment, three-environment, four-environment and five-environment joint prediction. The models used were the model fitting main effects across environments (MM) and the model fitting main effects across environments plus genotype-by-environment interaction effects (MDs), as follows:

MM model: $y = \mu \mathbf{1} + Z_E \beta_E + Z_u u + \varepsilon$

where $y = (y_1, \ldots, y_j, \ldots, y_s)'$ is the vector of observations and $y_j$ is the vector of observations of the inbred lines in the $j$-th environment. $Z_E$ is the design matrix of the environmental effects $\beta_E$, which are fixed effects. $Z_u$ is the design matrix of the genetic main effects $u$ across environments, with $u \sim N(0, \sigma_u^2 K)$, where $\sigma_u^2$ is the variance of the main effects across environments and $K = XX'/p$ is the GBLUP kernel. $X$ is the marker matrix and $p$ is the number of markers.

MDs model: $y = \mu \mathbf{1} + Z_E \beta_E + Z_u u + ue + \varepsilon$

where $ue$ is a random effect following $ue \sim N(0, [Z_u K Z_u'] \circ [Z_E Z_E'] \sigma_{ue}^2)$, $\circ$ denotes the Hadamard product, and $\sigma_{ue}^2$ is the variance of the genotype-by-environment interaction. The remaining terms are as in the MM model, and this model also uses the GBLUP kernel.
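The covariance structures of the two models can be illustrated numerically. The sketch below is plain Python and assumes the standard GBLUP kernel $K = XX'/p$ on an uncentered marker matrix, observations ordered environment by environment, and every line observed in every environment. All helper names (`gblup_kernel`, `design_matrices`, `hadamard`) are hypothetical and are not functions of the BGGE package.

```python
def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(r) for r in zip(*a)]

def gblup_kernel(X):
    """K = X X' / p for a lines-by-markers matrix X (no centering in this sketch)."""
    p = len(X[0])
    return [[v / p for v in row] for row in matmul(X, transpose(X))]

def hadamard(a, b):
    """Element-wise (Hadamard) product of two equally sized matrices."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def design_matrices(n_lines, n_envs):
    """Incidence matrices Z_u (lines) and Z_E (environments), env-by-env ordering."""
    Zu, Ze = [], []
    for e in range(n_envs):
        for i in range(n_lines):
            Zu.append([1.0 if k == i else 0.0 for k in range(n_lines)])
            Ze.append([1.0 if k == e else 0.0 for k in range(n_envs)])
    return Zu, Ze

# Toy data: 2 lines, 3 markers coded 2/0/1, 2 environments.
X = [[2, 0, 1], [0, 0, 2]]
K = gblup_kernel(X)
Zu, Ze = design_matrices(2, 2)
K_main = matmul(matmul(Zu, K), transpose(Zu))          # covariance of Z_u u (MM and MDs)
K_ge = hadamard(K_main, matmul(Ze, transpose(Ze)))     # covariance of ue (MDs only)
```

With this ordering, `K_main` repeats `K` in every block, while `K_ge` keeps `K` only on the within-environment blocks and is zero between environments, which is how the interaction term lets the marker effects differ by environment.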

The MM and MDs models were each used for two-environment, three-environment, four-environment and five-environment joint prediction. According to the correlation coefficients of ear row number and kernel number per row among environments, a high-correlation and a low-correlation environment combination were set for the two-environment, three-environment and four-environment joint predictions.

1.6 selection criteria for SNP Density

The detection power of the three association methods was compared, and the method detecting the most significantly associated loci was determined. SNPs were sorted in ascending order of the P value of the SNP-trait association calculated from the BLUE values under that method, and the top 500, 1000, 5000, 10000 and 20000 SNPs, all SNPs (58129), and the significantly associated SNPs passing the Bonferroni correction (5 SNPs for kernel number per row and 21 SNPs for ear row number) were each used for multi-environment joint prediction.
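The construction of the marker subsets can be sketched in Python as follows. The function name `snp_subsets` and the dictionary input format are assumptions made for illustration; the P values would come from the chosen association method.

```python
def snp_subsets(pvalues, sizes=(500, 1000, 5000, 10000, 20000), threshold=None):
    """Build marker sets of increasing size from SNPs ranked by ascending P value.

    pvalues: {snp_name: P value}; threshold: optional significance cut-off.
    """
    ranked = sorted(pvalues, key=pvalues.get)
    subsets = {f"top{n}": ranked[:n] for n in sizes if n <= len(ranked)}
    subsets["all"] = ranked
    if threshold is not None:
        # SNPs passing the multiple-testing threshold.
        subsets["significant"] = [s for s in ranked if pvalues[s] < threshold]
    return subsets
```

Each returned list can then be used to subset the columns of the marker matrix before the kernel is computed.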

The 309 materials were divided into training and prediction sets by 5-fold cross-validation, and the procedure was repeated 100 times. The mean of the 100 correlation coefficients between the estimated breeding values and the observed values in the prediction sets was used as the measure of prediction accuracy. The models were fitted with the R package BGGE, with 15000 iterations, a burn-in of 5000, and thin set to 1.
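The evaluation loop can be sketched generically in Python. The `fit_predict` callable is a hypothetical stand-in for fitting an MM or MDs model on the training lines and predicting the held-out lines; it does not reproduce BGGE. Pooling the held-out predictions of the 5 folds into one correlation per repetition is an assumption, since the text only states that 100 correlations are averaged.

```python
import random

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def repeated_kfold_accuracy(y, fit_predict, k=5, repeats=100, seed=1):
    """Mean correlation between held-out predictions and observations.

    fit_predict(train_idx, test_idx) must return predictions for test_idx.
    """
    rng = random.Random(seed)
    n = len(y)
    accuracies = []
    for _ in range(repeats):
        order = list(range(n))
        rng.shuffle(order)                    # new random partition each repetition
        predicted = [None] * n
        for fold in range(k):
            test_idx = order[fold::k]
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            for i, value in zip(test_idx, fit_predict(train_idx, test_idx)):
                predicted[i] = value
        accuracies.append(pearson(predicted, y))
    return sum(accuracies) / len(accuracies)
```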

1.7 Criteria for comprehensive screening of superior materials by multi-environment joint prediction

The optimal joint prediction system for each environment combination was determined from the prediction accuracy. Under the optimal system, the breeding values of each inbred line from the joint prediction of the different environments were extracted from the fitted BGGE model with cbind(fit$yHat, y), where fit is the fitted MM or MDs model, yHat is the estimated breeding value, and y is the observed value. The rank of each material's breeding value was computed with the RANK.AVG function in Excel, the mean rank of each material was calculated, and the top materials with the smallest mean ranks were selected as important germplasm.
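The rank-averaging step can be sketched in Python. The sketch assumes that the largest breeding value receives rank 1 and that ties share the average rank, which is how Excel's RANK.AVG behaves in its default descending order; the function names and the nested-dictionary input are hypothetical.

```python
def rank_avg(values):
    """Descending ranks with ties averaged (largest value gets rank 1)."""
    ranks = []
    for v in values:
        higher = sum(1 for w in values if w > v)
        ties = sum(1 for w in values if w == v)
        ranks.append(higher + (ties + 1) / 2)
    return ranks

def select_by_mean_rank(breeding_values, n_select):
    """breeding_values: {prediction_system: {line_name: breeding value}}.

    Returns the n_select lines with the smallest mean rank, plus all mean ranks.
    """
    lines = sorted(next(iter(breeding_values.values())))
    mean_rank = {line: 0.0 for line in lines}
    for system_values in breeding_values.values():
        ranks = rank_avg([system_values[line] for line in lines])
        for line, r in zip(lines, ranks):
            mean_rank[line] += r / len(breeding_values)
    selected = sorted(lines, key=lambda line: mean_rank[line])[:n_select]
    return selected, mean_rank
```

Averaging ranks rather than raw breeding values keeps prediction systems with different scales from dominating the selection.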

2 results and analysis

2.1 Correlation of kernel number per row and ear row number among environments

The BLUE value is the best estimate of the mean across the 4 original environments, and its correlation with each of the 4 environments was high (Tables 1 and 2). For kernel number per row, BLUE was most highly correlated with 2017 Yucheng (r = 0.79), followed by 2017 Sanya (r = 0.70) and 2017 Yuanyang (r = 0.69), and least correlated with 2019 Yuanyang (r = 0.63). For ear row number, BLUE was most highly correlated with 2019 Yuanyang (r = 0.73) and least correlated with 2017 Yucheng (r = 0.61). For both traits, correlations among the 4 original environments were low (r = 0.19-0.46).

2.2 Whole genome Association analysis results

According to the Bonferroni correction, the significance threshold was set at 1.72E-05. At this threshold, a total of 5 SNPs significantly associated with kernel number per row (P < 1.72E-05) were detected using FarmCPU, CMLM and MLMM (Table 3). Of these, S1_173095105, S5_127421583, S2_35077012, and S2_35076923 were detected by all 3 models; these 4 SNPs explained 5.3%-9.0% of the phenotypic variation in kernel number per row. For ear row number, a total of 21 significant SNPs were detected (P < 1.72E-05), 18 by FarmCPU and 3 by MLMM (Table 4). Among them, S8_71716395, S9_10867079 and S9_107695183 explained relatively high proportions of the phenotypic variation in ear row number: 9.18%, 8.65% and 9.20%, respectively.

2.3 Optimal prediction system for two-environment joint prediction

Because the FarmCPU model detected the largest number of significant sites, the SNPs were ranked in ascending order of the P value of their association with each trait, calculated from the BLUE values under this model. The top 500, 1000, 5000, 10000 and 20000 SNPs, all SNPs (58129), and the significant SNPs selected by Bonferroni correction (5 for row grain number and 21 for ear row number) were then used for two-environment joint prediction.
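The marker-density selection described above can be sketched as follows. It is a minimal sketch under stated assumptions: the function name and the example SNP names and P values are hypothetical, and only the subset sizes are taken from the text.

```python
# Subset sizes named in the text (the Bonferroni-selected sets and the
# full marker set are handled separately).
MARKER_DENSITIES = (500, 1000, 5000, 10000, 20000)


def top_k_snp_subsets(p_values, sizes):
    """Rank SNPs by ascending P value and return the top-k names for each k.

    p_values maps SNP name -> association P value. Sizes larger than the
    number of available SNPs are skipped.
    """
    ranked = sorted(p_values, key=p_values.get)
    return {k: ranked[:k] for k in sizes if k <= len(ranked)}


# Hypothetical P values for six markers, with small subset sizes so the
# example stays readable.
example_p = {
    "snp1": 0.2,
    "snp2": 1e-06,
    "snp3": 0.03,
    "snp4": 5e-04,
    "snp5": 0.7,
    "snp6": 1e-08,
}
subsets = top_k_snp_subsets(example_p, (2, 4))
print(subsets)
```

With real data, each returned list of SNP names would be used to subset the genotype matrix before the prediction model is fitted at that marker density.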

According to the correlation analysis for row grain number (Table 1), BLUE has the highest correlation with 2017 Yucheng (0.79), and 2017 Yucheng and 2017 Yuanyang are the two original environments with the highest correlation with each other (0.46). Therefore, for row grain number, 2017 Yucheng + 2017 Yuanyang (low-correlation environments) and 2017 Yucheng + BLUE (high-correlation environments) were selected for two-environment joint prediction. Similarly, the two-environment combinations selected for ear row number were 2017 Yuanyang + 2019 Yuanyang (low-correlation environments) and 2019 Yuanyang + BLUE (high-correlation environments).

Except for the 21 SNPs for ear row number, the prediction accuracy of the MM model was higher than that of the MDs model for both ear row number and row grain number (FIGS. 1-4). For both traits, prediction accuracy was higher for the two high-correlation environments than for the two low-correlation environments (FIGS. 1-4). Prediction accuracy with the 500-20000 most significant SNPs was higher than with all SNPs. Compared with the other marker densities, the 5 most significant SNPs for row grain number and the 21 most significant SNPs for ear row number gave the lowest prediction accuracy, 0.14-10.19 and 0.34-0.40, respectively (FIGS. 1-2). For row grain number, the MM model of the 2017 Yucheng + BLUE environments with the 1000 most significant SNPs had the highest prediction accuracy (0.58) (FIG. 2), and this is the optimal two-environment prediction system for row grain number. For ear row number, the MM model of the 2019 Yuanyang + BLUE environments with the 10000 most significant SNPs had the highest prediction accuracy (0.62), and this is the optimal two-environment prediction system for ear row number (FIG. 4).

2.3 Optimal prediction system for three-, four- and five-environment joint prediction

The environments and marker densities were selected by the same criteria as in the two-environment joint prediction. As in the two-environment case, the MM models in the three-, four- and five-environment predictions were all more accurate than the MDs models, except for the 5 and 21 most significant SNPs (FIGS. 1-4). Moreover, in the three- and four-environment joint predictions, accuracy for the high-correlation environments was higher than for the low-correlation environments, as in the two-environment joint prediction. In the three-environment prediction, the MM model of the 2017 Yuanyang + 2017 Yucheng + BLUE environments with the 5000 most significant SNPs predicted row grain number best, with an accuracy of 0.60 (FIG. 2). For ear row number, joint prediction using the 2017 Yuanyang, 2017 Yucheng and BLUE environments was best with 500 SNPs in the MM model, with an accuracy of 0.58 (FIG. 4).

In the four-environment joint prediction, both row grain number and ear row number were predicted best by the MM model of the 2017 Yuanyang + 2017 Yucheng + 2019 Yuanyang + BLUE environments (FIGS. 2 and 4). The optimal marker density for row grain number was 5000 significant SNPs, with an accuracy of 0.55 (FIG. 2), whereas the optimal marker density for ear row number was 500 significant SNPs, with an accuracy of 0.54 (FIG. 4).

In the five-environment joint prediction, the optimal SNP densities of the MM model for row grain number and ear row number were 5000 and 500, respectively, with prediction accuracies of 0.55 and 0.49, respectively (FIGS. 2 and 4).

2.4 Materials with higher row grain number and ear row number selected by the optimal multi-environment prediction systems

According to the optimal systems of joint prediction across the different numbers of environments, the breeding values of the 309 inbred lines under the optimal two-, three-, four- and five-environment prediction systems were each extracted with cbind(fit$yHat, y) from the BGGE fit. The rank of each material's breeding value was calculated with RANK.AVG in Excel, the rank average of each material was then calculated, and the 20 materials with the smallest rank averages were selected for display. The ranks, rank averages and mean breeding values of the top 20 inbred lines for row grain number and ear row number under the different multi-environment predictions are shown in FIGS. 5 and 6. Based on the rank average of breeding values, the top 5 materials selected for row grain number were L10, L8, L20, L9 and L248, with row grain number between 24.09 and 24.63 (FIG. 5). The top 5 materials selected for ear row number were L85, L18, L101, L121 and L96, with ear row number between 14.03 and 15.57 (FIG. 6). These materials can be used as important germplasm for further breeding research.

TABLE 1 Correlation coefficients of row grain number between different environments

TABLE 2 Correlation coefficients of ear row number between different environments

TABLE 3 SNPs significantly associated with row grain number detected by CMLM, MLMM and FarmCPU in different environments

TABLE 4 SNPs significantly associated with ear row number detected by MLMM and FarmCPU in different environments
