Research Article

 

Yield response of winter wheat cultivars to environments modeled by different variance-covariance structures in linear mixed models

 

Marcin Studnicki

Warsaw University of Life Sciences, Department of Experimental Design and Bioinformatics. Nowoursynowska 159. 02-776 Warsaw. Poland.

Wiesław Mądry

Warsaw University of Life Sciences, Department of Experimental Design and Bioinformatics. Nowoursynowska 159. 02-776 Warsaw. Poland.

Kinga Noras

Warsaw University of Life Sciences, Department of Experimental Design and Bioinformatics. Nowoursynowska 159. 02-776 Warsaw. Poland.

Elżbieta Wójcik-Gront

Warsaw University of Life Sciences, Department of Experimental Design and Bioinformatics. Nowoursynowska 159. 02-776 Warsaw. Poland.

Edward Gacek

Research Center for Cultivar Testing (COBORU). 63-022 Słupia Wielka, Poland

 

Abstract

The main objectives of multi-environmental trials (METs) are to assess cultivar adaptation patterns under different environmental conditions and to investigate genotype by environment (G×E) interactions. Linear mixed models (LMMs) with more complex variance-covariance structures have become recognized and widely used for analyzing METs data. Best practice in METs analysis is to carry out a comparison of competing models with different variance-covariance structures. Improperly chosen variance-covariance structures may lead to biased estimation of means resulting in incorrect conclusions. In this work we focused on adaptive response of cultivars on the environments modeled by the LMMs with different variance-covariance structures. We identified possible limitations of inference when using an inadequate variance-covariance structure. In the presented study we used the dataset on grain yield for 63 winter wheat cultivars, evaluated across 18 locations, during three growing seasons (2008/2009−2010/2011) from the Polish Post-registration Variety Testing System. For the evaluation of variance-covariance structures and the description of cultivars adaptation to environments, we calculated adjusted means for the combination of cultivar and location in models with different variance-covariance structures. We concluded that in order to fully describe cultivars adaptive patterns modelers should use the unrestricted variance-covariance structure. The restricted compound symmetry structure may interfere with proper interpretation of cultivars adaptive patterns. We found, that the factor-analytic structure is also a good tool to describe cultivars reaction on environments, and it can be successfully used in METs data after determining the optimal component number for each dataset.

Additional key words: adaptability patterns; factor analytic; multi-environmental trials; unbalanced dataset; winter wheat.

Abbreviations used: AIC (Bayesian information criterion); AMMI (additive main effects and multiplicative interaction); BIC (Akaike’s information criterion); BLUE (best linear unbiased estimation); BLUP (best linear unbiased prediction); CS (compound symmetry matrix); FA (factor-analytic matrix); G (cultivar); G×E (genotype – environment interactions); GGE (genotype main effects and G×E interaction effects); L (location); LMM (linear mixed model); LS (least squares means); MET (multi-environmental trial); PVTS (Polish Post-Registration Variety Testing System); UN (unstructured matrix); Y (year).

Authors’ contributions: Conception and design of the study: MS, WM and KN. Data collection: KN and EG. Analysis and interpretation of data: MS. The paper written by: MS and EWG.

Citation: Studnicki, M.; Mądry, W.; Noras, K.; Wójcik-Gront, E.; Gacek, E. (2016). Yield response of winter wheat cultivars to environments modeled by different variance-covariance structures in linear mixed models. Spanish Journal of Agricultural Research, Volume 14, Issue 2, e0703http://dx.doi.org/10.5424/sjar/2016142-8737.

Received: 04 Oct 2015. Accepted: 27 Apr 2016

Copyright © 2016 INIA. This is an open access article distributed under the Creative Commons Attribution License (CC by 3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Funding: The authors received no specific funding for this work.

Competing interests: The authors have declared that no competing interests exist.

Correspondence should be addressed to Marcin Studnicki: marcin_studnicki@sggw.pl


 

CONTENTS

Abstract

Introduction

Material and methods

Results

Discussion

References

IntroductionTop

The main objectives of multi-environmental trials (METs) are assessing cultivar adaptation patterns under different environmental conditions (locations) and investigate genotype – environment (G×E) interactions. The common approach used to analyze multi-environmental trials is the classical analysis of variance (ANOVA). However, the classical ANOVA models are restricted by assumptions regarding variances of experimental effects and experimental error (Crossa et al., 2010; Hu et al., 2013). ANOVA models assume homogeneity of variances and do not include covariances for random effects, which is often unrealistic in case of multi-environmental trials (So & Edwards, 2009; Hu & Spilke, 2011).

Lately, linear mixed models (LMMs) have become recognized and widely used for analyzing METs data (Smith et al., 2005; Yang, 2010). These models are especially useful in the analysis of incomplete datasets. In LMMs, an asset is a chance to use more complex variance-covariance structures for describing the G×E interaction effects. The most realistic one is unstructured (UN) variance-covariance matrix (Meyer, 2009). This variance-covariance matrix is the most liberal. Each parameter (variance or covariance) in the matrix is different and is estimated uniquely from the data (Littell et al., 2006; Gilmour et al., 2009; Hu & Spilke, 2011). The disadvantage of the UN structure in comparison to other structures is the large number of parameters to estimate. This can contribute to complexity of numerical calculations. On the other end of the spectrum, we have the compound symmetry (CS) structure, which is the most restrictive variance-covariance matrix for LMM. This structure assumes equal variances and equal covariances.

Recently, researchers more and more often suggest using factor-analytic (FA) structure for the METs data (Piepho, 1997, 1998; Kelly et al., 2007; Meyer, 2009; Stefanova & Buirchell, 2010; Burgueño et al., 2011). That variance-covariance matrix also offers the modeler a flexibility comparable to the UN structure but with a lower number of parameters to estimate. The FA model can be regarded as the mixed model equivalent of the additive main effects and multiplicative interaction (AMMI) model with similar fixed-effects as genotype main effects and G×E interaction effects (GGE). In contrast to AMMI model, which is based on principal components analysis via singular value decomposition, the factor-analytic model is based on factor analysis with a Cholesky factorization (Smith et al., 2001; Burgueño et al., 2011). The factor-analytic model depends on the decomposition of an unstructured variance-covariance matrix.

In case of the analysis of the METs data with LMMs, it is always recommended to carry out a comparison of the models with different variance-covariance structures first (So & Edwards, 2009; Burgueño et al., 2011). The second step is to use the selected model for proper analysis and evaluation of the cultivars. Comparison of the models involves assessment of the goodness of fit and the prediction accuracy. Several studies have evaluated the models with different variance-covariance structures (including FA). Piepho (1998) used and evaluated LMMs with FA structure starting the analysis with one component and ending with seven. He recommended using the two-component FA structure. The model was later used by Kelly et al. (2007) and Burgueño et al. (2011). Hu & Spilke (2011) compared LMMs for five variance-covariance structures, including the FA structure with one component, and concluded that the FA variance-covariance structure is more accurate than the classical ANOVA and numerically less complicated than the UN structure.

The LMMs analysis for METs data are mainly used to assess the yield adaptability of cultivars, through the estimation of the adjusted means conducted as linear combinations of appropriate BLUP (best linear unbiased prediction) and BLUE (best linear unbiased estimation) effects. Adjusted means of yield for genotype and environment combinations obtained from LMM are then used to analyze the G×E interaction effects by AMMI model, GGE plot or pattern analysis. Improperly chosen variance-covariance structures may lead to the biased estimation of adjusted means. Based on the incorrectly estimated adjusted means, G×E interaction analysis may result in incorrect conclusions. So far, other studies were focused on measures of the goodness of fit or the accuracy of prediction for LMMs with different variance-covariance structures. However, it may also be important to assess the G×E interactions in models with different variance-covariance structures. Therefore, in this work we focused on adaptive response of cultivars on the environments modeled by the LMMs with different variance-covariance structures. We also tried to identify possible limitations of inference when using an inadequate variance-covariance structure.

Material and methodsTop

Multi environmental trial dataset

The Polish Post-Registration Variety Testing System (PVTS) (Mądry et al., 2011) was established to evaluate the yield and other related traits of newly released cultivars through METs. One of the most common crop tested through METs is winter wheat (Triticum aestivum L.). The tests facilitate reliable recommendation to Polish farmers on cultivars which appear to be the most adapted to varied agricultural environments. Unfortunately, in most cases the trials lead to an unbalanced (incomplete) matrix of the number of cultivars, trial locations and years. The majority of cultivars had been tested for 2-3 years. Consequently, the raw plot data collected for grain yield (or other traits) were arranged according to cultivar (G), location (L), and year (Y). The trials were led as a split-block design. In the present study we used a dataset on grain yield for 63 winter wheat cultivars, evaluated across 18 locations, during 3 growing seasons from 2008/2009 to 2010/2011. In the raw data three-way table, 2327 combinations (cells) were filled, that represent 68.4 % of the balanced classification.

Linear mixed models for METs

We analysed raw data for grain yield using a two-stage approach (Smith et al., 2005; Möhring & Piepho, 2009; Welham et al., 2010; Piepho et al., 2012). In the first stage, raw data were analysed for each trial (each location-year combination) separately using the ANOVA mixed-model for a split-block design. In the analysis, we assumed cultivars to be fixed effects and blocks to be random (Spilke et al., 2005). The least squares (LS) means for cultivars were estimated. Then, the LS means were combined across trials and years to obtain an unbalanced G×L×Y data table. In the second stage, we performed the combined analysis of the means in the unbalanced G×L×Y table based on the linear mixed model:

where Xijk is the LS mean of yield for the 3-factorial combination of the k-th year, the i-th (i=1,2, …, I) location, the j-th cultivar; m is the general mean; Li is the fixed main effect of the i-th location; G(L)ij is the random effect of the j-th cultivar within the i-th location; Yk is the random main effect of the k-th year; YLik is the random interaction effect of the k-th year and the i-th location; YG(L) ijk is the random interaction effect of the j-th cultivar within the i-th location and the k-th year; and is the random residual effect of the mean experimental error.

We used models with homogeneous and heterogeneous variance-covariance structures for cultivar within environment G(L) effects. The random effects of G(L)ij were modelled with CS, UN and FA structures. The FA structure for the presented linear mixed model was fitted with different number of components, from one component, FA1, to seventeen components, FA17. The linear mixed models with different variance-covariance structures were fitted using ASReml-R (Gilmour et al., 2009).

The popular method used to choose the best variance-covariance structure is based on information criteria, such as Akaike’s information criterion (AIC) or Bayesian information criterion (BIC) (So & Edwards, 2009; Hu & Spilke, 2011). It depends on evaluating the predictive accuracy of the effects (Piepho, 1998; Kelly et al., 2007). In our study, the classical evaluation of the variance-covariance structures was based on the AIC. This is a commonly used criterion to choose the best variance-covariance structure in linear mixed model for METs. The AIC was obtained using the following formula:

where LogL is the logarithm of maximum restricted likelihood of the model; and p is the number of estimated variance-covariance parameters in the model. The smaller AIC values, the better fit of the variance-covariance structure. The expression “–2 × LogL” is called “deviance” and is also used in calculating other information criteria (e.g. BIC) and likelihood ratio tests.

In the cultivar selection and recommendation process, the cultivar ranking (Roostaei et al., 2014) and its compatibility across locations (Hu, 2015), are very important. This is why in this research we analyzed the Spearman rank correlation. We calculated the correlation coefficients for cultivars effects G and cultivars within the locations G(L) effects estimators from the models with difference variance-covariance structures. Inconsistency of cultivars ranking between the models informs us which model leads to an inaccurate evaluation of cultivars adaptability patterns.

For the description of the cultivars adaptation to the agricultural environments, we calculated adjusted means for cultivar and location G(L) combination in all compared models with different variance-covariance structures. The adjusted means were calculated as combinations of appropriate BLUPs or/and BLUEs effects, using an algorithm described by Welham et al. (2004). The modeled adjusted means for all considered cultivars across 18 locations using different variance-covariance structures are presented in Figs. 1-2. Additionally, in the results we indicated two winter wheat cultivars characterized by wide environmental adaptation ('Mulan' and 'Smaragd'), bred in Germany.

Figure 1. The winter wheat cultivars adaptive yield response patterns across eighteen environments modeled with compound symmetry (a) and unstructured (b) variance-covariance structure. Environmental means, dashed line; 'Mulan' cultivar, blue line; 'Smaragd' cultivar, red line.

Figure 2. The winter wheat cultivars adaptive yield response patterns across eighteen environments modeled with FA variance-covariance structure: a) FA1, b) FA2, c) FA3, d) FA4. Environmental means, dashed line; 'Mulan' cultivar, blue line; 'Smaragd' cultivar, red line.

ResultsTop

The lowest AIC value was observed for the model with FA2 variance-covariance structure for G(L) effects (Table 1). The FA structure with three components was the second best. The next lowest levels of AIC were observed in FA structures with one, four and five components. In contrast, the highest value of AIC was found for an UN variance-covariance matrix, indicating interior fit. The model with this variance-covariance structure has the lowest value of the logarithm of the restricted likelihood ratio LogL. For the model with compound symmetry structure we obtained the highest value of LogL. The LogL value is not in constant decrease with increasing number of components for models with FA variance-covariance structure.


Table 1. Akaike information criterion and restricted log-likelihood values for models with different variance-covariance structures fitted to winter wheat dataset.


We observed significant and positive correlation for cultivar effects between all variance-covariance structures (Table 2). The results with the UN structure were highly correlated with ones obtained using the FA4 structure. They showed that the rankings of cultivar effects were compatible. The correlation calculated among G and G(L) effects using the FA structure with smaller number of components and the UN structure were weak. The results acquired for the CS structure were moderately correlated with other variance-covariance structures. This indicates that the cultivar rankings were not the same. In case of cultivars within location effects we observed similar patterns. However, the values of Spearman correlation coefficients were slightly lower than for cultivars effects. The use of CS model or FA with smaller number of components models can contribute to erroneous selection of superior cultivars.


Table 2. Spearman rank correlations matrix for main cultivars effects (lower triangular matrix) and cultivar within the locations effects (upper triangular matrix) estimates using difference variance-covariance structures.


In the Fig. 1a, we present cultivars yield response across 18 agricultural environments (locations) modeled with the CS variance-covariance structure. The yields of cvs. 'Mulan' and 'Smaragd' modeled using this structure were on the first place in all tested locations. The modeled locations range (max-min) of cultivars’ grain yield stayed on a similar level, amounting to about 0.80 t/ha, in all tested locations. The minimum yield was 0.63 t/ha for location L6 and maximum 0.98 t/ha for L18 (Table 3).


Table 3. Minimal, maximum and range values for the cultivars adjusted yield means (t/ha) for all tested environments modeled by differences variance-covariance structures.


The cultivar yield response across environments modeled by linear mixed model with unstructured variance-covariance matrix is presented in the Fig. 1b. Contrary to cultivars response on environments modeled by the CS structure, the UN structure diversifies the ranges of yield values in tested environments. For this variance-covariance structure we observed that one of the environments had exceptionally greater diversity of cultivars yield than others. The modeled yields of cvs. 'Mulan' and 'Smaragd' were above the mean in all environments but, in most cases, they were not the highest. The modeled ranges of cultivars yields in different locations were from 0.69 t/ha for L6 to 2.44 t/ha or L5. There are environments where the range of grain yield values is three times greater than in other tested environments (Fig. 1b).

Fig. 2 presents reaction of cultivars to the tested environments modeled with an FA structure for the first four components. Particularly for the model with FA1 we observe that some environments are characterized by a very small difference in cultivars yield. The range of yield values is around only 0.20 t/ha, e.g. for location L10 their yield difference was 0.16 t/ha. For this model in environment L5 cultivars yield difference was equal to 2.30 t/ha, minimum yield of 5.50 t/ha, and the maximum one –7.84 t/ha (Table 3). In most locations the cultivars yields on average differed by about 0.40 t/ha. For FA1 structure, 'Mulan' and 'Smaragd' cultivars had yield below environmental mean in some locations. It might suggest that these cultivars were characterized by narrow adaptation.

With increasing number of components for the FA models, the cultivars adaptive yield response patterns across eighteen environments become slightly more similar to the results of the model with UN structure. For FA4, the modeled range of cultivar yields in different locations were from 0.65 t/ha for location L_6 to 2.37 t/ha for L_5. These results were compatible with the UN structure. For the FA model with more than four components the cultivars response to environments did not change much (Table 3, figure not presented).

DiscussionTop

The adaptive yield response of winter wheat cultivars on environments modeled by the CS structure suggests a lack of interaction. In Fig. 1 we observed almost parallel lines of cultivars reaction on different environments, so the yield diversity of the tested cultivars was the same across environments according to the model. The results are far away from the reality. It is very rare that all environments have equal discrimination power for cultivars. In reality there are always differences in cultivar yield reactions to different environments. Thus, evaluation of cultivar adaptability to environments using the CS model is incomplete and may lead to erroneous conclusions. Especially, the ranking for cultivars and cultivars within the locations from the CS structure was not consistent with those from other variance-covariance structures. Other empirical studies also suggest poor performance of the models with CS structure (So & Edwards, 2009; Hu & Spilke, 2011).

A completely different situation occurs when using the UN structure. With the help of this model we can discern environments where cultivars yields were very different from the ones with low discriminatory power. When the profiles for the cultivars reaction to the environment are not parallel, this indicates significant G×E interaction. Using this structure allows us to evaluate both qualitative and quantitative G×E interactions. The UN structure has a number of numerical and computational problems, inter alia predictions may be characterized by lower accuracy. Nevertheless, cultivars yield reaction to environments described by the UN structure more realistically describes G×E interactions in comparison to the CS structure.

Analysis with the FA structure with an optimal number of components gives the same results as the UN structure. But it is free of most of the UN structure problems. The model for PVTS data, which also describes well the reaction of varieties to the environments is the FA4 model. The models with FA4 and UN structures have consistent ranking of cultivars mean and cultivar within the locations effects. However, we have to remember that models with a number of components too low can contribute to an inappropriate description of cultivars response patterns. A larger number of components allows to perform the real description of the cultivars’ response on the environments. The FA models with more than 4 components did not increase the value of Spearman correlations among G or G(L) effects. This means that the ranking of cultivars effects among these models was compatible (similar). To select the optimal number of components in the FA structure we may use the information criterion, e.g. AIC. However, as shown in this paper, this solution is not perfect. In our case, AIC indicated as the best-fitting an FA structure with 2 components, i.e. FA2. Looking at the Fig. 2 we cannot say that the cultivars adaptability is properly characterized by the FA2 model. By contrast, a model with four components FA4, was on the 3rd place according to the AIC criterion. Thus, the use of AIC can give us only a suggestion on how many components for FA structure should be used, requiring further examination. We can pre-select models using the AIC criterion and then plot their results against each other for further interpretation. Piepho (1998) and Kelly et al. (2007) suggested that the optimum number of components is two. However, the datasets used in their studies were lower in the number of levels within each factor than in our dataset.

In order to fully describe the cultivars adaptive patterns we should use the variance-covariance structure for linear mixed models without restrictive assumptions. A good example of that structure is the unstructured variance-covariance matrix. In contrast to the model with compound symmetry structure, it does not limit the inference on cultivars adaptive patterns. The factor-analytic structure is also a good tool to describe the cultivars’ yield reaction to the environments. In our case, for the winter wheat datasets coming from the Polish Post-registration Variety Testing System, the reasonably low optimal number of components is 4 but each dataset requires separate determination of the optimal number of the factor-analytic components.


ReferencesTop

Burgueño J, Crossa J, Cotes JM, Vicente FS, Das B, 2011. Prediction assessment of linear mixed models for multienvironment trials. Crop Sci 51: 944-954. http://dx.doi.org/10.2135/cropsci2010.07.0403
Crossa J, Vargas M, Joshi AK, 2010. Linear, bilinear, and linear-bilinear fixed and mixed models for analyzing genotype x environment interaction in plant breeding and agronomy. Can J Plant Sci 90: 561574. http://dx.doi.org/10.4141/CJPS10003
Gilmour AR, Gogel BJ, Cullis BR, Thompson R, 2009. ASReml User Guide Release 3.0 VSN International Ltd, Hemel Hempstead, HP1 1ES, UK. 398 pp.
Hu X, 2015. A comprehensive comparison between ANOVA and BLUP to valuate location-specific genotype effects for rape cultivar trials with random locations. Field Crop Res 179: 144-149. http://dx.doi.org/10.1016/j.fcr.2015.04.023
Hu X, Spilke J, 2011. Variance-covariance structure and its influence on variety assessment in regional crop trials. Field Crop Res 120: 1-8. http://dx.doi.org/10.1016/j.fcr.2010.09.015
Hu X, Yan S, Shen K, 2013. Heterogeneity of error variances and its influence on genotype comparison in multi-location trials. Field Crop Res 149: 322-328. http://dx.doi.org/10.1016/j.fcr.2013.05.011
Kelly AM, Smith AB, Eccleston JA, Cullis BR, 2007. The accuracy of varietal selection using factor analytic models for multi-environment plant breeding trials. Crop Sci 47: 1063-1070. http://dx.doi.org/10.2135/cropsci2006.08.0540
Littell RC, Milliken GA, Stroup WW, Wolfinger RD, Schabenberger O, 2006. SAS for mixed models, 2nd ed. SAS Institute, Cary, NC, USA. 814 pp.
Mądry W, Gacek ES, Paderewski J, Gozdowski D, Drzazga T, 2011. Adaptive yield response of winter wheat cultivars across environments in Poland using combined AMMI and cluster analyses. Int J Plant Prod 5: 299-309.
Meyer K, 2009. Factor-analytic models for genotype × environment type problems and structured covariance matrices. Gene Sel Evol 41: 21. http://dx.doi.org/10.1186/1297-9686-41-21
Möhring J, Piepho HP, 2009. Comparison of weighting in two-stage analyses of series of experiments. Crop Sci 49: 1977-1988. http://dx.doi.org/10.2135/cropsci2009.02.0083
Piepho HP, 1997. Analyzing genotype-environment data by mixed models with multiplicative terms. Biometrics 53: 761-766. http://dx.doi.org/10.2307/2533976
Piepho HP, 1998. Empirical best linear unbiased prediction in cultivar trials using factor analytic variance-covariance structures. Theor Appl Genet 97: 195-201. http://dx.doi.org/10.1007/s001220050885
Piepho HP, Möhring J, Schulz-Streeck T, Ogutu JO, 2012. A stage-wise approach for analysis of multi-environment trials. Biometrical J 54: 844-860. http://dx.doi.org/10.1002/bimj.201100219
Roostaei M, Mohammadi R, Amri A, 2014. Rank correlation among different statistical models in ranking of winter wheat genotypes. Crop J 2: 154-163. http://dx.doi.org/10.1016/j.cj.2014.02.002
Smith AB, Cullis BR, Thompson R, 2001. Analyzing variety by environment data using multiplicative mixed models and adjustments for spatial field trend. Biometrics 57: 1138-1147. http://dx.doi.org/10.1111/j.0006-341X.2001.01138.x
Smith AB, Cullis BR, Thompson R, 2005. The analysis of crop cultivar breeding and evaluation trials: An overview of current mixed model approaches. J Agric Sci 143: 1-14. http://dx.doi.org/10.1017/S0021859605005587
So Y, Edwards J, 2009. A comparison of mixed-model analyses of the Iowa crop performance test for corn. Crop Sci 49: 1593-1601. http://dx.doi.org/10.2135/cropsci2008.09.0574
Spilke J, Piepho HP, Hu X, 2005. Analysis of unbalanced data by mixed linear models using the MIXED procedure of the SAS System. J Agron Crop Sci 191: 47-54. http://dx.doi.org/10.1111/j.1439-037X.2004.00120.x
Stefanova KT, Buirchell B, 2010. Multiplicative mixed models for genetic gain assessment in lupin breeding. Crop Sci 50: 880-891. http://dx.doi.org/10.2135/cropsci2009.07.0402
Welham SJ, Cullis BR, Gogel BJ, Gilmour AR, Thompson R, 2004. Prediction in linear mixed models. Aust Nz J Stat 46: 325-347. http://dx.doi.org/10.1111/j.1467-842X.2004.00334.x
Welham SJ, Gogel BJ, Smith AB, Thompson R, Cullis BR, 2010. A comparison of analysis methods for late-stage variety evaluation trials. Aust New Zeal J Stat 52: 125-149. http://dx.doi.org/10.1111/j.1467-842X.2010.00570.x
Yang RC, 2010. Towards understanding and use of mixed-model analysis of agricultural experiments. Can J Plant Sci 90: 605-627. http://dx.doi.org/10.4141/CJPS10049