Soybean yield modeling using bootstrap methods for small samples
Abstract
A common problem in regression modeling is sample size: the statistical methods used in inferential analyses are asymptotic, so when the sample is small the analysis may be compromised because the estimates may be biased. An alternative is the bootstrap methodology, which in its non-parametric version does not require knowing or assuming the probability distribution that generated the original sample. In this work, we used a small data set of soybean yield and physical and chemical soil properties to fit a multiple linear regression model. Bootstrap methods were used for variable selection, identification of influential points, and determination of confidence intervals for the model parameters. The results showed that the bootstrap methods enabled us to select the physical and chemical soil properties that were significant in the construction of the soybean yield regression model, to construct confidence intervals for the parameters, and to identify the points that had great influence on the estimated parameters.
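As an illustration of the non-parametric idea described in the abstract, the following is a minimal sketch of case-resampling bootstrap percentile confidence intervals for the coefficients of a multiple linear regression. It is not the authors' procedure or data: the function name bootstrap_coef_ci, the synthetic sample of 20 observations, the two stand-in predictors, and the choice of 2000 resamples are all assumptions made for the example.

```python
import numpy as np


def bootstrap_coef_ci(X, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CIs for OLS coefficients (hypothetical helper).

    Rows (cases) are resampled with replacement, so no distribution is
    assumed for the errors. Returns (lower, upper), each of length p + 1,
    with the intercept first.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    design = np.column_stack([np.ones(n), X])  # prepend an intercept column
    coefs = np.empty((n_boot, design.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # n row indices drawn with replacement
        coefs[b] = np.linalg.lstsq(design[idx], y[idx], rcond=None)[0]
    lower = np.percentile(coefs, 100 * alpha / 2, axis=0)
    upper = np.percentile(coefs, 100 * (1 - alpha / 2), axis=0)
    return lower, upper


# Synthetic small sample (assumed values, NOT the soybean data set).
rng = np.random.default_rng(42)
n_obs = 20
X = rng.normal(size=(n_obs, 2))            # two stand-in soil properties
true_beta = np.array([2.0, 1.5, -0.8])     # intercept and two slopes (assumed)
y = true_beta[0] + X @ true_beta[1:] + rng.normal(scale=0.5, size=n_obs)

lower, upper = bootstrap_coef_ci(X, y)
for name, lo, hi in zip(["intercept", "x1", "x2"], lower, upper):
    print(f"{name}: [{lo:.3f}, {hi:.3f}]")
```

A coefficient whose interval excludes zero would be read as significant at the chosen level. Case resampling is only one of several bootstrap schemes for regression; resampling residuals is a common alternative when the predictors are treated as fixed.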
© CSIC. Manuscripts published in both the print and online versions of this journal are the property of the Consejo Superior de Investigaciones Científicas, and quoting this source is a requirement for any partial or full reproduction.
All contents of this electronic edition, except where otherwise noted, are distributed under a Creative Commons Attribution 4.0 International (CC BY 4.0) licence. You may read the basic information and the legal text of the licence. The indication of the CC BY 4.0 licence must be expressly stated in this way when necessary.
Self-archiving in repositories, personal webpages or similar, of any version other than the final version of the work produced by the publisher, is not allowed.