INTRODUCTION
⌅The proposal of precision agriculture (PA) aimed at localized soil management involves agronomic, technological and, consequently, financial changes, which hinders adoption of the PA techniques, especially by small producers (Srinivasan, 2006). PA allows studying and analysing the spatial variability of crop yield and the chemical and physical attributes of the area under study. The study of this spatial variability of spatially georeferenced continuous variables can be done using geostatistical techniques (Cressie, 2015), which is crucial for the feasibility of differentiated and localized management practices, and allowing to measure the degree of spatial dependence between the sample elements in a given region and to describe the spatial dependence structure of the georeferenced variable throughout the area, thus generating thematic maps (Cressie, 2015).
An important part of the PA is to obtain data for the elaboration of maps of attributes that influence crop productivity (Bernardi et al., 2014). There are commercially available equipment and sensors that can be highlighted for obtaining these data, as well as for collecting soil samples (Gonçalves et al., 2020).
Agricultural sensors are instruments that can be attached to implements or agricultural machinery, allowing, for example, to analyze soil attributes or characteristics in real time. Scherer et al. (2018) proposed the development of a sensor to determine the electrical conductivity of the soil at low cost. Prudente et al. (2021) compared the NDVI (Normalized Difference Vegetation Index) spectro-temporal profiles obtained by active (GreenSeeker 505 Handheld) and passive (FieldSpec4 Standard-Res model) proximal sensors to monitor soybeans and beans. By means of multispectral sensors onboard the AT120 Remotely Piloted Aircraft Systems, Facco & Pegoraro (2019) captured aerial images, and thus built digital surface models, orthoimages, contour lines, generated vegetation indices and performed the detection of planting failures. Thinking about agility and quality, Resende et al. (2020) used a remotely piloted aircraft equipped with an RGB camera and a MAPPIR 3 camera to capture images of a corn crop, to estimate the leaf area index of a plot infested by Spodoptera frugiperda. Furthermore, the wireless sensor network can be used in underground and terrestrial environments to detect soil and climate conditions, as well as detect pests and insects (Bayrakdar, 2020a,b).
Another important procedure in PA is soil sampling, which represents a critical step in the assessment of soil fertility, and may be responsible for 98% of the errors made in the inappropriate recommendation of fertilizers (Mendes et al., 2001). Still, the choice and influence of the location of the sampled points, as well as the distance that separates them, are essential for the success of the sampling design (Carvalho & Nicollela, 2002).
In this context, it is necessary to determine a soil sampling scheme that is as efficient as possible, with the smallest sample size to minimize operating costs and maximize the quality of the results of the spatial variability analysis (Marchant & Lark, 2010).
For the choice of this sampling scheme, there are traditional spatial sampling designs, such as simple random sampling and the systematic design, which are classified as design-based, assuming that the values of the georeferenced variable are considered fixed. Traditional spatial sampling designs are advantageous in terms of the simplicity of choice of sampling points in the area under study, either because of the occurrence of different spacing between pairs of sampling points (in simple random sampling) or because of coverage uniformity of the area under study (in systematic sampling) (Haining, 2015).
However, the spatial heterogeneity and spatial autocorrelation perform a particular impact on the variability analysis accuracy, in the context of the sampling design. The spatial distribution of an attribute is considered heterogeneous when the attribute varies in the study area, thus requiring division of the area into homogeneous subareas. On the other hand, spatial autocorrelation indicates that the georeferenced samples nearby have similar values.
Characteristics of spatial variability, such as anisotropy, can guide a choice of the most intense number of sampling points in the direction of greater spatial variability (Delmelle, 2009). Or the inclusion of more sampling points with a smaller distance radius in the systematic sampling design, to minimize the nugget effect problem, as proposed by Diggle & Lophaven (2006), in the “lattice plus close pairs” and “lattice plus infill” designs.
The spatial sampling designs that consider the sample as a realization of a probability process are classified as model-based design or geostatistical design (Diggle & Lophaven, 2006). The spatial sampling design that considers the spatial correlation makes it possible to reduce the sample size by restricting redundant spatial information, in relation to traditional sampling (Haining, 2015; Dal’Canton et al., 2021).
Several methodologies for spatial re-planning of the sample, with sample size reduction, are developed in the literature, such as calculation of the effective sample size (Dal’Canton et al., 2021) or the optimized sample re-planning (Maltauro et al., 2019; 2021). Moreover, prior knowledge of the spatial autocorrelation of the variable can be used in the spatial stratified sampling, which provides a lesser loss of precision of the estimate than spatial simple random and systematic sampling. The spatial stratified sampling can use relevant previous historical data in the same area, such as prior knowledge. Associated with the multivariate clustering analysis, this information allows generating zones, which will represent the stratified space (Wang et al., 2013).
In the PA context, the identification of homogeneous Management Zones (MZs) within the cultivation areas becomes a more technically and economically viable strategy for the implementation of localized soil management, allowing a long-term that all area treatments to be carried out uniformly within each zone (Molin, 2006; Srinivasan, 2006; Bernardi et al., 2014). Thus, to produce more stable MZs, it is recommended to use variables that do not vary significantly over time, such as topographic data and physical data (Aikes et al., 2021).
On the other hand, the chemical attributes are important and useful to generate the zones for recommendations of variable rate fertilizer applications (Aikes et al., 2021). These zones are called application zones (AZs) (Molin, 2006). Therefore, the difference between MZs and AZs is related to the variables available (stables or not) and with the intention of generating the zones (long-term use or only for a future application of fertilizers). Moreover, the spatial statistics and the multivariate cluster analysis are methods commonly used by MZ and AZ.
These MZs or AZs make up a database that can be used to direct, in the future, the reduction of the sample size in areas more homogeneous (Rodrigues Jr. et al., 2011). In this context, there are several ways to select samples, which depend on the purpose and the resources available for the survey (Haining, 2015). When we do not have information about the spatial variability, the sampling points can be selected using traditional sampling criteria (Benedetti et al., 2015), or multi-phase adaptive sampling schemes (Marchant & Lark, 2010).
Determination of the sample configurations with reduced size by the use of AZs as a spatial stratified sampling provide the following advantages: AZs generate more homogeneous regions from the point of view of spatial similarity, allowing the researcher to choose a smaller number of sampling points in each AZ; the methodology considers the spatial information redundant; there is a design-based and model-based combination; use of less computational time than in optimized sampling; and greater simplicity of execution.
Given the above, the objective of the paper was to assess the acquisition of reduced sample configurations contained in application zones generated by spatial multivariate clustering. The sampling protocol proposed in this work is intended to guide the definition of new intervention cycles in the study area, with a reduced sample size, but maintained the refined localized management in this agricultural area.
MATERIAL AND METHODS
⌅Agricultural area or experimental field
⌅The data set regarding the soil chemical attributes in the three soybean harvest years (2013-2014, 2014-2015, and 2015-2016) used in this research was observed in a 167.35 ha commercial grain production area located at the Three Girls Farm in the city of Cascavel-PR, Brazil, located approx. 24.95º S 53.37º W with a mean altitude of 650 m above sea level. The soil is classified as Dystroferric Red Latosol, with a clayey texture. The region’s climate is classified as mesothermal and super-humid temperate, climate type Cfa (Köeppen), and the mean annual temperature is 21ºC (Embrapa, 2013). The area under study had 102 sampling points arranged in the lattice plus close pairs design (Chipeta et al., 2017), with a minimum distance between the points of the regular grid of 141 meters and, in some places, randomly chosen, the sampling points were arranged at smaller distances (75 and 50 m between point pairs (Fig. 1b).
The samples were located and georeferenced using a GNSS receiver (GeoExplorer, Trimble Navigation Limited, Sunnyvale, CA, USA) in a Datum WGS84 coordinate reference system, UTM (Universal Transverse Mercator) projection. Soil sampling was performed at each point indicated (Fig. 1b). At these points, four soil sub-samples were collected, from 0.0 to 0.2 m deep, mixed and placed in plastic bags, with samples of approx. 500 g comprising the representative sample of the plot.
Descriptive and geostatistical analyses
⌅Each harvest year, descriptive and geostatistical analyses were performed for each of the soil chemical attributes, in order to verify the presence of directional trend and anisotropy. Anisotropy was assessed through the analysis of the directional semivariograms (Guedes et al., 2018) and the non-parametric Maity & Sherman´s (2012) test, considering 5% significance. Spatial dependence was assessed by the nugget-to-sill radio classification (Cambardella et al., 1994), which is a good and consolidated method to assess the intensity of spatial dependence, being used in several works in the field of Soil Science (Siqueira et al., 2010; Guedes et al., 2018; Maltauro et al., 2019, 2021; Dal’Canton et al., 2021). The soil chemical attributes that presented spatial dependence were studied (Cambardella et al., 1994), referring to each harvest years (Fig. 1a).
The following models were estimated by the maximum likelihood method: exponential, Gaussian, and Matérn family with shape parameter =2.5 (Cressie, 2015). The best model was chosen by means of the cross-validation method (Faraco et al., 2008). Subsequently, the spatial prediction was carried out in non-sampled locations in the agricultural area under study, by kriging, and thematic maps of each attribute were prepared (Landim, 2006).
Acquisition of the spatial and multivariate dissimilarity matrix
⌅Subsequently, all the locations were compared in pairs. For this, in each pair of i and j locations (i, j=1,...,n) in which the p attributes had already been measured (Fig. 1a), the similarity coefficient proposed by Gower (1971) was calculated and, for quantitative attributes, the practical range is a form of standardizing the attributes (Eq. 1; Fig. 1d). The dissimilarity matrix was obtained based on Oliver & Webster (1989).
In the principal component analysis (PCA) of the original data (Eq. 2; Fig. 1d), the first principal component (pc1) was selected, as this explains most of the data variation. Considering the pc1 scores, the geostatistical models were estimated in a way analogous to the methodology used for the soil chemical attributes. The dissimilarity matrix was obtained with the estimation of the parameters of the pc1 scores’ geostatistical model (Eq. 3; Fig. 1d). In this way, the matrix adds information about the Euclidean distance between the sample elements, as well as the spatial dependence structure of the attributes.
The columns of matrix (Eq. 10; Fig. 1d) are the new variables. Consequently, the number of columns corresponding to the number of original attributes was selected. Subsequently, a geostatistical model was estimated and data interpolation by kriging was made. The interpolated data were used to obtain the AZs (Gavioli et al., 2016).
Clustering and choice of the number of clusters and criteria to evaluate the clusterings
⌅Initially for each harvest year, the best clustering method was chosen among the following: Fanny, Fuzzy C-means, McQuity, Ward, and K-means. Details about the clustering methods evaluated are described in Ward Jr. (1963), McQuitty (1966), MacQueen (1967), Bezdek (1981), and Kaufman & Rousseeuw (2009). This choice was made by means of the following indices: Dunn, Davies Bouldin, C, SD, and variance reduction (Dunn, 1974; Hubert & Levin, 1976; Davies & Bouldin, 1979; Halkidi et al., 2000; Gavioli et al., 2016, respectively). To define the number of clusters, the scatter plot of the sum of squares of errors (SSE) versus the number of clusters (knee graph) was used, as well as the silhouette scatter plot versus the number of clusters (Yi et al., 2013).
Sample configuration
⌅With the best AZ chosen for each harvest year, a sample reduction was performed in each AZ. Sample configuration was reduced considering 75% and 50% of the sampling points and using different sample configurations (Fig. 1c). Greater reductions were not possible, as the number of sampling points would not meet the geostatistical analysis criteria (at least 30 pairs for calculation of the semivariances; Journel & Huijbregts, 1976).
First, sample reduction was carried out considering random (R) and proportional random (PR) sampling, selecting, respectively, the sampling points randomly within each AZ or proportionally to the number of hectares within each AZ. In the sample reduction using systematic (S) and proportional systematic (PS) sampling, the sampling points were obtained by selecting of points from the regular sampling grid, to obtain 50% and 75% of the sampling points. The points of the agricultural area’s lattice plus close pairs were selected until the number of required points was completed.
With each reduced sample configuration, the exploratory and geostatistical data analyses were repeated. Finally, the initial and the reduced sample configurations were compared using the overall accuracy (Anderson et al., 2001) and the Kappa and Tau agreement indices (Krippendorff, 2013).
Computational resources
⌅All statistical and geostatistical analyses were developed using the R software (R Development Core Team, 2021), or its geoR package (Ribeiro Jr. & Diggle, 2001).