Predicting soil organic matter and soil moisture content from digital camera images: comparison of regression and machine learning approaches

Perry Taneja; Hiteshkumar Bhogilal Vasava; Solmaz Fathololoumi; Prasad Daggupati; Asim Biswas

doi:10.1139/cjss-2021-0133

31 March 2022

Canadian J. of Soil Science, 102(3):767-784 (2022). https://doi.org/10.1139/cjss-2021-0133

Abstract

Appropriate soil management maintains and improves the health of the entire ecosystem. Such management requires proper characterization of soil properties, including soil organic matter (SOM) and soil moisture content (SMC). Image-based soil characterization has shown strong potential in comparison with traditional methods. This study compared the performance of 22 different supervised regression and machine learning algorithms, including support vector machines (SVMs), Gaussian process regression (GPR) models, ensembles of trees, and artificial neural network (ANN), in predicting SOM and SMC from soil images taken with a digital camera in the laboratory setting. A total of 22 image parameters were extracted and used as predictor variables in the models in two steps. First, models were developed using all 22 extracted features and then using a subset of the six best features for both SOM and SMC. Saturation index was the most important and redness index the least important variable for SOM prediction; contrast was the most important and median S the least important variable for SMC prediction. The color and textural parameters demonstrated a high correlation with both SOM and SMC. Results revealed a satisfactory agreement between the image parameters and the laboratory-measured SOM (R² and root mean square error (RMSE) of 0.74 and 9.80% using cubist) and SMC (R² and RMSE of 0.86 and 8.79% using random forest) for the validation data set using six predictor variables. Overall, GPR models and tree models (cubist, RF, and boosted trees) best captured and explained the nonlinear relationships between SOM, SMC, and image parameters for this study.

Introduction

Soil organic matter (SOM), an indicator of soil health and quality (Zhang et al. 2006), is a significant component of any ecosystem (Li et al. 2013) and influences agricultural sustainability, food security, and climate (Were et al. 2015). Organic carbon (OC), as a key element of soil, plays an essential role in the global carbon cycle, so it is critical to measure its content in the soil (Kumar and Lal 2011; Yang et al. 2016). Soil moisture content (SMC), another significant component, not only influences the growth of crops, but also is a key factor in any crop management decisions including precision agriculture practices (Chukalla et al. 2015; Feki et al. 2018). Therefore, quantification of the spatial and temporal distribution and dynamics of SOM and SMC provides critical information to authorities concerned with the management and policymaking regarding soil and climate (Meersmans et al. 2008), food production (Taghizadeh-Mehrjardi et al. 2016), ecosystem modeling (Li et al. 2003), agriculture, forestry, land degradation management, environment protection, and most importantly land-use planning (Li et al. 2013). However, for detailed characterization, traditional measurement approaches are expensive, often involve the use of hazardous chemical reagents, and are time- and labor-intensive (Sudarsan et al. 2016; Lazzaretti et al. 2020). This often leads to delayed decisions or reliance on outdated data, which can ultimately result in poor decisions.

With the advancement of machine learning techniques and increasing access to digital image acquisition systems, digital image processing has emerged as an inexpensive technique to deal with these problems (Sudarsan et al. 2016; Fu et al. 2020; Swetha et al. 2020). With digital image processing approaches, soil properties, including but not limited to SOM, SMC, soil texture, iron, and fine particle contents, can be quantitatively estimated by formulating relationships between laboratory-measured soil properties and readily measurable soil image color and texture features (Levin et al. 2005; Rossel et al. 2008; Zhu et al. 2011; Sudarsan et al. 2016). Generally, color and (or) reflectance of soil can be attributed to numerous properties of soil such as SMC, SOM, parent material, mineralogy, and texture (Hummel et al. 2001; Fu et al. 2020; Gholizadeh et al. 2020; Taneja et al. 2021). This association justifies developing relationships between soil reflectance and its properties to predict their content using modeling.

In developing predictive relationships with image parameters, regression-based methods have been used in many fields including soil science (Persson 2005; dos Santos et al. 2016; Wu et al. 2017; Sakti et al. 2018). Although their performance has been variable, linear and nonlinear regression-based methods remain the most commonly used (Rossel et al. 2008; dos Santos et al. 2016; Swetha et al. 2020). Recently, with advances in data processing and computing power, several data-driven modeling and machine learning approaches, including support vector machine (SVM), ensembles of trees (cubist, random forest, boosted trees, bagged trees), and Gaussian process regression (GPR), have been utilized with variable performance and reasonable success in developing predictive relationships in many fields (Gill et al. 2006; Matei et al. 2017; Chen et al. 2019; Kotlar et al. 2019). However, studies of image-based approaches and detailed assessments of model performance in predicting soil properties are limited, and further work is needed. In addition, because performance varies with the image set used, it is difficult to compare these algorithms with conventional regression-based methods in predicting soil properties. Some researchers have compared the performance of two to three different algorithms in creating a predictive relationship (Gregory et al. 2006; Rossel et al. 2008; Wu et al. 2018). Only a comprehensive comparison, however, can justify the choice of method for predicting soil properties, especially SOM and SMC.

In addition to modeling, the collection of soil images is another important component that determines the success of image-based soil characterization. With a focus on targeted laboratory applications, most of the studies collected soil images under defined enclosures illuminated by controlled sources of light (Rossel et al. 2008; Zhu et al. 2011; Gómez-Robledo et al. 2013; Sakti et al. 2018; Wu et al. 2018; Fu et al. 2020). However, with a focus on developing computer vision or image analysis-based proximal soil sensors for various in situ applications, including precision agriculture, research carried out in controlled environments may not provide as much useful information as required. Collection of images in natural conditions would improve the universality of the relationships developed.

Studies on the direct use of machine learning algorithms on image-derived color and texture data for SOM and SMC prediction have not yet appeared. Therefore, the overall objective of this research was to comprehensively compare the performance of various commonly used regression-based methods with machine learning based methods in predicting SOM and SMC from soil images collected with an inexpensive digital camera. The specific objectives of this study were to

assess the feasibility and usability of digital images to predict SOM and SMC in soil;
optimize the image parameters for developing predictive relationships;
and comprehensively compare the performance of a range of regression and machine learning algorithms (22 in total) in predicting SOM and SMC.

Proper assessment and comparison of various modeling algorithms and an optimum set of image parameters will serve as an informative guide on the use of digital images for predicting SOM and SMC.

Materials and methods

The overall methodology is divided into three sections: data collection, image analysis, and data analysis (Fig. 1).

Fig. 1.

Overview of the framework used for this study. [Color online]

Study site description and sample collection

Soil samples were collected in an earlier study by Ji et al. (2016) from two agricultural fields, Field 26 (~11 ha) and Field 86 (~17 ha), located on the Macdonald Campus research farm of McGill University, Sainte-Anne-de-Bellevue, Quebec, Canada (Fig. 2). These two fields exhibited high spatial variability in terms of soil texture, organic matter, and soil type (Ji et al. 2016). The landscape of this area has undergone numerous processes during the last deglaciation including land-level rise, invasion of saline water, lake formation, retreat of ice, and glacial deposition, leading to the formation of highly variable soil. For example, soils of Field 26 range from mineral to organic deposits (peat) with high variability in soil textures including clay loam, loam, silt loam, sandy loam, and sand.

Fig. 2.

Geographic location of the study area, Field 86 (left) and Field 26 (right) of Macdonald Campus Farm, McGill University, Quebec, Canada, as well as field elevation maps for Field 26 and Field 86 along with the soil map. The letters in the map represent various soil series. The base map was downloaded from Google Earth and processed in ArcGIS; the projection used is NAD84 with UTM zone 18. [Color online]

Field 86 mostly includes mineral soils with sandy clay loam, loam, sandy loam, clay, and clay loam texture (Fig. 3). Soil samples from the depth of 0–15 cm were collected from Field 26 and Field 86, respectively, in late April and early May in 2015 before seeding following a stratified random sampling strategy.

Fig. 3.

Soil texture classification (following Canadian System of Soil Classification) of soil samples collected from Field 26 and Field 86. The 25 samples selected for this study are represented by red colored signs while blue color signs represent the remaining 95 samples (out of the total 120 samples). The triangle was prepared using “soiltexture” package in R. [Color online]

The fields were under no-tillage practices and corn–soybean rotation with soybean and corn being the preceding crops in Field 26 and Field 86, respectively. A total of 25 soil samples (17 from Field 26 and 8 from Field 86) exhibiting a wide variation in SOM (3.30%–62.70%), representing the range of SOM present in these fields, were carefully selected for this study (Fig. 4). These 25 samples represented both organic (mainly from Field 26) and mineral soils (present in both fields). This was done deliberately to improve the generality and robustness of the trained models. In the laboratory, imaging each sample at five different moisture levels increased the number of samples used in modeling to 125.

Fig. 4.

The 25 soil samples selected for this study collected from Field 26 and Field 86.

Laboratory analysis and soil imaging

The samples were air-dried, ground, and sieved through a 2 mm sieve. The processed samples were then used to capture images as well as measure soil properties in the laboratory. SOM was measured using the loss on ignition (LOI) method; ignition conditions were 550 °C for three hours (Schulte and Hopkins 1996). Processed soil samples were evenly placed in Petri dishes (~8 mm thickness) and the sample surfaces were photographed with a 12.1-megapixel digital camera (Canon PowerShot SX270 HS) mounted on a tripod (27 cm) with the lens facing downward toward the sample. The camera was set to a 3000 × 4000 pixel resolution and the “best” jpeg compression, thereby supplying smaller sized images in contrast to uncompressed tiff files, but of comparable quality. A camera lens aperture setting of f/3.5 was regarded as appropriate for image acquisition under the normal lighting conditions of the laboratory, which was determined by repetitive tests conducted on distinct group references (Fu et al. 2020; Taneja et al. 2021).

A total of five sets of pictures were captured on the same soil samples. Before starting image capture, the weights of the empty Petri dishes and of the dishes with air-dried soil samples were recorded. The first set of images was collected on these soil samples (set 1). Then, water was added carefully (without disturbing the soil surface) and gradually over a period of time to simulate saturation-like conditions. The second set of images was collected corresponding to this condition (set 2). The saturated soil samples were then permitted to dry in open air under laboratory conditions. Two more sets of images were captured during the natural drying process of samples corresponding to two different SMCs. Finally, the soil samples were oven-dried at 105 °C for 24 hours to get 0% SMC and images of the driest soils were captured. The weight of the soil samples (including the Petri dish) was recorded during each stage of image acquisition to calculate the gravimetric SMC based on the loss of weight during each drying event. Thus, a total of five sets of images corresponding to five different levels of SMC were collected. These sets were then grouped into five categories in increasing order of SMC with the images of oven-dried soil samples forming Group 1, while those corresponding to the highest SMC and simulating saturation conditions formed Group 5. To reduce the uncertainty in modeling soil parameters from imaging, two images were captured for each soil sample at each SMC level (250 initial images), and the average of the two images was used (125 final images in the subsequent steps). Figure 5 shows the SMC (%) of the 25 soil samples at five different SMC levels.
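The gravimetric calculation described above can be sketched as follows. This is a minimal illustration that assumes the conventional oven-dry mass basis (mass of water divided by mass of oven-dry soil), which is consistent with SMC values above 100% for organic samples; the function name and the weighings are hypothetical and are not taken from the study.

```python
def gravimetric_smc(dish_g, stage_total_g, oven_dry_total_g):
    """Gravimetric soil moisture content (%) on an oven-dry mass basis.

    All weights are in grams; stage_total_g and oven_dry_total_g include
    the Petri dish, as in the weighing protocol described in the text.
    """
    dry_soil_g = oven_dry_total_g - dish_g
    if dry_soil_g <= 0:
        raise ValueError("oven-dry soil mass must be positive")
    water_g = stage_total_g - oven_dry_total_g
    return 100.0 * water_g / dry_soil_g


# Hypothetical weighings for one dish at the five imaging stages,
# ordered from oven-dry (Group 1) to near saturation (Group 5).
dish = 20.0
oven_dry_total = 70.0
stage_totals = [70.0, 72.5, 80.0, 90.0, 100.0]

smc_levels = [gravimetric_smc(dish, w, oven_dry_total) for w in stage_totals]
print(smc_levels)
```

Because the denominator is the oven-dry soil mass, a sample that holds more water than its own dry mass yields an SMC above 100%.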

Fig. 5.

Soil moisture content (SMC; %) for the 25 soil samples corresponding to five different levels of moisture represented as five groups. [Color online]

In this study, the SMCs of samples in the same group were not kept constant. This differed from the SMC settings of other studies in which soil samples had the same SMC at the same wetting level and abrupt bi- or tri-modal soil moisture distributions were generated (Nocita et al. 2013; Rienzi et al. 2014; Rodionov et al. 2014). However, SMC in a field is likely to follow a normal or quasi-normal distribution. Moreover, soil samples with varying levels of SOM have different water-holding capacities and, thus, have varying drying characteristics. As an example, the sample with 3.3% SOM had a saturation SMC of 36.91% while that with 62.7% SOM had a saturation SMC of 119.60% (Fig. 5). Therefore, setting up a fixed SMC would have biased the image acquisition process. Consequently, allowing the SMC to vary rather than controlling it better simulated the continuous variation of soil moisture through space in the field.

In various studies that use digital images to obtain information about soil color, soil samples are confined to defined enclosures that are illuminated by a fixed light source (Rossel et al. 2008; Zhu et al. 2011; Gómez-Robledo et al. 2013; Sakti et al. 2018; Wu et al. 2018). In this study, however, no such restriction was imposed during imaging. This was done intentionally to simulate field conditions, since lighting in the field varies abruptly and cannot be controlled. Moreover, such restrictions must be avoided when developing proximal soil sensors intended for use in the field rather than in a controlled laboratory environment.

Image analysis

A proficient image acquisition system tries to capture quality images and appropriate image analysis approaches help to derive useful information from the images and make a substantial contribution to the computer vision applications. Similar to other disciplines, it is necessary to exercise prudence when images captured using digital cameras must be processed. Numerous elements, for instance, reflection from water (in case of high SMC), nonuniform lighting conditions, and foreign particles (plant litter, residues of roots, and white colored powder such as that of lime or fertilizer) present on the surface of the soil, affect the quality of the image (Gonzalez et al. 2004). Thus, corrections must be made before useful information can be extracted from the images.

Image preprocessing: cropping

Images were cropped to a square area of 950 × 950 pixels roughly from the geometric center of the image. This was done to remove the white background as well as to reduce the effects caused by the edges of the Petri dishes (Fig. 6).
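A central crop of this kind can be sketched in a few lines. The example assumes the image is held as a NumPy array of shape (rows, columns, channels); the function name is illustrative, and the exact crop origin used in the study ("roughly" the geometric center) is not specified.

```python
import numpy as np


def center_crop(image, size=950):
    """Return a size x size window taken from the geometric center of an image."""
    height, width = image.shape[:2]
    if height < size or width < size:
        raise ValueError("image is smaller than the requested crop")
    top = (height - size) // 2
    left = (width - size) // 2
    return image[top:top + size, left:left + size]


# Placeholder frame with the 3000 x 4000 pixel resolution reported in the text.
frame = np.zeros((3000, 4000, 3), dtype=np.uint8)
patch = center_crop(frame)
print(patch.shape)
```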

Fig. 6.

Images of soil samples with (A) 3.3% and (B) 62.7% soil organic matter (SOM) under five different soil moisture conditions. The columns represent (a) original images, (b) corresponding cropped regions, (c) enhanced images, (d) segmented images, (e) color space-converted gray images, and (f) color space-converted hue saturation value (HSV) images, respectively. [Color online]

Image preprocessing: enhancement

To enhance the images, contrast adjustment was performed using “imadjust” function of MATLAB (MathWorks 2017). This assisted in segmentation (next step) through exclusion of noise and prevention of useful information from fading into noise (Fig. 6).
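For readers without MATLAB, a comparable contrast adjustment can be approximated as a percentile-based linear stretch. The sketch below assumes the behavior documented for "imadjust" when called with default arguments on a grayscale image (saturating the lowest 1% and highest 1% of pixel values); whether the defaults were used in this study is not stated, so the percentile limits are an assumption, and the function name is illustrative.

```python
import numpy as np


def stretch_contrast(gray, low_pct=1.0, high_pct=99.0):
    """Linearly stretch intensities so the chosen percentiles map to 0 and 255."""
    values = np.asarray(gray, dtype=np.float64)
    low, high = np.percentile(values, [low_pct, high_pct])
    if high <= low:
        # Flat image: nothing to stretch.
        return values.astype(np.uint8)
    scaled = np.clip((values - low) / (high - low), 0.0, 1.0)
    return np.round(scaled * 255.0).astype(np.uint8)


# Synthetic low-contrast image with intensities confined to 90-139.
rng = np.random.default_rng(0)
dull = rng.integers(90, 140, size=(100, 100)).astype(np.uint8)
enhanced = stretch_contrast(dull)
print(dull.min(), dull.max(), enhanced.min(), enhanced.max())
```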

Image segmentation

For this study, image segmentation denoted identification and retention of the pixels that represented soil in the images. For instance, certain parts of some images were visibly occupied by residues of small leaves and black cracks (only detected after careful examination) or a film of water which gave rise to exceptionally bright reflections (Fig. 6). Irrespective of the area occupied by the gloss or the foreign particles, it was considered essential to eliminate them to avoid inaccurate calculations. Because the image intensity values corresponding to these areas do not depict the image intensity values of the actual soil surface, the mean value would not denote the mean of the soils’ pixels.

Therefore, an empirical segmentation technique based on the image histogram was developed to distinguish soil pixels from nonsoil pixels. Segmentation relied on the noticeable dissimilarities between the intensity values of soil and nonsoil pixels. Because nonsoil pixels occupied only a small portion of the image, a count threshold was determined after several trials: image intensity values whose counts were lower than or equal to the threshold were regarded as belonging to nonsoil matter and were discarded. A value of 3000 was chosen for this study; the value may differ for soil from different regions and parent materials (Taneja et al. 2021). In addition, the pixels corresponding to the water film were white. For saturated soils, the threshold was therefore set differently from that for nonsaturated soils. The images were converted to grayscale and the histogram was examined to obtain the “highest count” among the intensity values in the range 248–255 (gray-scale values representing white). The “highest count” was then compared with the threshold set for nonsaturated soils (i.e., 3000) and the greater value was used as the threshold. For example, for one saturated soil sample, the “highest count” of 4518 was recorded at a gray-scale value of 251; because 4518 exceeded the previously optimized value of 3000, 4518 was set as the threshold.
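The thresholding rule can be illustrated with a short sketch. It is one reading of the description above, not the authors' MATLAB code: the histogram is computed over 256 gray levels, levels whose pixel counts do not exceed the threshold are treated as nonsoil, and for saturated samples the threshold is raised to the highest count found among gray levels 248–255. The function name and the synthetic image are hypothetical.

```python
import numpy as np

BASE_THRESHOLD = 3000      # count threshold reported in the text
WHITE_RANGE = (248, 255)   # gray levels treated as white (water film)


def soil_mask(gray, base_threshold=BASE_THRESHOLD, saturated=False):
    """Return (mask, threshold); mask is True where a pixel is kept as soil."""
    counts = np.bincount(gray.ravel(), minlength=256)
    threshold = base_threshold
    if saturated:
        low, high = WHITE_RANGE
        highest_white_count = int(counts[low:high + 1].max())
        threshold = max(base_threshold, highest_white_count)
    keep_level = counts > threshold   # levels with counts <= threshold are dropped
    return keep_level[gray], threshold


# Synthetic 950 x 950 grayscale crop: mostly soil at level 100, plus
# 4518 glare pixels at level 251 and 500 dark residue pixels at level 30.
gray = np.full((950, 950), 100, dtype=np.uint8)
flat = gray.reshape(-1)    # view onto the same pixels
flat[:4518] = 251
flat[4518:5018] = 30

mask_dry, thr_dry = soil_mask(gray, saturated=False)
mask_wet, thr_wet = soil_mask(gray, saturated=True)
print(thr_dry, int(mask_dry.sum()), thr_wet, int(mask_wet.sum()))
```

With the base threshold alone the glare level survives, because its count (4518) exceeds 3000; raising the threshold to 4518 removes it, which mirrors the worked example in the text.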

Color space conversions and feature extraction

The RGB images (in which the color of each pixel is made up of red, green, and blue components (Kumar and Verma 2010)) were converted to HSV (in which colors are represented by hue (tone), saturation (purity), and value (brightness)) and grayscale images using color space conversions (Fig. 6). Color features such as redness index (RI), coloration index (CI), hue index (HI), and saturation index (SI), as well as texture features including entropy, contrast, energy, and homogeneity (the latter two defined on the gray-level co-occurrence matrix, GLCM), were then extracted and the indices were derived using MATLAB. The list of extracted features and derived indices is presented in Fig. 7.

Fig. 7.

Overview of extracted features and indices derived from the images. RGB, red, green, and blue; HSV, hue, saturation, and value. [Color online]
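As a rough illustration of the texture features listed in Fig. 7, the sketch below builds a gray-level co-occurrence matrix for horizontally adjacent pixels and derives contrast, energy, and homogeneity from it, together with a histogram-based entropy. The choice of 8 gray levels, a one-pixel horizontal offset, and the simple quantization are assumptions for illustration; the settings used in the study are not reported, and results will not match MATLAB output exactly.

```python
import numpy as np


def glcm_features(gray, levels=8):
    """Contrast, energy, and homogeneity from a horizontal-neighbor GLCM."""
    values = np.asarray(gray, dtype=np.float64)
    # Quantize 0-255 intensities into `levels` bins (assumed scheme).
    q = np.minimum((values / 256.0 * levels).astype(int), levels - 1)
    left, right = q[:, :-1].ravel(), q[:, 1:].ravel()
    glcm = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(glcm, (left, right), 1.0)   # count co-occurring level pairs
    p = glcm / glcm.sum()                 # normalize to joint probabilities
    i, j = np.indices(p.shape)
    return {
        "contrast": float(np.sum((i - j) ** 2 * p)),
        "energy": float(np.sum(p ** 2)),
        "homogeneity": float(np.sum(p / (1.0 + np.abs(i - j)))),
    }


def histogram_entropy(gray):
    """Shannon entropy (bits) of the 256-bin intensity histogram."""
    counts = np.bincount(np.asarray(gray, dtype=np.uint8).ravel(), minlength=256)
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log2(p)))


uniform = np.full((16, 16), 120, dtype=np.uint8)
stripes = np.tile(np.array([0, 255], dtype=np.uint8), (16, 8))

print(glcm_features(uniform), histogram_entropy(uniform))
print(glcm_features(stripes), histogram_entropy(stripes))
```

A uniform patch gives zero contrast and maximal energy and homogeneity, whereas alternating black and white columns give high contrast and low homogeneity.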

A total of 22 image parameters were extracted. “Mean” represented the average of the values of all the pixels in an image. “Median” represented the middle pixel value after all the pixels were sorted in numerical order. “Entropy” was the statistical measure of randomness. “Contrast” was the measure of intensity contrast between a pixel and its neighbor over the whole image. “Energy” was the sum of squared elements in the GLCM. “Homogeneity” was the closeness of the distribution of elements in the GLCM to the GLCM diagonal. RI, CI, HI, and SI were calculated as:

(1)

(2)

(3)

(4)
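As an illustration of how such ratio indices can be computed from mean R, G, and B values, the sketch below assumes the definitions commonly given in the soil color literature: RI = R²/(B·G³), CI = (R − G)/(R + G), HI = (2R − G − B)/(G − B), and SI = (R − B)/(R + B). These forms are an assumption made for the example and may differ from eqs. 1–4; the function name and input values are hypothetical.

```python
def color_indices(r, g, b):
    """Ratio color indices from mean R, G, B values (assumed definitions)."""
    if g == 0 or b == 0 or g == b or (r + g) == 0 or (r + b) == 0:
        raise ValueError("index undefined for these channel values")
    return {
        "RI": r ** 2 / (b * g ** 3),       # redness index
        "CI": (r - g) / (r + g),           # coloration index
        "HI": (2 * r - g - b) / (g - b),   # hue index
        "SI": (r - b) / (r + b),           # saturation index
    }


bright = color_indices(120.0, 100.0, 80.0)
dim = color_indices(60.0, 50.0, 40.0)      # same color at half the brightness
print(bright)
print(dim)
```

Under these assumed forms, CI, HI, and SI are unchanged when all three channels are scaled by the same factor, which is the sense in which ratio indices compensate for brightness differences; RI does not share this property.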

Both mean and median values were used as predictors in the modeling due to inconsistent information in the literature. For example, while some researchers used mean values (Rossel et al. 2008; Sudarsan et al. 2016), others employed median values (Persson 2005; Rossel et al. 2008) in their research. In fact, Persson (2005) advocates the application of median values to handle the deviations resulting from the shading of the microrelief developed on the surfaces of the soil samples. Depending on the viewing angle with respect to the direction of the incident light, there might exist bidirectional reflectance distribution function (BRDF) and shading influences (King 1995; Lillesand et al. 2015). The indices derived from the images (RI, CI, HI, SI) were expected to reduce these influences. Being ratio indices, they effectively balance the abnormalities in brightness arising from disparities in topography and emphasize the color content of the samples.

Data preprocessing and division

Multivariate outliers were determined based on the Mahalanobis distance (De Maesschalck et al. 2000). Regression approaches were employed to decide if a specific sample from a sample population was an outlier through the combination of ≥2 variable scores. Following this, data obtained from five images were detected as outliers and were not included in further calculations, leaving 120 images. The data were randomly split into calibration and validation sets; 70% of the data were used as calibration data (84 images) and 30% as validation data (36 images). The statistical distributions of the calibration and validation samples were normal, and both sets covered a wide range of SOM and SMC values. Descriptive statistics of the calibration and validation data sets are given in Table 1.
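The screening and splitting steps can be sketched as follows. The example uses synthetic data with five planted outliers, and it flags samples whose squared Mahalanobis distance exceeds a chi-square quantile; that cutoff (16.27, approximately the 0.999 quantile for three variables) is a hypothetical choice, since the criterion used in the study is not stated. Function names and the random seed are likewise illustrative.

```python
import numpy as np


def mahalanobis_sq(X):
    """Squared Mahalanobis distance of each row from the sample mean."""
    X = np.asarray(X, dtype=np.float64)
    centered = X - X.mean(axis=0)
    inv_cov = np.linalg.pinv(np.cov(X, rowvar=False))
    return np.einsum("ij,jk,ik->i", centered, inv_cov, centered)


def random_split(indices, calibration_fraction=0.7, seed=0):
    """Shuffle indices and split them into calibration and validation parts."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(indices)
    n_cal = int(round(calibration_fraction * len(shuffled)))
    return shuffled[:n_cal], shuffled[n_cal:]


# Synthetic stand-in for 125 samples described by three image parameters.
rng = np.random.default_rng(42)
features = rng.normal(size=(125, 3))
planted = [3, 40, 77, 101, 120]
features[planted] = [[15, 0, 0], [0, 15, 0], [0, 0, 15], [-15, 0, 0], [0, -15, 0]]

CUTOFF = 16.27   # assumed: chi-square 0.999 quantile, 3 degrees of freedom
d2 = mahalanobis_sq(features)
flagged = np.flatnonzero(d2 > CUTOFF)
kept = np.flatnonzero(d2 <= CUTOFF)
cal_idx, val_idx = random_split(kept)
print(flagged.tolist(), len(kept), len(cal_idx), len(val_idx))
```

Removing five of 125 samples leaves 120, and a 70/30 split of 120 gives the 84 calibration and 36 validation images reported above.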

Table 1.

Descriptive statistics of the whole, calibration, and validation data set for soil organic matter (SOM) and soil moisture content (SMC).

Model development

The image color and texture-related features were used to develop predictive relationships against laboratory-measured SOM and SMC. A total of 22 models were developed under six broad types: (1) linear regression (LR), (2) regression trees, (3) SVMs, (4) GPR, (5) ensemble of trees, and (6) artificial neural network (ANN). Codes were written in MATLAB to run these models on the data sets, except for the cubist model, which was executed in R (version 3.5.3) using RStudio (RStudio Team 2015).

Model performance assessment

Several statistical parameters were computed to assess the accuracy of the models.

Coefficient of determination (R²): it represents the proportion of the total variation in the dependent variable that is explained by the model. Its value can vary from 0 to 1. Larger values imply higher prediction accuracy (eq. 5).
Root mean square error (RMSE): it represents the square root of the mean squared difference between the predicted and observed values. Lower values are desirable (eq. 6).
Lin's concordance correlation coefficient (LCCC): the LCCC was employed for model quality evaluation since it represents the fit of 1:1 line of the predicted and observed values. Also, because of it being unitless in nature, it is advantageous to compare different models of the same soil property and (or) comparison of models for different soil properties (Sorenson et al. 2017). Large values represent higher prediction accuracy (eq. 7).
Mean of the residuals (Bias): it is used to analyze the underfitting or overfitting of the model predictions. Value of bias = 0 implies unbiased predictions (eq. 8).
Ratio of performance to deviation (RPD): it is the ratio of standard deviation of observed or measured values to the standard error of prediction (Chang et al. 2001). The RPD values >2 are often considered to represent good model performance.
Ratio of performance to interquartile distance (RPIQ): it is the ratio of interquartile range of the observed values to the RMSE of prediction. The RPIQ takes into consideration both the variation of measured values and the prediction error, thereby being an indicator of model quality, which is more objective than the RMSE of prediction and, thus, it can be easily used for the comparison of different models. The greater the value of RPIQ, higher is the model's capacity to predict.

(5)

(6)

(7)

(8)

where N was the number of samples, Y_predicted was the predicted value, Y_observed was the observed value, and Ȳ_observed was the mean of the observed values.
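The statistics listed above can be computed as in the following sketch. Several choices are assumptions rather than the authors' definitions: R² is taken as 1 minus the ratio of residual to total sum of squares, bias as the mean of observed minus predicted values, the standard error of prediction in RPD is approximated by the RMSE, and quartiles use linear interpolation. The function name and the example values are hypothetical.

```python
import math
import statistics


def evaluate_model(observed, predicted):
    """Return R2, RMSE, LCCC, bias, RPD, and RPIQ for paired values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_o) ** 2 for o in observed)
    var_o = ss_tot / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted)) / n
    rmse = math.sqrt(ss_res / n)
    q1, _, q3 = statistics.quantiles(observed, n=4, method="inclusive")
    sd_o = math.sqrt(ss_tot / (n - 1))
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": rmse,
        "LCCC": 2.0 * cov / (var_o + var_p + (mean_o - mean_p) ** 2),
        "Bias": mean_o - mean_p,   # mean of (observed - predicted)
        "RPD": sd_o / rmse if rmse > 0 else math.inf,
        "RPIQ": (q3 - q1) / rmse if rmse > 0 else math.inf,
    }


# Hypothetical observed and predicted values (%), for illustration only.
stats = evaluate_model([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 37.0])
print(stats)
```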

All these statistics were tested on both the calibration and validation data sets. At first, all the 22 extracted features (color and texture characteristics) were treated as predictor variables and were used to develop the models for SOM (%) and SMC (%) prediction. Later, a subset of six optimum predictors (optimization described in the next section) were used for model development. A 10-fold cross-validation was performed on the calibration data set as a means of internal validation (IV). Models were also externally validated using an independent validation data set. The residuals (difference between observed and predicted) were also tested for the presence of normality and the absence of autocorrelation and were found satisfactory for regression relations development.
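The partitioning behind a 10-fold cross-validation can be sketched as below. The example only generates the fold indices for the 84 calibration samples; fitting a model on each training part and scoring it on the held-out part is left out. The shuffling scheme, seed, and function name are assumptions for illustration.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, held_out_indices) for each of k folds."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]   # near-equal fold sizes
    for i, held_out in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, held_out


splits = list(k_fold_indices(84, k=10))
print([len(held_out) for _, held_out in splits])
```

Each sample is held out exactly once, so every calibration image contributes to the internal validation statistics.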

Variable screening to identify optimum predictors

To study the relative importance of predictor variables in predicting SOM and SMC, a z-score was defined from six different analyses: analysis of variance (ANOVA), random forest (RF), cubist, principal component analysis (PCA), Vtreat variable reduction, and correlation analysis. Under each analysis, all the image parameters (predictor variables) were rated on a scale of 0–100 (100 being most important), and the ratings were then averaged to get a z-score. Some analysis techniques, such as cubist and RF, provide variable importance on a scale of 0–100 by default. Correlation analysis was simply the 1:1 correlation between the dependent variable and each predictor. The absolute values of the correlation coefficients were first calculated and were scaled to 0–100, with the lowest and highest absolute correlation coefficients being assigned a value of 0 and 100, respectively. For ANOVA, the p value for each predictor variable was scaled to 0–100 with 0 and 100 being assigned to the lowest and highest p value, respectively. “Vtreat” is an R package for assessing variable importance/significance. The values of R² were scaled to 0–100, with the lowest and highest value being assigned 0 and 100, respectively. These 0–100 scaled values were then averaged across the analyses to give a final value between 0 and 100, which was named the z-score for that predictor. The top six predictor variables were then identified as the optimum predictors for both SOM and SMC. All the models were then developed using these six predictors as independent variables and the model performance statistics were recalculated.
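The scaling and averaging procedure can be sketched as follows. The example applies min-max scaling to 0–100 within each analysis and then averages across analyses; the predictor names ("a" to "d"), the two analyses, and all scores are hypothetical placeholders, not values from this study.

```python
def scale_0_100(scores):
    """Min-max scale a dict of raw importance scores to the range 0-100."""
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {name: 0.0 for name in scores}
    return {name: 100.0 * (value - low) / (high - low)
            for name, value in scores.items()}


def combined_importance(per_analysis):
    """Average the 0-100 scaled scores of each predictor across analyses."""
    scaled = [scale_0_100(scores) for scores in per_analysis.values()]
    names = list(scaled[0])
    return {name: sum(s[name] for s in scaled) / len(scaled) for name in names}


def top_predictors(importance, k=6):
    """Return the k predictors with the highest combined score."""
    return sorted(importance, key=importance.get, reverse=True)[:k]


# Hypothetical raw scores from two analyses for four predictors.
raw = {
    "analysis_1": {"a": 10.0, "b": 20.0, "c": 30.0, "d": 50.0},
    "analysis_2": {"a": 0.2, "b": 0.8, "c": 0.5, "d": 0.4},
}
importance = combined_importance(raw)
best_two = top_predictors(importance, k=2)
print(importance, best_two)
```

In the study the same idea is applied to six analyses and 22 predictors, with the six highest-scoring predictors retained.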

Results

Descriptive statistics of soil properties

Table 2 presents the descriptive statistics for SOM, SMC, and the image color features, texture features, and derived indices. The soil properties and image parameters showed a high degree of variation, with the coefficient of variation (CV) ranging from 10.14% to 168.20%. The SOM content varied between 3.30% and 62.70% with a mean of 18.44% and a standard deviation of 17.23%. The SMC varied between 0.00% and 119.60% with a mean of 25.16% and a standard deviation of 27.48%. With an acceptable approximation, all image parameters except mean H, energy, and RI were normally distributed (kurtosis approximately between −3 and 3). The RI had the largest CV (168.20%), whereas homogeneity varied the least (CV of 10.14%). The high variability of SOM and SMC presented an opportunity to test the prediction capability of the developed models.

Table 2.

Descriptive statistics of soil organic matter (SOM), soil moisture content (SMC), and soil color measurements.

Linear correlation between SOM, SMC, and soil color

Several soil color parameters showed high correlations with SMC, although comparatively weaker correlations were observed with SOM (Fig. 8). Soil moisture content showed a high negative correlation with mean gray with a correlation coefficient of –0.85. In addition, SMC was also negatively correlated with mean B (–0.84), mean G (–0.84), and mean R (–0.84). SOM content was negatively correlated with median S values (–0.65) followed by SI (–0.62) and mean S (–0.54). SMC was weakly correlated with SI (–0.06) while SOM was weakly correlated with RI (0.16). In general, the reflection intensity decreased with the increase in organic matter and moisture content. Significant correlation was also observed among color and texture parameters to some extent.

Fig. 8.

Correlation plot for soil organic matter (SOM), soil moisture content (SMC), color space model parameters, and indices derived from them. RI, redness index; CI, coloration index; HI, hue index; SI, saturation index. [Color online]

Identification of optimum predictors

To identify which explanatory variables were most important for the prediction of SOM and SMC, radial plots from the six different analyses were studied (Figs. 9–12). Color features were more important than textural features in SOM prediction. Also, the impact of mean values on SOM prediction accuracy was greater than that of median values. In contrast, for SMC, the impact of median values and textural features on prediction accuracy was greater than that of mean values and color features.

Fig. 9.

Relative significance of each individual image parameter as a predictor variable for soil organic matter prediction corresponding to (a) analysis of variance (ANOVA), (b) random forest (RF), (c) principal component analyses (PCA), (d) cubist, (e) Vtreat, and (f) correlation. [Color online]

Saturation index was the most important variable for SOM prediction followed by mean H, median R, mean R, mean V, and median S. The least important variable was RI (Fig. 10).

Fig. 10.

z-score of each individual image parameter representing its contribution toward soil organic matter prediction. RI, redness index; CI, coloration index; HI, hue index; SI, saturation index. [Color online]

Fig. 11.

Relative significance of each individual image parameter as a predictor variable for soil moisture content prediction corresponding to (a) analysis of variance (ANOVA), (b) random forest (RF), (c) principal component analyses (PCA), (d) cubist, (e) Vtreat, and (f) correlation. [Color online]

For SMC, contrast was the most important predictor variable followed by median B, median R, mean B, homogeneity, and energy. The least important variable was median S (Fig. 12).

Fig. 12.

z-score of each individual image parameter representing its contribution toward soil moisture content prediction. RI, redness index; CI, coloration index; HI, hue index; SI, saturation index. [Color online]

Predictive accuracy of the models

Models developed with 22 and 6 image color- and texture-related features were calibrated and validated against laboratory-measured SOM and SMC. Descriptive regression statistics of the predicted vs. laboratory-measured values of soil properties are presented in Tables 3 and 5 for SOM using 22 and 6 predictor variables, respectively, and Tables 4 and 6 for SMC using 22 and 6 predictor variables, respectively.
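The statistics reported in Tables 3–6 (R², RMSE, LCCC, bias, RPD, and RPIQ) can be sketched as follows. This is a minimal illustration using commonly used definitions, which are assumptions here: R² as the squared Pearson correlation, bias as mean predicted minus mean observed, RPD as the sample standard deviation of the observations divided by RMSE, and RPIQ as their interquartile range divided by RMSE. The function name and sample values are hypothetical, and the definitions used in the study may differ in detail.

```python
import math
import statistics

def validation_metrics(observed, predicted):
    """Agreement statistics between observed and predicted values."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Population variances and covariance (divide by n), as in Lin's coefficient
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted)) / n
    rmse = math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)
    # Quartiles of the observed values; the quartile method is an assumption
    q1, _, q3 = statistics.quantiles(observed, n=4)
    # Note: a perfect fit (rmse == 0) or constant input would divide by zero
    return {
        "r2": cov ** 2 / (var_o * var_p),  # squared Pearson correlation
        "rmse": rmse,
        "lccc": 2 * cov / (var_o + var_p + (mean_o - mean_p) ** 2),
        "bias": mean_p - mean_o,  # sign convention is an assumption
        "rpd": statistics.stdev(observed) / rmse,
        "rpiq": (q3 - q1) / rmse,
    }

# Hypothetical values for illustration only (not data from the study)
observed = [10.0, 20.0, 30.0, 40.0, 50.0]
predicted = [12.0, 18.0, 33.0, 39.0, 53.0]
for name, value in validation_metrics(observed, predicted).items():
    print(f"{name}: {value:.2f}")
```

Because RPD and RPIQ scale the RMSE by the spread of the observations, they allow comparison between properties with different ranges, such as SOM and SMC.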

Table 3.

Accuracy of different models for the prediction of soil organic matter in the calibration and validation data sets using 22 predictor variables.

Table 4.

Accuracy of different models for the prediction of soil moisture content in the calibration and validation data sets using 22 predictor variables.

Prediction of SOM using 22 predictor variables

10-fold cross (internal) validation

In general, the GPR-based models yielded higher predictive accuracy, while the LR-based models were the least accurate for SOM prediction. The results (Table 3) showed that the most accurate predictions were obtained using ANN, with R², RMSE, LCCC, bias, RPD, and RPIQ values of 0.86, 6.32%, 0.91, –0.13, 2.63, and 3.18, respectively. The second most accurate predictions were obtained using the exponential GPR model, with R², RMSE, LCCC, bias, RPD, and RPIQ values of 0.79, 7.60%, 0.88, –0.08, 2.19, and 2.65, respectively. The least accurate predictions were obtained using the interactions linear model, with R², RMSE, LCCC, bias, RPD, and RPIQ values of 0.01, 256.99%, 0.01, 6.78, 0.06, and 0.08, respectively (Table 3).
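The idea behind 10-fold internal validation can be sketched as follows: every sample is held out exactly once, and the error is computed from predictions made by a model that never saw that sample. This is an illustrative sketch only. The fold assignment, the one-predictor least-squares line used as a stand-in model, and all names and values are assumptions, not the workflow used in the study.

```python
import math
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return mean_y - slope * mean_x, slope

def cross_validated_rmse(x, y, k=10, seed=0):
    """RMSE of out-of-fold predictions from k-fold cross-validation."""
    squared_errors = []
    for held_out in k_fold_indices(len(x), k, seed):
        held = set(held_out)
        train = [i for i in range(len(x)) if i not in held]
        # Fit on the other k-1 folds, then predict only the held-out fold
        intercept, slope = fit_line([x[i] for i in train], [y[i] for i in train])
        squared_errors += [(intercept + slope * x[i] - y[i]) ** 2 for i in held_out]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical data for illustration only (not data from the study)
rng = random.Random(1)
gray = [float(v) for v in range(100, 160, 2)]  # 30 made-up mean gray values
smc = [90.0 - 0.5 * g + rng.gauss(0.0, 2.0) for g in gray]
print(f"10-fold CV RMSE: {cross_validated_rmse(gray, smc):.2f}")
```

External validation differs in that the test samples are set aside before any model fitting and are used only once, after calibration is complete.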

External validation

The GPR-based models yielded the highest accuracy, with an average R² higher than 0.70, followed by SVM and regression tree-based models. The squared exponential GPR model produced the best predictions, with an R² of 0.77, RMSE of 8.87%, LCCC of 0.85, bias of –0.68, RPD of 2.09, and RPIQ of 2.46 (Table 3). The performance of ANN on the test data set was comparable but somewhat weaker, with an R² of 0.74, RMSE of 9.88%, LCCC of 0.80, bias of –1.31, RPD of 1.88, and RPIQ of 2.21. The poorest predictions were produced by the interactions linear model, with R², RMSE, LCCC, bias, RPD, and RPIQ values of 0.14, 69.64%, 0.17, 4.19, 0.27, and 0.31, respectively (Table 3).

Prediction of SMC using 22 predictor variables

10-fold cross (internal) validation

SMC was predicted with higher accuracy than SOM. Except for the interactions linear model, all modeling approaches predicted SMC with high accuracy. The GPR approaches, however, outperformed the other models, with consistently higher accuracy. The exponential GPR model produced the best predictive relationship between SMC and soil color and texture features, with R² = 0.89, RMSE = 9.40%, LCCC = 0.93, bias = –0.10, RPD = 3.02, and RPIQ = 4.78 (Table 4). The interactions linear model exhibited poor predictive performance, with an R² of 0.00 and an RMSE of 308.97%, while the LCCC, bias, RPD, and RPIQ were –0.01, –47.91, 0.09, and 0.15, respectively (Table 4).

External validation

Excellent predictions were obtained with all the models (R² > 0.80 and RPD > 2) except for ANN, interactions linear, and PLSR. The exponential GPR model produced the best predictions, with an R² of 0.95, RMSE of 5.21%, LCCC of 0.96, bias of 1.12, RPD of 4.39, and RPIQ of 4.82 (Table 4). The worst predictions were produced by the interactions linear model, with R², RMSE, LCCC, bias, RPD, and RPIQ of 0.01, 75.87%, 0.06, –5.65, 0.30, and 0.33, respectively (Table 4).

Prediction of SOM using six predictor variables

10-fold cross (internal) validation

ANN produced the most accurate predictions, with R², RMSE, LCCC, bias, RPD, and RPIQ of 0.74, 8.51%, 0.84, –0.26, 1.96, and 2.36, respectively (Table 5). Overall, the ensemble tree and GPR modeling approaches predicted SOM with higher accuracy (R² > 0.65). The LR, regression tree, and SVM models yielded inconsistent prediction accuracy. The model calibrated using cubic SVM produced the worst predictions, with R², RMSE, LCCC, bias, RPD, and RPIQ of 0.47, 13.52%, 0.68, –1.15, 1.23, and 1.49, respectively (Table 5).

Table 5.

Accuracy of different models for the prediction of soil organic matter in the calibration and validation data sets using six predictor variables.

External validation

For the external validation data set, the ensemble tree models had an edge over the GPR-based models, with consistently higher accuracy. The most accurate predictions were obtained by cubist, with R², RMSE, LCCC, bias, RPD, and RPIQ of 0.74, 9.80%, 0.81, –2.02, 1.90, and 2.23, respectively (Table 5). Compared with the other methods, the RMSE of the cubist model was lower by approximately 9%–44%. The least accurate predictions were produced by the linear SVM model, with R², RMSE, LCCC, bias, RPD, and RPIQ of 0.51, 14.08%, 0.57, –4.78, 1.32, and 1.55, respectively (Table 5).

Prediction of SMC using six predictor variables

10-fold cross (internal) validation

The ensemble tree modeling approaches consistently yielded high prediction accuracy (R² > 0.82), while all the GPR-based models resulted in the same prediction accuracy. The other modeling approaches also predicted SMC with an average R² > 0.70. The most accurate predictions were obtained using boosted trees, with R², RMSE, LCCC, bias, RPD, and RPIQ values of 0.86, 10.86%, 0.91, –1.63, 2.61, and 4.13, respectively (Table 6). The next best predictions were produced by the cubist model, with R², RMSE, LCCC, bias, RPD, and RPIQ values of 0.85, 10.88%, 0.92, 0.67, 2.61, and 4.13, respectively. The least accurate predictions were from the coarse tree model, with R², RMSE, LCCC, bias, RPD, and RPIQ values of 0.69, 15.76%, 0.82, –0.54, 1.80, and 2.85, respectively (Table 6).

Table 6.

Accuracy of different models for the prediction of soil moisture content in the calibration and validation data sets using six predictor variables.

External validation

Overall, excellent predictions were obtained, with all the models showing RPD > 2 apart from a few (LR, robust linear, linear SVM, and coarse Gaussian SVM) that showed RPD < 2. Evaluating model performance with R² produced similar results, with validation R² ≥ 0.69 for all the calibrated models (Table 6). The RF model produced the best predictions, with an R² of 0.86, RMSE of 8.79%, LCCC of 0.91, bias of 1.73, RPD of 2.60, and RPIQ of 2.86 (Table 6). The poorest predictions were produced by the linear SVM model, with R², RMSE, LCCC, bias, RPD, and RPIQ values of 0.73, 12.29%, 0.81, 3.64, 1.86, and 2.04, respectively (Table 6).

Discussion

Identification of important predictors

Reasonable and similar prediction accuracies were obtained for both soil properties (i.e., SOM and SMC) after the removal of insignificant predictors, compared with those obtained using the full set of predictor variables. This suggested that many parameters explained only a very small portion of the variation and, hence, that their identification and removal was necessary. In addition, removing redundant parameters reduced the required processing power and time without compromising accuracy. Other researchers have shown that a large number of model inputs does not necessarily increase accuracy, and that the removal of redundant and ineffective parameters improves a model's performance in predicting SOM and SMC (Zhao et al. 2020; Fathololoumi et al. 2021b).

Model performance

The independently validated statistics also showed that both the SMC and SOM content of samples could be predicted with high accuracy using appropriate modeling techniques. Overall, SMC was predicted with greater accuracy than SOM content, and the choice of model had a clear impact on the prediction quality for both SMC and SOM content (Tables 4–6). This result is in line with some previous research (Paloscia et al. 2008; Fang et al. 2020; Zhou et al. 2020). Fu et al. (2020) quantified the effects of soil moisture on the relationship between SOM and the color parameters derived from mobile phone images using univariate LR models. In the present study, however, various neural network and machine learning algorithms were used to evaluate the relationship between image parameters and SOM and SMC, and the performance of these algorithms differed.

A closer look at the results showed that the GPR models demonstrated excellent prediction ability compared with all other models, for both the calibration and validation data sets (for both SOM and SMC with 22 predictor variables). Their superior performance can be attributed to the fact that GPR yields reliable responses to the provided input data, thereby increasing its reliability as a probabilistic model (Rasmussen and Nickisch 2010).

Artificial neural network models performed well during the SOM calibration phase (with both 22 and 6 predictor variables). However, they could not sustain this performance for SOM prediction during the validation phase. This could be because ANNs possess a predefined structure directed only toward minimizing errors on the training data set. Zhao et al. (2020) and Fathololoumi et al. (2020) reported similar results.

Apart from these, tree models provided satisfactory prediction accuracies: cubist and RF for the prediction of SOM and SMC, respectively, using six predictor variables during the validation phase, and boosted trees for SMC using six predictor variables during the calibration phase. Their success could be linked to several benefits of tree models (or rule-based decision methods), such as robustness to outliers, insensitivity to irrelevant predictors, the ability to handle data of varying measurement scale and level, and the intuitive structure of the models. Similar results have been reported by Heung et al. (2016), Dharumarajan et al. (2017), and Hajdu et al. (2018).

The interactions linear model exhibited the poorest performance when 22 predictor variables were used, for both the calibration and validation data sets and for both soil properties. When six predictor variables were used, its performance was relatively better. Closer inspection of the structure of the developed model showed that using 22 predictor variables resulted in a very large number of model parameters (interaction terms), compared with far fewer terms when only six predictor variables were used.
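The growth in the number of terms can be made concrete with a short count. The sketch below assumes that an interactions linear model contains an intercept, one linear term per predictor, and one product term for every pair of predictors; the function name is hypothetical.

```python
from math import comb

def interaction_model_terms(n_predictors):
    """Coefficients in a model with intercept, main effects, and pairwise products."""
    return 1 + n_predictors + comb(n_predictors, 2)

for p in (22, 6):
    print(f"{p} predictors -> {interaction_model_terms(p)} coefficients")
# 22 predictors -> 254 coefficients
# 6 predictors -> 22 coefficients
```

Under this assumption, the 22-predictor model has more than ten times as many coefficients to estimate as the six-predictor model, which makes it far more prone to overfitting a limited number of samples.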

Linear SVM showed poor prediction ability during the validation phase for SOM and SMC using six predictors. This is likely because a linear SVM does not yield reasonable results when the relationship between the predictors and the response is not linear. This issue is dealt with by choosing a suitable kernel, which is why the other types of SVM used in this study performed somewhat better, although not exceptionally well. In regression modeling, the use of the optimal number of predictor variables is very important. To reduce processing volume and fieldwork, the ideal is to use the fewest predictor variables that achieve the highest modeling accuracy. In this study, the number of variables was reduced from 22 to 6, provided that the modeling accuracy did not change significantly. This shows that these six variables were the most important and effective parameters in the modeling process. Although the modeling accuracy did not change significantly, the processing volume was significantly reduced.

Overall, the nonlinear models performed better than the linear ones, from which it was inferred that a nonlinear relationship exists between SOM, SMC, and the image parameters. The efficiency of nonlinear models such as RF and cubist for SOM and SMC prediction has been shown in other studies (Taghizadeh-Mehrjardi et al. 2020; Fathololoumi et al. 2021a; Zeraatpisheh et al. 2022).

Conclusions

SMC and SOM are known to influence soil color; soil high in humus appears dark black to brown, and with high moisture content even 5% SOM is sufficient for a darker appearance. The darker appearance with higher moisture content is attributed to higher light absorbance. However, long-term higher moisture content also affects soil color by enhancing anaerobic conditions and affecting the state of iron oxides in soil (Jackson 2008). This study reports the calibration and validation of 22 supervised regression and machine learning algorithms to evaluate the potential of soil images captured by a digital camera to predict SOM and SMC. These models developed prediction relationships between SOM and SMC (measured in the laboratory) and various color- and texture-related features derived from images. Color parameters demonstrated high correlation with both SOM and SMC. Overall, the fact that SMC was predicted with greater accuracy than SOM implied that SMC exerts a considerable influence in imparting color to the soil. Results revealed a satisfactory agreement between the image parameters and the laboratory-measured SOM (R² and RMSE of 0.74 and 9.80% using cubist) and SMC (R² and RMSE of 0.86 and 8.79% using RF) for the validation data set using six predictor variables. Overall, GPRs and tree models (cubist, RF, and boosted trees) best captured and explained the nonlinear relationships between SOM, SMC, and image parameters for this study. Soil color is also affected by temperature, climate, and mineral content; therefore, more research involving real field conditions across different soil types and climatic regions is needed to establish a standard methodology for predicting SMC, SOM, and other soil properties using digital images. The advantage of this methodology over traditional methods would be rapid, environmentally safe estimation of soil properties at a much reduced cost.
Taken together, digital image-based soil characterization provides an opportunity for proximal soil sensing.

Author contributions

P.T. contributed to formal analysis, investigation, data curation, and writing (original draft preparation); H.B.V. was responsible for writing, review and editing, and project administration; S.F. contributed to review and editing; P.D. contributed to resources, supervision, and writing (review and editing); A.B. was responsible for conceptualization, funding acquisition, investigation, methodology, project administration, resources, validation, visualization, writing (review and editing). All authors have read and agreed to the published version of the manuscript.

Funding information

This research was funded by Ontario Ministry of Agriculture, Food and Rural Affairs: UofG-2016-2600 and Natural Sciences and Engineering Research Council of Canada: RGPIN-2014-04100.

References

1. Chang, C.-W., Laird, D.A., Mausbach, M.J., and Hurburgh, C.R. 2001. Near-infrared reflectance spectroscopy–principal components regression analyses of soil properties. Soil Sci. Soc. Am. J. 65: 480–490. https://doi.org/10.2136/sssaj2001.652480x

2. Chen, D., Chang, N., Xiao, J., Zhou, Q., and Wu, W. 2019. Mapping dynamics of soil organic matter in croplands with MODIS data and machine learning algorithms. Sci. Total Environ. 669: 844–855. https://doi.org/10.1016/j.scitotenv.2019.03.151. pmid: 30897441

3. Chukalla, A.D., Krol, M.S., and Hoekstra, A.Y. 2015. Green and blue water footprint reduction in irrigated agriculture: effect of irrigation techniques, irrigation strategies and mulching. Hydrol. Earth Syst. Sci. 19: 4877–4891. https://doi.org/10.5194/hess-19-4877-2015

4. De Maesschalck, R., Jouan-Rimbaud, D., and Massart, D.L. 2000. The Mahalanobis distance. Chemom. Intell. Lab. Syst. 50: 1–18. https://doi.org/10.1016/s0169-7439(99)00047-7

5. Dharumarajan, S., Hegde, R., and Singh, S. 2017. Spatial prediction of major soil properties using random forest techniques - A case study in semi-arid tropics of South India. Geoderma Reg. 10: 154–162. https://doi.org/10.1016/j.geodrs.2017.07.005

6. dos Santos, J.F., Silva, H.R., Pinto, F.A., and Assis, I.R.d. 2016. Use of digital images to estimate soil moisture. Rev. Bras. de Eng. Agrícola e Ambient. 20: 1051–1056.

7. Fang, L., Zhan, X., Yin, J., Liu, J., Schull, M., Walker, J.P., et al. 2020. An intercomparison study of algorithms for downscaling SMAP radiometer soil moisture retrievals. J. Hydrometeorol. 21: 1761–1775. https://doi.org/10.1175/jhm-d-19-0034.1

8. Fathololoumi, S., Vaezi, A.R., Firozjaei, M.K., and Biswas, A. 2021a. Quantifying the effect of surface heterogeneity on soil moisture across regions and surface characteristic. J. Hydrol. 596: 126132. https://doi.org/10.1016/j.jhydrol.2021.126132

9. Fathololoumi, S., Vaezi, A.R., Alavipanah, S.K., Ghorbani, A., and Biswas, A. 2020. Comparison of spectral and spatial-based approaches for mapping the local variation of soil moisture in a semi-arid mountainous area. Sci. Total Environ. 138319. https://doi.org/10.1016/j.scitotenv.2020.138319. pmid: 32408464

10. Fathololoumi, S., Vaezi, A.R., Alavipanah, S.K., Ghorbani, A., Saurette, D., and Biswas, A. 2021b. Effect of multi-temporal satellite images on soil moisture prediction using a digital soil mapping approach. Geoderma, 385: 114901. https://doi.org/10.1016/j.geoderma.2020.114901

11. Feki, M., Ravazzani, G., Ceppi, A., and Mancini, M. 2018. Influence of soil hydraulic variability on soil moisture simulations and irrigation scheduling in a maize field. Agric. Water Manag. 202: 183–194. https://doi.org/10.1016/j.agwat.2018.02.024

12. Fu, Y., Taneja, P., Lin, S., Ji, W., Adamchuk, V., Daggupati, P., and Biswas, A. 2020. Predicting soil organic matter from cellular phone images under varying soil moisture. Geoderma, 361: 114020. https://doi.org/10.1016/j.geoderma.2019.114020

13. Gholizadeh, A., Saberioon, M., Rossel, R.A.V., Boruvka, L., and Klement, A. 2020. Spectroscopic measurements and imaging of soil colour for field scale estimation of soil organic carbon. Geoderma, 357: 113972. https://doi.org/10.1016/j.geoderma.2019.113972

14. Gill, M.K., Asefa, T., Kemblowski, M.W., and McKee, M. 2006. Soil moisture prediction using support vector machines 1. J. Am. Water Resour. Assoc. 42: 1033–1046. https://doi.org/10.1111/j.1752-1688.2006.tb04512.x

15. Gómez-Robledo, L., López-Ruiz, N., Melgosa, M., Palma, A.J., Capitán-Vallvey, L.F., and Sánchez-Marañón, M. 2013. Using the mobile phone as Munsell soil-colour sensor: an experiment under controlled illumination conditions. Comput. Electron. Agric. 99: 200–208.

16. Gonzalez, R.C., Woods, R.E., and Eddins, S.L. 2004. Digital image processing using MATLAB. Pearson Education India.

17. Gregory, S.D., Lauzon, J.D., O'Halloran, I.P., and Heck, R.J. 2006. Predicting soil organic matter content in southwestern Ontario fields using imagery from high-resolution digital cameras. Can. J. Soil Sci. 86: 573–584. https://doi.org/10.4141/s05-043

18. Hajdu, I., Yule, I., and Dehghan-Shear, M.H. 2018. Modelling of near-surface soil moisture using machine learning and multi-temporal sentinel 1 images in New Zealand. IGARSS 2018-2018 IEEE International Geoscience and Remote Sensing Symposium. IEEE. pp. 1422–1525.

19. Heung, B., Ho, H.C., Zhang, J., Knudby, A., Bulmer, C.E., and Schmidt, M.G. 2016. An overview and comparison of machine-learning techniques for classification purposes in digital soil mapping. Geoderma, 265: 62–77. https://doi.org/10.1016/j.geoderma.2015.11.014

20. Hummel, J.W., Sudduth, K.A., and Hollinger, S.E. 2001. Soil moisture and organic matter prediction of surface and subsurface soils using an NIR soil sensor. Comput. Electron. Agric. 32: 149–165. https://doi.org/10.1016/s0168-1699(01)00163-6

21. Jackson, R.S. 2008. Wine science: principles and applications. Academic Press, New York.

22. Ji, W., Adamchuk, V.I., Biswas, A., Dhawale, N.M., Sudarsan, B., Zhang, Y., et al. 2016. Assessment of soil properties in situ using a prototype portable MIR spectrometer in two agricultural fields. Biosyst. Eng. 152: 14–27. https://doi.org/10.1016/j.biosystemseng.2016.06.005

23. King, D.J. 1995. Airborne multispectral digital camera and video sensors: a critical review of system designs and applications. Can. J. Remote Sens. 21: 245–273. https://doi.org/10.1080/07038992.1995.10874621

24. Kotlar, A.M., Iversen, B.V., and de Jong van Lier, Q. 2019. Evaluation of parametric and nonparametric machine-learning techniques for prediction of saturated and near-saturated hydraulic conductivity. Vadose Zone J. 18. https://doi.org/10.2136/vzj2018.07.0141

25. Kumar, S., and Lal, R. 2011. Mapping the organic carbon stocks of surface soils using local spatial interpolator. J. Environ. Monit. 13: 3128–3135. https://doi.org/10.1039/c1em10520e. pmid: 22009220

26. Kumar, T., and Verma, K. 2010. A theory based on conversion of RGB image to gray image. Int. J. Comp. Appl. 7: 7–10. https://doi.org/10.5120/777-1099

27. Lazzaretti, B.P., Silva, L.S.d., Drescher, G.L., Dotto, A.C., Britzke, D., and Nörnberg, J.L. 2020. Prediction of soil organic matter and clay contents by near-infrared spectroscopy-NIRS. Ciênc. Rural, 50. https://doi.org/10.1590/0103-8478cr20190506

28. Levin, N., Ben-Dor, E., and Singer, A. 2005. A digital camera as a tool to measure colour indices and related properties of sandy soils in semiarid environments. Int. J. Remote Sens. 26: 5475–5492. https://doi.org/10.1080/01431160500099444

29. Li, C., Zhuang, Y., Frolking, S., Galloway, J., Harriss, R., Moore, B., III, et al. 2003. Modeling soil organic carbon change in croplands of China. Ecol. Appl. 13: 327–336. https://doi.org/10.1890/1051-0761(2003)013%5b0327:msocci%5d2.0.co;2

30. Li, Q., Yue, T., Wang, C.-Q., Zhang, W.-J., Yu, Y., Li, B., et al. 2013. Spatially distributed modeling of soil organic matter across China: an application of artificial neural network approach. Catena, 104: 210–218. https://doi.org/10.1016/j.catena.2012.11.012

31. Lillesand, T., Kiefer, R.W., and Chipman, J. 2015. Remote sensing and image interpretation. John Wiley & Sons, Hoboken, NJ.

32. Matei, O., Rusu, T., Petrovan, A., and Mihuţ, G. 2017. A data mining system for real time soil moisture prediction. Proc. Eng. 181: 837–844. https://doi.org/10.1016/j.proeng.2017.02.475

33. MathWorks, I. 2017. MATLAB 2017b. The MathWorks Inc. Natick, MA.

34. Meersmans, J., De Ridder, F., Canters, F., De Baets, S., and Van Molle, M. 2008. A multiple regression approach to assess the spatial distribution of soil organic carbon (SOC) at the regional scale (Flanders, Belgium). Geoderma, 143: 1–13. https://doi.org/10.1016/j.geoderma.2007.08.025

35. Nocita, M., Stevens, A., Noon, C., and van Wesemael, B. 2013. Prediction of soil organic carbon for different levels of soil moisture using VisNIR spectroscopy. Geoderma, 199: 37–42. https://doi.org/10.1016/j.geoderma.2012.07.020

36. Paloscia, S., Pampaloni, P., Pettinato, S., and Santi, E. 2008. A comparison of algorithms for retrieving soil moisture from ENVISAT/ASAR images. IEEE Trans. Geosci. Remote Sens. 46: 3274–3284. https://doi.org/10.1109/tgrs.2008.920370

37. Persson, M. 2005. Estimating surface soil moisture from soil color using image analysis. Vadose Zone J. 4: 1119–1122. https://doi.org/10.2136/vzj2005.0023

38. Rasmussen, C.E., and Nickisch, H. 2010. Gaussian processes for machine learning (GPML) toolbox. J. Mach. Learn. Res. 11: 3011–3015.

39. Rienzi, E.A., Mijatovic, B., Mueller, T.G., Matocha, C.J., Sikora, F.J., and Castrignanò, A. 2014. Prediction of soil organic carbon under varying moisture levels using reflectance spectroscopy. Soil Sci. Soc. Am. J. 78: 958–967. https://doi.org/10.2136/sssaj2013.09.0408

40. Rodionov, A., Pätzold, S., Welp, G., Damerow, L., and Amelung, W. 2014. Sensing of soil organic carbon using visible and near-infrared spectroscopy at variable moisture and surface roughness. Soil Sci. Soc. Am. J. 78: 949–957. https://doi.org/10.2136/sssaj2013.07.0264

41. Rossel, R.A.V., Fouad, Y., and Walter, C. 2008. Using a digital camera to measure soil organic carbon and iron contents. Biosyst. Eng. 100: 149–159. https://doi.org/10.1016/j.biosystemseng.2008.02.007

42. Sakti, M.B.G., Komariah, Ariyanto, D.P., and Sumani. 2018. Estimating soil moisture content using red-green-blue imagery from digital camera. IOP Conf. Ser. Earth Environ. Sci. 200: 012004. https://doi.org/10.1088/1755-1315/200/1/012004

43. Schulte, E.E., and Hopkins, B.G. 1996. Estimation of soil organic matter by weight loss-on-ignition. In Soil organic matter: analysis and interpretation. Edited by F.R. Magdoff. SSSA Spec. Pub. No. 46. SSSA, Madison. pp. 21–31.

44. Sorenson, P.T., Small, C., Tappert, M.C., Quideau, S.A., Drozdowski, B., Underwood, A., and Janz, A. 2017. Monitoring organic carbon, total nitrogen, and pH for reclaimed soils using field reflectance spectroscopy. Can. J. Soil Sci. 97: 241–248. https://doi.org/10.1139/cjss-2016-0116

45. Sudarsan, B., Ji, W., Biswas, A., and Adamchuk, V. 2016. Microscope-based computer vision to characterize soil texture and soil organic matter. Biosyst. Eng. 152: 41–50. https://doi.org/10.1016/j.biosystemseng.2016.06.006

46. Swetha, R., Bende, P., Singh, K., Gorthi, S., Biswas, A., Li, B., et al. 2020. Predicting soil texture from smartphone-captured digital images and an application. Geoderma, 376: 114562. https://doi.org/10.1016/j.geoderma.2020.114562

47. Taghizadeh-Mehrjardi, R., Nabiollahi, K., and Kerry, R. 2016. Digital mapping of soil organic carbon at multiple depths using different data mining techniques in Baneh region, Iran. Geoderma, 266: 98–110. https://doi.org/10.1016/j.geoderma.2015.12.003

48. Taghizadeh-Mehrjardi, R., Schmidt, K., Amirian-Chakan, A., Rentschler, T., Zeraatpisheh, M., Sarmadian, F., et al. 2020. Improving the spatial prediction of soil organic carbon content in two contrasting climatic regions by stacking machine learning models and rescanning covariate space. Remote Sens. 12: 1095. https://doi.org/10.3390/rs12071095

49. Taneja, P., Vasava, H.K., Daggupati, P., and Biswas, A. 2021. Multialgorithm comparison to predict soil organic matter and soil moisture content from cell phone images. Geoderma, 385: 114863. https://doi.org/10.1016/j.geoderma.2020.114863

50. Team R. 2015. RStudio: integrated development for R. Vol. 42. RStudio, Inc., Boston, MA. pp. 14. Available from http://www.rstudio.com.

51. Were, K., Bui, D.T., Dick, Ø.B., and Singh, B.R. 2015. A comparative assessment of support vector regression, artificial neural networks, and random forests for predicting and mapping soil organic carbon stocks across an Afromontane landscape. Ecol. Indic. 52: 394–403. https://doi.org/10.1016/j.ecolind.2014.12.028

52. Wu, C., Yang, Y., and Xia, J. 2017. A simple digital imaging method for estimating black-soil organic matter under visible spectrum. Arch. Agron. Soil Sci. 63: 1346–1354. https://doi.org/10.1080/03650340.2017.1280728

53. Wu, C., Xia, J., Yang, H., Yang, Y., Zhang, Y., and Cheng, F. 2018. Rapid determination of soil organic matter content based on soil colour obtained by a digital camera. Int. J. Remote Sens. 39: 6557–6571. https://doi.org/10.1080/01431161.2018.1460511

54. Yang, R.-M., Zhang, G.-L., Liu, F., Lu, Y.-Y., Yang, F., Yang, F., et al. 2016. Comparison of boosted regression tree and random forest models for mapping topsoil organic carbon concentration in an alpine ecosystem. Ecol. Indic. 60: 870–878. https://doi.org/10.1016/j.ecolind.2015.08.036

55. Zeraatpisheh, M., Garosi, Y., Owliaie, H.R., Ayoubi, S., Taghizadeh-Mehrjardi, R., Scholten, T., and Xu, M. 2022. Improving the spatial prediction of soil organic carbon using environmental covariates selection: a comparison of a group of environmental covariates. Catena, 208: 105723. https://doi.org/10.1016/j.catena.2021.105723

56. Zhang, F., Li, C., Wang, Z., and Wu, H. 2006. Modeling impacts of management alternatives on soil carbon storage of farmland in Northwest China. Biogeosciences, 3: 451–466. https://doi.org/10.5194/bg-3-451-2006

57. Zhao, Z., Yang, Q., Sun, D., Ding, X., and Meng, F.-R. 2020. Extended model prediction of high-resolution soil organic matter over a large area using limited number of field samples. Comput. Electron. Agric. 169: 105172. https://doi.org/10.1016/j.compag.2019.105172

58. Zhou, T., Geng, Y., Chen, J., Pan, J., Haase, D., and Lausch, A. 2020. High-resolution digital mapping of soil organic carbon and soil total nitrogen using DEM derivatives, Sentinel-1 and Sentinel-2 data based on machine learning algorithms. Sci. Total Environ. 729: 138244. https://doi.org/10.1016/j.scitotenv.2020.138244. pmid: 32498148

59. Zhu, Y., Wang, Y., Shao, M., and Horton, R. 2011. Estimating soil water content from surface digital image gray level measurements under visible spectrum. Can. J. Soil Sci. 91: 69–76. https://doi.org/10.4141/cjss10054

Perry Taneja, Hiteshkumar Bhogilal Vasava, Solmaz Fathololoumi, Prasad Daggupati, and Asim Biswas "Predicting soil organic matter and soil moisture content from digital camera images: comparison of regression and machine learning approaches," Canadian Journal of Soil Science 102(3), 767-784, (31 March 2022). https://doi.org/10.1139/cjss-2021-0133

Received: 16 September 2021; Accepted: 26 February 2022; Published: 31 March 2022

ARTICLE SOURCE

Canadian Journal of Soil Science
Vol. 102 • No. 3
September 2022

KEYWORDS

caractérisation du sol

computer vision

couleur et texture de l’image

cubist

digital camera images

forêt d’arbres décisionnels
