Global positioning system (GPS) collars are prone to locational error and missed fixes caused by vegetation and topography, meaning that locational error may be greater, or fix success lower, in certain habitats. These forms of error can lead to bias associated with data loss or censoring. The goals of this paper were to: 1) estimate resource selection functions using logistic regression to map probability of acquisition (Pacq) of a GPS location and subsequent censoring of locational error in relation to landscape features and 2) develop a spatially-explicit map of weighting factors across the landscape to avoid over- or underestimating resource selection. Female mule deer Odocoileus hemionus were used as a case example and to validate maps. Locational error and Pacq were influenced by vegetation and topography, thus necessitating a means to weight the data. Applying logistic regression to quantify Pacq allowed an easy and straightforward approach to mapping Pacq and subsequently, weighting factors (weight = 1/Pacq). Weighting landscape characteristics improved validation of deer-occurrence maps compared to using the original, unweighted landscape values. Using the best validating deer-occurrence map, we found that 87.5-90.2% of locations (N = 1,043) from an independent sample of deer (N = 4) occurred within the highest probability of use bin (∼ 20% of the landscape); 95.4-96.9% of independent locations occurred within the two highest probability of use bins (∼ 40% of the landscape). By accounting for, and modeling, missed GPS fixes and locational error, we improved the predictive ability of maps based on an independent sample of deer. Without correction (i.e. weighting) factors, the importance of habitat types and terrain features may be over- or underestimated, which could have serious consequences when interpreting resource selection by animals and developing management recommendations.
Varying habitat types can impose analytical limitations on the data when missing global positioning system (GPS) locations are not random, meaning that probability of acquisition (Pacq) of a GPS location may be lower in certain habitats, resulting in loss of data (Frair et al. 2004, Nielson et al. 2009). The loss of data then creates a situation where selection of resources by animals can be underestimated for habitats where fix success is lower (Dussault et al. 1999). Most often, a GPS fix is not collected because terrain, slope or vegetation characteristics can obscure the amount of available sky between satellites and the GPS receiver incorporated into the collar of the animal (D'Eon et al. 2002, Frair et al. 2004, Cain et al. 2005, Sager-Fradkin et al. 2007, Jiang et al. 2008, Bourgoin et al. 2009), and is influenced further by activity of the animal (Graves & Waller 2006, Heard et al. 2008). Thus, researchers are continuously seeking methods to reduce this form of bias before conducting resource selection analyses.
Concomitantly, locational error (distance between the estimated GPS location and true GPS location) and quality of fix are also forms of potential bias inherent in the collection of GPS data. Landscape features can influence locational error and quality of fix (Dussault et al. 1999, Girard et al. 2002), and these forms of potential bias also can pose limitations on analytical aspects of the data and the interpretation of results. The two primary measures of fix quality are positional dilution of precision (PDOP; measure of the quality of satellite geometry and an overall estimate of location error precision (Moen et al. 1997)) and whether the location was a two-dimensional (2D; GPS location obtained using three satellites) vs three-dimensional (3D; GPS location obtained using ≥ 4 satellites) location. To overcome issues of locational error, which is influenced primarily by quality of fix (D'Eon et al. 2002, Lewis et al. 2007), some researchers screen or omit locations based on type of fix (2D vs 3D) or PDOP to reduce these forms of bias (Moen et al. 1996, Dussault et al. 2001, D'Eon & Delparte 2005, Lewis et al. 2007). Screening data has pros and cons. Screening locations can reduce positional error; however, it also may lead to significant reductions of data, which then can introduce additional biases into the analysis (Lewis et al. 2007). Therefore, methods that seek to account for biases in the data, while maintaining sample sizes, are preferred over data screening techniques. This concern motivated the research presented here which aimed to advance methods by which researchers may leverage data from GPS collars placed at known locations throughout a study area (i.e. test collars). Test collars were used to correct for potential biases associated with missed fixes and locational error, thus allowing all empirical data from that study area to be incorporated in subsequent analyses and spatial applications.
Ignoring missed GPS fixes, or not accounting for locational error, can lead to an incorrect interpretation of results because relationships (positive or negative) and magnitude of coefficient estimates, and statistical significance (P-value) can be affected (Nielson et al. 2009). However, in most instances, Pacq, locational error and/or fix quality (dependent variables) can be accounted for before or during analysis using correction or weighting factors when a predictable relationship exists between landscape features (independent variables) and the dependent variables (D'Eon et al. 2002, Frair et al. 2004, Sager-Fradkin et al. 2007, Wells et al. 2011). Based on available data, we predicted that Pacq and locational error were: 1) related to landscape features in a predictable fashion and 2) could be modeled and accounted for to reduce bias in the data before analysis of resource selection. We set forth to use inverse weighting (i.e. the influence of each GPS location is weighted by the inverse probability of having acquired the location, based on vegetation and topographic features of the underlying raster image cell values; Frair et al. 2004) to correct for missed GPS fixes and account for locational error (modeled as a missed GPS fix). Our objectives were to: 1) determine how and what vegetation and topographic features influenced Pacq and locational error, 2) spatially depict Pacq (incorporating censored data due to locational error thresholds) using logistic regression to estimate resource selection functions, 3) use inverse weighting of Pacq (weight = 1/Pacq) for each cell to correct for bias in missed fixes and locational error, 4) use weighted geographic information system (GIS) layers for a case study of resource selection by a large ungulate (i.e. mule deer Odocoileus hemionus) and 5) validate weighted maps using an independent sample of mule deer (N = 4).
Material and methods
Our study area comprised ∼ 9,248 ha in southeastern Wyoming, USA, situated along the Wyoming-Colorado border. Most land (81%) was under private ownership (7,504 ha), and, to a lesser extent, under state (907 ha) or federal management (Bureau of Land Management; 837 ha). Land use was primarily rangeland grazed by cattle. No paved road occurred within our study area; only improved (use of heavy equipment to maintain natural or exotic material on road surface) and unimproved roads (heavy equipment was not used to maintain roads).
Topography was variable, ranging from flat to gently sloping grasslands, deep riparian gullies and steep rock outcrops (see Appendix I). Elevation ranged from 2,100 to 3,100 m a.s.l. and slope from 0 to 52°. Five broad classes of vegetation/landscape features occurred throughout the larger landscape that encompassed our study area. These included four vegetation classes (grassland, shrubland, riparian and forest) and one landscape feature (rock outcrop; Fig. 1). Predominant plant species included mountain big sagebrush Artemisia tridentata vaseyana, antelope bitterbrush Purshia tridentata, alderleaf mountain mahogany Cercocarpus montanus, skunkbush sumac Rhus trilobata, Saskatoon serviceberry Amelanchier alnifolia, quaking aspen Populus tremuloides, willow Salix spp., rabbitbrush Chrysothamnus spp., alder Alnus spp., narrowleaf cottonwood Populus angustifolia and lodgepole pine Pinus contorta. Average (1948-2010) minimum January temperature was 12.7°C, average maximum July temperature was 26.8°C and average annual snowfall was 123.2 cm (Available at: http://www.wrcc.dri.edu/; last accessed on 28 September 2011).
Capture and handling
Adult (≥ 2.5 years old) female mule deer (N = 19) were captured from 21 to 22 February 2011 using helicopter and net-gun capture techniques, which have been found safe for a wide range of ungulate species with low direct and post-capture mortalities (Webb et al. 2008). After capture, we manually restrained the legs of the deer and fitted them with blindfolds to reduce stress. We aged the deer according to tooth replacement and wear techniques (Robinette et al. 1957), measured morphometric traits and body condition, and collected blood for tests of pregnancy-specific protein B (BioTracking, LLC, Moscow, Idaho, USA). Deer were fitted with GPS collars (TGW-4583, Telonics, Inc., Mesa, Arizona, USA) and uniquely-numbered ear tags in each ear. GPS collars were equipped with Argos satellite uplink capabilities (available at: www.argos-system.org). Collars were programmed to collect one GPS location every three hours (i.e. eight locations/day). Data were transmitted via the Argos system every seven days. Wyoming Game and Fish Department approved animal capture and handling protocols (Chapter 33 Permit #33-796).
To determine the influence of vegetation and topographic variables on the probability of successfully acquiring a GPS location and on locational error (distance between true GPS location and estimated GPS location), we placed 20 test collars (same make, model and fix acquisition schedule as listed above) at known locations throughout our study area for 26 days (start date = 2 April 2011 at 00:00; end date = 27 April 2011 at 21:00). During this time, collars were capable of recording 4,160 locations. We distributed test collars across three vegetation types (grassland: N = 4, riparian: N = 5, shrubland: N = 3) and one landscape feature (rock outcrop: N = 8) with variable topography (i.e. across a range of elevations and slopes). The areas selected for test collar placement represented the range of values for each habitat type used (Appendix II) by mule deer on our study area during winter (21 February - 26 March), except for forest which was not used by deer at the commencement of this study. GPS collars were equipped with ARGOS uplink, which we used to delineate a boundary used by deer and within which test collars were placed randomly (which represented the range used by deer). The sample size of test collars placed in each vegetation or landscape was based on a qualitative assessment of deer locations around these features (i.e. more test collars were placed in habitats used most by deer, and in the habitats most likely to influence fix success and locational error).
Using a GIS, we developed a vegetation cover-type map using high-resolution (1.0-m) imagery obtained from the National Agriculture Imagery Program (NAIP; U.S. Department of Agriculture, Farm Service Agency, Salt Lake City, Utah, USA) and Feature Analyst® 5.0 (Visual Learning Systems, Inc. 2008) for ArcGIS® 10.0 (ESRI, Redlands, California, USA). Feature Analyst (FA) uses hierarchical learning and clutter removal algorithms based on imagery characteristics such as colour, size, shape, texture, pattern and spatial association. We conducted a supervised classification using delineated polygons of known vegetation type that were heads-up digitized by a researcher familiar with our study area. These polygons of known type were used as training polygons in FA. The true-colour and near-infrared bands were combined using FA, which resulted in four spectral bands (i.e. red, green, blue and near-infrared). We also specified the green spectral band be used to develop a texture band. We used a 10-m digital elevation model (DEM; available at: http://rockyweb.cr.usgs.gov/elevation/dpi_dem.html) to develop an elevation band, which finally resulted in six bands (i.e. four spectral bands, one texture band and one elevation band). FA uses a pixel classifier to assign cell values based on a neighbourhood of cells around an individual cell, which influences the ability of the algorithm to correctly identify features. We varied the resolution and pixel classifier pattern and size combinations based on vegetation type. Before running classifiers, we resampled vegetation cover types that occurred over extensive areas (i.e. forest, grassland and shrubland) to 2-m resolution and vegetation cover types that were more restricted, linear or irregularly shaped (i.e. riparian and rock outcrops) to 1-m resolution. We used the Manhattan classifier pattern and a width of three pixels to classify extensive vegetation types. For more restricted vegetation types, we used the Bull's Eye II classifier and a width of five pixels for rock outcrops and 31 pixels for riparian areas. We iteratively refined the map by creating additional training polygons to correctly classify features throughout the region-of-interest. Finally, the clutter removal tool was used to identify correct and incorrect features as an additional map-refinement technique. For more information on using FA and its applications in resource selection studies, refer to Visual Learning Systems, Inc. (2008), Dzialak et al. (2011) and Webb et al. (2011).
Topography and landscape features
We used GIS to map landscape, topographic and vegetation features known or suspected to influence probability of acquisition, quality of GPS fix or locational error (Frair et al. 2004, Cain et al. 2005, Sager-Fradkin et al. 2007, Jiang et al. 2008, Wells et al. 2011), and behaviour of mule deer (Kufeld et al. 1988, Thomas & Irby 1991, Pierce et al. 2004, D'Eon & Serrouya 2005, Sawyer et al. 2006, 2009). The DEM was reclassified to 30-m resolution, and covariates depicting slope and terrain roughness were generated using the reclassified DEM. Slope was measured in degrees and terrain roughness was calculated as the standard deviation of elevation. Terrain roughness was calculated at three spatial scales: 90-m (3 × 3; number of pixel rows and columns), 540-m (18 × 18) and 1,080-m (36 × 36).
We heads-up digitized features of the landscape at a scale of 1:500 to 1:2,000 in a GIS by manually delineating roads (improved and unimproved), anthropogenic ground disturbance (buildings, houses and structures; hereafter disturbance) and agriculture fields using NAIP aerial imagery. To determine the area disturbed by roads, we measured the width (to nearest 0.01 m) of 30 replicates for each road type (improved and unimproved) occurring on our study area using GIS (1:500 scale). Average road width for improved roads was 8.04 m (± 3.72 SD, 0.46 CV) and 4.14 m for unimproved roads (± 0.86 SD, 0.21 CV). Thus, we set the buffer distance to 4.02 and 2.07 m for improved and unimproved roads, respectively. However, for the case study analysis of winter resource selection by mule deer (see below), we only considered improved roads because most unimproved roads received little use, except for a few days per year. Before analysis, we converted vector layers for vegetation (i.e. grassland, riparian, rock outcrop and shrubland) and landscape features (i.e. roads, disturbance and agriculture) to raster grids. However, for this analysis, we did not include roads, disturbance or agriculture into models assessing locational error or Pacq, or in inverse weighting of layers. These layers were included into the case example of mule deer winter-resource selection because they were known or suspected to influence behaviour of deer.
Using Spatial Analyst for ArcGIS® and the Distance Toolbox, we calculated the Euclidean distance from each grid cell (i.e. focal cell) to the nearest neighbouring cell containing the respective vegetation (i.e. grassland, riparian, rock outcrop and shrubland) and landscape (i.e. agriculture, disturbance and roads) feature, unless the focal cell contained the landscape feature of interest, in which case the value was equal to zero. This process resulted in seven separate raster layers (one for each landscape feature) containing distance values that would later be combined with deer locations and randomly-generated points. For example, if a deer location fell within a cell containing no shrub (i.e. the feature of interest) then the next closest cell classified as shrub would be used to calculate a distance between the deer location and the cell containing shrub. We extracted distance values from all raster layers (N = 7), in addition to slope (degrees) and roughness (SD of elevation), to locations of test collars (N = 4,160), GPS points of deer (N = 4,445), randomly-generated points (N = 13,335) and the validation sample (N = 1,043) using Spatial Analyst. The distance values represented a continuous range of values for each independent landscape feature in the model (i.e. independent variables).
Test collar analysis
Vegetation effects on locational error and fix success
Removing low-quality fix data likely introduces bias into resource selection studies because quality of fix is related directly to locational error (D'Eon et al. 2002, Lewis et al. 2007), which was a primary motivating factor for this work (i.e. to retain all empirical data while accounting for biases). We used general linear models (GLM procedure in SAS® 9.2, SAS Institute, Inc., Cary, North Carolina, USA) to investigate the effect of vegetation and topography covariates (independent variables) on locational error (dependent variable) of test collars (see Appendix II). Specifically, we were interested in whether locational error was non-random with respect to the landscape. We modeled six covariates: slope (30-m), roughness (90-m) and distance to grassland, riparian, rock outcrop and shrubland. We assumed non-random locational error when P ≤ 0.05 for univariable models of each covariate.
We also examined how vegetation and topographic features influenced Pacq (i.e. fix rate). We modeled the six aforementioned variables using generalized linear mixed models (GLMM; GLIMMIX procedure in SAS® 9.2) under a logistic regression framework to model the probability of successfully acquiring a GPS location in relation to vegetation and topographic features (see Appendix II). Probability of acquisition was analyzed as a binary response variable (1 = successful, 0 = unsuccessful; see Appendix II) using a binary distribution and a logit-link function. It is widely known that fix success is a form of bias, just as is locational error, but to try to account for one type of bias (i.e. filtering based on PDOP) can increase bias associated with missed fixes. A novel element of this investigation involved modeling locational error of a particular magnitude as a missed fix, which allows both forms of bias to be modeled simultaneously and in a similar fashion. This enabled us to include data that traditionally would be censored and excluded from analysis. To incorporate traditionally censored data (i.e. locational error) into the logistic regression framework, we modeled locational error data as an unsuccessful fix (value = 0) at two levels of locational error: ≥ 100 m and ≥ 20 m (see Appendix II). This approach avoids filtering empirical data to retain the largest sample of GPS locations possible and, at the same time, accounting for biases.
In this analysis, we created multiple functional forms for the independent covariates that allowed for linear and non-linear (quadratic relationship and log-transformation) relationships between covariates and logit(Pacq). Quadratic terms (quadratic = original2) were created for slope and roughness. We natural log-transformed all distance variables (i.e. distance to grassland, riparian, rock outcrop and shrubland) to allow for a decreasing magnitude of influence with increasing distance. To assure that a natural log-transformation was not attempted on a cell with a value = 0, we added 0.1 to all original values (new = ln(original + 0.1)). We used an information-theoretic approach (Burnham & Anderson 2002) to evaluate the independent covariates and each functional form of the relationship. We selected the most appropriate functional form for each covariate using Akaike's Information Criterion (AIC), adjusted for small sample size (AICc; Burnham & Anderson 2002). We retained the functional form for each covariate with the lowest AICc for inclusion into the full model that contained all six covariates. We included collar identification as a random effect, which estimates a population-averaged estimate of Pacq, to account for repeated observations for each collar and subtle differences in collar orientation that may unknowingly influence fix success.
Mapping fix success: individual vegetation layers
After conducting the logistic regression analysis of the test collars, the coefficient estimates for the four vegetation types (grassland, riparian, rock outcrop and shrubland) and two topographic features (slope and roughness) were used to map Pacq for each respective vegetation type and topographic feature using the raster calculator tool in Spatial Analyst (see Appendix II). We used the fitted GLMM to estimate Pacq and to map each respective layer (i.e. an inverse-logit model). Pacq, for each vegetation type and topographic feature, was mapped for each of three scenarios: raw fix rate (i.e. the probability of acquiring a fix as per the designated fix schedule), modeling locational error ≥ 20 m as a missed fix and modeling locational error ≥ 100 m as a missed fix. The resulting products (maps of Pacq for each vegetation type (N = 4) and each topographic variable (N = 2) in each of the three scenarios) will later be used to develop weighting factors for use with empirical data.
Mapping fix success: composite map
As an alternative product, and for visual depiction of fix success, we developed three maps that spatially depicted Pacq across the landscape (Fig. 2) based on fix rate and locational error (see above), and that considered all vegetation and topographic layers simultaneously (see Appendix II). The maps provide a spatially-explicit representation of the probability of successfully acquiring a GPS location with a high level of accuracy (i.e. locational error; see Fig. 2). Similar to mapping Pacq of the individual layers, we used a resource selection probability function (Manly et al. 2002) to map composite Pacq across the landscape (i.e. an inverse-logit model).
Developing weighting factors
The last step of the test collar data analysis was to weight the individual vegetation and topographic (i.e. slope and roughness) layers. If unaccounted for, a high Pacq will have more weight in an analysis because GPS fixes had a high probability of being taken in a particular landscape. However, this is exactly what we wanted to avoid (i.e. using raw Pacq to estimate resource selection of animals). If a landscape has a low Pacq, that means there is a low probability of getting a GPS fix, although the animal could use that particular habitat a disproportionate amount of time, and without any weighting, the use of that particular habitat would be underestimated. To weight the layers, we took the inverse (1/Pacq) of the individual cell probabilities for each individual layer that was used during the test collar analysis. For example, a Pacq with perfect fix success would have a weight = 1, but in a habitat with only a 0.1 probability of acquisition, the new weight would be 10 otherwise there would be a 10-fold underestimation of that habitat type. We used the raster calculator tool in Spatial Analyst to generate the weighted layers. We used each individual layer to weight deer locations because values for each layer were based originally on values between zero and one. Using each individual layer gave the same number of weights to each deer location as the number of individual layers. If we had used the cumulative probability (Pacq) map, each location would have been given only one weight. Additionally, and most importantly, the cumulative Pacq map has no underlying information regarding the individual layers because it is a weight of all layers. Therefore, this does not allow each covariate (i.e. an individual layer) to be assessed in a resource selection function analysis. All weighted values for each layer were extracted to random and used deer locations (after conducting a fractal analysis for use with a discrete choice model) for a case study that examined winter resource selection of mule deer (see below; see Appendix II).
Winter resource selection by mule deer: an example
We estimated the spatial scale at which mule deer perceived and responded to landscape features using fractal analysis (Nams 2005, Webb et al. 2009, Dzialak et al. 2011). Estimated spatial scale perceived by mule deer was used to set the radius around the used location for use with a discrete-choice model and placement of random locations (see below). This analysis involved using relocations to develop a movement path for each mule deer during winter (23 January - 30 April, 2011) using fractal dimension (D) to quantify tortuosity as a function of spatial scale, and measuring the correlation in tortuosity between adjacent path segments (Nams 2005). We used the VFractal estimator in the program Fractal 5.0 (Nams 1996) to calculate D. All mule deer were combined into a single analysis where each movement path (i.e. one path/mule deer) was treated as a replicate (Nams 1996). We plotted D and correlation vs divider size to detect changes in movement relative to spatial scale (spatial scale being equivalent to the size of the divider used to measure the movement path). The plot of correlation vs spatial scale is most useful for detecting patch size perceived by animals. Correlation is positive when path length is below patch size, negative at patch size and zero when path length is larger than patch size (Nams 2005). Thus, we were looking for negative correlations to identify perceived spatial scale.
Correlation in tortuosity between adjacent path segments first became negative between 251 and 293 m (median = 272 m); 95% CI revealed correlation remained negative through the largest spatial scale (i.e. divider size) estimated (i.e. 1,160 m). Therefore, we set the buffer distance around used deer locations to 272 m for instances where movement distance was ≤ 272 m between successive locations. Otherwise, we used 1,160 m buffer distances for records where movement was between 272 and 1,160 m, and the actual movement distance as the buffer distance when movement was > 1,160 m.
We used a discrete-choice model (McCracken et al. 1998, Cooper & Millspaugh 1999, Manly et al. 2002, Kuhfeld 2010) to estimate the probability of deer selecting a particular resource as a function of: 1) the attributes of the resource and 2) all other available resources (see Appendix II). We matched each used GPS location with a set of three random locations. The discrete-choice framework quantifies a choice made by an individual deer (i.e. used location) relative to the three alternative choices that also were available temporally and spatially but were not chosen (i.e. non-used locations). Non-used locations were drawn from within a circular buffer centered on the GPS location. The radius of the buffer was defined uniquely for each deer and location, and was based on the movement distance of the deer and perceived spatial scale determined by fractal analysis (see above; see Appendix II). We used a stratified Cox proportional-hazards-likelihood-maximization routine using the PHREG procedure in SAS® 9.2 (SAS Institute, Inc. 2008, Kuhfeld 2010). Due to repeated measures on the same deer, we defined the strata as the location set (i.e. used location matched with three non-used locations) nested within deer identification, which modeled the highest level of clustering by computing a robust sandwich covariance matrix estimate to account for the intracluster dependence. Discrete-choice models were run for data extracted from four scenarios; three with weighting and one without weighting applied. We refer to these four schemes as naïve (original landscape data, no weight applied), raw weights (considering only Pacq) and locational error + missed fix weights (two weighted schemes based on Pacq and removal of locational error > 100 and > 20 m). We analyzed data extracted from the three weighting scenarios (see above), and then compared these results to data extracted from a naïve map, without any weighting, that was based on the original landscape values.
Mapping animal occurrence
Based on the population-level coefficient estimates from each of the four analyses (see above), we mapped the relative probability of resource use (see Appendix II) as defined by Manly et al. (2002; i.e. log-linear model). After mapping relative probability of use for the four maps, we calculated quantiles that placed data into five equal-sized bins (highest, high, moderate, low and lowest); each bin represented 20% of the landscape. These five bins were used to validate maps using an independent validation sample of deer (see below).
We validated the naïve, raw weight and locational error + missed fix weight (at > 100 and > 20 m error) maps of deer occurrence (see Appendix II) by plotting locations from an independent validation sample of deer on each map. We used the percentage of locations occurring in each bin as a means of assessing the validity of how well each model mapped spatially. In addition, we used the percentages in each of the five bins (1-5) to test whether the percentage of validation points that occurred within each bin increased monotonically with bin rank using Spearman rank correlation (Zar 2010) implemented in the CORR procedure of SAS® 9.2. The validation sample consisted of 1,043 locations from four female mule deer.
Test collars (N = 20) were capable of recording 4,160 GPS locations over the 26-day deployment period. During this time, 119 GPS fix attempts failed, resulting in an overall probability of acquisition of 97.14% (Table 1). Probability of acquisition ranged from 93.1 to 100% (mean: 98.2%) across the four vegetation types (see Table 1).
Locational error and fix success
Locational error was related positively to slope (in degrees; t4039 = 4.25, P < 0.001), roughness (SD of elevation; t4039 = 5.64, P < 0.001), distance to rock outcrops (t4039 = 7.79, P < 0.001) and distance to shrubland (t4039 = 5.92, P < 0.001); as each of these features increased in value, so did locational error. There was a negative association between locational error and distance to grassland (t4039 = 3.05, P = 0.002) and distance to riparian (t4039 = 4.56, P < 0.001) vegetation types. Because locational error was not random with respect to vegetation types, we censored data (i.e. modeled locational error as a missed fix) based on locational error > 100 and > 20 m. Censoring data based on error, resulted in pseudo-acquisition rates ranging from 89.7 to 99.7% for > 100 m error (mean: 97.1%), and 77.3 to 98.3% for > 20 m error (mean: 92.7%; see Table 1). Subsequently, we used these pseudo-acquisition rates to map Pacq across the landscape and to weight feature layers in GIS for use with an empirical data set of female mule deer locations during winter.
After censoring locational error and considering these data as a failed GPS fix, we modeled Pacq in relation to slope (degrees), roughness (SD of elevation) and distance to shrubland, rock outcrop, riparian and grassland. Probability of a successful acquisition was related, consistently and positively, to slope and distance to rock outcrop (Table 2). There was also a consistent negative relationship between acquisition of a GPS location and roughness, and distance to riparian and grassland (see Table 2). Distance to shrubland was related positively to successful acquisition when modeled using the raw weights (i.e. probability of successful acquisition) and when considering Pacq (missed fix) and locational error > 100 m, but related negatively when modeled using Pacq (missed fix) and locational error > 20 m (see Table 2).
Mapping fix success
We used coefficient estimates from Table 2 and raster calculator in Spatial Analyst of ArcGIS to develop spatially-explicit maps of Pacq across the landscape (see Fig. 2). Although relationships were consistent across most variables in the three models, the spatial depiction of probability of successful acquisition was different based on the percentage of cells in each range of values (see Fig. 2). Most of the landscape provided a high Pacq when mapped using raw acquisition rates (see Fig. 2A). However, Pacq was reduced across the landscape when modeling locational error as a missed fix, which became progressively lower as the number of censored locations increased (see Fig. 2B-C).
Winter resource selection of mule deer: an example
Estimating resource selection of mule deer during winter revealed differences in coefficient estimates and changing relationships (i.e. positive or negative) among the four weighting schemes (three with weighting and one without weighting). During the day, there was a consistent negative relationship for three landscape features (i.e. distance to grass, agriculture and disturbance), meaning deer had a greater probability of selecting areas near these features. Also, during the day, there was a consistent and positive relationship for distance to riparian and improved roads (Table 3); probability of using an area increased farther away from riparian areas and roads. The three remaining landscape features (i.e. roughness, and distance to rock outcrop and shrubland) had changing relationships (positive or negative) depending on what map was used for analysis (see Table 3).
At night, similar relationships were observed for five of the landscape features (three positive and two negative; see Table 3). Similar to daytime selection of resources, 37.5% (N = 3) of landscape features (i.e. distance to rock outcrop, shrubland and agriculture) had varying relationships with deer selection (see Table 3). These findings (i.e. changing relationships and magnitude of coefficient estimates) reveal that accounting, or not accounting, for fix success and locational error can alter interpretation of animal selection patterns, as well as mapping predicted probability of occurrence across the landscape (Fig. 3).
For this particular study area and population, it appears that locational error may be a more significant source of bias than missed GPS fixes. GPS fix success ranged from 98 to 100% for all GPS-collared mule deer, with 17.5% of GPS locations having PDOP ≥ 4. For these reasons, the weighted maps that accounted for missed fixes and locational error (i.e. ≥ 100 or ≥ 20 m) had the highest validation.
During the day, ≥ 71.5% of the validation sample (N = 523 locations) occurred within the highest probability of use bin for all four weighting schemes (Table 4). At night, ≥ 63.1% of the locations (N = 520) from the validation sample occurred within the highest probability of use bin for all four weighting scenarios (see Table 4). On average, when considering day and night validation, the missed fix + > 100 m error-weighted map contained 88.9% of validation locations within the highest probability of use bin, followed by the missed fix + > 20 m error-weighted map (86.6%) and the naïve map (82.5%); the raw weight map, based only on weighting of probability of acquisition, validated poorly (67.3%; see Table 4). All combinations of maps (i.e. naïve, raw weight and missed fix + > 100 or > 20 m locational error) and time periods (i.e. day and night) revealed that the percentage of locations in each bin increased monotonically with bin rank (P ≤ 0.037), except for the raw map at night (ρ = 0.7, P = 0.188), in which percentage of locations did not significantly increase with bin rank (see Table 4).
Inverse weighting proved to be an acceptable and easily applied technique to account for biases associated with acquiring a GPS location (Pacq) and locational error. However, weighting maps using only raw Pacq resulted in the poorest validation of all maps. Taking locational error into account, modeled as a missed fix, greatly improved validation. Not accounting for locational error (i.e. only weighting based on missed fixes) may be the reason why previous researchers (D'Eon 2003, Polfus et al. 2011) suggested that data sets containing > 90% of the data could be analyzed safely without the need for correcting or weighting for missed GPS locations. In this study, even with removal of data due to locational error, we were still able to maintain ≥ 92.7% of all test collar locations (range: 77.3-99.7%; see Table 1). Even in the presence of a high probability of acquisition, we observed significant improvement in spatial mapping, validation and estimation of the effects of landscape features on deer resource selection, likely because locational error (or high PDOP) was a more significant form of bias than missed fixes. This finding is also in contrast to a previous study (Wells et al. 2011) that found coefficient estimates did not change, and accuracy of habitat models only improved 1-2% (on average) after sample weighting in a population of mountain goats Oreamnos americanus. These data, and that of others, reveal that accounting for both missed GPS fixes and locational error may provide more reliable estimates of resource selection than only accounting for missed GPS fixes.
A primary reason for weighting landscape layers before analysis is because Pacq and locational error are not random relative to landscape and topographic features. Locational error and Pacq did occur in a predictable manner; a primary requisite for developing weighting factors. This finding is similar to previous studies that reported varying quality of fixes (that relates to locational error) and Pacq across different habitats and environments (D'Eon et al. 2002, Frair et al. 2004, Cain et al. 2005, Sager-Fradkin et al. 2007, Jiang et al. 2008, Bourgoin et al. 2009). It has been suggested that sample weighting cannot be applied to individual locations (Frair et al. 2004). However, we applied weighting to each individual GPS location through the development of weighted layers in GIS and extraction of these values to points. In this process, each raster cell, based on probability of successful acquisition, was weighted by the inverse probability for that particular cell (1/Pacq). Thus, each GPS location, and random location, was weighted prior to analysis.
Other potential benefits of the outlined approach include: 1) ease of application and 2) ability to maintain sample size (i.e. does not require data reduction or screening of empirical data). We employed commonly used resource selection methods and statistical techniques. We used generalized linear mixed models and logistic regression to spatially depict probability of successful acquisition across the landscape. We further refined the maps by developing weighting factors in a spatially-explicit framework that allows GIS users to weight the landscape prior to obtaining empirical data (if using test collar data to build weighted maps). This makes the extraction of weighted values simple and allows a smooth transition of data directly into a statistical analysis program. Finally, censoring data from test collars, based on locational error, allowed all empirical data to be retained in the final resource selection analysis. Censoring empirical data due to fix quality is common (D'Eon et al. 2002, Frair et al. 2010), but can lead to additional biases (Lewis et al. 2007). Using these methods, Pacq and locational error are only known for test collars, but the reason for locational error in test collars also would extend to GPS collars on animals. This means that if rough topography increased locational error of test collars, then locational error of GPS collars would also occur (including increased PDOP). There likely was inherent locational error in GPS collars worn by mule deer in our case study, but because the relationships between test collars and empirical collars are the same, we limited censoring of empirical locations due to locational error. Censoring test collar data due to locational error ultimately results in increased weighting, thereby maintaining sample size of empirical data and improving validation.
A primary purpose for the development of resource selection models is for their application and ability to predict occurrence with a high level of accuracy (Boyce et al. 2002, Wiens et al. 2008). Our study, and those of others (Frair et al. 2004), demonstrate that varying levels of bias associated with GPS fix success and locational error influence coefficient estimates of resource selection models, depending on the magnitude of missing locations (herein locational error was modeled as a missed GPS fix). Based on these findings, investigators should be cognizant of how bias in the collection of GPS locations can result in altered interpretations of selection patterns. A major concern is that models not accounting for bias, or those that are improperly tested, could lead to serious errors in the interpretation of results, subsequently influencing management decisions (Wiens et al. 2008). It will be up to the investigators to determine how much error is allowable when validating animal occurrence maps, and to decide which map to choose for interpretation. The primary requisite for overcoming bias and proper interpretation of results lies with the validation process itself. Numerous methods exist to validate predictive maps, such as using an independent data set, withholding independent individuals from the model building process (as used here), or jackknifing procedures that withhold individuals, one at a time, and iteratively rebuild maps at each step. At any rate, validation is necessary to make informed decisions on the level of weighting necessary to make reliable inferences from the data.
Our study took place during winter; both test collar data and mule deer locations were collected during the same season. When studies span multiple annual cycles or seasons, developing separate weighted maps for each season may be required because previous research has identified varying fix success rates across months and seasons (Sager-Fradkin et al. 2007). A likely explanation as to why researchers identified varying fix success across seasons may be due to plant phenology. For example, canopy cover changes with season and has been identified as a primary vegetation component influencing fix rate success (Rempel et al. 1995, D'Eon et al. 2002). Plant phenology changes with season, which likely results in changing fix success and locational error during each season. Within the framework presented, developing seasonal weighting factors will provide an easy fix to account for changing plant phenology. Subsequently, extracting values from each weighted map to the season-specific GPS location is straightforward.
We are optimistic that these results will provide baseline methodology to account for both missed GPS fixes and locational error into one unified analytical framework. With that said, there likely is potential to further refine the methods presented herein. First, if spatial autocorrelation is an issue (e.g. Dormann et al. 2007), researchers have two options: 1) empirically model the autocorrelation in the data by including extra parameters known as autocovariates (e.g. neighbourhood statistics, distance to similar habitat patches, easting and westing) or 2) account for the spatial autocorrelation in the data during modeling by specifying the appropriate covariance structure, use of random effects, or strata to designate where the correlation occurs (as in the approach we used). Second, it will be valuable to test, or simulate, the effects of variable temporal resolutions on the weighting and prediction of resource models. For instance, different weighting schemes may need to be developed for GPS locations collected every 30 minutes vs every three hours, or when data are filtered to achieve independence. Finally, it would be prudent to test these methods, and those of others, on various taxa to assess the general applicability of these methods on a wide variety of species.
Based on the data presented herein, relationships (positive or negative) and magnitude of effect (coefficient estimates) differed among the four sets of maps analyzed. Additionally, validation and spatial prediction of occurrence were sensitive to subtle changes in the magnitude of selection (even when all signs (+, -) remained the same). Without weighting factors, the importance of landscape features may be over- or underestimated, which can have serious consequences when planning development, or creating management and conservation guidelines. It is often difficult to overcome both fix-rate bias and location imprecision when analyzing resource selection of animals (Frair et al. 2010). Our approach offers a simple solution to accounting for fix rate bias (i.e. probability of successfully acquiring a GPS location) and location imprecision (i.e. locational error). Fix rate bias is accounted for directly in the analysis whereas locational error of animal locations is modeled indirectly through the censoring of test collar locations, thus providing more weight to habitats that influence both fix success and locational error. This proved to be an effective approach that resulted in increased predictive ability of maps based on an independent validation sample of mule deer withheld from model development. Many studies on resource selection of animals proceed without correcting, or accounting for, sources of bias (missed GPS locations, locational error or quality of fix) that typically influence interpretation of results (Frair et al. 2004). Therefore, we caution researchers against using raw data without testing and accounting for the effects of locational error and missed GPS locations.
We thank Ridgeline Energy, LLC for funding, Tetra Tech EC, Inc. for logistical support, K. Kosciuch, M. Griswold, A. Miller, J. Voorhees, K. Harper, S. Gamo and R. Guenzel for project support and assistance, L. Baker and P. Blomberg for assistance with heads-up digitizing and C. Hedley, I. Storch, S. Kramer and several anonymous reviewers for improving earlier drafts of this manuscript.
Probability of successfully acquiring (Pacq) a GPS location using test collars placed throughout the study area used by female mule deer in southern Wyoming and northern Colorado during winter (January-April 2011). Sum (‘Total’ and ‘Missed’) and average (Pacq) values are given in the last row of the table.
Coefficient estimates, precision (SE) and statistics for modeling probability of successfully acquiring (Pacq) a GPS location that was based on vegetation and topographic features. The variables included had the lowest AICc (within each univariable model for each of the vegetation and topographic features) when considering functional form (linear, quadratic or natural log-transformed).
Coefficient estimates (± SE) modeled using a discrete-choice model for probability of female mule deer resource use during the day and night in southern Wyoming and northern Colorado during winter (January-April 2011). The estimates were based on naïve and weighted maps corrected for probability of acquisition (raw) and probability of acquisition including locational error (> 100 and > 20 m). Roughness (SD of elevation) was modeled as a linear term at 90-m scale, distance (m) to shrub and riparian was modeled as a natural log-transformed term, and distance (m) to rock, grass, agriculture, anthropogenic disturbance and roads were modeled as a linear term. Coefficient estimates in italics were significant at P ≤ 0.05.
Percentage (%) of female mule deer (N = 4) locations (N = 1,043), withheld as a validation data set, correctly classified using naïve and weighted maps corrected for probability of acquisition (raw), and probability of acquisition including locational error (> 100 and > 20 m). Percentages may not sum to 100% due to rounding. A Spearman rank correlation test was used to assess whether the percentage of locations in each of the five bins (1-5) increased monotonically with bin rank.