The number of invasive exotic plant species establishing in the United States is continuing to rise. When prevention of exotic species from entering into a country fails at the national level and the species establishes, reproduces, spreads, and becomes invasive, the most successful action at a local level is early detection followed by eradication. We have developed a simple geographic information system (GIS) analysis for developing watch lists for early detection of invasive exotic plants that relies upon currently available species distribution data coupled with environmental data to aid in describing coarse-scale potential distributions. This GIS analysis tool develops environmental envelopes for species based upon the known distribution of a species thought to be invasive and represents the first approximation of its potential habitat while the necessary data are collected to perform more in-depth analyses. To validate this method we looked at a time series of species distributions for 66 species in Pacific Northwest and northern Rocky Mountain counties. The time series analysis presented here did select counties that the invasive exotic weeds invaded in subsequent years, showing that this technique could be useful in developing watch lists for the spread of particular exotic species. We applied this same habitat-matching model based upon bioclimatic envelopes to 100 invasive exotics with various levels of known distributions within continental U.S. counties. For species with climatically limited distributions, county watch lists describe county-specific vulnerability to invasion. Species with matching habitats in a county would be added to that county's list. These watch lists can influence management decisions for early warning, control prioritization, and targeted research to determine specific locations within vulnerable counties. 
This tool provides useful information for rapid assessment of the potential distribution based upon climate envelopes of current distributions for new invasive exotic species.
The rapid-assessment geographic information system tool described in this paper is very applicable to management of invasive exotic species. County-level records for weed distributions for large geographic areas are readily available on Websites, unlike point location data. This tool is easy to use and creates potential distribution maps based on climate variables. These maps can then be used to generate watch lists for early detection of the weeds. Early detection can help efforts to eradicate a problem species before a large infestation occurs that is much more difficult to control.
The tool creates an environmental envelope for each environmental variable for each species; this envelope describes the range of environmental variability over which the species can survive. For example, we obtained the lowest recorded temperature and the highest recorded temperature for a species in counties where it is present. We then compared this range to counties where the species is absent and recorded if the county's value fell inside (assigned a value of one) or outside (assigned a value of zero) the range of the recorded presence locations. Finally, we summed these values of one or zero for all of the variables by county. The sum indicates the number of variables for each county that fell within the environmental envelope of the species.
We developed bioclimatic envelopes using climate data for invasive exotic plant species at the county level in the United States. Using these envelopes, we determined the likelihood of a species establishing in a county. These results can be used to develop county-level watch lists of species whose envelope includes the county. This method is not limited to the county-level data sets used here, but could be applied to other taxonomic groups and other data sets such as national park species lists.
Invasive exotic plant species are one of the major threats of the 21st century, negatively impacting human health (Mack et al. 2000), the economy (Pimentel et al. 2005), native species, and ecosystem processes (Vitousek et al. 1996; Wilcove et al. 1998). The rate of exotic species' introductions appears to be increasing with globalization (Levine and D'Antonio 2003; Stohlgren et al. 2008; Work et al. 2005), exacerbating these potential negative impacts. In invasive exotic species management, preventing a novel exotic species from reaching a new location is key to reducing unwanted invasions (Rejmanek and Pitcairn 2002). Once a species has established, early detection quickly followed by control and eradication is the most effective course of action in reducing spread. The cost of eradicating an exotic species increases exponentially as an infestation grows (Rejmanek and Pitcairn 2002). The large number of species already established or currently entering the United States, coupled with the time and labor demands of screening for potential invasiveness and early detection of key species, makes the problem seem intractable (Levine and D'Antonio 2003). Therefore, an early warning system is necessary in the prevention of new infestations (Lodge et al. 2006); the creation of watch lists such as those suggested here is an important component of such a system.
Regrettably, there is often a dearth of specific biological knowledge about any particular exotic species. Although several different methods exist for predicting the potential distribution of an exotic in a new range (Caley and Kuhnert 2006; Krivanek and Pysek 2006; Richardson and Thuiller 2007), these methods generally are used at the scale of countries, and require specific information about the native range of the species (see Ficetola et al. 2007; Richardson and Thuiller 2007). Data on country distributions are generally easily obtained. Herbarium collections may be used to generate lists of invasive exotics for political entities such as countries, states, or counties, but such lists are not inclusive; the species listed are not systematically collected nor are the species lists developed for this purpose. Ecological data concerning a potential invasive exotic species, including its life history requirements, may often be lacking unless the species has displayed invasive characteristics elsewhere or it has been well studied throughout its native range. Collecting these data for new invaders can often be time intensive. When a new exotic species is located, managers may not be able to wait for detailed data collection and analysis before taking action. A quick, general way to prioritize species watch lists at the scale of a management unit such as a U.S. county would be a useful tool for field managers involved in early detection and rapid response activities.
There are many techniques available for predicting species ranges (see recent review by Elith et al. 2006), typically requiring point locations for a species or an overlaid grid with cells identified as present or absent based upon field data. Unfortunately, these types of location data are often not obtained easily by resource managers. Occurrence data for invasive exotic plant species across large spatial extents are often only readily available at county-level (or even state-level) distributions (or as species lists for areas such as national parks or wildlife refuges), although there are several online systems being developed to synthesize disparate field data sets for invasive exotic species. Because of the varied size and shape of U.S. counties, it can be difficult to transform these data into the required point locations or grid of presence locations.
There are two suites of environmental niche models that are useful in determining species occurrences: those requiring presence-only data and those requiring presence and absence data. These models can be generated with location data from many sources, including museum and herbarium records, research survey data such as plot data and transects, and inventories of species for specific areas. Models using presence and absence data will be more discerning and can distinguish between factors related to species absence as well as presence (Brotons et al. 2004; Zaniewski et al. 2002). However, when reliable absence data are unavailable, different strategies may be recommended. Generally, absence locations are not explicitly collected in weed surveys (Barnett et al. 2007; North American Weed Management Association 2002), and often may only be inferred if an entire area has been surveyed or all inspected locations are known. However, this information is generally not included in online databases that make presence data readily available (e.g., Invasive Plant Atlas of New England [University of Connecticut 2007]). Other data sets, including those from museums and herbaria and species lists for areas such as counties or national parks, also lack absence data, because survey locations are unknown or because information on survey targeting and extent is lacking for the species occurrence data. Where available, absence data may include false absences (e.g., where a species is cryptic or present as a buried seed; Crossman and Bass 2008; Rouget et al. 2001), and the species could be unreported or absent even in highly suitable habitat. Detection of an exotic species can be difficult early in the invasion process because some exotic species grow in relatively small numbers for a period of time after introduction, which is called the lag phase (Crooks 2005).
Missing these presence locations can cause errors in models by omitting important suitable habitats (Hortal et al. 2008, but see Loiselle et al. 2008). Another kind of false absence may arise because there is a high probability that the new invading species has not yet had the opportunity to establish itself at a particular location, and so is out of equilibrium with its environment. Given opportunity and time, the invader could eventually establish itself and spread into areas where it is currently absent. In these situations, where a species does not occupy all suitable habitat, presence-only models have outperformed presence/absence methods (Brotons et al. 2004; Hirzel et al. 2001) and have been used instead (Gibson et al. 2007). Thus, we chose to use presence-only data in this paper for exotic species distribution modeling.
Given the challenges of obtaining species-specific data for exotic plants, data format (point locations or regular grid) limitations, and inaccuracies of absence data along with the issues associated with species distribution models, we have developed what we believe to be a quick and effective method of providing information early in the invasion process to guide management decisions until the information and resources to develop more detailed and specific models become available. This geographic information system (GIS) program is adapted from an earlier program that we created, which incorporates known point location data to create an environmental envelope for a species (Barnett et al. 2007; Evangelista et al. 2008). This method is simple enough for users who may not have the statistical background necessary to understand more complex predictive modeling techniques. It incorporates county-level species lists and ancillary data layers such as air temperature and annual precipitation as parameters; in this example we chose general bioclimatic parameters (although other environmental parameters such as topographic parameters could be used) that are fundamentally important for most plant species' growth and establishment rather than parameters necessary for a particular species. Here, we detail our system for generating “watch lists” of species based upon currently reported county-level distribution data in association with various bioclimatic factors. We plan to make this system available for use at the National Institute for Invasive Species Science (National Institute of Invasive Species Science 2008). This GIS program will create a bioclimatic envelope of a species' potential distribution based upon where the species is known to currently occur. These envelopes are defined by the range in bioclimatic conditions where a species is currently known and can be used to assess the potential spread of the species and develop watch lists for early detection activities. 
Information is quickly available while more detailed assessments are gathered.
Materials and Methods
Invasive Exotic Weed Data
We obtained county-level presence data from 2004 and 2007 for the top 100 most problematic invasive exotic plant species within the contiguous 48 states of the United States from the Biota of North America Program (BONAP; Kartesz 2004, 2007). BONAP maintains a county-level database of current occurrence data and historic herbarium records for all known vascular plants in the United States. The top-100 list includes the most problematic invasive exotic species. These species covered a broad range of spatial distributions, from mesquite [Prosopis juliflora (Sw.) DC.] found in one county to curly dock (Rumex crispus L.) found in 1,846 counties across 47 states.
Validating our method required a temporal data set because we were predicting the potential range of an exotic species given an initial distribution after introduction. We used a county time series data set from the INVADERS database (Rice 2006), which records exotic plant occurrence records for all counties in the Pacific Northwest and northern Rocky Mountain states of Washington, Oregon, Idaho, Montana, and Wyoming, hereafter called the Northwest. We queried county-level distributions for all 100 species for 1930, 1960, 1990, and 2005. Some of the species documented only a single occurrence record for a time-step and 27 species were undocumented for these states for all four time periods (not recorded), precluding their use. Thus, sample sizes varied for each time period, resulting in envelopes for 44 species for 1930, 57 for 1960, 66 for 1990, and 69 for 2005.
Climate Data Layers
We derived 19 bioclimatic raster data layers (Appendix A) from average monthly precipitation, minimum temperature, and maximum temperature (Nix 1986) using an ArcAML script (Hijmans 2006). These variables represent annual trends, seasonality, and extreme or limiting bioclimatic factors. To represent current climate conditions and species habitat we used the PRISM data set (Daly et al. 2000; PRISM Group 2007), an 800-m (2,625-ft) resolution 30-yr average data set for 1971–2000. We then summarized the bioclimatic variables for each county using ArcGIS's Spatial Analyst Zonal Statistics tool to calculate the minimum, maximum, mean, and range for each variable for each county. From these four metrics we chose the statistic that matched the variable most closely; for example, for Bio1, annual mean temperature, we chose the mean, and for Bio6, minimum temperature of the coldest month, we chose the minimum. This method allowed us to take the extremes in counties rather than simply using an average across the county.
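The county summarization step can be illustrated with a minimal sketch. It is not the ArcGIS Zonal Statistics tool itself; it assumes the raster cells falling in one county have already been extracted into plain lists, and the names `STAT_FOR_VARIABLE`, `zonal_stats`, and `summarize_county` are hypothetical. Only the Bio1-to-mean and Bio6-to-minimum pairings come from the text; the cell values are made up.

```python
from statistics import mean

# Hypothetical lookup of the county-level statistic used for each variable.
# Only these two pairings are stated in the text.
STAT_FOR_VARIABLE = {"bio1": "mean", "bio6": "min"}


def zonal_stats(cell_values):
    """Return the minimum, maximum, mean, and range of one county's raster cells."""
    lo, hi = min(cell_values), max(cell_values)
    return {"min": lo, "max": hi, "mean": mean(cell_values), "range": hi - lo}


def summarize_county(cells_by_variable):
    """For each variable, keep only the statistic that matches its meaning."""
    return {
        var: zonal_stats(cells)[STAT_FOR_VARIABLE[var]]
        for var, cells in cells_by_variable.items()
    }


# Made-up cell values (degrees C) for a single county.
county_cells = {
    "bio1": [4.0, 6.0, 8.0],        # annual mean temperature cells
    "bio6": [-12.0, -9.5, -15.0],   # coldest-month minimum temperature cells
}
summary = summarize_county(county_cells)
print(summary)
```

Taking the minimum for Bio6 means the county is represented by its coldest cell rather than by a county-wide average, which is the behavior the paragraph above describes.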
Bioclimatic Envelope Tool
We developed an ArcGIS script to determine the bioclimatic envelope of a species defined by its known polygonal presence locations (in this case, counties). We created a bioclimatic envelope for each variable for each species; we define a bioclimatic envelope as the range of bioclimatic variability over which the species can survive. For example, we obtained the lowest recorded temperature and the highest recorded temperature for a species in counties where it is present. We then compared this range to counties where the species is absent according to the BONAP data set and recorded if the county's value fell inside (assigned a value of one) or outside (assigned a value of zero) the range of the recorded presence locations. Finally, we summed these values of one or zero for all of the variables by county. The sum indicates the number of variables for each county that fell within the bioclimatic envelope of the species. Since 19 variables were used, a value of 10 would mean that the county was within the range of 10 variables and outside the range of nine variables. We did not differentiate between the variables, so counties with a value of 10 would not necessarily fall within the range of the exact same 10 variables.
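The envelope scoring described above can be sketched in a few lines. This is an illustration rather than the ArcGIS script used in the study: the function names, the three example variables, and all values are hypothetical, and treating the envelope bounds as inclusive is an assumption.

```python
def build_envelope(presence_counties, variables):
    """Per-variable (lowest, highest) value across counties with recorded presence."""
    return {
        var: (
            min(county[var] for county in presence_counties),
            max(county[var] for county in presence_counties),
        )
        for var in variables
    }


def envelope_score(county, envelope):
    """Count the variables whose county value falls inside the envelope.

    Each variable contributes one if inside and zero if outside;
    bounds are treated as inclusive (an assumption).
    """
    return sum(1 for var, (lo, hi) in envelope.items() if lo <= county[var] <= hi)


# Made-up values for three of the 19 variables.
variables = ["bio1", "bio6", "bio12"]
presence = [
    {"bio1": 8.0, "bio6": -10.0, "bio12": 400.0},
    {"bio1": 11.0, "bio6": -4.0, "bio12": 650.0},
]
candidate = {"bio1": 9.5, "bio6": -14.0, "bio12": 500.0}

envelope = build_envelope(presence, variables)
score = envelope_score(candidate, envelope)
print(envelope, score)
```

Here the candidate county falls inside the envelope for two of the three variables (it is colder than any presence county for Bio6), so its score is 2. With all 19 variables the score would range from 0 to 19.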
For validation of the method we developed a bioclimatic envelope for each of the 100 worst exotic species present in the Northwest in 1930 and compared it to the species' recorded distribution in 1960, 1990, and 2005. We used the Northwest data set because we could use the time periods to check validity. We performed the same comparison using the updated envelopes based upon the new species location records for both 1960 and 1990 to further validate the technique. Assessment metrics included percentage of new occurrences captured by the envelope, sensitivity and specificity (Fielding and Bell 1997), and the number of counties added to the watch list. Sensitivity is the probability that observed presence locations were predicted correctly; specificity is the probability that absence locations were predicted correctly. Because the assessment metrics required binary data, we defined anything with an envelope value of at least 15 as present. We selected 15 as the cutoff by examining the number of presence locations in future years that fell into each of the envelope count classes and selected the one where the values leveled off for all species. The envelope from each time period for the Northwest and the envelope from 2004 were also compared to the 2007 BONAP data set. After validation we examined an application of this bioclimatic envelope method, calculating the bioclimatic envelope in the United States for each of the 100 worst invasive exotics in the BONAP data set to examine potential species distributions.
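As a sketch of the assessment step, sensitivity and specificity can be computed from envelope scores once they are converted to binary predictions at the cutoff of 15. The county labels, scores, and function name below are hypothetical, and the sketch assumes every county not recorded as present is treated as an absence.

```python
CUTOFF = 15  # envelope score at or above which a county counts as predicted present


def sensitivity_specificity(scores, observed_present, cutoff=CUTOFF):
    """Return (sensitivity, specificity) for one species.

    scores: dict mapping county -> envelope score (0 to 19).
    observed_present: set of counties where the species was later recorded.
    Counties not in observed_present are treated as absences (an assumption).
    """
    tp = fn = tn = fp = 0
    for county, score in scores.items():
        predicted = score >= cutoff
        observed = county in observed_present
        if observed and predicted:
            tp += 1
        elif observed:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Made-up scores for six counties; the species was later recorded in A, B, and C.
scores = {"A": 19, "B": 16, "C": 12, "D": 17, "E": 9, "F": 15}
observed_present = {"A", "B", "C"}
sens, spec = sensitivity_specificity(scores, observed_present)
print(round(sens, 3), round(spec, 3))
```

In this toy case two of three presences are captured (sensitivity 2/3), while only one of three unrecorded counties falls below the cutoff (specificity 1/3), the same pattern of high sensitivity and low specificity that overprediction produces.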
Results and Discussion
Validation with Time Period Analysis
Because we examined 100 species, we present general trends and a few detailed examples (for all 100 species and occurrences see Appendix B). A minimum of 15 occurrence records was required to capture future occurrences, as determined by examination of sensitivity and sample size, thus we used this value as a cutoff for including species within further analyses reported here. For all species in the time series, average sensitivity of the envelope was 92, 95, and 96% for 1930 applied to 1960, 1960 applied to 1990, and 1990 applied to 2005, respectively (Table 1). However, specificity, which was calculated by defining all counties not reporting a species as “absence” locations, was much lower, meaning that the envelope overpredicted the species distribution (27, 24, and 25%, respectively; Table 1). These low specificity values could be caused by calculating the metrics using absence locations that were not necessarily unsuitable locations for the species to grow. Rather, these were places where the species has not been recorded either because of sampling errors (these data are based on museum records and not a statistical sampling design) or because of suitable habitat where the exotic species has not yet arrived. All species have continued to be recorded in new locations for the time period, including the most recent, although this period was half that of the others. Although this could be a result of failing to detect or report a species, in previous analyses using the INVADERS data, we determined that at least some of the new records through time are due to species spread (Stohlgren et al. 2008). Another reason for the drastically different sensitivity and specificity values relates to the development of the envelope. Factors limiting the distribution of the species may not have been included in the suite of predictors, leading to overprediction. 
Other methods for determining species distributions that develop statistical relationships with variables using both presence and absence data may be better able to differentiate suitable habitat.
Table 1. Results from predicted distribution with the envelope model compared to actual distribution.
For example, we created a bioclimatic envelope for hoary cress [Cardaria draba (L.) Desv.] using the data from 1930 (Figure 1a) and 1960 (Figure 1b) and then compared each envelope's prediction to the reported distribution from the next time period (1960 and 1990, respectively). The 1930 envelope for hoary cress captured many of the new locations in 1960, but not as great a proportion as the 1960 envelope captured of the new locations in 1990. The 1960 envelope captured more of the following time-step's new locations because, by 1960, the species had spread to locations with bioclimatic conditions not encompassed by the 1930 recorded distribution. Rerunning the envelope with the new locations from 1960 improves it by encompassing these novel environments, supporting the need for an iterative approach to improve these models as new records are added to the database (Stohlgren and Schnase 2006).
When we selected all counties with a 1930 envelope score of at least 15 for each of the species, on average 88% of locations reported as present by 2007 were captured by the envelope. The time series results indicate that this is a useful technique for narrowing the set of locations to watch for such species to appear. County watch lists may be generated by adding species to county lists when the county has a high envelope score.
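A county watch list can then be assembled by inverting the per-species scores. The sketch below is illustrative only: the function name, county labels, and scores are made up, and skipping counties where a species is already recorded is an assumption about how a manager would use the list.

```python
from collections import defaultdict


def build_watch_lists(scores_by_species, recorded_by_species, cutoff=15):
    """Map each county to the species whose envelope score there meets the cutoff.

    Counties where a species is already recorded are skipped (an assumption),
    because a watch list targets species that have not yet been found.
    """
    watch = defaultdict(list)
    for species, scores in scores_by_species.items():
        recorded = recorded_by_species.get(species, set())
        for county, score in scores.items():
            if score >= cutoff and county not in recorded:
                watch[county].append(species)
    return {county: sorted(names) for county, names in watch.items()}


# Made-up envelope scores (out of 19) for two species in three counties.
scores_by_species = {
    "hoary cress": {"X": 18, "Y": 15, "Z": 10},
    "melaleuca": {"X": 3, "Y": 16, "Z": 19},
}
recorded_by_species = {"hoary cress": {"X"}, "melaleuca": set()}

watch_lists = build_watch_lists(scores_by_species, recorded_by_species)
print(watch_lists)
```

County X receives no list because hoary cress is already recorded there and melaleuca scores far below the cutoff, which is the "rule out" use of the envelope.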
Based upon the results from the Northwest time series, we found this method to be informative for creating species' watch lists. This simple model captured many of the new occurrences reported in future time steps. The benefits of this approach are that little has to be known about the individual species, which is helpful for unresearched, newly established exotic species. This method provides immediately useful information while more detailed information is being collected and analyzed. More detailed information could be used to predict locations within an at-risk county where the species will be most likely to occur. In every case, the number of counties on a watch list generated from the envelope results was still fewer than the 199 counties in the Northwest region (Table 1).
This method may be especially useful in situations where errors of omission (a species is predicted absent when present) far outweigh those of commission (a species is predicted present when absent). The method performed very well at capturing new locations and new potential locations. However, occasionally it overpredicted, perhaps due to capturing appropriate bioclimatic conditions for growth rather than the subset of those locations a species is limited to by interactions with other organisms. For the species we examined, it is difficult to know if these species have reached the full range of their potential distribution or if they are still spreading. The BONAP data set compiled in 2007 showed increases for all but five of the 100 species from the 2004 data set (an average increase of 99 counties added to a species' distribution), suggesting that the species examined are still being found in new locations.
Application of the bioclimatic envelope for the 100 worst invasive exotics suggested that all species could spread relative to the 2004 BONAP data set distribution. On average, species were recorded in 635 counties in 29 states. The average number of counties for each species with an envelope value of at least 15 (i.e., at least 15 of the 19 parameters for the county were within the range of the envelope) was 2,513 counties in 43 states, for a predicted average increase of 1,878 counties in 14 states from the 2004 distribution. Thus, a species could be added to an average of 1,878 county watch lists. Although this number is large, the envelope for 45 of the 100 species included fewer than 10 new states, meaning that 45 of the species would be added to the watch lists of fewer than 10 states.
Almost all species in the BONAP data set had increased occurrence records between 2004 and 2007. Eleven species could not be compared due to changes in taxonomy, which made it difficult to differentiate between distribution changes based upon renaming a species and actual spread. For the remaining 89 species, the average number of counties per species increased from 635 counties in 29 states in 2004 to 686 counties in 31 states in 2007, an average increase of 98 counties over the 3-yr period. These data again suggest that the selected species are still increasing in distribution, further validating the method, as the bioclimatic envelope models based upon the 2004 distributions showed potential increase in distribution.
As an example of the results, we selected two species with different current distributions (clustered vs. highly dispersed) to discuss in detail. Mary's-grass [Microstegium vimineum (Trin.) A. Camus var. imberbe (Nees) Honda], introduced into Tennessee in 1919, was found in 325 counties in 23 states in the eastern United States in 2004 and had a small predicted bioclimatic envelope (Figure 2a). Yellow starthistle (Centaurea solstitialis L.), introduced in the mid-1800s, was also found in a small number of counties (218 counties in 32 states), but these locations were widely distributed across the United States and in more states rather than clumped (Figure 2b). The envelope consequently had a larger predicted distribution. Species such as curly dock and green foxtail [Setaria viridis (L.) Beauv.] were reported in at least half the counties within the contiguous United States and had predicted extents covering most counties. However, even for these species, unique counties such as hot, dry counties in the Southwest and hot, moist ones in the southern tip of Florida had a lower habitat match value, that is, a lower number of parameters within the envelope. Mary's-grass would then be added to the watch lists of fewer counties than yellow starthistle, which would be added to almost all counties' lists. This method for generating watch lists may be more beneficial for species such as Mary's-grass (species only on a few counties' lists) than for potentially widespread species.
Generalist species such as thistles tend to spread easily due in part to their plumose seed dispersal method, and such coarse-scale modeling techniques may not be beneficial, as with yellow starthistle. These generalist species do well in most habitats and tend to have potential habitat in the vast majority of counties within the United States, and may be more difficult to model (Evangelista et al. 2008). However, for species that are highly restricted by environment in their distributions, such as melaleuca [Melaleuca quinquenervia (Cav.) Blake], this technique could inform resource managers in diverse locations whether or not they need to monitor for the appearance of this plant. Melaleuca grows primarily in hot and wet conditions, which means that the bioclimatic envelope of this species is very specific. Managers working in the desert southwest or cold mountainous regions can probably rule out the need to monitor for such a plant. Although Mary's-grass is not as specialized as melaleuca, it still appears more restricted in its distribution than a thistle, and managers in the western United States could again leave it off a watch list (Figure 2a). It is this ability to rule out species for an area that is particularly helpful in the development of species watch lists.
If data were available for watersheds or ecoregions rather than politically defined units such as counties, we would recommend using these data because they would be less prone to the errors associated with amalgamating climatic data across a large, diverse county. However, data for politically defined regions are much more readily available, and despite the issues associated with a single county encompassing very diverse conditions, this technique still has some value. Also, by using metrics other than simply means for the county, we were able to capture some of the extremes that do exist (e.g., if minimum temperature is limiting, using the lowest minimum temperature found anywhere within the county would indicate whether the species could survive anywhere within the county). Additionally, this technique is not limited to the bioclimatic predictors used here. Other variables deemed important for a particular species or a suite of species could be used to define the environmental envelope of a species.
This method is not meant to replace other, more detailed methods. It predicts only locations that may be climatically suitable, given the variables chosen in the example presented in this paper, and does not explore other potentially limiting factors such as biotic interactions. It can be used as a first approximation of potential habitat after the establishment of a species thought to be invasive while the necessary data are collected to perform more in-depth analyses. As illustrated by the time series data, the methods described here could provide a useful means to quickly develop watch lists for the network of county weed coordinators across the country while requiring few additional resources. The models may also be useful in selecting priority weed species for control based on their potential spread, and can serve as a first-iteration modeling approach to inform immediate actions while more detailed data are collected.
We thank Alycia Crall, Paul Evangelista, Misako Nishino, and Sunil Kumar for comments on the methodology and early versions of the manuscript. We also thank several anonymous reviewers for helpful suggestions. Funding was provided by a grant from the U.S. Geological Survey Climate Change Program and the U.S. Geological Survey Center of Excellence for Geospatial Information Science. T. Stohlgren benefitted from collaborations from the USDA CSREES/NRI 2008-35615-04666 grant. Logistic support was provided by the U.S. Geological Survey Fort Collins Science Center and the Natural Resource Ecology Laboratory at Colorado State University. To all we are grateful.
Appendix A. Nineteen bioclimatic variables derived from average monthly precipitation, minimum temperature, and maximum temperature, based on Nix (1986).