Benchmarks provide context and are a critical element of all ecological assessments. Over the last 25 y, hundreds of papers have been published on various aspects of ecological assessments, and most of the analyses described in these papers depend on specifying an ecological benchmark for context. Freshwater scientists and managers usually use reference sites (typically sites in natural or least-disturbed condition) to assess the ecological conditions at other sites. Accurate and precise assessments require that assessed sites be matched with appropriate reference conditions. Two general types of approaches have been proposed to predict reference conditions: classifications based on natural environmental settings and models that use continuously variable environmental attributes as inputs. Two types of classifications have been examined: geographic-dependent regionalizations based on general landscape features and geographic-independent typologies that are typically based on combinations of regional and channel features. We examined >1000 papers that addressed some aspect of predicting the reference condition in freshwater ecosystems. We focused on 5 types of benchmarks: ecological, thermal, hydrologic, geomorphic, and chemical. Our review showed that over the last 25 y, researchers have developed increasingly sophisticated methods that can be used to predict reference conditions. Most disciplines have increasingly moved toward site-specific modeling approaches as a way to improve both accuracy and precision of predictions, although typological approaches dominate geomorphic characterizations. Papers published in J-NABS have been especially important in advancing and refining methods for predicting ecological benchmarks. 
Much of the progress made in the science of ecological assessment emerged from research that advanced our understanding of how the spatial and temporal distributions of freshwater biota are related to naturally occurring environmental features and how those relationships can be most accurately and precisely described and predicted. Thus, the performance of ecological assessments is critically linked to how well we characterize freshwater environments, and research in the watershed sciences that addresses predicting thermal, hydrologic, geomorphic, and chemical attributes of freshwater ecosystems has paralleled research focused on predicting biota. We anticipate that knowledge produced from future collaborations between ecologists and watershed scientists coupled with the application of modern modeling techniques will largely determine progress in characterizing and predicting biota–environment relationships and, thus, the accuracy and precision of future ecological assessments.
The objective of an ecological assessment is to measure the status of an ecological resource. Such assessments depend on 2 elements: a measure of the ecological resource of interest (often expressed in terms of an index) and a benchmark (i.e., a reference condition) from which we can judge if the measured condition of the resource differs from a desired, expected, or previous condition. Without benchmarks, little context exists for interpreting the measured value of an ecological resource because resource states (e.g., number or types of species and their abundances or nutrient concentrations) can vary markedly with natural differences among sites. Understanding the natural variability of ecological resources and the abiotic conditions associated with that variation is fundamental to the development of sound environmental policy and management directives.
Policy makers and resource managers generally agree that in the context of ecological assessments benchmarks should represent ecological properties associated with naturalness (e.g., Landres et al. 1999, Hering et al. 2003, Stoddard et al. 2006 [Fig. 1]), but no general consensus exists regarding how pristine a condition a benchmark should characterize. In fact, the operational definition of a benchmark is almost always based on some merging of abstract concepts of pristineness, empirical knowledge derived from sites that are the best of what is left and whose quality can differ considerably from historical (pristine) conditions, and the ecological conditions desired by human society (which also might differ from pristine conditions). Regardless of the degree of naturalness implied by a specific definition of benchmark, the accuracy and precision of ecological assessments are dependent on the degree to which those benchmarks can be quantified and predicted.
In our paper, we review the development of concepts and methods relevant to predicting reference conditions in stream ecosystems. In doing so, we first summarize the literature published to date that describes different approaches ecologists have used to characterize reference conditions in streams. These approaches generally include methods based on either classification of landscape elements within which streams flow or models that predict site-scale characterizations. We identified 2 types of classifications: geographic-dependent regionalizations that generally describe large, unique, spatially discrete geographic units and geographic-independent typologies that typically describe smaller, spatially repeatable types of landscape settings. We also identified 2 types of models: those models limited to predicting conditions at individual sites (single-site models) and those models that can make predictions of local conditions across a range of environmental settings (site-specific models).
We then review why benchmarks are needed in ecological assessments and discuss a central issue in the application of any assessment: how the inferences we draw regarding the ecological condition of assessed sites are affected by how we define and estimate benchmark conditions. We next discuss the general role that prediction plays in establishing benchmark conditions, primarily focusing on the 2 major methods ecologists have used to make these predictions: classification and modeling. We conclude by describing how research within the broad discipline of watershed science has helped, and should continue to help, more accurately and precisely predict how environmental conditions vary across natural spatial and temporal gradients and, hence, the most appropriate ecological benchmarks for individual sites.
To identify relevant literature, we conducted keyword searches in both the Institute for Scientific Information (ISI) Web of Knowledge℠ (1982 to July 2009) and Google Scholar®. We also inspected the literature citation sections of the papers compiled from the Web of Knowledge and Google Scholar searches to identify additional relevant papers that these searches missed. We conducted separate literature searches for 1) papers that focused on establishing and applying ecological reference conditions in all types of freshwater ecosystems and 2) papers in the physical sciences that specifically focused on the prediction of temperature, hydrologic regime, channel geomorphology or sediment size, or water chemistry in stream ecosystems. We generally restricted our analyses to the primary literature (journals) but included some secondary literature that provided especially important institutional guidance regarding management practices.
For the ecological literature, we first conducted a Web of Knowledge search based on the keywords reference condition, regionalization, ecoregion, classification, typology, RIVPACS, AusRivAS, and BEAST to identify an initial set of papers that focused on general aspects of the description, prediction, or application of the reference condition in freshwater ecosystems. We then used Google Scholar to identify other candidate publications based on simple combinations of keywords or authors (Google Scholar does not allow Boolean logic searches). We compiled the results of these searches in 3 ways. First, we summarized the number of papers published by individual journals (and their citation rates) that explicitly addressed development or testing of reference condition approaches. Second, we summarized papers that addressed or used some aspect of the reference condition in an ecological assessment by the type of reference condition approach used: regionalization, typology, single-site, or site-specific models. This latter compilation was substantially larger than the former compilation because it included case studies that did not necessarily develop or assess the performance of the prediction approach used. Last, we summarized the individual papers that appear to have had the most influence on development of reference condition thinking. We identified papers that have been cited ≥40 times. The 40-citation cutoff was arbitrary but represented a distinct break from the next most cited paper (18 total citations).
For the physical sciences literature, we conducted 4 separate Web of Knowledge and Google Scholar searches for papers that individually included the keywords temperature, hydrology or hydrologic regime, channel geomorphology or bed sediment, or water chemistry together with the general keywords stream, river, lotic, reference condition, regionalization, classification, typology, model, and prediction. These latter keywords were common to all 4 searches. We identified articles dating back to 1877 that were published in 192 different journals, books, and government documents. We then categorized the information in these papers with respect to whether predictions of reference conditions could be made, and if so, what type of approach to prediction was used.
Results of Bibliographic Analyses of Ecological Papers
Our searches of the ecological literature identified 54 candidate journals (Table 1). Of these journals, 29 had published ≥1 paper addressing some aspect of the development or testing of a reference condition approach in freshwater ecosystems (Table 1). Of these journals, Hydrobiologia, J-NABS, and Freshwater Biology (FWB) published 60% of the 184 papers we identified and received 59% of the total citations. Of these 3 core journals, J-NABS and FWB papers had the highest mean annual citation rates. However, a few papers published in more general ecological journals had higher citations/paper and citations/y (e.g., Journal of Applied Ecology, Australian Journal of Ecology, Ecological Applications, Global Ecology and Biogeography, and Ecological Modelling).
Our broad survey of the literature showed that both regionalization and site-specific modeling approaches to establishing reference conditions have been frequently used (Table 2). We associated 413 papers with some type of biological assessment study; 217 studies were based on a classification approach (regionalization or typology), and 187 studies were based on a site-specific modeling approach. Only 9 papers described studies in which the reference condition was based on individual reference sites.
We identified 18 influential papers (Table 3). Among these, Karr (1991; Fig. 1) has been most frequently cited and also has the highest annual citation rate. Karr (1991) is a general treatment of the need for ecological assessments given environmental protection policy in the USA. This paper does not focus specifically on development of the reference condition, but Karr does discuss the importance of accurately specifying expected ecological conditions, a topic that received only cursory treatment in Karr (1981), a paper that set the stage for development of multimetric indices (MMIs). Wright et al. (1984; Fig. 1) described the initial analyses in support of the predictive modeling approach to bioassessment (i.e., River InVertebrate Prediction and Classification System [RIVPACS]; Moss et al. 1987; Fig. 1). Wright et al. (1984) is the single most frequently cited paper that specifically addresses prediction of the local biota expected under different natural environmental conditions. Of the remaining 16 papers, 10 built on Wright et al. (1984) and described aspects or refinements of the modeling approach, 3 explored aspects of the regionalization or typology approaches to setting reference expectations (Hughes et al. 1986, Barbour et al. 1996, Hawkins et al. 2000a; Fig. 1), 1 treated the overall challenges in identifying reference conditions and applying them in a uniform manner (Stoddard et al. 2006), 1 described how paleolimnological data could be used in lakes to identify historical conditions for individual sites (Dixit et al. 1999), and 1 compared the performance of different types of biotic indices (Reynoldson et al. 1997; Fig. 1). Reynoldson et al. (1997) is the single most cited paper on bioassessment topics published in J-NABS. Its main focus was on comparing indices, but the paper was critical in establishing the need to tease apart the potentially confounding effects on bioassessments of how the reference condition is predicted from the ecological index used.
J-NABS papers have been an eclectic contribution to the development of reference condition ideas (Fig. 1, Table 3). Three of the 18 influential papers identified in Table 3 were published in J-NABS, but we identified 5 other J-NABS papers that either have influenced reference-condition ideas or that we think probably will do so in the future. Barbour et al. (1996) built on the pioneering paper of Hughes et al. (1986) in describing the implementation of a regionalization approach for setting biological expectations. Eleven years after Hughes et al. (1986), Reynoldson et al. (1997) described the first rigorous evaluation of the performance of 2 different index types (both of which depend on comparison with reference conditions). Three years later, Hawkins et al. (2000a) synthesized the results of several papers published in a special issue of J-NABS on landscape classification (volume 19, issue 3) that evaluated how well regionalizations and typologies accounted for natural biological variation among streams. Snelder et al. (2004b; Fig. 1) followed with an evaluation of perhaps the most comprehensive ecologically based typology of streams developed to date. In the same year, Chessman and Royal (2004; Fig. 1) described an approach to estimating reference conditions when no reference sites were available. Van Sickle (Van Sickle et al. 2005, Van Sickle 2008; Fig. 1) provided new tools to enhance the development, interpretation, and application of RIVPACS-type models. Most recently, Herlihy et al. (2008; Fig. 1) built on the synthetic work of Stoddard et al. (2006) and others to describe the challenges of implementing a consistent approach to setting reference expectations at a continental scale. As evident in Fig. 1, there has been considerable activity over the last 25 y in developing and refining approaches to describe and predict ecological benchmarks in freshwater ecosystems. Figure 1 provides context for our more detailed review below of the contributions that these and other researchers have made to the theory and practice of ecological assessments.
Developing and Applying Benchmarks in Ecological Assessments
The need for control and replication in ecological assessments
Why are benchmarks needed in ecological assessments? Simply put, a benchmark should serve the same purpose as a control treatment in a manipulative experiment. Thus, benchmarks are one element of a study design that helps resource managers draw scientifically credible inferences regarding the ecological condition of the resources they manage. Like the controls used in a good manipulative experiment, benchmarks should accomplish 2 main tasks: 1) control for the effects of factors other than the ones being studied (e.g., natural factors vs anthropogenic stressors) and 2) provide sufficient replication to estimate the range of variation in the values of an ecological index that is associated with both natural variability and sampling variability. The extent to which we accomplish these 2 tasks will greatly affect the robustness of inferences regarding the ecological condition of a water body.
Experimental control and replication are fundamental requirements in any ecological assessment, but the specific way these design elements have been applied has been strongly influenced by practical constraints imposed by a heterogeneous landscape and by the statistical sophistication of practitioners. Tightly matched controls can be applied when conducting manipulative field experiments designed to evaluate the potential effects of one or more types of stressor or levels of a stressor on an ecological variable of interest (see review by Cooper and Barmuta 1993), but such highly controlled approaches are difficult if not impossible to use for post hoc assessments of the effects of environmental alterations that have already occurred (potentially over a period of ≥100 y) at individual sites (e.g., Underwood 1994). The need to conduct post hoc assessments at individual sites prompted development of approaches to experimental control that were designed to characterize better the sources and magnitude of natural variability that occur at the scale of entire ecosystems (e.g., Landres et al. 1999).
Two general approaches to matching control sites with an assessed site are frequently used in freshwater ecological assessments. The 1st approach requires the identification of nonperturbed reference sites that are assumed to be sufficiently similar to one another and to the predisturbance condition of the assessed site that they can be used as replicates of fixed controls. The 2nd approach assumes that natural environmental and ecological variability among reference sites will be so great that their direct use as fixed replicates could mask human-caused effects (see following paragraph). Instead of using reference sites as fixed replicates, this approach uses biota–environment relationships derived from the reference sites to predict the most likely ecological reference condition at any individual assessed site (analogous to regression).
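The contrast between these 2 approaches can be sketched in a few lines of code. The example below is only an illustration under stated assumptions, not a method taken from the papers reviewed here: it assumes a single hypothetical natural predictor (log catchment area), invented reference-site richness values, and ordinary least-squares regression standing in for the biota–environment relationship. All function names and data are hypothetical.

```python
from statistics import mean


def fixed_replicate_expectation(ref_index):
    """1st approach: treat reference sites as replicates of one fixed control."""
    return mean(ref_index)


def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b * x for a single predictor."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def site_specific_expectation(ref_x, ref_index, site_x):
    """2nd approach: predict the expectation from the site's natural setting."""
    a, b = fit_line(ref_x, ref_index)
    return a + b * site_x


# Hypothetical reference sites: log10 catchment area and taxa richness.
ref_log_area = [0.5, 1.0, 1.5, 2.0, 2.5]
ref_richness = [12.0, 16.0, 20.0, 24.0, 28.0]

# Expectation for a small assessed stream (log10 area = 0.5).
fixed = fixed_replicate_expectation(ref_richness)
specific = site_specific_expectation(ref_log_area, ref_richness, 0.5)
print(fixed, specific)
```

With these invented data, a small stream with an observed richness of 13 would appear strongly depleted relative to the fixed-replicate expectation (the mean of all reference sites) but would be close to the expectation predicted for a stream of its size.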
Replication is a critical component in any type of assessment, but its use in ecological assessment and monitoring has differed in important ways from its use in classic controlled experiments (i.e., analysis of variance [ANOVA]-type designs). Replication in controlled experiments assumes treatments are fixed, and replicates are used to estimate sampling error. That estimate of sampling error is then used to test statistically the likelihood that the observed control and treatment means are different. In ecological assessments of individual sites, the null hypothesis that control and treatment means are identical is not appropriate because there is only one treatment observation. It might be tempting to use replicate samples taken within an assessed site to conduct such a test, but the variance associated with within-site replicates typically will be smaller than the variance observed among reference sites (within-site + among-site variation), thereby invalidating the assumption of equal variance and, hence, the statistical test. If used in such tests, the replicate samples taken within a site represent pseudoreplicates (sensu Hurlbert 1984), and their use in a t-test (or similar test) would inflate the likelihood of observing a statistically significant difference in means. The appropriate hypothesis when assessing individual sites is whether the condition observed at that site is outside the range of index values expected among an appropriate set of reference sites (see McBride et al. 1993, Kilgour et al. 1998, Smith et al. 2005, Bowman and Somers 2006 for appropriate statistical approaches).
Precision, bias, and inferences in ecological assessments
Reliable inferences of ecological condition assume that predictions of the reference condition are both acceptably precise and unbiased. Ideally, the only variation associated with estimating the reference condition is that associated with random sampling variability, i.e., the variation in an ecological attribute that would occur among replicate samples taken at an individual site at the same time. In reality, sampling variability is usually confounded with 2 additional types of variation in ecological assessments: 1) natural, systematic variation that occurs over time at individual sites and that exists among the populations of reference sites used as spatial replicates and 2) systematic variation associated with prediction error. Because all natural ecosystems are inherently dynamic, they exhibit a range of conditions associated with temporal variation in both abiotic and biotic forces (Fig. 2). Short-term temporal variation can be associated with either predictable seasonal processes, such as life-history schedules of component species or effects of seasonal variation in climate, or stochastic events, such as individual floods or periods of extreme drought. Because of this seasonal variation in assemblage structure, and thus, metric or index values, assessments based on comparisons of samples taken in one season with reference expectations derived from other seasons will lead to biased predictions and incorrect inferences (e.g., Linke et al. 1999, Reece et al. 2001, Feio et al. 2006). Longer-term year-to-year differences in environmental conditions associated with climate can directly or indirectly affect annual variation in species composition, richness, or relative abundances via effects on the recruitment success and survivorship of individual species (McElravy et al. 1989, Bradley and Ormerod 2001, Scarsbrook 2002, Bêche and Resh 2007).
The effects of natural variation typically have been treated in different ways. Controlling for seasonal variability can be done, in part, by specifying a relatively short period within which samples are collected (e.g., an index period; Barbour et al. 1999; Fig. 1) or by pooling multiple samples taken across seasons to estimate the annual ecological conditions that characterize a site (e.g., Wright 2000). Year-to-year variation typically is considered to be an important component of the range of natural variability that characterizes a site. This type of variation is fundamentally different from random sampling variability in that it is clearly a consequence of reasonably well-understood causal processes. It is also an important component of ecological variation that should be used to assess whether sites are in reference condition (Landres et al. 1999). Spatial variation among reference sites is an outcome of the pervasive and multidimensional environmental heterogeneity that exists naturally across all landscapes and waterways. Thus, such sites can be, at best, coarse replicates of one another. In regionalization or typology approaches (i.e., any type of landscape classification), this variation is explicitly or implicitly assumed to represent part of the range of natural variation associated with a given water body type, and some lower value of that distribution of values is used as a benchmark below which an assessed site would be considered to be in nonreference condition. In site-specific approaches, this type of variation represents a potentially important confounding factor that can reduce estimates of precision and, therefore, should be taken into account by modeling.
Systematic prediction bias occurs as a consequence of nonrandom prediction errors and can occur in any type of prediction scheme. For example, given that we know that the distribution of biota varies strongly with stream size, a classification of sites that treats small and large reference streams as equivalent would probably either over- or underpredict the true reference value for any individual site. Such biased predictions also can occur in models that are developed to control for naturally occurring environmental variation among sites. Predictions at some sites will be inaccurate if the models imperfectly describe environment–biota relationships.
These issues of prediction errors have attracted increasing scrutiny as their existence and magnitude have become more apparent over the last ∼10 y. Most analyses of assessment error have focused on aspects of precision (Clarke 2000, Hawkins et al. 2000b [Fig. 1], Ostermiller and Hawkins 2004, Clarke and Hering 2006 [Fig. 1], Davy-Bowker et al. 2006 [Fig. 1], Nichols et al. 2006, Van Sickle et al. 2007, Stribling et al. 2008, Mazor et al. 2009). Recent research has been directed toward documenting and understanding the magnitude of systematic prediction errors (e.g., Ostermiller and Hawkins 2004, Hawkins 2006, Cao et al. 2007, Hargett et al. 2007, Rehn et al. 2007, Ode et al. 2008). These studies have made it increasingly clear that substantial variation in reference-site biota and the indices derived from them can be associated with natural gradients, even after modeling or adjusting for the effects of natural environmental variation. For example, as much as 30% of the variation in reference-site index values was associated with natural environmental gradients in the USA's recent National Wadeable Stream Assessment (USEPA 2006) (Table 4). The magnitude of this variation implies that assessment errors associated with biased predictions might be frequent and that the issue of bias needs much more attention than it has received.
A brief history of approaches used to estimate ecological benchmarks
Ecological benchmarks can be estimated in at least 4 ways: 1) use of the historical record, 2) extrapolation or interpolation from extant reference-quality sites of appropriate type, 3) hindcasting based on models that describe current stressor–indicator relationships, and 4) prediction derived from mechanistic models that describe how natural processes influence an ecological attribute of interest. Matching of existing reference-quality sites with assessed sites provides a means of estimating aspects of both assemblage composition (what taxa should occur) and structure (what abundances should occur) and is, by far, the main approach used in modern ecological assessments. Matching can be done in several ways, of which landscape regionalizations, typologies, and site-specific predictions are most common. Use of the historical record, hindcasting, and mechanistic modeling are much less well developed (discussed below).
Landscape regionalizations.—In an effort to produce ecological assessments broadly based on the design elements of control and replication, Hughes et al. (1986) advocated use of a regional reference site approach. In the regional reference site approach, some set of ecologically least-disturbed sites is identified within a relatively homogeneous terrestrial landscape (the controls), and the range of biotic attributes observed across this population of sites (the replicates) is used to estimate the range of natural variability that presumably would occur at an individual test site (a site of unknown condition) assuming it was in reference condition. If the value of an ecological index falls outside a threshold value determined from the distribution of reference site values, the site would be inferred to be in nonreference condition. Inferences regarding the condition of a test site initially were based on somewhat arbitrary and simple thresholds (e.g., the 10th or 25th percentile of reference-site values), but formal statistical tests (e.g., Kilgour et al. 1998, Bowman and Somers 2006) can be applied to such data to determine more rigorously whether an assessed site is in reference condition.
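The simple percentile thresholds mentioned above can be illustrated with a short sketch. It assumes an index for which lower values indicate poorer condition, uses linear interpolation between order statistics to estimate the percentile (one of several possible conventions), and relies on invented reference-site values. The function names are hypothetical, and the formal tests cited above are not implemented here.

```python
def percentile(values, q):
    """q-th percentile (0-100), by linear interpolation between order statistics."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("need at least one reference value")
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


def is_nonreference(test_value, reference_values, q=10):
    """True if the test-site index falls below the q-th percentile
    of the regional reference-site distribution."""
    return test_value < percentile(reference_values, q)


# Hypothetical index values at 11 reference sites within one region.
reference_index = [62, 70, 74, 78, 80, 83, 85, 88, 90, 94, 97]

threshold_10 = percentile(reference_index, 10)
threshold_25 = percentile(reference_index, 25)
print(threshold_10, threshold_25)
print(is_nonreference(72, reference_index, q=10))
print(is_nonreference(72, reference_index, q=25))
```

The same test site can be judged differently depending on which percentile is chosen, which is one reason the choice of threshold is described above as somewhat arbitrary.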
Hughes et al. (1986) proposed the use of ecoregions (Omernik 1987; Fig. 1) as landscape strata within which assessed sites would be compared with reference sites. The general rationale for using ecoregions as a basis for predicting aquatic biota was based, in part, on Hynes' (1975) view that streams are a product of the terrestrial ecosystems they traverse (see fig. 1 in Hughes et al. 1986). Hughes et al. (1986) suggested that if ecoregions are defined in terms of their climate, topography, geology, soil, and vegetation (Omernik 1987), then their streams should differ in water chemistry, flow regime, habitat structure, and food sources. In turn, these proximate environmental features should strongly influence the biotic character of streams. Thus, ecoregions were viewed as a way to partition (control for) the collective effects of the most important natural factors that control the distribution and abundance of aquatic biota. An important property of the ecoregion approach to regionalization is that the spatial units used for prediction are generally all unique, i.e., ecoregions are a type of geographic-dependent classification in which each class typically has only one member. Thus, no true replicates of ecoregions exist, and assessing the transferability of knowledge gleaned from one ecoregion (or set of ecoregions) to others is problematic.
The regional reference site approach developed nearly in parallel with a major effort by the US Environmental Protection Agency (EPA) to define and describe ecoregions within the USA (Omernik 1987), and subsequently, an ecoregional reference site approach was widely promoted by the EPA as a method for controlling, in whole or part, natural ecological variation when conducting biological assessments in the USA (Gibson et al. 1996 [Fig. 1], Barbour et al. 1999). A regionalization approach to defining reference conditions was subsequently applied elsewhere (e.g., European Commission 2000, Moog et al. 2004, Verdonschot and Nijboer 2004). Studies published in J-NABS did not lay the original foundations for the use of such landscape regionalizations in ecological assessments, but several papers that documented refinement of the approach and case studies of its application appeared in J-NABS (e.g., Lenat 1988, 1993, Barbour et al. 1996, Pan et al. 1996, 2000, Gerritsen et al. 2000, Maxted et al. 2000, Van Sickle and Hughes 2000, Waite et al. 2000).
Typologies.—The eventual recognition that almost any type of regionalization (ecoregion, drainage basin, political boundaries) was associated with about the same, and relatively low, amounts of biotic variation among sites (Hawkins et al. 2000a, Marchant et al. 2000, Sandin and Johnson 2000, Van Sickle and Hughes 2000, Waite et al. 2000, Herlihy et al. 2006, 2008, Herlihy and Sifneos 2008) implied that reassessment was needed of assumptions regarding which specific proximate features most strongly influence the distribution of aquatic biota (e.g., Power et al. 1988, Hawkins et al. 1997, Poff 1997, Mykrä et al. 2007) and what landscape and catchment attributes best predicted those specific features. This recognition anticipated development and testing of more finely resolved, geographic-independent classifications for streams based on spatially repeatable ecotypes or landscape typologies (e.g., Brierley and Fryirs 2000, Waite et al. 2000, Snelder and Biggs 2002 [Fig. 1], Fryirs 2003, Balestrini et al. 2004, Munné and Prat 2004, Snelder et al. 2004a, b, Higgins et al. 2005, Chessman et al. 2006, Seelbach et al. 2006, Schmitt et al. 2007, Orr et al. 2008). Although not initially a main thrust of stream classification research, such typological thinking had an earlier start in the hydrogeomorphic classification of wetlands (e.g., Brinson 1993 [Fig. 1], Brinson and Rheinhardt 1996, Shaffer et al. 1999). The mandate of the European Union Water Framework Directive (WFD) that ecological assessments must be comparable across member nations prompted several studies that assessed the effectiveness of different typologies in bioassessment (e.g., Lorenz et al. 2004, Verdonschot and Nijboer 2004, Dodkins et al. 2005, Ferréol et al. 2005, Aroviita et al. 2008, Turak and Koop 2008), especially regarding how to develop optimal typologies based on only those catchment and waterbody features that most strongly influence biota (e.g., Snelder et al. 2004b, 2007, Dodkins et al. 2005, Snelder and Hughey 2005, Heino and Mykrä 2006, Sánchez-Montoya et al. 2007, Aroviita et al. 2008). However, tests of the effectiveness of the typological approach in partitioning natural ecological variation have not produced encouraging results (Davy-Bowker et al. 2006, Inglis et al. 2008).
Papers published in J-NABS were among the first to explore the utility of a priori classifications other than ecoregions for bioassessment purposes (e.g., review by Hawkins et al. 2000a, Waite et al. 2000), and general ecological syntheses describing the ecological knowledge that underlies the conceptual foundations on which typologies are based were published in J-NABS (e.g., Power et al. 1988, Poff 1997). However, most of the recent advances in development and testing of typologies (see above) have been published elsewhere, although the pioneering analysis of Snelder et al. (2004b) is an important exception.
Site-specific prediction of reference conditions.—During the same time period that the regional reference site approach was being developed, an alternative approach to establishing reference conditions was being developed and refined in Great Britain. This approach was designed to derive nearly site-specific reference expectations by adjusting index values for the effects of natural environmental attributes that can vary both among and within regions. Moss et al. (1987) (see also Furse et al. 1984 [Fig. 1], Wright et al. 1984, Wright 1995 [Fig. 1], Clarke et al. 1996 [Fig. 1], 2003, Moss et al. 2001) developed statistical procedures (RIVPACS) to predict taxon-specific probabilities of detection (P) at different sites from naturally occurring environmental features. The statistical models on which RIVPACS-type systems are based are calibrated with observations made at many reference-quality sites, and the models relate taxon occurrences to multiple environmental gradients. In effect, this approach simultaneously models the niche relationships of many taxa.
In the original RIVPACS and its many refinements and derivatives (e.g., Reynoldson et al. 1995 [Fig. 1], Parsons and Norris 1996, Smith et al. 1999, Hawkins et al. 2000b, Simpson and Norris 2000 [Fig. 1], Joy and Death 2002, Davy-Bowker et al. 2006, Hawkins 2006, Kokeš et al. 2006, Feio et al. 2009, Poquet et al. 2009), no a priori classification of reference sites is used (unlike the regional reference or typology approaches). Instead, RIVPACS-type models predict P by first classifying reference-quality streams based on their taxonomic composition or assemblage structure (not their landscape or channel environmental attributes) and then applying a multivariate predictive model (e.g., linear discriminant functions models [LDMs]) to predict class membership from environmental variables that are generally invariant (unresponsive to anthropogenic activity). Because LDMs predict the probabilities that new observations belong to each of the different classes, the Ps at a site can be estimated by weighting the frequencies of occurrence of each taxon within each group by these probabilities of group membership. This weighting allows prediction of P continuously across the entire predictor variable space encompassed by the reference sites. In essence, the models interpolate the likelihood of collecting different taxa at a site from the reference-site groupings. Excessive extrapolation is prevented by determining if the predictor space occupied by an assessed site is outside that described by the reference sites (e.g., Moss et al. 1987, Clarke et al. 2003).
In the RIVPACS approach, taxon-specific Ps are used to derive ecological quality ratios (EQRs) (Clarke et al. 1996, 2003). For example, O/Etaxa is calculated by first summing the Ps to estimate the number of taxa expected (E) in a sample. The O/Etaxa index is the proportion of those specific taxa predicted to occur in the sample (i.e., E) that are observed (O). When used in conjunction with other ecological information, such as tolerance values, the Ps also can be used to estimate EQRs that assess if an observed tolerance-based biotic index value is different from expected (e.g., the Biological Monitoring Working Party [BMWP] indices; Clarke et al. 2003). In both cases, inferences regarding the ecological condition of a site are based on estimates of modeling error, i.e., the distribution of model residual values. In theory, the distribution of O/E values calculated at reference sites should have a mean of 1, and the variance of this distribution is a direct measure of the precision of a model. If these values are normally distributed, they allow the same type of statistical tests of whether an assessed site is in reference condition or not as previously mentioned (e.g., Kilgour et al. 1998, Bowman and Somers 2006). As discussed above, matching the spatial and temporal extents over which indices are calculated and how reference-site variability (i.e., modeling error + natural variation) is estimated should reduce Type I and II statistical errors of inference. Although EQRs typically have been estimated following the traditional RIVPACS modeling approach, EQRs (including O/Etaxa) also can be calculated from an a priori classification of sites (e.g., Aroviita et al. 2008, 2009). Moreover, in the null model approach developed by Van Sickle et al. (2005) as an aid in evaluating the performance of RIVPACS-type models, O/E is calculated by assuming that all reference sites belong to a single regional class.
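The arithmetic behind P and O/E described above can be illustrated with a short Python sketch. It is a toy illustration only, not the RIVPACS software: the function names, taxa, group-membership probabilities, and frequencies of occurrence are all hypothetical, and O is counted here simply as the number of observed taxa with P > 0, which is an assumption about how "predicted to occur" is put into practice.

```python
def capture_probabilities(group_probs, taxon_freqs):
    """Weight each taxon's within-group frequency of occurrence by the
    probability that the site belongs to each reference group."""
    return {
        taxon: sum(g * f for g, f in zip(group_probs, freqs))
        for taxon, freqs in taxon_freqs.items()
    }


def o_over_e(capture_probs, observed_taxa):
    """Return (O, E, O/E). E is the sum of the Ps; O counts the observed
    taxa that were predicted to occur (assumed here to mean P > 0)."""
    expected = sum(capture_probs.values())
    observed = sum(
        1 for taxon, p in capture_probs.items()
        if p > 0 and taxon in observed_taxa
    )
    return observed, expected, observed / expected


# Hypothetical output of a discriminant model: probabilities that the
# assessed site belongs to each of 3 biotically defined reference groups.
group_probs = [0.5, 0.25, 0.25]

# Hypothetical frequencies of occurrence of each taxon within each group.
taxon_freqs = {
    "Baetis": [1.0, 1.0, 0.0],
    "Drunella": [0.5, 0.0, 0.0],
    "Hydropsyche": [0.0, 1.0, 1.0],
    "Chironomidae": [1.0, 1.0, 1.0],
}

probs = capture_probabilities(group_probs, taxon_freqs)
O, E, ratio = o_over_e(probs, observed_taxa={"Baetis", "Chironomidae"})
```

With these invented inputs, E = 2.5 and O = 2, so O/E = 0.8. In an actual assessment, that value would be judged against the distribution of O/E values observed at reference sites.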
As with the regional reference site approach, foundational research in developing site-specific reference conditions was not published first in J-NABS. However, J-NABS has published the 2nd-most papers that the ISI Web of Knowledge (as of July 2009) identified from the keywords RIVPACS or AusRivAS (Australian River Assessment System) (15 of 106 papers). Moreover, a J-NABS paper (Reynoldson et al. 1997) has been cited more often (201 citations) than any of these other papers. Many of the J-NABS papers on RIVPACS-type topics focus on ways to improve predictions (Van Sickle et al. 2005, 2006 [Fig. 1], Cao et al. 2007, Mykrä et al. 2008b), better estimate and interpret prediction errors (Ostermiller and Hawkins 2004, Van Sickle et al. 2005, Yuan 2006, Ode et al. 2008, Yuan et al. 2008), or derive more sensitive indices based on model outputs (Van Sickle 2008). Other papers describe case studies that illustrate the utility and potential of a site-specific reference condition approach (Sloane and Norris 2003, Hose et al. 2004, Cao et al. 2007).
In contrast, J-NABS has not led in the publication of RIVPACS-type assessments in which methods other than biotic-based site classification coupled with LDMs are used to predict probabilities of capture. Some recent work has focused on predictive schemes that do not require a preclassification step. This work appears to have been stimulated primarily by the argument that the classification step used in RIVPACS might not minimize prediction errors. For example, Joy and Death (2003) and Olden (2003) explored the effectiveness of linking the outcomes of many taxon-specific models to estimate Ps. Others have used artificial neural networks (ANNs) to jointly predict multitaxa Ps directly from environmental data (e.g., Joy and Death 2005, Olden et al. 2006). Still others have developed approaches that identify that subset of reference sites that are best physically matched with a test site of interest—a type of site-specific typology (e.g., Linke et al. 2005, Prins and Smith 2007, Chessman et al. 2008). These modeling techniques generally exhibit desirable performance in precision, accuracy, or sensitivity (i.e., ability to identify a site as being in nonreference condition) but have yet to be widely incorporated into practice, perhaps because their results do not show uniform improvement over traditional RIVPACS models or because the improvements are marginal.
Back to the future: the regional reference site approach revisited.—Estimation of site-specific reference conditions through modeled interpolation circumvented one of the major concerns that Hughes (1995) had with site-specific biological criteria—that site-specific reference expectations derived from individual sites could not be adequately extrapolated to other sites. Even though proponents of the regional reference site approach clearly recognized that “Ecoregional reference sites still require some level of habitat classification at the site scale” (Hughes 1995, p. 35), there was concern that each test site would have to be matched to ≥3 appropriate reference sites (i.e., replication based on matching specific site characteristics, such as catchment size or channel type), and the cost of sampling 150 to 300 reference sites each year (assuming a management agency assessed 50–100 sites/y) would be exorbitant. On the other hand, Hughes et al. (1994, p. 138) earlier had acknowledged that “Use of regional reference lakes or stream reaches (Hughes et al. 1986) to develop water resources criteria is hindered by ecoregion heterogeneity.” Hughes et al. (1994, p. 138) further stated that “…sites with substantial natural differences…in the same ecoregion should not have the same set of reference sites.”, and “If (subregionalization is impossible), reference sites for each type of natural gradient, substrate, or water quality should be selected.”
The fact that comparisons of biological assessment methods have been confounded by mixing the approach to predicting reference conditions with the type of index used (e.g., Reynoldson et al. 1997, Hawkins et al. 2010) is perhaps the main reason that consensus did not emerge from the index wars of the mid 1990s and early 2000s regarding how best to conduct ecological assessments (e.g., Gerritsen 1995, Norris 1995, Karr and Chu 2000, Norris and Hawkins 2000). Recent studies have demonstrated that the modeling approach is effective when applied to the development of MMIs (e.g., Oberdorff et al. 2001 [Fig. 1], 2002 [Fig. 1], Baker et al. 2005, Pont et al. 2006, Cao et al. 2007), and other investigators have started to evaluate the independent effects of prediction method and type of ecological index on the performance of ecological assessments. For example, Mazor et al. (2006) showed that RIVPACS or BEAST (i.e., Reynoldson et al. 1995) modeling improved assessment sensitivity, and both Hawkins (Hawkins 2006, Hawkins et al. 2010) and Davy-Bowker et al. (2006) have shown that modeling can improve the performance of different types of indices. Aroviita et al. (2009) recently showed that the performance of O/Etaxa indices derived from traditional RIVPACS-type models and a priori typologies were similar when indices were based on reference sites that spanned a limited geographic range, but that modeled indices performed better than typologies when geographic range increased.
Does the taxonomic group matter?—An implicit assumption regarding the general utility of any type of a priori classification in predicting reference-state biota is that the classification will be applicable to all taxonomic groups. Differential response of biota in different taxonomic groups to natural environmental gradients would require development of separate taxon-specific classifications to assess the condition of >1 taxonomic group. The degree to which different taxonomic groups show concordant patterns of variation across natural environmental gradients should provide insight regarding this assumption. Several studies have now addressed this issue (e.g., Paavola et al. 2003, 2006, Grenouillet et al. 2008, Infante et al. 2009, Virtanen et al. 2009). In general, these studies show that concordance is usually weak, although sometimes statistically significant. Fish, invertebrate, and plant taxa tend to respond most strongly to different natural environmental gradients and to different human-caused alterations to those environments (e.g., Carlisle et al. 2008, Mykrä et al. 2008a). The lack of strong concordance among taxa in their responses to both natural and human-influenced environmental gradients seriously constrains the application of regionalizations and typologies as a general tool in ecological assessments.
The issue of variable reference-site quality.—One of the most vexing issues that continue to complicate use of reference site data for predicting ecological benchmarks at test sites is that no sites are truly pristine any longer, and the degree of ecological alteration present at the least-disturbed sites can vary markedly both between and within regions (Stoddard et al. 2006, Herlihy et al. 2008). The consequence of using reference sites of variable quality is that assessed sites probably will be held to different standards. Thus, direct comparisons among sites are problematic at best and potentially difficult to justify to the water-policy regulated community (e.g., industry, agriculture). The issue is also problematic from a resource conservation perspective because as conditions progressively degrade over time, society's expectations regarding what is natural or acceptable can change. This shifting baseline syndrome has attracted considerable attention with respect to the conservation and management of marine resources (Pauly 1995, Dayton et al. 1998, Greenstein et al. 1998, Pinnegar and Engelhard 2008) and has similar implications for establishing expectations for freshwater ecosystems. The recent National Wadeable Stream Assessment (WSA) in the USA provides a clear illustration of how problematic this issue is (Herlihy et al. 2008). Furthermore, human-caused climate change will exacerbate the shifting baseline problem because even the most pristine ecosystems that currently exist will be affected. Even if we have already sampled many pristine ecosystems, climate change will confound our ability to assess the natural variability of these systems and, thus, will compromise our ability to specify accurately the true ranges of natural variability when conducting ecological assessments.
Theoretically, the reference condition of water bodies can be estimated by various types of statistical hindcasting. Such techniques are fundamental to the reconstruction of lake environmental conditions from fossil biotic assemblages (e.g., Birks et al. 1990 [Fig. 1], Birks 1998, Simpson et al. 2005). The use of these techniques in lakes is aided by the presence of data-rich sediment cores that can be dated and that allow either quantitative or qualitative identification of the assemblage structure present prior to anthropogenic alterations. The structure of these prealteration assemblages provides a direct estimate of the biotic reference condition for individual lakes, and the use of transfer functions (reviewed by Birks 1998) that relate biotic structure to environmental conditions across modern lakes (i.e., surface sediments) can be used to infer the prealteration water quality (e.g., pH, P concentrations, water temperature) at individual lakes.
Lack of fossil records generally precludes routine use of paleolimnological methods in flowing-water ecosystems (but see Thoms et al. 1999, Gell et al. 2005), but attempts have been made to hindcast reference conditions by adjusting statistically for the relationships observed between biota and landuse alteration and related stressors. In the WSA, an attempt was made to hindcast reference conditions to more natural and, hence, comparable benchmarks by regressing MMI values observed at reference sites against the degree of alteration occurring across those sites. In this case, Herlihy et al. (2008) used a synthetic stressor gradient derived from a principal component analysis to quantify the joint variation in several stressor variables. This attempt to hindcast appeared to be only partially successful in generating benchmarks of equivalent quality because streams in the heavily altered central portions of the USA were estimated to be in better ecological condition than were streams in the less-altered Eastern Highlands.
We found 5 other papers (Dodds and Oakes 2004, Baker et al. 2005, Kilgour and Stanfield 2006, Robertson et al. 2006, and Angradi et al. 2009) that describe conceptually similar approaches to hindcasting but that used different analytical techniques to estimate reference conditions. Dodds and Oakes (2004) used analysis of covariance with ecoregions as categories and % anthropogenic land use as covariates to predict background (reference) nutrient concentrations. Baker et al. (2005) used stepwise multiple linear regression (MLR) to predict fish, water-quality, and stream-habitat metrics from a suite of natural and anthropogenic catchment variables and then adjusted metric values to 0 levels of catchment alteration. Kilgour and Stanfield (2006) also used stepwise MLR of biotic and stream habitat variables on landuse variables to estimate reference conditions at levels of 0 land use. Robertson et al. (2006) used spatial regression-tree analyses to delineate zones of similar reference concentrations of P and suspended sediment. Angradi et al. (2009) used quantile regression to hindcast fish MMI values to 0 stress levels. In all of these examples, the authors acknowledged several potential problems in applying such models. Perhaps the most problematic issue is that of extrapolating beyond the range of calibration data, a “damned if you do, damned if you don't” situation. By extrapolating beyond the range of the calibration data, we risk serious prediction errors; by not extrapolating, we have no estimate of what historical reference conditions were like.
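The logic shared by these regression-based hindcasts can be sketched as follows. This is a simplified, hypothetical example with a single stressor and ordinary least squares; it is not the analysis used in any of the papers cited above, and the data and function names are invented for illustration. The flag returned with the prediction marks the extrapolation problem: the estimate at 0 land use lies outside the range of the calibration data.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def hindcast(intercept, slope, x_target, x_calibration):
    """Predict the metric at x_target and report whether x_target lies
    outside the range of the calibration data (i.e., an extrapolation)."""
    prediction = intercept + slope * x_target
    extrapolated = not (min(x_calibration) <= x_target <= max(x_calibration))
    return prediction, extrapolated


# Hypothetical calibration data from least-disturbed sites:
# % anthropogenic land use in the catchment and an index (e.g., MMI) score.
land_use = [10, 20, 30, 40, 50]
metric = [46, 39, 36, 29, 25]

intercept, slope = fit_line(land_use, metric)
reference_estimate, extrapolated = hindcast(intercept, slope, 0, land_use)
```

Here the least-altered calibration site has 10% altered land use, so the hindcast value at 0% is flagged as an extrapolation and should be treated with corresponding caution.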
Chessman and Royal (2004) explored a different approach to the variable-quality reference site dilemma. These authors built on the environmental filters concept (Poff 1997), which assumes that key environmental features screen potential colonists based on the ecological traits (tolerances) possessed by each taxon. If the original or desired environmental reference conditions at a test site can be estimated, it should be possible to predict the set of taxa that should occur at a site based on knowledge of their environmental tolerances. A comparison of the observed taxa with the predicted taxa provides an assessment measure. This approach is conceptually appealing because it is grounded in ecological first principles and, therefore, should be widely applicable and transferable. However, that it is a first principles approach also might be its chief limitation because its performance will depend critically on how well we understand the adaptive traits of individual species (e.g., Lytle and Poff 2004, Poff et al. 2006) and how well we can describe environmental filters in a way relevant to how multiple species perceive them. This approach has yet to be tested thoroughly, but it might be the only approach with the potential to overcome the issues associated with statistical extrapolation described above. Other research has shown that ecological indices based on functional ecological traits can discriminate between human-stressed and reference sites (Gayraud et al. 2003) and that trait structure might be reasonably stable across reference sites and, thus, potentially much easier to quantify and predict than taxonomic composition (e.g., Bady et al. 2005, Statzner et al. 2005). However, Statzner et al. (2005) acknowledged that ecologists have not developed a strong predictive theory regarding the ecological reasons for the spatial and temporal variation in trait structure that has been observed among sites (see Dolédec and Statzner 2010).
Predicting Physical-Chemical Reference Conditions
The accuracy and precision with which we can establish an ecological benchmark is, in part, determined by how well we characterize and predict the natural physical and chemical conditions expected at different sites within diverse landscapes. Many of these natural conditions at a site can be affected by anthropogenic activity, so we cannot use field measurements to predict the reference-condition biota. The use of coarse surrogates, e.g., elevation for temperature, allows us to adjust predictions of biota for differences in naturally occurring environmental gradients among sites to some degree. However, such surrogates are often imprecise predictors, at best, of the environmental factors that most strongly influence the abundance and distributions of biota (e.g., local temperature, substrate, flow, and water chemistry). If we can estimate the physical-chemical reference state in terms of specific physical-chemical attributes, we should be able to predict more accurately the reference-state biota expected at a site. For example, we have developed RIVPACS-type models to predict stream biota that use the output of stream temperature models as a predictor. These RIVPACS models more accurately predict the composition of stream invertebrate assemblages than do models that use elevation, latitude, and catchment area as surrogates for stream temperature (RAH and CPH, unpublished data).
Considerable potential now exists for stream ecologists to take advantage of research conducted across various subdisciplines of the watershed sciences. Researchers studying stream temperature, hydrology, water chemistry, and channel geomorphology and substrate have made significant advances in developing methods that might be useful in predicting the physical-chemical conditions expected under reference conditions. In the following 4 sections, we describe these advances and discuss how they are being, or might be, used in ecological assessments.
Predicting thermal reference conditions
Temperature strongly influences the distributions of ectotherms (reviews by Allan 1995, Begon et al. 1996) through its effect on metabolic and growth rates (Newell and Minshall 1978, Merritt et al. 1982), phenology (Sweeney and Vannote 1981), and fecundity (Vannote and Sweeney 1980). Therefore, accurate and precise predictions of the thermal conditions that characterize reference sites should lead to better ecological assessments.
Stream temperature researchers have approached temperature prediction mainly with 2 of the approaches used by stream ecologists to predict overall reference conditions: typologies and site-specific predictions (Table 2). We found no examples in the literature that used regionalization to characterize the thermal environment of streams. Site-specific predictions can be further divided by modeling approach (physical or empirical) and spatial domain (single-site or multisite), which yields 4 model types: 1) physical single-site, 2) physical multisite, 3) empirical single-site, and 4) empirical multisite. Physical single-site models typically are based on energy balance equations, whereas empirical single-site models most often relate water temperature to air temperature at a nearby temperature station. Physical multisite models typically are designed to predict temperature at many unmeasured locations within a single catchment and require measurement and parameterization of several local environmental factors. Empirical multisite models typically relate stream temperature to landscape features known to affect or covary with stream temperature, e.g., latitude, drainage area, or elevation.
Typologies.—The typological approach has not been used extensively to predict stream temperatures (Table 2). Snelder and Biggs (2002) proposed a hierarchical classification of New Zealand streams based on their climatic variability (air temperature and precipitation) and source of flow. Their classification has been tested for its ecological (Snelder et al. 2004a, b) and hydrologic (Snelder et al. 2005) predictive power, but has not been tested for how well it predicts stream temperature. Nelitz et al. (2007) used classification and regression-tree analysis to classify 104 streams in British Columbia, Canada, into 6 summer maximum weekly average temperature (MWAT) classes based on catchment size, catchment elevation, and air temperature. For each class, they used Bayesian regression to predict the probable response of MWAT to forest management practices, such as road building. Another typological approach to characterize stream temperatures is to classify streams based on the thermal preferences of the biological assemblage found within them (e.g., Lyons 1996, Wehrly et al. 2003), i.e., the fauna serve as surrogates for temperature. However, this latter approach has no utility in predicting reference-condition temperature for ecological assessment purposes because of its obvious circularity.
Site-specific models.—Site-specific physical stream temperature models emerged in the 1960s (Edinger et al. 1968, Brown 1969) largely in response to the need to assess the effects of forest and catchment management practices on aquatic ecosystems (Brown 1970). This research has produced a significant body of literature (Table 2) and some software packages (e.g., SNTEMP by Theurer et al. 1984, CEQUEAU by Morin et al. 1987). The physical-based model of Theurer et al. (1984) was designed to predict both spatial and temporal variation in stream temperature within an entire stream network. Physical-based models have contributed significantly to our understanding of the processes that affect stream temperature (reviewed by Caissie 2006, Webb et al. 2008) but have not been used much in ecological assessments because they require detailed measurements of numerous fine-scale environmental variables, such as local wind speed, relative humidity, and ground reflectivity. However, Allen et al. (2007) developed a model (BASINTEMP) based on a simplified characterization of the main physical factors that influence stream heat budgets (solar radiation input, vegetation cover, groundwater input) that produces site-specific predictions within a geographical information system (GIS) framework. BASINTEMP predictions are constrained to locations within a single stream network, but this model was an important step toward development of physically based models that could be applied to larger regions.
Initial development of single-site, empirical models also occurred in the 1960s. The first models were based on simple sinusoidal functions fit to annual seasonal variation in monthly mean stream temperatures (Ward 1963). Later, linear and nonlinear regression models were developed that related stream temperature to air temperature recorded at a nearby weather station to create temporally continuous predictions of water temperature (e.g., Cluis 1972, Mohseni et al. 1998). These latter models have been used to examine the potential effects of climate warming on specific streams (Mohseni et al. 2003).
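A sinusoidal description of the annual temperature cycle of the kind mentioned above can be fit with very little code. The sketch below is a generic first-harmonic least-squares fit and is not claimed to reproduce the procedure of Ward (1963); it assumes exactly 12 equally spaced monthly means, and the function names and temperature values are hypothetical.

```python
import math


def fit_annual_sinusoid(monthly_means):
    """Least-squares fit of T(m) = mean + amp * sin(2*pi*m/12 + phase)
    to 12 equally spaced monthly means (m = 0..11)."""
    if len(monthly_means) != 12:
        raise ValueError("expected 12 monthly means")
    w = 2 * math.pi / 12
    mean = sum(monthly_means) / 12
    # For a complete, evenly spaced year the sine and cosine terms are
    # orthogonal, so the least-squares coefficients reduce to simple sums.
    a = sum(t * math.cos(w * m) for m, t in enumerate(monthly_means)) / 6
    b = sum(t * math.sin(w * m) for m, t in enumerate(monthly_means)) / 6
    # a*cos(wm) + b*sin(wm) == amp * sin(wm + phase)
    amp = math.hypot(a, b)
    phase = math.atan2(a, b)
    return mean, amp, phase


def predict(mean, amp, phase, month_index):
    """Fitted temperature for month_index (0 = first month of the series)."""
    return mean + amp * math.sin(2 * math.pi * month_index / 12 + phase)


# Hypothetical monthly mean stream temperatures (deg C), January-December.
monthly = [2.1, 2.8, 5.0, 8.2, 11.9, 15.1, 17.0, 16.4, 13.3, 9.4, 5.6, 3.0]

mean, amp, phase = fit_annual_sinusoid(monthly)
fitted = [predict(mean, amp, phase, m) for m in range(12)]
```

The three fitted parameters (annual mean, amplitude, and phase) summarize the seasonal cycle at one site. A model of this form says nothing about how temperatures vary among sites.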
Less emphasis has been placed on developing models that predict how temperatures vary across streams (Table 2), and many of these models were developed primarily in support of ecological analyses. Miyake and Takeuchi (1951) regressed monthly mean temperature from 20 rivers in Japan against air temperature, and their model appears to be the first example of an empirical model capable of predicting at multiple sites. Over the last 30 y, several authors have used linear regression to model various measures of stream temperature (i.e., total annual degree days, daily, monthly, or annual means) as functions of latitude (Vannote and Sweeney 1980), air temperature (Walker and Lawson 1977, Smith 1981, Stefan and Preudhomme 1993, Eaton and Scheller 1996, Pilgrim et al. 1998), or elevation (Rahel and Nibbelink 1999). Still others have used fish distributions to determine elevation isoclines of fish occurrence, which are assumed to be predictive of stream temperature (Meisner 1990, Rieman and McIntyre 1995). In addition to latitude, air temperature, and elevation, more recent research has incorporated local and catchment factors into models (e.g., catchment area, catchment slope, soil characteristics, landcover) (Jones et al. 2006, Wehrly et al. 2006).
Statistical approaches other than linear regression also have been evaluated. Gardner et al. (2003) used a geostatistical technique (kriging) to interpolate temperatures between measured sites within a catchment. Guillemette et al. (2009) used canonical correlation analysis to plot 20 stations in environmental space (catchment and site physical attributes). Once stations were projected into environmental space, the monthly maximum stream temperatures were interpolated between stations to predict temperature at unmeasured sites based on their physical attributes. Wehrly et al. (2009) compared 3 statistical techniques (MLR, generalized additive models, and linear mixed models) to predict July mean stream temperatures from catchment and site characteristics and found little difference between methods. The root mean square errors (RMSEs) reported for the 3 models were 2.0 to 2.6°C, which are comparable to errors reported in other studies (e.g., Smith 1981, Pilgrim et al. 1998). Ordinary kriging also was tested but did not perform as well as the other techniques (RMSE = 2.95°C; Wehrly et al. 2009).
Future research needs: challenges and opportunities.—Because temperature is so critical to stream biota, it is imperative that we accurately describe the thermal conditions that characterize reference streams. Many of the empirical multisite models we reviewed were developed within a single catchment, a small region, or in terrain, such as the central USA, that is topographically homogeneous. Our own analyses relating stream temperatures to GIS-derived catchment and local predictors show that complex topography, like that found in many parts of the western USA, can contribute greatly to the heterogeneity of local environmental conditions and, therefore, to errors associated with temperature prediction (RAH and CPH, unpublished data). Physical models have greatly improved our understanding of how local environmental conditions control stream temperature, and thoughtful inclusion of those factors into multisite, spatially explicit models should improve their performance and allow large-scale spatial prediction of stream temperature as functions of both natural and anthropogenic influences.
Predicting hydrologic reference conditions
Hydrologic regime is another important aspect of the habitat template that influences biotic assemblages. Thus, improving characterization of hydrologic reference conditions expected at sites should improve ecological assessments. Stream flow can be described in many ways, but magnitude, timing, and variability appear to be particularly important to biota (e.g., Gustard 1984, Poff et al. 1997). Hydrologists were motivated initially by a need for landscape-scale predictions of flood risk, and among scientists in the 4 subdisciplines reviewed here, hydrologists have applied the widest variety of techniques to make predictions. However, empirical models and simple physically based rainfall–runoff models with empirically estimated parameters have become the prevailing approaches to making site-specific predictions of flow because of a lack of flow data and the multiscale complexity inherent in catchment dynamics. In addition, modeling has expanded from predicting one aspect of flow to predicting the entire hydrograph over time as the need for predictions has moved from assessing flood potential to predicting available water supplies, hydropower potential, and ecological flows.
Regionalizations.—Predictions of hydrologic regime began in the 1930s and 1940s with delineations of regions with homogeneous flow characteristics, topography, and climate (Krasovskaia 1997). Our use of the term, regionalization, as the development and identification of geographic regions that are homogeneous with respect to physical-chemical attributes differs from how the term is used in the hydrology literature (i.e., development of a regionally applicable model). For consistency throughout our paper, we use the former meaning of the term. Mosley (1981) derived hydrologic regionalizations via cluster analyses of hydrologic data. Others (Singh 1971, Cunnane 1988) have predicted flow for regions with reasonably homogeneous flow characteristics by applying regionally representative flows to all catchments in those regions. The regionalization approach to flow prediction does not appear to be widely used (Table 2), but the approach is still used for prediction in some cases (e.g., Chen et al. 2006).
Typologies.—Juncker (1971) developed a global hydrologic typology to predict hydrologic regimes, but since then relatively few studies have explored this approach (Table 2). Most typologies classify catchments by their physical characteristics via cluster analysis (Acreman and Sinclair 1986). The Ecological Limits of Hydrologic Alteration framework (ELOHA; Henriksen et al. 2006, Apse et al. 2008) uses a typology to predict flow regime. Snelder et al. (2005) developed a flow regime typology that explained significantly more variation in flow among sites than did a regionalization approach.
Site-specific predictions.—Hydrologists recognize that most landscapes are too heterogeneous for effective regional classification, so they have focused on development of site-specific prediction methods that can be applied across regions (Table 2). Interpolation or mathematical modeling is used to predict flow at ungauged sites from data collected at gauged sites. Flow regimes of gauged sites have been interpolated to ungauged sites in 2 main ways (reviewed by Holmes et al. 2002): 1) spatial interpolation from data collected at those gauges that are closest to the ungauged site (e.g., Skoien et al. 2006) or 2) interpolation based on flow records collected at streams that are most similar in physical characteristics to the ungauged site (interpolation in environmental space; e.g., Acreman and Wiltshire 1989, Burn 1990). We found few studies that based predictions on interpolation approaches. Empirical and physical models were used more commonly, and both approaches build on a proposal by Nash (1960) to estimate unit hydrograph parameters from catchment characteristics via regression. The choice between empirical and physical models typically has depended on the level of detail required (NERC 1975). Stratifying natural variation among sites by applying a typology often improves predictions of both empirical and physical models (Tasker 1982, Burn and Boorman 1993), especially in hydroclimatically diverse regions (Sanborn and Bledsoe 2006).
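The 2 interpolation strategies described above differ only in how distance between sites is measured. The following is a minimal sketch, not a method taken from any of the cited studies: the function name, the choice of a simple k-nearest-neighbor average, the z-score standardization, and all site values are illustrative assumptions.

```python
import numpy as np

def knn_flow_estimate(gauged_xy, gauged_attrs, gauged_flow,
                      target_xy, target_attrs, k=2, space="geographic"):
    """Average a flow statistic over the k gauged sites nearest the target.

    space="geographic":    distance between site coordinates.
    space="environmental": distance between standardized catchment
                           attributes (z-scores based on the gauged sites).
    """
    if space == "geographic":
        diffs = np.asarray(gauged_xy, float) - np.asarray(target_xy, float)
    elif space == "environmental":
        attrs = np.asarray(gauged_attrs, float)
        mean, sd = attrs.mean(axis=0), attrs.std(axis=0)  # assumes no constant attribute
        diffs = (attrs - mean) / sd - (np.asarray(target_attrs, float) - mean) / sd
    else:
        raise ValueError("space must be 'geographic' or 'environmental'")
    dist = np.sqrt((diffs ** 2).sum(axis=1))
    nearest = np.argsort(dist)[:k]
    return float(np.asarray(gauged_flow, float)[nearest].mean())

# Hypothetical gauged sites: coordinates (km), attributes (area km2, precipitation mm),
# and mean annual runoff (mm). All numbers are invented for illustration.
xy = [(0, 0), (1, 0), (10, 0), (11, 0)]
attrs = [(50, 800), (500, 1500), (60, 820), (480, 1450)]
runoff = [200, 900, 220, 880]

geo = knn_flow_estimate(xy, attrs, runoff, (0.5, 0), (55, 810), space="geographic")
env = knn_flow_estimate(xy, attrs, runoff, (0.5, 0), (55, 810), space="environmental")
```

In this invented example the 2 geographically closest gauges drain very different catchments, so the geographic estimate averages dissimilar flow regimes, whereas the estimate made in environmental space borrows only from physically similar catchments.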
Thomas and Benson (1970) built on earlier empirical models developed for different regions and regressed 71 different flow indices nation-wide on 21 physical catchment characteristics. Others expanded the use of empirical models by introducing weighted and generalized least squares regression (Stedinger and Tasker 1985), nonlinear models (Pandey and Nguyen 1999), ANNs (Nayebi et al. 2006), and Random Forests (Carlisle et al. 2009). Empirical models avoid some important problems associated with physical models, such as model structure uncertainty and difficulties in identifying unique parameters, and are considered more robust than physical models (Croke and Norton 2004).
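The basic empirical approach, regressing a flow index on physical catchment characteristics and applying the fitted equation at an ungauged site, can be sketched as follows. The log-log model form is assumed here for illustration and is not taken from the studies cited above; the function names, predictor variables, and data are hypothetical.

```python
import numpy as np

def fit_log_linear(flow_index, catchment_vars):
    """Ordinary least squares fit of log(flow) = b0 + sum_j b_j * log(x_j).

    Returns the coefficient vector [b0, b1, ..., bp].
    """
    y = np.log(np.asarray(flow_index, float))
    X = np.log(np.asarray(catchment_vars, float))
    X = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def predict_flow(coef, catchment_vars):
    """Back-transform predictions for one or more ungauged catchments."""
    X = np.log(np.atleast_2d(np.asarray(catchment_vars, float)))
    return np.exp(coef[0] + X @ coef[1:])

# Synthetic gauged catchments: drainage area (km2) and mean annual precipitation (mm).
area = np.array([10.0, 50.0, 120.0, 400.0, 900.0])
precip = np.array([600.0, 1400.0, 800.0, 1100.0, 700.0])
# Invented "true" relationship, used only to generate noise-free example data.
flow = 0.002 * area ** 0.9 * precip ** 1.2

coef = fit_log_linear(flow, np.column_stack([area, precip]))
q_ungauged = predict_flow(coef, [200.0, 1000.0])
```

Because the example data contain no noise, the fit recovers the generating exponents exactly. With real gauge records, residual error and the weighting of gauges with short records become central concerns, which is what weighted and generalized least squares were introduced to address.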
In earlier studies, hydrologists developed process-based physical models to predict flow at ungauged sites by parameterizing models for ungauged sites with parameters from models developed at gauged sites. This approach accounted for 49% of the hydrology publications we examined. Originally, physical models were calibrated with data from all gauged catchments in a region, and then model parameters were statistically related to catchment attributes, such as climate, topography, land use, geology, soils, or composites of these variables (Sefton and Howarth 1998). These attributes could then be measured at an ungauged site and the statistical relationships could be used to estimate the parameters for the model at the ungauged site (Benson and Matalas 1967, Ross 1970, Heerdegen and Reich 1974, Jarboe and Haan 1974, NERC 1975). Various physical models have been used; these models range from the simple lumped conceptual rainfall–runoff models IHACRES (Sefton et al. 1995) and SIMHYD (Peel et al. 2000) to the more complex spatially distributed Stanford Watershed (Ross 1970), HBV (Seibert 1999), and SWAT models (Heuvelmans et al. 2006). Fernandez et al. (2000) refined this approach by simultaneously maximizing the fit of model parameters to catchment characteristics and gauge data. Physical models also can be calibrated to ungauged sites by spatially interpolating parameter values from gauged sites, but this approach has produced mixed results (Merz and Bloschl 2004, 2005, McIntyre et al. 2005, Oudin et al. 2008). Other refinements include constraining parameter space during model calibration (Gotzinger and Bardossy 2007), relating model parameters to catchment attributes using weighted or sequential regression (Kay et al. 2006), and making predictions from ensembles of models (McIntyre et al. 2005). Increasingly, site-specific predictions are made with either complex physically based models that exploit spatially extensive fine-scale data (Ivanov et al. 2004, Immerzeel and Droogers 2008) or new techniques for estimating model parameters, like ANNs (Heuvelmans et al. 2006).
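The original 2-step procedure described above (calibrate a model at each gauged catchment, then relate the calibrated parameters to catchment attributes) can be illustrated with a deliberately simple stand-in model. The sketch below assumes a 1-parameter exponential recession, Q_t = Q_0 exp(−kt), in place of a full rainfall–runoff model; the catchment attribute, the linear parameter–attribute relationship, and all values are hypothetical.

```python
import numpy as np

def calibrate_recession_k(flows):
    """Step 1: fit k in Q_t = Q_0 * exp(-k * t) by least squares on log(Q)."""
    q = np.asarray(flows, float)
    t = np.arange(len(q))
    slope, _intercept = np.polyfit(t, np.log(q), 1)
    return -slope

def regionalize_parameter(calibrated_k, attribute):
    """Step 2: regress calibrated k on a catchment attribute.

    Returns (intercept, slope) of the fitted line.
    """
    slope, intercept = np.polyfit(np.asarray(attribute, float),
                                  np.asarray(calibrated_k, float), 1)
    return intercept, slope

# Hypothetical attribute, e.g., fraction of each gauged catchment on permeable geology.
gauged_attr = np.array([0.2, 0.4, 0.6, 0.8])
true_k = 0.5 - 0.4 * gauged_attr          # invented relationship used to make example data
t = np.arange(10)
gauged_flows = [5.0 * np.exp(-k * t) for k in true_k]

k_hat = [calibrate_recession_k(q) for q in gauged_flows]      # calibrate at gauged sites
intercept, slope = regionalize_parameter(k_hat, gauged_attr)  # relate k to the attribute
k_ungauged = intercept + slope * 0.5                          # step 3: estimate k at an ungauged site
```

The ungauged site needs only a mapped attribute value, not a flow record. With real models, several interacting parameters are calibrated at once, which is the source of the identifiability problems noted earlier.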
Future research needs: challenges and opportunities.—Since 1960, hydrologists have developed multiple techniques that potentially could be used to predict the reference hydrologic regime for any site within a given region. Recent research has focused on how to quantify and reduce the uncertainty associated with these predictions (Sivapalan et al. 2003) and how to apply predictions to ecological assessments (Henriksen et al. 2006, Sanborn and Bledsoe 2006, Kennen et al. 2008). Sivapalan et al. (2003) established the International Association of Hydrological Sciences (IAHS) Decade on Predictions in Ungauged Basins to focus research efforts on measuring and reducing prediction uncertainties, and several researchers have begun to address this challenge (e.g., Wagener and Wheater 2006, Zhang et al. 2008). Sanborn and Bledsoe (2006) used an empirical model to predict 84 streamflow metrics (with estimates of uncertainty) that describe flow regime at 150 ungauged sites across 3 states in the US. They also explained how these predictions could be used to: 1) improve ecological assessments by predicting flow characteristics affecting taxa distributions, 2) assess causes of impairment by quantifying flow disturbances, or 3) design a restoration that maximizes recruitment of cottonwood trees. Kennen et al. (2008) estimated parameters for a physical model from spatial data (i.e., topography, soils, impervious surface cover, precipitation, and temperature) and then predicted reference flow conditions. They compared reference flow to current flow conditions and related these differences in flow to macroinvertebrate assemblage composition. Advances in hydrologic modeling have made predictions of the hydrologic regime possible, and these predictions have great potential to improve models of biotic–environment relationships. 
We see no reason why these hydrologic models should not be used more frequently in the future to provide better understanding of the reference condition of streams.
Predicting geomorphic and bed sediment reference conditions
Structural habitat is a key factor that determines the spatial distribution of aquatic taxa and biotic assemblages. Here, we define structural habitat as the channel geomorphic characteristics and sediment size distributions that compose the physical environment upon and within which stream biota live (Minshall 1984). Stream geomorphic structure and sediment size are important to biota, so ecological assessments should benefit from their improved prediction at unmeasured sites. Studies of stream structural habitat often consider aspects of channel planform (e.g., straight, meandering, or braided), stream channel width, channel depth, channel slope, and streambed sediment size distributions. The physical characteristics of streams change with stream size (Leopold et al. 1964) and landscape setting (Lotspeich 1980).
Regionalizations.—We found no examples of attempts to classify whole regions with respect to either the geomorphic or sediment characteristics expected in streams (Table 2). However, some researchers have used regionalization as a first step towards a hierarchical classification of channel types (e.g., Maxwell et al. 1995).
Typologies.—Geomorphologists recognized early that streams and rivers show systematic patterns in structure that are amenable to classification (Gilbert 1877, Davis 1890). Numerous classification systems have been proposed (Table 2) based on stream size (Horton 1945, Strahler 1957), channel planform (Leopold and Wolman 1957), or the dominant geomorphic processes that form streams (Schumm 1977). More recently, Lotspeich (1980) proposed a hierarchical framework for catchments, and Frissell et al. (1986) refined this classification by adding finer nested levels of geomorphic units (catchment > stream system > valley segment > reach > pool/riffle > microhabitat). Much research in the 1990s focused on describing the structure of these units and the processes that create them, e.g., valley segments (Nanson and Croke 1992), reaches (Rosgen 1994, Montgomery and Buffington 1997), and pools and riffles (Hawkins et al. 1993). To the best of our knowledge, no typologies have been proposed that focus exclusively on differences in substrate size among reaches.
Several classification systems have been suggested as management tools for aquatic resources (e.g., Maxwell et al. 1995). These classification systems were designed to inform land managers about how different stream classes should respond to management activities (Downs 1995), were often created for a specific region (Paustian et al. 1992), and can include stream condition as part of the classification scheme (Rosgen 1996). Kondolf et al. (2003) provide a comprehensive discussion of geomorphic classification systems and their purposes and uses.
Site-specific predictions.—Physical models that treat sediment size as an independent variable to predict channel slope, sediment movement, and yield (Dade and Friend 1998) can be rearranged to predict sediment size at unmeasured locations. However, few studies have attempted to make site-specific predictions of sediment size from physical relationships (Table 2). Here, we briefly review those efforts and describe several recent attempts to develop empirical models that predict sediment characteristics for use in aquatic assessments.
Montgomery et al. (1998) suggested that Shields' equation (Shields 1936), which describes the shear stress needed to move the median sediment size (D50) in a stream reach at bankfull discharge, can be rearranged to predict D50 from bankfull depth and channel slope. Buffington et al. (2004) used an approach similar to analysis of covariance to show that certain geomorphic river classes (Montgomery and Buffington 1997) have smaller sediment sizes than predicted from Shields' equation. This analysis was used to adjust sediment size predictions in 3 river networks in Washington, USA. Buffington et al. (2004) used drainage area to estimate bankfull depth and a 10-m-resolution digital elevation model (DEM) to derive channel slope. Kaufmann et al. (2009) rearranged Shields' equation to estimate relative bed stability (RBS) in 101 streams in the Pacific Coastal Range, USA. These authors related RBS to a gradient of land use and found that reference-condition streams tended to be more stable at high flows. In addition, Kaufmann et al. (2009) used MLR to predict RBS values from field and GIS-derived variables (R² = 0.62). The selected predictor variables included basin lithology, a surrogate for unit stream power (Aws^0.5 × S, where Aws is basin area and S is the channel slope), and management-related factors (road density, a riparian condition index). These results imply that reference-site sediment character depends largely on discharge (a function of watershed area) and basin geology.
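The rearrangement of Shields' equation can be made explicit. If bankfull shear stress is approximated as ρghS (water density ρ, gravitational acceleration g, bankfull depth h, channel slope S) and the critical dimensionless shear stress is τ*c = ρghS / [(ρs − ρ)gD50], where ρs is sediment density, then D50 = ρhS / [(ρs − ρ)τ*c]. The sketch below implements this relationship. The default τ*c of 0.03 and the densities are assumptions chosen for illustration, not values reported in the studies cited above, and the function name is hypothetical.

```python
def predict_d50(bankfull_depth_m, channel_slope,
                critical_shields=0.03,      # assumed value; varies among studies
                rho_water=1000.0,           # kg/m3
                rho_sediment=2650.0):       # kg/m3, assumed quartz-density sediment
    """Median sediment size (m) predicted from a rearranged Shields relation.

    D50 = rho * h * S / ((rho_s - rho) * tau_c_star); gravity cancels.
    """
    if critical_shields <= 0 or rho_sediment <= rho_water:
        raise ValueError("critical_shields must be > 0 and rho_sediment > rho_water")
    return (rho_water * bankfull_depth_m * channel_slope
            / ((rho_sediment - rho_water) * critical_shields))

# Example: 1-m bankfull depth and 1% channel slope.
d50_m = predict_d50(1.0, 0.01)
d50_mm = 1000.0 * d50_m
```

Under these assumptions the predicted D50 is about 0.2 m. Predicted size scales linearly with both depth and slope, so errors in DEM-derived slope propagate directly into the sediment size estimate.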
Several authors recently attempted to predict sediment size with empirical models that used GIS-derived predictors. Flores (2004) regressed sediment D50 estimates from stream sites in Oregon and Colorado, USA, against DEM channel slope, drainage area, elevation, 2 catchment lithology variables, 2 valley confinement variables, and mean precipitation. His model explained 84% of the variation in D50, but probably suffered from some overfitting, given the sample size (n = 39) and number of predictors (8). Two other studies (Mugodo et al. 2006, Frappier and Eckert 2007) used a suite of GIS variables to predict various local habitat features including sediment characteristics (e.g., % sand, % cobble, % gravel). Both models suffered from large prediction errors. However, the modeled estimates of sediment size in Mugodo et al. (2006) predicted the distributions of 7 fish species as effectively as field measurements of sediment size, results implying that their modeled sediment sizes were predicted well enough to be ecologically useful.
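The overfitting concern for a model with 8 predictors fitted to 39 sites can be illustrated with the standard adjusted R² formula, which penalizes R² for the number of predictors. This diagnostic is offered only as an illustration; it is not taken from Flores (2004), and the function name is hypothetical.

```python
def adjusted_r2(r2, n_samples, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_samples - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more samples than predictors + 1")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / dof

# Values reported in the text: R^2 = 0.84, n = 39 sites, 8 predictors.
adj = adjusted_r2(0.84, 39, 8)
sites_per_predictor = 39 / 8
```

The adjustment lowers the explained variation from 0.84 to about 0.80, and the model has fewer than 5 sites per predictor. Adjusted R² corrects only for the number of fitted terms, so validation with independent sites would be a stronger test of predictive performance.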
Future research needs: challenges and opportunities.—Use of geomorphic typologies can aid in conducting inventories of aquatic resources (Brenden et al. 2007) and predicting the biotic assemblages expected at unvisited locations (Naiman 1998). Advances in GIS have allowed deployment of various typology schemes that account for a suite of habitat characteristics, including geomorphology, and that have promise for use in bioassessments (e.g., Snelder et al. 2004b).
Future progress in sediment size prediction will require expansion of the scale of investigations (i.e., larger sample sizes and geographic extents) and incorporation of local processes known to affect sediment size. However, incorporation of local processes at the spatial scale required for many ecological assessments might be challenging, especially for small streams where stochastic events, such as a single tree fall within the channel, can have a disproportionately large effect on the geomorphic structure when compared with downstream reaches (e.g., Swanson et al. 1976, Keller and Swanson 1979, Gooderham et al. 2007). Numerous studies have examined the factors controlling downstream variation in sediment size in individual rivers (e.g., Brierley and Hickin 1985, Hoey and Bluck 1999, Rice 1999). Recently, these papers have emphasized the importance of local natural controls that disrupt the idealized trends in the downstream reduction in sediment size (Ferguson et al. 2006). Local controls can include lateral sediment contributions from hill slopes (Benda and Dunne 1997, Bigelow et al. 2007) and tributaries (Rice 1998, reviewed by Benda et al. 2004), adjustments in flow and channel competence resulting from large woody debris inputs (Massong and Montgomery 2000, Gomi et al. 2006, Kaufmann et al. 2009; see Van Sickle and Gregory 1990, Sobota et al. 2006, Wohl and Jaeger 2009 for models of large wood recruitment to streams), fine sediment retention by small mountain lakes (Arp et al. 2007), and local geology (Mureşan 2009). From a practical application context, it is important to note that each of the natural processes listed above potentially could be mapped in a GIS. In addition, these processes have received much attention recently, and significant progress has been made toward understanding why, where, and how much each factor affects local stream sediment size. 
Additional progress toward predicting reference sediment characteristics will require explicit incorporation of these natural processes in modeling efforts.
Predicting water-chemistry reference conditions
Water chemistry, defined as the concentrations of major cations (e.g., Ca²⁺, Mg²⁺), major anions (e.g., Cl⁻, SO₄²⁻), and nutrients (e.g., N, P), is important as a direct measure of water quality and as a factor structuring freshwater assemblages. For example, Egglishaw and Morgan (1965), Sutcliffe and Carrick (1973), and Minshall and Minshall (1978) showed that streams with low ionic strength and pH had low taxon richness and abundances, and experiments done by Willoughby and Mappin (1988) showed that water chemistry restricts the distributions of some taxa. Therefore, describing water chemistry expected under reference conditions should improve our ability to describe the ecological benchmarks appropriate under different landscape settings.
In the 1980s and 1990s, most predictive modeling efforts for water chemistry focused on the effects of acid rain on water chemistry, and >½ of the publications we reviewed addressed this one aspect. Acidification research was directed mainly toward development of physical models that were calibrated with measured water-chemistry and atmospheric inputs and that were designed to make temporal predictions at individual sites (reviewed by Reuss et al. 1986, Christophersen et al. 1993, and Breuer et al. 2008). Exceptions to this tendency include development of regional, typological, or empirical modeling approaches to predict water chemistry to aid in determining attainable ecosystem conditions or predicting taxon distributions.
Paleolimnological approaches that relate water-chemistry conditions to the distributions of diatom or chironomid taxa have been developed for lakes (e.g., Dixit et al. 1999). These models can be applied to the taxa observed in sediment cores to estimate historical water-chemistry conditions at individual sites. When coupled with information on core dates, such models can identify water-chemistry conditions at individual lakes prior to human alteration of the landscape. However, these models cannot directly estimate the water-chemistry conditions expected at unsampled lakes.
Catchment lithology is one of the major factors controlling water chemistry, so geologic data sometimes can be used as a surrogate for water chemistry when describing how ecological properties vary with water chemistry or when creating ecologically relevant regionalizations or typologies. For example, Leathwick et al. (2005) used measurements of rock P and Ca concentrations as potential predictors in models of fish distributions. Geology also is used to control for variations in water chemistry in the European Union's WFD System-A typology (Davy-Bowker et al. 2006).
Regionalizations and typologies.—Landscape classifications comprised a minority of the publications we found regarding prediction of water chemistry (Table 2), but they were important to understanding variation in water chemistry and continue to be used in both bioassessments and in the establishment of nutrient criteria (Johnson and Host 2010). Bricker and Rice (1989) developed a method for predicting stream alkalinities based solely on geologic regions, which represents one of the earliest uses of a regionalization to predict water chemistry. Their predictions of the proportion of catchments with acid neutralizing capacity <200 µeq/L were within 1% of independently measured values. Ecoregions (Omernik 1987) are the most frequently used regionalization and might be expected to partition some of the variation in water chemistry because they are partly derived from 2 primary drivers of water-chemistry variability—lithology and land use. However, the success of ecoregions in partitioning the spatial variation in water chemistry has been mixed (Harding et al. 1997, Pan et al. 2000, Jenerette et al. 2002, Rohm et al. 2002, Herlihy and Sifneos 2008). Typological approaches also have incorporated aspects of predicted water chemistry into catchment and reach classifications, e.g., Snelder and Biggs (2002) and the WFD System-A typology (Davy-Bowker et al. 2006). Both regional and typological methods have been important in recent efforts to establish nutrient reference conditions and criteria (Rohm et al. 2002, Robertson et al. 2006), although some investigators have raised doubts about the ability of a classification approach to partition natural variation sufficiently well to establish defensible nutrient criteria (e.g., Smith et al. 2003, Herlihy and Sifneos 2008).
Site-specific predictions.—Most water-chemistry predictive models have been developed to make predictions at single sites. The pioneering work of Johnson et al. (1969), who modeled stream water chemistry as a joint function of soil water and precipitation with empirically derived parameters, led to much of the later research that explored both empirical and physical modeling approaches. However, physical modeling quickly became the dominant approach once the Birkenes Model had been developed (Christophersen et al. 1982). This model is composed of submodels for hydrology and different ion species and produces estimates of water chemistry over time at a site. Later models followed this approach, expanded the number of ionic species modeled, and included nutrient concentrations. Papers describing physical models now constitute 50% of the literature, with MAGIC (Cosby et al. 1985), ILWAS (Goldstein et al. 1984, Gherini et al. 1985), PnET (Aber and Federer 1992), and INCA (Whitehead et al. 1998) models among those most often used. These models can make temporal predictions, but they cannot predict water chemistry at unmeasured locations or be used to assess the impact of land use because they must be calibrated with water chemistry measured at each site. Most models that potentially would be capable of site-specific predictions still require site measurements of water chemistry (Aherne et al. 2003) or model entire regions (a form of regionalization) and do not make site-specific predictions (Hornberger et al. 1989).
Stumm et al. (1983; see also Schnoor and Stumm 1986) developed simpler steady-state physical models for site-specific predictions based on mass-balance relationships, without incorporating hydrology into the model. Publications describing these types of models represent ∼12% of the literature, and ALCHEMI (Schecher and Driscoll 1988), SSWC (Henriksen et al. 1992), and PROFILE (Warfvinge and Sverdrup 1992) models are the leading examples. Because of their simple structure and low data requirements, these models often are used to assess the sensitivity of water bodies to acid loading across regions from large data sets of site measurements, but predictions still require ≥1 site measurement (but see the regional version of the SSWC model in Jones et al. 1990). The PROFILE model (Warfvinge and Sverdrup 1992) is an exception and can make spatial predictions of water chemistry from mapped soil properties.
Publications describing physical models dominated the literature in the 1980s, but their limitations in making predictions at multiple unmeasured sites stimulated development of empirical approaches. For example, Billett and Cresser (1992) developed site-specific empirical models that predicted water chemistry from mapped soil chemistry (see also Small and Sutton 1986). This work was later expanded to include predictions based on multiple predictor variables (Smart et al. 1998, 2001, Cresser et al. 2000). Publications focused on site-specific empirical modeling now make up ∼26% of the literature. End Member Mixing Analysis (Christophersen et al. 1990, Hooper et al. 1990) is another important empirical technique but is not practical for use at large scales because it requires more data than do physical models. Geostatistical techniques have been used to predict water chemistry in 4 studies and have produced better predictions than have empirical models (e.g., Peterson et al. 2006).
Future research needs: challenges and opportunities.—The empirical modeling approaches described above and recent developments in physical and statistical modeling should allow ecologists to estimate better the water-chemistry characteristics of reference conditions. In some cases, these models are already in use. For example, Baker et al. (2005) used empirical models to establish reference chemical conditions in the upper Midwest, USA, as part of an ecological assessment, and then compared these predictions to current measurements to show that 22 to 35% of sites were impaired. Soranno et al. (2008) used empirical models to establish reference P conditions for 374 lakes in Michigan, USA. Physical models are now being used to estimate preindustrial water chemistry (Erlandsson et al. 2008, Wade et al. 2008, Zhai et al. 2008), but these models account only for changes in atmospheric deposition. Physical models also are being combined with empirical models to allow spatially explicit predictions. Chen and Driscoll (2005) used empirical models and GIS data to parameterize the PnET-BGC model and applied the model across the Adirondack Mountains in New York, USA. Evans et al. (2006) linked the MAGIC model with an empirical model to predict water chemistry spatially throughout a catchment and temporally from 1850 to 2050. They related these predictions to invertebrate taxon distribution models to show how invertebrate distributions changed as acidity increased. Newer statistical approaches, like ANNs, also are yielding site-specific predictions with reasonable accuracy (pH within 2% of measured values, Ehrman et al. 1996; a model of total N with R² = 0.94, Amiri and Nakane 2009). Despite these advances, the amount of uncertainty in predictions based on both physical and empirical approaches must be assessed fully, especially when the models are applied to heterogeneous terrains (Hill et al. 2002).
Reference conditions, numeric criteria, and water-quality standards
Quantitative ecological assessments ultimately require that numeric criteria be established that describe either the maximum acceptable level of a water-quality element (e.g., a nutrient concentration) or the minimum acceptable value of a biotic index. In the past, both regional and typological methods have been used to establish water-quality criteria (e.g., Rohm et al. 2002, Robertson et al. 2006) and to develop biocriteria (USEPA 2002). However, water chemistry, temperature, sediment, and flows can vary markedly within ecoregions (e.g., Lewis et al. 1999, Dodds and Welch 2000), hence a one-size-fits-all approach to establishing numeric criteria almost certainly would produce water-quality or ecological standards that either are not realistically attainable given natural conditions (e.g., Smith et al. 2003) or are not protective enough (e.g., Dodds and Oakes 2004). Applying site-specific models instead of regional or typological methods when developing numeric criteria should allow establishment of water-quality and biological criteria that better match the true potential of individual sites.
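The 2 failure modes of a single regional criterion can be shown with a small numerical sketch. Everything in it is hypothetical: the site names, the predicted reference concentrations, the regional criterion, and the rule that a criterion is underprotective when it exceeds 1.5 times the predicted reference value are assumptions made for illustration, not values or rules drawn from the studies cited above.

```python
def evaluate_regional_criterion(reference_by_site, regional_criterion,
                                allowed_ratio=1.5):
    """Classify how well one regional maximum-concentration criterion fits each site.

    reference_by_site: predicted natural (reference) concentration per site.
    allowed_ratio:     hypothetical tolerance; a criterion above
                       allowed_ratio * reference is treated as underprotective.
    """
    results = {}
    for site, reference in reference_by_site.items():
        if reference > regional_criterion:
            # The site would exceed the criterion even in reference condition.
            results[site] = "unattainable"
        elif regional_criterion > allowed_ratio * reference:
            # The criterion permits substantial enrichment above natural levels.
            results[site] = "underprotective"
        else:
            results[site] = "reasonable"
    return results

# Invented site-specific predictions of reference total P (ug/L) within one region.
predicted_reference_tp = {"A": 8.0, "B": 20.0, "C": 45.0}
outcome = evaluate_regional_criterion(predicted_reference_tp, regional_criterion=25.0)

# A site-specific alternative under the same hypothetical tolerance.
site_specific_criteria = {s: 1.5 * ref for s, ref in predicted_reference_tp.items()}
```

In this example a single criterion of 25 µg/L is underprotective at the naturally dilute site, reasonable at the intermediate site, and unattainable at the naturally enriched site, whereas criteria scaled to each site's predicted reference condition avoid both problems by construction.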
Physical-chemical predictions and ecological assessments
Advances in the watershed sciences have greatly increased our ability to predict the physical-chemical habitat at a site, and these advances should be exploited by the ecological assessment community. We found examples of successful predictions of the physical-chemical attributes expected under reference conditions in all 4 watershed sciences subdisciplines that we reviewed. Linking these predictions with the biota observed at reference sites should allow us to establish more robust biota–environment relationships that should, in turn, improve precision and reduce bias of ecological assessments. For example, predicted water temperature could be used instead of surrogates of temperature, such as elevation, in RIVPACS-type predictive models to estimate E. Our modeling of taxon distributions in Colorado, USA, streams showed that predicted reference-state stream temperatures led to more accurate predictions of distributions of stream biota than did temperature surrogates (RAH and CPH, unpublished data). In addition, use of predicted site-specific physical-chemical attributes in biota–environment models should aid in interpreting the causal relationships embedded in these empirical models, something that is difficult to do when surrogate predictors are used. The use of predictors that have a likely causal link to the distribution of biota also should aid in the diagnosis of at least some causes of biological impairment, i.e., those landscape or waterway alterations that affect stream temperature, hydrology, geomorphology, and water chemistry. These benefits alone might justify the use of modeled physical-chemical stream characteristics when predicting reference-state biota, even if they do not significantly improve the precision or bias of ecological assessments.
The 4 watershed sciences subdisciplines that we examined are not equally mature. For example, the ability to predict hydrologic regimes is further developed than the ability to predict conditions in the other 3 subdisciplines. Hydrologic regime papers comprised almost ½ of the papers that describe models capable of site-specific predictions of reference conditions (Table 2). The reliance of most physical models developed for temperature and water chemistry on site-specific parameterization currently limits their use in establishing expected site-specific reference conditions in 2 ways. First, the need for multiple site measurements makes application to large populations of streams prohibitively expensive. Second, and more importantly, these measurements (e.g., soil water chemistry or channel width–depth ratios) often are affected by anthropogenic stressors, a trait that makes any prediction of reference conditions suspect. Scientists working in other subdisciplines should consider exploring methods used by hydrologists to develop physical models that can be applied regionally without relying on site measurements. We predict that the accuracy and precision of biota–environment models, and hence ecological assessments, will improve as site-specific predictions for each of these physical-chemical attributes are more widely applied.
Regional and typological classifications also have been used to partition some of the natural variation in physical-chemical conditions (Table 2). However, the collapsing of continuous gradients into classes that is inherent in these approaches might not be necessary given the state of development of both empirical and physical models. Model predictions in all 4 subdisciplines also have been used to help explain distributional patterns in biota, but only hydrologic predictions have been applied explicitly in an ecological assessment context. We think that ecological assessments would benefit from increased use of temperature, hydrologic regime, water chemistry, and substrate predictive models to predict reference conditions. Applications of these models should be tested explicitly to determine how much they improve ecological predictions. As these modeling approaches continue to evolve to include measures of prediction uncertainty, ecologists also should develop methods to assess and account for the propagation of these errors when these models are applied to ecological assessments.
Concluding Remarks and Recommendations
Our review showed that the science supporting ecological assessments has matured greatly over the past 25 y (sensu Creutzburg and Hawkins 2008). This maturation occurred as a consequence of: 1) a large and active bioassessment research community within the North American Benthological Society and elsewhere and 2) robust testing and refinement of those concepts and methods most critical to assessing the status of ecological resources. Research in the area of ecological assessment has been so active that we were unable to cover all of the contributions that have been made regarding the many factors that influence the estimation of reference conditions. Considering the ever-growing threats to the planet's ecological resources, we will need to refine further our ability to conduct accurate and precise ecological assessments of freshwater ecosystems. In doing so, we expect to see researchers and managers who work on other types of ecosystems take better advantage of the wealth of knowledge that the freshwater bioassessment community has produced. We also expect to see more cross-disciplinary research between ecological and physical scientists interested in predicting reference conditions. In all cases, future progress will benefit greatly if research is collaborative in nature and actively questioning in spirit.
From our review, we think sufficient information now exists to support the following recommendations: 1) We should move forcefully toward adopting methods that make site-specific prediction the standard approach to estimating reference conditions. Regionalizations, typologies, and other a priori classification schemes can produce coarse estimates of reference conditions, but we do not think they have the potential to improve the accuracy and precision needed by resource managers to set appropriate numerical criteria or to detect ecologically meaningful deviation from those criteria. Future efforts should be directed toward predictive modeling approaches that can deliver these improvements. 2) We need to understand and characterize site-specific aspects of natural variability in both the ecological and physical-chemical attributes of freshwater ecosystems and to distinguish this source of variability from sampling and prediction error. 3) We need to focus specifically on how to minimize or account for systematic prediction bias in ecological assessments. In our view, the consequences of ignoring or not accounting for prediction bias could result in more serious assessment errors than do problems associated with poor precision. 4) We need to make research designed to estimate and adjust for differences in reference-site quality a priority. The effect of variation in reference-site quality on both within- and cross-region comparisons of ecological condition remains a serious and unresolved issue. Variation in reference-site quality is probably an equally, if not more, serious problem affecting comparisons of ecological assessments than is the use of either different methods of predicting the reference condition or different types of indices.
We must continue to bring better order to the science that supports ecological assessments. Clearly, some aspects of both ecological systems and the way humans develop and use ecological knowledge can appear to be chaotic, but research published in J-NABS and elsewhere has contributed greatly to a process that brings needed order to both the applied science of ecological assessment and the basic science on which sound ecological assessments depend. We congratulate the authors, editors, and manuscript reviewers who, through 25 y of dedicated work, have made J-NABS one of the world's leading journals in the field of ecological assessment.
We thank Nora Burbank, Brian Creutzburg, and Ellen Wakeley of the Utah State University Western Monitoring Center and Aquatic Ecology Lab for constructive comments on earlier drafts of the manuscript. Comments by 2 anonymous referees and Associate Editor Bruce Chessman helped hone our thinking and presentation. CPH extends a special thanks to John Van Sickle for generously sharing his insights regarding modeling and statistical hypothesis testing in ecological assessments. Our own research on the topic of reference conditions and ecological benchmarks was supported by grants R-828637-01 and R-830594-01 from the National Center for Environmental Research (NCER) Science to Achieve Results (STAR) Program of the US EPA. We thank the J-NABS editors for the opportunity to discuss the evolution of this important topic.
Summary of the 184 papers published in 29 journals that addressed some aspects of predicting the ecological reference condition in freshwater ecosystems including number of papers identified in each journal (papers), the total number of citations received by those papers (total cites), the average number of times each paper has been cited (cites/paper), and the average number of citations each paper received/y (cites/year). Twenty-five journals did not contain papers that met our criterion for inclusion. These 184 papers have received 4810 citations (Institute for Scientific Information [ISI] Web of Science) as of July 2009
Numbers of papers reviewed in each disciplinary area classified by the approach used in setting reference conditions. Individual papers were sometimes assigned to multiple approaches if they compared ways to set reference conditions. The numbers of different sources (journals, books, reports) for each discipline were: ecology (60), temperature (74), hydrology (53), geomorphology (39), and water chemistry (70)
Eighteen influential papers (i.e., >40 citations) that address reference condition issues in freshwater ecosystems. Papers are ranked by total citations
Results of Random Forests regression modeling assessing associations between 2 types of biological index (multimetric index [MMI] and observed/expected taxa ratio [O/E]) and natural variables. Index values were regressed against 7 natural variables (DOY = day of year sample was taken, TMEAN = mean long-term annual air temperature, PRECIP = mean annual long-term precipitation, LAST-0 = mean long-term average day of the last freeze, WSAREA = watershed area, CARB = % of watershed with carbonate geology, and TRANGE = difference between mean long-term average maximum and minimum air temperatures). MMI values were standardized by ecoregional means to have within-region means of 1. Variance = % of variance in index values associated with the natural gradients. Wadeable Streams Assessment (WSA) ecoregions (USEPA 2006, Herlihy et al. 2008) are Coastal Plains (CPL), Northern Appalachian Mountains (NAP), Northern Plains (NPL), Southern Appalachian Mountains (SAP), Southern Plains (SPL), Temperate Plains (TPL), Upper Midwest (UMW), Western Mountains (MTN), and Western Xeric (XER)
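The standardization step mentioned in the caption above (scaling MMI values so that each ecoregion has a mean of 1) can be sketched as follows. This is only an illustration of the arithmetic under the assumption that each value is divided by the mean of its own ecoregion. The MMI values and the function name are hypothetical, and the Random Forests regression itself is not reproduced here.

```python
from collections import defaultdict
from statistics import fmean


def standardize_by_region(records):
    """Divide each index value by the mean of its ecoregion.

    records: list of (region_code, mmi_value) tuples.
    Returns a list of (region_code, standardized_value) in the same order,
    so that the standardized values within each region average 1.
    """
    by_region = defaultdict(list)
    for region, value in records:
        by_region[region].append(value)
    region_means = {region: fmean(values) for region, values in by_region.items()}
    return [(region, value / region_means[region]) for region, value in records]


# Hypothetical MMI scores for two WSA ecoregions (values are illustrative only).
samples = [("CPL", 40.0), ("CPL", 60.0),
           ("MTN", 70.0), ("MTN", 90.0), ("MTN", 80.0)]

standardized = standardize_by_region(samples)
for region, value in standardized:
    print(region, round(value, 3))
```

After this step, index values from ecoregions with different raw MMI ranges share a common scale, so any remaining association with the natural variables is not simply a between-region difference in mean score.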